Re: Unicode Control characters

Subject :Re: Unicode Control characters
From :Keisuke Miyako
Date :Wednesday, November 5, 2014 at 9:13 AM
Link :https://kb.4d.com/resources/inug?msgid=GmailId14980bbd87790d18
Hello,

In general I don't like the term "gremlins" because I don't know what it means technically.
If you mean control characters, then I see no reason why they would be a problem, except for char(0), especially on Windows.





This is because char(0) is the string terminator in C, and although 4D keeps track of the string length most of the time, some C APIs treat the first char(0) as the end of the string and can be susceptible to buffer overruns.
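
As a rough illustration (in Python with ctypes rather than 4D, purely to show the C-side behavior), the sketch below shows how a NUL-terminated view of a buffer silently drops everything after the first char(0), even though the real length is known. The helper names c_view and strip_nul are made up for this example.

```python
import ctypes


def c_view(data: bytes) -> bytes:
    """Return what a NUL-terminated C string reader would see in data."""
    buf = ctypes.create_string_buffer(data)  # copies data and appends a trailing NUL
    return buf.value  # .value stops at the first NUL byte


def strip_nul(text: str) -> str:
    """Remove every U+0000 character before handing text to C-style APIs."""
    return text.replace("\x00", "")


payload = "abc\x00def"
print(len(payload))                      # 7
print(c_view(payload.encode("utf-8")))   # b'abc'
print(strip_nul(payload))                # abcdef
```

Stripping char(0) at the point where text enters the application is usually cheaper than guarding every later call.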



For me, the more annoying thing is NFD normalization in Unicode.
I know that copying Unicode text from a PDF and pasting it will result in NFD, whereas typing in a field produces NFC.
I don't think Word uses NFD, though, so I would consider paste from Word to be safe.



The problem with mixing NFD and NFC is that you can end up with two strings that look exactly the same, one being longer than the other.
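
A small sketch of that length difference, using Python's standard unicodedata module instead of 4D (the helper name forms is just for this example):

```python
import unicodedata


def forms(text):
    """Return the (NFC, NFD) normalizations of text."""
    return unicodedata.normalize("NFC", text), unicodedata.normalize("NFD", text)


nfc, nfd = forms("Fr\u00f6hlich")   # "Fröhlich"
print(len(nfc), len(nfd))           # 8 9
print(nfc == nfd)                   # False
```

In NFD the "ö" is stored as "o" followed by U+0308 (combining diaeresis), so the string is one code point longer while rendering identically.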



If you use HIGHLIGHT TEXT on such a string, the position will be wrongly computed for every NFD sequence that could have been composed into a single character under NFC.
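
To make the offset drift concrete, here is a hedged Python sketch (not 4D, and it does not claim to reproduce how HIGHLIGHT TEXT counts internally). It only shows that the same visible word sits at different offsets in the NFC and NFD versions of a string. The function name selection_range is hypothetical, and the offsets are Python's 0-based code point indices.

```python
import unicodedata


def selection_range(text, needle):
    """Return (start, end) offsets of needle in text, 0-based, end exclusive."""
    start = text.find(needle)
    if start < 0:
        return None
    return start, start + len(needle)


composed = unicodedata.normalize("NFC", "d\u00e9j\u00e0 vu")   # "déjà vu"
decomposed = unicodedata.normalize("NFD", composed)

print(selection_range(composed, "vu"))     # (5, 7)
print(selection_range(decomposed, "vu"))   # (7, 9)
```

Each decomposed character before the selection shifts the offsets by one, so a range computed on one form selects the wrong characters in the other.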



Now you may be wondering: what are NFD and NFC, and why should I care?
That is actually my point: you really shouldn't have to care at all, most of the time.



If you want to be really defensive against NFD paste, then you could do

SET TEXT TO PASTEBOARD($text)
$text:=Get text from pasteboard

to quickly convert NFD to NFC.
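
Outside 4D, the same cleanup is normally an explicit normalization call rather than a pasteboard round trip. A minimal Python sketch, assuming Python 3.8 or later for unicodedata.is_normalized (the function name to_nfc is made up here):

```python
import unicodedata


def to_nfc(text):
    """Return text in NFC; text that is already NFC is returned unchanged."""
    if unicodedata.is_normalized("NFC", text):
        return text
    return unicodedata.normalize("NFC", text)


pasted = "Fro\u0308hlich"          # NFD spelling of "Fröhlich"
clean = to_nfc(pasted)
print(len(pasted), len(clean))     # 9 8
```

Calling it once where pasted text enters the application means later length, position, and comparison code only ever sees one form.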



miyako



On 2014/11/06 0:14, "Bernd Fröhlich" wrote:

>Users pasting text from word documents.

**********************************************************************

See how easy it is to extend your 4D solutions to Web and mobile. New opportunities await you with 4D v14!



4D Internet Users Group (4D iNUG)

FAQ: http://lists.4d.com/faqnug.html

Archive: http://lists.4d.com/archives.html

Options: https://lists.4d.com/mailman/options/4d_tech

Unsub: mailto:4D_Tech-Unsubscribe@xxx.xxx

**********************************************************************

Subject :Re: Unicode Control characters
From :Bernd Fröhlich
Date :Wednesday, November 5, 2014 at 8:14 AM
Link :https://kb.4d.com/resources/inug?msgid=GmailId1498085e1e9c1bce
Chip Scheide wrote:

> Where do these come from, besides intentionally inserting them into text?
> copy paste from web sites? users hitting the keyboard with their foreheads, repeatedly, and hitting random characters? somewhere else?

Users pasting text from Word documents.



> I am wondering how much "protection" is needed around various text manipulation and comparison commands to take account of control characters?

Good question.
In pre-Unicode times it was fairly easy to "zap gremlins".
Now with Unicode that is nearly impossible. (Please correct me if I'm wrong.)
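
For what it is worth, a Unicode-aware "zap gremlins" can be sketched by filtering on the general category of each character instead of on fixed code ranges. The Python example below (not 4D) is only one possible policy: it drops category Cc (control) except tab, LF, and CR, and optionally category Cf (format characters such as U+200B zero width space). The names zap_gremlins, KEEP, and drop_format are invented for this sketch.

```python
import unicodedata

KEEP = {"\t", "\n", "\r"}  # control characters that are normally wanted in text


def zap_gremlins(text, drop_format=False):
    """Remove control characters (Cc), and optionally format characters (Cf)."""
    out = []
    for ch in text:
        if ch in KEEP:
            out.append(ch)
            continue
        category = unicodedata.category(ch)
        if category == "Cc":
            continue  # e.g. NUL, BEL, DEL
        if drop_format and category == "Cf":
            continue  # e.g. zero width space, byte order mark
        out.append(ch)
    return "".join(out)


print(repr(zap_gremlins("a\x00b\x07c\td")))              # 'abc\td'
print(repr(zap_gremlins("a\u200bb", drop_format=True)))  # 'ab'
```

Dropping Cf is off by default because that category also contains characters such as the zero width joiner, which some scripts and emoji sequences need.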



Greetings from Germany,

Bernd Fröhlich


Subject :Re: Unicode Control characters
From :Chip Scheide
Date :Wednesday, November 5, 2014 at 7:57 AM
Link :https://kb.4d.com/resources/inug?msgid=GmailId1498075b12e2de5e
Thanks



Really my question was more along the lines of:
do we **really** need to do this sort of thing for text?
or
What is the likelihood of actually encountering "wild" Unicode control characters?
(wild as opposed to self-inflicted)



Chip



On Wed, 5 Nov 2014 15:50:55 +0100, Arnaud de Montard wrote:
>
>> On 5 Nov 2014, at 15:25, Chip Scheide <4dOnly@xxx.xxx> wrote:
>>
>> for example do I need to "wrap" '=' for text?
>> something like:
>>
>> mytextisEqual(Text1;Text2)
>
> Hi Chip,
> I use this:
>
> +++
> //Strequal (str1;str2) -> bool
> //strict comparison
> C_BOOLEAN($0)
> C_TEXT($1)
> C_TEXT($2)
> C_BOOLEAN($outb)
> If (False)
>   C_BOOLEAN(Strequal ;$0)
>   C_TEXT(Strequal ;$1)
>   C_TEXT(Strequal ;$2)
> End if
> //
> $outb:=False //pessimistic
> Case of
>   : (Not(Asserted(Count parameters>1;Current method name+" 2 params expected")))
>   : (Length($1)#Length($2))
>   : (Length($1)=0)
>     $outb:=True //both are empty
>   Else
>     $outb:=(Position($1;$2;1;*)=1)
> End case
> $0:=$outb
> //
> +++
>
> One could use regex too, but since the * parameter was added in v11,
> Position is faster (and simpler!)
>
> --
> Arnaud de Montard


Subject :Re: Unicode Control characters
From :Arnaud de Montard
Date :Wednesday, November 5, 2014 at 7:50 AM
Link :https://kb.4d.com/resources/inug?msgid=GmailId149806ff7bd4c07d


> On 5 Nov 2014, at 15:25, Chip Scheide <4dOnly@xxx.xxx> wrote:
>
> for example do I need to "wrap" '=' for text?
> something like:
>
> mytextisEqual(Text1;Text2)



Hi Chip,

I use this:



+++
//Strequal (str1;str2) -> bool
//strict comparison
C_BOOLEAN($0)
C_TEXT($1)
C_TEXT($2)
C_BOOLEAN($outb)
If (False)
  C_BOOLEAN(Strequal ;$0)
  C_TEXT(Strequal ;$1)
  C_TEXT(Strequal ;$2)
End if
//
$outb:=False //pessimistic
Case of
  : (Not(Asserted(Count parameters>1;Current method name+" 2 params expected")))
  : (Length($1)#Length($2))
  : (Length($1)=0)
    $outb:=True //both are empty
  Else
    $outb:=(Position($1;$2;1;*)=1)
End case
$0:=$outb
//
+++



One could use regex too, but since the * parameter was added in v11, Position is faster (and simpler!)
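
One caveat with any strict comparison: it treats the NFC and NFD spellings of the same text as different. The Python sketch below (an illustration only, not a description of how 4D's '=' or Position behave) contrasts a code point comparison with one made after NFC normalization. Both function names are hypothetical.

```python
import unicodedata


def strict_equal(a, b):
    """True only if both strings contain exactly the same code points."""
    return a == b


def canonical_equal(a, b):
    """True if both strings are identical once normalized to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(strict_equal("e\u0301", "\u00e9"))     # False
print(canonical_equal("e\u0301", "\u00e9"))  # True
print(canonical_equal("abc", "ABC"))         # False
```

Normalizing before a strict comparison keeps it case- and accent-sensitive while ignoring how the accents happen to be encoded.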



--

Arnaud de Montard








