Re: Unicode Control characters
Hello,
In general, I don't like the term "gremlins" because I don't know what it
means technically.
If you mean control characters, I see no reason why they would be a
problem, except for Char(0), especially on Windows.
This is because Char(0) is the string terminator in C, and although 4D keeps
track of the string length most of the time,
some C APIs can be susceptible to buffer overruns.
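To illustrate the Char(0) point outside 4D, here is a small Python sketch. Python and its `ctypes` module are stand-ins chosen for illustration, not anything 4D uses internally. A length-tracking string keeps everything after an embedded NUL, while a C-style read of the same bytes silently stops at it; that disagreement between "real" length and C-visible length is what makes such strings risky when passed to C APIs.

```python
import ctypes

# A Python str (like a 4D text) stores its length explicitly,
# so an embedded NUL (Char(0)) is just another character.
text = "abc\x00def"
print(len(text))  # 7

# A C API receiving the same bytes as a char* stops at the first NUL.
# ctypes.c_char_p.value reads the buffer the way C would.
c_view = ctypes.c_char_p(text.encode("utf-8")).value
print(c_view)  # b'abc'
```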
For me, the more annoying thing is NFD normalization in Unicode.
I know that copy-pasting Unicode text from a PDF will produce NFD, whereas
typing in a field produces NFC.
I don't think Word uses NFD, though, so I would consider pasting from Word
to be safe.
The problem with mixing NFD and NFC is that you can end up with two strings
that look exactly the same, one being longer than the other.
If you use HIGHLIGHT TEXT on such a string,
the position will be miscalculated for every NFD sequence that NFC would
have composed into a single character.
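A short Python sketch of the length mismatch described above. Python's `unicodedata` module is used only to illustrate; in 4D the same effect would show up in Length, Position and HIGHLIGHT TEXT offsets. The two strings render identically, but the NFD one is one character longer for each decomposed accent, so a position computed on one is off when applied to the other.

```python
import unicodedata

nfc = "Caf\u00e9 au lait"                  # "é" as one code point (U+00E9), as typed
nfd = unicodedata.normalize("NFD", nfc)    # "e" + U+0301 COMBINING ACUTE ACCENT, as pasted

print(nfc == nfd)            # False: same look, different code points
print(len(nfc), len(nfd))    # 12 13

# Each decomposed character before the target shifts later positions by one.
print(nfc.find("au"), nfd.find("au"))  # 5 6
```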
Now you may be wondering: what are NFD and NFC, and why should I care?
That is actually my point:
most of the time, you really shouldn't have to care at all.
If you want to be really defensive against NFD paste,
you could do:
  SET TEXT TO PASTEBOARD($text)
  $text:=Get text from pasteboard
to quickly convert NFD to NFC.
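Outside 4D, the same defensive step can be done explicitly rather than through a pasteboard round-trip. A minimal Python sketch, assuming you control the point where pasted text enters the system; `normalize_pasted` is a hypothetical helper name, not an existing API.

```python
import unicodedata

def normalize_pasted(text: str) -> str:
    """Hypothetical helper: convert pasted text to NFC before storing it."""
    # is_normalized (Python 3.8+) avoids building a new string when not needed.
    if unicodedata.is_normalized("NFC", text):
        return text
    return unicodedata.normalize("NFC", text)

pasted = "Cafe\u0301"            # NFD, e.g. copied from a PDF
clean = normalize_pasted(pasted)
print(len(pasted), len(clean))   # 5 4
print(clean == "Caf\u00e9")      # True
```

Normalizing once at input keeps every later Length, Position or comparison consistent, which is the same idea as the pasteboard trick.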
miyako
On 2014/11/06 0:14, "Bernd Fröhlich" wrote:
>Users pasting text from word documents.
**********************************************************************
See how easy it is to extend your 4D solutions to Web and mobile. New opportunities await you with 4D v14!
4D Internet Users Group (4D iNUG)
FAQ: http://lists.4d.com/faqnug.html
Archive: http://lists.4d.com/archives.html
Options: https://lists.4d.com/mailman/options/4d_tech
Unsub: mailto:4D_Tech-Unsubscribe@xxx.xxx
**********************************************************************
Chip Scheide wrote:
> Where do these come from, besides intentionally inserting them into
> text? Copy-paste from web sites? Users hitting the keyboard with their
> foreheads, repeatedly, and hitting random characters? Somewhere else?
Users pasting text from word documents.
> I am wondering how much "protection" is needed around various text
> manipulation and comparison commands to take account of control
> characters?
Good question.
In pre-Unicode times it was fairly easy to "zap gremlins".
Now, with Unicode, that is nearly impossible. (Please correct me if I'm wrong.)
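One way to get a Unicode-aware "zap gremlins" is to filter by Unicode general category instead of by a fixed list of byte values. A minimal Python sketch, not a 4D method: `zap_gremlins` is a hypothetical name, and keeping tab, LF and CR is an assumption about what a text field legitimately contains. Removing format characters (category Cf, e.g. U+200B ZERO WIDTH SPACE) is opt-in, because that category also includes characters such as U+200D ZERO WIDTH JOINER that some scripts and emoji sequences need.

```python
import unicodedata

# Assumption: these whitespace controls are legitimate in ordinary text.
KEEP = {"\t", "\n", "\r"}

def zap_gremlins(text: str, drop_format: bool = False) -> str:
    """Hypothetical helper: remove Unicode control characters (category Cc),
    keeping tab/LF/CR. With drop_format=True, also remove format
    characters (category Cf)."""
    drop = {"Cc", "Cf"} if drop_format else {"Cc"}
    return "".join(
        ch for ch in text
        if ch in KEEP or unicodedata.category(ch) not in drop
    )

sample = "Gr\u00fc\u00dfe\x00 aus\x0b Deutschland\u200b\n"
print(repr(zap_gremlins(sample)))
print(repr(zap_gremlins(sample, drop_format=True)))
```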
Greetings from Germany,
Bernd Fröhlich
Thanks.
Really, my question was more along the lines of:
do we **really** need to do this sort of thing for text?
or:
what is the likelihood of actually encountering "wild" Unicode control
characters (wild as opposed to self-inflicted)?
Chip
On Wed, 5 Nov 2014 15:50:55 +0100, Arnaud de Montard wrote:
> On 5 Nov 2014 at 15:25, Chip Scheide <4dOnly@xxx.xxx> wrote:
>
> for example do I need to "wrap" '=' for text?
> something like:
>
> mytextisEqual(Text1;Text2)
Hi Chip,
I use this:
+++
//Strequal (str1;str2) -> bool
//strict comparison
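C_BOOLEAN($0)
C_TEXT($1)
C_TEXT($2)
C_BOOLEAN($outb)
If (False)  //never executed; read by the compiler only
C_BOOLEAN(Strequal ;$0)
C_TEXT(Strequal ;$1)
C_TEXT(Strequal ;$2)
End if
//
$outb:=False //pessimistic
Case of
: (Not(Asserted(Count parameters>1;Current method name+" 2 params expected")))
//missing parameter: keep False
: (Length($1)#Length($2))
//different lengths cannot be equal
: (Length($1)=0)
$outb:=True //both are empty
Else
//same non-zero length: * makes Position case- and accent-sensitive
$outb:=(Position($1;$2;1;*)=1)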
End case
$0:=$outb
//
+++
One could use a regex too, but since the * parameter was added in v11, Position is faster (and simpler!).
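A strict comparison like Strequal treats the NFC and NFD forms of the same word as different, since their lengths differ and the Length check returns False before Position is ever called. The Python sketch below contrasts the two behaviors; `strequal` only approximates the 4D method above (Python's == compares code point for code point), and `strequal_normalized` is a hypothetical variant that normalizes both sides to NFC first.

```python
import unicodedata

def strequal(a: str, b: str) -> bool:
    """Strict comparison: case, accents and normalization form all matter.
    The length test mirrors Strequal's early exit (== already implies it)."""
    return len(a) == len(b) and a == b

def strequal_normalized(a: str, b: str) -> bool:
    """Hypothetical variant: compare after converting both sides to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

typed = "Caf\u00e9"     # NFC, as typed in a field
pasted = "Cafe\u0301"   # NFD, as pasted from a PDF
print(strequal(typed, pasted))             # False
print(strequal_normalized(typed, pasted))  # True
print(strequal("abc", "ABC"))              # False
```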
--
Arnaud de Montard