Tech Tip: Utility method to split sentences in text
PRODUCT: 4D | VERSION: 15.1 | PLATFORM: Mac & Win
Published On: July 19, 2016
Below is a utility method that splits a given text into its sentences. Each sentence is placed as an element in the text array passed by pointer.
// Split given text into sentences. Returns array of sentences
// $1 - Text
// $2 - Pointer to a text array to contain sentences
C_TEXT($1;$text)
C_POINTER($2;$arrSentences)
C_LONGINT($pos;$start;$textLen;$len;$i;$size)
C_TEXT($b;$s;$regex)
C_BOOLEAN($found)
ARRAY LONGINT($arrSentenceStart;0)

If (Count parameters>=2)
    $text:=$1
    $arrSentences:=$2
    $textLen:=Length($text)
    $b:=Char(92)  // backslash character

    // Pattern: (?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=\.|\?|\!)\s
    // Matches a whitespace character that follows ".", "?" or "!",
    // unless it follows an abbreviation such as "e.g." or a title such as "Mr."
    $regex:="(?<!"+$b+"w"+$b+"."+$b+"w.)(?<![A-Z][a-z]"+$b+".)(?<="+$b+".|"+$b+"?|"+$b+"!)"+$b+"s"

    // Record the position of each sentence-ending whitespace character
    $found:=True
    $start:=1
    While ($found) & ($start<=$textLen)
        $found:=Match regex($regex;$text;$start;$pos;$len)
        If ($found)
            APPEND TO ARRAY($arrSentenceStart;$pos)
            $start:=$pos+1
        End if
    End while

    // Cut the text at each recorded position
    $start:=1
    $size:=Size of array($arrSentenceStart)
    For ($i;1;$size)
        $pos:=$arrSentenceStart{$i}
        $s:=Substring($text;$start;$pos-$start)
        APPEND TO ARRAY($arrSentences->;$s)
        $start:=$pos+1
    End for

    // Remaining text after the last break (the whole text if no break was found)
    $s:=Substring($text;$start)
    APPEND TO ARRAY($arrSentences->;$s)
End if
Example:
$text:="This is a test. Mr. Smith, filler words... Hello world! This is a test? The probability of .09 it isn't."
ARRAY TEXT($arrSentences;0)
UTIL_SPLIT_SENTENCES($text;->$arrSentences)
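To inspect what the method returned, a short loop like the one below could be placed after the call. This is only a sketch: the $report variable and the use of ALERT are illustrative choices and are not part of the original tip.

```4d
// Hypothetical inspection code, run after UTIL_SPLIT_SENTENCES
C_LONGINT($i)
C_TEXT($report)
$report:=""
For ($i;1;Size of array($arrSentences))
    // One numbered line per sentence
    $report:=$report+String($i)+": "+$arrSentences{$i}+Char(Carriage return)
End for
ALERT($report)
```

Because the method appends to the array rather than clearing it, the array should be reset with ARRAY TEXT($arrSentences;0) before each call, as in the example above.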
$arrSentences will contain the following results: