Tech Tip: Utility method to split sentences in text
PRODUCT: 4D | VERSION: 15.1 | PLATFORM: Mac & Win
Published On: July 19, 2016

Below is a utility method that splits a given text into its sentences. Each sentence is appended as an element to the text array passed by pointer.

// Split given text into sentences. Returns array of sentences
// $1 - Text
// $2 - Pointer to a text array to contain sentences

C_TEXT($1;$text)
C_POINTER($2;$arrSentences)
C_LONGINT($pos;$start;$textLen;$len;$i;$size)
C_TEXT($b;$s;$regex)
C_BOOLEAN($found)
ARRAY LONGINT($arrSentenceStart;0)

If (Count parameters>=2)
   $text:=$1
   $arrSentences:=$2

   $textLen:=Length($text)
   $b:=Char(92) //backslash character

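   // Pattern (each backslash is built with $b):
   // "(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=\.|\?|\!)\s"
   //   (?<!\w\.\w.)       not preceded by a dotted abbreviation such as "e.g."
   //   (?<![A-Z][a-z]\.)  not preceded by a two-letter title such as "Mr."
   //   (?<=\.|\?|\!)      preceded by ".", "?" or "!"
   //   \s                 the whitespace character that separates two sentences
   $regex:="(?<!"+$b+"w"+$b+"."+$b+"w.)(?<![A-Z][a-z]"+$b+".)(?<="+$b+".|"+$b+"?|"+$b+"!)"+$b+"s"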

   $found:=True
   $start:=1
   While ($found) & ($start<=$textLen)
      $found:=Match regex($regex;$text;$start;$pos;$len)
      If ($found)
         APPEND TO ARRAY($arrSentenceStart;$pos)
         $start:=$pos+1
      End if
   End while

   $start:=1
   $size:=Size of array($arrSentenceStart)
   For ($i;1;$size)
      $pos:=$arrSentenceStart{$i}
      $s:=Substring($text;$start;$pos-$start)
      APPEND TO ARRAY($arrSentences->;$s)
      $start:=$pos+1
   End for
   // Remaining text after the last boundary (the whole text if no boundary was found)
   $s:=Substring($text;$start)
   APPEND TO ARRAY($arrSentences->;$s)
End if
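
The method relies on Match regex returning the position of the whitespace that separates two sentences, which is why each sentence is taken as the text before that position and the next search starts one character after it. The following is a minimal sketch of that single step, not part of the original tip: $sample and $pattern are hypothetical names, and the pattern is a simplified version that omits the abbreviation and title checks.

```4d
// Sketch only: $sample and $pattern are hypothetical names used for illustration
C_TEXT($sample;$pattern;$b)
C_LONGINT($pos;$len)
C_BOOLEAN($found)

$b:=Char(92)  // backslash character
$sample:="Hello world! Bye."
$pattern:="(?<=[.?!])"+$b+"s"  // simplified: whitespace preceded by ".", "?" or "!"

$found:=Match regex($pattern;$sample;1;$pos;$len)
If ($found)
   // $pos is the position of the space after "!", so the first
   // sentence is everything before $pos
   ALERT(Substring($sample;1;$pos-1))  // displays "Hello world!"
End if
```

Because the lookbehind groups do not consume characters, the match covers only the separating whitespace, so the punctuation stays attached to the sentence it ends.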


Example:
$text:="This is a test. Mr. Smith, filler words... Hello world! This is a test? The probablilty of .09 it isn't."

ARRAY TEXT($arrSentences;0)
UTIL_SPLIT_SENTENCES($text;->$arrSentences)

$arrSentences will contain the following results: