Tech Tip: How to use array parameters with Match regex
PRODUCT: 4D | VERSION: 11.6 | PLATFORM: Mac & Win
Published On: May 21, 2010

The Match regex command can accept two long integer arrays as its fourth and fifth parameters. The purpose of these parameters is sometimes misunderstood. A common mistake is to expect a single call to fill the two arrays with the position and length of every occurrence of the pattern. In fact, element zero of each array holds the position and length of the one occurrence that was found, and the following elements hold the positions and lengths of the groups in the pattern, if any.

The documentation states:
"If you pass arrays, the command returns the position and length of the occurrence in the element zero of the arrays and the positions and lengths of the groups captured by the regular expression in the following elements."
Let's look at the following example code to help clarify:

C_BOOLEAN($foundFlag)
C_TEXT($myEmails;$pattern)
C_LONGINT($start)
ARRAY LONGINT($posFound_a;0)
ARRAY LONGINT($lengthFound_a;0)

`Sample string that contains two email addresses

$myEmails:="john.smith2003@gmail.com.edu and j.smith@personal.web.com"

`Sample pattern that looks for email address

$pattern:="[a-z0-9-]+"
$pattern:=$pattern+"(\\.[a-z0-9-]+)*" `group 1
$pattern:=$pattern+"@"+"[a-z0-9-]+"
$pattern:=$pattern+"(\\.[a-z0-9-]+)*" `group 2
$pattern:=$pattern+"(\\.[a-z]{2,4})" `group 3

$start:=1
$foundFlag:=False

Repeat
  $foundFlag:=Match regex($pattern;$myEmails;$start;$posFound_a;$lengthFound_a)
  If ($foundFlag)
    `Resume the next search right after the current match
    $start:=$posFound_a{0}+$lengthFound_a{0}
  End if
Until (Not($foundFlag))


$pattern contains the pattern string that looks for emails. The expressions between parentheses are called groups.

First group: "(\\.[a-z0-9-]+)"
Second group: "(\\.[a-z0-9-]+)"
Third group: "(\\.[a-z]{2,4})"

The sample string contains two email addresses: john.smith2003@gmail.com.edu and j.smith@personal.web.com.
Therefore Match regex finds a match on the first two passes of the Repeat loop; on the third pass it finds nothing and returns False, which ends the loop. In the first run the arrays' elements will be populated as follows:

$posFound_a{0}=1 - start position where the pattern is found.
$posFound_a{1}=5 - start position where group one is found.
$posFound_a{2}=21 - start position where group two is found.
$posFound_a{3}=25 - start position where group three is found.

$lengthFound_a{0}=28 - length of the found pattern in the string.
$lengthFound_a{1}=10 - length of group one.
$lengthFound_a{2}=4 - length of group two.
$lengthFound_a{3}=4 - length of group three.

At the end of the first run $start is set to 29, so the second search starts at the 29th character of the string.

In the second run the arrays' elements will be:

$posFound_a{0}=34 - start position where the pattern is found.
$posFound_a{1}=35 - start position where group one is found.
$posFound_a{2}=50 - start position where group two is found.
$posFound_a{3}=54 - start position where group three is found.

$lengthFound_a{0}=24 - length of the found pattern in the string.
$lengthFound_a{1}=6 - length of group one.
$lengthFound_a{2}=4 - length of group two.
$lengthFound_a{3}=4 - length of group three.
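
As a minimal sketch of how these values can be used, the standard Substring command can turn a position/length pair into text. The variable names $address, $group1 and $group3 are hypothetical and not part of the original example. The literal numbers are the second-run values listed above; in real code they would be read from $posFound_a{n} and $lengthFound_a{n} right after a call that returned True.

```4d
C_TEXT($myEmails;$address;$group1;$group3)

$myEmails:="john.smith2003@gmail.com.edu and j.smith@personal.web.com"

`Element 0: position 34, length 24 (the whole match)
$address:=Substring($myEmails;34;24)  `"j.smith@personal.web.com"

`Element 1: position 35, length 6 (group one)
$group1:=Substring($myEmails;35;6)  `".smith"

`Element 3: position 54, length 4 (group three)
$group3:=Substring($myEmails;54;4)  `".com"
```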

With this information, it is easy to parse the matched string to extract, for example, the user name or the domain name. The example shows that the arrays passed to Match regex receive the position and length of the overall match and of each group within the pattern. Each call to Match regex still finds only the next match in the string, but with groups and the optional array parameters developers can break the matched text down into substrings.
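
When every occurrence is needed, one possible approach is to collect the matches yourself while looping. The sketch below assumes the same sample string and pattern as above; the names $text, $found, $pos_a, $len_a and $allMatches_at are hypothetical and chosen only for this illustration. It is one way to do this, not the only one.

```4d
C_BOOLEAN($found)
C_TEXT($text;$pattern)
C_LONGINT($start)
ARRAY LONGINT($pos_a;0)
ARRAY LONGINT($len_a;0)
ARRAY TEXT($allMatches_at;0)  `hypothetical array that collects every matched address

$text:="john.smith2003@gmail.com.edu and j.smith@personal.web.com"
$pattern:="[a-z0-9-]+(\\.[a-z0-9-]+)*@[a-z0-9-]+(\\.[a-z0-9-]+)*(\\.[a-z]{2,4})"
$start:=1

Repeat
  $found:=Match regex($pattern;$text;$start;$pos_a;$len_a)
  If ($found)
    `Element 0 describes the whole match; store its text
    APPEND TO ARRAY($allMatches_at;Substring($text;$pos_a{0};$len_a{0}))
    `Resume the next search right after the current match
    $start:=$pos_a{0}+$len_a{0}
  End if
Until (Not($found))

`$allMatches_at{1} is "john.smith2003@gmail.com.edu"
`$allMatches_at{2} is "j.smith@personal.web.com"
```

This pattern always consumes at least one character, so $start advances on every successful pass. A pattern that can match an empty string would need an extra guard to avoid looping forever.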