KNOWLEDGE BASE
Tech Tip: Metacharacters and the 4D command Match regex
PRODUCT: 4D | VERSION: 14.3 | PLATFORM: Mac & Win
Published On: March 5, 2015

A regular expression is a pattern that is matched against a string from left to right. Most characters stand for themselves in a pattern and match the corresponding characters in the string. The power of regular expressions comes from the ability to include alternatives and repetitions in the pattern. These are encoded in the pattern by the use of metacharacters, which do not stand for themselves but instead are interpreted in some special way.
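The following is a minimal 4D sketch of ordinary characters, an alternative and a repetition. It assumes the Match regex(pattern; aString; start) form of the command; the variable names are illustrative only.

```4d
C_BOOLEAN($literal_B;$alt_B;$rep_B)
// Ordinary characters stand for themselves: "dog" matches d, o, g in that order
$literal_B:=Match regex("dog";"hotdog";1)
// The metacharacter | separates alternatives: either "cat" or "dog" may match
$alt_B:=Match regex("cat|dog";"hotcat";1)
// The metacharacter + repeats the preceding "b" one or more times;
// ^ and $ anchor the match to the start and end of the string
$rep_B:=Match regex("^ab+c$";"abbbc";1)
// All three variables are True
```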

How metacharacters are treated depends on whether they are inside or outside a character class. A character class is delimited by square brackets, [ ].

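For PCRE-style regular expressions (the engine 4D actually uses is ICU, whose syntax largely matches PCRE's, as noted in the comment at the end of this article), the general rule is to escape metacharacters that are outside character classes (like here [ -- ] and here) and not those that are inside character classes, i.e. inside square brackets (-- [ like here ] --); see the tables below.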
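A minimal 4D sketch of the rule, using the period (a metacharacter outside classes only) and assuming the Match regex(pattern; aString; start) form; variable names are illustrative.

```4d
C_BOOLEAN($outside_B;$any_B;$inside_B)
// Outside a class, an escaped period matches only a literal "."
// ("\\." in 4D code reaches the regex engine as \.)
$outside_B:=Match regex("14\\.3";"14x3";1)
// Unescaped outside a class, the period matches any character
$any_B:=Match regex("14.3";"14x3";1)
// Inside a class, the period stands for itself without any escape
$inside_B:=Match regex("14[.]3";"14x3";1)
// $outside_B is False, $any_B is True, $inside_B is False
```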

However, in a pattern evaluated by Match regex within 4D, extra attention is needed, because 4D itself requires certain characters to be escaped before they are treated as non-metacharacters. In 4D there are seven "4D metacharacters": the caret ^, the ampersand &, the dash -, the backslash \, the double quote ", the opening square bracket [ and the closing square bracket ]. For them to be treated as regular characters instead of metacharacters, they must be "double-escaped" (\\ followed by the metacharacter, for example "\\[" for a literal opening bracket). The double backslash is needed because the 4D string literal consumes one level of escaping: "\\[" in 4D code reaches the regex engine as \[. In the pattern further below, the backslash itself appears as "\\\\" and the double quote as "\"": the double quote only needs 4D's own escape, because it has no special meaning to the regex engine.

In addition, within 4D there are characters that can be escaped to represent non-printing characters, such as "\r" for carriage return (CR), "\n" for new line or line feed (LF) and "\t" for tab (see Table 3 below). To pass these "shorthand" representations to Match regex as regex escape sequences, they have to be double-escaped: "\\r", "\\n" and "\\t".
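As a minimal 4D sketch of double-escaping, the dash inside a character class changes meaning once it is escaped. It assumes the Match regex(pattern; aString; start) form; variable names are illustrative.

```4d
C_BOOLEAN($range_B;$literal_B;$dash_B)
// Unescaped inside [ ], the dash defines a range, so [a-z] matches "m"
$range_B:=Match regex("[a-z]";"m";1)
// Double-escaped in 4D code, "\\-" reaches the engine as \-, a literal dash:
// [a\-z] matches only "a", "-" or "z"
$literal_B:=Match regex("[a\\-z]";"m";1)
$dash_B:=Match regex("[a\\-z]";"-";1)
// $range_B is True, $literal_B is False, $dash_B is True
```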

Table 1. PCRE Regex metacharacters outside square brackets
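
\    general escape character with several uses
^    assert start of string (or line, in multiline mode)
$    assert end of string (or line, in multiline mode)
.    match any character except newline (by default)
[    start character class definition
|    start of alternative branch
(    start subpattern
)    end subpattern
?    extends the meaning of (; also 0 or 1 quantifier; also quantifier minimizer
*    0 or more quantifier
+    1 or more quantifier; also "possessive quantifier"
{    start min/max quantifier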

Table 2. PCRE Regex metacharacters inside square brackets
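
\    general escape character
^    negate the class, but only if it is the first character
-    indicates a character range
[    POSIX character class (only if followed by POSIX syntax)
]    terminates the character class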

Inside a character class, 4D requires additional characters beyond those listed in Table 2 to be escaped. As shown in the Tech Tip "Supporting special characters in a Match regex search pattern" (see the pattern below), the ampersand "&" also has to be escaped within a pattern used by Match regex. In that pattern, "\\\\" is an escaped backslash; the comma that follows it is not itself escaped.

$Regex_T:="[ ~!@#$%\\^\\&*?_\\-`~()\\\\,\"'{|}/<>\\[:\\];.=+]"
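A possible use of this pattern, sketched under the assumption that the goal is to detect whether a text contains any of the listed special characters; the test strings and Boolean variable names are illustrative only.

```4d
C_TEXT($Regex_T)
C_BOOLEAN($plain_B;$special_B)
// Pattern from the Tech Tip above: a class of punctuation characters plus space
$Regex_T:="[ ~!@#$%\\^\\&*?_\\-`~()\\\\,\"'{|}/<>\\[:\\];.=+]"
// Letters and digits only: no character of the class is present
$plain_B:=Match regex($Regex_T;"ProductCode42";1)
// "&" is found, because the class contains it as \&
$special_B:=Match regex($Regex_T;"R&D";1)
// $plain_B is False, $special_B is True
```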

Backslash


The backslash character has several uses. Firstly, if it is followed by a non-alphanumeric character, it takes away any special meaning that character may have. This use of backslash as an escape character applies both inside and outside character classes.

For example, to match a * character, write \* in the pattern. This escaping action applies whether or not the following character would otherwise be interpreted as a metacharacter, so it is always safe to precede a non-alphanumeric with backslash to specify that it stands for itself. In particular, to match a backslash, write \\.
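In 4D code each of these pattern backslashes is doubled once more, as described above. A minimal sketch, assuming the Match regex(pattern; aString; start) form; variable names are illustrative.

```4d
C_BOOLEAN($star_B;$backslash_B)
// The pattern needs \* to match a literal asterisk; in 4D code that is "\\*"
$star_B:=Match regex("5\\*3";"5*3";1)
// The pattern needs \\ to match a literal backslash; in 4D code that is "\\\\"
// (the subject "C:\\Temp" is the 4D literal for the text C:\Temp)
$backslash_B:=Match regex("C:\\\\Temp";"C:\\Temp";1)
// Both variables are True
```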

Non-printing characters


A second use of backslash provides a way of encoding non-printing characters in patterns in a visible manner. There is no restriction on the appearance of non-printing characters, apart from the binary zero that terminates a pattern, but when a pattern is being prepared by text editing, it is usually easier to use one of the following escape sequences than the binary character it represents:

Table 3. Non-printing characters
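
\a         alarm, that is, the BEL character (hex 07)
\cx        "control-x", where x is any ASCII character
\e         escape (hex 1B)
\f         form feed (hex 0C)
\n         linefeed (hex 0A)
\r         carriage return (hex 0D)
\t         tab (hex 09)
\ddd       character with octal code ddd, or back reference
\xhh       character with hex code hh
\x{hhh..}  character with hex code hhh..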
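A minimal 4D sketch of passing the \r escape sequence of Table 3 through to Match regex, assuming the Match regex(pattern; aString; start) form; the text and variable names are illustrative.

```4d
C_TEXT($text_T)
C_BOOLEAN($cr_B;$tab_B)
// Build a text containing a carriage return (character code 13)
$text_T:="line one"+Char(13)+"line two"
// "\\r" reaches the engine as \r, the escape sequence for CR
$cr_B:=Match regex("one\\rline";$text_T;1)
// "\\t" reaches the engine as \t; the text contains no tab, so this search fails
$tab_B:=Match regex("\\t";$text_T;1)
// $cr_B is True, $tab_B is False
```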

Commented by Maurice Inzirillo on March 9, 2015 at 3:53 AM
4D is using ICU regex and not the PCRE regex !