Regular Expressions

Pikt regular expressions follow the usual regular expression rules, with the clarifications and amplifications given below.

Here are the regular expression operators:

OPERATOR           MEANING

a =~ b             regexp b matches at least one
                   substring within a
a =~~ b            like the above, but case-insensitive
a !~ b             regexp b matches no substring within a
a !~~ b            like the above, but case-insensitive

For example, all of the following are true:

"this is a test" =~ "is"
"this is a test" =~~ "IS"
"this is a test" !~ "THIS"
"this is a test" !~~ "that"

"this is a test" =~ ""
"" !~ "this is a test"
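
As a rough illustration, the four operators can be modeled in Python. This is only a sketch: Python's `re` module stands in for Pikt's GNU RX-based matcher (the two agree on simple patterns like these but are not identical), and the function names `match`, `match_nocase`, `nomatch`, and `nomatch_nocase` are hypothetical, not part of Pikt.

```python
import re

# Hypothetical stand-ins for Pikt's regexp operators, using Python's re
# module in place of Pikt's matcher. re.search() succeeds if the pattern
# matches anywhere in the text, i.e. "at least one substring within a".

def match(a: str, b: str) -> bool:            # a =~ b
    return re.search(b, a) is not None

def match_nocase(a: str, b: str) -> bool:     # a =~~ b
    return re.search(b, a, re.IGNORECASE) is not None

def nomatch(a: str, b: str) -> bool:          # a !~ b
    return not match(a, b)

def nomatch_nocase(a: str, b: str) -> bool:   # a !~~ b
    return not match_nocase(a, b)

# The examples above, restated:
s = "this is a test"
assert match(s, "is")
assert match_nocase(s, "IS")
assert nomatch(s, "THIS")
assert nomatch_nocase(s, "that")
assert match(s, "")                    # the empty pattern matches everywhere
assert nomatch("", "this is a test")
```

Note that the left-hand operand is the text being searched and the right-hand operand is the pattern, which is why the argument order of `re.search()` is reversed relative to the Pikt expression.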

These characters have special meaning within Pikt regular expressions:

CHARACTER(S)    MEANING

.               matches any single character
*               matches zero or more instances of the preceding
                character/pattern
?               matches zero or one instance(s) of the preceding
                character/pattern
+               matches one or more instances of the preceding
                character/pattern
{m,n}           matches at least m and at most n instances
                of the preceding character/pattern

( )             enclose a subexpression, or set of subexpressions
                separated by |
|               separates subexpressions (think of "or")
[ ]             enclose a set of characters/character ranges
^               as the first character in a [ ] subexpression,
                indicates set negation; as the first character
                in a regular expression, anchors to the
                beginning of the string expression on the
                left-hand side of the regexp operator
$               anchors to the end of the string expression
                on the left-hand side of the regexp operator
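
The following sketch exercises each special character with one matching and, where useful, one non-matching case. It again uses Python's `re` module as a stand-in for Pikt's matcher (an assumption that holds for patterns this simple); the sample strings are illustrative only.

```python
import re

# (pattern, text) pairs that match under Python's re module.
matching = [
    ("t.st", "this is a test"),    # .     any single character
    ("^ab*c$", "ac"),              # *     zero or more b's
    ("^colou?r$", "color"),        # ?     zero or one u
    ("^ab+c$", "abbbc"),           # +     one or more b's
    ("^x{2,3}$", "xxx"),           # {m,n} from 2 to 3 x's
    ("^(cat|dog)s$", "dogs"),      # ( ) | alternation inside a group
    ("^[a-c][0-9]$", "b7"),        # [ ]   sets and ranges
    ("^[^0-9]+$", "abc"),          # [^ ]  negated set
]

# (pattern, text) pairs that do not match.
non_matching = [
    ("^ab+c$", "ac"),              # + needs at least one b
    ("^x{2,3}$", "xxxx"),          # four x's is too many
    ("^test", "this is a test"),   # ^ anchors to the start of the string
    ("this$", "this is a test"),   # $ anchors to the end of the string
]

for pattern, text in matching:
    assert re.search(pattern, text), (pattern, text)
for pattern, text in non_matching:
    assert not re.search(pattern, text), (pattern, text)
```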

In addition to user-specified character classes, Pikt supports these predefined character classes:

[[:alnum:]]     the set of alphanumeric characters
[[:alpha:]]     the set of letters
[[:blank:]]     tab and space
[[:cntrl:]]     the control characters
[[:digit:]]     the decimal digits
[[:graph:]]     the printable characters except space
[[:lower:]]     the lower-case letters
[[:print:]]     the printable characters
[[:punct:]]     the punctuation characters
[[:space:]]     whitespace characters
[[:upper:]]     the upper-case letters
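
Python's `re` module has no `[[:name:]]` classes, so to experiment with them outside Pikt one can expand each class into explicit ranges. The helper `expand_posix_classes` below is hypothetical, and its ranges assume the C (ASCII) locale; other locales may add characters (accented letters, for instance), and nothing here describes how Pikt itself handles locales.

```python
import re

# ASCII (C locale) equivalents of the predefined classes. Raw strings keep
# the backslashes so that re, not Python, interprets \t, \x00, \[ and so on.
POSIX_CLASSES = {
    "alnum": "0-9A-Za-z",
    "alpha": "A-Za-z",
    "blank": r" \t",
    "cntrl": r"\x00-\x1f\x7f",
    "digit": "0-9",
    "graph": "!-~",                   # 0x21-0x7e: printable, minus space
    "lower": "a-z",
    "print": " -~",                   # 0x20-0x7e: printable, with space
    "punct": r"!-/:-@\[-`{-~",        # the four ASCII punctuation runs
    "space": r" \t\n\r\f\v",
    "upper": "A-Z",
}

def expand_posix_classes(pattern: str) -> str:
    """Rewrite [:name:] inside bracket expressions into explicit ranges,
    e.g. "[[:digit:]]" -> "[0-9]". str.replace is literal, so the
    backslashes in the ranges are copied through unchanged."""
    for name, ranges in POSIX_CLASSES.items():
        pattern = pattern.replace(f"[:{name}:]", ranges)
    return pattern

print(expand_posix_classes("[^[:digit:]]"))                   # [^0-9]
print(bool(re.fullmatch(expand_posix_classes("[[:alpha:]]+"), "Hello")))
```

A class can also be combined with other members of the same bracket expression, as in `[[:alpha:]_]`, which this simple textual substitution handles as well.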

A backslash escape suppresses a character's special meaning. So "\\*" denotes a literal asterisk, and all of the following are true:

"fo*bar" !~ "fo*bar"        // left side literal string,
                            // right side regexp
"fo*bar" !~ "fo\*bar"

"fo*bar" =~ "fo\\*bar"

"fo*bar" =~ "\\*"

"*" =~ "\\*"

In any of the above left-hand expressions you could substitute "fo\*bar", and the statements would still all be true, because Pikt's string escaping reduces "fo\*bar" to fo*bar before the match is attempted.

In most regular expression implementations, a single backslash is enough for this purpose. In Pikt, however, the backslash is also a general string escape character. For example, to output the literal text "$x" without Pikt treating $x as a variable and trying to resolve it to a value, you would write "\$x". So if you need a backslash in the final regular expression, you must supply a double backslash in the Pikt string. See the sample config files for further examples of double-backslash usage.
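
The two levels of escaping can be modeled as two separate steps: first Pikt's string processing consumes one level of backslashes, then the regexp engine sees what is left. In the Python sketch below, `pikt_string` and `pikt_matches` are hypothetical names; `pikt_string` is a simplified model that only strips one backslash per escaped character and ignores variable interpolation, and Python's `re` again stands in for Pikt's matcher.

```python
import re

def pikt_string(literal: str) -> str:
    """Simplified model of Pikt's string escaping: a backslash escapes the
    next character and is itself removed. Variable interpolation and any
    other string processing are deliberately ignored."""
    out = []
    i = 0
    while i < len(literal):
        if literal[i] == "\\" and i + 1 < len(literal):
            out.append(literal[i + 1])   # keep the escaped character only
            i += 2
        else:
            out.append(literal[i])
            i += 1
    return "".join(out)

def pikt_matches(lhs: str, rhs: str) -> bool:
    """Both sides go through string escaping; only the right-hand side is
    then interpreted as a regular expression."""
    return re.search(pikt_string(rhs), pikt_string(lhs)) is not None

# Raw strings hold the characters exactly as typed in a Pikt script.
assert not pikt_matches(r"fo*bar", r"fo*bar")
assert not pikt_matches(r"fo*bar", r"fo\*bar")   # "\*" -> "*": still special
assert pikt_matches(r"fo*bar", r"fo\\*bar")      # "\\*" -> "\*": literal *
assert pikt_matches(r"fo*bar", r"\\*")
assert pikt_matches(r"*", r"\\*")
assert pikt_matches(r"fo\*bar", r"fo\\*bar")     # left side becomes fo*bar
```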

Every time a regular expression containing parenthesized subexpressions is matched, as in any of the following situations,

dat "([^:]*):([^:]*)"

if $line =~ "^([^:]*):([^:]*)"

do #split($rdlin, "([^:]*):([^:]*)")

you can reference the text matched by the first parenthesized subexpression with $1, the second with $2, and so on. $0 references the entire match.
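
Python's match objects use the same numbering, which makes the correspondence easy to see. In this sketch `group(0)` plays the role of $0 and `group(1)`, `group(2)` the roles of $1 and $2; the passwd-style input line is illustrative sample data.

```python
import re

line = "root:x:0:0:root:/root:/bin/sh"
m = re.search("^([^:]*):([^:]*)", line)   # same pattern as the if example
assert m is not None
print(m.group(0))   # root:x   (like $0, the entire match)
print(m.group(1))   # root     (like $1)
print(m.group(2))   # x        (like $2)
```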

Note well: $0, $1, and so on persist only until the next regexp pattern match. The next time you use =~ (or any of the other regexp operators), or invoke the #split() function in any of its forms, the previous $0, $1, ... values are replaced by those from the latest match. Forgetting this is a common source of puzzling bugs!
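
The hazard comes from there being a single set of $0, $1, ... values shared by every match. The Python class below is a hypothetical model of that shared slot, not Pikt's implementation. One assumption is flagged in the code: the sketch clears the values when a match fails, and Pikt's behavior in that case is not described here.

```python
import re
from typing import Optional

class LastMatch:
    """Hypothetical model of Pikt's $0, $1, ...: one shared slot that every
    regexp match overwrites."""

    def __init__(self) -> None:
        self.values: tuple = ()

    def test(self, text: str, pattern: str) -> bool:   # like  text =~ pattern
        m = re.search(pattern, text)
        # Assumption: a failed match clears the slot in this sketch.
        self.values = (m.group(0),) + m.groups() if m else ()
        return m is not None

    def ref(self, i: int) -> Optional[str]:             # like  $i  or  $[i]
        return self.values[i] if i < len(self.values) else None


rx = LastMatch()
rx.test("user:alice", "^([^:]*):([^:]*)")
print(rx.ref(1), rx.ref(2))      # user alice
rx.test("sonata in C", "(sonata)")
print(rx.ref(1), rx.ref(2))      # sonata None  (the earlier $1 and $2 are gone)
```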

Alternate forms for referencing regexp matches are $[0], $[1], $[2], and so on. Because the index inside the brackets can be a variable, these forms make it possible to reference the matched expressions within for loops:

set #n = #split($rdlin)
for #i=1 #i<=#n #i+=1
        output $[#i]
endfor
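
In Python the same idea is a group number computed at run time. The sketch below loops over the groups of a regexp match rather than over the fields produced by #split($rdlin), so it illustrates the computed-index access only; the pattern and input string are illustrative.

```python
import re

m = re.search(r"^(\S+)\s+(\S+)\s+(\S+)", "Bach Toccata D-minor")
n = len(m.groups())                 # number of parenthesized groups: 3
for i in range(1, n + 1):           # like  for #i=1 #i<=#n #i+=1
    print(m.group(i))               # like  output $[#i]
```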

Here is a technique for saving $1, $2, ... before a subsequent regexp action overwrites them:

set #n = #split($rdlin)
for #i=1 #i<=#n #i+=1
        set $f[#i] = $[#i]
endfor
...
if $f[3] =~ "cantata|sonata|toccata"    // wipes out
                                        // $3 & $[3] value
        output $f[3]
fi
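
A Python version of the same save-then-match pattern is sketched below. Python match objects are independent of one another, so nothing is clobbered unless a variable is reused; the sketch deliberately reuses `m` to mimic Pikt overwriting its $1, $2, ... values. The 1-based dict `f` is a hypothetical stand-in for the $f[] array.

```python
import re

m = re.search(r"^(\S+)\s+(\S+)\s+(\S+)", "Bach organ toccata")
# Copy the groups into a 1-based dict, like  set $f[#i] = $[#i]
f = {i: m.group(i) for i in range(1, len(m.groups()) + 1)}

# Reusing m mimics the next regexp match replacing the old values.
m = re.search("cantata|sonata|toccata", f[3])
if m:
    print(f[3])                     # toccata: the saved copy survives
```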

Better still, use the three-argument form of #split() (all three arguments are required in this form), which stores the fields directly in the $f[] array:

do #split($f, $rdlin, " ")
...
if $f[3] =~ "cantata|sonata|toccata"    // wipes out
                                        // $3 & $[3] value
        output $f[3]
fi

If you had not saved the earlier values in the $f[] array and simply referenced $3 or $[3], that value would be undefined, because the =~ test puts no parentheses around any third subexpression. Even if it did (say, by parenthesizing each alternative so that "toccata" became the third subexpression), the previous $3 value would still be lost.
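
Python's group numbering shows the same two cases. Without parentheses the pattern has no third group at all; with each alternative parenthesized, group 3 exists but holds a value from the new match, and alternatives that did not participate yield `None`.

```python
import re

# No parentheses: there is no group 3 (in Pikt terms, $3 is undefined).
plain = re.search("cantata|sonata|toccata", "toccata")
print(len(plain.groups()))          # 0

# Parenthesized alternatives: "toccata" is now group 3, but it comes from
# this match, not from the value saved earlier.
grouped = re.search("(cantata)|(sonata)|(toccata)", "toccata")
print(grouped.group(3))             # toccata
print(grouped.group(1))             # None: that alternative did not match
```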

For further coverage of regular expressions, see the GNU RX info pages.

Refer to the sample alarms.cfg for examples.

Page last updated 2019-01-12.
Copyright © 1998-2019 Robert Osterlund. All rights reserved.