SetConnect

Introduction

Regular Expression Basics
First: wherever the icon is displayed, you can click it to open the "Test Regular Expressions" window. There you can try out regular expressions; the window also contains a brief explanation of the most useful expressions.



Some basic stuff about regular expressions:

Regular expressions figure into all kinds of text-manipulation tasks. Searching and search-and-replace are among the more common uses, but regular expressions can also be used to test for certain conditions in a text file or data stream. You might use regular expressions, for example, as the basis for a short program that separates incoming mail from incoming spam. In this case, the program might use a regular expression to determine whether the name of a known spammer appeared in the "From:" line of the email. Email filtering programs, in fact, very often use regular expressions for exactly this type of operation.

Regular expressions tend to be easier to write than they are to read. This is less of a problem if you are the only one who ever needs to maintain the program (or sed routine, or shell script, or what have you), but if several people need to watch over it, the syntax can turn into more of a hindrance than an aid.

Ordinary macros (in particular, editable macros such as those generated by the major word processors and editors) tend not to be as fast, as flexible, as portable, as concise, or as fault-tolerant as regular expressions, but they have the advantage of being much more readable; even people with no programming background whatsoever can usually make enough sense of a macro script to change it if the need arises. For some jobs, such readability will outweigh all other concerns. As with all things in computing, it's largely a question of fitting the tool to the job.

Why are they called "regular expressions?"

Regular expressions trace back to the work of an American mathematician by the name of Stephen Kleene (one of the most influential figures in the development of theoretical computer science) who developed regular expressions as a notation for describing what he called "the algebra of regular sets." His work eventually found its way into some early efforts with computational search algorithms, and from there to some of the earliest text-manipulation tools on the Unix platform (including ed and grep). In the context of computer searches, the "*" is formally known as a "Kleene star."

Single-Character Metacharacters

Some metacharacters match single characters. This includes the following symbols:

. Matches any one character
[...] Matches any character listed between the brackets
[^...] Matches any character except those listed between the brackets
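
As a minimal sketch of these three metacharacters, the following uses Python's re module as a stand-in; the sample words are made up for illustration, and the syntax of the product's own regular-expression engine may differ in details.

```python
import re

sample = "bag big bug b-g bg"  # hypothetical sample text

# "." matches any one character between b and g
dot_matches = re.findall(r"b.g", sample)
# ['bag', 'big', 'bug', 'b-g']

# [...] matches any one character listed between the brackets
listed = re.findall(r"b[ai]g", sample)
# ['bag', 'big']

# [^...] matches any one character except those listed
not_listed = re.findall(r"b[^ai]g", sample)
# ['bug', 'b-g']
```

Note that "bg" never matches: each of these metacharacters stands for exactly one character, not zero.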

Quantifiers

The regular expression syntax also provides metacharacters which specify the number of times a particular character should match.

? Matches the preceding element zero or one times
* Matches the preceding element zero or more times
+ Matches the preceding element one or more times
{num} Matches the preceding element num times
{min,max} Matches the preceding element at least min times, but not more than max times

These metacharacters allow you to match on a single-character pattern, but then continue to match on it until the pattern changes.
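
The quantifiers can be compared side by side with a small sketch. It uses Python's re module as a stand-in, and the helper name matching and the candidate words are assumptions made for this illustration.

```python
import re

candidates = ["color", "colour", "colouur"]  # hypothetical test words


def matching(pattern):
    """Return the candidates that the pattern matches from start to end."""
    return [c for c in candidates if re.fullmatch(pattern, c)]


zero_or_one = matching(r"colou?r")      # ['color', 'colour']
zero_or_more = matching(r"colou*r")     # ['color', 'colour', 'colouur']
one_or_more = matching(r"colou+r")      # ['colour', 'colouur']
exactly_two = matching(r"colou{2}r")    # ['colouur']
one_to_two = matching(r"colou{1,2}r")   # ['colour', 'colouur']
```

In each pattern the quantifier applies only to the element directly before it, here the letter u.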

Anchors

Often, you need to specify the position at which a particular pattern occurs. This is often referred to as "anchoring" the pattern:

^ Matches at the start of the line
$ Matches at the end of the line
\< Matches at the beginning of a word
\> Matches at the end of a word
\b Matches at the beginning or the end of a word
\B Matches at any position that is not the beginning or end of a word

"^" and "$" are some of the most useful metacharacters in the regex arsenal--particularly when you need to run a search-and-replace on a list of strings.
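
A short sketch of the anchors, again using Python's re module as a stand-in with made-up sample strings. Python's re module has no \< and \> anchors, so only ^, $, \b and \B are shown.

```python
import re

names = ["image01.jpg", "my image01.jpg", "image01.jpg.bak"]  # hypothetical list

# ^ anchors the pattern at the start of the string
starts_with_image = [s for s in names if re.search(r"^image", s)]
# ['image01.jpg', 'image01.jpg.bak']

# $ anchors the pattern at the end of the string
ends_with_jpg = [s for s in names if re.search(r"jpg$", s)]
# ['image01.jpg', 'my image01.jpg']

# Search-and-replace on a list: only a trailing "jpg" is rewritten
renamed = [re.sub(r"jpg$", "jpeg", s) for s in names]
# ['image01.jpeg', 'my image01.jpeg', 'image01.jpg.bak']

# \b requires a word boundary on each side, \B forbids one
whole_word = re.sub(r"\bcat\b", "CAT", "cat concat catalog cat.")
# 'CAT concat catalog CAT.'
inside_word = re.sub(r"\Bcat\B", "CAT", "cat concat education")
# 'cat concat eduCATion'
```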

Regular Expression Test

With a regular-expression-test (this function is available in the SetVariable-plugin, function: RegExpTest), you can check if a string matches the specified regular-expression. The "Filter" and "Reject" settings of the Workflows also work this way.

Example

String: image0102_15feb2003.jpg

A simple regular-expression that would match the string is: .jpg

. matches any single character, so at least one character must precede jpg

A somewhat more complex one would be: image[0-9]{4}_.+jpg

[0-9]{4} means there have to be 4 characters and these characters have to be numerical (0-9).
.+ means there must be one or more characters between _ and jpg

If you want to make sure the date-format is correct you could use: image(....)_([0-9]){2}([a-z]){3}([0-9]){4}.jpg

(....) means there must be 4 characters.
([0-9]){2} means there must be 2 numerical characters
([a-z]){3} means there must be 3 lowercase letters
([0-9]){4} means there must be 4 numerical characters

There are many other ways to write an expression that matches the same string, so this is just one example.
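
The three expressions above can be tried out with a small sketch. The function regexp_test is a hypothetical stand-in written with Python's re module; it assumes that the test succeeds when the expression matches anywhere in the string, and it is not the plugin's actual RegExpTest implementation.

```python
import re

name = "image0102_15feb2003.jpg"


def regexp_test(pattern, text):
    """Hypothetical stand-in for RegExpTest: True if pattern matches anywhere in text."""
    return re.search(pattern, text) is not None


date_pattern = r"image(....)_([0-9]){2}([a-z]){3}([0-9]){4}.jpg"

simple = regexp_test(r".jpg", name)                      # True
four_digits = regexp_test(r"image[0-9]{4}_.+jpg", name)  # True
dated = regexp_test(date_pattern, name)                  # True

# A name whose date part is in the wrong order is rejected
wrong_order = regexp_test(date_pattern, "image0102_feb152003.jpg")  # False
```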

Regular Expression Replace

With a regular-expression-replace (this function is available in the SetVariable-plugin, function: RegExpReplace), you can replace the matched substrings with the specified replacement string, or extract the portions of the string that are matched by the parts of the regular expression enclosed in ().
The latter means that the text matched by any part of the regular expression enclosed in () is available as a $n variable, where n stands for the n-th () in the regular expression.
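
Both uses can be sketched with Python's re module as a stand-in. This is an illustration only, not the plugin's RegExpReplace; note that Python writes \1 and \2 in the replacement string where this text writes $1 and $2, and the replacement strings below are made up.

```python
import re

name = "image0102_15feb2003.jpg"

# Replace the matched substring with a fixed replacement string
renamed = re.sub(r"image", "photo", name)
# 'photo0102_15feb2003.jpg'

# Reuse the text matched by each () in the replacement string
reordered = re.sub(r"image(....)_(.........)\.jpg", r"\2_\1.jpg", name)
# '15feb2003_0102.jpg'
```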

Example

String: image0102_15feb2003.jpg

Say you want to extract the four digits after image from this string with a regular expression. You can do this with the following regular-expression: image(....)_15feb2003.jpg

There is one () enclosed pattern in the regular-expression. The string that matches (....) will be available in the variable $1. So if we match the string to the regular-expression, $1 would be 0102

Now we want to extract the same four digits, but also the date: image(....)_(.........).jpg
The four digits will again be in $1 (the first () enclosed pattern) and the date will be in $2 (the second () enclosed pattern).

If you want to extract the individual digits and the separate parts of the date, you would use: image(.)(.)(.)(.)_(..)(...)(....).jpg

$1 will be 0
$2 will be 1
$3 will be 0
$4 will be 2
$5 will be 15
$6 will be feb
$7 will be 2003

All the above examples are available in the FST-Pro SetVariable-plugin functions: RegExpTest, RegExpReplace and RegExpReplaceGroup.
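
For readers who want to check the extraction examples outside the plugin, here is a minimal sketch using Python's re module. It assumes that Python's group numbering corresponds to the $n variables described above (group(1) for $1, and so on); the variable names are chosen for this illustration.

```python
import re

name = "image0102_15feb2003.jpg"

# One () pattern: $1 holds the four characters after "image"
one = re.search(r"image(....)_15feb2003.jpg", name)
number = one.group(1)
# '0102'

# Two () patterns: $1 is the number, $2 is the date
two = re.search(r"image(....)_(.........).jpg", name)
number_and_date = two.groups()
# ('0102', '15feb2003')

# Seven () patterns: $1 to $4 are the digits, $5 to $7 the date parts
seven = re.search(r"image(.)(.)(.)(.)_(..)(...)(....).jpg", name)
pieces = seven.groups()
# ('0', '1', '0', '2', '15', 'feb', '2003')
```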
