First: wherever the icon is displayed, you can click it to open the
"Test Regular Expressions" window. There you can try out regular
expressions; the window also contains a brief explanation of the most
useful expressions.
Some basics about regular expressions:
Regular expressions figure into all kinds of
text-manipulation tasks. Searching and search-and-replace are among the
more common uses, but regular expressions can also be used to test for
certain conditions in a text file or data stream. You might use regular
expressions, for example, as the basis for a short program that separates
incoming mail from incoming spam. In this case, the program might use a
regular expression to determine whether the name of a known spammer
appears in the "From:" line of the email. Email filtering programs, in
fact, very often use regular expressions for exactly this type of
operation.
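The mail-filtering idea above can be sketched in a few lines. This is only an illustration in Python (its re module stands in for whatever engine a real filter uses); the spammer names, the sample messages and the function name is_spam are all made up for the example.

```python
import re

# Hypothetical list of known spammer names; purely illustrative.
KNOWN_SPAMMERS = ["cheap-pills", "lottery-winner"]

# Match a "From:" header line that contains any of the known names.
# re.escape keeps characters such as "-" or "." literal.
SPAM_FROM = re.compile(
    r"^From:.*(?:" + "|".join(re.escape(n) for n in KNOWN_SPAMMERS) + r")",
    re.IGNORECASE | re.MULTILINE,
)

def is_spam(message: str) -> bool:
    """Return True if the From: line names a known spammer."""
    return SPAM_FROM.search(message) is not None

spam = "From: Lottery-Winner <win@example.com>\nSubject: You won!\n"
ham = "From: Alice <alice@example.com>\nSubject: lottery-winner story\n"

print(is_spam(spam))  # True
print(is_spam(ham))   # False: the name only appears in the Subject line
```

Because "." does not match a line break, the ".*" part cannot run past the end of the "From:" line, which is why the second message is not flagged.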
Regular expressions tend to be
easier to write than they are to read. This is less of a problem if you
are the only one who ever needs to maintain the program (or sed routine,
or shell script, or what have you), but if several people need to watch
over it, the syntax can turn into more of a hindrance than an
aid.
Ordinary macros (in particular, editable
macros such as those generated by the major word processors and editors)
tend not to be as fast, as flexible, as portable, as concise, or as
fault-tolerant as regular expressions, but they have the advantage of
being much more readable; even people with no programming background
whatsoever can usually make enough sense of a macro script to change it if
the need arises. For some jobs, such readability will outweigh all other
concerns. As with all things in computing, it's largely a question of
fitting the tool to the job.
Why are they called "regular expressions"?
Regular expressions trace back to the work of an American
mathematician by the name of Stephen Kleene (one of the most influential
figures in the development of theoretical computer science) who developed
regular expressions as a notation for describing what he called "the
algebra of regular sets." His work eventually found its way into some
early efforts with computational search algorithms, and from there to some
of the earliest text-manipulation tools on the Unix platform (including ed
and grep). In the context of computer searches, the "*" is formally known
as a "Kleene star."
Single-Character Metacharacters
Some metacharacters match single characters. This includes the following
symbols:
.       Matches any one character
[...]   Matches any character listed between the brackets
[^...]  Matches any character except those listed between the brackets
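As a quick illustration of the three symbols, here is a small sketch in Python. The re module is used as a stand-in (FST-Pro's own engine may differ in details), and the word list is invented for the example.

```python
import re

words = ["cat", "cot", "cut", "c9t", "coat"]

# "." matches any one character between "c" and "t".
any_char = [w for w in words if re.fullmatch(r"c.t", w)]
# "[ao]" matches only "a" or "o" in that position.
listed = [w for w in words if re.fullmatch(r"c[ao]t", w)]
# "[^ao]" matches any one character except "a" or "o".
excluded = [w for w in words if re.fullmatch(r"c[^ao]t", w)]

print(any_char)  # ['cat', 'cot', 'cut', 'c9t']
print(listed)    # ['cat', 'cot']
print(excluded)  # ['cut', 'c9t']
```

"coat" is never matched because each of these symbols stands for exactly one character.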
Quantifiers
The regular expression syntax also provides metacharacters that specify
how many times the preceding element should match.
?          Matches the preceding element zero or one time
*          Matches the preceding element zero or more times
+          Matches the preceding element one or more times
{num}      Matches the preceding element exactly num times
{min,max}  Matches the preceding element at least min times, but not more
           than max times
These metacharacters let you match a single-character pattern and then
keep matching it for as long as it repeats.
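The quantifiers can be compared side by side with a short Python sketch. Python's re module is an assumption here, used only for illustration; the sample strings and the helper name matching are made up.

```python
import re

samples = ["ct", "cat", "caat", "caaat", "caaaat"]

def matching(pattern: str) -> list:
    """Return the samples that the pattern matches in full."""
    return [s for s in samples if re.fullmatch(pattern, s)]

print(matching(r"ca?t"))      # ['ct', 'cat']
print(matching(r"ca*t"))      # ['ct', 'cat', 'caat', 'caaat', 'caaaat']
print(matching(r"ca+t"))      # ['cat', 'caat', 'caaat', 'caaaat']
print(matching(r"ca{2}t"))    # ['caat']
print(matching(r"ca{2,3}t"))  # ['caat', 'caaat']
```

In each pattern the quantifier applies only to the "a" directly in front of it, not to the whole "ca".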
Anchors
Often, you need to specify the position at which a particular pattern
occurs. This is referred to as "anchoring" the pattern:
^   Matches at the start of the line
$   Matches at the end of the line
\<  Matches at the beginning of a word
\>  Matches at the end of a word
\b  Matches at the beginning or the end of a word
\B  Matches at any position that is not the beginning or end of a word
"^" and "$" are some of the
most useful metacharacters in the regex arsenal--particularly when you
need to run a search-and-replace on a list of
strings.
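A small sketch of anchored search-and-replace on a list of strings, written in Python as a stand-in for the engine described here. Note that Python's re module does not support \< and \>, so only ^, $, \b and \B are shown; the sample names and the prefix are invented.

```python
import re

names = ["report", "summary", "notes"]

# "^" anchors at the start: prepend a prefix to every string.
prefixed = [re.sub(r"^", "2003_", n) for n in names]
# "$" anchors at the end: append an extension to every string.
suffixed = [re.sub(r"$", ".txt", n) for n in names]

print(prefixed)  # ['2003_report', '2003_summary', '2003_notes']
print(suffixed)  # ['report.txt', 'summary.txt', 'notes.txt']

text = "cat concat cats cat"
# "\b" on both sides: replace "cat" only where it is a whole word.
whole_words = re.sub(r"\bcat\b", "dog", text)
# "\B" in front: replace "cat" only where it does not start a word.
inside_words = re.sub(r"\Bcat", "CAT", text)

print(whole_words)   # dog concat cats dog
print(inside_words)  # cat conCAT cats cat
```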
Regular Expression Test
With a regular-expression test (available in the SetVariable-plugin,
function: RegExpTest), you can check whether a string matches the
specified regular expression. The "Filter" and "Reject" settings of the
Workflows also work this way.
Example
String: image0102_15feb2003.jpg
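A simple regular expression that would match the string is: .jpg
. means any single character, so this expression matches any one
character followed by jpg. Because the expression is not anchored, any
number of characters may precede the match.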
A somewhat more complex one would be: image[0-9]{4}_.+jpg
[0-9]{4} means there have to be 4 characters and these characters have
to be numerical (0-9). .+ means there must be one or more characters
between _ and jpg
If you want to make sure the date format is correct you could use:
image(....)_([0-9]){2}([a-z]){3}([0-9]){4}.jpg
(....) means there must be 4 characters.
([0-9]){2} means there must be 2 numerical characters.
([a-z]){3} means there must be 3 lowercase letters.
([0-9]){4} means there must be 4 numerical characters.
There are many ways to achieve the same result, so this is just one
example.
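The three expressions above can be tried out with a short Python sketch. The function regexp_test below is a hypothetical stand-in for RegExpTest, not its actual implementation; it assumes the test succeeds when the expression matches anywhere in the string, which is what the .jpg example implies.

```python
import re

def regexp_test(pattern: str, text: str) -> bool:
    """Hypothetical stand-in for RegExpTest: True if pattern matches anywhere in text."""
    return re.search(pattern, text) is not None

name = "image0102_15feb2003.jpg"
strict = r"image(....)_([0-9]){2}([a-z]){3}([0-9]){4}.jpg"

print(regexp_test(r".jpg", name))                 # True
print(regexp_test(r"image[0-9]{4}_.+jpg", name))  # True
print(regexp_test(strict, name))                  # True

# A name whose date is in the wrong order fails the strict expression.
print(regexp_test(strict, "image0102_feb152003.jpg"))  # False

# The unescaped "." accepts any character, not only a literal dot.
print(regexp_test(r".jpg", "photo_jpg"))     # True
print(regexp_test(r"\.jpg$", "photo_jpg"))   # False
```

If only real .jpg file names should pass, escaping the dot and anchoring the end, as in the last line, is one way to tighten the expression.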
Regular Expression Replace
With a regular-expression replace (available in the SetVariable-plugin,
function: RegExpReplace), you can replace the matched substrings with the
specified replacement string, or extract the portions of the string that
are matched by the parts of the regular expression enclosed in (). That
is, the text matched by any part of the regular expression enclosed in ()
is available as a $n variable, where n stands for the n-th () in the
regular expression.
Example
String: image0102_15feb2003.jpg
Say you want to extract the 4 digits after image from this string with a
regular expression. You can do this with the following regular
expression: image(....)_15feb2003.jpg
There is one () enclosed pattern in the regular expression. The string
that matches (....) will be available in the variable $1. So if we match
the string against the regular expression, $1 would be 0102
Now we want to extract the same 4 digits, but also the date:
image(....)_(.........).jpg
The 4 digits will again be in $1 (the first () enclosed pattern) and the
date will be in $2 (the second () enclosed pattern).
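The two-group extraction can be reproduced in Python as a rough sketch. This is an assumption-laden illustration rather than the plugin's own behaviour: in Python the groups are read with group(1) and group(2) instead of $1 and $2.

```python
import re

name = "image0102_15feb2003.jpg"

# Two () enclosed patterns: 4 characters, then 9 characters.
m = re.search(r"image(....)_(.........).jpg", name)

number = m.group(1)  # corresponds to $1
date = m.group(2)    # corresponds to $2

print(number)  # 0102
print(date)    # 15feb2003
```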
If you wanted to extract the individual digits and the parts of the
date, you would use:
image(.)(.)(.)(.)_(..)(...)(....).jpg
$1 will be 0
$2 will be 1
$3 will be 0
$4 will be 2
$5 will be 15
$6 will be feb
$7 will be 2003
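To show how the numbered groups can be reused in a replacement string, here is a hedged Python sketch. How RegExpReplace takes its arguments is not shown here, so this only illustrates the idea; Python writes the groups as \1 to \7 in the replacement where this page uses $1 to $7, and the new file-name layout is invented.

```python
import re

name = "image0102_15feb2003.jpg"
pattern = r"image(.)(.)(.)(.)_(..)(...)(....).jpg"

# Rebuild the name as year-month-day followed by the image number.
renamed = re.sub(pattern, r"\7-\6-\5_image\1\2\3\4.jpg", name)

print(renamed)  # 2003-feb-15_image0102.jpg
```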
All the above examples are
available in the FST-Pro SetVariable-plugin functions: RegExpTest,
RegExpReplace and RegExpReplaceGroup.