A regular expression defines a set of one or more strings of characters. Several of the UNIX utility programs, including ed, vi, grep, awk, and sed, use regular expressions to search for and replace strings.
A simple string of characters is a regular expression that defines one string of characters: itself.
A more complex regular expression uses letters, numbers, and special
characters to define many different strings of characters. A regular
expression is said to match any string it defines.
Characters and Delimiters
A character is any character except a NEWLINE. Most characters
represent themselves within a regular expression. A special character
is one that does not represent itself.
If you need to use a special character to represent itself, see
"Quoting Special Characters."
A character, called a delimiter, usually marks the beginning and
end of a regular expression. The delimiter is always a special
character for the regular expression it delimits (that is, it
does not represent itself but marks the beginning and end of the
expression). You can use any character as a delimiter, as long
as you use the same character at both ends of the regular expression.
For simplicity, all the regular expressions in this section use
a forward slash (/)
as a delimiter. In some unambiguous cases, the second delimiter
is not required. You can usually omit the second delimiter when
it would be followed immediately by a <Return>.
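As a minimal sketch of choosing a different delimiter, the sed command below uses a vertical bar instead of a slash, so the slashes in a pathname represent themselves and need no quoting. The sample pathname and the variable name result are assumptions made for illustration.

```shell
# Hypothetical input: a pathname that itself contains slashes.
# With | as the delimiter, each / in the expression represents itself.
result=$(printf '%s\n' '/usr/local/bin' | sed 's|/usr/local|/opt|')
printf '%s\n' "$result"
```

The same character must appear at both ends of the regular expression and at the end of the replacement, just as the slash would.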
Simple Strings
The most basic regular expression is a simple string that contains
no special characters except the delimiters. A simple string matches
only itself.
Regular expression | Meaning | Examples |
---|---|---|
/ring/ | matches ring | ring, spring, ringing, stringing |
/Thursday/ | matches Thursday | Thursday, Thursday's |
/or not/ | matches or not | or not, poor nothing |
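The first row of the table can be tried with grep, which takes the regular expression as an argument without delimiters. This is only a sketch: the sample lines and the variable name result are assumptions.

```shell
# Hypothetical sample lines, one per argument to printf.
# grep prints each line that contains a match for the simple string ring.
result=$(printf '%s\n' 'ring' 'spring' 'rang' 'stringing' | grep 'ring')
printf '%s\n' "$result"
```

The line rang is not printed because it does not contain the string ring.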
Regular expression | Meaning | Examples |
---|---|---|
/ .alk/ | matches all strings that contain a <Space> followed by any character followed by alk | will talk, may balk |
/.ing/ | matches all strings with any character preceding ing | singing, ping, before inglenook |
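A short sketch of the period, again using grep without delimiters. The sample lines and the variable name result are assumptions chosen to show which lines do and do not match.

```shell
# Hypothetical sample lines.
# The expression is a <Space>, any single character, then alk.
result=$(printf '%s\n' 'will talk' 'may balk' 'sidewalk' 'chalk' | grep ' .alk')
printf '%s\n' "$result"
```

Neither sidewalk nor chalk is printed, because neither contains a <Space> two characters before alk.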
Within a character class definition, backslashes, asterisks, and
dollar signs (all described in the following sections) lose their
special meanings. A right square bracket (appearing as a member
of the character class) can only appear as the first character
following the left square bracket, and a caret is only special
if it is the first character following the left bracket.
Regular expression | Meaning | Examples |
---|---|---|
/[bB]ill/ | defines the character class containing b and B; matches a member of the character class followed by ill | bill, Bill, billed |
/t[aeiou].k/ | matches t followed by a lower-case vowel, any character, and a k | teak, talkative, took, tanker |
/number [6-9]/ | matches number followed by a <Space> and a member of the specified character class | number 6, number 8, number 9 |
/[^a-zA-Z]/ | matches any character that is not a letter | 1, 7, @, ., } |
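The following sketch tries two of the character classes from the table with grep and sed. The sample text and variable names are assumptions, LC_ALL=C is set so that ranges such as a-z follow ASCII order, and the g flag (a standard sed flag not covered in this section) applies the substitution to every match on the line.

```shell
# Hypothetical sample lines.
text=$(printf '%s\n' 'bill' 'Bill' 'billed' 'will')

# [bB] matches either b or B, so will is not selected.
bills=$(printf '%s\n' "$text" | LC_ALL=C grep '[bB]ill')

# [^a-zA-Z] matches any character that is not a letter.
masked=$(printf '%s\n' 'a1b@c' | LC_ALL=C sed 's/[^a-zA-Z]/_/g')

printf '%s\n' "$bills" "$masked"
```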
Regular expression | Meaning | Examples |
---|---|---|
/ab*c/ | matches a followed by zero or more b's followed by a c | ac, abc, abbc, abbbc |
/ab.*c/ | matches ab followed by zero or more other characters followed by c | abc, abxc, ab45c |
/t.*ing/ | matches t followed by zero or more characters followed by ing | thing, ting, thought of going |
/[a-zA-Z ]*/ | matches a string composed only of letters and <Space>s | any string without numbers or punctuation |
/(.*)/ | matches as long a string as possible between ( and ) | (this) and (that) |
/([^)]*)/ | matches the shortest string possible that starts with ( and ends with ) | (this), (this and that) |
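As a small sketch of the asterisk, the grep command below selects lines that contain a, zero or more b's, and c. The sample lines and the variable name result are assumptions.

```shell
# Hypothetical sample lines.
# b* matches zero or more b's, so ac matches but adc does not.
result=$(printf '%s\n' 'ac' 'abc' 'abbbc' 'adc' | grep 'ab*c')
printf '%s\n' "$result"
```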
Regular expression | Meaning | Examples |
---|---|---|
/^T/ | matches a T at the beginning of a line | That time ..., This line ... |
/^+[0-9]/ | matches a plus sign followed by a digit at the beginning of a line | +5 45.72, +759 Keep this ... |
/:$/ | matches a colon that ends a line | ...below: |
/^$/ | matches an empty line | |
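The anchors in the table can be sketched with grep as follows. The sample text and the variable names are assumptions; the -c option makes grep print a count of matching lines instead of the lines themselves.

```shell
# Hypothetical sample text; the third line is empty.
text=$(printf '%s\n' 'That time' 'see below:' '' 'a: b')

# ^T matches only a T that begins a line.
starts_with_t=$(printf '%s\n' "$text" | grep '^T')

# :$ matches only a colon that ends a line, so a: b is not selected.
ends_with_colon=$(printf '%s\n' "$text" | grep ':$')

# ^$ matches a line with nothing between its beginning and its end.
empty_count=$(printf '%s\n' "$text" | grep -c '^$')

printf '%s\n' "$starts_with_t" "$ends_with_colon" "$empty_count"
```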
Quoting Special Characters
Regular expression | Meaning | Examples |
---|---|---|
/end\./ | matches all strings that contain end followed by a period | pretend.mail, The end., send. |
/\\/ | matches a single backslash | \ |
/\*/ | matches an asterisk | *.c, an asterisk (*) |
/\[5\]/ | matches the string [5] | it was five [5] |
/and\/or/ | matches and/or | and/or |
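The effect of quoting can be sketched by running the same search with and without the backslash. The sample lines and variable names below are assumptions.

```shell
# Hypothetical sample lines.
text=$(printf '%s\n' 'The end.' 'endless' 'pretend.mail')

# Unquoted, the period matches any character, so endless is counted too.
unquoted_count=$(printf '%s\n' "$text" | grep -c 'end.')

# Quoted, the period represents itself, so endless is no longer selected.
quoted=$(printf '%s\n' "$text" | grep 'end\.')

# A delimiter that appears inside the expression must be quoted as well.
slash=$(printf '%s\n' 'cats and/or dogs' | sed 's/and\/or/or/')

printf '%s\n' "$unquoted_count" "$quoted" "$slash"
```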
Regular expression | Matches |
---|---|
/Th.*is/ | This (rug) is not what it once was (a long time ago), is |
/(.*)/ | (rug) is not what it once was (a long time ago) |
/([^)]*)/ | (rug) |
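One way to see which part of a line each expression matches is to have sed wrap the match in square brackets, using the ampersand described later in this section. This is a sketch only: the sample line (in particular its ending, is it?) and the variable names are assumptions.

```shell
# Hypothetical sample line containing two parenthesized phrases.
line='This (rug) is not what it once was (a long time ago), is it?'

# .* matches as long a string as possible, so the match runs to the last is.
longest=$(printf '%s\n' "$line" | sed 's/Th.*is/[&]/')

# Between ( and ), .* still takes as much as possible: out to the last ).
greedy_parens=$(printf '%s\n' "$line" | sed 's/(.*)/[&]/')

# [^)]* cannot cross a ), so the match stops at the first closing parenthesis.
shortest_parens=$(printf '%s\n' "$line" | sed 's/([^)]*)/[&]/')

printf '%s\n' "$longest" "$greedy_parens" "$shortest_parens"
```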
Regular expression | Matches |
---|---|
/s.*ing/ | singing songs, singing |
/s.*ing song/ | singing song |
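The two matches can be reproduced with sed by wrapping each match in square brackets. The sample line and the variable names are assumptions chosen so that both results above appear.

```shell
# Hypothetical sample line.
line='singing songs, singing more and more'

# Alone, s.*ing matches from the first s through the last ing on the line.
first=$(printf '%s\n' "$line" | sed 's/s.*ing/[&]/')

# With song appended, .* gives back characters so the rest can still match.
second=$(printf '%s\n' "$line" | sed 's/s.*ing song/[&]/')

printf '%s\n' "$first" "$second"
```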
For example, if you give the Substitute command
s/mike/robert/
and then you want to make the same substitution again, you can use the command
s//robert/
The empty regular expression represents the last regular expression you used (/mike/).
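The same idea can be sketched with sed, which also treats an empty regular expression as the last regular expression used. The sample line and the variable name result are assumptions; whether a given utility supports the empty expression should be checked in its own documentation.

```shell
# Hypothetical sample line with two occurrences of mike.
# The first command replaces the first mike; the empty expression in the
# second command reuses /mike/ and replaces the occurrence that remains.
result=$(printf '%s\n' 'mike met mike' | sed -e 's/mike/robert/' -e 's//robert/')
printf '%s\n' "$result"
```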
Regular expression | Matches |
---|---|
/\(rexp\)/ | what /rexp/ would match |
/a\(b*\)c/ | what /ab*c/ would match |
Bracketed regular expressions can be nested, as in
/\([a-z]\([A-Z]*\)\)/
The bracketed expressions are identified only by the opening \('s, so there is no ambiguity in identifying them.
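To see how the nested brackets are numbered, the sketch below uses sed and the quoted digits described later in this section to print what each bracketed expression matched. The sample input, the angle brackets in the replacement, and the variable name result are assumptions.

```shell
# Hypothetical input: one lowercase letter followed by uppercase letters.
# \1 belongs to the first \( (the whole match here);
# \2 belongs to the second \( (only the uppercase letters).
result=$(printf '%s\n' 'xABC' | LC_ALL=C sed 's/\([a-z]\([A-Z]*\)\)/<\1><\2>/')
printf '%s\n' "$result"
```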
Ampersands
Within a Replacement String, an ampersand (&) takes on the value of the string that the Search String (regular expression) matched.
For example, the following Substitute command surrounds a string of one or more digits with NN. The ampersand in the Replacement String takes on the value of whatever string of digits the regular expression (Search String) matched. Two character class definitions are required because the regular expression [0-9]* matches zero or more occurrences of a digit, and every character string, even one with no digits at all, begins with zero or more occurrences of a digit. Used alone, [0-9]* would therefore match the empty string at the beginning of the line.
s/[0-9][0-9]*/NN&NN/
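The difference between the two forms can be sketched with sed. The sample line and the variable names are assumptions.

```shell
# Hypothetical sample line.
line='room 101 and 7'

# [0-9][0-9]* needs at least one digit, so the first number is wrapped.
wrapped=$(printf '%s\n' "$line" | sed 's/[0-9][0-9]*/NN&NN/')

# [0-9]* alone matches the empty string at the start of the line,
# so NN and NN are inserted there with nothing between them.
empty_match=$(printf '%s\n' "$line" | sed 's/[0-9]*/NN&NN/')

printf '%s\n' "$wrapped" "$empty_match"
```

Without a flag such as g, sed replaces only the first match on each line, which is why 7 is left unchanged.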
Quoted Digits
Within the regular expression itself, a quoted digit (\n) takes on the value of the string that the bracketed regular expression beginning with the nth \( matched.
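As a sketch of a quoted digit inside the regular expression itself, the grep command below selects lines that contain the same character twice in a row. The sample words and the variable name result are assumptions.

```shell
# Hypothetical sample words.
# \(.\) matches any character; \1 then requires that same character again.
result=$(printf '%s\n' 'book' 'desk' 'hello' | grep '\(.\)\1')
printf '%s\n' "$result"
```

The word desk is not printed because no character in it is immediately repeated.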
Within a Replacement String, a quoted digit represents the string that the bracketed regular expression (portion of the Search String) beginning with the nth \( matched.
For example, you can take a list of people in the form
last-name, first-name initial
and change it to the form
first-name initial last-name
with the following Substitute command:
1,$s/\([^,]*\), \(.*\)/\2 \1/
The Replacement String consists of what the second bracketed regular
expression matched (\2) followed by a <Space> and what the
first bracketed regular expression matched (\1).
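The rearrangement can be sketched with sed, writing the second bracketed expression as .* so that it matches everything after the comma and <Space>. The names in the sample list and the variable name result are assumptions; the 1,$ address is omitted because sed applies a command to every line by default.

```shell
# Hypothetical list in the form: last-name, first-name initial
# \([^,]*\) captures everything up to the comma (the last name);
# \(.*\) captures the rest of the line after the comma and <Space>.
result=$(printf '%s\n' 'Smith, John Q' 'Doe, Jane R' | sed 's/\([^,]*\), \(.*\)/\2 \1/')
printf '%s\n' "$result"
```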
Summary of Regular Expressions
A regular expression defines a set of one or more strings of characters.
A regular expression is said to match any string it defines.
The following characters are special within a regular expression.
Special character | Function |
---|---|
. | matches any single character |
[xyz] | defines a character class that matches x, y, or z |
[^xyz] | defines a character class that matches any character except x, y, or z |
[x-z] | defines a character class that matches any character x through z inclusive |
* | matches zero or more occurrences of a match of the preceding character |
^ | forces a match to the beginning of a line |
$ | forces a match to the end of a line |
\ | used to quote special characters |
\(xyz\) | matches what xyz matches (a bracketed regular expression) |
The following characters are special within a Replacement String.
Character | Function |
---|---|
& | represents what the regular expression (Search String) matched |
\n | a quoted number, n, represents the nth bracketed regular expression in the Search String |