## 23.7. Basic Syntax of Regular Expressions

The two special symbols `'^'` and `'$'` indicate the start and the end of a string, respectively. For example:

- `^The`: matches any string that starts with "The".
- `of despair$`: matches a string that ends in the substring "of despair".
- `^abc$`: a string that starts and ends with "abc"; that could only be "abc" itself!
- `notice`: a string that has the text "notice" in it.

Without either of these anchors, the pattern may occur anywhere inside the string.
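The anchor examples above can be tried out with Python's `re` module. This is a minimal sketch, assuming Python 3; `matches` is a hypothetical helper, and Python's regex dialect is Perl-style rather than POSIX, although the anchors behave the same way for these inputs. One Python-specific detail: `$` also matches just before a single trailing newline.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if pattern occurs anywhere in text."""
    # re.search scans the whole string, so only ^ and $ pin the pattern down.
    return re.search(pattern, text) is not None

print(matches(r"^The", "The Lord of the Rings"))        # True
print(matches(r"^The", "Read The Manual"))              # False: "The" is not at the start
print(matches(r"of despair$", "a slough of despair"))   # True
print(matches(r"^abc$", "abc"))                         # True
print(matches(r"^abc$", "abcabc"))                      # False: extra text after "abc"
print(matches(r"notice", "take notice of this"))        # True: no anchors, found mid-string
```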

The symbols `'*'`, `'+'`, and `'?'` denote the number of times a character or a sequence of characters may occur: zero or more, one or more, and zero or one, respectively. Here are some examples:

- `ab*`: matches a string that has an a followed by zero or more b's (a, ab, abbb, etc.).
- `ab+`: same, but there is at least one b (ab, abbb, etc.).
- `ab?`: there might be a b or not.
- `a?b+$`: a possible a followed by one or more b's ending a string.
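A small sketch of the quantifier examples, again using Python's `re` as a stand-in. `CANDIDATES` and `full_matches` are hypothetical names; `re.fullmatch` is used so that the output lists exactly the strings each pattern describes, while the last two lines use `re.search` to show the `$`-anchored case.

```python
import re

CANDIDATES = ["a", "ab", "abbb", "b", "bb", "abc"]

def full_matches(pattern: str) -> list[str]:
    """Hypothetical helper: candidates that the pattern matches in their entirety."""
    # re.fullmatch must consume the whole word, so the result shows exactly
    # which strings the quantified pattern describes.
    return [w for w in CANDIDATES if re.fullmatch(pattern, w)]

print(full_matches(r"ab*"))    # ['a', 'ab', 'abbb']
print(full_matches(r"ab+"))    # ['ab', 'abbb']
print(full_matches(r"ab?"))    # ['a', 'ab']
print(full_matches(r"a?b+"))   # ['ab', 'abbb', 'b', 'bb']

# With search and a trailing $, the b's only need to end the string.
print(re.search(r"a?b+$", "cab") is not None)   # True
print(re.search(r"a?b+$", "bac") is not None)   # False
```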

You can also use bounds, which appear inside braces and indicate ranges in the number of occurrences:

- `ab{2}`: matches a string that has an a followed by exactly two b's (abb).
- `ab{2,}`: there are at least two b's (abb, abbbb, etc.).
- `ab{3,5}`: from three to five b's (abbb, abbbb, or abbbbb).

Note that you must always specify the first number of a range (that is, write `{0,2}`, not `{,2}`). Also, as you may have noticed, the symbols `'*'`, `'+'`, and `'?'` have the same effect as the bounds `{0,}`, `{1,}`, and `{0,1}`, respectively.
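The bounds and the equivalence with `*`, `+`, and `?` can be checked with a short Python sketch; `WORDS` and `bounded` are hypothetical names. One caveat: Python's `re` is more permissive than POSIX here and does accept `{,2}` as "zero to two", so the portable form remains `{0,2}`.

```python
import re

WORDS = ["a", "ab", "abb", "abbb", "abbbb", "abbbbb", "abbbbbb"]

def bounded(pattern: str) -> list[str]:
    """Hypothetical helper: words from WORDS that the whole pattern matches."""
    return [w for w in WORDS if re.fullmatch(pattern, w)]

print(bounded(r"ab{2}"))    # ['abb']
print(bounded(r"ab{2,}"))   # ['abb', 'abbb', 'abbbb', 'abbbbb', 'abbbbbb']
print(bounded(r"ab{3,5}"))  # ['abbb', 'abbbb', 'abbbbb']

# The shorthand quantifiers select exactly the same words as their bounds.
for shorthand, bound in [("ab*", "ab{0,}"), ("ab+", "ab{1,}"), ("ab?", "ab{0,1}")]:
    assert bounded(shorthand) == bounded(bound)
```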

Now, to quantify a sequence of characters, put them inside parentheses:

- `a(bc)*`: matches a string that has an a followed by zero or more copies of the sequence bc.
- `a(bc){1,5}`: one through five copies of bc.
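To see why the parentheses matter, the following Python sketch compares `a(bc)*` with the ungrouped `abc*`, where the quantifier applies only to the final c. `whole` is a hypothetical helper built on `re.fullmatch`.

```python
import re

def whole(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if pattern matches all of text."""
    return re.fullmatch(pattern, text) is not None

print(whole(r"a(bc)*", "a"))             # True: zero copies of "bc"
print(whole(r"a(bc)*", "abcbc"))         # True: two copies
print(whole(r"a(bc)*", "abcb"))          # False: the trailing b is not a full "bc"
print(whole(r"abc*", "abcc"))            # True: without parentheses, * applies to c only
print(whole(r"a(bc){1,5}", "a"))         # False: at least one copy is required
print(whole(r"a(bc){1,5}", "abcbcbc"))   # True: three copies
```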

There's also the '|' symbol, which works as an OR operator:

- `hi|hello`: matches a string that has either hi or hello in it.
- `(b|cd)ef`: a string that has either bef or cdef.
- `(a|b)*c`: a string that has a (possibly empty) sequence of a's and b's, in any order, followed by a c.
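A Python sketch of alternation; the sample strings are made up for illustration. The last example shows that `(a|b)*` does not require the a's and b's to alternate: "abbac" and "aabbc" both match.

```python
import re

greetings = [s for s in ["well, hello there", "goodbye", "hi all"] if re.search(r"hi|hello", s)]
print(greetings)   # ['well, hello there', 'hi all']

grouped = [w for w in ["bef", "cdef", "bcdef", "ef"] if re.fullmatch(r"(b|cd)ef", w)]
print(grouped)     # ['bef', 'cdef']

# (a|b)* accepts a's and b's in any order, not only alternating ones.
runs = [w for w in ["c", "ac", "abbac", "aabbc", "ab"] if re.fullmatch(r"(a|b)*c", w)]
print(runs)        # ['c', 'ac', 'abbac', 'aabbc']
```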

A period ('.') stands for any single character:

- `a.[0-9]`: matches a string that has an a followed by one character and a digit.
- `^.{3}$`: a string with exactly 3 characters.
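The period examples in Python. Note one Python-specific behavior that may differ from other engines: by default `.` does not match a newline unless the `re.DOTALL` flag is passed.

```python
import re

print(re.search(r"a.[0-9]", "xa-7y") is not None)             # True: 'a', then '-', then digit '7'
print(re.search(r"a.[0-9]", "a7") is not None)                # False: no character between 'a' and '7'
print(re.search(r"^.{3}$", "cat") is not None)                # True
print(re.search(r"^.{3}$", "cats") is not None)               # False
# By default '.' does not match a newline; re.DOTALL changes that.
print(re.search(r"^.{3}$", "a\nb") is not None)               # False
print(re.search(r"^.{3}$", "a\nb", re.DOTALL) is not None)    # True
```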

Bracket expressions specify which characters are allowed in a single position of a string:

- `[ab]`: matches a string that has either an a or a b (the same as `a|b`).
- `[a-d]`: a string that has any of the lowercase letters a through d (equivalent to `a|b|c|d` and to `[abcd]`).
- `^[a-zA-Z]`: a string that starts with a letter.
- `[0-9]%`: a string that has a single digit before a percent sign.
- `,[a-zA-Z0-9]$`: a string that ends in a comma followed by an alphanumeric character.
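The bracket-expression examples translate directly to Python; `found` is a hypothetical wrapper around `re.search`, and the test strings are invented for illustration.

```python
import re

def found(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if pattern occurs anywhere in text."""
    return re.search(pattern, text) is not None

print(found(r"[ab]", "crab"))              # True
print(found(r"[a-d]", "xyz"))              # False: no letter from a to d
print(found(r"^[a-zA-Z]", "9 lives"))      # False: starts with a digit
print(found(r"[0-9]%", "up 5% today"))     # True
print(found(r",[a-zA-Z0-9]$", "x,y,z"))    # True: ends in ",z"

# [a-d] and [abcd] accept exactly the same single characters.
same = all(bool(re.fullmatch(r"[a-d]", c)) == bool(re.fullmatch(r"[abcd]", c)) for c in "abcdez")
print(same)                                 # True
```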

You can also list the characters you do NOT want: just use a '^' as the first symbol in a bracket expression. For example, `%[^a-zA-Z]%` matches a string with a character that is not a letter between two percent signs.
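A brief Python check of the negated bracket expression. Note that exactly one non-letter must sit between the percent signs, so `%%` does not match.

```python
import re

NOT_LETTER = r"%[^a-zA-Z]%"

print(re.search(NOT_LETTER, "rate: %5%") is not None)   # True: '5' is not a letter
print(re.search(NOT_LETTER, "%x%") is not None)         # False: 'x' is a letter
print(re.search(NOT_LETTER, "%%") is not None)          # False: nothing between the signs
```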

Note that bracket expressions are an exception to the usual rule that a special character must be escaped with a backslash to be matched literally: inside them, all special characters, including the backslash ('\'), lose their special meaning. For example, `[*\+?{}.]` matches exactly one of the characters listed inside the brackets. To include a literal ']' in the list, make it the first character (following a possible '^'). To include a literal '-', make it the first or last character, or the second endpoint of a range.
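The placement rules for ']' and '-' can be demonstrated in Python, with one important caveat: unlike the POSIX behavior described above, Python's `re` still treats a backslash inside a bracket expression as an escape character, so the set below omits the backslash and a literal backslash has to be written as `\\` inside the brackets. This is a sketch of the Python dialect, not of POSIX.

```python
import re

# Inside a set, *, +, ?, {, }, and . are ordinary characters.
specials = re.findall(r"[*+?{}.]", "a*b+c?d{e}f.g")
print(specials)       # ['*', '+', '?', '{', '}', '.']

# A ']' placed first (or right after '^') is a literal member of the set.
with_bracket = re.findall(r"[]a]", "x]ya")
without_bracket = re.findall(r"[^]a]", "x]ya")
print(with_bracket)     # [']', 'a']
print(without_bracket)  # ['x', 'y']

# A '-' placed first or last is literal rather than a range operator.
dash_last = re.findall(r"[a-]", "a-b")
dash_first = re.findall(r"[-a]", "a-b")
print(dash_last, dash_first)  # ['a', '-'] ['a', '-']

# Python-specific: '\' is an escape inside a set, so a literal backslash is '\\'.
backslashes = re.findall(r"[\\]", "a\\b")
print(backslashes)    # ['\\']
```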