sipxportlib  Version 3.3
RegEx Class Reference

#include <UtlRegex.h>

Constructors, Destructor, and Expression Information

static const unsigned long int MAX_RECURSION = SIPX_MAX_REGEX_RECURSION
 Default maximum for the recursion depth in searches.
 
 RegEx (const char *regex, int options=0, unsigned long int maxDepth=MAX_RECURSION)
 Compile a regular expression to create the matching object.
 
 RegEx (const RegEx &)
 Construct from a constant regex to save compilation time.
 
 ~RegEx ()
 
int SubStrings (void) const
 Count the number of possible substrings returned by this expression.
 

Searching

The searching methods apply a compiled regular expression to a subject string. All searching methods return a boolean result indicating whether or not some match was found in the subject. To get information about the match, use the Results methods.

bool Search (const char *subject, int len=-1, int options=0)
 Search a string for matches to this regular expression.
 
bool SearchAt (const char *subject, int offset, int len=-1, int options=0)
 Search a string starting at some offset for matches to this regular expression.
 
bool SearchAgain (int options=0)
 Repeat the last search operation, starting immediately after the previous match.
 

Results

The results methods provide information about the matches based on the results of the most recent Searching method call. It is an error to call any of these methods unless the most recent Searching call returned 'true'.

The substring index must be less than the result of RegEx::SubStrings on the regular expression, but may also be zero or -1 as follows:

  • (-1) returns the last searched subject.
  • (0) returns the match of the complete regular expression.
  • (1) returns $1, etc.
int Matches ()
 Get the maximum substring value from the most recent search.
 
bool MatchString (UtlString *matched, int i=0)
 Append a match from the last search operation to a UtlString.
 
bool Match (const int i, int &offset, int &length)
 Get the position and length of a match in the subject.
 
int MatchStart (const int i)
 Get the position of a match in the subject.
 
bool BeforeMatchString (UtlString *before)
 Append string preceding the most recently matched value to a UtlString.
 
bool AfterMatchString (UtlString *before)
 Append string following the most recently matched value to a UtlString.
 
int AfterMatch (int i)
 Get the offset of the first character past the matched value.
 
const char * Match (int i=0)
 Get a string matched by a previous search.
 

Detailed Description

RegEx implements Perl-compatible regular expressions

A simple and small C++ wrapper for PCRE. PCRE (or libpcre) is the Perl Compatible Regular Expression library. http://www.pcre.org/

Adapted for the sipXportLib from the regex.hpp wrapper:

regex.hpp 1.0 Copyright (c) 2003 Peter Petersen (pp@on-time.de) Simple C++ wrapper for PCRE

This source file is freeware. You may use it for any purpose without restriction except that the copyright notice as the top of this file as well as this paragraph may not be removed or altered.

Original wrapper by Peter Petersen, adapted to sipX by Scott Lawrence

The regular expression is compiled in the constructor, and then may be applied to target strings using one of the Search interfaces. The results are obtained using the Results interfaces.

This class is a wrapper around the PCRE package (see the project INSTALL for a pointer to where PCRE can be found). All the Options variables are identical to those in pcre.h

Note
Compiling the regular expressions is usually expensive compared to executing the actual search, so if an expression is frequently reused, it is best to compile it only once and then construct the expression to use in the search using the copy constructor.

Constructor & Destructor Documentation

RegEx ( const char *  regex,
int  options = 0,
unsigned long int  maxDepth = MAX_RECURSION 
)

Compile a regular expression to create the matching object.

If compiling the regular expression fails, an error message string is thrown as an exception. For options documentation, see 'man pcre'
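
Because a bad pattern is reported by throwing, callers that load expressions from an external source may want a catch block around construction. The following is a minimal sketch, not sipXportLib code. It assumes the thrown value is a C string (const char*), which the text above suggests but does not state, and it uses a hypothetical StandInRegEx class built on std::regex so that it compiles without the library. The name compileError is also hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical stand-in for RegEx: compiles with std::regex and, on failure,
// throws a C string, which is the exception style this sketch assumes.
class StandInRegEx
{
public:
    explicit StandInRegEx(const char* pattern)
    {
        try
        {
            mCompiled = std::regex(pattern);
        }
        catch (const std::regex_error&)
        {
            throw "regular expression failed to compile"; // thrown as const char*
        }
    }

private:
    std::regex mCompiled;
};

// Returns an empty string on success, or the thrown message on failure.
std::string compileError(const char* pattern)
{
    try
    {
        StandInRegEx expression(pattern);
        return std::string();
    }
    catch (const char* message)
    {
        return std::string(message);
    }
}
```

If the assumption about the exception type holds, the same catch (const char*) handler would wrap a real RegEx constructor call.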

RegEx ( const RegEx &  regex)

Construct from a constant regex to save compilation time.

If you are using the same constant regular expression frequently, you can use this constructor to save the time to compile and study it. First, declare a private constant copy of your expression - this will be compiled by PCRE just once when it is instantiated:

static const RegEx FooNumbers("foo([0-9]+)");

Then in your method, construct a copy of it to use when matching strings:

RegEx fooNumbers(FooNumbers);
fooNumbers.Search(someString);

Constructing this copy does not require a PCRE call to compile the expression.

~RegEx ( )

Member Function Documentation

int SubStrings ( void  ) const

Count the number of possible substrings returned by this expression.

SubStrings()

Returns
the number of substrings defined by the regular expression.

The match of the entire expression is also considered a substring, so the return value will always be >= 1.

This method is especially useful when the regular expression is loaded from some external source. For a hard-coded expression, the return is a constant, so you really don't need this method.
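
The counting rule (one substring for the whole match plus one per capturing group) can be illustrated with the C++ standard library. This is a sketch using std::regex as a stand-in for RegEx, since sipXportLib is not assumed to be available here; subStringCount is a hypothetical helper name.

```cpp
#include <regex>

// Stand-in for SubStrings(): the number of capturing groups,
// plus one for the match of the entire expression.
int subStringCount(const std::regex& expression)
{
    return static_cast<int>(expression.mark_count()) + 1;
}
```

For an expression with no groups the result is 1, which is consistent with the statement that the return value is always >= 1.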

bool Search ( const char *  subject,
int  len = -1,
int  options = 0 
)

Search a string for matches to this regular expression.

Apply the regular expression to the subject string. Optional parameter len can be used to pass the subject's length to Search(). If not specified (or less than 0), strlen() is used internally to determine the length. Parameter options can contain any combination of options; for options documentation, see 'man pcre'

Returns
true if a match is found.
Parameters
subject   the string to be searched for a match
len       the length of the subject string
options   sum of any PCRE options flags
bool SearchAt ( const char *  subject,
int  offset,
int  len = -1,
int  options = 0 
)

Search a string starting at some offset for matches to this regular expression.

Apply the regular expression to the subject string, starting at the given offset. If the length is not specified, then strlen(subject) is used. Parameter options can contain any combination of options; for options documentation, see 'man pcre'

Returns
true if a match is found.
Note
The start of this search is not considered the start of the subject for the purposes of anchoring. So if the expression is "^xx", then subject "fooxx" will not match, even if offset is passed as '3'.
Parameters
subject   the string to be searched for a match
offset    offset to begin search in subject string
len       the length of the subject string
options   sum of any PCRE options flags
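
The anchoring rule in the note above can be reproduced with std::regex, used here only as a stand-in for RegEx. The sketch searches from an offset, tells the engine that the first searched character is not the beginning of the subject (match_not_bol), and reports the match position relative to the whole subject. The helper name searchAt is hypothetical.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Returns the start of the first match at or after 'offset', measured from
// the beginning of 'subject', or -1 if there is no match.
int searchAt(const std::string& subject, std::size_t offset, const std::regex& expression)
{
    if (offset > subject.size())
    {
        return -1;
    }
    // A nonzero offset is not the start of the subject, so '^' must not match there.
    const std::regex_constants::match_flag_type flags =
        (offset > 0) ? std::regex_constants::match_not_bol
                     : std::regex_constants::match_default;
    std::smatch found;
    if (!std::regex_search(subject.begin() + static_cast<std::ptrdiff_t>(offset),
                           subject.end(), found, expression, flags))
    {
        return -1;
    }
    return static_cast<int>(offset) + static_cast<int>(found.position(0));
}
```

With subject "fooxx" and offset 3, the expression "^xx" yields -1 while the unanchored "xx" yields 3.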
bool SearchAgain ( int  options = 0)

Repeat the last search operation, starting immediately after the previous match.

SearchAgain() applies the regular expression to the same subject last passed to Search or SearchAt, but restarts the search after the last match. Subsequent calls to SearchAgain() will find all matches in the subject.

Returns
true if a further match is found.

Example:

RegEx Pattern("A[0-9]");
const char* value = "xyzA1abcA2def";
bool matched;
for (matched = Pattern.Search(value); matched; matched = Pattern.SearchAgain())
{
    printf("%s\n", Pattern.Match());
}

would print "A1\n" and then "A2\n".
Note
Prefer MatchString over the less efficient Match
Parameters
options   sum of any PCRE options flags
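
The search-then-search-again loop collects every non-overlapping match in a subject. A sketch of the same idea with the standard library follows. std::sregex_iterator plays the role of the Search/SearchAgain pair here, and allMatches is a hypothetical helper name, not part of sipXportLib.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects each successive match, in order, the way repeated
// SearchAgain() calls would visit them.
std::vector<std::string> allMatches(const std::string& subject, const std::regex& expression)
{
    std::vector<std::string> matches;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(subject.begin(), subject.end(), expression); it != end; ++it)
    {
        matches.push_back(it->str());
    }
    return matches;
}
```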
int Matches ( )

Get the maximum substring value from the most recent search.

May only be called after a successful search using one of the searching interfaces, and applies to the results of that call.

  • any negative return indicates a caller error - the preceding search call did not match
  • a return value of 1 indicates that the entire pattern matched, but no substrings within it matched.
  • a return value of N > 1 indicates that the full string and N-1 substrings are available
Note
If the expression has internal optional matches, they may not be matched; for example the expression "(foo|(bar))(bing)" matches subject "foobingo", and Matches would return 4 because substring 3 "bing" was matched, but substring 2 would be the null string for that match.
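
The case in the note can be checked with std::regex as a stand-in for RegEx. One difference is worth stating: std::smatch always reports one entry per group, so the sketch computes the highest group that actually took part in the match, plus one, to mirror the return value described above. The names matchesCount and groupMatched are hypothetical.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Returns (highest matched group index + 1), or -1 if the search fails.
int matchesCount(const std::string& subject, const std::regex& expression)
{
    std::smatch found;
    if (!std::regex_search(subject, found, expression))
    {
        return -1;
    }
    int highest = 0; // group 0, the whole match, always participates
    for (std::size_t i = 0; i < found.size(); ++i)
    {
        if (found[i].matched)
        {
            highest = static_cast<int>(i);
        }
    }
    return highest + 1;
}

// True if group i took part in the first match of 'expression' in 'subject'.
bool groupMatched(const std::string& subject, const std::regex& expression, std::size_t i)
{
    std::smatch found;
    return std::regex_search(subject, found, expression)
        && i < found.size()
        && found[i].matched;
}
```

For "(foo|(bar))(bing)" applied to "foobingo" this gives 4, with group 2 not matched and group 3 matched.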
bool MatchString ( UtlString *  matched,
int  i = 0 
)

Append a match from the last search operation to a UtlString.

May only be called after a successful search and applies to the results of that call.

Returns
true if there was an ith match, false if not

Example:

RegEx matchBs("((B)B+)");
UtlString getB;
UtlString getBs;
if (matchBs.Search("xxaBBBBcyy"))
{
    matchBs.MatchString(&getBs,0);
    matchBs.MatchString(&getB,2);
}

would set the UtlStrings

  • getBs to "BBBB"
  • getB to "B"
Parameters
matched   string to append the match to - may be NULL, in which case no string is returned, but the return code still indicates whether or not this substring was matched.
i         which substring to append from the last search
  • Match(-1) returns the last searched subject.
  • Match(0) returns the match of the complete regular expression.
  • Match(i>0) returns $i
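
Two details of MatchString are easy to miss: the match is appended to the existing contents rather than replacing them, and a NULL target still yields a meaningful return code. The sketch below imitates both with std::string and std::smatch. It is an illustration under those assumptions, not the library implementation, and appendMatch is a hypothetical name. It also assumes that a group which did not take part in the match counts as "no ith match".

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Appends group i of 'found' to '*matched' when 'matched' is not null.
// Returns true if group i took part in the match, false otherwise.
bool appendMatch(const std::smatch& found, std::size_t i, std::string* matched)
{
    if (i >= found.size() || !found[i].matched)
    {
        return false;
    }
    if (matched != nullptr)
    {
        matched->append(found[i].first, found[i].second);
    }
    return true;
}
```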
bool Match ( const int  i,
int &  offset,
int &  length 
)

Get the position and length of a match in the subject.

May only be called after a successful call to one of the searching methods, and applies to the results of that call.

Parameter i must be less than SubStrings().

  • Match(-1) returns the last searched subject.
  • Match(0) returns the match of the complete regular expression.
  • Match(1) returns $1, etc.
Returns
true if the last search had an n'th match, false if not

Example:

RegEx matchABCs("A+(B+)(C+)");
UtlString subject("xAABBBBC");
int offset = 1;
int allStart, allLength, firstB, numBs, firstC, numCs;
if (matchABCs.SearchAt(subject, offset))
{
    bool haveAll = matchABCs.Match(0, allStart, allLength);
    bool haveBs = matchABCs.Match(1, firstB, numBs);
    bool haveCs = matchABCs.Match(2, firstC, numCs);
}

would set the values

  • allStart = 1, allLength = 7
  • firstB = 3, numBs = 4
  • firstC = 7, numCs = 1
Note
The returned start position is relative to the beginning of the subject string, not from any offset value.
Parameters
i        input - must be < SubStrings()
offset   output - offset in last subject of the n'th match
length   output - length in last subject of the n'th match
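
The start and length values for the example subject "xAABBBBC" can be worked out mechanically. The sketch below does so with std::regex as a stand-in for RegEx, adding the search offset back so that positions are relative to the beginning of the subject, as the note above requires. The helper name groupSpan is hypothetical.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>

// Returns (start, length) of group i, with start measured from the beginning
// of 'subject', or (-1, -1) if the search fails or group i did not match.
std::pair<int, int> groupSpan(const std::string& subject, std::size_t offset,
                              const std::regex& expression, std::size_t i)
{
    std::smatch found;
    if (offset > subject.size()
        || !std::regex_search(subject.begin() + static_cast<std::ptrdiff_t>(offset),
                              subject.end(), found, expression)
        || i >= found.size()
        || !found[i].matched)
    {
        return std::make_pair(-1, -1);
    }
    // position(i) is relative to where the search began, so add the offset back.
    const int start = static_cast<int>(offset) + static_cast<int>(found.position(i));
    return std::make_pair(start, static_cast<int>(found.length(i)));
}
```

For "A+(B+)(C+)" searched from offset 1, the whole match "AABBBBC" spans (1, 7), the B group spans (3, 4), and the C group spans (7, 1).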
int MatchStart ( const int  i)

Get the position of a match in the subject.

May only be called after a successful call to one of the searching methods, and applies to the results of that call.

Parameter i must be less than SubStrings().

  • Match(-1) returns the last searched subject.
  • Match(0) returns the match of the complete regular expression.
  • Match(1) returns $1, etc.

This is useful when searching at an offset in a string to check whether or not the match was at the offset or somewhere later in the string.

Example:

RegEx matchABCs("A+(B+)(C+)");
UtlString subject("xAABBBBC");
int offset = 1;
bool result = ( (matchABCs.SearchAt(subject, offset))
&& (matchABCs.MatchStart(0) == offset));

Note that this is not the same as having written the regular expression so that it is anchored ("^A+(B+)(C+)"), because the anchor always refers to the actual start of the string (in the example, before the 'x'), even when used with an offset. In the example the 'result' variable would be true, whereas the anchored expression would not match at all.

Parameters
i   input - must be < SubStrings()
bool BeforeMatchString ( UtlString *  before)

Append string preceding the most recently matched value to a UtlString.

May only be called after a successful search and applies to the results of that call. This is equivalent to the Perl $` variable.

Returns
true if there was a string before the match, false if not

Example:

RegEx matchB("B");
UtlString getBefore;
if (matchB.Search("xxaBcyy"))
{
    matchB.BeforeMatchString(&getBefore);
}

would set the UtlString getBefore to "xxa".

Parameters
before   string to append to - may be NULL, in which case no string is returned, but the return code still indicates whether or not there was some string preceding the last match.
bool AfterMatchString ( UtlString *  before)

Append string following the most recently matched value to a UtlString.

May only be called after a successful search and applies to the results of that call. This is equivalent to the Perl $' variable.

Returns
true if there was a string following the match, false if not

Example:

RegEx matchB("B");
UtlString getAfter;
if (matchB.Search("xxaBcyy"))
{
    matchB.AfterMatchString(&getAfter);
}

would set the UtlString getAfter to "cyy".

Parameters
before   string to append to - may be NULL, in which case no string is returned, but the return code still indicates whether or not there was some string following the last match.
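
BeforeMatchString and AfterMatchString correspond to the Perl $` and $' variables. The standard library exposes the same two pieces as prefix() and suffix() on a match result, which the sketch below uses as a stand-in. The Surroundings struct and splitAroundMatch function are hypothetical names, and the sketch does not model the "true only if the string is non-empty" return code.

```cpp
#include <regex>
#include <string>

// The text on either side of the first match.
struct Surroundings
{
    bool matched;
    std::string before; // analogous to Perl $`
    std::string after;  // analogous to Perl $'
};

Surroundings splitAroundMatch(const std::string& subject, const std::regex& expression)
{
    Surroundings result = {false, std::string(), std::string()};
    std::smatch found;
    if (std::regex_search(subject, found, expression))
    {
        result.matched = true;
        result.before = found.prefix().str();
        result.after = found.suffix().str();
    }
    return result;
}
```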
int AfterMatch ( int  i)

Get the offset of the first character past the matched value.

May only be called after a successful search and applies to the results of that call.

Example:

RegEx matchBseq("A+(B+)C+");
if (matchBseq.Search("xxAABBBCCCyy"))
{
    int afterB = matchBseq.AfterMatch(1);
    int afterC = matchBseq.AfterMatch(0);
}

would set

  • afterB = 7
  • afterC = 10
Parameters
i   the substring specifier
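
The value returned by AfterMatch is the start of the substring plus its length. A sketch that computes this with std::regex as a stand-in for RegEx follows; afterMatch is a hypothetical helper name.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Returns the offset of the first character past group i of the first match,
// or -1 if the search fails or group i did not match.
int afterMatch(const std::string& subject, const std::regex& expression, std::size_t i)
{
    std::smatch found;
    if (!std::regex_search(subject, found, expression)
        || i >= found.size()
        || !found[i].matched)
    {
        return -1;
    }
    return static_cast<int>(found.position(i) + found.length(i));
}
```

For "A+(B+)C+" applied to "xxAABBBCCCyy", the B group starts at 4 with length 3, giving 7, and the whole match starts at 2 with length 8, giving 10.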
const char * Match ( int  i = 0)

Get a string matched by a previous search.

Note
This does more memory allocation and data copying than any of the other results methods; use one of the others when possible.

May only be called after a successful search, and applies to the results of that call. Parameter i must be less than SubStrings().

  • Match(-1) returns the last searched subject.
  • Match(0) returns the match of the complete regular expression.
  • Match(1) returns $1, etc.
Returns
a pointer to the ith matched substring.
Parameters
i   must be < SubStrings()

Member Data Documentation

const unsigned long int MAX_RECURSION = SIPX_MAX_REGEX_RECURSION
static

Default maximum for the recursion depth in searches.

The PCRE internal match() function implements some searches by recursion. This value is the default maximum allowed depth for that recursion. It can be changed to some other value by passing the maxDepth option argument to the RegEx constructor. It is set at compile time from the SIPX_MAX_REGEX_RECURSION macro, if that value is defined.

If the maximum is exceeded, the match fails.

If this value or the maxDepth constructor argument is set to zero, then no limit is enforced (use with caution).

See the discussions of stack size in the pcre documentation.

Note
Caution: test your limits carefully. In versions of PCRE prior to 6.5 there is no way to limit recursive matches, so this is implemented as a limit on the total number of calls to 'match' (PCRE_EXTRA_MATCH_LIMIT); this can dramatically shorten the length of the strings that a pattern with nested parentheses will match.