PCRE(3)                                                 PCRE(3)





NAME
       PCRE - Perl-compatible regular expressions

PCRE NATIVE API

       #include <pcre.h>

       pcre *pcre_compile(const char *pattern, int options,
            const char **errptr, int *erroffset,
            const unsigned char *tableptr);

       pcre_extra *pcre_study(const pcre *code, int options,
            const char **errptr);

       int pcre_exec(const pcre *code, const pcre_extra *extra,
            const char *subject, int length, int startoffset,
            int options, int *ovector, int ovecsize);

       int pcre_copy_named_substring(const pcre *code,
            const char *subject, int *ovector,
            int stringcount, const char *stringname,
            char *buffer, int buffersize);

       int pcre_copy_substring(const char *subject, int *ovector,
            int stringcount, int stringnumber, char *buffer,
            int buffersize);

       int pcre_get_named_substring(const pcre *code,
            const char *subject, int *ovector,
            int stringcount, const char *stringname,
            const char **stringptr);

       int pcre_get_stringnumber(const pcre *code,
            const char *name);

       int pcre_get_substring(const char *subject, int *ovector,
            int stringcount, int stringnumber,
            const char **stringptr);

       int pcre_get_substring_list(const char *subject,
            int *ovector, int stringcount, const char ***listptr);

       void pcre_free_substring(const char *stringptr);

       void pcre_free_substring_list(const char **stringptr);

       const unsigned char *pcre_maketables(void);

       int pcre_fullinfo(const pcre *code, const pcre_extra *extra,
            int what, void *where);

       int pcre_info(const pcre *code, int *optptr,
            int *firstcharptr);

       int pcre_config(int what, void *where);

       char *pcre_version(void);

       void *(*pcre_malloc)(size_t);

       void (*pcre_free)(void *);

       void *(*pcre_stack_malloc)(size_t);

       void (*pcre_stack_free)(void *);

       int (*pcre_callout)(pcre_callout_block *);

PCRE API OVERVIEW

       PCRE  has its own native API, which is described in this
       document. There is also a set of wrapper functions  that
       correspond  to  the POSIX regular expression API.  These
       are described in the pcreposix documentation.

       The native API function prototypes are  defined  in  the
       header  file  pcre.h,  and  on  Unix systems the library
       itself is called libpcre. It can normally be accessed by
       adding  -lpcre to the command for linking an application
       that uses PCRE.  The  header  file  defines  the  macros
       PCRE_MAJOR and PCRE_MINOR to contain the major and minor
       release numbers for the library.  Applications  can  use
       these to include support for different releases of PCRE.

       The   functions   pcre_compile(),   pcre_study(),    and
       pcre_exec()  are used for compiling and matching regular
       expressions. A sample program that demonstrates the sim-
       plest  way  of using them is provided in the file called
       pcredemo.c in the source  distribution.  The  pcresample
       documentation describes how to run it.

       In  addition  to  the  main compiling and matching func-
       tions, there are convenience  functions  for  extracting
       captured substrings from a matched subject string.  They
       are:

         pcre_copy_substring()
         pcre_copy_named_substring()
         pcre_get_substring()
         pcre_get_named_substring()
         pcre_get_substring_list()
         pcre_get_stringnumber()

       pcre_free_substring() and pcre_free_substring_list() are
       also  provided,  to  free  the memory used for extracted
       strings.

       The function pcre_maketables() is used to build a set of
       character  tables  in  the current locale for passing to
       pcre_compile() or  pcre_exec().   This  is  an  optional
       facility  that is provided for specialist use. Most com-
       monly, no special  tables  are  passed,  in  which  case
       internal  tables  that  are generated when PCRE is built
       are used.

       The function pcre_fullinfo() is used to find out  infor-
       mation about a compiled pattern; pcre_info() is an obso-
       lete version that returns only  some  of  the  available
       information,  but  is retained for backwards compatibil-
       ity.  The function pcre_version() returns a pointer to a
       string  containing  the  version of PCRE and its date of
       release.

       The global variables pcre_malloc and pcre_free initially
       contain  the  entry  points of the standard malloc() and
       free() functions, respectively. PCRE  calls  the  memory
       management  functions  via these variables, so a calling
       program can replace them if it wishes to  intercept  the
       calls. This should be done before calling any PCRE func-
       tions.

       The    global    variables     pcre_stack_malloc     and
       pcre_stack_free  are also indirections to memory manage-
       ment functions. These special functions  are  used  only
       when  PCRE  is  compiled to use the heap for remembering
       data, instead of recursive function  calls.  This  is  a
       non-standard  way  of building PCRE, for use in environ-
       ments that have limited stacks. Because of  the  greater
       use  of memory management, it runs more slowly. Separate
       functions are provided so that special-purpose  external
       code  can  be used for this case. When used, these func-
       tions are always called in  a  stack-like  manner  (last
       obtained,  first freed), and always for memory blocks of
       the same size.

       The global variable pcre_callout initially contains NULL.
       It can be set by the caller to a "callout" function, which
       PCRE will then call at specified points during a matching
       operation. Details are given in the pcrecallout
       documentation.

MULTITHREADING

       The PCRE functions can be used in multi-threading appli-
       cations,  with  the  proviso  that the memory management
       functions  pointed   to   by   pcre_malloc,   pcre_free,
       pcre_stack_malloc,  and pcre_stack_free, and the callout
       function pointed to by pcre_callout, are shared  by  all
       threads.

       The compiled form of a regular expression is not altered
       during matching, so the same compiled pattern can safely
       be used by several threads at once.

SAVING PRECOMPILED PATTERNS FOR LATER USE

       The  compiled  form of a regular expression can be saved
       and re-used at a later time,  possibly  by  a  different
       program,  and even on a host other than the one on which
       it was compiled. Details are given in the pcreprecompile
       documentation.

CHECKING BUILD-TIME OPTIONS

       int pcre_config(int what, void *where);

       The  function pcre_config() makes it possible for a PCRE
       client to discover which  optional  features  have  been
       compiled into the PCRE library. The pcrebuild documenta-
       tion has more details about these optional features.

       The first argument  for  pcre_config()  is  an  integer,
       specifying  which  information  is  required; the second
       argument is a pointer  to  a  variable  into  which  the
       information  is  placed.  The  following  information is
       available:

         PCRE_CONFIG_UTF8

       The output is an integer that is set  to  one  if  UTF-8
       support is available; otherwise it is set to zero.

         PCRE_CONFIG_UNICODE_PROPERTIES

       The  output  is an integer that is set to one if support
       for Unicode character properties is available; otherwise
       it is set to zero.

         PCRE_CONFIG_NEWLINE

       The output is an integer that is set to the value of the
       code that is used  for  the  newline  character.  It  is
       either linefeed (10) or carriage return (13), and should
       normally be the standard character  for  your  operating
       system.

         PCRE_CONFIG_LINK_SIZE

       The  output  is  an  integer that contains the number of
       bytes used for  internal  linkage  in  compiled  regular
       expressions.  The  value  is  2,  3, or 4. Larger values
       allow larger regular expressions to be compiled, at  the
       expense  of  slower  matching. The default value of 2 is
       sufficient for all but the most massive patterns,  since
       it  allows the compiled pattern to be up to 64K in size.

         PCRE_CONFIG_POSIX_MALLOC_THRESHOLD

       The output is an integer  that  contains  the  threshold
       above which the POSIX interface uses malloc() for output
       vectors. Further details are given in the pcreposix doc-
       umentation.

         PCRE_CONFIG_MATCH_LIMIT

       The  output  is  an integer that gives the default limit
       for the number of internal matching function calls in  a
       pcre_exec()  execution.  Further  details are given with
       pcre_exec() below.

         PCRE_CONFIG_STACKRECURSE

       The output is an integer that is set to one if  internal
       recursion  is  implemented  by  recursive function calls
       that use the stack to remember their state. This is  the
       usual  way  that PCRE is compiled. The output is zero if
       PCRE was compiled to use blocks  of  data  on  the  heap
       instead  of  recursive  function  calls.  In  this case,
       pcre_stack_malloc and pcre_stack_free are called to man-
       age  memory blocks on the heap, thus avoiding the use of
       the stack.

COMPILING A PATTERN

       pcre *pcre_compile(const char *pattern, int options,
            const char **errptr, int *erroffset,
            const unsigned char *tableptr);

       The function pcre_compile() is called to compile a  pat-
       tern  into  an  internal form. The pattern is a C string
       terminated by a binary zero, and is passed in  the  pat-
       tern  argument.  A  pointer  to a single block of memory
       that is obtained via pcre_malloc is returned. This  con-
       tains  the compiled code and related data. The pcre type
       is defined for the returned block; this is a typedef for
       a  structure  whose contents are not externally defined.
       It is up to the caller to free the memory when it is  no
       longer required.

       Although  the  compiled code of a PCRE regex is relocat-
       able, that is, it does not depend  on  memory  location,
       the  complete  pcre data block is not fully relocatable,
       because it may contain a copy of the tableptr  argument,
       which is an address (see below).

       The  options  argument  contains  independent  bits that
       affect the compilation. It should be zero if no  options
       are required. The available options are described below.
       Some of them, in particular, those that  are  compatible
       with  Perl,  can  also  be set and unset from within the
       pattern (see the detailed description in the pcrepattern
       documentation).  For  these options, the contents of the
       options argument specify their initial settings at the
       start  of  compilation  and execution. The PCRE_ANCHORED
       option can be set at the time of matching as well as  at
       compile time.

       If  errptr  is NULL, pcre_compile() returns NULL immedi-
       ately.  Otherwise, if compilation of  a  pattern  fails,
       pcre_compile()  returns  NULL,  and  sets  the  variable
       pointed to by errptr to point to a  textual  error  mes-
       sage.  The  offset  from the start of the pattern to the
       character where the error was discovered  is  placed  in
       the  variable pointed to by erroffset, which must not be
       NULL. If it is, an immediate error is given.

       If the final argument, tableptr, is NULL,  PCRE  uses  a
       default set of character tables that are built when PCRE
       is compiled, using  the  default  C  locale.  Otherwise,
       tableptr must be an address that is the result of a call
       to pcre_maketables(). This value is stored with the com-
       piled  pattern,  and  used  again by pcre_exec(), unless
       another table pointer is passed to it. For more  discus-
       sion, see the section on locale support below.

       This  code fragment shows a typical straightforward call
       to pcre_compile():

         pcre *re;
         const char *error;
         int erroffset;
         re = pcre_compile(
           "^A.*Z",          /* the pattern */
           0,                /* default options */
           &error,           /* for error message */
           &erroffset,       /* for error offset */
           NULL);            /* use default character tables */

       The  following  names for option bits are defined in the
       pcre.h header file:

         PCRE_ANCHORED

       If this  bit  is  set,  the  pattern  is  forced  to  be
       "anchored",  that is, it is constrained to match only at
       the first matching point in the  string  that  is  being
       searched (the "subject string"). This effect can also be
       achieved  by  appropriate  constructs  in  the   pattern
       itself, which is the only way to do it in Perl.

         PCRE_AUTO_CALLOUT

       If this bit is set, pcre_compile() automatically inserts
       callout items, all with number 255, before each  pattern
       item.  For  discussion  of the callout facility, see the
       pcrecallout documentation.

         PCRE_CASELESS

       If this bit is set, letters in the  pattern  match  both
       upper and lower case letters. It is equivalent to Perl's
       /i option, and it can be changed within a pattern  by  a
       (?i)  option  setting.  When running in UTF-8 mode, case
       support for high-valued  characters  is  available  only
       when  PCRE is built with Unicode character property sup-
       port.

         PCRE_DOLLAR_ENDONLY

       If this bit  is  set,  a  dollar  metacharacter  in  the
       pattern  matches  only at the end of the subject string.
       Without this option, a dollar also  matches  immediately
       before  the  final character if it is a newline (but not
       before  any  other  newlines).  The  PCRE_DOLLAR_ENDONLY
       option  is ignored if PCRE_MULTILINE is set. There is no
       equivalent to this option in Perl, and no way to set  it
       within a pattern.

         PCRE_DOTALL

       If this bit is set, a dot metacharacter in the pattern
       matches all characters, including newlines. Without  it,
       newlines  are  excluded.  This  option  is equivalent to
       Perl's /s option, and it can be changed within a pattern
       by  a (?s) option setting. A negative class such as [^a]
       always matches a newline character, independent  of  the
       setting of this option.

         PCRE_EXTENDED

       If  this  bit  is set, whitespace data characters in the
       pattern are  totally  ignored  except  when  escaped  or
       inside  a  character  class. Whitespace does not include
       the VT character  (code  11).  In  addition,  characters
       between an unescaped # outside a character class and the
       next newline character,  inclusive,  are  also  ignored.
       This  is  equivalent  to Perl's /x option, and it can be
       changed within a pattern by a (?x) option setting.

       This option makes it possible to include comments inside
       complicated  patterns.  Note, however, that this applies
       only to data characters. Whitespace characters may never
       appear  within special character sequences in a pattern,
       for example within the sequence (?( which  introduces  a
       conditional subpattern.

         PCRE_EXTRA

       This  option was invented in order to turn on additional
       functionality of PCRE that is  incompatible  with  Perl,
       but  it  is  currently of very little use. When set, any
       backslash in a pattern that is followed by a letter that
       has  no  special meaning causes an error, thus reserving
       these combinations for future expansion. By default,  as
       in  Perl,  a backslash followed by a letter with no spe-
       cial meaning is treated  as  a  literal.  There  are  at
       present  no other features controlled by this option. It
       can also be set by a (?X) option setting within  a  pat-
       tern.

         PCRE_MULTILINE

       By default, PCRE treats the subject string as consisting
       of a single line of characters (even if it actually con-
       tains  newlines).  The "start of line" metacharacter (^)
       matches only at the start of the string, while the  "end
       of  line"  metacharacter  ($) matches only at the end of
       the string, or  before  a  terminating  newline  (unless
       PCRE_DOLLAR_ENDONLY is set). This is the same as Perl.

       When PCRE_MULTILINE is set, the "start of line" and
       "end of line" constructs match immediately following  or
       immediately  before  any  newline in the subject string,
       respectively, as well as at the very start and end. This
       is equivalent to Perl's /m option, and it can be changed
       within a pattern by a (?m) option setting. If there  are
       no  "\n"  characters  in  a subject string, or no occur-
       rences of ^ or $ in a  pattern,  setting  PCRE_MULTILINE
       has no effect.

         PCRE_NO_AUTO_CAPTURE

       If  this  option is set, it disables the use of numbered
       capturing parentheses in the pattern. Any opening paren-
       thesis  that  is not followed by ? behaves as if it were
       followed by ?: but named parentheses can still  be  used
       for  capturing  (and  they  acquire numbers in the usual
       way). There is no equivalent of this option in Perl.

         PCRE_UNGREEDY

       This option inverts the "greediness" of the  quantifiers
       so  that  they  are  not  greedy  by default, but become
       greedy if followed by "?". It  is  not  compatible  with
       Perl. It can also be set by a (?U) option setting within
       the pattern.

         PCRE_UTF8

       This option causes PCRE to regard both the  pattern  and
       the  subject  as  strings of UTF-8 characters instead of
       single-byte character strings. However, it is  available
       only  when  PCRE  is  built to include UTF-8 support. If
       not, the use of this option provokes an  error.  Details
       of  how  this  option  changes the behaviour of PCRE are
       given in the section on UTF-8 support in the  main  pcre
       page.

         PCRE_NO_UTF8_CHECK

       When  PCRE_UTF8 is set, the validity of the pattern as a
       UTF-8 string is automatically  checked.  If  an  invalid
       UTF-8 sequence of bytes is found, pcre_compile() returns
       an error. If you  already  know  that  your  pattern  is
       valid,  and  you want to skip this check for performance
       reasons, you can set the PCRE_NO_UTF8_CHECK option. When
       it is set, the effect of passing an invalid UTF-8 string
       as a pattern is undefined. It may cause your program  to
       crash.   Note  that  this  option  can also be passed to
       pcre_exec(), to suppress the UTF-8 validity checking  of
       subject strings.

STUDYING A PATTERN

       pcre_extra *pcre_study(const pcre *code, int options,
            const char **errptr);

       If a compiled pattern is going to be used several times,
       it is worth spending more time analyzing it in order  to
       reduce the time taken for matching. The function
       pcre_study() takes a pointer to a  compiled  pattern  as
       its  first  argument.  If  studying the pattern produces
       additional information that will help speed up matching,
       pcre_study() returns a pointer to a pcre_extra block, in
       which the study_data field points to the results of  the
       study.

       The  returned  value  from  pcre_study()  can  be passed
       directly to pcre_exec().  However,  a  pcre_extra  block
       also contains other fields that can be set by the caller
       before the block is passed; these are described below in
       the section on matching a pattern.

       If  studying the pattern does not produce any additional
       information, pcre_study() returns NULL. In that  circum-
       stance,  if the calling program wants to pass any of the
       other fields to pcre_exec(), it  must  set  up  its  own
       pcre_extra block.

       The  second  argument  of  pcre_study()  contains option
       bits. At present, no options are defined, and this argu-
       ment should always be zero.

       The  third argument for pcre_study() is a pointer for an
       error message. If studying succeeds (even if no data  is
       returned),  the  variable  it  points to is set to NULL.
       Otherwise it points to  a  textual  error  message.  You
       should  therefore  test the error pointer for NULL after
       calling pcre_study(), to be sure that it  has  run  suc-
       cessfully.

       This is a typical call to pcre_study():

         pcre_extra *pe;
         pe = pcre_study(
           re,             /* result of pcre_compile() */
           0,              /* no options exist */
           &error);        /* set to NULL or points to a message */

       At present, studying a pattern is useful only  for  non-
       anchored patterns that do not have a single fixed start-
       ing character. A bitmap of possible  starting  bytes  is
       created.

LOCALE SUPPORT

       PCRE  handles  caseless matching, and determines whether
       characters are letters, digits, or whatever,  by  refer-
       ence  to  a  set  of tables, indexed by character value.
       (When running in UTF-8 mode, this applies only to  char-
       acters  with  codes  less  than 128. Higher-valued codes
       never match escapes such as \w or \d, but can be  tested
       with \p if PCRE is built with Unicode character property
       support.)

       An internal set of tables is created in  the  default  C
       locale  when  PCRE is built. This is used when the final
       argument of pcre_compile() is NULL,  and  is  sufficient
       for many applications. An alternative set of tables can,
       however, be supplied. These may be created in a  differ-
       ent  locale  from the default. As more and more applica-
       tions change to using Unicode, the need for this  locale
       support is expected to die away.

       External tables are built by calling the pcre_maketables()
       function, which has no arguments, in the relevant
       locale.  The result can then be passed to pcre_compile()
       or pcre_exec() as often as necessary.  For  example,  to
       build and use tables that are appropriate for the French
       locale (where accented characters  with  values  greater
       than  128  are  treated  as letters), the following code
       could be used:

         setlocale(LC_CTYPE, "fr_FR");
         tables = pcre_maketables();
         re = pcre_compile(..., tables);

       When pcre_maketables() runs, the  tables  are  built  in
       memory  that  is  obtained  via  pcre_malloc.  It is the
       caller's responsibility to ensure that the  memory  con-
       taining  the  tables remains available for as long as it
       is needed.

       The pointer that is passed to  pcre_compile()  is  saved
       with  the compiled pattern, and the same tables are used
       via this pointer by pcre_study() and  normally  also  by
       pcre_exec().  Thus,  by default, for any single pattern,
       compilation, studying and matching  all  happen  in  the
       same  locale,  but different patterns can be compiled in
       different locales.

       It is possible to pass a table pointer or NULL (indicat-
       ing  the  use  of  the  internal tables) to pcre_exec().
       Although not intended for this  purpose,  this  facility
       could  be  used to match a pattern in a different locale
       from the one in which it  was  compiled.  Passing  table
       pointers  at  run time is discussed below in the section
       on matching a pattern.

INFORMATION ABOUT A PATTERN

       int pcre_fullinfo(const pcre *code, const pcre_extra *extra,
            int what, void *where);

       The pcre_fullinfo() function returns information about a
       compiled pattern. It replaces the  obsolete  pcre_info()
       function,  which  is nevertheless retained for backwards
       compatibility (and is documented below).

       The first argument for pcre_fullinfo() is a  pointer  to
       the  compiled pattern. The second argument is the result
       of pcre_study(), or NULL if the pattern was not studied.
       The  third argument specifies which piece of information
       is required, and the fourth argument is a pointer  to  a
       variable  to receive the data. The yield of the function
       is zero for success, or one of  the  following  negative
       numbers:

         PCRE_ERROR_NULL       the argument code was NULL, or
                               the argument where was NULL
         PCRE_ERROR_BADMAGIC   the "magic number" was not found
         PCRE_ERROR_BADOPTION  the value of what was invalid

       The "magic number" is placed at the start of each compiled
       pattern as a simple check against passing an
       arbitrary memory pointer. Here  is  a  typical  call  of
       pcre_fullinfo(),  to  obtain  the length of the compiled
       pattern:

         int rc;
         size_t length;      /* PCRE_INFO_SIZE writes a size_t */
         rc = pcre_fullinfo(
           re,               /* result of pcre_compile() */
           pe,               /* result of pcre_study(), or NULL */
           PCRE_INFO_SIZE,   /* what is required */
           &length);         /* where to put the data */

       The  possible  values for the third argument are defined
       in pcre.h, and are as follows:

         PCRE_INFO_BACKREFMAX

       Return the number of the highest back reference  in  the
       pattern.  The  fourth  argument  should  point to an int
       variable. Zero is returned if there are no  back  refer-
       ences.

         PCRE_INFO_CAPTURECOUNT

       Return  the  number of capturing subpatterns in the pat-
       tern. The fourth argument should point to an  int  vari-
       able.

         PCRE_INFO_DEFAULTTABLES

       Return  a  pointer  to  the  internal  default character
       tables within PCRE. The fourth argument should point  to
       an  unsigned  char  * variable. This information call is
       provided for internal use by the pcre_study()  function.
       External  callers  can  cause  PCRE  to use its internal
       tables by passing a NULL table pointer.

         PCRE_INFO_FIRSTBYTE

       Return information about the first byte of  any  matched
       string, for a non-anchored pattern. (This option used to
       be called PCRE_INFO_FIRSTCHAR; the  old  name  is  still
       recognized for backwards compatibility.)

       If there is a fixed first byte, for example, from a pat-
       tern such as (cat|cow|coyote), it  is  returned  in  the
       integer pointed to by where.  Otherwise, if either

       (a)  the  pattern  was  compiled with the PCRE_MULTILINE
       option, and every branch starts with "^", or

       (b) every branch of the pattern  starts  with  ".*"  and
       PCRE_DOTALL  is  not  set  (if  it were set, the pattern
       would be anchored),

       -1 is returned, indicating that the pattern matches only
       at  the  start  of a subject string or after any newline
       within  the  string.  Otherwise  -2  is  returned.   For
       anchored patterns, -2 is returned.

         PCRE_INFO_FIRSTTABLE

       If  the  pattern  was  studied, and this resulted in the
       construction of a 256-bit table indicating a  fixed  set
       of  bytes  for  the first byte in any matching string, a
       pointer to the table  is  returned.  Otherwise  NULL  is
       returned.   The  fourth  argument  should  point  to  an
       unsigned char * variable.

         PCRE_INFO_LASTLITERAL

       Return the value of the rightmost literal byte that must
       exist in any matched string, other than at its start, if
       such a byte  has  been  recorded.  The  fourth  argument
       should  point  to  an  int variable. If there is no such
       byte, -1 is returned. For anchored patterns, a last lit-
       eral  byte  is  recorded only if it follows something of
       variable  length.   For   example,   for   the   pattern
       /^a\d+z\d+/ the returned value is "z", but for /^a\dz\d/
       the returned value is -1.

         PCRE_INFO_NAMECOUNT
         PCRE_INFO_NAMEENTRYSIZE
         PCRE_INFO_NAMETABLE

       PCRE supports the use of named as well as numbered  cap-
       turing parentheses. The names are just an additional way
       of identifying the parentheses, which still acquire
       numbers. A convenience function called
       pcre_get_named_substring() is provided for extracting an
       individual captured substring by name. It is also possible
       to extract
       the data directly, by first converting  the  name  to  a
       number  in  order  to access the correct pointers in the
       output vector (described with pcre_exec() below). To  do
       the  conversion, you need to use the name-to-number map,
       which is described by these three values.

       The map consists of  a  number  of  fixed-size  entries.
       PCRE_INFO_NAMECOUNT  gives  the  number  of entries, and
       PCRE_INFO_NAMEENTRYSIZE gives the size  of  each  entry;
       both  of  these  return  an  int  value.  The entry size
       depends   on   the   length   of   the   longest   name.
       PCRE_INFO_NAMETABLE returns a pointer to the first entry
       of the table (a pointer to char). The first two bytes of
       each  entry are the number of the capturing parenthesis,
       most significant byte first. The rest of  the  entry  is
       the  corresponding  name, zero terminated. The names are
       in alphabetical order. For example, consider the follow-
       ing pattern (assume PCRE_EXTENDED is set, so white space
       - including newlines - is ignored):

         (?P<date> (?P<year>(\d\d)?\d\d) -
         (?P<month>\d\d) - (?P<day>\d\d) )

       There are four named subpatterns, so the table has  four
       entries,  and  each  entry  in  the table is eight bytes
       long. The table is as follows, with  non-printing  bytes
       shown in hexadecimal, and undefined bytes shown as ??:

         00 01 d  a  t  e  00 ??
         00 05 d  a  y  00 ?? ??
         00 04 m  o  n  t  h  00
         00 02 y  e  a  r  00 ??

       When writing code to extract data from named subpatterns
       using the name-to-number map, remember that  the  length
       of  each  entry  is likely to be different for each com-
       piled pattern.
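       A sketch of a lookup over such a table, written in plain C
       so that it does not depend on any particular pattern. In a
       real program the table pointer, entry count, and entry size
       would come from pcre_fullinfo() with PCRE_INFO_NAMETABLE,
       PCRE_INFO_NAMECOUNT, and PCRE_INFO_NAMEENTRYSIZE. The
       function name is hypothetical. It uses a linear scan; since
       the names are sorted, a binary search would also work.

```c
#include <string.h>

/* Search a name-to-number table of count entries, each entrysize bytes
   long: two bytes of group number (most significant first) followed by
   the zero-terminated name. Returns the group number, or -1 if absent. */
int lookup_group_number(const unsigned char *table, int count, int entrysize,
                        const char *name)
{
  int i;
  for (i = 0; i < count; i++) {
    const unsigned char *entry = table + i * entrysize;
    if (strcmp((const char *)(entry + 2), name) == 0)
      return (entry[0] << 8) | entry[1];
  }
  return -1;
}
```

       With the table above (undefined bytes taken as zero),
       looking up "day" should return 5 and "hour" should return
       -1.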

         PCRE_INFO_OPTIONS

       Return a copy of the options with which the pattern  was
       compiled.   The  fourth  argument  should  point  to  an
       unsigned long int variable. These option bits are  those
       specified in the call to pcre_compile(), modified by any
       top-level option settings within the pattern itself.

       A pattern is automatically anchored by PCRE  if  all  of
       its top-level alternatives begin with one of the follow-
       ing:

         ^     unless PCRE_MULTILINE is set
         \A    always
         \G    always
         .*    if PCRE_DOTALL is set and there are no back
                 references to the subpattern in which .* appears

       For  such  patterns, the PCRE_ANCHORED bit is set in the
       options returned by pcre_fullinfo().

         PCRE_INFO_SIZE

       Return the size of the compiled pattern,  that  is,  the
       value  that  was passed as the argument to pcre_malloc()
       when PCRE was getting memory in which to place the  com-
       piled data. The fourth argument should point to a size_t
       variable.

         PCRE_INFO_STUDYSIZE

       Return the size of the data  block  pointed  to  by  the
       study_data  field  in a pcre_extra block. That is, it is
       the value that was passed to pcre_malloc() when PCRE was
       getting  memory  into which to place the data created by
       pcre_study(). The fourth  argument  should  point  to  a
       size_t variable.

OBSOLETE INFO FUNCTION

       int pcre_info(const pcre *code, int *optptr,
            int *firstcharptr);

       The pcre_info() function is  now  obsolete  because  its
       interface is too restrictive to return all the available
       data about a compiled pattern. New programs  should  use
       pcre_fullinfo() instead. The yield of pcre_info() is the
       number of capturing subpatterns, or one of the following
       negative numbers:

         PCRE_ERROR_NULL       the argument code was NULL
         PCRE_ERROR_BADMAGIC   the "magic number" was not found

       If the optptr argument  is  not  NULL,  a  copy  of  the
       options with which the pattern was compiled is placed in
       the integer it points to (see PCRE_INFO_OPTIONS  above).

       If  the  pattern  is  not  anchored and the firstcharptr
       argument is not NULL, it is used to pass  back  informa-
       tion  about  the  first  character of any matched string
       (see PCRE_INFO_FIRSTBYTE above).

MATCHING A PATTERN

       int pcre_exec(const pcre *code, const pcre_extra *extra,
            const char *subject, int length, int startoffset,
            int options, int *ovector, int ovecsize);

       The  function  pcre_exec()  is called to match a subject
       string against a compiled pattern, which  is  passed  in
       the  code argument. If the pattern has been studied, the
       result of the study should be passed in the extra  argu-
       ment.

       In  most  applications,  the pattern will have been com-
       piled (and optionally studied) in the same process  that
       calls  pcre_exec(). However, it is possible to save com-
       piled patterns and study data, and then use  them  later
       in  different  processes,  possibly  even  on  different
       hosts. For a discussion about this, see the
       pcreprecompile documentation.

       Here is an example of a simple call to pcre_exec():

         int rc;
         int ovector[30];
         rc = pcre_exec(
           re,             /* result of pcre_compile() */
           NULL,           /* we didn't study the pattern */
           "some string",  /* the subject string */
           11,             /* the length of the subject string */
           0,              /* start at offset 0 in the subject */
           0,              /* default options */
           ovector,        /* vector of integers for substring information */
           30);            /* number of elements (NOT size in bytes) */

   Extra data for pcre_exec()

       If  the  extra  argument is not NULL, it must point to a
       pcre_extra data block. The pcre_study() function returns
       such  a block (when it doesn't return NULL), but you can
       also create one for yourself, and pass additional infor-
       mation  in  it.  The fields in a pcre_extra block are as
       follows:

         unsigned long int flags;
         void *study_data;
         unsigned long int match_limit;
         void *callout_data;
         const unsigned char *tables;

       The flags field is a bitmap that specifies which of  the
       other fields are set. The flag bits are:

         PCRE_EXTRA_STUDY_DATA
         PCRE_EXTRA_MATCH_LIMIT
         PCRE_EXTRA_CALLOUT_DATA
         PCRE_EXTRA_TABLES

       Other  flag  bits  should be set to zero. The study_data
       field is set in the pcre_extra block that is returned by
       pcre_study(),  together  with  the appropriate flag bit.
       You should not set this yourself, but you may add to the
       block  by setting the other fields and their correspond-
       ing flag bits.

       The match_limit field provides  a  means  of  preventing
       PCRE  from using up a vast amount of resources when run-
       ning patterns that are not going  to  match,  but  which
       have  a  very  large  number  of  possibilities in their
       search trees. The classic example is the use  of  nested
       unlimited repeats.

       Internally, PCRE uses a function called match() which it
       calls repeatedly (sometimes recursively). The  limit  is
       imposed  on  the number of times this function is called
       during a match, which has the  effect  of  limiting  the
       amount  of  recursion  and  backtracking  that  can take
       place. For patterns that are  not  anchored,  the  count
       starts  from  zero  for  each  position  in  the subject
       string.
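       To see how quickly the number of calls can grow, consider
       (a+)+b applied to a run of a's that contains no b. The C
       sketch below is a hypothetical toy, not PCRE's match():
       count_splits() counts the ways the nested repeat (a+)+
       can divide n characters, spending one unit of a budget
       per recursive call in the spirit of match_limit. For n
       characters it makes 2^n calls, so each extra character
       doubles the work.

```c
#include <assert.h>

/* Hypothetical toy, not PCRE internals: count the ways the nested
   repeat (a+)+ can divide a run of n 'a' characters. Each call
   spends one unit of *budget; when the budget is used up the
   function gives up and returns -1, much as pcre_exec() returns
   PCRE_ERROR_MATCHLIMIT when match_limit is reached. */
static long count_splits(int n, long *budget)
{
    if (*budget <= 0)
        return -1;                     /* limit reached */
    (*budget)--;
    if (n == 0)
        return 1;                      /* every character consumed */
    long total = 0;
    for (int k = 1; k <= n; k++) {     /* the inner a+ takes k characters */
        long rest = count_splits(n - k, budget);
        if (rest < 0)
            return -1;                 /* propagate the failure */
        total += rest;
    }
    return total;
}
```

       With a budget of 8, count_splits(3, &budget) returns 4;
       with a budget of 7 it gives up and returns -1.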

       The default limit for the library can be set when PCRE
       is built; if no value is chosen then, the default is 10
       million, which handles all but the most extreme cases.
       You can reduce the limit by supplying pcre_exec() with a
       pcre_extra block in which match_limit is set to a smaller
       value, and PCRE_EXTRA_MATCH_LIMIT is set in the flags
       field. If the limit is exceeded, pcre_exec() returns
       PCRE_ERROR_MATCHLIMIT.

       The callout_data field is used in conjunction with the
       "callout" feature, which is described in the pcrecallout
       documentation.

       The  tables  field  is  used  to pass a character tables
       pointer to pcre_exec(); this overrides the value that is
       stored  with  the  compiled pattern. A non-NULL value is
       stored with the compiled pattern only if  custom  tables
       were  supplied  to pcre_compile() via its tableptr argu-
       ment.  If NULL is passed to pcre_exec() using this mech-
       anism, it forces PCRE's internal tables to be used. This
       facility is helpful when  re-using  patterns  that  have
       been  saved  after  compiling  with  an  external set of
       tables, because the external tables might be at  a  dif-
       ferent  address  when  pcre_exec()  is  called.  See the
       pcreprecompile documentation for a discussion of  saving
       compiled patterns for later use.

   Option bits for pcre_exec()

       The  unused bits of the options argument for pcre_exec()
       must be  zero.  The  only  bits  that  may  be  set  are
       PCRE_ANCHORED,  PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY,
       PCRE_NO_UTF8_CHECK and PCRE_PARTIAL.

         PCRE_ANCHORED

       The PCRE_ANCHORED option limits pcre_exec() to matching
       at the first matching position. If a pattern was
       compiled with PCRE_ANCHORED, or turned out to be
       anchored by virtue of its contents, it cannot be made
       unanchored at matching time.

         PCRE_NOTBOL

       This option specifies that the first character of the
       subject string is not the beginning of a line, so the
       circumflex metacharacter should not match before it.
       Setting this without PCRE_MULTILINE (at compile time)
       causes circumflex never to match.  This  option  affects
       only  the  behaviour of the circumflex metacharacter. It
       does not affect \A.

         PCRE_NOTEOL

       This option specifies that the end of the subject string
       is  not  the  end of a line, so the dollar metacharacter
       should not match it nor (except  in  multiline  mode)  a
       newline  immediately  before  it.  Setting  this without
       PCRE_MULTILINE (at compile time) causes dollar never  to
       match.  This  option  affects  only the behaviour of the
       dollar metacharacter. It does not affect \Z or \z.

         PCRE_NOTEMPTY

       An empty string is not considered to be a valid match if
       this  option  is  set.  If there are alternatives in the
       pattern, they are tried. If all the  alternatives  match
       the  empty  string, the entire match fails. For example,
       if the pattern

         a?b?

       is applied to a string not beginning with "a" or "b", it
       matches  the  empty  string at the start of the subject.
       With PCRE_NOTEMPTY set, this match is not valid, so PCRE
       searches  further into the string for occurrences of "a"
       or "b".

       Perl has no direct equivalent of PCRE_NOTEMPTY,  but  it
       does make a special case of a pattern match of the empty
       string within its split() function, and when  using  the
       /g  modifier. It is possible to emulate Perl's behaviour
       after matching a null string by first trying  the  match
       again   at   the  same  offset  with  PCRE_NOTEMPTY  and
       PCRE_ANCHORED, and then if that fails by  advancing  the
       starting offset (see below) and trying an ordinary match
       again. There is some code that demonstrates  how  to  do
       this in the pcredemo.c sample program.
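       The pcredemo.c program contains the real code. As an
       independent illustration of the same strategy that runs
       without linking PCRE, the sketch below replaces
       pcre_exec() with toy_exec(), a hypothetical matcher that
       only knows the pattern x*; TOY_ANCHORED and TOY_NOTEMPTY
       are hypothetical flags standing in for PCRE_ANCHORED and
       PCRE_NOTEMPTY.

```c
#include <assert.h>

#define TOY_ANCHORED 0x1   /* hypothetical stand-in for PCRE_ANCHORED */
#define TOY_NOTEMPTY 0x2   /* hypothetical stand-in for PCRE_NOTEMPTY */

/* Hypothetical stand-in for pcre_exec() that only matches x*.
   On success stores the match in ov[0], ov[1] and returns 1;
   otherwise returns -1 (playing the role of PCRE_ERROR_NOMATCH). */
static int toy_exec(const char *s, int len, int start, int opts, int ov[2])
{
    for (int pos = start; pos <= len; pos++) {
        int end = pos;
        while (end < len && s[end] == 'x')
            end++;                          /* x* is greedy */
        if (end > pos || !(opts & TOY_NOTEMPTY)) {
            ov[0] = pos;
            ov[1] = end;
            return 1;
        }
        if (opts & TOY_ANCHORED)
            break;                          /* only one position allowed */
    }
    return -1;
}

/* Find successive matches as Perl's /g does: after an empty match,
   retry at the same offset with NOTEMPTY|ANCHORED; if that fails,
   advance one character and try an ordinary match. Returns the
   number of matches stored in starts[] and ends[]. */
static int find_all(const char *s, int len, int starts[], int ends[], int max)
{
    int ov[2], n = 0, start = 0, opts = 0;
    while (n < max) {
        if (toy_exec(s, len, start, opts, ov) < 0) {
            if (opts == 0 || start >= len)
                break;                      /* no further matches */
            start++;                        /* one byte; UTF-8 needs a whole character */
            opts = 0;
            continue;
        }
        starts[n] = ov[0];
        ends[n] = ov[1];
        n++;
        start = ov[1];
        opts = (ov[0] == ov[1]) ? (TOY_NOTEMPTY | TOY_ANCHORED) : 0;
    }
    return n;
}
```

       Applied to "axxb", find_all() reports the offset pairs
       (0,0), (1,3), (3,3) and (4,4).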

         PCRE_NO_UTF8_CHECK

       When  PCRE_UTF8  is set at compile time, the validity of
       the subject as a UTF-8 string is  automatically  checked
       when  pcre_exec()  is subsequently called.  The value of
       startoffset is also checked to ensure that it points  to
       the  start  of  a  UTF-8  character. If an invalid UTF-8
       sequence of bytes  is  found,  pcre_exec()  returns  the
       error  PCRE_ERROR_BADUTF8.  If  startoffset  contains an
       invalid value, PCRE_ERROR_BADUTF8_OFFSET is returned.

       If you already know that your subject is valid, and  you
       want  to  skip these checks for performance reasons, you
       can  set  the  PCRE_NO_UTF8_CHECK  option  when  calling
       pcre_exec().  You  might  want to do this for the second
       and subsequent calls to pcre_exec() if  you  are  making
       repeated  calls  to  find  all  the  matches in a single
       subject string. However, you should  be  sure  that  the
       value  of  startoffset  points  to  the start of a UTF-8
       character. When PCRE_NO_UTF8_CHECK is set, the effect of
       passing an invalid UTF-8 string as a subject, or a value
       of startoffset that does not point to  the  start  of  a
       UTF-8 character, is undefined. Your program may crash.
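       When PCRE_NO_UTF8_CHECK is used on repeated calls, a cheap
       sanity check on startoffset is still possible. The sketch
       below is a hypothetical helper, not part of the PCRE API:
       it only verifies that the byte at startoffset is not a
       UTF-8 continuation byte (10xxxxxx), and does not validate
       the rest of the subject.

```c
#include <assert.h>

/* Hypothetical helper: return 1 if startoffset can be the start of
   a UTF-8 character in subject[0..length), 0 otherwise. This helper
   treats the end of the subject as a valid boundary. */
static int utf8_offset_ok(const unsigned char *subject, int length,
                          int startoffset)
{
    if (startoffset < 0 || startoffset > length)
        return 0;
    if (startoffset == length)
        return 1;
    return (subject[startoffset] & 0xC0) != 0x80;   /* not 10xxxxxx */
}
```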

         PCRE_PARTIAL

       This  option  turns  on the partial matching feature. If
       the subject string fails to match the  pattern,  but  at
       some  point  during  the matching process the end of the
       subject was reached  (that  is,  the  subject  partially
       matches  the  pattern  and the failure to match occurred
       only because there were not enough subject  characters),
       pcre_exec()   returns   PCRE_ERROR_PARTIAL   instead  of
       PCRE_ERROR_NOMATCH. When PCRE_PARTIAL is used, there are
       restrictions  on  what  may appear in the pattern. These
       are discussed in the pcrepartial documentation.

   The string to be matched by pcre_exec()

       The subject string is passed to pcre_exec() as a pointer
       in subject, a length in length, and a starting byte off-
       set in startoffset. In UTF-8 mode, the byte offset  must
       point to the start of a UTF-8 character. Unlike the pat-
       tern string, the subject may contain binary zero  bytes.
       When the starting offset is zero, the search for a match
       starts at the beginning of the subject, and this  is  by
       far the most common case.

       A  non-zero starting offset is useful when searching for
       another match in the same subject by calling pcre_exec()
       again  after  a  previous  success.  Setting startoffset
       differs from just passing over a  shortened  string  and
       setting PCRE_NOTBOL in the case of a pattern that begins
       with any kind of lookbehind. For example,  consider  the
       pattern

         \Biss\B

       which finds occurrences of "iss" in the middle of words.
       (\B matches only if the current position in the  subject
       is not a word boundary.) When applied to the string
       "Mississippi" the first call to pcre_exec() finds the
       first occurrence. If pcre_exec() is called again with
       just the remainder of the subject, namely "issippi", it
       does  not match, because \B is always false at the start
       of the subject, which is deemed to be a  word  boundary.
       However,  if  pcre_exec()  is  passed  the entire string
       again, but with startoffset set to 4, it finds the  sec-
       ond  occurrence  of  "iss"  because  it  is able to look
       behind the starting point to discover that  it  is  pre-
       ceded by a letter.
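       The difference comes down to what lies before the
       starting position. This hypothetical sketch (not PCRE
       code) evaluates the \B condition at a given offset. It
       approximates PCRE's word characters with isalnum() plus
       underscore in the C locale, and treats positions outside
       the subject as non-word characters.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical helper: 1 if s[i] is a word character, 0 otherwise.
   Positions outside [0, len) count as non-word, which is why the
   start of a subject is a word boundary. */
static int is_word(const char *s, int len, int i)
{
    if (i < 0 || i >= len)
        return 0;
    return isalnum((unsigned char)s[i]) || s[i] == '_';
}

/* \B succeeds at pos when both neighbours are word characters or
   both are not. */
static int not_word_boundary(const char *s, int len, int pos)
{
    return is_word(s, len, pos - 1) == is_word(s, len, pos);
}
```

       At offset 4 of "Mississippi" the check looks behind to
       the preceding "s" and \B succeeds; at offset 0 of the
       shortened subject "issippi" it fails.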

       If a non-zero starting offset is passed when the pattern
       is anchored, one attempt to match at the given offset is
       made.  This  can  only  succeed  if the pattern does not
       require the match to be at the start of the subject.

   How pcre_exec() returns captured substrings

       In general, a pattern matches a certain portion  of  the
       subject,  and  in  addition, further substrings from the
       subject may be picked out by parts of the pattern.  Fol-
       lowing  the  usage  in  Jeffrey  Friedl's  book, this is
       called "capturing" in what follows, and the phrase "cap-
       turing  subpattern"  is used for a fragment of a pattern
       that picks out a substring. PCRE supports several  other
       kinds  of  parenthesized  subpattern  that  do not cause
       substrings to be captured.

       Captured substrings are returned to  the  caller  via  a
       vector  of  integer  offsets  whose address is passed in
       ovector. The number of elements in the vector is  passed
       in  ovecsize, which must be a non-negative number. Note:
       this argument is NOT the size of ovector in bytes.

       The first two-thirds of the vector is used to pass  back
       captured  substrings,  each  substring  using  a pair of
       integers. The remaining third of the vector is  used  as
       workspace  by  pcre_exec() while matching capturing sub-
       patterns, and is not available for passing back informa-
       tion.  The  length passed in ovecsize should always be a
       multiple of three. If it is not, it is rounded down.

       When a match is successful, information  about  captured
       substrings is returned in pairs of integers, starting at
       the beginning of ovector,  and  continuing  up  to  two-
       thirds of its length at the most. The first element of a
       pair is set to the offset of the first  character  in  a
       substring,  and  the  second is set to the offset of the
       first character after the end of a substring. The  first
       pair, ovector[0] and ovector[1], identify the portion of
       the subject string matched by the  entire  pattern.  The
       next  pair  is  used for the first capturing subpattern,
       and so on. The value returned by pcre_exec() is one more
       than the highest numbered pair that has been set. If
       there are no capturing subpatterns, the return value
       from a successful match is 1, indicating that just the
       first pair of offsets has been set.

       Some convenience functions are provided  for  extracting
       the  captured  substrings as separate strings. These are
       described in the following section.

       It is possible for a capturing subpattern number n+1 to
       match some part of the subject when subpattern n has not
       been used at all. For example, if the string "abc" is
       matched against the pattern (a|(z))(bc), subpatterns 1
       and 3 are matched, but 2 is not. When this happens, both
       offset values corresponding to the unused subpattern are
       set to -1.
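       The sketch below reads offset pairs out of an ovector.
       captured_length() is a hypothetical helper, not a PCRE
       function. It is exercised with the offsets the "abc"
       example above produces (whole match 0-3, subpattern 1 at
       0-1, subpattern 2 unset, subpattern 3 at 1-3), assuming
       pcre_exec() returned 4.

```c
#include <assert.h>

/* Hypothetical helper: length of captured substring i (0 = whole
   match) from an ovector filled in by pcre_exec(). rc must be the
   positive value pcre_exec() returned. Returns -1 if subpattern i
   was unset (both offsets -1) or lies beyond the pairs that were set. */
static int captured_length(const int *ovector, int rc, int i)
{
    if (i < 0 || i >= rc)
        return -1;
    if (ovector[2 * i] < 0)
        return -1;                         /* unused subpattern */
    return ovector[2 * i + 1] - ovector[2 * i];
}
```

       With that ovector, captured_length() gives 3, 1, -1 and 2
       for i = 0 to 3.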

       If a capturing subpattern is matched repeatedly,  it  is
       the  last  portion of the string that it matched that is
       returned.

       If the vector is too small to hold all the captured sub-
       string  offsets,  it  is  used as far as possible (up to
       two-thirds of its length), and the  function  returns  a
       value  of  zero. In particular, if the substring offsets
       are not of interest,  pcre_exec()  may  be  called  with
       ovector passed as NULL and ovecsize as zero. However, if
       the pattern contains back references and the ovector  is
       not  big enough to remember the related substrings, PCRE
       has to get additional memory for  use  during  matching.
       Thus it is usually advisable to supply an ovector.

       Note that pcre_fullinfo() with PCRE_INFO_CAPTURECOUNT
       can be used to find out how many capturing subpatterns
       there are in a compiled pattern.
       The smallest size for ovector that will allow for n cap-
       tured substrings, in addition to the offsets of the sub-
       string matched by the whole pattern, is (n+1)*3.
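       The sizing rules can be written down as two small C
       helpers. They are a sketch of the arithmetic described
       above, not PCRE functions; n would normally be obtained
       from pcre_fullinfo() with PCRE_INFO_CAPTURECOUNT.

```c
#include <assert.h>

/* Smallest ovecsize that holds the whole-match offsets plus n
   captured substrings: each substring needs a pair of offsets plus
   one slot of the workspace third. */
static int min_ovecsize(int n)
{
    return (n + 1) * 3;
}

/* Number of offset pairs pcre_exec() can fill for a given ovecsize:
   the size is rounded down to a multiple of three and two-thirds of
   it hold pairs, which comes to ovecsize / 3 pairs. */
static int usable_pairs(int ovecsize)
{
    return ovecsize / 3;
}
```

       For instance, the 30-element ovector in the earlier call
       to pcre_exec() has room for 10 pairs: the whole match
       plus 9 captured substrings.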

   Return values from pcre_exec()

       If  pcre_exec() fails, it returns a negative number. The
       following are defined in the header file:

         PCRE_ERROR_NOMATCH        (-1)

       The subject string did not match the pattern.

         PCRE_ERROR_NULL           (-2)

       Either code or subject was passed as  NULL,  or  ovector
       was NULL and ovecsize was not zero.

         PCRE_ERROR_BADOPTION      (-3)

       An unrecognized bit was set in the options argument.

         PCRE_ERROR_BADMAGIC       (-4)

       PCRE  stores a 4-byte "magic number" at the start of the
       compiled code, to catch the case when  it  is  passed  a
       junk  pointer and to detect when a pattern that was com-
       piled in an environment of one endianness is run  in  an
       environment with the other endianness. This is the error
       that PCRE gives when the magic number is not present.

         PCRE_ERROR_UNKNOWN_NODE   (-5)

       While running the pattern match,  an  unknown  item  was
       encountered in the compiled pattern. This error could be
       caused by a bug in PCRE or by overwriting  of  the  com-
       piled pattern.

         PCRE_ERROR_NOMEMORY       (-6)

       If  a  pattern contains back references, but the ovector
       that is passed to  pcre_exec()  is  not  big  enough  to
       remember the referenced substrings, PCRE gets a block of
       memory at the start of matching to use for this purpose.
       If  the  call  via  pcre_malloc()  fails,  this error is
       given. The memory is automatically freed at the  end  of
       matching.

         PCRE_ERROR_NOSUBSTRING    (-7)

       This   error   is  used  by  the  pcre_copy_substring(),
       pcre_get_substring(),   and    pcre_get_substring_list()
       functions   (see   below).   It  is  never  returned  by
       pcre_exec().

         PCRE_ERROR_MATCHLIMIT     (-8)

       The recursion and backtracking limit,  as  specified  by
       the  match_limit  field  in  a  pcre_extra structure (or
       defaulted) was reached. See the description above.

         PCRE_ERROR_CALLOUT        (-9)

       This error is never generated by pcre_exec() itself.  It
       is  provided  for  use by callout functions that want to
       yield a distinctive error code. See the pcrecallout doc-
       umentation for details.

         PCRE_ERROR_BADUTF8        (-10)

       A  string  that  contains an invalid UTF-8 byte sequence
       was passed as a subject.

         PCRE_ERROR_BADUTF8_OFFSET (-11)

       The UTF-8 byte sequence that was passed as a subject was
       valid, but the value of startoffset did not point to the
       beginning of a UTF-8 character.

         PCRE_ERROR_PARTIAL (-12)

       The subject string did not match, but it did match  par-
       tially. See the pcrepartial documentation for details of
       partial matching.

         PCRE_ERROR_BAD_PARTIAL (-13)

       The PCRE_PARTIAL option was used with a compiled pattern
       containing  items  that  are  not  supported for partial
       matching. See the pcrepartial documentation for  details
       of partial matching.

         PCRE_ERROR_INTERNAL (-14)

       An  unexpected  internal  error has occurred. This error
       could be caused by a bug in PCRE or  by  overwriting  of
       the compiled pattern.

         PCRE_ERROR_BADCOUNT (-15)

       This  error  is given if the value of the ovecsize argu-
       ment is negative.

EXTRACTING CAPTURED SUBSTRINGS BY NUMBER

       int pcre_copy_substring(const char *subject, int *ovector,
            int stringcount, int stringnumber, char *buffer,
            int buffersize);

       int pcre_get_substring(const char *subject, int *ovector,
            int stringcount, int stringnumber,
            const char **stringptr);

       int pcre_get_substring_list(const char *subject,
            int *ovector, int stringcount, const char ***listptr);

       Captured  substrings  can  be accessed directly by using
       the offsets returned by pcre_exec() in ovector. For con-
       venience,     the    functions    pcre_copy_substring(),
       pcre_get_substring(), and pcre_get_substring_list()  are
       provided for extracting captured substrings as new, sep-
       arate, zero-terminated strings. These functions identify
       substrings  by  number. The next section describes func-
       tions for extracting named substrings. A substring  that
       contains  a binary zero is correctly extracted and has a
       further zero added on the end, but the result is not, of
       course, a C string.

       The  first three arguments are the same for all three of
       these functions: subject is the subject string that  has
       just  been successfully matched, ovector is a pointer to
       the  vector  of  integer  offsets  that  was  passed  to
       pcre_exec(), and stringcount is the number of substrings
       that were captured by the match, including the substring
       that  matched the entire regular expression. This is the
       value returned by pcre_exec()  if  it  is  greater  than
       zero.  If  pcre_exec() returned zero, indicating that it
       ran out of space in ovector, the value passed as string-
       count  should  be  the  number of elements in the vector
       divided by three.
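       The rule for choosing stringcount fits in a few lines.
       stringcount_for() below is a hypothetical helper, not a
       PCRE function; it takes the value returned by pcre_exec()
       and the ovecsize that was passed to it.

```c
#include <assert.h>

/* Hypothetical helper: the stringcount to pass to the extraction
   functions, or -1 if pcre_exec() reported an error or no match. */
static int stringcount_for(int rc, int ovecsize)
{
    if (rc > 0)
        return rc;                 /* normal case */
    if (rc == 0)
        return ovecsize / 3;       /* ovector too small: all of it was used */
    return -1;                     /* rc is an error code, e.g. PCRE_ERROR_NOMATCH */
}
```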

       The functions  pcre_copy_substring()  and  pcre_get_sub-
       string()  extract  a  single  substring, whose number is
       given as stringnumber. A value of zero extracts the sub-
       string  that  matched the entire pattern, whereas higher
       values   extract   the    captured    substrings.    For
       pcre_copy_substring(),  the  string is placed in buffer,
       whose  length  is  given  by   buffersize,   while   for
       pcre_get_substring()  a  new block of memory is obtained
       via pcre_malloc, and its address is returned via string-
       ptr.  The  yield  of  the  function is the length of the
       string, not including the terminating zero, or one of

         PCRE_ERROR_NOMEMORY       (-6)

       The buffer was too small for  pcre_copy_substring(),  or
       the  attempt  to  get  memory  failed  for pcre_get_sub-
       string().

         PCRE_ERROR_NOSUBSTRING    (-7)

       There is no substring whose number is stringnumber.

       The  pcre_get_substring_list()  function  extracts   all
       available  substrings  and  builds a list of pointers to
       them. All this is done in a single block of memory  that
       is  obtained  via pcre_malloc. The address of the memory
       block is returned via listptr, which is also  the  start
       of  the  list of string pointers. The end of the list is
       marked by a NULL pointer. The yield of the  function  is
       zero if all went well, or

         PCRE_ERROR_NOMEMORY       (-6)

       if the attempt to get the memory block failed.

       When any of these functions encounters a substring that
       is unset, which can  happen  when  capturing  subpattern
       number n+1 matches some part of the subject, but subpat-
       tern n has not been used at all, they  return  an  empty
       string.  This  can be distinguished from a genuine zero-
       length substring by inspecting the appropriate offset in
       ovector, which is negative for unset substrings.
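       The behaviour described in this section can be modelled
       in plain C. copy_substring_sketch() below is a hypothetical
       re-implementation of the documented behaviour of
       pcre_copy_substring(), for illustration only; the
       SKETCH_ERROR_* macros are hypothetical names that carry
       the documented values of PCRE_ERROR_NOMEMORY and
       PCRE_ERROR_NOSUBSTRING. Because an unset substring has
       both offsets equal to -1, its length works out to zero
       and an empty string is copied.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_ERROR_NOMEMORY    (-6)   /* value of PCRE_ERROR_NOMEMORY */
#define SKETCH_ERROR_NOSUBSTRING (-7)   /* value of PCRE_ERROR_NOSUBSTRING */

/* Copy substring stringnumber into buffer and return its length,
   or one of the error values above. */
static int copy_substring_sketch(const char *subject, const int *ovector,
                                 int stringcount, int stringnumber,
                                 char *buffer, int buffersize)
{
    if (stringnumber < 0 || stringnumber >= stringcount)
        return SKETCH_ERROR_NOSUBSTRING;
    int start = ovector[2 * stringnumber];
    int len = ovector[2 * stringnumber + 1] - start;   /* 0 when unset */
    if (buffersize < len + 1)
        return SKETCH_ERROR_NOMEMORY;   /* no room for the terminating zero */
    if (len > 0)
        memcpy(buffer, subject + start, (size_t)len);
    buffer[len] = '\0';
    return len;
}
```

       A caller that needs to tell an unset substring from an
       empty one checks whether ovector[2*n] is negative, as the
       text above describes.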

       The  two convenience functions pcre_free_substring() and
       pcre_free_substring_list() can be used to free the  mem-
       ory  returned by a previous call of pcre_get_substring()
       or  pcre_get_substring_list(),  respectively.  They   do
       nothing  more  than  call  the  function  pointed  to by
       pcre_free, which of course could be called directly from
       a  C  program.  However, PCRE is used in some situations
       where it is linked via a special  interface  to  another
       programming   language   which   cannot   use  pcre_free
       directly; it is for these cases that the  functions  are
       provided.

EXTRACTING CAPTURED SUBSTRINGS BY NAME

       int pcre_get_stringnumber(const pcre *code,
            const char *name);

       int pcre_copy_named_substring(const pcre *code,
            const char *subject, int *ovector,
            int stringcount, const char *stringname,
            char *buffer, int buffersize);

       int pcre_get_named_substring(const pcre *code,
            const char *subject, int *ovector,
            int stringcount, const char *stringname,
            const char **stringptr);

       To extract a substring by name, you first have to find
       the associated number. For example, for this pattern

         (a+)b(?P<xxx>\d+)...

       the number of the subpattern called "xxx" is 2. You  can
       find    the    number   from   the   name   by   calling
       pcre_get_stringnumber(). The first argument is the  com-
       piled  pattern, and the second is the name. The yield of
       the   function   is   the    subpattern    number,    or
       PCRE_ERROR_NOSUBSTRING (-7) if there is no subpattern of
       that name.
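       Because the name table is sorted alphabetically, a name
       can be found by binary search. The following is a
       hypothetical sketch of such a lookup over the table
       layout described under PCRE_INFO_NAMETABLE; it is not
       PCRE's own implementation, and lookup_group_number() and
       SKETCH_ERROR_NOSUBSTRING are illustrative names. Like
       pcre_get_stringnumber(), it returns -7 (the value of
       PCRE_ERROR_NOSUBSTRING) when there is no subpattern of
       that name. In a real program the table, entry count and
       entry size would come from pcre_fullinfo() with
       PCRE_INFO_NAMETABLE, PCRE_INFO_NAMECOUNT and
       PCRE_INFO_NAMEENTRYSIZE.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_ERROR_NOSUBSTRING (-7)   /* value of PCRE_ERROR_NOSUBSTRING */

/* Binary search a name table of count entries, each entrysize bytes
   long (2-byte group number, most significant byte first, then the
   zero-terminated name), for name. */
static int lookup_group_number(const unsigned char *table, int count,
                               int entrysize, const char *name)
{
    int lo = 0, hi = count;               /* half-open range [lo, hi) */
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        const unsigned char *entry = table + (size_t)mid * (size_t)entrysize;
        int c = strcmp(name, (const char *)(entry + 2));
        if (c == 0)
            return (entry[0] << 8) | entry[1];
        if (c < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return SKETCH_ERROR_NOSUBSTRING;
}
```

       Applied to the date table shown under PCRE_INFO_NAMETABLE
       (4 entries of 8 bytes), lookup_group_number() returns 4
       for "month" and -7 for "hour".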

       Given  the  number,  you  can  extract   the   substring
       directly,  or  use one of the functions described in the
       previous section. For convenience, there  are  also  two
       functions that do the whole job.

       Most of the arguments of pcre_copy_named_substring() and
       pcre_get_named_substring() are the same as those for the
       similarly  named  functions  that  extract by number. As
       these are described in the previous  section,  they  are
       not re-described here. There are just two differences:

       First,  instead  of a substring number, a substring name
       is given. Second, there is an extra argument,  given  at
       the  start,  which is a pointer to the compiled pattern.
       This is needed in order to gain access to  the  name-to-
       number translation table.

       These  functions call pcre_get_stringnumber(), and if it
       succeeds,  they  then  call   pcre_copy_substring()   or
       pcre_get_substring(), as appropriate.

Last updated: 09 September 2004
Copyright (c) 1997-2004 University of Cambridge.



