PCRE(3)                                                 PCRE(3)





NAME
       PCRE - Perl-compatible regular expressions.

SYNOPSIS OF POSIX API

       #include <pcreposix.h>

       int regcomp(regex_t *preg, const char *pattern,
            int cflags);

       int regexec(regex_t *preg, const char *string,
            size_t nmatch, regmatch_t pmatch[], int eflags);

       size_t regerror(int errcode, const regex_t *preg,
            char *errbuf, size_t errbuf_size);

       void regfree(regex_t *preg);

DESCRIPTION

       This  set of functions provides a POSIX-style API to the
       PCRE regular expression package. See the  pcreapi  docu-
       mentation  for a description of PCRE's native API, which
       contains additional functionality.

       The functions described here are just wrapper  functions
       that  ultimately  call the PCRE native API. Their proto-
       types are defined in the pcreposix.h header file, and on
       Unix  systems  the library itself is called pcreposix.a,
       so can be accessed by adding -lpcreposix to the  command
       for  linking  an application that uses them. Because the
       POSIX functions call the native ones, it is also  neces-
       sary to add -lpcre.

       I  have  implemented  only those option bits that can be
       reasonably mapped to PCRE native options.  In  addition,
       the  options REG_EXTENDED and REG_NOSUB are defined with
       the value zero. They have no effect, but since  programs
       that  are written to the POSIX interface often use them,
       this makes it easier to slot in PCRE  as  a  replacement
       library. Other POSIX options are not even defined.

       When  PCRE is called via these functions, it is only the
       API that is POSIX-like in style. The syntax  and  seman-
       tics  of  the  regular  expressions themselves are still
       those of Perl, subject to the setting  of  various  PCRE
       options, as described below. "POSIX-like in style" means
       that the API approximates to the POSIX definition; it is
       not  fully  POSIX-compatible, and in multi-byte encoding
       domains it is probably even less compatible.

       The  header  for  these   functions   is   supplied   as
       pcreposix.h  to  avoid  any  potential  clash with other
       POSIX libraries.  It  can,  of  course,  be  renamed  or
       aliased as regex.h, which is the "correct" name. It pro-
       vides two structure types, regex_t for compiled internal
       forms, and regmatch_t for returning captured substrings.
       It also defines some constants whose  names  start  with
       "REG_"; these are used for setting options and identify-
       ing error codes.


COMPILING A PATTERN

       The function regcomp() is called to  compile  a  pattern
       into  an internal form. The pattern is a C string termi-
       nated by a binary zero, and is passed  in  the  argument
       pattern.  The  preg  argument  is a pointer to a regex_t
       structure that is used as a base for storing information
       about the compiled expression.

       The  argument  cflags is either zero, or contains one or
       more of the bits defined by the following macros:

         REG_ICASE

       The PCRE_CASELESS option is set when the  expression  is
       passed for compilation to the native function.

         REG_NEWLINE

       The  PCRE_MULTILINE option is set when the expression is
       passed for compilation to the native function. Note that
       this  does  not  mimic  the  defined POSIX behaviour for
       REG_NEWLINE (see the following section).

       In the absence of these flags, no options are passed  to
       the  native  function.  This means the the regex is com-
       piled with PCRE default semantics.  In  particular,  the
       way  it handles newline characters in the subject string
       is the Perl way, not the POSIX way.  Note  that  setting
       PCRE_MULTILINE  has  only  some of the effects specified
       for REG_NEWLINE. It does not affect the way newlines are
       matched  by  . (they aren't) or by a negative class such
       as [^a] (they are).

       The yield of regcomp() is zero on success, and  non-zero
       otherwise.  The  preg structure is filled in on success,
       and one member of the structure is public: re_nsub  con-
       tains the number of capturing subpatterns in the regular
       expression. Various  error  codes  are  defined  in  the
       header file.

MATCHING NEWLINE CHARACTERS

       This  area  is  not  simple, because POSIX and Perl take
       different views of things.  It is not  possible  to  get
       PCRE  to  obey  POSIX semantics, but then PCRE was never
       intended to be a POSIX engine. The following table lists
       the different possibilities for matching newline charac-
       ters in PCRE:

                                 Default   Change with

         . matches newline          no     PCRE_DOTALL
         newline matches [^a]       yes    not changeable
         $ matches \n at end        yes    PCRE_DOLLARENDONLY
         $ matches \n in middle     no     PCRE_MULTILINE
         ^ matches \n in middle     no     PCRE_MULTILINE

       This is the equivalent table for POSIX:

                                 Default   Change with

         . matches newline          yes    REG_NEWLINE
         newline matches [^a]       yes    REG_NEWLINE
         $ matches \n at end        no     REG_NEWLINE
         $ matches \n in middle     no     REG_NEWLINE
         ^ matches \n in middle     no     REG_NEWLINE

       PCRE's behaviour is the  same  as  Perl's,  except  that
       there  is no equivalent for PCRE_DOLLAR_ENDONLY in Perl.
       In both PCRE and Perl, there is no way to  stop  newline
       from matching [^a].

       The  default  POSIX  newline handling can be obtained by
       setting PCRE_DOTALL and PCRE_DOLLAR_ENDONLY,  but  there
       is  no  way  to  make  PCRE  behave  exactly  as for the
       REG_NEWLINE action.

MATCHING A PATTERN

       The function regexec() is called  to  match  a  compiled
       pattern preg against a given string, which is terminated
       by a zero byte, subject to the options in eflags.  These
       can be:

         REG_NOTBOL

       The  PCRE_NOTBOL option is set when calling the underly-
       ing PCRE matching function.

         REG_NOTEOL

       The PCRE_NOTEOL option is set when calling the  underly-
       ing PCRE matching function.

       The portion of the string that was matched, and also any
       captured substrings, are returned via the  pmatch  argu-
       ment,  which  points to an array of nmatch structures of
       type regmatch_t, containing the members rm_so and rm_eo.
       These  contain the offset to the first character of each
       substring and the offset to the  first  character  after
       the end of each substring, respectively. The 0th element
       of the vector relates to the entire  portion  of  string
       that was matched; subsequent elements relate to the cap-
       turing subpatterns of  the  regular  expression.  Unused
       entries  in the array have both structure members set to
       -1.

       A successful match yields a zero return;  various  error
       codes   are   defined  in  the  header  file,  of  which
       REG_NOMATCH is the "expected" failure code.

ERROR MESSAGES

       The regerror() function maps a non-zero  errorcode  from
       either regcomp() or regexec() to a printable message. If
       preg is not NULL, the error should have arisen from  the
       use  of that structure. A message terminated by a binary
       zero is placed in errbuf. The  length  of  the  message,
       including the zero, is limited to errbuf_size. The yield
       of the function is the size of buffer needed to hold the
       whole message.

MEMORY USAGE

       Compiling a regular expression causes memory to be allo-
       cated and associated with the preg structure. The  func-
       tion  regfree()  frees all such memory, after which preg
       may no longer be used as a compiled expression.

AUTHOR

       Philip Hazel <ph10@cam.ac.uk>
       University Computing Service,
       Cambridge CB2 3QG, England.

Last updated: 07 September 2004
Copyright (c) 1997-2004 University of Cambridge.



                                                        PCRE(3)
