PCRE(3)                                                 PCRE(3)





NAME
       PCRE - Perl-compatible regular expressions

PCRE BUILD-TIME OPTIONS

       This  document  describes  the optional features of PCRE
       that can be selected when the library is compiled.  They
       are all selected, or deselected, by providing options to
       the configure script that is run before  the  make  com-
       mand.  The complete list of options for configure (which
       includes the standard ones such as the selection of  the
       installation directory) can be obtained by running

         ./configure --help

       The  following  sections  describe certain options whose
       names begin with --enable or --disable.  These  settings
       specify  changes  to the defaults for the configure com-
       mand. Because of the way that configure works,  --enable
       and --disable always come in pairs, so the complementary
       option always exists as well, but as  it  specifies  the
       default, it is not described.

UTF-8 SUPPORT

       To  build PCRE with support for UTF-8 character strings,
       add

         --enable-utf8

       to the configure command. Of itself, this does not  make
       PCRE  treat  strings as UTF-8. As well as compiling PCRE
       with  this  option,  you  also  have  have  to  set  the
       PCRE_UTF8  option when you call the pcre_compile() func-
       tion.

UNICODE CHARACTER PROPERTY SUPPORT

       UTF-8 support allows PCRE to  process  character  values
       greater  than 255 in the strings that it handles. On its
       own, however, it does not  provide  any  facilities  for
       accessing the properties of such characters. If you want
       to be able to use the pattern escapes \P,  \p,  and  \X,
       which  refer  to  Unicode character properties, you must
       add

         --enable-unicode-properties

       to the configure command. This  implies  UTF-8  support,
       even if you have not explicitly requested it.

       Including  Unicode  property  support adds around 90K of
       tables to the PCRE library, approximately  doubling  its
       size.  Only  the  general category properties such as Lu
       and Nd are supported. Details are given in the  pcrepat-
       tern documentation.

CODE VALUE OF NEWLINE

       By  default,  PCRE treats character 10 (linefeed) as the
       newline character. This is the normal newline  character
       on  Unix-like systems. You can compile PCRE to use char-
       acter 13 (carriage return) instead by adding

         --enable-newline-is-cr

       to the configure command. For completeness there is also
       a --enable-newline-is-lf option, which explicitly speci-
       fies linefeed as the newline character.

BUILDING SHARED AND STATIC LIBRARIES

       The PCRE building process uses  libtool  to  build  both
       shared  and  static  Unix  libraries by default. You can
       suppress one of these by adding one of

         --disable-shared
         --disable-static

       to the configure command, as required.

POSIX MALLOC USAGE

       When PCRE is called through the POSIX interface (see the
       pcreposix  documentation), additional working storage is
       required for holding  the  pointers  to  capturing  sub-
       strings,  because  PCRE requires three integers per sub-
       string, whereas the POSIX interface provides  only  two.
       If the number of expected substrings is small, the wrap-
       per function uses space on the stack,  because  this  is
       faster  than  using  malloc() for each call. The default
       threshold above which the stack is no longer used is 10;
       it can be changed by adding a setting such as

         --with-posix-malloc-threshold=20

       to the configure command.

LIMITING PCRE RESOURCE USAGE

       Internally, PCRE has a function called match(), which it
       calls repeatedly (possibly recursively) when matching  a
       pattern. By controlling the maximum number of times this
       function may be called during a single  matching  opera-
       tion,  a  limit can be placed on the resources used by a
       single call to pcre_exec(). The limit can be changed  at
       run time, as described in the pcreapi documentation. The
       default is 10 million, but this can be changed by adding
       a setting such as

         --with-match-limit=500000

       to the configure command.

HANDLING VERY LARGE PATTERNS

       Within  a  compiled  pattern,  offset values are used to
       point from one part to another  (for  example,  from  an
       opening parenthesis to an alternation metacharacter). By
       default, two-byte values are  used  for  these  offsets,
       leading  to  a  maximum  size  for a compiled pattern of
       around 64K. This is sufficient to  handle  all  but  the
       most  gigantic  patterns.  Nevertheless,  some people do
       want to process enormous patterns, so it is possible  to
       compile  PCRE  to use three-byte or four-byte offsets by
       adding a setting such as

         --with-link-size=3

       to the configure command. The value given must be 2,  3,
       or  4.  Using longer offsets slows down the operation of
       PCRE because it has to load additional bytes  when  han-
       dling them.

       If  you  build  PCRE with an increased link size, test 2
       (and test 5 if you are using UTF-8) will fail.  Part  of
       the  output  of  these  tests is a representation of the
       compiled pattern, and this changes with the link size.

AVOIDING EXCESSIVE STACK USAGE

       PCRE implements backtracking while  matching  by  making
       recursive  calls to an internal function called match().
       In environments where the size of the stack is  limited,
       this  can  severely  limit  PCRE's  operation. (The Unix
       environment does not usually suffer from this  problem.)
       An  alternative  approach that uses memory from the heap
       to remember data, instead of  using  recursive  function
       calls,  has been implemented to work round this problem.
       If you want to build a version of PCRE that  works  this
       way, add

         --disable-stack-for-recursion

       to  the configure command. With this configuration, PCRE
       will use the pcre_stack_malloc and pcre_stack_free vari-
       ables  to  call  memory  management  functions. Separate
       functions are provided because the usage  is  very  pre-
       dictable: the block sizes requested are always the same,
       and the blocks are always  freed  in  reverse  order.  A
       calling  program  might  be  able to implement optimized
       functions that perform better than the standard malloc()
       and  free()  functions. PCRE runs noticeably more slowly
       when built in this way.

USING EBCDIC CODE

       PCRE assumes by default that it will run in an  environ-
       ment  where  the  character  code  is ASCII (or Unicode,
       which is a superset of ASCII).  PCRE  can,  however,  be
       compiled to run in an EBCDIC environment by adding

         --enable-ebcdic

       to the configure command.

Last updated: 09 September 2004
Copyright (c) 1997-2004 University of Cambridge.



                                                        PCRE(3)
