PCRE(3)                                                 PCRE(3)





NAME
       PCRE - Perl-compatible regular expressions

DIFFERENCES BETWEEN PCRE AND PERL

       This document describes the differences in the ways that
       PCRE and Perl handle regular  expressions.  The  differ-
       ences described here are with respect to Perl 5.8.

       1.  PCRE  does  not  have full UTF-8 support. Details of
       what it does have are given in the section on UTF-8 sup-
       port in the main pcre page.

       2.  PCRE  does not allow repeat quantifiers on lookahead
       assertions. Perl permits them, but they do not mean what
       you  might  think. For example, (?!a){3} does not assert
       that the next three characters  are  not  "a".  It  just
       asserts  that the next character is not "a" three times.

       3. Capturing  subpatterns  that  occur  inside  negative
       lookahead  assertions  are counted, but their entries in
       the offsets vector are never set. Perl sets its  numeri-
       cal  variables  from  any such patterns that are matched
       before the assertion fails to match  something  (thereby
       succeeding),  but  only if the negative lookahead asser-
       tion contains just one branch.

       4. Though binary zero characters are  supported  in  the
       subject string, they are not allowed in a pattern string
       because it is passed as a normal C string, terminated by
       zero.  The escape sequence \0 can be used in the pattern
       to represent a binary zero.

       5. The following Perl  escape  sequences  are  not  sup-
       ported: \l, \u, \L, \U, and \N. In fact these are imple-
       mented by Perl's general  string-handling  and  are  not
       part of its pattern matching engine. If any of these are
       encountered by PCRE, an error is generated.

       6. The Perl escape sequences \p, \P,  and  \X  are  sup-
       ported  only  if  PCRE  is  built with Unicode character
       property support. The properties that can be tested with
       \p and \P are limited to the general category properties
       such as Lu and Nd.

       7. PCRE does support the \Q...\E escape for quoting sub-
       strings.  Characters in between are treated as literals.
       This is slightly different from Perl in that $ and @ are
       also  handled  as  literals  inside the quotes. In Perl,
       they cause variable interpolation (but  of  course  PCRE
       does not have variables). Note the following examples:

           Pattern            PCRE matches      Perl matches

           \Qabc$xyz\E        abc$xyz           abc followed by
       the
                                                  contents   of
       $xyz
           \Qabc\$xyz\E       abc\$xyz          abc\$xyz
           \Qabc\E\$\Qxyz\E   abc$xyz           abc$xyz

       The  \Q...\E sequence is recognized both inside and out-
       side character classes.

       8. Fairly obviously, PCRE does not support the (?{code})
       and  (?p{code}) constructions. However, there is support
       for recursive patterns using the  non-Perl  items  (?R),
       (?number),  and (?P>name). Also, the PCRE "callout" fea-
       ture allows an external function  to  be  called  during
       pattern  matching. See the pcrecallout documentation for
       details.

       9. There are some differences that  are  concerned  with
       the  settings of captured strings when part of a pattern
       is repeated. For example,  matching  "aba"  against  the
       pattern  /^(a(b)?)+$/  in  Perl  leaves $2 unset, but in
       PCRE it is set to "b".

       10. PCRE provides some extensions to  the  Perl  regular
       expression facilities:

       (a)  Although  lookbehind  assertions  must  match fixed
       length strings, each alternative branch of a  lookbehind
       assertion  can  match a different length of string. Perl
       requires them all to have the same length.

       (b) If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE  is
       not  set,  the $ meta-character matches only at the very
       end of the string.

       (c) If PCRE_EXTRA is set, a backslash followed by a let-
       ter with no special meaning is faulted.

       (d) If PCRE_UNGREEDY is set, the greediness of the repe-
       tition quantifiers is inverted, that is, by default they
       are  not greedy, but if followed by a question mark they
       are.

       (e) PCRE_ANCHORED can be used at matching time to  force
       a  pattern  to be tried only at the first matching posi-
       tion in the subject string.

       (f) The  PCRE_NOTBOL,  PCRE_NOTEOL,  PCRE_NOTEMPTY,  and
       PCRE_NO_AUTO_CAPTURE  options  for  pcre_exec()  have no
       Perl equivalents.

       (g) The (?R), (?number), and (?P>name) constructs allows
       for  recursive  pattern matching (Perl can do this using
       the (?p{code}) construct, which PCRE cannot support.)

       (h) PCRE supports named capturing substrings, using  the
       Python syntax.

       (i) PCRE supports the possessive quantifier "++" syntax,
       taken from Sun's Java package.

       (j) The (R) condition, for testing recursion, is a  PCRE
       extension.

       (k) The callout facility is PCRE-specific.

       (l) The partial matching facility is PCRE-specific.

       (m)  Patterns  compiled by PCRE can be saved and re-used
       at a later time, even on different hosts that  have  the
       other endianness.

Last updated: 09 September 2004
Copyright (c) 1997-2004 University of Cambridge.



                                                        PCRE(3)
