Wednesday, September 8, 2010

Explain this regex to me

There are a bunch of Perl questions on Stack Overflow where the poster is trying to understand a particular regex.

Fortunately, there is YAPE::Regex::Explain, a CPAN module that prints a node-by-node explanation of a regex:

use strict;
use warnings;
use YAPE::Regex::Explain;

# Pass a compiled regex to new(), then print the explanation as text.
print YAPE::Regex::Explain->new(
    qr/"(?>(?:(?>[^"\\]+)|\\.)*)"/
)->explain;

Output:

The regular expression:

(?-imsx:"(?>(?:(?>[^"\\]+)|\\.)*)")

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  "                        '"'
----------------------------------------------------------------------
  (?>                      match (and do not backtrack afterwards):
----------------------------------------------------------------------
    (?:                      group, but do not capture (0 or more
                             times (matching the most amount
                             possible)):
----------------------------------------------------------------------
      (?>                      match (and do not backtrack
                               afterwards):
----------------------------------------------------------------------
        [^"\\]+                  any character except: '"', '\\' (1
                                 or more times (matching the most
                                 amount possible))
----------------------------------------------------------------------
      )                        end of look-ahead
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      \\                       '\'
----------------------------------------------------------------------
      .                        any character except \n
----------------------------------------------------------------------
    )*                       end of grouping
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  "                        '"'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
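
To see what that explanation means in practice, here is a minimal sketch that runs the same pattern against some text. It uses only core Perl. The helper name extract_quoted and the sample string are made up for illustration and are not part of YAPE::Regex::Explain. One thing to keep in mind while reading the output above: the (?> ... ) groups are atomic groups (match, then never backtrack into them), even though their closing parentheses are labelled "end of look-ahead".

```perl
use strict;
use warnings;

# The same pattern that was passed to YAPE::Regex::Explain above:
# an opening quote, then any run of non-quote/non-backslash characters
# or backslash-escaped characters, then a closing quote.
my $quoted = qr/"(?>(?:(?>[^"\\]+)|\\.)*)"/;

# Hypothetical helper (illustration only): return every double-quoted
# string found in $text, including the surrounding quotes.
sub extract_quoted {
    my ($text) = @_;
    # The pattern has no capturing groups of its own, so wrapping it in
    # one pair of parentheses and using /g in list context returns one
    # element per match.
    my @found = $text =~ /($quoted)/g;
    return @found;
}

# Single-quoted Perl string, so \" stays as a literal backslash + quote.
my $sample = 'say "hello", then "a \"nested\" quote", then stop';
print "$_\n" for extract_quoted($sample);
```

With that sample, the sketch prints "hello" and "a \"nested\" quote" on separate lines: the escaped quotes are consumed by the \\. branch and do not end the match. An unterminated string such as "abc gives no match at all, and because the groups are atomic the engine does not retry shorter alternatives inside them before giving up.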

1 comment:

  1. thanks for mentioning this... :)
