

4.1.2 Scanning

Function: create-scanner (re-string string) &key case-insensitive-mode multi-line-mode single-line-mode extended-mode destructive

The function accepts most of the regex syntax of Perl 5.8 as described in ‘man perlre’, including extended features like non-greedy repetitions, positive and negative look-ahead and look-behind assertions, "standalone" subexpressions, and conditional subpatterns.

The following Perl features are currently supported:

  • ‘\t’, ‘\n’, ‘\r’, ‘\f’, ‘\a’, ‘\e’,
  • ‘\033’ (octal character codes),
  • ‘\x1B’ (hexadecimal character codes),
  • ‘\c[’ (control characters),
  • ‘\w’, ‘\W’, ‘\s’, ‘\S’, ‘\d’, ‘\D’, ‘\b’, ‘\B’, ‘\A’, ‘\Z’, and ‘\z’,
  • ‘\Q’ and ‘\E’,
  • ‘\p’ and ‘\P’ (named properties), but only the long form with braces is supported, i.e. ‘\p{Letter}’ and ‘\p{L}’ will work while ‘\pL’ won’t.
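
A few of the supported constructs can be sketched as REPL calls. Inside a Lisp string literal every backslash has to be doubled, so the Perl regex ‘\d+’ is written "\\d+". The sketch assumes CL-PPCRE is loaded through Quicklisp, and the sample strings are invented for illustration.

```lisp
;; Assumes Quicklisp is available; load CL-PPCRE some other way if it is not.
(ql:quickload :cl-ppcre :silent t)

;; \b and \d: note the doubled backslashes required by the Lisp reader.
(cl-ppcre:scan "\\b\\d+\\b" "order 123 shipped")
;; => 6, 9, #(), #()

;; Non-greedy repetition stops at the first closing bracket.
(cl-ppcre:scan-to-strings "<.+?>" "<a><b>")
;; => "<a>", #()

;; Positive look-ahead: "bar" must follow but is not part of the match.
(cl-ppcre:scan "foo(?=bar)" "foobar")
;; => 0, 3, #(), #()
```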

The following Perl features are (currently) not supported:

  • ‘(?{ code })’ and ‘(??{ code })’ because they obviously don’t make sense in Lisp,
  • ‘\N{name}’ (named characters),
  • ‘\x{263a}’ (wide hex characters),
  • ‘\l’, ‘\u’, ‘\L’, and ‘\U’ because they’re actually not part of Perl’s regex syntax,
  • ‘\X’ (extended Unicode),
  • ‘\C’ (single character),
  • POSIX character classes like ‘[[:alpha:]]’; use Unicode properties instead,
  • ‘\G’ for Perl’s pos() because we don’t have it.

return values

scanner, register-names

re-string

Accepts a string which is a regular expression in Perl syntax and returns a closure which will scan strings for this regular expression.
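
As a minimal sketch of this, the closure returned by CREATE-SCANNER can be stored and reused for several targets, so the regex string is only processed once. The variable name *word-scanner* and the sample strings are hypothetical, and Quicklisp is assumed for loading.

```lisp
;; Assumes Quicklisp is available; load CL-PPCRE some other way if it is not.
(ql:quickload :cl-ppcre :silent t)

;; *WORD-SCANNER* is a hypothetical name used only for this illustration.
(defparameter *word-scanner* (cl-ppcre:create-scanner "[a-z]+"))

;; The closure is passed to SCAN in place of a regex string.
(cl-ppcre:scan *word-scanner* "42 apples")   ; => 3, 9, #(), #()
(cl-ppcre:scan *word-scanner* "12345")       ; => NIL
```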

*ALLOW-NAMED-REGISTERS*

The second value, register-names, is only returned if ‘*ALLOW-NAMED-REGISTERS*’ is true.

REGISTER-NAMES

(return value) A list of strings mapping registers to their respective names: the first element stands for the first register, the second element for the second register, etc. You have to store this value if you want to map a register number to its name later, as the scanner doesn’t capture any information about register names. If a register isn’t named, it has ‘NIL’ as its name.
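
The following sketch shows one way to capture the register-names list when a scanner is created. The helper scanner-with-names and the date-like regex are hypothetical; the ‘(?<name>...)’ syntax for named registers is assumed to be the one enabled by ‘*ALLOW-NAMED-REGISTERS*’.

```lisp
;; Assumes Quicklisp is available; load CL-PPCRE some other way if it is not.
(ql:quickload :cl-ppcre :silent t)

(defun scanner-with-names (regex-string)
  "Hypothetical helper: return the scanner and its register-name list as two values."
  (let ((cl-ppcre:*allow-named-registers* t))
    (cl-ppcre:create-scanner regex-string)))

;; The first register is named, the second is not, so its name is NIL.
(multiple-value-bind (scanner names)
    (scanner-with-names "(?<year>\\d{4})-(\\d{2})")
  (declare (ignorable scanner))
  names)
;; => ("year" NIL)
```

Since the scanner itself keeps no register names, the caller has to hold on to the second value for later lookups.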

MODEs

The ‘mode’ keyword arguments are equivalent to the ‘imsx’ modifiers in Perl. The ‘destructive’ keyword will be ignored.
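
A short sketch of three of the mode keywords next to their Perl counterparts. The variable *two-lines* and the sample strings are made up for this example, and Quicklisp is assumed for loading.

```lisp
;; Assumes Quicklisp is available; load CL-PPCRE some other way if it is not.
(ql:quickload :cl-ppcre :silent t)

;; :case-insensitive-mode corresponds to Perl's /i.
(cl-ppcre:scan (cl-ppcre:create-scanner "abc" :case-insensitive-mode t) "xABC")
;; => 1, 4, #(), #()

;; A three-character sample string: "a", a newline, "b".
(defparameter *two-lines* (format nil "a~%b"))

;; :multi-line-mode (/m) lets ^ match right after a newline.
(cl-ppcre:scan (cl-ppcre:create-scanner "^b" :multi-line-mode t) *two-lines*)
;; => 2, 3, #(), #()

;; :single-line-mode (/s) lets . match a newline as well.
(cl-ppcre:scan (cl-ppcre:create-scanner "a.b" :single-line-mode t) *two-lines*)
;; => 0, 3, #(), #()
```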

Parse Tree

Function: create-scanner (parse-tree t) &key case-insensitive-mode multi-line-mode single-line-mode extended-mode destructive

This is similar to CREATE-SCANNER for regex strings above but accepts a parse tree as its first argument. A parse tree is an S-expression conforming to the following syntax:…
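
As a small illustration only (the full grammar is not reproduced here), the sketch below builds a scanner from a parse tree meant to be equivalent to the regex string "ba+". It uses just two node types, :sequence and :greedy-repetition; the variable name *tree* and the target string are hypothetical, and Quicklisp is assumed for loading.

```lisp
;; Assumes Quicklisp is available; load CL-PPCRE some other way if it is not.
(ql:quickload :cl-ppcre :silent t)

;; The literal "b" followed by one or more occurrences of #\a.
(defparameter *tree* '(:sequence "b" (:greedy-repetition 1 nil #\a)))

(cl-ppcre:scan (cl-ppcre:create-scanner *tree*) "xbaaay")
;; => 1, 5, #(), #()

;; PARSE-STRING turns a regex string into a parse tree, which can help
;; when exploring what the tree for a given regex looks like.
(cl-ppcre:parse-string "ba+")
```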

return values

scanner, register-names

Scan

Function: scan regex target-string &key start end

Searches the string target-string from start (which defaults to 0) to end (which defaults to the length of target-string) and tries to match regex. On success returns four values:

  • the start of the match,
  • the end of the match,
  • an array denoting the beginnings of register matches,
  • an array denoting the endings of register matches.

On failure returns ‘NIL’.

return values

match-start, match-end, reg-starts, reg-ends
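
The four return values of SCAN can be collected with MULTIPLE-VALUE-BIND, as in this sketch. The sample strings are invented, and Quicklisp is assumed for loading. All positions are offsets into the whole target-string, also when start is given.

```lisp
;; Assumes Quicklisp is available; load CL-PPCRE some other way if it is not.
(ql:quickload :cl-ppcre :silent t)

;; Two registers, so both arrays have two elements.
(multiple-value-bind (match-start match-end reg-starts reg-ends)
    (cl-ppcre:scan "(\\d+)-(\\d+)" "range 10-25")
  (list match-start match-end reg-starts reg-ends))
;; => (6 11 #(6 9) #(8 11))

;; A repeated register reports its last iteration.
(cl-ppcre:scan "(a)*b" "xaaabd")
;; => 1, 5, #(3), #(4)

;; :start skips the first "b"; the result is still an absolute position.
(cl-ppcre:scan "b" "abab" :start 2)
;; => 3, 4, #(), #()

;; No match at all.
(cl-ppcre:scan "z" "abab")
;; => NIL
```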

Scan to Strings

Function: scan-to-strings regex target-string &key start end sharedp

return values

match, regs

Like SCAN but returns substrings of target-string instead of positions, i.e. this function returns two values on success:

  • the whole match as a string,
  • plus an array of substrings (or ‘NIL’s) corresponding to the matched registers. If sharedp is true, the substrings may share structure with target-string.
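
A brief sketch of both return values, including a register that did not take part in the match. The sample strings are invented, and Quicklisp is assumed for loading.

```lisp
;; Assumes Quicklisp is available; load CL-PPCRE some other way if it is not.
(ql:quickload :cl-ppcre :silent t)

;; Outer register holds "aaa", inner register the last single character.
(cl-ppcre:scan-to-strings "(([^b])*)b" "aaabd")
;; => "aaab", #("aaa" "a")

;; Only the second alternative matches, so the first register is NIL.
(cl-ppcre:scan-to-strings "(a)|(b)" "b")
;; => "b", #(NIL "b")

;; With :sharedp the contents are the same, but the returned strings may
;; share structure with the target, so avoid modifying them in place.
(cl-ppcre:scan-to-strings "(\\w+)@(\\w+)" "mail bob@example now" :sharedp t)
;; => "bob@example", #("bob" "example")
```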
