
Raku rules: Difference between revisions

Revision as of 06:00, 29 September 2006 by Harmil (History: re-naming). Revision as of 17:07, 11 October 2006 by Harmil (Clarify intro so that it doesn't seem like rules are just regexes).

Revision as of 17:07, 11 October 2006

Perl 6 rules are Perl 6's regular expression, pattern matching and general-purpose parsing facility, and are a core part of the language. Since Perl's pattern-matching constructs have exceeded the capabilities of formal regular expressions for some time, Perl 6 documentation will exclusively refer to them as regexes, distancing the term from the formal definition.

Perl 6 provides a superset of Perl 5 features with respect to regexes, folding them into a larger framework called rules, which provides the capabilities of a parsing expression grammar and lets each rule act as a closure with respect to its lexical scope. Rules are introduced with the rule keyword, whose usage is quite similar to a subroutine definition. Anonymous rules can also be introduced with the regex (or rx) keyword, or they can simply be used inline, as regexes were in Perl 5, via the m (matching) or s (search and replace) operators.

History

In Apocalypse 5, Larry Wall enumerated 20 problems with "current regex culture". Among these were that Perl's regexes were "too compact and 'cute'", had "too much reliance on too few metacharacters", "little support for named captures", "little support for grammars", and "poor integration with 'real' language".

Between late 2004 and mid-2005, a compiler for Perl 6 style rules was developed for the Parrot virtual machine. Originally called the Parrot Grammar Engine (PGE), it was later renamed to the more generic Parser Grammar Engine. PGE is a combination of runtime and compiler for Perl 6 style grammars that allows any Parrot-based compiler to use these tools for parsing, and also to provide rules to its runtime.

Changes from Perl 5

There are only six unchanged features from Perl 5's regexes:

  • Literals: word characters such as "A" and underscore will be matched literally.
  • Capturing: (...)
  • Alternatives: |
  • Backslash escape: \
  • Repetition quantifiers: *, +, and ?
  • Minimal matching suffix: *?, +?, ??

A few of the most powerful additions include:

  • The ability to reference rules using <rulename> to build up entire grammars.
  • A handful of commit operators that allow the programmer to control backtracking during matching.

The following changes greatly improve the readability of regexes:

  • Simplified non-capturing groups: [...], which are the same as Perl 5's (?:...)
  • Simplified code assertions: <?{...}>
  • Perl 5's /x is now the default.
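The three readability changes listed above can be sketched in present-day Raku, the language's current name. This is only an illustration under that assumption: the sample strings and variable names are made up for the example, and details of the 2006 draft syntax may differ from what runs today.

```raku
# Whitespace inside a regex is ignored by default, as with Perl 5's /x.
my $spaced = ('foobar' ~~ / foo  bar /).so;

# [...] groups without capturing, so only (\d+) produces a capture.
my $m = 'xyz-42' ~~ / [ x | xy ] z '-' (\d+) /;
my $capture-count = $m.list.elems;
my $number        = $m[0].Str;

# <?{ ... }> lets the match continue only if the embedded code is true.
my $even = ('1234' ~~ / ^ (\d+) <?{ $0 % 2 == 0 }> $ /).so;
my $odd  = ('1235' ~~ / ^ (\d+) <?{ $0 % 2 == 0 }> $ /).so;

say ($spaced, $capture-count, $number, $even, $odd);
```

Note that present-day Raku numbers positional captures from $0, so the first parenthesised group is $0 in the assertion.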

Implementation

Keywords

There are several keywords used in conjunction with Perl 6 rules:

regex
A named or anonymous regex which will ignore whitespace by default.
rule
A named or anonymous regex which implies the :ratchet and :sigspace modifiers.
token
A named or anonymous regex which implies the :ratchet modifier.
rx
An anonymous regex which can take arbitrary delimiters such as //, whereas regex can only take braces.
m
An operator form of anonymous regex which can be used to perform matches with arbitrary delimiters.
ms
Shorthand for m with the :sigspace modifier.
s
An operator form of anonymous regex which can be used to perform search-and-replace with arbitrary delimiters.
ss
Shorthand for s with the :sigspace modifier.
/.../
Simply placing a regex between slashes is shorthand for m/.../.
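A short sketch in present-day Raku may help show how the declarator keywords differ in practice. The names flexible, ratcheted, spaced and unspaced are hypothetical, chosen only for this example, and the behaviour shown is that of current Raku rather than a guarantee about the 2006 draft.

```raku
# regex backtracks, so \w+ can give back the final 'z' it first consumed.
my regex flexible  { \w+ 'z' }
# token implies :ratchet, so \w+ keeps everything it matched.
my token ratcheted { \w+ 'z' }
# rule implies :ratchet and :sigspace: the space between atoms matches whitespace.
my rule  spaced    { \w+ \w+ }
# token ignores that space, so the second \w+ finds nothing left to match.
my token unspaced  { \w+ \w+ }

my $flexible  = ('abcz'    ~~ / ^ <flexible>  $ /).so;
my $ratcheted = ('abcz'    ~~ / ^ <ratcheted> $ /).so;
my $spaced    = ('foo bar' ~~ / ^ <spaced>    $ /).so;
my $unspaced  = ('foo bar' ~~ / ^ <unspaced>  $ /).so;

# rx builds an anonymous regex object; s performs search-and-replace.
my $pattern  = rx{ wor };
my $found    = ('hello world' ~~ $pattern).so;
my $greeting = 'hello world';
$greeting ~~ s/ world /there/;

say ($flexible, $ratcheted, $spaced, $unspaced, $found, $greeting);
```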


Modifiers

Modifiers may be placed after any of the regex keywords, and before the delimiter. If a regex is named, the modifier comes after the name. Modifiers control the way regexes are parsed and how they behave. They are always introduced with a leading : character.

Some of the more important modifiers include:

  • :i or :ignorecase – Perform matching without respect to case.
  • :g or :global – Perform the match more than once on a given target string.
  • :s or :sigspace – Replace whitespace in the regex with a whitespace-matching rule, rather than simply ignoring it.
  • :Perl5 – Treat the regex as a Perl 5 regular expression.
  • :ratchet – Never perform backtracking in the rule.
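The effect of several of these modifiers can be sketched as follows, assuming present-day Raku. The input strings and variable names are invented for illustration only.

```raku
# :i ignores case.
my $ci = ('HELLO' ~~ m:i/ hello /).so;

# :g matches repeatedly against the topic and returns every match.
my @numbers;
given 'a1b22c333' {
    @numbers = (m:g/ \d+ /).map(*.Str);
}

# :s makes whitespace in the pattern match whitespace in the target.
my $sig   = 'key = value' ~~ m:s/ (\w+) '=' (\w+) /;
my $plain = ('key = value' ~~ m/ (\w+) '=' (\w+) /).so;

# :ratchet forbids backtracking, so a+ never gives back the last 'a'.
my $ratchet   = ('aaa' ~~ m:ratchet/ ^ a+ a $ /).so;
my $backtrack = ('aaa' ~~ m/ ^ a+ a $ /).so;

say ($ci, @numbers, $sig[0].Str, $sig[1].Str, $plain, $ratchet, $backtrack);
```

Without :s the second pattern fails because the spaces around the equals sign in the target are never consumed.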

Grammars

A grammar may be defined using the grammar operator. A grammar is essentially just a namespace for rules:

grammar Str::SprintfFormat {
 regex format_token { \%: <index>? <precision>? <modifier>? <directive> }
 token index { \d+ \$ }
 token precision { <flags>? <vector>? <precision_count> }
 token flags { <[\ +0\#\-]>+ }
 token precision_count { [ <[1-9]>\d* | \* ]? [ \. [ \d* | \* ] ]? }
 token vector { \*? v }
 token modifier { ll | <[lhmVqL]> }
 token directive { <[\%csduoxefgXEGbpniDUOF]> }
}

This is the grammar used to define Perl's sprintf string formatting notation.

Outside of this namespace, you could use these rules like so:

if / <Str::SprintfFormat::format_token> / { ... }

A rule used in this way is actually identical to the invocation of a subroutine with the extra semantics and side-effects of pattern matching (e.g. rule invocations can be backtracked).
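As a smaller illustration of a grammar acting as a namespace for rules, here is a sketch in present-day Raku. The grammar Inventory and its tokens entry, label and amount are hypothetical names invented for this example, and driving a grammar through .parse (optionally with the :rule argument to start from a specific rule) is the present-day convention rather than something this article describes.

```raku
grammar Inventory {
    token TOP    { <entry>+ % ';' }        # entries separated by semicolons
    token entry  { <label> '=' <amount> }  # subrules are called like methods
    token label  { <[a..z]>+ }
    token amount { \d+ }
}

my $parsed  = Inventory.parse('a=1;bc=22');
my @labels  = $parsed<entry>.map({ $_<label>.Str });
my @amounts = $parsed<entry>.map({ $_<amount>.Int });

# Start from a rule other than TOP.
my $single = Inventory.parse('x=5', :rule<entry>);

# A failed parse returns an undefined value.
my $failed = Inventory.parse('a=;b=2');

say (@labels, @amounts, $single<amount>.Str, $failed.defined);
```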

Examples

Here are some example rules in Perl 6:

rx { a [ b | c ] ( d | e ) f : g }
rx { ( ab* ) <{ $1.size % 2 == 0 }> }

That last is identical to:

rx { ( ab[bb]* ) }
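A code assertion that requires an even-length capture of ab* accepts exactly the strings made of an a followed by an odd number of b characters, which is what the pattern ab[bb]* describes. The following sketch checks that equivalence on a few sample strings in present-day Raku. It assumes that $0.chars is the present-day counterpart of the $1.size used above, since current Raku numbers captures from $0. The anchors and variable names are added for the example only.

```raku
# Anchored so that each pattern must account for the whole string.
my $by-assertion = rx/ ^ ( ab* ) <?{ $0.chars % 2 == 0 }> $ /;
my $by-grouping  = rx/ ^ ( ab[bb]* ) $ /;

my @samples = <a ab abb abbb abbbb abbbbb>;

# For each sample, record whether each pattern matched.
my @assertion-hits = @samples.map(-> $s { ($s ~~ $by-assertion).so });
my @grouping-hits  = @samples.map(-> $s { ($s ~~ $by-grouping).so });

for @samples.kv -> $i, $s {
    say "$s: assertion={@assertion-hits[$i]} grouping={@grouping-hits[$i]}";
}
```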

References

  1. Wall, Larry (June 24, 2002). "Synopsis 5: Regexes and Rules".
  2. Wall, Larry (June 4, 2002). "Apocalypse 5: Pattern Matching".