
Advanced PGE

We've already looked at some of the basics of parser construction using PGE and NQP. In this chapter we take a more in-depth look at some features of the grammar engine that we haven't seen yet. Some of these more advanced features, such as inline PIR code, assertions, function calls and built-in token types, make the life of a compiler designer much easier, but they are not needed for most basic tasks.

regex, token and proto

A regex is a high-level matching operation that allows backtracking. A token is a low-level matching operation that does not allow backtracking: once part of a token has matched, the engine will not go back and try a shorter or different match for that part. A proto is like a regex but allows multiple dispatch. Think of a proto declaration as a prototype or signature that several rules can match.
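To make the backtracking difference concrete, here is a small Python sketch (not PGE code) that matches the pattern a* a at the start of a string. The function name match_a_star_then_a and its backtrack flag are hypothetical. backtrack=True mimics a regex, and backtrack=False mimics a token, whose a* keeps every character it consumed.

```python
def match_a_star_then_a(text, backtrack):
    """Try to match the pattern  a* a  at the start of text.

    backtrack=True:  a* may give characters back (regex behaviour).
    backtrack=False: a* keeps everything it consumed (token behaviour).
    Returns the matched text, or None if there is no match.
    """
    # Greedy a*: consume as many 'a' characters as possible.
    greedy = 0
    while greedy < len(text) and text[greedy] == "a":
        greedy += 1

    # Lengths to try for a*: only the greedy one for a token,
    # otherwise the greedy one followed by every shorter length.
    candidates = range(greedy, -1, -1) if backtrack else [greedy]
    for n in candidates:
        # The trailing atom must match one 'a' right after a*.
        if n < len(text) and text[n] == "a":
            return text[: n + 1]
    return None


print(match_a_star_then_a("aaa", backtrack=True))   # aaa
print(match_a_star_then_a("aaa", backtrack=False))  # None
```

On "aaa" the backtracking version gives one "a" back to the trailing atom and succeeds, while the token-style version fails. Patterns that depend on giving characters back therefore have to be written as a regex rather than a token.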

Inline PIR Sections

PIR can be embedded directly into both PGE grammar files and NQP files. This is important for filling gaps that NQP cannot handle because of its limitations. Inline PIR is also sometimes helpful inside a grammar, to add active processing that directs the parser in a more intelligent way.

In NQP, PIR code can be inlined using the PIR statement, followed by a quoted string of PIR code. If you think it looks better, this string can use a Perl-like "qw< ... >" style of quotation.

In PGE, inline PIR can be inserted using double curly braces "{{ ... }}". Once in PIR mode, you can access the current match object with $Px = find_global "$/", where $Px is any valid PIR register and x is its number.

Built-In Token Types

PGE has default definitions for certain rules, such as <ws> for whitespace and <ident> for identifiers, already built in to help with parsing. If you don't like the default behavior, you can redefine these rules in your own grammar.
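As an analogy only, the Python sketch below models a grammar whose default whitespace rule is replaced in a subclass, the way a grammar can redefine a built-in rule such as <ws>. The class names Grammar and CommentGrammar and the ws(text, pos) method are hypothetical; each ws method returns the position just past the whitespace it skipped.

```python
import re

# Default whitespace: spaces, tabs and newlines only.
DEFAULT_WS = re.compile(r"\s*")
# Redefined whitespace: also skips '#' comments up to the end of a line.
COMMENT_WS = re.compile(r"(?:\s|#[^\n]*)*")


class Grammar:
    """Hypothetical base grammar with a built-in ws rule."""

    def ws(self, text, pos):
        return DEFAULT_WS.match(text, pos).end()


class CommentGrammar(Grammar):
    """Hypothetical grammar that overrides the built-in ws rule."""

    def ws(self, text, pos):
        # Every rule that calls ws in this grammar now skips comments too.
        return COMMENT_WS.match(text, pos).end()


source = "  # note\nx = 1"
print(Grammar().ws(source, 0))         # 2
print(CommentGrammar().ws(source, 0))  # 9
```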

Calling Functions

Functions, or subroutines, are an integral part of modern programming practice. As such, support for them is part of the PAST system, and they are relatively easy to implement. We're going to cover a little necessary background information first, and then discuss how to put all the pieces together to create a system with usable subroutines.

return Described

In Parrot, control flow operations, especially returns from subroutines, are implemented as special control exceptions. The reason a return is done as an exception, and not as a basic .return() PIR statement, is a little complicated. Many languages allow nested lexical scopes, where variables defined in an "inner" scope cannot be seen, accessed, or modified by statements in the "outer" scope. In most compilers, this behavior is enforced by the compiler directly, and it is invisible once the code is converted to assembly and machine language. However, PIR is like an assembly language for the Parrot system, and it's not possible to hide things at that level: a local variable is visible throughout its entire subroutine and cannot be confined to a single part of it. To implement nested scopes, Parrot instead uses nested subroutines. A return statement written inside an inner scope therefore sits inside a nested subroutine, so a plain .return() would only leave that inner subroutine, while a control exception can unwind all the way out of the enclosing function.

Returns and Return Values

Functions can be made to return a value using the "return" PAST.op type. The return system is based on a control exception. Exceptions, as we've discussed before, move control flow to a specified location called the "exception handler". For a return exception, the handler is the code directly after the original function call. The return values (currently, the return PAST node only allows a single return value) are passed as exception data items and are retrieved by the control exception handler.
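The mechanism can be modelled in Python, with the caveat that this is an illustrative sketch and not the PIR that PCT actually generates. ReturnException, do_return and call are hypothetical names: do_return plays the role of the return PAST node, the exception carries the single return value as its payload, and the handler sits directly after the call inside call. The inner loop_body function stands in for a nested lexical scope compiled as a nested subroutine; raising the exception there leaves both loop_body and first_negative in one step.

```python
class ReturnException(Exception):
    """Control exception carrying a single return value as its payload."""

    def __init__(self, value=None):
        super().__init__()
        self.value = value


def do_return(value=None):
    """Hypothetical stand-in for a 'return' PAST node: raise, don't return."""
    raise ReturnException(value)


def call(func, *args):
    """Call func; the exception handler sits directly after the call."""
    try:
        func(*args)
    except ReturnException as exc:
        return exc.value  # retrieve the return value from the exception
    return None  # func ended without an explicit return


def first_negative(values):
    # Inner block with its own lexical scope, modelled as a nested sub.
    def loop_body(v):
        if v < 0:
            do_return(v)  # unwinds out of loop_body AND first_negative

    for v in values:
        loop_body(v)
    do_return("none found")


print(call(first_negative, [3, -1, 5]))  # -1
print(call(first_negative, [3, 5]))      # none found
```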

All of these details are generally hidden from the programmer, and a return PAST node behaves exactly as you would expect. You pass a return value, if any, to the return PAST node. The current function ends and its scope is destroyed. Control flow returns to the calling function, and the function's return value is made available there.

Assertions

Repetition Counting with **

MetaSyntactic Assertions

You can call a function from within a rule using the <FUNC( )> format.

Non-Capturing Assertions

Use the <. > form to call a subrule without capturing its result. For example, <.ws> matches whitespace, but the match is not stored in the enclosing match object.
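Python's re module has no subrule calls, but its named groups give a rough analogy: a named group is recorded in the result much like a <ws> call would be, while a (?: ) group is matched but not recorded, much like <.ws>. The pattern names capturing and non_capturing are illustrative only.

```python
import re

# Capturing: like <ws>, the whitespace ends up in the match result.
capturing = re.compile(r"(?P<key>\w+)(?P<ws>\s*)=\s*(?P<val>\w+)")
# Non-capturing: like <.ws>, the whitespace is matched but not stored.
non_capturing = re.compile(r"(?P<key>\w+)(?:\s*)=\s*(?P<val>\w+)")

print(capturing.match("x = 1").groupdict())      # {'key': 'x', 'ws': ' ', 'val': '1'}
print(non_capturing.match("x = 1").groupdict())  # {'key': 'x', 'val': '1'}
```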

Indirect Rules

In a rule of the form <$ >, the contents of the variable, which can be a string or some other data, are converted into a regular expression and then run.
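A rough Python analogy, using a hypothetical run_indirect helper: the pattern text lives in an ordinary variable and is only compiled and run when it is needed. Note that the pattern uses Python's re syntax, not Perl 6 rule syntax.

```python
import re


def run_indirect(pattern_source, text):
    """Compile a pattern held in a variable at run time, then run it."""
    rule = re.compile(pattern_source)
    m = rule.match(text)
    return m.group(0) if m else None


keyword = "if|while|for"  # pattern text stored as ordinary data
print(run_indirect(keyword, "while x"))  # while
print(run_indirect(keyword, "x = 1"))    # None
```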

Character Classes

Rules of the form <[ ]> contain custom character classes. Rules of the form <-[ ]> contain complemented character classes, which match any character not listed.
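In Python's re module the same two ideas are written [ ] and [^ ]. This sketch shows the correspondence; the Perl 6 forms in the comments are given only for comparison.

```python
import re

vowels = re.compile(r"[aeiou]")       # like <[aeiou]>
non_vowels = re.compile(r"[^aeiou]")  # like <-[aeiou]>, the complement

print(vowels.findall("parrot"))      # ['a', 'o']
print(non_vowels.findall("parrot"))  # ['p', 'r', 'r', 't']
```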

Built-in Assertions

  • <?before>, <!before>
  • <?after>, <!after>
  • <?same>, <!same>
  • <.ws>
  • <?at()>, <!at()>

Partial Matches

You can specify a partial match, a match which attempts to match as much as possible and never fails, with the <* > form.

Recursive Calls

You can recurse back into subrules of the current match rule using the <~~ > rule.
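Python's standard re module cannot recurse into a pattern, so this sketch writes the recursion by hand. A hypothetical match_balanced function implements the rule balanced ::= ( '(' balanced ')' )* by calling itself for the nested part, which is the kind of self-reference <~~ > provides inside a rule. It returns the position just past the balanced prefix it matched.

```python
def match_balanced(text, pos=0):
    """Match  balanced ::= ( '(' balanced ')' )*  starting at pos.

    Returns the position just past the match (pos itself if nothing
    matched). The nested part is handled by a recursive call.
    """
    while pos < len(text) and text[pos] == "(":
        inner_end = match_balanced(text, pos + 1)  # recurse into the rule
        if inner_end >= len(text) or text[inner_end] != ")":
            return pos  # unclosed '(': stop just before it
        pos = inner_end + 1  # step past the matching ')'
    return pos


def is_balanced(text):
    """True if the whole string is matched by the balanced rule."""
    return match_balanced(text) == len(text)


print(match_balanced("(()())"))  # 6
print(is_balanced("(()"))        # False
```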

Resources

This article is issued from Wikibooks. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.