view doc/misc/html/summary.html @ 16:f9e4689b837d
Some minor updates for 15 years later.
author | David A. Holland |
---|---|
date | Tue, 31 May 2022 01:45:26 -0400 |
parents | 13d2b8934445 |
children |
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<TITLE>Summary of AnaGram Notation</TITLE>
</HEAD>
<BODY BGCOLOR="#ffffff" BACKGROUND="tilbl6h.gif" TEXT="#000000" LINK="#0033CC" VLINK="#CC0033" ALINK="#CC0099">
<P>
<IMG ALIGN="right" SRC="images/agrsl6c.gif" ALT="AnaGram" WIDTH=124 HEIGHT=30>
<BR CLEAR="all">
Back to <A HREF="index.html">Index</A>
<P>
<IMG ALIGN="bottom" SRC="images/rbline6j.gif" ALT="----------------------" WIDTH=1010 HEIGHT=2 >
<BR CLEAR="all">
<H1 ALIGN="LEFT">Summary of AnaGram Notation</H1>
<IMG ALIGN="bottom" SRC="images/rbline6j.gif" ALT="----------------------" WIDTH=1010 HEIGHT=2 >
</P>
<BR>
The rules for using AnaGram are given in Chapters 8 and 9 of the AnaGram User's Guide. This page contains a brief summary. Section headings and terms that appear in <b> bold face</b> can be found in the online <b> Help Topics</b>.
<P>
<BR>
<H2>Lexical Conventions</H2>
AnaGram allows the free use of spaces, tabs, and <b> comments</b>. Both C-style and C++-style comments are allowed. Blank lines are allowed, but only <i>between</i> statements.
<p>
A statement may continue onto following lines as long as it is clearly incomplete. Usually this means that the line ends with dangling punctuation or leaves a parenthesis, bracket, or brace open. A statement can never continue across a blank line.
<P>
<BR>
<H2>Names</H2>
Symbol names must begin with a letter or underscore and may contain letters, digits, and underscores. They may also contain embedded spaces, tabs, and comments; however, each embedded sequence of white space is replaced by a single blank character.
<p>
The names <b> eof</b>, <b> error</b>, and <b> grammar</b> have special meanings.
<P>
<BR>
<H2>Character Representations</H2>
You may represent a character using the same rules as for <b> character constants</b> in C. You may also use signed integers in decimal, octal, or hexadecimal format, again following the rules for C. Control characters may be written with a caret, e.g., ^C for control-C.
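To illustrate the conventions above, the following Python sketch converts a single character representation into its integer code. The function name char_code is hypothetical and this is not AnaGram's implementation; it assumes ASCII codes, the C rules for character constants and integer prefixes, and the usual caret mapping in which ^C is code 3.

```python
# Hypothetical helper (not part of AnaGram): converts one character
# representation to its integer code using C conventions.
SIMPLE_ESCAPES = {
    "n": 10, "t": 9, "r": 13, "a": 7, "b": 8, "f": 12, "v": 11,
    "\\": 92, "'": 39, '"': 34, "?": 63,
}


def char_code(spec: str) -> int:
    spec = spec.strip()

    # Caret notation for control characters (assumed mapping):
    # keep the low five bits, so ^A = 1, ^C = 3, ^[ = 27.
    if len(spec) == 2 and spec[0] == "^":
        return ord(spec[1]) & 0x1F

    # C character constant: 'x', or an escape such as '\n', '\101', '\x41'.
    if len(spec) >= 3 and spec[0] == "'" and spec[-1] == "'":
        body = spec[1:-1]
        if not body.startswith("\\"):
            if len(body) != 1:
                raise ValueError(f"bad character constant: {spec}")
            return ord(body)
        esc = body[1:]
        if esc in SIMPLE_ESCAPES:
            return SIMPLE_ESCAPES[esc]
        if esc.startswith("x") and len(esc) > 1:
            return int(esc[1:], 16)           # hexadecimal escape
        if esc and all(c in "01234567" for c in esc):
            return int(esc, 8)                # octal escape
        raise ValueError(f"bad escape sequence: {spec}")

    # Signed integer: 0x/0X prefix = hex, leading 0 = octal, else decimal.
    sign, digits = 1, spec
    if digits[:1] in ("+", "-"):
        sign = -1 if digits[0] == "-" else 1
        digits = digits[1:]
    if digits[:2] in ("0x", "0X"):
        return sign * int(digits[2:], 16)
    if len(digits) > 1 and digits[0] == "0":
        return sign * int(digits[1:], 8)
    return sign * int(digits, 10)


for text in ["'A'", "65", "0101", "0x41", "^C", "'\\n'"]:
    print(text, "->", char_code(text))
```

With these sample inputs, the first four spellings all yield 65, the code of A.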
<P>
<BR>
<H2>Character Ranges</H2>
Character ranges may be specified either in the form 'a-z' or as two simple characters separated by "..", e.g., 32..255.
<P>
<BR>
<H2>Character Sets</H2>
Use the following operators to build more complex character sets:
<DL>
<DT>Set <b> union</b></DT><DD>A + B</DD>
<DT>Set <b> difference</b></DT><DD>A - B</DD>
<DT>Set <b> intersection</b></DT><DD>A & B</DD>
<DT>Set <b> complement</b></DT><DD>~A</DD>
</DL>
AnaGram interprets a single character as the set containing only that character.
<P>
<BR>
<H2>Keywords</H2>
A character string enclosed in double quotes is a <B>keyword</B>. Keyword strings are written according to the same rules as literal strings in C. AnaGram parsers use special lookahead logic to recognize keywords, so a keyword is <i>not</i> equivalent to the corresponding sequence of single characters.
<P>
<BR>
<H2>Tokens</H2>
The units of a grammar are called <A HREF="gloss.html#Token">tokens</A>. <A HREF="gloss.html#Terminal"> Terminal tokens</A> may be <b> character sets</b>, <b> keywords</b>, <b> immediate actions</b>, or <b> virtual productions</b>. <A HREF="gloss.html#Nonterminal"> Nonterminal tokens</A> are defined in terms of other tokens by means of productions.
<P>
<BR>
<H2>Productions</H2>
A <A HREF="gloss.html#Production">production</A> consists of one or more token names on the left, an arrow ( <CODE>-></CODE> ), and a <A HREF="gloss.html#GrammarRule"> grammar rule</A> on the right. A production with more than one name on the left is called a <A HREF="gloss.html#SemanticallyDetermined"> semantically determined production</A>. Additional productions with the same left side may be written by appending further grammar rules, each introduced by | or by another arrow. An arrow used in this way must start a new line.
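The operators listed under Character Ranges and Character Sets above behave like ordinary set algebra. As a rough model only, the Python sketch below represents a character set as a frozenset of character codes. The helper names (char_range, single, union, and so on) and the 0..255 universe used for complement are assumptions made for this example, not AnaGram definitions.

```python
# Sketch only: AnaGram character-set expressions modelled as sets of
# character codes. The 0..255 universe used for complement is an
# assumption of this example, not an AnaGram definition.
UNIVERSE = frozenset(range(256))


def code(c):
    """Accept either a one-character string or an integer code."""
    return ord(c) if isinstance(c, str) else c


def char_range(lo, hi):
    """'a-z' or 32..255: every code from lo to hi, both ends included."""
    return frozenset(range(code(lo), code(hi) + 1))


def single(c):
    """A single character stands for the set containing only itself."""
    return frozenset({code(c)})


def union(a, b):          # A + B
    return a | b


def difference(a, b):     # A - B
    return a - b


def intersection(a, b):   # A & B
    return a & b


def complement(a):        # ~A, relative to the assumed universe
    return UNIVERSE - a


lower = char_range("a", "z")
letter = union(lower, char_range("A", "Z"))
vowel = frozenset()
for ch in "aeiou":
    vowel = union(vowel, single(ch))
consonant = difference(lower, vowel)
not_letter = complement(letter)
```

The universe is explicit because a complement such as ~A has a definite meaning only once the range of possible character codes is fixed.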
<P>
If the token on the left side of a production is named <i>grammar</i> or is marked with a trailing dollar sign, it is taken to be the <A HREF="gloss.html#GrammarToken"> grammar token</A>, that is, the goal token for the grammar.
<P>
The names on the left side of a production may be preceded by a type cast indicating the data type of the <A HREF="gloss.html#SemanticValue"> semantic value</A> of the named tokens.
<P>
A grammar rule is a sequence of <A HREF="gloss.html#RuleElement"> rule elements</A> separated by commas. The rule elements may be <b> character sets</b>, <b> keywords</b>, <b> token names</b>, <b> virtual productions</b>, or <b> immediate actions</b>.
<P>
A <A HREF="gloss.html#VirtualProduction">virtual production</A> is either a token name or character set expression followed by ? or ?..., or one or more rules, separated by vertical bars ( | ), enclosed in brackets or braces and optionally followed by an ellipsis (...). The ? marks the token as optional. Braces indicate a choice among the listed rules; brackets indicate an optional choice. The ellipsis allows unlimited repetition.
<P>
<BR>
<H2>Reduction Procedures</H2>
A <A HREF="gloss.html#ReductionProcedure">reduction procedure</A> is a piece of C or C++ code, written after a grammar rule, that is executed when the parser recognizes that rule in its input stream. A reduction procedure takes one of two forms: short form, a single expression followed by a semicolon; or long form, a block of code enclosed in braces. In either case it is preceded by an equal sign. A short form procedure may not continue onto another line.
<P>
Reduction procedures may access the <A HREF="gloss.html#SemanticValue"> semantic values</A> of tokens in the grammar rule to which they are attached. To make a token's value available, append to that token a colon followed by the name of the variable that holds the value in the reduction procedure.
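The tagging rule just described can be pictured with a simplified Python model of a parser's value stack at the moment a rule is reduced. Everything here is hypothetical: the Rule class, apply_reduction, and the sample production (double) sum -> sum:x, '+', term:y = x + y; are illustrations only, and AnaGram's generated parsers do this in C with their own data structures. The sketch pops one value per rule element, passes only the tagged values to the procedure under their tag names, and pushes the result as the value of the token on the left side.

```python
# Simplified model (an assumption for illustration, not AnaGram's generated
# code) of how a reduction procedure receives tagged semantic values.
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Rule:
    lhs: str                        # reduction token (left side)
    tags: Sequence[Optional[str]]   # ':name' tag per rule element, or None
    action: Callable[..., object]   # the reduction procedure


def apply_reduction(rule: Rule, value_stack: list) -> None:
    """Replace the rule's element values on top of the stack with the
    value computed by its reduction procedure."""
    split = len(value_stack) - len(rule.tags)
    if split < 0:
        raise ValueError("value stack is shorter than the rule")
    values = value_stack[split:]
    del value_stack[split:]
    # Only tagged elements are passed to the procedure, by tag name.
    named = {tag: v for tag, v in zip(rule.tags, values) if tag is not None}
    value_stack.append(rule.action(**named))


# Models the hypothetical AnaGram production
#   (double) sum -> sum:x, '+', term:y = x + y;
add_rule = Rule(lhs="sum", tags=("x", None, "y"), action=lambda x, y: x + y)

stack = [2.0, "+", 3.5]     # values of sum, '+', term
apply_reduction(add_rule, stack)
print(stack)
```

Untagged elements such as '+' are still removed from the stack; their values are simply not passed to the procedure.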
<P>
In a short form reduction procedure, the value of the expression is assigned to the <A HREF="gloss.html#ReductionToken"> reduction token</A>, the token on the left side of the production. In a long form procedure, use a return statement to assign a value to the reduction token.
<p>
An <b> immediate action</b> differs from a reduction procedure in that it may appear in the middle of a grammar rule. To distinguish it from a reduction procedure, it begins with an exclamation point rather than an equal sign.
<P>
<BR>
<H2>Definitions</H2>
You may assign names to frequently used character sets, virtual productions, keywords, or immediate actions with a definition statement consisting of a name, an equal sign, and the entity to be named.
<P>
<BR>
<H2>Configuration Section</H2>
A configuration section is a block of special statements enclosed in brackets. These statements are either <b> attribute statements</b> or assignments of values to <b> configuration parameters</b> or switches, all of which are described in the online help windows.
<P>
<BR>
<H2>Embedded C</H2>
You may include C or C++ code to support your reduction procedures at any point in your grammar by enclosing it in braces. The opening brace must be on a new line, and no other statement may follow the closing brace on the same line. A block of embedded C at the very beginning of a <b> syntax file</b> is called the <b> C prologue</b>.
<P>
<BR>
<IMG ALIGN="bottom" SRC="images/rbline6j.gif" ALT="----------------------" WIDTH=1010 HEIGHT=2 >
<P>
<IMG ALIGN="right" SRC="images/pslrb6d.gif" ALT="Parsifal Software" WIDTH=181 HEIGHT=25>
<BR CLEAR="right">
<P>
Back to <A HREF="index.html">Index</A>
<P>
<ADDRESS><FONT SIZE="-1">
AnaGram parser generator - documentation<BR>
Summary of AnaGram Notation<BR>
Copyright © 1993-1999, Parsifal Software. <BR>
All Rights Reserved.<BR>
</FONT></ADDRESS>
</BODY>
</HTML>