<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<TITLE>Four Function Calculator</TITLE>
</HEAD>
<BODY BGCOLOR="#ffffff" BACKGROUND="tilbl6h.gif"
 TEXT="#000000" LINK="#0033CC"
 VLINK="#CC0033" ALINK="#CC0099">
<P>
<IMG ALIGN="right" SRC="../images/agrsl6c.gif" ALT="AnaGram"
 WIDTH=124 HEIGHT=30>
<BR CLEAR="all">
Back to <A HREF="../index.html">Index</A>
<P>
<IMG ALIGN="bottom" SRC="../images/rbline6j.gif" ALT="----------------------"
 WIDTH=1010 HEIGHT=2 >
</P>
<H2>Four Function Calculator:<BR>An Annotated AnaGram Example</H2>
<IMG ALIGN="bottom" SRC="../images/rbline6j.gif" ALT="----------------------"
 WIDTH=1010 HEIGHT=2 >
<P>The following example is a complete program. The output AnaGram produces
from it can be compiled, linked, and run without any support modules other
than the standard run-time library supplied with any C compiler. In the
interest of brevity, the example has been kept very simple.
</P>
<P>FFCALC.SYN implements a simple four-function calculator that reads its
input from stdin. The calculator has 52 registers, labeled 'a' through 'z'
and 'A' through 'Z'. FFCALC evaluates arithmetic expressions and assignment
statements and prints the results to stdout. Expressions may contain the
'+', '-', '*', and '/' operators, unary minus, and parentheses. In addition,
FFCALC allows white space and C-style comments to be used freely in the
input. It also provides complete error handling, including syntax error
diagnostics and <A HREF="../gloss.html#Resynchronization">resynchronization</A>
after syntax errors.
</P>
<P>
<STRONG>For purposes of annotation, line numbers have been inserted at the left margin.</STRONG>
The line numbers are not part of the AnaGram syntax. Immediately following
the example are brief explanatory notes keyed to the line numbers.
</P>
<PRE>
<A HREF="#Note1" NAME="Line1">Line 1:</A>  {/* FOUR FUNCTION CALCULATOR: FFCALC.SYN */}
Line 2:  // -- CONFIGURATION SECTION ----------------------------
<A HREF="#Note3" NAME="Line3">Line 3:</A>  [
<A HREF="#Note4" NAME="Line4">Line 4:</A>    default token type = double
<A HREF="#Note5" NAME="Line5">Line 5:</A>    disregard white space
<A HREF="#Note6" NAME="Line6">Line 6:</A>    lexeme { real}
<A HREF="#Note7" NAME="Line7">Line 7:</A>    // You could specify traditional engine here
Line 8:  ]
<A HREF="#Note9" NAME="Line9">Line 9:</A>  // -- FOUR FUNCTION CALCULATOR -------------------------
<A HREF="#Note10" NAME="Line10">Line 10:</A> (void) calculator $
<A HREF="#Note11" NAME="Line11">Line 11:</A>  -> [calculation?, '\n']..., eof
Line 12: (void) calculation
<A HREF="#Note13" NAME="Line13">Line 13:</A>  -> expression:x                  =printf("%g\n",x);
<A HREF="#Note14" NAME="Line14">Line 14:</A>  -> name:n, '=', expression:x     ={
Line 15:                                   printf("%c = %g\n",n+'A',value[n]=x);}
<A HREF="#Note16" NAME="Line16">Line 16:</A>  -> error
Line 17: expression
<A HREF="#Note18" NAME="Line18">Line 18:</A>  -> term
<A HREF="#Note19" NAME="Line19">Line 19:</A>  -> expression:x, '+', term:t     = x+t;
Line 20:  -> expression:x, '-', term:t     = x-t;
Line 21: term
Line 22:  -> factor
Line 23:  -> term:t, '*', factor:f         = t*f;
<A HREF="#Note24" NAME="Line24">Line 24:</A>  -> term:t, '/', factor:f         = t/f;
Line 25: factor
Line 26:  -> name:n                        = value[n];
Line 27:  -> real
Line 28:  -> '(', expression:x, ')'        = x;
Line 29:  -> '-', factor:f                 = -f;
Line 30: // -- LEXICAL UNITS ------------------------------------
<A HREF="#Note31" NAME="Line31">Line 31:</A> digit = '0-9'
<A HREF="#Note32" NAME="Line32">Line 32:</A> eof = -1
<A HREF="#Note33" NAME="Line33">Line 33:</A> (void) white space
<A HREF="#Note34" NAME="Line34">Line 34:</A>  -> ' ' + '\t' + '\r' + '\f' + '\v'
<A HREF="#Note35" NAME="Line35">Line 35:</A>  -> "/*", ~eof?..., "*/"          // C style comment
<A HREF="#Note36" NAME="Line36">Line 36:</A> (int) name
<A HREF="#Note37" NAME="Line37">Line 37:</A>  -> 'a-z' + 'A-Z':c               = c-'A';
<A NAME="Line38">Line 38:</A> real
Line 39:  -> integer part:i, '.', fraction part:f = i+f;
Line 40:  -> integer part, '.'?
Line 41:  -> '.', fraction part:f          = f;
Line 42: integer part
Line 43:  -> digit:d                       = d-'0';
Line 44:  -> integer part:x, digit:d       = 10*x + d-'0';
Line 45: fraction part
Line 46:  -> digit:d                       = (d-'0')/10.;
Line 47:  -> digit:d, fraction part:f      = (d-'0' + f)/10.;
Line 48: { /* -- EMBEDDED C ---------------------------------- */
Line 49:   double value[64];   /* registers */
<A HREF="#Note50" NAME="Line50">Line 50:</A>   int main(void) {
<A HREF="#Note51" NAME="Line51">Line 51:</A>     ffcalc();
Line 52:   }
Line 53: } // -- END OF EMBEDDED C ------------------------------
</PRE>
<H3>Notes to example</H3>
<P>General note: When an AnaGram <A HREF="../gloss.html#Grammar">grammar</A>
is written to use direct character input, the
<A HREF="../gloss.html#Terminal">terminal tokens</A> are written as
<A HREF="../gloss.html#CharacterSets">character sets</A>. A single character
denotes the set containing only that character. Other character sets can be
defined by ranges, e.g., 'a-z', or by set expressions using +, -, &amp;, or ~
to represent <A HREF="../gloss.html#SetUnion">union</A>,
<A HREF="../gloss.html#SetDifference">difference</A>,
<A HREF="../gloss.html#SetIntersection">intersection</A>, or
<A HREF="../gloss.html#SetComplement">complement</A>, respectively. If the sets
used in the grammar are not pairwise disjoint (and they seldom are), AnaGram
calculates a disjoint covering of the
<A HREF="../gloss.html#Universe">character universe</A> and extends the grammar
appropriately. The semantic value of a terminal token is the ASCII character
code, so semantic distinctions can still be made even when characters are
syntactically equivalent. For example, every letter matches the same rule on
line 37, but the value c still tells the letters apart.
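</P>
<P>The disjoint covering described above can be pictured with a short C
sketch. It is not AnaGram's implementation: it assumes a universe limited to
0..255 and represents each character set as a membership table, and the type
CharSet and the functions add_range() and disjoint_cover() are hypothetical.
Two characters end up in the same class exactly when they belong to the same
combination of sets, so the classes are pairwise disjoint and together cover
the whole universe.
</P>
<PRE>
#include &lt;assert.h&gt;

/* Sketch only: the universe is 0..255 here (FFCALC's also includes eof, -1). */
#define UNIVERSE 256
#define MAX_SETS 32   /* one bit per set in an unsigned long signature */

/* A character set, stored as a membership table over the universe. */
typedef struct {
    unsigned char member[UNIVERSE];
} CharSet;

/* Add every character in the inclusive range lo..hi to the set. */
static void add_range(CharSet *set, int lo, int hi)
{
    assert(0 &lt;= lo &amp;&amp; lo &lt;= hi &amp;&amp; hi &lt; UNIVERSE);
    for (int c = lo; c &lt;= hi; c++)
        set->member[c] = 1;
}

/*
 * Partition the universe into pairwise disjoint classes: two characters share
 * a class exactly when they belong to the same combination of sets. Stores
 * each character's class number (in order of first appearance) in class_of[]
 * and returns the number of classes.
 */
static int disjoint_cover(const CharSet *sets, int nsets, int class_of[UNIVERSE])
{
    unsigned long signature[UNIVERSE];  /* membership bits of each class found */
    int nclasses = 0;

    assert(0 &lt;= nsets &amp;&amp; nsets &lt;= MAX_SETS);
    for (int c = 0; c &lt; UNIVERSE; c++) {
        unsigned long sig = 0;          /* bit s set: c belongs to sets[s] */
        for (int s = 0; s &lt; nsets; s++)
            if (sets[s].member[c])
                sig |= 1UL &lt;&lt; s;

        int k = 0;                      /* look for a class with this signature */
        while (k &lt; nclasses &amp;&amp; signature[k] != sig)
            k++;
        if (k == nclasses)              /* new combination: start a new class */
            signature[nclasses++] = sig;
        class_of[c] = k;
    }
    return nclasses;
}
</PRE>
<P>For example, with the three overlapping sets 'a-z', 'a-f', and 'x',
disjoint_cover() returns 4: one class for 'a' through 'f', one for 'x', one
for the remaining lower-case letters, and one for every other character.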
</P>
<P><A HREF="#Line1" NAME="Note1">Line 1.</A> Braces { } denote embedded C or
C++ code that is copied unchanged into the generated
<A HREF="../gloss.html#Parser">parser</A>. Embedded C at the very beginning of
the syntax file is placed at the beginning of the parser file. All other
embedded C is placed after a set of definitions and declarations that AnaGram
needs for the code it generates. AnaGram collects all the
<A HREF="../gloss.html#ReductionProcedure">reduction procedures</A>, or semantic
actions, and places them after all the embedded C.
</P>
<P><A HREF="#Line3" NAME="Note3">Line 3.</A> Brackets [ ] denote configuration
sections. A configuration section contains settings for configuration
parameters and switches, as well as attribute statements that provide
metasyntactic information.
</P>
<P><A HREF="#Line4" NAME="Note4">Line 4.</A> This statement sets the default
token type for <A HREF="../gloss.html#Nonterminal">nonterminal tokens</A> to
double. The default value of "default token type" is void. You can override
the type of a particular token with an explicit cast; see
<A HREF="#Line10">line 10</A>. The default type for
<A HREF="../gloss.html#Terminal">terminal tokens</A> is int. AnaGram uses the
token type declarations to set up the calls to and definitions of
<A HREF="../gloss.html#ReductionProcedure">reduction procedures</A>, and also
to set up the parser value stack.
</P>
<P><A HREF="#Line5" NAME="Note5">Line 5.</A> The disregard statement tells
AnaGram to extend the <A HREF="../gloss.html#Grammar">grammar</A> so that the
generated <A HREF="../gloss.html#Parser">parser</A> skips all instances of
white space that are not contained within lexemes. "White space" is a token
defined at <A HREF="#Line33">line 33</A>. There is nothing magic about the
name; any other name could have been used.
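</P>
<P>The effect of the disregard statement can be pictured with a hand-written
C helper. AnaGram achieves it by extending the grammar rather than by calling
any such function, so the sketch below is only an illustration, and the name
skip_white_space() is hypothetical. It skips exactly the input that lines 33
through 35 define as white space, and leaves newlines alone because line 11
needs them.
</P>
<PRE>
#include &lt;assert.h&gt;
#include &lt;string.h&gt;

/*
 * Hypothetical helper: return a pointer to the first character at or after p
 * that is not "white space" as defined on lines 33-35 of FFCALC.SYN, that is,
 * a blank, tab, carriage return, form feed, vertical tab, or a C comment
 * (comments do not nest). Newlines are not skipped, because line 11 uses '\n'
 * to end each line. Returns NULL if the input ends inside a comment.
 */
static const char *skip_white_space(const char *p)
{
    assert(p != NULL);
    for (;;) {
        if (*p != '\0' &amp;&amp; strchr(" \t\r\f\v", *p) != NULL) {
            p++;                                    /* line 34 */
        } else if (p[0] == '/' &amp;&amp; p[1] == '*') {
            const char *end = strstr(p + 2, "*/");  /* line 35: first closer wins */
            if (end == NULL)
                return NULL;                        /* unterminated comment */
            p = end + 2;
        } else {
            return p;               /* a token, a newline, or the end of input */
        }
    }
}
</PRE>
<P>A hand-written parser would call a helper like this before every token,
but never between the characters of a lexeme such as real (see line 6).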
</P>
<P><A HREF="#Line6" NAME="Note6">Line 6.</A> The lexeme statement identifies a
list of <A HREF="../gloss.html#Nonterminal">nonterminal tokens</A> within which
the "disregard" statement is inoperative. The token real is defined at
<A HREF="#Line38">line 38</A>.
</P>
<P><A HREF="#Line7" NAME="Note7">Line 7.</A> "traditional engine" is a
configuration switch. Simply asserting it turns it on; you can also write
traditional engine = ON. To turn a switch off, prefix it with ~: writing
~traditional engine guarantees that the switch is off, whatever its default
value. Alternatively, write traditional engine = OFF.
</P>
<P>AnaGram <A HREF="../gloss.html#Parser">parsers</A> normally use a parsing
engine with more than the standard four parsing
<A HREF="../gloss.html#ParserAction">actions</A>: shift, reduce, error, and
accept. The extra actions are compound actions. Using them speeds up the
parser and reduces the size of the state table by about fifty percent. The
traditional engine switch turns this optimization off, so that the parser uses
only the four traditional actions. This is usually done only for clarity, when
using AnaGram's File Trace or Grammar Trace options.
</P>
<P><A HREF="#Line9" NAME="Note9">Line 9. </A>AnaGram supports both C-style and
C++-style comments. Whether C comments may be nested is controlled by the
"nest comments" switch.
</P>
<P><A HREF="#Line10" NAME="Note10">Line 10.</A> An explicit cast can be used to
override the token type for <A HREF="../gloss.html#Nonterminal">nonterminal tokens</A>.
Almost any C or C++ type can be used, including template types; essentially
the only exceptions are types whose names contain <CODE>( )</CODE> or
<CODE>[ ]</CODE>.
</P>
<P>The simplest way to specify the <A HREF="../gloss.html#GrammarToken">goal token</A>
for a <A HREF="../gloss.html#Grammar">grammar</A> is to mark it with a dollar
sign, as on line 10. You can also name it "grammar", or set the
"grammar token" parameter in a configuration section.
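</P>
<P>To see what the rules under the goal token accept and compute, it may help
to compare them with a hand-written parser. The C sketch below is a
recursive-descent evaluator for a single line of input. It is only an
illustration, not code generated by AnaGram, which builds a table-driven LALR
parser instead. The functions calculation(), expression(), term(), factor(),
real(), fraction_part(), and skip() are hypothetical and simply echo the token
names in the listing; C comments in the input are not handled, and an error
abandons the whole line, which is roughly what the error token on line 16
arranges.
</P>
<PRE>
#include &lt;assert.h&gt;
#include &lt;ctype.h&gt;
#include &lt;string.h&gt;

/* Hypothetical recursive-descent evaluator for ONE line; not AnaGram output. */
static double value[64];   /* registers, indexed by c - 'A' as on line 37 */
static const char *in;     /* current input position */
static int failed;         /* set on a syntax error in the current line */
static double expression(void);
/* Skip white space as defined on line 34; '\n' is NOT skipped. */
static void skip(void) { while (*in != '\0' &amp;&amp; strchr(" \t\r\f\v", *in)) in++; }

static double fraction_part(void)   /* lines 45-47, right-recursive */
{
    int d = *in++ - '0';
    double f = isdigit((unsigned char)*in) ? fraction_part() : 0.0;
    return (d + f) / 10.;
}

static double real(void)   /* lines 38-44; no skip() inside: real is a lexeme */
{
    const char *start = in;
    double i = 0;
    while (isdigit((unsigned char)*in))
        i = 10 * i + (*in++ - '0');
    if (*in == '.' &amp;&amp; isdigit((unsigned char)in[1])) {
        in++;
        return i + fraction_part();        /* lines 39 and 41 */
    }
    if (in == start)
        failed = 1;                        /* no digits: "." alone is not a real */
    else if (*in == '.')
        in++;                              /* line 40, e.g. "12." */
    return i;
}

static double factor(void) /* lines 25-29 */
{
    skip();
    if (isalpha((unsigned char)*in))
        return value[*in++ - 'A'];         /* line 26 */
    if (*in == '-') {
        in++;
        return -factor();                  /* line 29 */
    }
    if (*in != '(')
        return real();                     /* line 27 */
    in++;                                  /* line 28 */
    double x = expression();
    if (*in == ')') in++;
    else failed = 1;
    return x;
}

static double term(void)   /* lines 21-24; the loop gives left associativity */
{
    double t = factor();
    for (skip(); *in == '*' || *in == '/'; skip()) {
        char op = *in++;
        double f = factor();
        t = (op == '*') ? t * f : t / f;   /* no divide-by-zero check (line 24) */
    }
    return t;
}

static double expression(void)   /* lines 17-20 */
{
    double x = term();
    for (skip(); *in == '+' || *in == '-'; skip()) {
        char op = *in++;
        double t = term();
        x = (op == '+') ? x + t : x - t;
    }
    return x;
}

/* Lines 12-16: returns 1 and stores the value, 0 for an empty line, -1 on error. */
static int calculation(const char *line, double *result)
{
    int n = -1;                            /* register being assigned, if any */
    assert(line != NULL &amp;&amp; result != NULL);
    in = line;
    failed = 0;
    skip();
    if (*in == '\n' || *in == '\0') return 0;   /* calculation? on line 11 */
    if (isalpha((unsigned char)*in)) {     /* try line 14: name '=' expression */
        const char *start = in;
        n = *in++ - 'A';
        skip();
        if (*in == '=') in++;
        else { in = start; n = -1; }       /* plain expression: back up */
    }
    double x = expression();               /* line 13, or right side of line 14 */
    if (failed || (*in != '\n' &amp;&amp; *in != '\0'))
        return -1;                         /* line 16's error token would take over */
    if (n >= 0) value[n] = x;
    *result = x;
    return 1;
}
</PRE>
<P>For instance, calculation() yields 5 for the line 8 - 2 - 1, because the
loops in term() and expression() group operators from the left just as the
left-recursive rules on lines 19, 20, 23, and 24 do. It reports an error for
the line 1 2, because real() never skips white space between digits.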
</P>
<P><A HREF="#Line11" NAME="Note11">Line 11.</A> Read "<CODE>-></CODE>" as
"produces". A question mark following a token name makes it optional. Tokens
in a rule are separated by commas. Multiple rules with the same left side can
also be separated with the vertical bar, '<CODE>|</CODE>'.
</P>
<P>The rules for character constants are the same as in C. Brackets [ ]
indicate that the enclosed rule is optional; braces { } would be used if it
were required. Brackets and braces can include multiple rules separated by |.
The ellipsis ... indicates unlimited repetition, so, because the bracketed rule
is optional, [calculation?, '\n']... matches zero or more lines. These
constructs are referred to as
<A HREF="../gloss.html#VirtualProduction">"virtual productions"</A>. The token
eof is defined at <A HREF="#Line32">line 32</A>.
</P>
<P><A HREF="#Line10">Lines 10 and 11</A> taken together specify that this
<A HREF="../gloss.html#Grammar">grammar</A> describes a possibly empty sequence
of lines terminated by an eof character. Each line contains an optional
"calculation" followed by a newline character.
</P>
<P><A HREF="#Line13" NAME="Note13">Line 13.</A> To assign the value of a token
(stored on the parser value stack) to a C variable for use in a semantic
action, or <A HREF="../gloss.html#ReductionProcedure">reduction procedure</A>,
simply follow the token name with a colon and the name of the variable.
</P>
<P>Short-form reduction procedures are simple C or C++ expressions terminated
with a semicolon. They cannot include a newline character. The name of the C
variable is local to this particular procedure. Normally the value of the
reduction procedure is assigned to the token on the left side of the
<A HREF="../gloss.html#Production">production</A>. In this case, since
calculation is of type void, the result of the printf call is discarded.
</P>
<P><A HREF="#Line14" NAME="Note14">Line 14.</A> When a
<A HREF="../gloss.html#ReductionProcedure">reduction procedure</A> won't fit on
a single line, or is more complex than a single expression, it can be enclosed
in braces { }.
Inside the braces, use a return statement to return a value; the procedure on
line 14 needs none because calculation is of type void.
</P>
<P><A HREF="#Line16" NAME="Note16">Line 16.</A> The error token can be used to
<A HREF="../gloss.html#Resynchronization">resynchronize</A> a parser after it
encounters a syntax error. It works more or less like the error token in YACC.
In this case it matches any portion of a "calculation" up to a syntax error,
and then everything up to the next newline, as determined by the
<A HREF="../gloss.html#Production">production</A> on <A HREF="#Line11">line 11</A>.
AnaGram also provides an alternative form of error continuation, called
"automatic resynchronization", which uses a heuristic derived from the
<A HREF="../gloss.html#Grammar">grammar</A>. By default, AnaGram
<A HREF="../gloss.html#Parser">parsers</A> provide syntax error diagnostics;
you can supply your own instead if you wish.
</P>
<P><A HREF="#Line18" NAME="Note18">Line 18.</A> If a
<A HREF="../gloss.html#GrammarRule">grammar rule</A> does not have a
<A HREF="../gloss.html#ReductionProcedure">reduction procedure</A>, the value of
the first token in the rule is assigned to the token on the left side of the
<A HREF="../gloss.html#Production">production</A>. Here, the value of term
becomes the value of expression.
</P>
<P><A HREF="#Line19" NAME="Note19">Line 19.</A> Since the default type
specification given on <A HREF="#Line4">line 4</A> was "double", x and t have
type double, and the <A HREF="../gloss.html#ReductionProcedure">reduction procedure</A>
returns their sum, also a double.
</P>
<P><A HREF="#Line24" NAME="Note24">Line 24.</A> In the interest of simplicity,
this <A HREF="../gloss.html#ReductionProcedure">reduction procedure</A> omits
any provision for divide-by-zero errors.
</P>
<P><A HREF="#Line31" NAME="Note31">Line 31.</A> Definition statements can be
used to provide shorthand names. '0-9' is a character range, as discussed
above.
</P>
<P><A HREF="#Line32" NAME="Note32">Line 32. </A> Input characters can also be
defined using decimal, octal, or hex notation.
They are not limited to any particular range, so the end-of-file token can be
defined as -1, the end-of-file value (EOF) returned by the standard stream I/O
functions on typical C implementations.
</P>
<P><A HREF="#Line33" NAME="Note33">Line 33.</A> Note that AnaGram permits
embedded blanks in token names.
</P>
<P><A HREF="#Line34" NAME="Note34">Line 34.</A> This is the set consisting of
blank, tab, carriage return, form feed, and vertical tab.
</P>
<P><A HREF="#Line35" NAME="Note35">Line 35.</A> Keywords are strings of
characters enclosed in double quotes. Standard C rules apply to literal
strings. Keywords stand outside the character space and are recognized in
preference to individual characters.
</P>
<P> ~ indicates the <A HREF="../gloss.html#SetComplement">complement</A> of a
character set, so ~eof is any character except end of file. The
<A HREF="../gloss.html#Universe">character universe</A> is the set of
characters in the range 0..255 unless there are characters outside this range,
in which case it is extended to the smallest contiguous range that includes
them; since eof is -1, the universe for this grammar is -1..255. The ?...
combination allows zero or more comment characters. Because the keyword "*/"
is recognized in preference to individual characters, the comment ends at the
first "*/". This rule therefore describes a standard C comment (no nesting
allowed).
</P>
<P><A HREF="#Line36" NAME="Note36">Line 36.</A> The value of name is an int, an
index into the value array declared on line 49.
</P>
<P><A HREF="#Line37" NAME="Note37">Line 37.</A> The '+' is
<A HREF="../gloss.html#SetUnion">set union</A>, so c is any alphabetic
character. The expression c-'A' then maps the letter to a register index: 'A'
through 'Z' become 0 through 25 and, with ASCII codes, 'a' through 'z' become
32 through 57, so the 64-element value array has room for all 52 registers.
Line 15 reverses the mapping with n+'A' when it prints the register name.
</P>
<P><A HREF="#Line50" NAME="Note50">Line 50.</A> If your syntax file contains no
embedded C at all, AnaGram creates a main program automatically. Because this
file already has embedded C at line 1, AnaGram does not create one, so we need
to define main explicitly.
</P>
<P><A HREF="#Line51" NAME="Note51">Line 51.</A> By default, the function name
of the <A HREF="../gloss.html#Parser">parser</A> is taken from the file name,
in lower case, which is why the parser here is called ffcalc(). A
configuration parameter is available to set it to something else if
necessary.
Unless configured otherwise, the parser reads its input from stdin.
</P>
<P>
<BR>
<IMG ALIGN="bottom" SRC="../images/rbline6j.gif" ALT="----------------------"
 WIDTH=1010 HEIGHT=2 >
<P>
<IMG ALIGN="right" SRC="../images/pslrb6d.gif" ALT="Parsifal Software"
 WIDTH=181 HEIGHT=25>
<BR CLEAR="right">
<P>
Back to <A HREF="../index.html">Index</A>
<P>
<ADDRESS><FONT SIZE="-1">
 AnaGram parser generator - examples<BR>
 Annotated four function calculator<BR>
 Copyright © 1993-1999, Parsifal Software. <BR>
 All Rights Reserved.<BR>
</FONT></ADDRESS>
</BODY>
</HTML>