view doc/misc/html/examples/ffcex.html @ 0:13d2b8934445

Import AnaGram (near-)release tree into Mercurial.
author David A. Holland
date Sat, 22 Dec 2007 17:52:45 -0500

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<TITLE>Four Function Calculator</TITLE>


</HEAD>

<BODY BGCOLOR="#ffffff" BACKGROUND="tilbl6h.gif"
 TEXT="#000000" LINK="#0033CC"
 VLINK="#CC0033" ALINK="#CC0099">

<P>
<IMG ALIGN="right" SRC="../images/agrsl6c.gif" ALT="AnaGram"
         WIDTH=124 HEIGHT=30>

<BR CLEAR="all">

Back to <A HREF="../index.html">Index</A>
<P>
<IMG ALIGN="bottom" SRC="../images/rbline6j.gif" ALT="----------------------"
        WIDTH=1010 HEIGHT=2  >
</P>


<H2>Four Function Calculator:<BR>An Annotated AnaGram Example</H2>
<IMG ALIGN="bottom" SRC="../images/rbline6j.gif" ALT="----------------------"
      WIDTH=1010 HEIGHT=2 >


<P>The following example is a complete program: the output produced by AnaGram
from this example can be compiled, linked and run without any support modules
other than the standard run-time library provided by any C compiler. In the
interest of brevity, the example has been kept very simple.
</P>
<P>FFCALC.SYN implements a simple four function calculator which reads its
input from stdin. The calculator has 52 registers, labeled 'a' through 'z' and
'A' through 'Z'. FFCALC evaluates arithmetic expressions and assignment
statements and prints the results to stdout. The expressions may contain '+',
'-', '*', and '/' operators as well as parentheses. In addition, FFCALC supports
the free use of white space and C style comments in the input. It also contains
complete error handling, including syntax error diagnostics and
<A HREF="../gloss.html#Resynchronization">resynchronization</A> after syntax errors.
</P>
<P> <STRONG>For purposes of annotation, line numbers have been inserted at the left
margin.</STRONG>  The line numbers are not part of the AnaGram syntax.
Immediately following the example are some brief explanatory notes keyed to the
line numbers.
</P>
<PRE>
<A HREF="#Note1" NAME="Line1">Line  1:</A>  {/*   FOUR FUNCTION CALCULATOR: FFCALC.SYN           */}

Line  2:  // -- CONFIGURATION SECTION ----------------------------
<A HREF="#Note3" NAME="Line3">Line  3:</A>  [
<A HREF="#Note4" NAME="Line4">Line  4:</A>    default token type = double
<A HREF="#Note5" NAME="Line5">Line  5:</A>    disregard white space
<A HREF="#Note6" NAME="Line6">Line  6:</A>    lexeme { real}
<A HREF="#Note7" NAME="Line7">Line  7:</A>        // You could specify traditional engine here
Line  8:  ]

<A HREF="#Note9" NAME="Line9">Line  9:</A>  // -- FOUR FUNCTION CALCULATOR -------------------------
<A HREF="#Note10" NAME="Line10">Line 10:</A>  (void) calculator $
<A HREF="#Note11" NAME="Line11">Line 11:</A>   -&gt; [calculation?, '\n']..., eof

Line 12:  (void) calculation
<A HREF="#Note13" NAME="Line13">Line 13:</A>   -&gt; expression:x                      =printf("%g\n",x);
<A HREF="#Note14" NAME="Line14">Line 14:</A>   -&gt; name:n, '=', expression:x                         ={
Line 15:                    printf("%c = %g\n",n+'A',value[n]=x);}
<A HREF="#Note16" NAME="Line16">Line 16:</A>   -&gt; error

Line 17:  expression
<A HREF="#Note18" NAME="Line18">Line 18:</A>   -&gt; term
<A HREF="#Note19" NAME="Line19">Line 19:</A>   -&gt; expression:x, '+', term:t                     = x+t;
Line 20:   -&gt; expression:x, '-', term:t                     = x-t;

Line 21:  term
Line 22:   -&gt; factor
Line 23:   -&gt; term:t, '*', factor:f                         = t*f;
<A HREF="#Note24" NAME="Line24">Line 24:</A>   -&gt; term:t, '/', factor:f                         = t/f;

Line 25:  factor
Line 26:   -&gt; name:n                                   = value[n];
Line 27:   -&gt; real
Line 28:   -&gt; '(', expression:x, ')'                          = x;
Line 29:   -&gt; '-', factor:f                                  = -f;

Line 30:  // -- LEXICAL UNITS ------------------------------------
<A HREF="#Note31" NAME="Line31">Line 31:</A>  digit   = '0-9'
<A HREF="#Note32" NAME="Line32">Line 32:</A>  eof     = -1

<A HREF="#Note33" NAME="Line33">Line 33:</A>  (void) white space
<A HREF="#Note34" NAME="Line34">Line 34:</A>   -&gt; ' ' + '\t' + '\r' + '\f' + '\v'
<A HREF="#Note35" NAME="Line35">Line 35:</A>   -&gt; "/*", ~eof?..., "*/"              // C style comment

<A HREF="#Note36" NAME="Line36">Line 36:</A>  (int) name
<A HREF="#Note37" NAME="Line37">Line 37:</A>   -&gt; 'a-z' + 'A-Z':c                             = c-'A';

<A NAME="Line38">Line 38:</A>  real
Line 39:   -&gt; integer part:i, '.', fraction part:f          = i+f;
Line 40:   -&gt; integer part, '.'?
Line 41:   -&gt; '.', fraction part:f                            = f;

Line 42:  integer part
Line 43:   -&gt; digit:d                                     = d-'0';
Line 44:   -&gt; integer part:x, digit:d              = 10*x + d-'0';

Line 45:  fraction part
Line 46:   -&gt; digit:d                               = (d-'0')/10.;
Line 47:   -&gt; digit:d, fraction part:f          = (d-'0' + f)/10.;

Line 48:  { /* -- EMBEDDED C ---------------------------------- */
Line 49:    double value[64];                      /* registers */
<A HREF="#Note50" NAME="Line50">Line 50:</A>    int main(void) {
<A HREF="#Note51" NAME="Line51">Line 51:</A>      ffcalc();
Line 52:    }
Line 53:  } // -- END OF EMBEDDED C ------------------------------
</PRE>
<H3>Notes to example</H3>
<P>General note: When an AnaGram <A HREF="../gloss.html#Grammar">grammar</A> is written to use direct character
input, the <A HREF="../gloss.html#Terminal">terminal tokens</A> are written as <A HREF="../gloss.html#CharacterSets">character sets</A>. A single character is
construed to be the set consisting only of the character itself. Otherwise
character sets can be defined by ranges, e.g., 'a-z', or by set expressions
using +, -, &amp;, or ~ to represent <A HREF="../gloss.html#SetUnion">union</A>,
<A HREF="../gloss.html#SetDifference">difference</A>, <A HREF="../gloss.html#SetIntersection">intersection</A>, or
<A HREF="../gloss.html#SetComplement">complement</A> respectively. If the sets used in the grammar are not pairwise
disjoint, and they seldom are, AnaGram calculates a disjoint covering of the
<A HREF="../gloss.html#Universe">character universe</A>, and extends the grammar appropriately. The semantic value of
a terminal token is the ASCII character code, so that semantic distinctions may
still be made even when characters are syntactically equivalent.
</P>
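<P>The set operations described above can be sketched in C. This is only an
illustration of the set algebra over a 0..255 character universe, not a
description of how AnaGram represents sets internally; the type
<CODE>charset</CODE> and all of the helper names are hypothetical.
</P>

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical illustration: a character set over the universe 0..255,
   stored as one membership flag per character code. */
typedef struct { bool member[256]; } charset;

/* Build the set for a range such as 'a-z'. */
static charset cs_range(unsigned char lo, unsigned char hi) {
    charset s;
    memset(&s, 0, sizeof s);
    for (int c = lo; c <= hi; c++) s.member[c] = true;
    return s;
}

/* a + b : union */
static charset cs_union(charset a, charset b) {
    for (int c = 0; c < 256; c++) a.member[c] = a.member[c] || b.member[c];
    return a;
}

/* a - b : difference */
static charset cs_difference(charset a, charset b) {
    for (int c = 0; c < 256; c++) a.member[c] = a.member[c] && !b.member[c];
    return a;
}

/* a & b : intersection */
static charset cs_intersection(charset a, charset b) {
    for (int c = 0; c < 256; c++) a.member[c] = a.member[c] && b.member[c];
    return a;
}

/* ~a : complement with respect to the 0..255 universe */
static charset cs_complement(charset a) {
    for (int c = 0; c < 256; c++) a.member[c] = !a.member[c];
    return a;
}

/* Membership test. */
static bool cs_contains(charset s, unsigned char c) {
    return s.member[c];
}
```

<P>With these helpers, the set written as 'a-z' + 'A-Z' in the grammar would
correspond to <CODE>cs_union(cs_range('a', 'z'), cs_range('A', 'Z'))</CODE>.
</P>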
<P><A HREF="#Line1" NAME="Note1">Line 1.</A> Braces { } are used
to denote embedded C or C++ code that should be passed unchanged to the <A HREF="../gloss.html#Parser">parser</A>.
Embedded C at the very beginning of the syntax file is placed at the beginning
of the parser file. All other embedded C is placed following a set of
definitions and declarations AnaGram needs for the code it generates. AnaGram
saves up all the <A HREF="../gloss.html#ReductionProcedure">reduction procedures</A>, or semantic actions, and places them
after all the embedded C.
</P>
<P><A HREF="#Line3" NAME="Note3">Line 3.</A> Brackets [ ] are used to denote
configuration sections. Configuration sections contain settings for
configuration parameters and switches, and a number of attribute statements that
provide metasyntactic information.
</P>
<P><A HREF="#Line4" NAME="Note4">Line 4.</A> This statement sets the default
token type for <A HREF="../gloss.html#Nonterminal">nonterminal tokens</A> to double. The default value for "default
token type" is void. You can override the type for a particular token using
an explicit cast. See
<A HREF="#Line10">line 10.</A> The default type for <A HREF="../gloss.html#Terminal">terminal tokens</A> is int.
AnaGram uses the token type declarations to set up calls and definitions of
<A HREF="../gloss.html#ReductionProcedure">reduction procedures</A> and also to set up the parser value stack.
</P>
<P><A HREF="#Line5" NAME="Note5">Line 5.</A> The disregard statement tells
AnaGram to extend the <A HREF="../gloss.html#Grammar">grammar</A> so that the generated <A HREF="../gloss.html#Parser">parser</A> will skip all
instances of white space which are not contained within lexemes. "White
space" is a token defined at <A HREF="#Line33">line 33</A>. There is
nothing magic about the name. Any other name could have been used.
</P>
<P><A HREF="#Line6" NAME="Note6">Line 6.</A> The lexeme statement identifies a
list of <A HREF="../gloss.html#Nonterminal">nonterminal tokens</A> within which the "disregard" statement is
inoperative. real is defined at <A HREF="#Line38">line 38</A>.
</P>
<P><A HREF="#Line7" NAME="Note7">Line 7.</A> "traditional engine" is
a configuration switch. Simply asserting it turns it on. You can also write:
traditional engine = ON. To turn off a switch use ~: thus ~traditional engine
would guarantee the switch is off, whatever its default value. Alternatively set
traditional engine = OFF.
</P>
<P>AnaGram <A HREF="../gloss.html#Parser">parsers</A> normally use a parsing engine with more than the standard
four parsing <A HREF="../gloss.html#ParserAction">actions</A>: shift, reduce, error and accept. The extra actions are
compound actions. The result of using these actions is to speed up the parser
and to reduce the size of the state table by about fifty percent. The
traditional engine switch turns this optimization off, so that the parser uses
only the four traditional actions. This is usually done only for clarity when
using the File Trace or Grammar Trace options.
</P>
<P><A HREF="#Line9" NAME="Note9">Line 9. </A>AnaGram supports both C and C++
style comments. Nesting of C comments is controlled by the "nest comments"
switch.
</P>
<P><A HREF="#Line10" NAME="Note10">Line 10.</A> An explicit cast can be used
to override the token type for <A HREF="../gloss.html#Nonterminal">nonterminal tokens</A>. Types can be just about any C
or C++ type, including template types. Basically, the only exceptions are types
containing <CODE>( )</CODE> or <CODE>[ ]</CODE>.
</P>
<P>The simplest way to specify the <A HREF="../gloss.html#GrammarToken">goal token</A> for a <A HREF="../gloss.html#Grammar">grammar</A> is to mark it with
a dollar sign. You can also simply name it "grammar", or set the "grammar
token" parameter in a configuration section.
</P>
<P><A HREF="#Line11" NAME="Note11">Line 11.</A> For "<CODE>-&gt;</CODE>"
  read "produces".
A question mark following a token name makes it optional. Tokens in a rule
are separated by commas. Multiple rules with the same left side can also be
separated with the vertical bar, '<CODE>|</CODE>'.
</P>
<P>The rules for character constants are the same as for C. Brackets []
indicate the rule is optional. Braces { } would be used if the rule were not
optional. Brackets and braces can include multiple rules separated by |. The
ellipsis ... indicates unlimited repetition. These constructs are referred to as
<A HREF="../gloss.html#VirtualProduction">"virtual productions"</A>. eof is defined at <A HREF="#Line32">line 32</A>.

</P>
<P><A HREF="#Line10">Lines 10 and 11</A> taken together specify that this
<A HREF="../gloss.html#Grammar">grammar</A> describes a possibly empty sequence of lines terminated with an eof
character. Each line contains an optional "calculation" followed by a
newline character.
</P>
<P><A HREF="#Line13" NAME="Note13">Line 13</A>. To assign the value of a token
(stored on the parser value stack) to a C variable for use in a semantic action,
or <A HREF="../gloss.html#ReductionProcedure">reduction procedure</A>, simply follow the token name with a colon and the name
of the variable.
</P>
<P>Short form reduction procedures are simple C or C++ expressions terminated
with a semicolon. They cannot include a newline character. The name of the C
variable is local to this particular procedure. Normally the value of the
reduction procedure is assigned to the token on the left side of the <A HREF="../gloss.html#Production">production</A>.
In this case, since calculation is of type "void", the result of the
printf call is discarded.
</P>
<P><A HREF="#Line14" NAME="Note14">Line 14.</A> When <A HREF="../gloss.html#ReductionProcedure">reduction procedures</A>
won't fit on a single line or are more complex than a single expression, they
can be enclosed in braces { }. Use a return statement to return a value.
</P>
<P><A HREF="#Line16" NAME="Note16">Line 16.</A> The error token can be used
to <A HREF="../gloss.html#Resynchronization">resynchronize</A> a parser after
encountering a syntax error. It works more or
less like the error token in YACC. In this case it matches any portion of a "calculation"
up to a syntax error and then everything up to the next newline, as determined
by the <A HREF="../gloss.html#Production">production</A> on <A HREF="#Line11">line 11</A>. AnaGram also provides an
alternative form of error continuation called "automatic resynchronization"
which uses a heuristic approach derived from the <A HREF="../gloss.html#Grammar">grammar</A>. By default, AnaGram
<A HREF="../gloss.html#Parser">parsers</A> provide syntax error diagnostics. You may provide your own if you
wish.
</P>
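<P>The visible effect of the error rule, discarding the rest of the faulty line,
can be pictured with a small hand-written C function. This is a sketch of the
behavior only, not code taken from a generated parser; the name
<CODE>skip_to_newline</CODE> is hypothetical, and the input is assumed to be
held in a NUL-terminated string.
</P>

```c
#include <stddef.h>

/* Hypothetical helper: given the position at which a syntax error was
   detected, return the position of the newline that ends the faulty line,
   or of the terminating '\0' if the input ends first. */
static size_t skip_to_newline(const char *input, size_t pos) {
    while (input[pos] != '\0' && input[pos] != '\n') pos++;
    return pos;
}
```

<P>Parsing would then resume with the newline itself, which the rule on line 11
requires after every calculation.
</P>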
<P><A HREF="#Line18" NAME="Note18">Line 18.</A> If a <A HREF="../gloss.html#GrammarRule">grammar rule</A> does not
have a <A HREF="../gloss.html#ReductionProcedure">reduction procedure</A>, the value of the first token in the rule is assigned
to the token on the left side of the <A HREF="../gloss.html#Production">production</A>.
</P>
<P><A HREF="#Line19" NAME="Note19">Line 19.</A> Since the default type
specification given on <A HREF="#Line4">line 4</A> was "double", x
and t have type double, and the <A HREF="../gloss.html#ReductionProcedure">reduction procedure</A> returns their sum, also
double.
</P>
<P><A HREF="#Line24" NAME="Note24">Line 24.</A> Note that in the interest of
simplicity, this <A HREF="../gloss.html#ReductionProcedure">reduction procedure</A> omits any provision for divide by zero
errors.
</P>
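<P>One way to add such a check is to move the division into a helper function
defined in the embedded C. The sketch below is an assumption about how this
might be done, not part of the original example: the name
<CODE>checked_divide</CODE> is hypothetical, and returning zero after reporting
the error is an arbitrary choice.
</P>

```c
#include <stdio.h>

/* Hypothetical helper: divide t by f, reporting a division by zero
   instead of silently producing an infinity or a NaN. */
static double checked_divide(double t, double f) {
    if (f == 0.0) {
        fprintf(stderr, "division by zero\n");
        return 0.0;   /* arbitrary placeholder result, an assumption of this sketch */
    }
    return t / f;
}
```

<P>Since a short form reduction procedure is an ordinary C expression, the
procedure on line 24 could then presumably be written as
<CODE>= checked_divide(t, f);</CODE>.
</P>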
<P><A HREF="#Line31" NAME="Note31">Line 31.</A> Definition statements may be
used to provide shorthand names. '0-9' is a character range, as discussed above.

</P>
<P><A HREF="#Line32" NAME="Note32">Line 32. </A> Input characters can also be
defined using decimal, octal or hex notation. They are not limited to any
particular range, so that it is possible to define the end of file token as the
standard stream I/O end of file value.
</P>
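<P>The reason an end of file value outside 0..255 is useful can be seen in plain
C stream I/O. The standard only guarantees that <CODE>EOF</CODE> is a negative
int; it is -1 on common implementations, which is what the definition on line 32
assumes. The function name <CODE>count_chars</CODE> below is hypothetical and
serves only to illustrate the point.
</P>

```c
#include <stdio.h>

/* Count the characters in a stream. The result of getc() is kept in an int,
   not a char, so that EOF stays distinct from every character code 0..255,
   including 0xFF. */
static long count_chars(FILE *in) {
    long n = 0;
    int c;
    while ((c = getc(in)) != EOF) n++;
    return n;
}
```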
<P><A HREF="#Line33" NAME="Note33">Line 33.</A> Note that AnaGram permits
embedded blanks in token names.
</P>
<P><A HREF="#Line34" NAME="Note34">Line 34.</A> This is the set consisting of blank,
tab, carriage return, form feed and vertical tab.
</P>
<P><A HREF="#Line35" NAME="Note35">Line 35.</A> Keywords are strings of
characters enclosed in double quotes. Standard C rules apply for literal
strings. Keywords stand outside the character space and are recognized in
preference to individual characters.
</P>
<P>
~ indicates the <A HREF="../gloss.html#SetComplement">complement</A> of a character set, so that ~eof is any character
except end of file. The <A HREF="../gloss.html#Universe">character universe</A> is the set of characters in the range
0..255 unless the grammar uses characters outside this range, in which case it is
extended to the smallest contiguous range which includes the outside characters.
The ?... allows zero or more comment characters. This rule describes a standard C
comment (no nesting allowed).
</P>
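<P>The language matched by this rule can be illustrated with a hand-written C
function. This is a sketch of the same non-nesting comment convention, not code
that AnaGram generates; the name <CODE>comment_length</CODE> is hypothetical,
and the input is assumed to be a NUL-terminated string.
</P>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if text starts with a C comment, return its length
   including both delimiters; return 0 if it does not start with a comment
   or if the comment is never closed. Comments do not nest: the first
   star-slash pair after the opening delimiter ends the comment. */
static size_t comment_length(const char *text) {
    if (strncmp(text, "/*", 2) != 0) return 0;
    const char *end = strstr(text + 2, "*/");
    return end ? (size_t)(end - text) + 2 : 0;
}
```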
<P><A HREF="#Line36" NAME="Note36">Line 36.</A> The value of name is an int,
an index into the value table.
</P>
<P><A HREF="#Line37" NAME="Note37">Line 37.</A> The '+' is <A HREF="../gloss.html#SetUnion">set union</A>.
Therefore c is any alphabetic character.
</P>
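<P>The index computed by the reduction procedure can be checked with a few lines
of C. The sketch assumes an ASCII-compatible character set, and the name
<CODE>register_index</CODE> is hypothetical. Under ASCII, 'A' through 'Z' map to
0..25 and 'a' through 'z' map to 32..57, so the 64-element array declared on
line 49 is large enough; the remaining indices are simply unused.
</P>

```c
/* Hypothetical helper mirroring the reduction procedure c-'A' on line 37.
   Assumes an ASCII-compatible character set. */
static int register_index(int c) {
    return c - 'A';
}
```

<P>The printf call on line 15 reverses the mapping with n+'A' to print the
register's letter.
</P>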
<P><A HREF="#Line50" NAME="Note50">Line 50.</A> If you don't have any embedded
C in your syntax file, AnaGram will create a main program automatically. Since
there was already embedded C at line 1, AnaGram won't automatically create a
main program, so we need to define one explicitly.
</P>
<P><A HREF="#Line51" NAME="Note51">Line 51.</A> The default function name for
the <A HREF="../gloss.html#Parser">parser</A> is taken from the file name, in lower case. There is a configuration
parameter available to set it to something else if necessary. Lacking any
contrary specification, the parser will read its input from stdin.
</P>


<P>
<BR>

<IMG ALIGN="bottom" SRC="../images/rbline6j.gif" ALT="----------------------"
      WIDTH=1010 HEIGHT=2 >
<P>
<IMG ALIGN="right" SRC="../images/pslrb6d.gif" ALT="Parsifal Software"
                WIDTH=181 HEIGHT=25>
<BR CLEAR="right">
<P>

Back to <A HREF="../index.html">Index</A>
<P>
<ADDRESS><FONT SIZE="-1">
                  AnaGram parser generator - examples<BR>
                  Annotated four function calculator<BR>
                  Copyright &copy; 1993-1999, Parsifal Software. <BR>
                  All Rights Reserved.<BR>
</FONT></ADDRESS>

</BODY>
</HTML>