
diff doc/misc/html/examples/fc.html @ 0:13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
author: David A. Holland
date: Sat, 22 Dec 2007 17:52:45 -0500
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/misc/html/examples/fc.html	Sat Dec 22 17:52:45 2007 -0500
@@ -0,0 +1,752 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
+<HTML>
+<HEAD>
+<TITLE> Fahrenheit-Celsius Converter</TITLE>
+</HEAD>
+
+
+
+
+<BODY BGCOLOR="#ffffff" BACKGROUND="tilbl6h.gif"
+ TEXT="#000000" LINK="#0033CC"
+ VLINK="#CC0033" ALINK="#CC0099">
+
+<P>
+<IMG ALIGN="right" SRC="../images/agrsl6c.gif" ALT="AnaGram"
+         WIDTH=124 HEIGHT=30 >
+<BR CLEAR="all">
+Back to <A HREF="../index.html">Index</A>
+<P>
+<IMG ALIGN="bottom" SRC="../images/rbline6j.gif" ALT="----------------------"
+        WIDTH=1010 HEIGHT=2  >
+<P>
+
+<H1> Fahrenheit-Celsius Converter</H1>
+<IMG ALIGN="bottom" SRC="../images/rbline6j.gif" ALT="----------------------"
+        WIDTH=1010 HEIGHT=2  >
+<P>
+<H2>Introduction</H2>
+<P>
+          Conversion of temperatures from Fahrenheit to Celsius is a
+          traditional starting point for learning how to use a
+          programming language. This directory contains a graded sequence
+          of Fahrenheit to Celsius conversion programs, starting with
+          a very simple case and working up to one of some complexity.
+          This sequence of programs illustrates an important aspect of
+          syntax-directed programming: in contrast to conventional
+          programming methods, it is quite easy to begin with a simple
+          case and then extend it to more complex situations.
+<P>
+          All of these programs accept input from <tt>stdin</tt> and write
+          output to <tt>stdout</tt>. Except for <tt>fc5</tt>, they are
+          somewhat exceptional in that they contain no embedded C and
+          therefore do not require an explicit definition of a main
+          program.
+<P>
+          <tt>fc1</tt> is the first and simplest of the Fahrenheit to Celsius
+          conversion programs. It expects the user to type an unsigned
+          integer value, assumed to be a Fahrenheit temperature, which
+          it converts to Celsius. It then exits.
+<P>
+          <tt>fc2</tt>, the next example, is a somewhat more interesting
+          Fahrenheit-Celsius converter. This time the input stream may
+          contain any number of temperatures, either Fahrenheit or
+          Celsius, each terminated by a newline character. The
+          program will continue until it encounters an end of file. If it
+          encounters a syntax error in the input, it will skip to the
+          next newline character and continue. <tt>fc2</tt> has been set up to
+          illustrate the usage of the File Trace feature of AnaGram.
+<P>
+          <tt>fc3</tt> adds two new features to <tt>fc2</tt>: It uses
+          floating point
+          arithmetic, so that it can deal with non-integral values,
+          and it allows optional white space in the input, except
+          within numbers. In addition, it changes the output format,
+          so that results are printed in degrees Kelvin, as well as in
+          Fahrenheit and Celsius.
+          <font size=-1>(Yes, we know that in Newspeak the official
+          usage is "Kelvins", not "degrees Kelvin". Shush.)</font>
+<P>
+          <tt>fc4</tt> illustrates a shift-reduce conflict which arose when
+          modifying the <tt>fc3</tt> grammar to allow input in degrees Kelvin.
+          You should probably skip this example until you encounter a
+          shift-reduce conflict in one of your own grammars. <tt>fc4a</tt> and
+          <tt>fc4b</tt> are two different resolutions of the conflict.
+<P>
+          <tt>fc5</tt> illustrates the use of an event-driven parser. The
+          actual grammar is the same as <tt>fc4b</tt>. The only difference is
+          in the method of providing input to the parser.
+<P>
+
+<H2>FC1</H2>
+          <tt>fc1</tt> is the first and simplest of the Fahrenheit to Celsius
+          conversion programs. It expects the user to type an unsigned
+          integer value, assumed to be a Fahrenheit temperature, which it
+          converts to Celsius. It then exits.
+<P>
+          The following features of AnaGram are introduced in <tt>fc1</tt>:
+<UL>
+<LI>     recursive definition of tokens </LI>
+<LI>     definition of a set as a range of characters</LI>
+<LI>     token type declaration </LI>
+<LI>     default token type </LI>
+<LI>     passing token values to reduction procedures </LI>
+<LI>     long and short form reduction procedures </LI>
+</UL>
+
+<P>
+          <tt>fc1</tt> defines two nonterminal tokens: "grammar", which
+          describes the entire input stream, and "integer", which
+          describes a simple unsigned integer value. "grammar", defined
+          by one production, describes the input as consisting of an
+          "integer" followed by a newline character. The production is
+          followed by a reduction procedure that prints both the input
+          value and the converted value.
+<P>
+          "integer" is recursively defined by two productions. The
+          first production says that an "integer" may be represented
+          by a single decimal digit. <CODE>'0-9'</CODE> represents the
+          set of ASCII characters in the range '0' through '9', and the
+          token can be matched by any character from this set. The second
+          production contains the recursion. It says that any "integer"
+          followed by another decimal digit is also an "integer". Because
+          the left side is the same for these two productions, it need
+          not be repeated.
+<P>
+          Note the type cast preceding "integer". This type cast
+          defines the data type of the semantic value of "integer" to
+          be <tt>int</tt>. When the parser stores a token value for "integer"
+          on its value stack or retrieves a value from the stack, the
+          type of the data transmitted will be <tt>int</tt>.
+<P>
+          Since there is no type cast for the token "grammar", the
+          data type for the semantic value of "grammar" is given by
+          the "default token type" configuration parameter, which
+          defaults to void.
+<P>
+          The semantic values of the tokens in a grammar rule may be
+          passed to the associated reduction procedure. The name of
+          the variable in the reduction procedure is simply appended
+          to the token name or expression in the rule with a colon as
+          a separator.
+<P>
+          All three reduction procedures in <tt>fc1</tt> take the
+          semantic values of tokens on the parser stack as parameters.
+          In the first reduction procedure, the variable <tt>f</tt>
+          represents the value of the integer typed by the user. The
+          reduction procedure takes it as a Fahrenheit temperature and
+          converts it to Celsius. This reduction procedure uses the long
+          form, consisting of an equal sign followed by a block of C
+          code.
+<P>
+          The reduction procedures for the two productions for "integer"
+          convert the integer from the ASCII form typed by the user to
+          binary form. The first reduction procedure calculates the value
+          of a single-digit integer. The second calculates the value of an
+          integer with more than one digit. Both reduction procedures
+          use the short form: a C expression terminated by a semicolon.
+          The value of the expression is saved as the semantic
+          value of the reduced token.
+<P>
+
+<H3>Testing FC1</H3>
+          Run AnaGram and build a parser, <tt>fc1.c</tt>. Compile it
+          and link it
+          with your C compiler. Run <tt>fc1</tt> from the command line.
+          Type an integer and press
+          Enter. <tt>fc1</tt> will print out Fahrenheit and Celsius
+          temperatures.
+<P>
+
+
+<H2>FC2</H2>
+          The next example of a syntax file, <tt>fc2.syn</tt>, is a somewhat
+          more interesting Fahrenheit-Celsius converter. This time the
+          input stream may contain any number of temperatures. The
+          program will continue until it encounters an end of file. If
+          it encounters a syntax error in the input, it will skip to
+          the next newline character and continue. <tt>fc2</tt> has
+          been set up
+          to illustrate the File Trace feature of AnaGram.
+<P>
+          The following features of AnaGram are introduced in <tt>fc2</tt>:
+<UL>
+<LI>          configuration section </LI>
+<LI>          character set defined by set union </LI>
+<LI>          end of file token </LI>
+<LI>          virtual productions </LI>
+<LI>          use of '?' to define a virtual production </LI>
+<LI>          error token resynchronization </LI>
+<LI>          File Trace </LI>
+</UL>
+<P>
+          <tt>fc2</tt> allows the temperature values to be signed
+          integers which
+          may be either Fahrenheit or Celsius, as determined by a
+          following 'f' or 'c'. Each temperature value to be
+          converted must be followed by a newline. Spaces in the input
+          are not allowed.
+<P>
+          To facilitate testing, a configuration section has been
+          added at the beginning of the file to set two configuration
+          parameters, discussed below. The lines specifying the
+          parameters have been labeled C1 and C2 at the right margin
+          to make them easy to refer to in this documentation.
+          Similarly, productions have been labeled P1, P2, etc.
+<P>
+          After the configuration section, an end of file token, eof,
+          is defined. Remember that when using stream I/O, the end of
+          file is signalled by the value -1, which is the value of the
+          C macro <tt>EOF</tt> on common implementations.
+<P>
+          The first production, P1, describes the entire input file as
+          an optional sequence of temperatures followed by an end of
+          file.
+<P>
+          The expression <CODE>[temperature, '\n']...</CODE> is a "virtual
+          production". The square brackets indicate that the rule inside
+          them is optional. The ellipsis (<CODE>...</CODE>) indicates
+          that the rule may be repeated an arbitrary number of times.
+<P>
+          Productions P2 and P3 define "temperature" in the normal
+          case. Production P4 controls error recovery, described
+          below. In AnaGram, <CODE>'f' + 'F'</CODE> represents the set
+          of characters
+          containing both upper and lower case 'f'. (The plus sign is
+          the union operator of set theory.) Either an upper or lower
+          case 'f' in the input will match this set. <CODE>'c' +
+          'C'</CODE> is
+          similarly interpreted. Thus a temperature consists of a
+          number followed by an 'f' or 'c' which can be either upper
+          or lower case. The reduction procedures, of course, are
+          different for 'f' and 'c'.
+<P>
+          Productions P5 and P6 define "number" as consisting of an
+          integer with an optional sign. <CODE>'+'?</CODE> is a virtual production.
+          The question mark indicates that the preceding element, in
+          this case the plus sign, is optional. These productions
+          allow you to write numbers in the form 17, +17, or -17.
+<P>
+          Productions P7 and P8 define "integer" exactly as in <tt>fc1</tt>.
+<P>
+
+<H3> Recovering and Continuing after a Syntax Error </H3>
+          The production P4 controls error recovery. "error" is a
+          special token in an AnaGram grammar, which can be matched by
+          any sequence of input which contains a syntax error. If your
+          grammar has an error token, when your parser encounters a
+          syntax error it looks to see if there is an error token to
+          match with the syntax error. If "error" is not admissible in
+          the current state, it discards the most recent token on the
+          parser stack and looks again. It continues until it gets back
+          to a state where "error" is acceptable input or the stack is
+          empty. If the stack is empty, it terminates the parse.
+          Otherwise, it then looks to see if the next input token is
+          admissible. If so, the parse continues. If not, the token is
+          discarded and the parser reads input until it finds an
+          acceptable token or the end of file. In this example, the
+          parser will read characters until it finds a newline
+          character. At this point the parse will continue as though
+          nothing had happened. This process is called "error token
+          resynchronization". It is one of several ways to continue after a
+          parser detects a syntax error in its input stream.
+<P>
+
+
+<H3>  Configuration Parameters </H3>
+          Two configuration parameters have been set in the
+          configuration section of <tt>fc2</tt> to facilitate testing
+          using the File
+          Trace. The first, test file mask, limits the choice of test
+          files to be used with the File Trace option to files with
+          the extension <tt>.fc2</tt>. The second, traditional engine, turns
+          off certain optimizations AnaGram normally builds into its
+          parsers. When the traditional engine switch is set, the
+          parsers AnaGram builds use only the four traditional parser
+          actions: shift, reduce, error, and accept. Otherwise,
+          AnaGram parsers use a number of compound actions in order to
+          reduce the size of the parsing tables and increase the speed
+          of the parser. In this case, the traditional engine switch
+          has been turned on in order to make the behavior of the
+          parser as seen with the File Trace correspond to textbook
+          behavior.
+<P>
+
+<H3> Testing FC2 </H3>
+          Test <tt>fc2</tt> just as you did <tt>fc1</tt>: Run AnaGram
+          and build a
+          parser, <tt>fc2.c</tt>. Compile it and link it with your C
+          compiler. Run
+          <tt>fc2</tt> from the command line. Type an integer, with or
+          without a sign, followed by the letter 'f'
+          or 'c', and press Enter. <tt>fc2</tt> will print out Fahrenheit and
+          Celsius temperatures. Repeat until you are satisfied the
+          program works. Try making a few deliberate typos to test the
+          error token resynchronization. Note that the parser
+          automatically provides error diagnostics. These diagnostics
+          are created by the default <tt>SYNTAX_ERROR</tt> macro.
+<P>
+          Type ^Z and Enter (Windows) or ^D (Unix) to generate an end
+	  of file and end the program. (Of course, you
+          could also use ^C or ^Brk.)
+<P>
+          Alternatively, you may use the test file, <tt>test.fc2</tt>,
+          which has
+          been provided for use with the File Trace (see below).
+          Simply run <tt>fc2</tt> with input redirection to take input from
+          <tt>test.fc2</tt>. At the command prompt, type:
+<PRE>
+      fc2 &lt; test.fc2
+</PRE>
+<P>
+
+<H2>  File Trace </H2>
+          The File Trace feature of AnaGram allows you to test a
+          grammar without actually building a parser. This enables you to
+          completely decouple the debugging of the grammar from the
+          debugging of the reduction procedures. You can try out test
+          files before you have written anything more than the
+          grammar. This allows for very early testing in your projects.
+<P>
+
+         File Trace allows you to see in fine detail exactly how
+         your parser will analyze an input file. A File Trace consists of a
+         window with several panes that show what is going on,
+         and an interpretive parser that works in the background.
+<P>
+          You can select File Trace from the Action
+          menu once you have analyzed your grammar. For the <tt>fc2</tt>
+          example, you will be offered a choice of test files with
+          extension <tt>.fc2</tt>. A good choice would be <tt>test1.fc2</tt>.
+         The File Trace window will show a Parser Stack
+         pane to your left, a Test File pane, which shows you the
+         input file you are parsing, to your right, and a Rule Stack
+         pane across the bottom of the window.
+<P>
+          The way the File Trace parser works is this: Initially none of
+          the test file has been parsed. If you double-click  with the
+          left mouse button at a point in the Test File pane, the parser
+          parses to that point. The unparsed part of the file will be
+          colored differently from the parsed part  (in the default
+          color scheme, parsed characters
+          have a lighter background). To back up the parse to a
+          previous location, double-click at that spot, or single-click
+          and press Enter or the Synch Parse button at the bottom of
+          the File Trace window. To check a file for syntax errors, all
+          you need to do is to click the Parse File button. If there is
+          a syntax error, the parse will not advance beyond the error
+          point. Normally, however, you will probably want to proceed
+          more deliberately, moving the cursor one character at a
+          time.
+<P>
+          If you wish to see even finer detail, you can run the
+          parser in single-step mode by clicking Single Step
+          or pressing Enter. Each time you click on the Single Step
+          button or press Enter, the parser will perform one parser
+          action. Note that in its normal configuration, AnaGram
+          produces parsers that use a number of parser actions more
+          complex than the traditional shift and reduce actions.
+<P>
+          The Parser Stack pane shows the levels of the parser stack, the
+          state numbers on the stack and the tokens that have been
+          recognized so far.
+          As you advance through the test file, you can see by
+          looking at the Parser Stack pane how the parser stack changes
+          as characters are shifted in and reductions occur. The Rule
+          Stack pane is an alternate view of the parser stack showing
+          the grammar rules in play at any moment. Notice how the
+          syntax file window is synched with the Rule Stack.
+<P>
+          If the parse position is not located at the blinking cursor,
+          the Single Step button will be changed to read "Synch Parse".
+          Clicking on the button will move the parse position to the
+          cursor.
+<P>
+          If you now click on tokens at various levels in the Parser
+          Stack pane, the Test File characters corresponding to these
+          tokens will be highlighted. You can restart the parse at any
+          level by double-clicking just before the highlight. The Rule
+          Stack pane is also synched with the Parser Stack and Test File
+          panes.
+<P>
+          You may interrupt the File Trace at any time to inspect any
+          other window without interfering with the File Trace.
+          Whenever you come back to it, you can proceed as though
+          nothing has happened.
+<P>
+          If you have a long file, a complex grammar, or a (very) slow
+          computer, it can sometimes take a while for the parse to catch
+          up with the cursor. If you have a long test file and press
+          Parse File to move the cursor to the end of the file, the
+          parser has a lot of computation to do to catch up.
+<P>
+          For further details about the File Trace, please refer to the
+          AnaGram User's Guide and the on-line documentation.
+<P>
+<BR>
+
+
+<H2> FC3 </H2>
+          <tt>fc3</tt> adds two new features to <tt>fc2</tt>: It uses
+          floating point
+          arithmetic, so that it can deal with non-integral values,
+          and it allows optional white space, including comments, in
+          the input. In addition, it changes the output format, so
+          that results are printed in degrees Kelvin, as well as in
+          Fahrenheit and Celsius.
+<P>
+          The following features of AnaGram are introduced in <tt>fc3</tt>:
+<UL>
+<LI>         Setting the default token type </LI>
+<LI>         The "disregard" statement </LI>
+<LI>         The "lexeme" statement </LI>
+<LI>         Right recursion </LI>
+<LI>         Default value of a reduction token </LI>
+</UL>
+<P>
+
+<H3> Floating Point Arithmetic </H3>
+          To deal with floating point arithmetic, a number of new
+          productions have been added and two productions have been
+          changed. Productions P5 and P6 have been changed to define a
+          "number" in terms of an "unsigned number" instead of an
+          "integer". Productions P6a, P6b, and P6c define "unsigned
+          number" in terms of its integer part and its fraction part.
+          Productions P9 and P10 define the fraction part of the
+          number. Note that "fraction" is described using right
+          recursion rather than left recursion. This makes the reduction
+          procedures neater, since the value of each digit can be scaled
+          without keeping track of its position. Because a rule cannot be
+          reduced until it is complete, with right recursion each new
+          digit increases the stack depth by one. You can
+          observe this with the File Trace.
+<P>
+          In order to replace integer arithmetic with floating point
+          arithmetic, statement C3 was added to the configuration
+          section of the grammar. It declares the default token type,
+          the type assigned to nonterminal tokens absent a specific
+          declaration, to be "double". Since we are not interested in
+          values for "grammar" and "temperature", they have been
+          explicitly cast to "void".
+<P>
+          Note that production P6a does not have a reduction procedure.
+          If a rule has no reduction procedure the value of the
+          reduction token defaults to the value of the first element
+          in the rule. In this case, the value of "unsigned number",
+          in the absence of a reduction procedure, is taken to be the
+          value of "integer".
+<P>
+
+<H3> Skipping White Space </H3>
+          Two statements, C4 and C5, are used to skip over uninteresting
+          white space in the input to <tt>fc3</tt>. The "disregard" statement
+          instructs AnaGram to rewrite your grammar in a standard
+          way so that your parser will skip over any instance of the
+          "white space" token that occurs between lexemes, or lexical
+          units, in the input to your parser. Of course, you can use
+          any token name you wish in a "disregard" statement. You can
+          even have multiple "disregard" statements and all of the
+          tokens you specify will be disregarded.
+<P>
+          The "lexeme" statement is used to declare that certain
+          nonterminal tokens are each to be considered as indivisible
+          lexical units, or lexemes, from the point of view of lexical
+          analysis, so that the "disregard" statement is inoperative
+          inside the nonterminal tokens listed. In this case, statement
+          C5 simply guarantees that white space will not be
+          allowed inside a number. All terminal tokens are automatically
+          lexemes. <!-- when not part of a larger lexeme. -->
+<P>
+<!--
+   It would be nice to rewrite this so the example expansion is 
+   vaguely close to syntactically legal.
+-->
+          To make your parser skip white space, AnaGram renames and
+          redefines the lexemes in your grammar and defines a number
+          of new tokens. For example, if '+' is a lexeme in your
+          grammar and your grammar is to disregard "white space",
+          AnaGram will rename the plus sign token as '+'%. It then
+          introduces a new production in your grammar as follows:
+<PRE>
+          '+'
+             -&gt; '+'%, white space?...
+</PRE>
+
+          This means that a plus sign followed by some white space
+          will now be treated the same, syntactically, as a plus sign
+          alone. The percent sign (a degree sign in AnaGram 1.x) is
+          used to indicate the original, or "pure",
+          definition of the token.
+<P>
+          Productions P11 and P12 together define what is meant by
+          white space in this grammar. P11 defines white space to
+          include blanks and tab characters. P12 includes C style
+          comments (not nested).
+<P>
+          The white space defined in P11 and P12 does not include
+          newline characters. There is a good reason for this. The
+          grammar uses newline characters as delimiters, marking the
+          end of a "temperature". In order to allow blank lines,
+          production P1 was modified to make the temperature optional.
+<P>
+          To allow "//" style comments, a new token, "end of line",
+          defined by production P13, was added. In production P1, '\n'
+          was then replaced by "end of line".
+<P>
+
+<H3> Testing FC3 </H3>
+          The "test file mask" was changed to use files with the
+          extension <tt>.fc3</tt> in the File Trace. <tt>test.fc3</tt> can be
+          used as input for the File Trace.
+<P>
+          Build and compile <tt>fc3</tt> in the same way as previous versions.
+          When you run it, try using numbers with decimal fractions.
+          Try typing blanks in various places to see how the parser
+          deals with them. Try redirecting input from <tt>test.fc3</tt>.
+<P>
+
+
+<H2> FC4 </H2>
+          <tt>fc4</tt> illustrates a shift-reduce conflict in a grammar. You
+          should probably skip this example until you encounter a
+          shift-reduce conflict in one of your own grammars. In the
+          meantime, skip ahead to <tt>fc5</tt>.
+<P>
+          The following features of AnaGram are introduced in <tt>fc4</tt>:
+<UL>
+<LI>            Conflicts window </LI>
+<LI>            Auxiliary Windows menu </LI>
+<LI>            State Definition window </LI>
+<LI>            Expansion Chain window </LI>
+</UL>
+<P>
+          In <tt>fc3</tt>, the output was changed to provide results in degrees
+          Kelvin, as well as in Fahrenheit and Celsius. In <tt>fc4</tt>, a
+          production, P3a, is added to accept input in degrees Kelvin.
+          Since there is no such thing as a negative temperature on
+          the Kelvin scale, it seemed appropriate to require that the
+          temperature be an unsigned number. This, however, introduced a
+          conflict into <tt>fc4</tt>: a point at which the parser cannot
+          determine, from the next input token, which action to take. In
+          fact, AnaGram diagnoses four conflicts, all of which have a
+          common source.
+<P>
+
+<H3> Finding a Conflict </H3>
+          If you run AnaGram and analyze <tt>fc4</tt> you will find that
+          AnaGram finds conflicts in the grammar. To determine the nature
+          of the conflicts, you should first open the Conflicts
+          window. The Conflicts window is available via the Browse menu
+          or a Control Panel button.
+<P>
+          The first thing you see is that there are conflicts in
+          states S005 and S025. In each state, one conflict occurs
+          because a decimal point can either be shifted, in accordance
+          with rule R021, or it can trigger a reduction of rule R014, an
+          empty rule or null production. The other conflict occurs because a
+          decimal digit can either be shifted, in accordance with rule
+          R022, or it can trigger a reduction of rule R014, the same rule
+          that gives trouble with the decimal point.
+<P>
+          The first step in understanding the conflict is to see rules
+          R014, R021, and R022 in context. The Conflicts window is
+          synchronized with the syntax file window. Arrange these
+          windows so you can see them both at once.
+          Then, in the Conflicts window, move the cursor bar up and
+          down. In the syntax file window, the cursor will move
+          between productions for number, unsigned number and integer.
+<P>
+          Although it is possible to recognize rules R021 and R022 in
+          the grammar, there is no explicit null production in the
+          grammar. To find out for sure what rule R014 is, pop up the
+          Rule Table (listed in the Browse menu) and look for R014. It
+          turns out that R014 is the null production that corresponds
+          to an optional plus sign, written <CODE>'+'?</CODE> in the grammar.
+<P>
+          To get a better idea of what is going on, it is worthwhile to
+          find out what the parser is expecting to see in states S005
+          or S025. To find out, click the right mouse button in the
+          Conflicts window on any line describing a state S005
+          conflict to pop up the Auxiliary Windows menu. Select State
+          Definition to find out what state S005 is all about. It
+          seems that state S005 occurs when the parser has skipped
+          over the initial white space and is about to begin dealing
+          with the actual input. But, looking at this, it is still not
+          clear why a decimal digit or a decimal point is ambiguous.
+          The Expansion Chain windows for rules R021, R022 and R014
+          will show how these rules are derived from the
+          characteristic rule for state S005.
+<P>
+
+          Return to the Conflicts window. Click the right mouse button
+          on rule R021, the rule that expects the decimal point,
+          to pop up the Auxiliary Windows menu. Select
+          Expansion Chain. The Expansion Chain window shows how rule
+          R021 derives from the characteristic rule for the state.
+          Each line in this window is a grammar rule produced by the
+          marked token in the rule on the previous line.
+<P>
+          Now return to the Conflicts window, and get the Expansion
+          Chain window for rule R014. Rearrange the windows on the
+          screen so you can compare the Expansion Chain windows for
+          rules R014 and R021. Note that rule R014 derives from the
+          production
+<PRE>
+        temperature
+         -&gt; number, 'c' + 'C'
+</PRE>
+          and rule R021 derives from the production
+<PRE>
+        temperature
+         -&gt; unsigned number, 'k' + 'K'
+</PRE>
+<P>
+          It is now possible to see the nature of the conflict. On the
+          one hand, if the input is supposed to be a Kelvin
+          temperature, the parser can go right ahead accumulating a
+          number. On the other hand, if the input is supposed to be a
+          Celsius temperature, the parser has to first acknowledge the
+          optional plus sign.
+<P>
+          It is the nature of LALR parsers that they can keep track of
+          many threads of possible parses simultaneously as long as
+          they don't have to do any reductions. When they come to the
+          end of a rule, however, they are forced to decide whether
+          the rule has been successfully matched. In this case, the
+          parser is at the end of a null production which arises from
+          the virtual production <CODE> '+'? </CODE> in P6 and is forced to decide
+          whether this null production has been matched. The conflict
+          diagnostics discussed above say that if the next token is a
+          digit or a decimal point, the parser cannot decide between
+          several possibilities. That is, it cannot decide whether or
+          not it has seen an elided plus sign. In effect the parser is
+          being required, because of the null production, to make a
+          premature decision as to whether a Kelvin or Celsius
+          temperature is present in the input.
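+<P>
+          The premature decision can be made concrete with a small,
+          hand-written C sketch. It is not AnaGram output. The hypothetical
+          function <CODE>viable_scales</CODE> reports which scales are still
+          possible after a given input prefix. It assumes the original FC4
+          rules as this section describes them: Fahrenheit and Celsius
+          values may begin with '+' (and, we assume, '-'), Kelvin values
+          are unsigned, and every value ends with a scale letter.

```c
#include <ctype.h>

/* Hypothetical flags for the scales an input prefix could still be. */
enum { SCALE_F = 1, SCALE_C = 2, SCALE_K = 4 };

/*
 * Returns the set of scales consistent with the prefix seen so far,
 * assuming the original FC4 rules: F and C values may begin with a
 * sign, K values are unsigned, and every value ends with a scale letter.
 * Returns 0 when the prefix cannot begin a valid temperature.
 */
int viable_scales(const char *prefix)
{
    int scales = SCALE_F | SCALE_C | SCALE_K;
    const char *p = prefix;
    int digits = 0, seen_point = 0;

    /* An explicit sign rules out Kelvin in the original grammar. */
    if (*p == '+' || *p == '-') {
        scales &= ~SCALE_K;
        p++;
    }
    /* Digits and a single decimal point say nothing about the scale. */
    for (; isdigit((unsigned char)*p) || (*p == '.' && !seen_point); p++) {
        if (*p == '.')
            seen_point = 1;
        else
            digits++;
    }
    if (*p == '\0')
        return scales;              /* no scale letter yet: still undecided */
    if (digits == 0 || p[1] != '\0')
        return 0;                   /* letter without a number, or trailing junk */

    /* Only the scale letter finally narrows the set. */
    switch (tolower((unsigned char)*p)) {
    case 'f': return scales & SCALE_F;
    case 'c': return scales & SCALE_C;
    case 'k': return scales & SCALE_K;
    default:  return 0;
    }
}
```

+          For the prefix "2" the sketch still reports all three scales. A
+          parser holding the null production for <CODE>'+'?</CODE>, however,
+          has to decide on seeing that digit whether an elided plus sign was
+          there. Only the final letter, as in "23.7K", narrows the set.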
+<P>
+          The conflicts can be eliminated by rewriting the grammar so
+          the parser will not come to the end of a rule and be forced
+          to choose among the several threads of the parse until it
+          encounters the determining letter 'f', 'c', or 'k'.
+<P>
+          Two ways to remove the conflict are illustrated. The first is
+          found in FC4A. In this grammar, production P6 has been
+          replaced with two productions, P6x and P6y. This is a
+          standard method of rewriting a grammar to eliminate null
+          productions. If you analyze <tt>fc4a</tt>, you will find that it no
+          longer has a conflict, so this solves the problem.
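+<P>
+          The standard rewrite can be stated generically. In the sketch
+          below, A is any nonterminal, x? is an optional element, and the
+          Greek letters stand for the strings of grammar symbols on either
+          side of it. These names are illustrative. The block is not a copy
+          of P6x and P6y, and it removes the null production only if the
+          two surrounding strings are not both empty.

```latex
A \to \alpha \; x? \; \beta
\qquad\Longrightarrow\qquad
\left\{
\begin{array}{l}
A \to \alpha \, \beta \\
A \to \alpha \, x \, \beta
\end{array}
\right.
```

+          Neither new production is empty. When the parser sees a digit, it
+          can therefore keep the Celsius and Kelvin threads alive and shift
+          the digit, instead of first reducing an empty rule that records
+          the missing plus sign.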
+<P>
+          However, if you look at the <tt>fc4a</tt> grammar closely, you will
+          notice that +23.7K is not acceptable input, although common
+          sense suggests that you ought to be able to use a plus sign
+          on Kelvin temperatures. <tt>fc4b</tt> shows another way to fix the
+          grammar, one that also deals with this quibble. In this grammar,
+          production P3a has been modified to allow an optional plus
+          sign before a Kelvin temperature. If you analyze <tt>fc4b</tt>, you
+          will find that this change also solves the problem.
+<P>
+          In both cases, the technique used to resolve the
+          conflicts was to rewrite the grammar so that there are no
+          differences between constructs upstream of the point where
+          they diverge. Another way to put it is this: In the original
+          grammar Kelvin temperatures were distinguished from
+          Fahrenheit and Celsius not only by the 'K' suffix, but also by the
+          optional plus sign. The essence of both fixes to the problem
+          is to remove this distinction.
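+<P>
+          The same idea can be mirrored in a hand-written C recognizer. This
+          is a sketch, not the parser AnaGram generates. The hypothetical
+          function <CODE>parse_temperature</CODE> scans the sign and the number
+          in the same way for every scale, and it chooses the scale only at
+          the suffix letter. The sketch assumes fc4b-style input: an optional
+          sign, a number, then 'f', 'c' or 'k'. It also assumes that a minus
+          sign is still rejected on Kelvin values. The conversion to Celsius
+          is there purely for illustration.

```c
#include <ctype.h>
#include <stdlib.h>

/*
 * Parses strings such as "+23.7K", "-40c" or "212F" and stores the
 * equivalent Celsius value in *celsius. Returns 1 on success, 0 on error.
 * Everything upstream of the final letter is handled identically for all
 * scales; only the letter decides how the number is interpreted.
 */
int parse_temperature(const char *s, double *celsius)
{
    const char *p = s;
    int negative = 0, digits = 0, seen_point = 0;
    double value;

    /* Same optional sign for every scale... */
    if (*p == '+' || *p == '-') {
        negative = (*p == '-');
        p++;
    }
    /* ...and the same unsigned number: digits with at most one '.'. */
    for (; isdigit((unsigned char)*p) || (*p == '.' && !seen_point); p++) {
        if (*p == '.')
            seen_point = 1;
        else
            digits++;
    }
    /* Require a number followed by exactly one final character. */
    if (digits == 0 || p[0] == '\0' || p[1] != '\0')
        return 0;

    /* The validated span is plain decimal, so strtod reads just that. */
    value = strtod(s, NULL);

    /* Only here, at the determining letter, is the scale chosen. */
    switch (tolower((unsigned char)*p)) {
    case 'f':
        *celsius = (value - 32.0) * 5.0 / 9.0;
        return 1;
    case 'c':
        *celsius = value;
        return 1;
    case 'k':
        if (negative)
            return 0;               /* assumed: no negative Kelvin values */
        *celsius = value - 273.15;
        return 1;
    default:
        return 0;
    }
}
```

+          With this structure, "+23.7K" is accepted and converted to about
+          -249.45. "-5K" and "23.7" (which has no suffix) are rejected.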
+<P>
+          Other AnaGram windows available from the Auxiliary Windows
+          popup menu for the Conflicts window, such as Rule Derivation,
+          Token Derivation, Conflict Trace and Problem States, are also
+          helpful in tracking down conflicts. See the help messages and
+          your AnaGram User's Guide for further details.
+<P>
+
+<H3>Testing FC4A and FC4B </H3>
+          Build and compile <tt>fc4a</tt> and <tt>fc4b</tt> in the
+          same way you built
+          previous versions. When you run the parsers, try using
+          Kelvin temperatures with and without leading plus signs.
+<P>
+          The "test file mask" has been changed so that the File Trace
+          uses files with the extension ".fc4". <tt>test.fc4</tt> can be
+          used as input either for the parsers themselves or for the
+          File Trace.
+<P>
+
+
+<H2> FC5 </H2>
+          <tt>fc5</tt> illustrates the use of an event driven parser. The
+          grammar is essentially the same as for <tt>fc4b</tt>. The primary
+          difference is in the method of providing input to the
+          parser. The following features of AnaGram are introduced in
+          <tt>fc5</tt>:
+<UL>
+<LI>            event driven parser </LI>
+<LI>            embedded C </LI>
+<LI>            main program </LI>
+<LI>            initializing and calling the parser </LI>
+</UL>
+
+          Statement C6 in the configuration section causes AnaGram to
+          build an event driven parser. An event driven parser is
+          first explicitly initialized and then called once for each
+          input unit.
+<P>
+          A small main program has been included in a block of
+          embedded C at the end of the file. This program calls the
+          initializer and then reads characters from <CODE>stdin</CODE> and passes
+          them on to the parser. Previous <tt>fc</tt> programs did not
+          include a
+          main program, but relied on AnaGram to create one. However,
+          AnaGram does not automatically create a main program for
+          parsers which are event driven, use pointer input, or have
+          any embedded C. Therefore a main program is necessary in
+          this syntax file.
+<P>
+          Note that the default names for the initializer and parser
+          are <CODE>init_fc5</CODE> and <CODE>fc5</CODE> respectively.
+          Only event driven parsers
+          require that the user explicitly call the initializer
+          function.
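+<P>
+          The event driven calling pattern can be sketched in plain C: the
+          parser is initialized once and then receives one input unit per
+          call. All names here are hypothetical, and the toy grammar (an
+          unsigned integer ended by a newline) is far simpler than fc5.
+          <CODE>push_init</CODE> stands in for the initializer,
+          <CODE>push_char</CODE> stands in for the parser function, and
+          <CODE>run_stream</CODE> stands in for the small main program that
+          reads characters and hands them on. The sketch does not reproduce
+          the interface of an AnaGram-generated parser.

```c
#include <stdio.h>

/* Hypothetical status codes reported after each input character. */
enum push_status { PUSH_RUNNING, PUSH_DONE, PUSH_ERROR };

/* All parser state is kept here between calls, because control returns
 * to the caller after every character. */
struct push_parser {
    long value;
    int digits;
    enum push_status status;
};

/* Initializer: must be called before the first character is supplied. */
void push_init(struct push_parser *pp)
{
    pp->value = 0;
    pp->digits = 0;
    pp->status = PUSH_RUNNING;
}

/* Parser: called once per input character. Accepts one or more digits
 * followed by '\n'; anything else is a syntax error. */
enum push_status push_char(struct push_parser *pp, int c)
{
    if (pp->status != PUSH_RUNNING)
        return pp->status;
    if (c >= '0' && c <= '9') {
        pp->value = pp->value * 10 + (c - '0');
        pp->digits++;
    } else if (c == '\n' && pp->digits > 0) {
        pp->status = PUSH_DONE;
    } else {
        pp->status = PUSH_ERROR;
    }
    return pp->status;
}

/* Driver in the style of the main program described above: initialize,
 * then read characters and pass them on until the parser stops. Returns
 * PUSH_RUNNING if the input ends before the parser finishes. */
enum push_status run_stream(FILE *in, long *result)
{
    struct push_parser pp;
    int c;

    push_init(&pp);
    while (pp.status == PUSH_RUNNING && (c = getc(in)) != EOF)
        push_char(&pp, c);
    *result = pp.value;
    return pp.status;
}
```

+          Because the parser returns after every character, its state has
+          to survive between calls in the struct. That is one reason such a
+          parser needs an explicit initialization step before the first
+          character arrives.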
+<P>
+          In addition, a global constant, "zero", was defined in the
+          embedded C to provide the value of absolute zero, and the
+          reduction procedures were modified to refer to "zero"
+          instead of the explicit value.
+<P>
+          This illustrates an important point about the C parser file
+          that AnaGram builds: All blocks of embedded C precede the
+          reduction procedures, so that the reduction procedures can
+          access all variables and definitions included in embedded
+          C, no matter where they are located in the file.
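+<P>
+          The effect of this ordering can be sketched in C. The constant
+          comes first, just as AnaGram places embedded C ahead of the
+          reduction procedures. The functions below it stand in for
+          reduction procedures that use the constant. The value -273.15
+          (absolute zero in degrees Celsius) and the function names are
+          assumptions. The actual definitions are in the fc5 syntax file.

```c
/* Stand-in for the embedded C block. In the syntax file such a block may
 * appear anywhere, even at the very end, but in the generated parser it
 * precedes the reduction procedures, so they can refer to it. */
static const double zero = -273.15;   /* assumed: absolute zero in deg C */

/* Hypothetical stand-ins for reduction procedures. They use "zero"
 * instead of repeating the literal value. */
double kelvin_to_celsius(double k)
{
    return k + zero;
}

double celsius_to_kelvin(double c)
{
    return c - zero;
}
```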
+<P>
+
+<H3>Testing FC5 </H3>
+          Since <tt>fc5</tt> uses essentially the same grammar as
+          <tt>fc4b</tt>, you can use the same test files for
+          <tt>fc5</tt> as for <tt>fc4b</tt>.
+</P>
+
+<BR>
+
+<IMG ALIGN="bottom" SRC="../images/rbline6j.gif" ALT="----------------------"
+      WIDTH=1010 HEIGHT=2 >
+<P>
+<IMG ALIGN="right" SRC="../images/pslrb6d.gif" ALT="Parsifal Software"
+                WIDTH=181 HEIGHT=25>
+<BR CLEAR="right">
+<P>
+Back to <A HREF="../index.html">Index</A>
+<P>
+<ADDRESS><FONT SIZE="-1">
+                  AnaGram parser generator - examples<BR>
+                  Fahrenheit-Celsius Converter<BR>
+                  Copyright &copy; 1993-1999, Parsifal Software. <BR>
+                  All Rights Reserved.<BR>
+</FONT></ADDRESS>
+
+</BODY>
+</HTML>
+