view doc/misc/html/examples/fc.html @ 21:1c9dac05d040

Add lint-style FALLTHROUGH annotations to fallthrough cases. (in the parse engine and thus the output code) Document this, because the old output causes warnings with gcc10.
author David A. Holland
date Mon, 13 Jun 2022 00:04:38 -0400
parents 13d2b8934445

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<TITLE> Fahrenheit-Celsius Converter</TITLE>
</HEAD>




<BODY BGCOLOR="#ffffff" BACKGROUND="tilbl6h.gif"
 TEXT="#000000" LINK="#0033CC"
 VLINK="#CC0033" ALINK="#CC0099">

<P>
<IMG ALIGN="right" SRC="../images/agrsl6c.gif" ALT="AnaGram"
         WIDTH=124 HEIGHT=30 >
<BR CLEAR="all">
Back to <A HREF="../index.html">Index</A>
<P>
<IMG ALIGN="bottom" SRC="../images/rbline6j.gif" ALT="----------------------"
        WIDTH=1010 HEIGHT=2  >
<P>

<H1> Fahrenheit-Celsius Converter</H1>
<IMG ALIGN="bottom" SRC="../images/rbline6j.gif" ALT="----------------------"
        WIDTH=1010 HEIGHT=2  >
<P>
<H2>Introduction</H2>
<P>
          Conversion of temperatures from Fahrenheit to Celsius is a
          traditional starting point for learning how to use a
          programming language. This directory contains a graded sequence
          of Fahrenheit to Celsius conversion programs, starting with
          a very simple case and working up to one of some complexity.
          This sequence of programs illustrates an important aspect of
          syntax directed programming: In contrast to conventional
          programming methods it is quite easy to begin with a simple
          case and then extend it to more complex situations.
<P>
          All of these programs accept input from <tt>stdin</tt> and write
          output to <tt>stdout</tt>. These programs are somewhat exceptional,
          since, except for <tt>fc5</tt>, they do not have any embedded C and
          therefore do not require explicit definition of a main
          program.
<P>
          <tt>fc1</tt> is the first and simplest of the Fahrenheit to Celsius
          conversion programs. It expects the user to type a positive
          integer value, assumed to be a Fahrenheit temperature, which
          it converts to Celsius. It then exits.
<P>
          <tt>fc2</tt>, the next example, is a somewhat more interesting
          Fahrenheit-Celsius converter. This time the input stream may
          contain any number of temperatures, either Fahrenheit or
          Celsius, each terminated by a newline character. The
          program will continue until it encounters an end of file. If it
          encounters a syntax error in the input, it will skip to the
          next newline character and continue. <tt>fc2</tt> has been set up to
          illustrate the usage of the File Trace feature of AnaGram.
<P>
          <tt>fc3</tt> adds two new features to <tt>fc2</tt>: It uses
          floating point
          arithmetic, so that it can deal with non-integral values,
          and it allows optional white space in the input, except
          within numbers. In addition, it changes the output format,
          so that results are printed in degrees Kelvin, as well as in
          Fahrenheit and Celsius.
          <font size=-1>(Yes, we know that in Newspeak the official
          usage is "Kelvins", not "degrees Kelvin". Shush.)</font>
<P>
          <tt>fc4</tt> illustrates a shift-reduce conflict which arose when
          modifying the <tt>fc3</tt> grammar to allow input in degrees Kelvin.
          You should probably skip this example until you encounter a
          shift-reduce conflict in one of your own grammars. <tt>fc4a</tt> and
          <tt>fc4b</tt> are two different resolutions of the conflict.
<P>
          <tt>fc5</tt> illustrates the use of an event driven parser. The
          actual grammar is the same as <tt>fc4b</tt>. The only difference is
          in the method of providing input to the parser.
<P>

<H2>FC1</H2>
          <tt>fc1</tt> is the first and simplest of the Fahrenheit to Celsius
          conversion programs. It expects the user to type an integer
          value, assumed to be a Fahrenheit temperature, which it
          converts to Celsius. It then exits.
<P>
          The following features of AnaGram are introduced in <tt>fc1</tt>:
<UL>
<LI>     recursive definition of tokens </LI>
<LI>     definition of a set as a range of characters</LI>
<LI>     token type declaration </LI>
<LI>     default token type </LI>
<LI>     passing token values to reduction procedures </LI>
<LI>     long and short form reduction procedures </LI>
</UL>

<P>
          <tt>fc1</tt> defines two nonterminal tokens, "grammar", which
          describes the entire input stream, and "integer", which
          describes a simple unsigned integer value. "grammar", defined
          by one production, describes the input as consisting of an
          "integer" followed by a newline character. There is a
          following reduction procedure to print out both the input
          and converted values.
<P>
          "integer" is recursively defined by two productions. The
          first production says that an "integer" may be represented
          by a single decimal digit. '0-9' represents the set of ASCII
          characters in the range '0' through '9'. The token can be
          matched by any character from this set. The second
          production contains the recursion. It says that the combination of
          any "integer" followed by another decimal digit is also an
          integer. Note that the left side is the same for these two
          productions and it need not be repeated.
<P>
          Note the type cast preceding "integer". This type cast
          defines the data type of the semantic value of "integer" to
          be int. When the parser stores a token value for "integer"
          on its value stack or retrieves a value from the stack, the
          type of the data transmitted will be int.
<P>
          Since there is no type cast for the token "grammar", the
          data type for the semantic value of "grammar" is given by
          the "default token type" configuration parameter, which
          defaults to void.
<P>
          The semantic values of the tokens in a grammar rule may be
          passed to the associated reduction procedure. The name of
          the variable in the reduction procedure is simply appended
          to the token name or expression in the rule with a colon as
          a separator.
<P>
          All three reduction procedures in <tt>fc1</tt> operate on
          the semantic
          values of tokens on the parser stack as parameters. In
          the first reduction procedure, the variable <tt>f</tt> represents the
          value of the integer typed by the user. It is taken as a
          Fahrenheit temperature and converted to Celsius by the
          reduction procedure. This reduction procedure uses the long
          form, consisting of an equal sign followed by a block of C
          code.
<P>
          The reduction procedures for the two productions for "integer"
          convert the integer from ASCII form as typed by the
          user to binary form. The first reduction procedure
          calculates the value of a single digit integer. The second
          reduction procedure calculates the value for an integer with more
          than one digit. Notice that these reduction procedures both
          use the short form: a C expression terminated by a semicolon.
          The value of the expression is saved as the semantic
          value of the reduced token.
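<P>
          As a rough illustration, the arithmetic carried out by these
          three reduction procedures can be sketched in plain C. The
          grammar file is not reproduced on this page, so the function
          names below are hypothetical, and the integer formula
          5*(f-32)/9 is an assumption about how the conversion might be
          written rather than a copy of <tt>fc1.syn</tt>.

```c
#include <assert.h>

/* Short form, first production: value of a single digit character. */
int digit_value(int d)
{
    assert(d >= '0' && d <= '9');   /* the token only matches '0-9' */
    return d - '0';
}

/* Short form, second production: extend an integer by one more digit. */
int append_digit(int n, int d)
{
    return 10 * n + digit_value(d);
}

/* Long form: Fahrenheit to Celsius; integer division truncates. */
int fahrenheit_to_celsius(int f)
{
    return 5 * (f - 32) / 9;
}
```

          For the input 212, the calls nest as
          append_digit(append_digit(digit_value('2'), '1'), '2'), which
          mirrors the way the left-recursive rule for "integer" is
          reduced one digit at a time.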
<P>

<H3>Testing FC1</H3>
          Run AnaGram and build a parser, <tt>fc1.c</tt>. Compile it
          and link it
          with your C compiler. Run <tt>fc1</tt> from the command line.
          Type an integer and press
          Enter. <tt>fc1</tt> will print out Fahrenheit and Celsius
          temperatures.
<P>


<H2>FC2</H2>
          The next example of a syntax file, <tt>fc2.syn</tt>, is a somewhat
          more interesting Fahrenheit-Celsius converter. This time the
          input stream may contain any number of temperatures. The
          program will continue until it encounters an end of file. If
          it encounters a syntax error in the input, it will skip to
          the next newline character and continue. <tt>fc2</tt> has
          been set up
          to illustrate the File Trace feature of AnaGram.
<P>
          The following features of AnaGram are introduced in <tt>fc2</tt>:
<UL>
<LI>          configuration section </LI>
<LI>          character set defined by set union </LI>
<LI>          end of file token </LI>
<LI>          virtual productions </LI>
<LI>          use of '?' to define a virtual production </LI>
<LI>          error token resynchronization </LI>
<LI>          File Trace </LI>
</UL>
<P>
          <tt>fc2</tt> allows the temperature values to be signed
          integers which
          may be either Fahrenheit or Celsius, as determined by a
          following 'f' or 'c'. Each temperature value to be
          converted must be followed by a newline. Spaces in the input
          are not allowed.
<P>
          To facilitate testing, a configuration section has been
          added at the beginning of the file to set two configuration
          parameters, discussed below. The lines specifying the
          parameters have been labeled C1 and C2 at the right margin
          to make them easy to refer to in this documentation.
          Similarly, productions have been labeled P1, P2, etc.
<P>
          After the configuration section, an end of file token, eof,
          is defined. Remember that when using stream I/O, the end of
          file is signalled by a -1.
<P>
          The first production, P1, describes the entire input file as
          an optional sequence of temperatures followed by an end of
          file.
<P>
          The expression <CODE>[temperature, '\n']...</CODE> is a "virtual
          production". The square brackets indicate the rule inside the
          brackets is optional. The ellipsis (<CODE>...</CODE>)
          indicates that the
          rule may be repeated an arbitrary number of times.
<P>
          Productions P2 and P3 define "temperature" in the normal
          case. Production P4 controls error recovery, described
          below. In AnaGram, <CODE>'f' + 'F'</CODE> represents the set
          of characters
          containing both upper and lower case 'f'. (The plus sign is
          the union operator of set theory.) Either an upper or lower
          case 'f' in the input will match this set. <CODE>'c' +
          'C'</CODE> is
          similarly interpreted. Thus a temperature consists of a
          number followed by an 'f' or 'c' which can be either upper
          or lower case. The reduction procedures, of course, are
          different for 'f' and 'c'.
<P>
          Productions P5 and P6 define "number" as consisting of an
          integer with an optional sign. <CODE>'+'?</CODE> is a virtual production.
          The question mark indicates that the preceding element, in
          this case the plus sign, is optional. These productions
          allow you to write numbers in the form 17, +17, or -17.
<P>
          Productions P7 and P8 define "integer" exactly as in <tt>fc1</tt>.
<P>

<H3> Recovering and Continuing after a Syntax Error </H3>
          The production P4 controls error recovery. "error" is a
          special token in an AnaGram grammar, which can be matched by
          any sequence of input which contains a syntax error. If your
          grammar has an error token, when your parser encounters a
          syntax error it looks to see if there is an error token to
          match with the syntax error. If "error" is not admissible in
          the current state, it discards the previous token on the
          input stack and looks again. It continues until it gets back
          to a state where "error" is acceptable input or the stack is
          empty. If the stack is empty, it terminates the parse.
          Otherwise, it then looks to see if the next input token is
          admissible. If so, the parse continues. If not, the token is
          discarded and the parser reads input until it finds an
          acceptable token or the end of file. In this example, the
          parser will read characters until it finds a newline character.
          At this point the parse will continue as though nothing
          had happened. This process is called "error token
          resynchronization". It is one of several ways to continue after a
          parser detects a syntax error in its input stream.
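<P>
          The last step, discarding input up to the next newline, can
          be pictured with a small hand-written C helper. This is only
          an illustration of the idea, not the code AnaGram generates;
          the function name and the use of a NUL-terminated string in
          place of an input stream are assumptions made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Given the position at which a syntax error was detected, discard
 * input up to the next newline. Returns the index of that newline, or
 * the index of the terminating '\0' if the input ends first.
 * error_pos must not exceed the length of the string.
 */
size_t resync_to_newline(const char *input, size_t error_pos)
{
    size_t i = error_pos;

    assert(input != NULL);
    while (input[i] != '\0' && input[i] != '\n')
        i++;
    return i;
}
```

          With the input "12x\n34f\n" and an error detected at the 'x'
          (index 2), the helper stops at index 3, the newline, so the
          following line can be parsed normally.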
<P>


<H3>  Configuration Parameters </H3>
          Two configuration parameters have been set in the
          configuration section of <tt>fc2</tt> to facilitate testing
          using the File
          Trace. The first, test file mask, limits the choice of test
          files to be used with the File Trace option to files with
          the extension <tt>.fc2</tt>. The second, traditional engine, turns
          off certain optimizations AnaGram normally builds into its
          parsers. When the traditional engine switch is set, the
          parsers AnaGram builds use only the four traditional parser
          actions: shift, reduce, error, and accept. Otherwise,
          AnaGram parsers use a number of compound actions in order to
          reduce the size of the parsing tables and increase the speed
          of the parser. In this case, the traditional engine switch
          has been turned on in order to make the behavior of the
          parser as seen with the File Trace correspond to textbook
          behavior.
<P>

<H3> Testing FC2 </H3>
          Test <tt>fc2</tt> just as you did <tt>fc1</tt>: Run AnaGram
          and build a
          parser, <tt>fc2.c</tt>. Compile it and link it with your C
          compiler. Run
          <tt>fc2</tt> from the command line. Type an integer, with or
          without a sign, the letter 'f'
          or 'c', and press Enter. <tt>fc2</tt> will print out Fahrenheit and
          Celsius temperatures. Repeat until you are satisfied the
          program works. Try making a few deliberate typos to test the
          error token resynchronization. Note that the parser
          automatically provides error diagnostics. These diagnostics
          are created by the default <tt>SYNTAX_ERROR</tt> macro.
<P>
          Type ^Z and Enter (Windows) or ^D (Unix) to generate an end
          of file and end the program. (Of course, you
          could also use ^C or ^Brk.)
<P>
          Alternatively, you may use the test file, <tt>test.fc2</tt>,
          which has
          been provided for use with the File Trace (see below).
          Simply run <tt>fc2</tt> with input redirection to take input from
          <tt>test.fc2</tt>. At the command prompt, type:
<PRE>
      fc2 &lt; test.fc2
</PRE>
<P>

<H2>  File Trace </H2>
          The File Trace feature of AnaGram allows you to test a
          grammar without actually building a parser. This enables you to
          completely decouple the debugging of the grammar from the
          debugging of the reduction procedures. You can try out test
          files before you have written anything more than the
          grammar. This allows for very early testing in your projects.
<P>

         File Trace allows you to see in fine detail just exactly how
         your parser will analyze an input file. A File Trace consists of  a
         window with various panes so you can see what is going on,
         and an interpretive parser which works in the background.
<P>
          You can select File Trace from the Action
          menu once you have analyzed your grammar. For the <tt>fc2</tt>
          example, you will be offered a choice of test files with
          extension <tt>.fc2</tt>. A good choice would be <tt>test1.fc2</tt>.
         The File Trace window will show a Parser Stack
         pane to your left, a Test File pane, which shows you the
         input file you are parsing, to your right, and a Rule Stack
         pane across the bottom of the window.
<P>
          The way the File Trace parser works is this: Initially none of
          the test file has been parsed. If you double-click  with the
          left mouse button at a point in the Test File pane, the parser
          parses to that point. The unparsed part of the file will be
          colored differently from the parsed part  (in the default
          color scheme, parsed characters
          have a lighter background). To back up the parse to a
          previous location, double-click at that spot, or single-click
          and press Enter or the Synch Parse button at the bottom of
          the File Trace window. To check a file for syntax errors, all
          you need to do is to click the Parse File button. If there is
          a syntax error, the parse will not advance beyond the error
          point. Normally, however, you will probably want to proceed
          more deliberately, moving the cursor one character at a
          time.
<P>
          If you wish to see even finer detail, you may make the
          parser work in single step mode by clicking on Single Step
          or pressing Enter. Each time you click on the Single Step
          button or press Enter, the parser will perform one parser
          action. Note that in its normal configuration, AnaGram
          produces parsers that use a number of parser actions more
          complex than the traditional shift and reduce actions.
<P>
          The Parser Stack pane shows the levels of the parser stack, the
          state numbers on the stack and the tokens that have been
          recognized so far.
          As you advance through the test file, you can see by
          looking at the Parser Stack pane how the parser stack changes
          as characters are shifted in and reductions occur. The Rule
          Stack pane is an alternate view of the parser stack showing
          the grammar rules in play at any moment. Notice how the
          syntax file window is synched with the Rule Stack.
<P>
          If the parse position is not located at the blinking cursor,
          the Single Step button will be changed to read "Synch Parse".
          Clicking on the button will move the parse position to the
          cursor.
<P>
          If you now click on tokens at various levels in the Parser
          Stack, the Test File characters corresponding to these tokens
          will be highlighted. You can restart the parse at any level
          by double-clicking just preceding the highlight. The Rule
          Stack is also synched with the Parser Stack and Test File
          panes.
<P>
          You may interrupt the File Trace at any time to inspect any
          other window without interfering with the File Trace.
          Whenever you come back to it, you can proceed as though
          nothing has happened.
<P>
          If you have a long file, a complex grammar, or a (very) slow
          computer, it can sometimes take a while for the parse to catch
          up with the cursor. If you have a long test file and press
          Parse File to move the cursor to the end of the file, the
          parser has a lot of computation to do to catch up.
<P>
          For further details about the File Trace, please refer to the
          AnaGram User's Guide and the on-line documentation.
<P>
<BR>


<H2> FC3 </H2>
          <tt>fc3</tt> adds two new features to <tt>fc2</tt>: It uses
          floating point
          arithmetic, so that it can deal with non-integral values,
          and it allows optional white space, including comments, in
          the input. In addition, it changes the output format, so
          that results are printed in degrees Kelvin, as well as in
          Fahrenheit and Celsius.
<P>
          The following features of AnaGram are introduced in <tt>fc3</tt>:
<UL>
<LI>         Setting the default token type </LI>
<LI>         The "disregard" statement </LI>
<LI>         The "lexeme" statement </LI>
<LI>         Right recursion </LI>
<LI>         Default value of a reduction token </LI>
</UL>
<P>

<H3> Floating Point Arithmetic </H3>
          To deal with floating point arithmetic, a number of new
          productions have been added and two productions have been
          changed. Productions P5 and P6 have been changed to define a
          "number" in terms of an "unsigned number" instead of an
          "integer". Productions P6a, P6b, and P6c define "unsigned
          number" in terms of its integer part and its fraction part.
          Productions P9 and P10 define the fraction part of the
          number. Note that "fraction" is described using right
          recursion rather than left recursion. This makes the reduction
          procedures neater. Since reduction does not occur until
          a rule is complete, note that with right recursion each new
          digit causes the stack depth to increase by one. You can
          observe this with the File Trace.
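<P>
          The effect of right recursion can be imitated with a
          recursive C function. In this sketch, which assumes the digits
          are available as a string and uses a hypothetical function
          name, the remaining digits are evaluated first, so the call
          depth grows by one for every digit, much as the parser stack
          does. The empty string is used as the base case for brevity;
          the grammar itself presumably bottoms out at a single digit.

```c
#include <assert.h>

/*
 * Value of a digit string read as a decimal fraction, e.g. "25" is 0.25.
 * Each digit is combined with the value of the digits to its right:
 *     value = (digit + value_of_rest) / 10
 */
double fraction_value(const char *digits)
{
    assert(digits != 0);
    if (*digits == '\0')
        return 0.0;                           /* nothing left to add */
    assert(*digits >= '0' && *digits <= '9');
    return ((*digits - '0') + fraction_value(digits + 1)) / 10.0;
}
```

          No running scale factor is needed, which is one way to see
          why the right-recursive form makes the reduction procedures
          neater.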
<P>
          In order to replace integer arithmetic with floating point
          arithmetic, statement C3 was added to the configuration
          section of the grammar. It declares the default token type,
          the type assigned to nonterminal tokens absent a specific
          declaration, to be "double". Since we are not interested in
          values for "grammar" and "temperature", they have been
          explicitly cast to "void".
<P>
          Note that production P6a does not have a reduction procedure.
          If a rule has no reduction procedure the value of the
          reduction token defaults to the value of the first element
          in the rule. In this case, the value of "unsigned number",
          in the absence of a reduction procedure, is taken to be the
          value of "integer".
<P>

<H3> Skipping White Space </H3>
          Two statements, C4 and C5, are used to skip over uninteresting
          white space in the input to <tt>fc3</tt>. The "disregard" statement
          instructs AnaGram to rewrite your grammar in a standard
          way so that your parser will skip over any instance of the
          "white space" token that occurs between lexemes, or lexical
          units, in the input to your parser. Of course, you can use
          any token name you wish in a "disregard" statement. You can
          even have multiple "disregard" statements and all of the
          tokens you specify will be disregarded.
<P>
          The "lexeme" statement is used to declare that certain
          nonterminal tokens are each to be considered as indivisible
          lexical units, or lexemes, from the point of view of lexical
          analysis, so that the "disregard" statement is inoperative
          inside the nonterminal tokens listed. In this case, statement
          C5 simply guarantees that white space will not be
          allowed inside a number. All terminal tokens are automatically
          lexemes. <!-- when not part of a larger lexeme. -->
<P>
<!--
   It would be nice to rewrite this so the example expansion is 
   vaguely close to syntactically legal.
-->
          To make your parser skip white space, AnaGram renames and
          redefines the lexemes in your grammar and defines a number
          of new tokens. For example, if '+' is a lexeme in your
          grammar and your grammar is to disregard "white space",
          AnaGram will rename the plus sign token as '+'%. It then
          introduces a new production in your grammar as follows:
<PRE>
          '+'
             -&gt; '+'%, white space?...
</PRE>

          This means that a plus sign followed by some white space
          will now be treated the same, syntactically, as a plus sign
          alone. The percent sign (a degree sign in AnaGram 1.x) is
          used to indicate the original, or "pure",
          definition of the token.
<P>
          Productions P11 and P12 together define what is meant by
          white space in this grammar. P11 defines white space to
          include blanks and tab characters. P12 includes C style
          comments (not nested).
<P>
          The white space defined in P11 and P12 does not include
          newline characters. There is a good reason for this. The
          grammar uses newline characters as delimiters, marking the
          end of a "temperature". In order to allow blank lines,
          production P1 was modified to make the temperature optional.
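<P>
          For comparison, a hand-written scanner would skip the same
          kind of white space with a loop like the one below. This is a
          sketch under stated assumptions: the function name is
          hypothetical, the input is a NUL-terminated string, and an
          unterminated comment is simply left in place for the caller to
          report. Like P11 and P12, it deliberately stops at a newline.

```c
#include <assert.h>
#include <string.h>

/*
 * Skip blanks, tabs, and non-nested C style comments.
 * Newlines are not skipped because they delimit temperatures.
 * Returns a pointer to the first character that is not white space.
 */
const char *skip_white_space(const char *s)
{
    assert(s != NULL);
    for (;;) {
        if (*s == ' ' || *s == '\t') {
            s++;
        } else if (s[0] == '/' && s[1] == '*') {
            const char *end = strstr(s + 2, "*/");
            if (end == NULL)
                return s;        /* unterminated comment: stop here */
            s = end + 2;
        } else {
            return s;
        }
    }
}
```

          The "disregard" statement saves you from writing and calling
          such a routine after every lexeme.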
<P>
          To allow "//" style comments, a new token, "end of line",
          defined by production P13, was added. In production P1, '\n'
          was then replaced by "end of line".
<P>

<H3> Testing FC3 </H3>
          The "test file mask" was changed to use files with the
          extension ".fc3" in the File Trace. <tt>test.fc3</tt> can be used as
          input for the File Trace.
<P>
          Build and compile <tt>fc3</tt> in the same way as previous versions.
          When you run it, try using numbers with decimal fractions.
          Try typing blanks in various places to see how the parser
          deals with them. Try redirecting input from <tt>test.fc3</tt>.
<P>


<H2> FC4 </H2>
          <tt>fc4</tt> illustrates a shift-reduce conflict in a grammar. You
          should probably skip this example until you encounter a
          shift-reduce conflict in one of your own grammars. In the
          meantime, skip ahead to <tt>fc5</tt>.
<P>
          The following features of AnaGram are introduced in <tt>fc4</tt>:
<UL>
<LI>            Conflicts window </LI>
<LI>            Auxiliary Windows menu </LI>
<LI>            State Definition window </LI>
<LI>            Expansion Chain window </LI>
</UL>
<P>
          In <tt>fc3</tt>, the output was changed to provide results in degrees
          Kelvin, as well as in Fahrenheit and Celsius. In <tt>fc4</tt>, a
          production, P3a, is added to accept input in degrees Kelvin.
          Since there is no such thing as a negative temperature on
          the Kelvin scale, it seemed appropriate to require that the
          temperature be an unsigned number. This, however, caused a
          conflict, i.e. an ambiguity, in <tt>fc4</tt>. In fact, it caused four
          conflicts to be diagnosed, all of which have a common
          source.
<P>

<H3> Finding a Conflict </H3>
          If you run AnaGram and analyze <tt>fc4</tt> you will find that
          AnaGram finds conflicts in the grammar. To determine the nature
          of the conflicts, you should first open the Conflicts
          window. The Conflicts window is available via the Browse Menu
          or a Control Panel button.
<P>
          The first thing you see is that there are conflicts in
          states S005 and S025. In each state, one conflict occurs
          because a decimal point can either be shifted, in accordance
          with rule R021, or it can reduce rule R014, an empty rule,
          or null production. The other conflict occurs because a
          decimal digit can either be shifted, in accordance with rule
          R022, or it can reduce rule R014, the same rule that gives
          trouble with the decimal point.
<P>
          The first step in understanding the conflict is to see rules
          R014, R021, and R022 in context. The Conflicts window is
          synchronized with the syntax file window. Arrange these
          windows so you can see them both at once.
          Then, in the Conflicts window, move the cursor bar up and
          down. In the syntax file window, the cursor will move
          between productions for number, unsigned number and integer.
<P>
          Although it is possible to recognize rules R021 and R022 in
          the grammar, there is no explicit null production in the
          grammar. To find out for sure what rule R014 is, pop up the
          Rule Table (listed in the Browse menu) and look for R014. It
          turns out that R014 is the null production that corresponds
          to an optional plus sign, written '+'? in the grammar.
<P>
          To get a better idea of what is going on, it is worthwhile to
          find out what the parser is expecting to see in states S005
          or S025. To find out, click the right mouse button in the
          Conflicts window on any line describing a state S005
          conflict to pop up the Auxiliary Windows menu. Select State
          Definition to find out what state S005 is all about. It
          seems that state S005 occurs when the parser has skipped
          over the initial white space and is about to begin dealing
          with the actual input. But, looking at this, it is still not
          clear why a decimal digit or a decimal point is ambiguous.
          The Expansion Chain windows for rules R021, R022 and R014
          will show how these rules are derived from the
          characteristic rule for state S005.
<P>

          Return to the Conflicts window. Click the right mouse button
          on rule R021, the rule that expects the decimal point,
          to pop up the Auxiliary Windows menu. Select
          Expansion Chain. The Expansion Chain window shows how rule
          R021 derives from the characteristic rule for the state.
          Each line in this window is a grammar rule produced by the
          marked token  in the rule on
          the previous line.
<P>
          Now return to the Conflicts window, and get the Expansion
          Chain window for rule R014. Rearrange the windows on the
          screen so you can compare the Expansion Chain windows for
          rules R014 and R021. Note that rule R014 derives from the
          production
<PRE>
        temperature
         -&gt; number, 'c' + 'C'
</PRE>
          and rule R021 derives from the production
<PRE>
        temperature
         -&gt; unsigned number, 'k' + 'K'
</PRE>
<P>
          It is now possible to see the nature of the conflict. On the
          one hand, if the input is supposed to be a Kelvin
          temperature, the parser can go right ahead accumulating a
          number. On the other hand, if the input is supposed to be a
          Celsius temperature, the parser has to first acknowledge the
          optional plus sign.
<P>
          It is the nature of LALR parsers that they can keep track of
          many threads of possible parses simultaneously as long as
          they don't have to do any reductions. When they come to the
          end of a rule, however, they are forced to decide whether
          the rule has been successfully matched. In this case, the
          parser is at the end of a null production which arises from
          the virtual production <CODE> '+'? </CODE> in P6 and is forced to decide
          whether this null production has been matched. The conflict
          diagnostics discussed above say that if the next token is a
          digit or a decimal point, the parser cannot decide between
          several possibilities. That is, it cannot decide whether or
          not it has seen an elided plus sign. In effect the parser is
          being required, because of the null production, to make a
          premature decision as to whether a Kelvin or Celsius
          temperature is present in the input.
<P>
          The conflicts can be eliminated by rewriting the grammar so
          the parser will not come to the end of a rule and be forced
          to choose among the several threads of the parse until it
          encounters the determining letter 'f', 'c', or 'k'.
<P>
          Two ways to remove the conflict are illustrated. The first is
          found in <tt>fc4a</tt>. In this grammar, production P6 has been
          replaced with two productions, P6x and P6y. This is a
          standard method of rewriting a grammar to eliminate null
          productions. If you analyze <tt>fc4a</tt>, you will find that it no
          longer has a conflict, so this solves the problem.
<P>
          However, if you look at the <tt>fc4a</tt> grammar closely, you will
          notice that <tt>+23.7K</tt> is not acceptable input, although common
          sense suggests that you ought to be able to use a plus sign
          on Kelvin temperatures. <tt>fc4b</tt> shows another way to fix the
          grammar, one that also deals with this quibble. In this grammar,
          production P3a has been modified to allow an optional plus
          sign before a Kelvin temperature. If you analyze <tt>fc4b</tt>, you
          will find that this change also removes the conflict.
<P>
          In both these instances, the technique used to resolve the
          conflicts was to rewrite the grammar so that there are no
          differences between constructs upstream of the point where
          they diverge. Another way to put it is this: In the original
          grammar Kelvin temperatures were distinguished from
          Fahrenheit and Celsius not only by the 'K' suffix, but also by the
          optional plus sign. The essence of both fixes to the problem
          is to remove this distinction.
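<P>
          The idea of postponing the decision until the suffix letter
          can be sketched in ordinary C. The function below is a
          hand-written analogy, not AnaGram output and not an LALR
          parser: the name <tt>to_celsius</tt>, the value used for
          absolute zero, and the use of <tt>strtod</tt> (which accepts
          more number formats than the grammar does) are all
          assumptions made for illustration. The point it shows is that
          the optional plus sign is handled the same way for every
          scale, so nothing has to be decided before the letter is seen.

```c
#include <stdlib.h>

/* Hand-written analogy, not generated code. Reads an optional '+',
   then an unsigned number, and chooses the scale only when the
   suffix letter is reached. Returns 1 and stores degrees Celsius
   in *out on success; returns 0 and leaves *out alone otherwise. */
int to_celsius(const char *s, double *out) {
    char *end;
    double v, result;

    if (*s == '+') s++;                  /* optional plus, same for every scale */
    if (!((*s >= '0' && *s <= '9') || *s == '.'))
        return 0;                        /* an unsigned number must follow */
    v = strtod(s, &end);
    if (end == s) return 0;              /* e.g. a lone "." */

    switch (*end) {                      /* the determining letter */
    case 'f': case 'F': result = (v - 32.0) * 5.0 / 9.0; break;
    case 'c': case 'C': result = v;                      break;
    case 'k': case 'K': result = v - 273.15;             break; /* assumed value */
    default:  return 0;
    }
    if (end[1] != '\0') return 0;        /* nothing may follow the letter */
    *out = result;
    return 1;
}
```

<P>
          With this sketch, <tt>+23.7K</tt> and <tt>23.7K</tt> are both
          accepted, while <tt>23.7</tt> with no suffix is rejected.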
<P>
          Other AnaGram windows, available from the Auxiliary Windows
          popup menu for the Conflicts window, are also helpful in
          tracking down conflicts: Rule Derivation, Token Derivation,
          Conflict Trace and Problem States. See the help messages and your
          AnaGram User's Guide for further details.
<P>

<H3>Testing FC4A and FC4B </H3>
          Build and compile <tt>fc4a</tt> and <tt>fc4b</tt> in the
          same way you built
          previous versions. When you run the parsers, try using
          Kelvin temperatures with and without leading plus signs.
<P>
          The "test file mask" has been changed so that the File Trace
          uses files with the extension ".fc4". <tt>test.fc4</tt> can be
          used as
          input either for the parsers themselves or for the File
          Trace.
<P>


<H2> FC5 </H2>
          <tt>fc5</tt> illustrates the use of an event driven parser. The
          grammar is essentially the same as for <tt>fc4b</tt>. The primary
          difference is in the method of providing input to the
          parser. The following features of AnaGram are introduced in
          <tt>fc5</tt>:
<UL>
<LI>            event driven parser </LI>
<LI>            embedded C </LI>
<LI>            main program </LI>
<LI>            initializing and calling the parser </LI>
</UL>

          Statement C6 in the configuration section causes AnaGram to
          build an event driven parser. An event driven parser is
          first explicitly initialized and then called once for each
          input unit.
<P>
          A small main program has been included in a block of
          embedded C at the end of the file. This program calls the
          initializer and then reads characters from <CODE>stdin</CODE> and passes
          them on to the parser. Previous <tt>fc</tt> programs did not
          include a
          main program, but relied on AnaGram to create one. However,
          AnaGram does not automatically create a main program for
          parsers which are event driven, use pointer input, or have
          any embedded C. Therefore a main program is necessary in
          this syntax file.
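<P>
          The calling pattern can be sketched as follows. This is a
          hand-written stand-in, not the code AnaGram generates: the
          names <tt>init_demo</tt>, <tt>demo</tt> and
          <tt>feed_string</tt> are hypothetical, the stand-in handles
          only unsigned integers with an 'f' or 'c' suffix, and the
          way a real generated parser receives each input unit is not
          shown here. It illustrates only the division of labor: the
          caller initializes once and then hands over one input unit
          per call.

```c
#include <stdio.h>

/* Hypothetical stand-in for an event driven parser. NOT AnaGram
   output; it only mimics the calling pattern described above. */
typedef struct {
    double value;    /* number accumulated so far */
    int    digits;   /* digits seen in the current number */
    int    results;  /* temperatures recognized so far */
    double last;     /* most recent result, in degrees Celsius */
} demo_state;

static demo_state st;

/* Initializer: call once before the first input unit. */
void init_demo(void) {
    st.value = 0.0;
    st.digits = 0;
    st.results = 0;
    st.last = 0.0;
}

/* Parser entry point: call once for each input character. */
void demo(int c) {
    if (c >= '0' && c <= '9') {
        st.value = st.value * 10.0 + (c - '0');
        st.digits++;
        return;
    }
    if (st.digits > 0 && (c == 'f' || c == 'F')) {
        st.last = (st.value - 32.0) * 5.0 / 9.0;
        st.results++;
    } else if (st.digits > 0 && (c == 'c' || c == 'C')) {
        st.last = st.value;
        st.results++;
    }
    /* Any non-digit ends the current number. */
    st.value = 0.0;
    st.digits = 0;
}

/* Driver: the caller, not the parser, owns the input loop.
   Returns the number of temperatures recognized in s. */
int feed_string(const char *s) {
    init_demo();
    while (*s != '\0')
        demo(*s++);
    return st.results;
}
```

<P>
          A main program built on this stand-in would call
          <tt>init_demo</tt> and then pass each character returned by
          <tt>getchar</tt> to <tt>demo</tt> until end of file, in the
          same way that <tt>feed_string</tt> walks through a string.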
<P>
          Note that the default names for the initializer and parser
          are <CODE>init_fc5</CODE> and <CODE>fc5</CODE> respectively.
          Only event driven parsers
          require that the user explicitly call the initializer
          function.
<P>
          In addition, a global constant, "zero", was defined in the
          embedded C to provide the value of absolute zero, and the
          reduction procedures were modified to refer to "zero"
          instead of the explicit value.
<P>
          This illustrates an important point about the C parser file
          that AnaGram builds: All blocks of embedded C precede the
          reduction procedures, so that the reduction procedures can
          access all variables and definitions included in embedded
          C, no matter where they are located in the file.
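<P>
          The resulting order in the parser file can be pictured with a
          small C sketch. It is an illustration only: the function
          names are hypothetical stand-ins for reduction procedures,
          and the value given to <tt>zero</tt> is an assumption, not
          necessarily the value used in <tt>fc5</tt>.

```c
/* Stands in for a block of embedded C. In the parser file such
   blocks come first, wherever they appeared in the syntax file.
   The value is an assumption made for this sketch. */
static const double zero = -273.15;   /* absolute zero, degrees Celsius */

/* Stand-ins for reduction procedures (hypothetical names). Because
   they follow the embedded C, they can refer to zero by name
   instead of repeating the explicit value. */
double kelvin_to_celsius(double k) {
    return k + zero;
}

double kelvin_to_fahrenheit(double k) {
    return (k + zero) * 9.0 / 5.0 + 32.0;
}
```

<P>
          Changing the value of <tt>zero</tt> in one place then changes
          it for every procedure that uses it.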
<P>

<H3>Testing FC5 </H3>
          Since <tt>fc5</tt> uses essentially the same grammar as <tt>fc4b</tt>,
          you can use the same
          test files for <tt>fc5</tt> as for <tt>fc4</tt>.
</P>

<BR>

<IMG ALIGN="bottom" SRC="../images/rbline6j.gif" ALT="----------------------"
      WIDTH=1010 HEIGHT=2 >
<P>
<IMG ALIGN="right" SRC="../images/pslrb6d.gif" ALT="Parsifal Software"
                WIDTH=181 HEIGHT=25>
<BR CLEAR="right">
<P>
Back to <A HREF="../index.html">Index</A>
<P>
<ADDRESS><FONT SIZE="-1">
                  AnaGram parser generator - examples<BR>
                  Fahrenheit-Celsius Converter<BR>
                  Copyright &copy; 1993-1999, Parsifal Software. <BR>
                  All Rights Reserved.<BR>
</FONT></ADDRESS>

</BODY>
</HTML>