AnaGram interim repo (temporary): doc/misc/html/examples/fc.html comparison

comparison doc/misc/html/examples/fc.html @ 0:13d2b8934445

Import AnaGram (near-)release tree into Mercurial.

author	David A. Holland
date	Sat, 22 Dec 2007 17:52:45 -0500 (2007-12-22)

--1:000000000000
+:13d2b8934445
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
+<HTML>
+<HEAD>
+<TITLE> Fahrenheit-Celsius Converter</TITLE>
+</HEAD>
+<BODY BGCOLOR="#ffffff" BACKGROUND="tilbl6h.gif"
+TEXT="#000000" LINK="#0033CC"
+VLINK="#CC0033" ALINK="#CC0099">
+<P>
+<IMG ALIGN="right" SRC="../images/agrsl6c.gif" ALT="AnaGram"
+WIDTH=124 HEIGHT=30 >
+<BR CLEAR="all">
+Back to <A HREF="../index.html">Index</A>
+<P>
+<IMG ALIGN="bottom" SRC="../images/rbline6j.gif" ALT="----------------------"
+WIDTH=1010 HEIGHT=2  >
+<P>
+<H1> Fahrenheit-Celsius Converter</H1>
+<IMG ALIGN="bottom" SRC="../images/rbline6j.gif" ALT="----------------------"
+WIDTH=1010 HEIGHT=2  >
+<P>
+<H2>Introduction</H2>
+<P>
+Conversion of temperatures from Fahrenheit to Celsius is a
+traditional starting point for learning how to use a
+programming language. This directory contains a graded sequence
+of Fahrenheit to Celsius conversion programs, starting with
+a very simple case and working up to one of some complexity.
+This sequence of programs illustrates an important aspect of
+syntax directed programming: In contrast to conventional
+programming methods it is quite easy to begin with a simple
+case and then extend it to more complex situations.
+<P>
+All of these programs accept input from <tt>stdin</tt> and write
+output to <tt>stdout</tt>. These programs are somewhat exceptional,
+since, except for <tt>fc5</tt>, they do not have any embedded C and
+therefore do not require explicit definition of a main
+program.
+<P>
+<tt>fc1</tt> is the first and simplest of the Fahrenheit to Celsius
+conversion programs. It expects the user to type a positive
+integer value, assumed to be a Fahrenheit temperature, which
+it converts to Celsius. It then exits.
+<P>
+<tt>fc2</tt>, the next example, is a somewhat more interesting
+Fahrenheit-Celsius converter. This time the input stream may
+contain any number of temperatures, either Fahrenheit or
+Celsius, each terminated by a newline character. The
+program will continue until it encounters an end of file. If it
+encounters a syntax error in the input, it will skip to the
+next newline character and continue. <tt>fc2</tt> has been set up to
+illustrate the usage of the File Trace feature of AnaGram.
+<P>
+<tt>fc3</tt> adds two new features to <tt>fc2</tt>: It uses
+floating point
+arithmetic, so that it can deal with non-integral values,
+and it allows optional white space in the input, except
+within numbers. In addition, it changes the output format,
+so that results are printed in degrees Kelvin, as well as in
+Fahrenheit and Celsius.
+<font size=-1>(Yes, we know that in Newspeak the official
+usage is "Kelvins", not "degrees Kelvin". Shush.)</font>
+<P>
+<tt>fc4</tt> illustrates a shift-reduce conflict which arose when
+modifying the <tt>fc3</tt> grammar to allow input in degrees Kelvin.
+You should probably skip this example until you encounter a
+shift-reduce conflict in one of your own grammars. <tt>fc4a</tt> and
+<tt>fc4b</tt> are two different resolutions of the conflict.
+<P>
+<tt>fc5</tt> illustrates the use of an event driven parser. The
+actual grammar is the same as <tt>fc4b</tt>. The only difference is
+in the method of providing input to the parser.
+<P>
+<H2>FC1</H2>
+<tt>fc1</tt> is the first and simplest of the Fahrenheit to Celsius
+conversion programs. It expects the user to type an integer
+value, assumed to be a Fahrenheit temperature, which it
+converts to Celsius. It then exits.
+<P>
+The following features of AnaGram are introduced in <tt>fc1</tt>:
+<UL>
+<LI>     recursive definition of tokens </LI>
+<LI>     definition of a set as a range of characters</LI>
+<LI>     token type declaration </LI>
+<LI>     default token type </LI>
+<LI>     passing token values to reduction procedures </LI>
+<LI>     long and short form reduction procedures </LI>
+</UL>
+<P>
+<tt>fc1</tt> defines two nonterminal tokens, "grammar", which
+describes the entire input stream, and "integer", which
+describes a simple unsigned integer value. "grammar", defined
+by one production, describes the input as consisting of an
+"integer" followed by a newline character. There is a
+following reduction procedure to print out both the input
+and converted values.
+<P>
+"integer" is recursively defined by two productions. The
+first production says that an "integer" may be represented
+by a single decimal digit. '0-9' represents the set of ASCII
+characters in the range '0' through '9'. The token can be
+matched by any character from this set. The second
+production contains the recursion. It says that the combination of
+any "integer" followed by another decimal digit is also an
+integer. Note that the left side is the same for these two
+productions and it need not be repeated.
+<P>
+Note the type cast preceding "integer". This type cast
+defines the data type of the semantic value of "integer" to
+be int. When the parser stores a token value for "integer"
+on its value stack or retrieves a value from the stack, the
+type of the data transmitted will be int.
+<P>
+Since there is no type cast for the token "grammar", the
+data type for the semantic value of "grammar" is given by
+the "default token type" configuration parameter, which
+defaults to void.
+<P>
+The semantic values of the tokens in a grammar rule may be
+passed to the associated reduction procedure. The name of
+the variable in the reduction procedure is simply appended
+to the token name or expression in the rule with a colon as
+a separator.
+<P>
+All three reduction procedures in <tt>fc1</tt> operate on
+the semantic
+values of tokens on the parser stack as parameters. In
+the first reduction procedure, the variable <tt>f</tt> represents the
+value of the integer typed by the user. It is taken as a
+Fahrenheit temperature and converted to Celsius by the
+reduction procedure. This reduction procedure uses the long
+form, consisting of an equal sign followed by a block of C
+code.
+<P>
+The reduction procedures for the two productions for "integer"
+convert the integer from ASCII form as typed by the
+user to binary form. The first reduction procedure
+calculates the value of a single digit integer. The second
+reduction procedure calculates the value for an integer with more
+than one digit. Notice that these reduction procedures both
+use the short form: a C expression terminated by a semicolon.
+The value of the expression is saved as the semantic
+value of the reduced token.
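+<P>
+As a rough illustration only, the arithmetic performed by these three
+reduction procedures can be sketched in plain C. The function names
+below are hypothetical and do not come from <tt>fc1.syn</tt>, and the
+conversion formula is the usual integer form, assumed rather than
+copied from the syntax file.

```c
#include <stdio.h>

/* Analogue of the first "integer" production: a single digit. */
int first_digit(int d) { return d - '0'; }

/* Analogue of the second "integer" production: an integer, then one more digit. */
int append_digit(int n, int d) { return 10 * n + d - '0'; }

/* Analogue of the "grammar" reduction procedure: print both temperatures. */
int report(int f)
{
    int c = 5 * (f - 32) / 9;   /* integer arithmetic truncates toward zero */
    printf("%d F = %d C\n", f, c);
    return c;
}

/* Hypothetical driver: fold a digit string left to right, as the
   left-recursive rule does. Assumes s starts with a decimal digit. */
int integer_value(const char *s)
{
    int n = first_digit(*s++);
    while (*s >= '0' && *s <= '9')
        n = append_digit(n, *s++);
    return n;
}
```

+In a generated parser the digit folding is driven by the grammar rules
+themselves; the loop in <tt>integer_value</tt> only stands in for that.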
+<P>
+<H3>Testing FC1</H3>
+Run AnaGram and build a parser, <tt>fc1.c</tt>. Compile it
+and link it
+with your C compiler. Run <tt>fc1</tt> from the command line.
+Type an integer and press
+Enter. <tt>fc1</tt> will print out Fahrenheit and Celsius
+temperatures.
+<P>
+<H2>FC2</H2>
+The next example of a syntax file, <tt>fc2.syn</tt>, is a somewhat
+more interesting Fahrenheit-Celsius converter. This time the
+input stream may contain any number of temperatures. The
+program will continue until it encounters an end of file. If
+it encounters a syntax error in the input, it will skip to
+the next newline character and continue. <tt>fc2</tt> has
+been set up
+to illustrate the File Trace feature of AnaGram.
+<P>
+The following features of AnaGram are introduced in <tt>fc2</tt>:
+<UL>
+<LI>          configuration section </LI>
+<LI>          character set defined by set union </LI>
+<LI>          end of file token </LI>
+<LI>          virtual productions </LI>
+<LI>          use of '?' to define a virtual production </LI>
+<LI>          error token resynchronization </LI>
+<LI>          File Trace </LI>
+</UL>
+<P>
+<tt>fc2</tt> allows the temperature values to be signed
+integers which
+may be either Fahrenheit or Celsius, as determined by a
+following 'f' or 'c'. Each temperature value to be
+converted must be followed by a newline. Spaces in the input
+are not allowed.
+<P>
+To facilitate testing, a configuration section has been
+added at the beginning of the file to set two configuration
+parameters, discussed below. The lines specifying the
+parameters have been labeled C1 and C2 at the right margin
+to make them easy to refer to in this documentation.
+Similarly, productions have been labeled P1, P2, etc.
+<P>
+After the configuration section, an end of file token, eof,
+is defined. Remember that when using stream I/O, the end of
+file is signalled by a -1, the usual value of <CODE>EOF</CODE> in C.
+<P>
+The first production, P1, describes the entire input file as
+an optional sequence of temperatures followed by an end of
+file.
+<P>
+The expression <CODE>[temperature, '\n']...</CODE> is a "virtual
+production". The square brackets indicate the rule inside the
+brackets is optional. The ellipsis (<CODE>...</CODE>)
+indicates that the
+rule may be repeated an arbitrary number of times.
+<P>
+Productions P2 and P3 define "temperature" in the normal
+case. Production P4 controls error recovery, described
+below. In AnaGram, <CODE>'f' + 'F'</CODE> represents the set
+of characters
+containing both upper and lower case 'f'. (The plus sign is
+the union operator of set theory.) Either an upper or lower
+case 'f' in the input will match this set. <CODE>'c' +
+'C'</CODE> is
+similarly interpreted. Thus a temperature consists of a
+number followed by an 'f' or 'c' which can be either upper
+or lower case. The reduction procedures, of course, are
+different for 'f' and 'c'.
+<P>
+Productions P5 and P6 define "number" as consisting of an
+integer with an optional sign. <CODE>'+'?</CODE> is a virtual production.
+The question mark indicates that the preceding element, in
+this case the plus sign, is optional. These productions
+allow you to write numbers in the form 17, +17, or -17.
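+<P>
+For comparison, a hand-written C routine that accepts the same three
+forms might look like the sketch below. The name
+<tt>parse_number</tt> and its interface are hypothetical; the routine
+is only an analogue of what productions P5 through P8 describe, not
+code that AnaGram generates.

```c
/* Accepts digits with an optional leading '+' or '-'.
   On success stores the value and the position after the number
   and returns 1; returns 0 if no digit is found. */
int parse_number(const char *s, int *value, const char **end)
{
    int sign = 1;
    int n = 0;

    if (*s == '+') {            /* the optional plus sign, '+'? */
        s++;
    } else if (*s == '-') {
        sign = -1;
        s++;
    }
    if (*s < '0' || *s > '9')
        return 0;               /* an integer needs at least one digit */
    while (*s >= '0' && *s <= '9')
        n = 10 * n + (*s++ - '0');

    *value = sign * n;
    *end = s;
    return 1;
}
```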
+<P>
+Productions P7 and P8 define "integer" exactly as in <tt>fc1</tt>.
+<P>
+<H3> Recovering and Continuing after a Syntax Error </H3>
+The production P4 controls error recovery. "error" is a
+special token in an AnaGram grammar, which can be matched by
+any sequence of input which contains a syntax error. If your
+grammar has an error token, when your parser encounters a
+syntax error it looks to see if there is an error token to
+match with the syntax error. If "error" is not admissible in
+the current state, it discards the previous token on the
+input stack and looks again. It continues until it gets back
+to a state where "error" is acceptable input or the stack is
+empty. If the stack is empty, it terminates the parse.
+Otherwise, it then looks to see if the next input token is
+admissible. If so, the parse continues. If not, the token is
+discarded and the parser reads input until it finds an
+acceptable token or the end of file. In this example, the
+parser will read characters until it finds a newline character.
+At this point the parse will continue as though nothing
+had happened. This process is called "error token
+resynchronization". It is one of several ways to continue after a
+parser detects a syntax error in its input stream.
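+<P>
+The visible effect of this recovery, skipping to the next newline and
+carrying on, can be imitated in a few lines of hand-written C. The
+sketch below is not how the generated parser works internally (it has
+no parser stack to unwind), and the names
+<tt>match_temperature</tt> and <tt>count_good_lines</tt> are
+hypothetical.

```c
#include <stddef.h>
#include <ctype.h>

/* Matches: optional sign, digits, one of f F c C, newline.
   Returns the position after the newline, or NULL on a syntax error. */
const char *match_temperature(const char *s)
{
    if (*s == '+' || *s == '-')
        s++;
    if (!isdigit((unsigned char)*s))
        return NULL;
    while (isdigit((unsigned char)*s))
        s++;
    if (*s != 'f' && *s != 'F' && *s != 'c' && *s != 'C')
        return NULL;
    s++;
    return *s == '\n' ? s + 1 : NULL;
}

/* Counts well-formed lines; after a bad line, discards input up to
   and including the next newline and continues. */
int count_good_lines(const char *s, int *errors)
{
    int good = 0;

    *errors = 0;
    while (*s != '\0') {
        const char *next = match_temperature(s);
        if (next != NULL) {
            good++;
            s = next;
            continue;
        }
        (*errors)++;
        while (*s != '\0' && *s != '\n')
            s++;                /* resynchronize on the newline */
        if (*s == '\n')
            s++;
    }
    return good;
}
```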
+<P>
+<H3>  Configuration Parameters </H3>
+Two configuration parameters have been set in the
+configuration section of <tt>fc2</tt> to facilitate testing
+using the File
+Trace. The first, test file mask, limits the choice of test
+files to be used with the File Trace option to files with
+the extension <tt>.fc2</tt>. The second, traditional engine, turns
+off certain optimizations AnaGram normally builds into its
+parsers. When the traditional engine switch is set, the
+parsers AnaGram builds use only the four traditional parser
+actions: shift, reduce, error, and accept. Otherwise,
+AnaGram parsers use a number of compound actions in order to
+reduce the size of the parsing tables and increase the speed
+of the parser. In this case, the traditional engine switch
+has been turned on in order to make the behavior of the
+parser as seen with the File Trace correspond to textbook
+behavior.
+<P>
+<H3> Testing FC2 </H3>
+Test <tt>fc2</tt> just as you did <tt>fc1</tt>: Run AnaGram
+and build a
+parser, <tt>fc2.c</tt>. Compile it and link it with your C
+compiler. Run
+<tt>fc2</tt> from the command line. Type an integer, with or
+without a sign, the letter 'f'
+or 'c', and press Enter. <tt>fc2</tt> will print out Fahrenheit and
+Celsius temperatures. Repeat until you are satisfied the
+program works. Try making a few deliberate typos to test the
+error token resynchronization. Note that the parser
+automatically provides error diagnostics. These diagnostics
+are created by the default <tt>SYNTAX_ERROR</tt> macro.
+<P>
+Type ^Z and Enter (Windows) or ^D (Unix) to generate an end
+of file and end the program. (Of course, you
+could also use ^C or ^Brk.)
+<P>
+Alternatively, you may use the test file, <tt>test.fc2</tt>,
+which has
+been provided for use with the File Trace (see below).
+Simply run <tt>fc2</tt> with input redirection to take input from
+<tt>test.fc2</tt>. At the command prompt, type:
+<PRE>
+fc2 &lt; test.fc2
+</PRE>
+<P>
+<H2>  File Trace </H2>
+The File Trace feature of AnaGram allows you to test a
+grammar without actually building a parser. This enables you to
+completely decouple the debugging of the grammar from the
+debugging of the reduction procedures. You can try out test
+files before you have written anything more than the
+grammar. This allows for very early testing in your projects.
+<P>
+File Trace allows you to see in fine detail just exactly how
+your parser will analyze an input file. A File Trace consists of  a
+window with various panes so you can see what is going on,
+and an interpretive parser which works in the background.
+<P>
+You can select File Trace from the Action
+menu once you have analyzed your grammar. For the <tt>fc2</tt>
+example, you will be offered a choice of test files with
+extension <tt>.fc2</tt>. A good choice would be <tt>test1.fc2</tt>.
+The File Trace window will show a Parser Stack
+pane to your left, a Test File pane, which shows you the
+input file you are parsing, to your right, and a Rule Stack
+pane across the bottom of the window.
+<P>
+The way the File Trace parser works is this: Initially none of
+the test file has been parsed. If you double-click  with the
+left mouse button at a point in the Test File pane, the parser
+parses to that point. The unparsed part of the file will be
+colored differently from the parsed part  (in the default
+color scheme, parsed characters
+have a lighter background). To back up the parse to a
+previous location, double-click at that spot, or single-click
+and press Enter or the Synch Parse button at the bottom of
+the File Trace window. To check a file for syntax errors, all
+you need to do is to click the Parse File button. If there is
+a syntax error, the parse will not advance beyond the error
+point. Normally, however, you will probably want to proceed
+more deliberately, moving the cursor one character at a
+time.
+<P>
+If you wish to see even finer detail, you may make the
+parser work in single step mode by clicking on Single Step
+or pressing Enter. Each time you click on the Single Step
+button or press Enter, the parser will perform one parser
+action. Note that in its normal configuration, AnaGram
+produces parsers that use a number of parser actions more
+complex than the traditional shift and reduce actions.
+<P>
+The Parser Stack pane shows the levels of the parser stack, the
+state numbers on the stack and the tokens that have been
+recognized so far.
+As you advance through the test file, you can see by
+looking at the Parser Stack pane how the parser stack changes
+as characters are shifted in and reductions occur. The Rule
+Stack pane is an alternate view of the parser stack showing
+the grammar rules in play at any moment. Notice how the
+syntax file window is synched with the Rule Stack.
+<P>
+If the parse position is not located at the blinking cursor,
+the Single Step button will be changed to read "Synch Parse".
+Clicking on the button will move the parse position to the
+cursor.
+<P>
+If you now click on tokens at various levels in the Token
+Stack, the Test File characters corresponding to these tokens
+will be highlighted. You can restart the parse at any level
+by double-clicking just preceding the highlight. The Rule
+Stack is also synched with the Token Stack and Test File
+panes.
+<P>
+You may interrupt the File Trace at any time to inspect any
+other window without interfering with the File Trace.
+Whenever you come back to it, you can proceed as though
+nothing has happened.
+<P>
+If you have a long file, a complex grammar, or a (very) slow
+computer, it can sometimes take a while for the parse to catch
+up with the cursor. If you have a long test file and press
+Parse File to move the cursor to the end of the file, the
+parser has a lot of computation to do to catch up.
+<P>
+For further details about the File Trace, please refer to the
+AnaGram User's Guide and the on-line documentation.
+<P>
+<BR>
+<H2> FC3 </H2>
+<tt>fc3</tt> adds two new features to <tt>fc2</tt>: It uses
+floating point
+arithmetic, so that it can deal with non-integral values,
+and it allows optional white space, including comments, in
+the input. In addition, it changes the output format, so
+that results are printed in degrees Kelvin, as well as in
+Fahrenheit and Celsius.
+<P>
+The following features of AnaGram are introduced in <tt>fc3</tt>:
+<UL>
+<LI>         Setting the default token type </LI>
+<LI>         The "disregard" statement </LI>
+<LI>         The "lexeme" statement </LI>
+<LI>         Right recursion </LI>
+<LI>         Default value of a reduction token </LI>
+</UL>
+<P>
+<H3> Floating Point Arithmetic </H3>
+To deal with floating point arithmetic, a number of new
+productions have been added and two productions have been
+changed. Productions P5 and P6 have been changed to define a
+"number" in terms of an "unsigned number" instead of an
+"integer". Productions P6a, P6b, and P6c define "unsigned
+number" in terms of its integer part and its fraction part.
+Productions P9 and P10 define the fraction part of the
+number. Note that "fraction" is described using right
+recursion rather than left recursion. This makes the reduction
+procedures neater. Since reduction does not occur until
+a rule is complete, note that with right recursion each new
+digit causes the stack depth to increase by one. You can
+observe this with the File Trace.
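+<P>
+A recursive C function shows why right recursion suits a fraction:
+each digit contributes its own value plus the value of everything to
+its right, all divided by ten. This is a sketch only; the function
+name is hypothetical and the actual reduction procedures in
+<tt>fc3.syn</tt> may be written differently.

```c
/* Value of a string of fraction digits, e.g. "375" yields 0.375.
   Mirrors a right-recursive rule: fraction -> digit | digit, fraction.
   Nothing is computed until the last digit has been seen, so the call
   depth grows by one per digit, just as the parser stack does. */
double fraction_value(const char *s)
{
    if (*s < '0' || *s > '9')
        return 0.0;             /* no more digits */
    return ((*s - '0') + fraction_value(s + 1)) / 10.0;
}
```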
+<P>
+In order to replace integer arithmetic with floating point
+arithmetic, statement C3 was added to the configuration
+section of the grammar. It declares the default token type,
+the type assigned to nonterminal tokens absent a specific
+declaration, to be "double". Since we are not interested in
+values for "grammar" and "temperature", they have been
+explicitly cast to "void".
+<P>
+Note that production P6a does not have a reduction procedure.
+If a rule has no reduction procedure the value of the
+reduction token defaults to the value of the first element
+in the rule. In this case, the value of "unsigned number",
+in the absence of a reduction procedure, is taken to be the
+value of "integer".
+<P>
+<H3> Skipping White Space </H3>
+Two statements, C4 and C5, are used to skip over uninteresting
+white space in the input to <tt>fc3</tt>. The "disregard" statement
+instructs AnaGram to rewrite your grammar in a standard
+way so that your parser will skip over any instance of the
+"white space" token that occurs between lexemes, or lexical
+units, in the input to your parser. Of course, you can use
+any token name you wish in a "disregard" statement. You can
+even have multiple "disregard" statements and all of the
+tokens you specify will be disregarded.
+<P>
+The "lexeme" statement is used to declare that certain
+nonterminal tokens are each to be considered as indivisible
+lexical units, or lexemes, from the point of view of lexical
+analysis, so that the "disregard" statement is inoperative
+inside the nonterminal tokens listed. In this case, statement
+C5 simply guarantees that white space will not be
+allowed inside a number. All terminal tokens are automatically
+lexemes. <!-- when not part of a larger lexeme. -->
+<P>
+<!--
+It would be nice to rewrite this so the example expansion is
+vaguely close to syntactically legal.
+-->
+To make your parser skip white space, AnaGram renames and
+redefines the lexemes in your grammar and defines a number
+of new tokens. For example, if '+' is a lexeme in your
+grammar and your grammar is to disregard "white space",
+AnaGram will rename the plus sign token as '+'%. It then
+introduces a new production in your grammar as follows:
+<PRE>
+'+'
+-&gt; '+'%, white space?...
+</PRE>
+This means that a plus sign followed by some white space
+will now be treated the same, syntactically, as a plus sign
+alone. The percent sign (a degree sign in AnaGram 1.x) is
+used to indicate the original, or "pure",
+definition of the token.
+<P>
+Productions P11 and P12 together define what is meant by
+white space in this grammar. P11 defines white space to
+include blanks and tab characters. P12 includes C style
+comments (not nested).
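+<P>
+In hand-written C, the same notion of white space could be skipped
+with a routine like the following sketch. The name
+<tt>skip_white_space</tt> is hypothetical, and leaving an unterminated
+comment in place for the caller to report is a choice made for this
+example, not something taken from the grammar.

```c
/* Skips blanks, tabs and C style comments (not nested).
   Newlines are deliberately not skipped. Returns the position of the
   first character that is not white space. */
const char *skip_white_space(const char *s)
{
    for (;;) {
        if (*s == ' ' || *s == '\t') {
            s++;
        } else if (s[0] == '/' && s[1] == '*') {
            const char *p = s + 2;
            while (*p != '\0' && !(p[0] == '*' && p[1] == '/'))
                p++;            /* the first close marker ends the comment */
            if (*p == '\0')
                return s;       /* unterminated comment: leave it alone */
            s = p + 2;
        } else {
            return s;
        }
    }
}
```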
+<P>
+The white space defined in P11 and P12 does not include
+newline characters. There is a good reason for this. The
+grammar uses newline characters as delimiters, marking the
+end of a "temperature". In order to allow blank lines,
+production P1 was modified to make the temperature optional.
+<P>
+To allow "//" style comments, a new token, "end of line",
+defined by production P13, was added. In production P1, '\n'
+was then replaced by "end of line".
+<P>
+<H3> Testing FC3 </H3>
+The "test file mask" was changed to use files with the
+extension ".fc3" in the File Trace. <tt>test.fc3</tt> can be used as
+input for the File Trace.
+<P>
+Build and compile <tt>fc3</tt> in the same way as previous versions.
+When you run it, try using numbers with decimal fractions.
+Try typing blanks in various places to see how the parser
+deals with them. Try redirecting input from <tt>test.fc3</tt>.
+<P>
+<H2> FC4 </H2>
+<tt>fc4</tt> illustrates a shift-reduce conflict in a grammar. You
+should probably skip this example until you encounter a
+shift-reduce conflict in one of your own grammars. In the
+meantime, skip ahead to <tt>fc5</tt>.
+<P>
+The following features of AnaGram are introduced in <tt>fc4</tt>:
+<UL>
+<LI>            Conflicts window </LI>
+<LI>            Auxiliary Windows menu </LI>
+<LI>            State Definition window </LI>
+<LI>            Expansion Chain window </LI>
+</UL>
+<P>
+In <tt>fc3</tt>, the output was changed to provide results in degrees
+Kelvin, as well as in Fahrenheit and Celsius. In <tt>fc4</tt>, a
+production, P3a, is added to accept input in degrees Kelvin.
+Since there is no such thing as a negative temperature on
+the Kelvin scale, it seemed appropriate to require that the
+temperature be an unsigned number. This, however, caused a
+conflict, i.e. an ambiguity, in <tt>fc4</tt>. In fact, it caused four
+conflicts to be diagnosed, all of which have a common
+source.
+<P>
+<H3> Finding a Conflict </H3>
+If you run AnaGram and analyze <tt>fc4</tt> you will find that
+AnaGram finds conflicts in the grammar. To determine the nature
+of the conflicts, you should first open the Conflicts
+window. The Conflicts window is available via the Browse Menu
+or a Control Panel button.
+<P>
+The first thing you see is that there are conflicts in
+states S005 and S025. In each state, one conflict occurs
+because a decimal point can either be shifted, in accordance
+with rule R021, or it can reduce rule R014, an empty rule,
+or null production. The other conflict occurs because a
+decimal digit can either be shifted, in accordance with rule
+R022, or it can reduce rule R014, the same rule that gives
+trouble with the decimal point.
+<P>
+The first step in understanding the conflict is to see rules
+R014, R021, and R022 in context. The Conflicts window is
+synchronized with the syntax file window. Arrange these
+windows so you can see them both at once.
+Then, in the Conflicts window, move the cursor bar up and
+down. In the syntax file window, the cursor will move
+between productions for number, unsigned number and integer.
+<P>
+Although it is possible to recognize rules R021 and R022 in
+the grammar, there is no explicit null production in the
+grammar. To find out for sure what rule R014 is, pop up the
+Rule Table (listed in the Browse menu) and look for R014. It
+turns out that R014 is the null production that corresponds
+to an optional plus sign, written '+'? in the grammar.
+<P>
+To get a better idea of what is going on, it is worthwhile to
+find out what the parser is expecting to see in states S005
+or S025. To find out, click the right mouse button in the
+Conflicts window on any line describing a state S005
+conflict to pop up the Auxiliary Windows menu. Select State
+Definition to find out what state S005 is all about. It
+seems that state S005 occurs when the parser has skipped
+over the initial white space and is about to begin dealing
+with the actual input. But, looking at this, it is still not
+clear why a decimal digit or a decimal point is ambiguous.
+The Expansion Chain windows for rules R021, R022 and R014
+will show how these rules are derived from the
+characteristic rule for state S005.
+<P>
+Return to the Conflicts window. Click the right mouse button
+on rule R021, the rule that expects the decimal point,
+to pop up the Auxiliary Windows menu. Select
+Expansion Chain. The Expansion Chain window shows how rule
+R021 derives from the characteristic rule for the state.
+Each line in this window is a grammar rule produced by the
+marked token  in the rule on
+the previous line.
+<P>
+Now return to the Conflicts window, and get the Expansion
+Chain window for rule R014. Rearrange the windows on the
+screen so you can compare the Expansion Chain windows for
+rules R014 and R021. Note that rule R014 derives from the
+production
+<PRE>
+temperature
+-&gt; number, 'c' + 'C'
+</PRE>
+and rule R021 derives from the production
+<PRE>
+temperature
+-&gt; unsigned number, 'k' + 'K'
+</PRE>
+<P>
+It is now possible to see the nature of the conflict. On the
+one hand, if the input is supposed to be a Kelvin
+temperature, the parser can go right ahead accumulating a
+number. On the other hand, if the input is supposed to be a
+Celsius temperature, the parser has to first acknowledge the
+optional plus sign.
+<P>
+It is the nature of LALR parsers that they can keep track of
+many threads of possible parses simultaneously as long as
+they don't have to do any reductions. When they come to the
+end of a rule, however, they are forced to decide whether
+the rule has been successfully matched. In this case, the
+parser is at the end of a null production which arises from
+the virtual production <CODE> '+'? </CODE> in P6 and is forced to decide
+whether this null production has been matched. The conflict
+diagnostics discussed above say that if the next token is a
+digit or a decimal point, the parser cannot decide between
+several possibilities. That is, it cannot decide whether or
+not it has seen an elided plus sign. In effect the parser is
+being required, because of the null production, to make a
+premature decision as to whether a Kelvin or Celsius
+temperature is present in the input.
+<P>
+The conflicts can be eliminated by rewriting the grammar so
+the parser will not come to the end of a rule and be forced
+to choose among the several threads of the parse until it
+encounters the determining letter 'f', 'c', or 'k'.
+<P>
+Two ways to remove the conflict are illustrated. The first is
+found in <tt>fc4a</tt>. In this grammar, production P6 has been
+replaced with two productions, P6x and P6y. This is a
+standard method of rewriting a grammar to eliminate null
+productions. If you analyze <tt>fc4a</tt>, you will find that it no
+longer has a conflict, so this solves the problem.
+<P>
+However, if you look at the <tt>fc4a</tt> grammar closely, you will
+notice that +23.7K is not acceptable input, although common
+sense suggests that you ought to be able to use a plus sign
+on Kelvin temperatures. <tt>fc4b</tt> shows another way to fix the
+grammar which deals with this quibble. In this grammar,
+production P3a has been modified to allow an optional plus
+sign before a Kelvin temperature. If you analyze <tt>fc4b</tt>, you
+will find that this change also solves the problem.
+<P>
+In both these instances, the technique used to resolve the
+conflicts was to rewrite the grammar so that there are no
+differences between constructs upstream of the point where
+they diverge. Another way to put it is this: In the original
+grammar Kelvin temperatures were distinguished from
+Fahrenheit and Celsius not only by the 'K' suffix, but also by the
+optional plus sign. The essence of both fixes to the problem
+is to remove this distinction.
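+<P>
+The same idea can be seen in a hand-written recognizer: read the sign
+and the digits in exactly the same way for every scale, and decide
+only when the suffix letter arrives. The sketch below is a loose
+analogue of the <tt>fc4b</tt> approach, not a transcription of it. The
+name <tt>classify</tt> is hypothetical, only whole numbers are
+handled, and a minus sign on a Kelvin value is rejected by a check
+after the fact rather than by the grammar.

```c
#include <ctype.h>

/* Returns 'f', 'c' or 'k' for a well-formed temperature, 0 otherwise. */
int classify(const char *s)
{
    int negative = 0;

    if (*s == '+') {
        s++;                    /* plus sign allowed on every scale */
    } else if (*s == '-') {
        negative = 1;
        s++;
    }
    if (!isdigit((unsigned char)*s))
        return 0;
    while (isdigit((unsigned char)*s))
        s++;

    /* Only now, at the determining letter, is the scale decided. */
    switch (tolower((unsigned char)*s)) {
    case 'f': return 'f';
    case 'c': return 'c';
    case 'k': return negative ? 0 : 'k';
    default:  return 0;
    }
}
```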
+<P>
+Other AnaGram windows available from the Auxiliary Windows
+popup menu for the Conflicts window such as Rule Derivation, Token
+Derivation, Conflict Trace and Problem States are helpful in
+tracking down conflicts. See the help messages and your
+AnaGram User's Guide for further details.
+<P>
+<H3>Testing FC4A and FC4B </H3>
+Build and compile <tt>fc4a</tt> and <tt>fc4b</tt> in the
+same way you built
+previous versions. When you run the parsers, try using
+Kelvin temperatures with and without leading plus signs.
+<P>
+The "test file mask" was changed to use files with the
+extension ".fc4" in the File Trace. <tt>test.fc4</tt> can be
+used as
+input either for the parsers themselves or for the File
+Trace.
+<P>
+<H2> FC5 </H2>
+<tt>fc5</tt> illustrates the use of an event driven parser. The
+grammar is essentially the same as for <tt>fc4b</tt>. The primary
+difference is in the method of providing input to the
+parser. The following features of AnaGram are introduced in
+<tt>fc5</tt>:
+<UL>
+<LI>            event driven parser </LI>
+<LI>            embedded C </LI>
+<LI>            main program </LI>
+<LI>            initializing and calling the parser </LI>
+</UL>
+Statement C6 in the configuration section causes AnaGram to
+build an event driven parser. An event driven parser is
+first explicitly initialized and then called once for each
+input unit.
+<P>
+A small main program has been included in a block of
+embedded C at the end of the file. This program calls the
+initializer and then reads characters from <CODE>stdin</CODE> and passes
+them on to the parser. Previous <tt>fc</tt> programs did not
+include a
+main program, but relied on AnaGram to create one. However,
+AnaGram does not automatically create a main program for
+parsers which are event driven, use pointer input, or have
+any embedded C. Therefore a main program is necessary in
+this syntax file.
+<P>
+Note that the default names for the initializer and parser
+are <CODE>init_fc5</CODE> and <CODE>fc5</CODE> respectively.
+Only event driven parsers
+require that the user explicitly call the initializer
+function.
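+<P>
+The calling pattern, initialize once and then hand over one input
+unit per call, can be sketched with a toy stand-in for the parser.
+Everything named <tt>demo</tt> below is hypothetical: the real
+<CODE>init_fc5</CODE> and <CODE>fc5</CODE> are generated by AnaGram,
+and how they receive each character is not shown here. The toy
+"parser" merely adds up one unsigned integer per line.

```c
#include <stdio.h>

/* Hypothetical parser state, kept between calls. */
struct demo_state {
    long current;   /* integer being accumulated */
    long total;     /* sum of all completed lines */
    int  done;      /* set once end of file has been seen */
};
struct demo_state demo;

/* Must be called once before the first input unit. */
void demo_init(void)
{
    demo.current = 0;
    demo.total = 0;
    demo.done = 0;
}

/* Called once for each input unit, including the end of file marker. */
void demo_parse(int c)
{
    if (c >= '0' && c <= '9') {
        demo.current = 10 * demo.current + (c - '0');
    } else if (c == '\n') {
        demo.total += demo.current;
        demo.current = 0;
    } else if (c == EOF) {
        demo.done = 1;
    }
    /* any other character is ignored in this sketch */
}

/* Driver in the style of a small main program: read characters from
   a stream and pass each one on to the parser. */
void demo_run(FILE *in)
{
    int c;

    demo_init();
    do {
        c = getc(in);
        demo_parse(c);
    } while (c != EOF);
}
```

+A main program would simply call <tt>demo_run(stdin)</tt>. Because the
+caller owns the loop, the input could just as well come from a
+keyboard handler or a message queue.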
+<P>
+In addition, a global constant, "zero", was defined in the
+embedded C to provide the value of absolute zero, and the
+reduction procedures were modified to refer to "zero"
+instead of the explicit value.
+<P>
+This illustrates an important point about the C parser file
+that AnaGram builds: All blocks of embedded C precede the
+reduction procedures, so that the reduction procedures can
+access all variables and definitions included in embedded
+C, no matter where they are located in the file.
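+<P>
+A block of embedded C defining such a constant might resemble the
+sketch below. The name <tt>zero</tt> comes from the text above; that
+it holds absolute zero in degrees Celsius, and the helper function
+names, are assumptions made for this example.

```c
/* Absolute zero expressed in degrees Celsius (assumed unit). */
const double zero = -273.15;

/* Helpers that refer to "zero" instead of the explicit value. */
double f_to_c(double f) { return 5.0 * (f - 32.0) / 9.0; }
double c_to_k(double c) { return c - zero; }
double k_to_c(double k) { return k + zero; }
```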
+<P>
+<H3>Testing FC5 </H3>
+Since <tt>fc5</tt> uses essentially the same grammar as <tt>fc4b</tt>,
+you can use the same
+test files for <tt>fc5</tt> as for <tt>fc4a</tt> and <tt>fc4b</tt>.
+</P>
+<BR>
+<IMG ALIGN="bottom" SRC="../images/rbline6j.gif" ALT="----------------------"
+WIDTH=1010 HEIGHT=2 >
+<P>
+<IMG ALIGN="right" SRC="../images/pslrb6d.gif" ALT="Parsifal Software"
+WIDTH=181 HEIGHT=25>
+<BR CLEAR="right">
+<P>
+Back to <A HREF="../index.html">Index</A>
+<P>
+<ADDRESS><FONT SIZE="-1">
+AnaGram parser generator - examples<BR>
+Fahrenheit-Celsius Converter<BR>
+Copyright &copy; 1993-1999, Parsifal Software. <BR>
+All Rights Reserved.<BR>
+</FONT></ADDRESS>
+</BODY>
+</HTML>
