comparison doc/misc/html/examples/fc.html @ 0:13d2b8934445

Import AnaGram (near-)release tree into Mercurial.
author David A. Holland
date Sat, 22 Dec 2007 17:52:45 -0500
parents
children
comparison
equal deleted inserted replaced
-1:000000000000 0:13d2b8934445
1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
2 <HTML>
3 <HEAD>
4 <TITLE> Fahrenheit-Celsius Converter</TITLE>
5 </HEAD>
6
7
8
9
10 <BODY BGCOLOR="#ffffff" BACKGROUND="tilbl6h.gif"
11 TEXT="#000000" LINK="#0033CC"
12 VLINK="#CC0033" ALINK="#CC0099">
13
14 <P>
15 <IMG ALIGN="right" SRC="../images/agrsl6c.gif" ALT="AnaGram"
16 WIDTH=124 HEIGHT=30 >
17 <BR CLEAR="all">
18 Back to <A HREF="../index.html">Index</A>
19 <P>
20 <IMG ALIGN="bottom" SRC="../images/rbline6j.gif" ALT="----------------------"
21 WIDTH=1010 HEIGHT=2 >
22 <P>
23
24 <H1> Fahrenheit-Celsius Converter</H1>
25 <IMG ALIGN="bottom" SRC="../images/rbline6j.gif" ALT="----------------------"
26 WIDTH=1010 HEIGHT=2 >
27 <P>
28 <H2>Introduction</H2>
29 <P>
30 Conversion of temperatures from Fahrenheit to Celsius is a
31 traditional starting point for learning how to use a
32 programming language. This directory contains a graded sequence
33 of Fahrenheit to Celsius conversion programs, starting with
34 a very simple case and working up to one of some complexity.
35 This sequence of programs illustrates an important aspect of
36 syntax directed programming: In contrast to conventional
37 programming methods it is quite easy to begin with a simple
38 case and then extend it to more complex situations.
39 <P>
40 All of these programs accept input from <tt>stdin</tt> and write
41 output to <tt>stdout</tt>. These programs are somewhat exceptional,
42 since, except for FC5, they do not have any embedded C and
43 therefore do not require explicit definition of a main
44 program.
45 <P>
46 <tt>fc1</tt> is the first and simplest of the Fahrenheit to Celsius
47 conversion programs. It expects the user to type a positive
48 integer value, assumed to be a Fahrenheit temperature, which
49 it converts to Celsius. It then exits.
50 <P>
51 <tt>fc2</tt>, the next example, is a somewhat more interesting
52 Fahrenheit-Celsius converter. This time the input stream may
53 contain any number of temperatures, either Fahrenheit or
54 Celsius, each terminated by a newline character. The
55 program will continue until it encounters an end of file. If it
56 encounters a syntax error in the input, it will skip to the
57 next newline character and continue. <tt>fc2</tt> has been set up to
58 illustrate the usage of the File Trace feature of AnaGram.
59 <P>
60 <tt>fc3</tt> adds two new features to <tt>fc2</tt>: It uses
61 floating point
62 arithmetic, so that it can deal with non-integral values,
63 and it allows optional white space in the input, except
64 within numbers. In addition, it changes the output format,
65 so that results are printed in degrees Kelvin, as well as in
66 Fahrenheit and Celsius.
67 <font size=-1>(Yes, we know that in Newspeak the official
68 usage is "Kelvins", not "degrees Kelvin". Shush.)</font>
69 <P>
70 <tt>fc4</tt> illustrates a shift-reduce conflict which arose when
71 modifying the <tt>fc3</tt> grammar to allow input in degrees Kelvin.
72 You should probably skip this example until you encounter a
73 shift-reduce conflict in one of your own grammars. <tt>fc4a</tt> and
74 <tt>fc4b</tt> are two different resolutions of the conflict.
75 <P>
76 <tt>fc5</tt> illustrates the use of an event driven parser. The
77 actual grammar is the same as <tt>fc4b</tt>. The only difference is
78 in the method of providing input to the parser.
79 <P>
80
81 <H2>FC1</H2>
82 <tt>fc1</tt> is the first and simplest of the Fahrenheit to Celsius
83 conversion programs. It expects the user to type an integer
84 value, assumed to be a Fahrenheit temperature, which it
85 converts to Celsius. It then exits.
86 <P>
87 The following features of AnaGram are introduced in <tt>fc1</tt>:
88 <UL>
89 <LI> recursive definition of tokens </LI>
90 <LI> definition of a set as a range of characters</LI>
91 <LI> token type declaration </LI>
92 <LI> default token type </LI>
93 <LI> passing token values to reduction procedures </LI>
94 <LI> long and short form reduction procedures </LI>
95 </UL>
96
97 <P>
98 <tt>fc1</tt> defines two nonterminal tokens, "grammar", which
99 describes the entire input stream, and "integer", which
100 describes a simple unsigned integer value. "grammar", defined
101 by one production, describes the input as consisting of an
102 "integer" followed by a newline character. There is a
103 following reduction procedure to print out both the input
104 and converted values.
105 <P>
106 "integer" is recursively defined by two productions. The
107 first production says that an "integer" may be represented
108 by a single decimal digit. '0-9' represents the set of ascii
109 characters on the range '0' through '9'. The token can be
110 matched by any character from this set. The second
111 production contains the recursion. It says that the combination of
112 any "integer" followed by another decimal digit is also an
113 integer. Note that the left side is the same for these two
114 productions and it need not be repeated.
115 <P>
116 Note the type cast preceding "integer". This type cast
117 defines the data type of the semantic value of "integer" to
118 be int. When the parser stores a token value for "integer"
119 on its value stack or retrieves a value from the stack, the
120 type of the data transmitted will be int.
121 <P>
122 Since there is no type cast for the token "grammar", the
123 data type for the semantic value of "grammar" is given by
124 the "default token type" configuration parameter, which
125 defaults to void.
126 <P>
127 The semantic values of the tokens in a grammar rule may be
128 passed to the associated reduction procedure. The name of
129 the variable in the reduction procedure is simply appended
130 to the token name or expression in the rule with a colon as
131 a separator.
132 <P>
133 All three reduction procedures in <tt>fc1</tt> operate on
134 the semantic
135 values of tokens on the parser stack as parameters. In
136 the first reduction procedure, the variable <tt>f</tt> represents the
137 value of the integer typed by the user. It is taken as a
138 Fahrenheit temperature and converted to Celsius by the
139 reduction procedure. This reduction procedure uses the long
140 form, consisting of an equal sign followed by a block of C
141 code.
142 <P>
143 The reduction procedures for the two productions for "integer"
144 convert the integer from ascii form as typed by the
145 user to binary form. The first reduction procedure
146 calculates the value of a single digit integer. The second
147 reduction procedure calculates the value for an integer with more
148 than one digit. Notice that these reduction procedures both
149 use the short form: a C expression terminated by a semicolon.
150 The value of the expression is saved as the semantic
151 value of the reduced token.
152 <P>
153
154 <H3>Testing FC1</H3>
155 Run AnaGram and build a parser, <tt>fc1.c</tt>. Compile it
156 and link it
157 with your C compiler. Run <tt>fc1</tt> from the command line.
158 Type an integer and press
159 Enter. <tt>fc1</tt> will print out Fahrenheit and Celsius
160 temperatures.
161 <P>
162
163
164 <H2>FC2</H2>
165 The next example of a syntax file, <tt>fc2.syn</tt>, is a somewhat
166 more interesting Fahrenheit-Celsius converter. This time the
167 input stream may contain any number of temperatures. The
168 program will continue until it encounters an end of file. If
169 it encounters a syntax error in the input, it will skip to
170 the next newline character and continue. <tt>fc2</tt> has
171 been set up
172 to illustrate the File Trace feature of AnaGram.
173 <P>
174 The following features of AnaGram are introduced in <tt>fc2</tt>:
175 <UL>
176 <LI> configuration section </LI>
177 <LI> character set defined by set union </LI>
178 <LI> end of file token </LI>
179 <LI> virtual productions </LI>
180 <LI> use of '?' to define a virtual production </LI>
181 <LI> error token resynchronization </LI>
182 <LI> File Trace </LI>
183 </UL>
184 <P>
185 <tt>fc2</tt> allows the temperature values to be signed
186 integers which
187 may be either Fahrenheit or Celsius, as determined by a
188 following 'f' or 'c'. Each temperature value to be
189 converted must be followed by a newline. Spaces in the input
190 are not allowed.
191 <P>
192 To facilitate testing, a configuration section has been
193 added at the beginning of the file to set two configuration
194 parameters, discussed below. The lines specifying the
195 parameters have been labeled C1 and C2 at the right margin
196 to make them easy to refer to in this documentation.
197 Similarly, productions have been labeled P1, P2, etc.
198 <P>
199 After the configuration section, an end of file token, eof,
200 is defined. Remember that when using stream I/O, the end of
201 file is signalled by a -1.
202 <P>
203 The first production, P1, describes the entire input file as
204 an optional sequence of temperatures followed by an end of
205 file.
206 <P>
207 The expression <CODE>[temperature, '\n']...</CODE> is a "virtual
208 production". The square brackets indicate the rule inside the
209 brackets is optional. The ellipsis (<CODE>...</CODE>)
210 indicates that the
211 rule may be repeated an arbitrary number of times.
212 <P>
213 Productions P2 and P3 define "temperature" in the normal
214 case. Production P4 controls error recovery, described
215 below.In AnaGram, <CODE>'f' + 'F'</CODE> represents the set
216 of characters
217 containing both upper and lower case 'f'. (The plus sign is
218 the union operator of set theory.) Either an upper or lower
219 case 'f' in the input will match this set. <CODE>'c' +
220 'C'</CODE> is
221 similarly interpreted. Thus a temperature consists of a
222 number followed by an 'f' or 'c' which can be either upper
223 or lower case. The reduction procedures, of course, are
224 different for 'f' and 'c'.
225 <P>
226 Productions P5 and P6 define "number" as consisting of an
227 integer with an optional sign. <CODE>'+'?</CODE> is a virtual production.
228 The question mark indicates that the preceding element, in
229 this case the plus sign, is optional. These productions
230 allow you to write numbers in the form 17, +17, or -17.
231 <P>
232 Productions P7 and P8 define "integer" exactly as in <tt>fc1</tt>.
233 <P>
234
235 <H3> Recovering and Continuing after a Syntax Error </H3>
236 The production P4 controls error recovery. "error" is a
237 special token in an AnaGram grammar, which can be matched by
238 any sequence of input which contains a syntax error. If your
239 grammar has an error token, when your parser encounters a
240 syntax error it looks to see if there is an error token to
241 match with the syntax error. If "error" is not admissible in
242 the current state, it discards the previous token on the
243 input stack and looks again. It continues until it gets back
244 to a state where "error" is acceptable input or the stack is
245 empty. If the stack is empty, it terminates the parse.
246 Otherwise, it then looks to see if the next input token is
247 admissible. If so, the parse continues. If not, the token is
248 discarded and the parser reads input until it finds an
249 acceptable token or the end of file. In this example, the
250 parser will read characters until it finds a newline character
251 At this point the parse will continue as though nothing
252 had happened. This process is called "error token
253 resynchronization". It is one of several ways to continue after a
254 parser detects a syntax error in its input stream.
255 <P>
256
257
258 <H3> Configuration Parameters </H3>
259 Two configuration parameters have been set in the
260 configuration section of <tt>fc2</tt> to facilitate testing
261 using the File
262 Trace. The first, test file mask, limits the choice of test
263 files to be used with the File Trace option to files with
264 the extension <tt>.fc2</tt>. The second, traditional engine, turns
265 off certain optimizations AnaGram normally builds into its
266 parsers. When the traditional engine switch is set, the
267 parsers AnaGram builds use only the four traditional parser
268 actions: shift, reduce, error, and accept. Otherwise,
269 AnaGram parsers use a number of compound actions in order to
270 reduce the size of the parsing tables and increase the speed
271 of the parser. In this case, the traditional engine switch
272 has been turned on in order to make the behavior of the
273 parser as seen with the File Trace correspond to textbook
274 behavior.
275 <P>
276
277 <H3> Testing FC2 </H3>
278 Test <tt>fc2</tt> just as you did <tt>fc1</tt>: Run AnaGram
279 and build a
280 parser, <tt>fc2.c</tt>. Compile it and link it with your C
281 compiler. Run
282 <tt>fc2</tt> from the command line. Type an integer, with or
283 without a sign, the letter 'f'
284 or 'c', and press Enter. <tt>fc2</tt> will print out Fahrenheit and
285 Celsius temperatures. Repeat until you are satisfied the
286 program works. Try making a few deliberate typos to test the
287 error token resynchronization. Note that the parser
288 automatically provides error diagnostics. These diagnostics
289 are created by the default <tt>SYNTAX_ERROR</tt> macro.
290 <P>
291 Type ^Z and Enter (Windows) or ^D (Unix) to generate an end
292 of file and end the program. (Of course, you
293 could also use ^C or ^Brk.)
294 <P>
295 Alternatively, you may use the test file, <tt>test.fc2</tt>,
296 which has
297 been provided for use with the File Trace (see below).
298 Simply run <tt>fc2</tt> with input redirection to take input from
299 <tt>test.fc2</tt>. At the command prompt, type:
300 <PRE>
301 fc2 &lt; test.fc2
302 </PRE>
303 <P>
304
305 <H2> File Trace </H2>
306 The File Trace feature of AnaGram allows you to test a
307 grammar without actually building a parser. This enables you to
308 completely decouple the debugging of the grammar from the
309 debugging of the reduction procedures. You can try out test
310 files before you have written anything more than the
311 grammar. This allows for very early testing in your projects.
312 <P>
313
314 File Trace allows you to see in fine detail just exactly how
315 your parser will analyze an input file. A File Trace consists of a
316 window with various panes so you can see what is going on,
317 and an interpretive parser which works in the background.
318 <P>
319 You can select File Trace from the Action
320 menu once you have analyzed your grammar. For the <tt>fc2</tt>
321 example, you will be offered a choice of test files with
322 extension <tt>.fc2</tt>. A good choice would be <tt>test1.fc2</tt>.
323 The File Trace window will show a Parser Stack
324 pane to your left, a Test File pane, which shows you the
325 input file you are parsing, to your right, and a Rule Stack
326 pane across the bottom of the window.
327 <P>
328 The way the File Trace parser works is this: Initially none of
329 the test file has been parsed. If you double-click with the
330 left mouse button at a point in the Test File pane, the parser
331 parses to that point. The unparsed part of the file will be
332 colored differently from the parsed part (in the default
333 color scheme, parsed characters
334 have a lighter background). To back up the parse to a
335 previous location, double-click at that spot, or single-click
336 and press Enter or the Synch Parse button at the bottom of
337 the File Trace window. To check a file for syntax errors, all
338 you need to do is to click the Parse File button. If there is
339 a syntax error, the parse will not advance beyond the error
340 point. Normally, however, you will probably want to proceed
341 more deliberately, moving the cursor one character at a
342 time.
343 <P>
344 If you wish to see even finer detail, you may make the
345 parser work in single step mode. by clicking on Single Step
346 or pressing Enter. Each time you click on the Single Step
347 button or press Enter, the parser will perform one parser
348 action. Note that in its normal configuration, AnaGram
349 produces parsers that use a number of parser actions more
350 complex than the traditional shift and reduce actions.
351 <P>
352 The Parser Stack pane shows the levels of the parser stack, the
353 state numbers on the stack and the tokens that have been
354 recognized so far.
355 As you advance through the test file, you can see by
356 looking at the Parser Stack pane how the parser stack changes
357 as characters are shifted in and reductions occur. The Rule
358 Stack pane is an alternate view of the parser stack showing
359 the grammar rules in play at any moment. Notice how the
360 syntax file window is synched with the Rule Stack.
361 <P>
362 If the parse position is not located at the blinking cursor,
363 the Single Step button will be changed to read "Synch Parse".
364 Clicking on the button will move the parse position to the
365 cursor.
366 <P>
367 If you now click on tokens at various levels in the Token
368 Stack, the Test File characters corresponding to these tokens
369 will be highlighted. You can restart the parse at any level
370 by double-clicking just preceding the highlight. The Rule
371 Stack is also synched with the Token Stack and Test File
372 panes.
373 <P>
374 You may interrupt the File Trace at any time to inspect any
375 other window without interfering with the File Trace.
376 Whenever you come back to it, you can proceed as though
377 nothing has happened.
378 <P>
379 If you have a long file, a complex grammar, or a (very) slow
380 computer, it can sometimes take a while for the parse to catch
381 up with the cursor. If you have a long test file and press
382 Parse File to move the cursor to the end of the file, the
383 parser has a lot of computation to do to catch up.
384 <P>
385 For further details about the File Trace, please refer to the
386 AnaGram User's Guide and the on-line documentation.
387 <P>
388 <BR>
389
390
391 <H2> FC3 </H2>
392 <tt>fc3</tt> adds two new features to <tt>fc2</tt>: It uses
393 floating point
394 arithmetic, so that it can deal with non-integral values,
395 and it allows optional white space, including comments, in
396 the input. In addition, it changes the output format, so
397 that results are printed in degrees Kelvin, as well as in
398 Fahrenheit and Celsius.
399 <P>
400 The following features of AnaGram are introduced in <tt>fc3</tt>:
401 <UL>
402 <LI> Setting the default token type </LI>
403 <LI> The "disregard" statement </LI>
404 <LI> The "lexeme" statement </LI>
405 <LI> Right recursion </LI>
406 <LI> Default value of a reduction token </LI>
407 </UL>
408 <P>
409
410 <H3> Floating Point Arithmetic </H3>
411 To deal with floating point arithmetic, a number of new
412 productions have been added and two productions have been
413 changed. Productions P5 and P6 have been changed to define a
414 "number" in terms of an "unsigned number" instead of an
415 "integer". Productions P6a, P6b, and P6c define "unsigned
416 number" in terms of its integer part and its fraction part.
417 Productions P9 and P10 define the fraction part of the
418 number. Note that "fraction" is described using right
419 recursion rather than left recursion. This makes the reduction
420 procedures neater. Since reduction does not occur until
421 a rule is complete, note that with right recursion each new
422 digit causes the stack depth to increase by one. You can
423 observe this with the File Trace.
424 <P>
425 In order to replace integer arithmetic with floating point
426 arithmetic, statement C3 was added to the configuration
427 section of the grammar. It declares the default token type,
428 the type assigned to nonterminal tokens absent a specific
429 declaration, to be "double". Since we are not interested in
430 values for "grammar" and "temperature", they have been
431 explicitly cast to "void".
432 <P>
433 Note that production P6a does not have a reduction procedure.
434 If a rule has no reduction procedure the value of the
435 reduction token defaults to the value of the first element
436 in the rule. In this case, the value of "unsigned number",
437 in the absence of a reduction procedure, is taken to be the
438 value of "integer".
439 <P>
440
441 <H3> Skipping White Space </H3>
442 Two statements, C4 and C5, are used to skip over uninteresting
443 white space in the input to <tt>fc3</tt>. The "disregard" statement
444 instructs AnaGram to rewrite your grammar in a standard
445 way so that your parser will skip over any instance of the
446 "white space" token that occurs between lexemes, or lexical
447 units, in the input to your parser. Of course, you can use
448 any token name you wish in a "disregard" statement. You can
449 even have multiple "disregard" statements and all of the
450 tokens you specify will be disregarded.
451 <P>
452 The "lexeme" statement is used to declare that certain
453 nonterminal tokens are each to be considered as indivisible
454 lexical units, or lexemes, from the point of view of lexical
455 analysis, so that the "disregard" statement is inoperative
456 inside the nonterminal tokens listed. In this case, statement
457 C5 simply guarantees that white space will not be
458 allowed inside a number. All terminal tokens are automatically
459 lexemes. <!-- when not part of a larger lexeme. -->
460 <P>
461 <!--
462 It would be nice to rewrite this so the example expansion is
463 vaguely close to syntactically legal.
464 -->
465 To make your parser skip white space, AnaGram renames and
466 redefines the lexemes in your grammar and defines a number
467 of new tokens. For example, if '+' is a lexeme in your
468 grammar and your grammar is to disregard "white space",
469 AnaGram will rename the plus sign token as '+'%. It then
470 introduces a new production in your grammar as follows:
471 <PRE>
472 '+'
473 -&gt; '+'%, white space?...
474 </PRE>
475
476 This means that a plus sign followed by some white space
477 will now be treated the same, syntactically, as a plus sign
478 alone. The percent sign (a degree sign in AnaGram 1.x) is
479 used to indicate the original, or "pure",
480 definition of the token.
481 <P>
482 Productions P11 and P12 together define what is meant by
483 white space in this grammar. P11 defines white space to
484 include blanks and tab characters. P12 includes C style
485 comments (not nested).
486 <P>
487 The white space defined in P11 and P12 does not include
488 newline characters. There is a good reason for this. The
489 grammar uses newline characters as delimiters, marking the
490 end of a "temperature". In order to allow blank lines,
491 production P1 was modified to make the temperature optional.
492 <P>
493 To allow "//" style comments, a new token, "end of line",
494 defined by production P13, was added. In production P1, '\n'
495 was then replaced by "end of line".
496 <P>
497
498 <H3> Testing FC3 </H3>
499 The "test file mask" was changed to use files with the
500 extension ".fc3" in the File Trace. TEST.FC3 can be used as
501 input for the File Trace.
502 <P>
503 Build and compile <tt>fc3</tt> in the same way as previous versions.
504 When you run it, try using numbers with decimal fractions.
505 Try typing blanks in various places to see how the parser
506 deals with them. Try redirecting input from <tt>test.fc3</tt>.
507 <P>
508
509
510 <H2> FC4 </H2>
511 <tt>fc4</tt> illustrates a shift-reduce conflict in a grammar. You
512 should probably skip this example until you encounter a
513 shift-reduce conflict in one of your own grammars. In the
514 meantime, skip ahead to <tt>fc5</tt>.
515 <P>
516 The following features of AnaGram are introduced in <tt>fc4</tt>:
517 <UL>
518 <LI> Conflicts window </LI>
519 <LI> Auxiliary Windows menu </LI>
520 <LI> State Definition window </LI>
521 <LI> Expansion Chain window </LI>
522 </UL>
523 <P>
524 In <tt>fc3</tt>, the output was changed to provide results in degrees
525 Kelvin, as well as in Fahrenheit and Celsius. In <tt>fc4</tt>, a
526 production, P3a, is added to accept input in degrees Kelvin.
527 Since there is no such thing as a negative temperature on
528 the Kelvin scale, it seemed appropriate to require that the
529 temperature be an unsigned number. This, however, caused a
530 conflict, i.e. an ambiguity, in <tt>fc4</tt>. In fact, it caused four
531 conflicts to be diagnosed, all of which have a common
532 source.
533 <P>
534
535 <H3> Finding a Conflict </H3>
536 If you run AnaGram and analyze <tt>fc4</tt> you will find that
537 AnaGram finds conflicts in the grammar. To determine the nature
538 of the conflicts, you should first open the Conflicts
539 window. The Conflicts window is available via the Browse Menu
540 or a Control Panel button.
541 <P>
542 The first thing you see is that there are conflicts in
543 states S005 and S025. In each state, one conflict occurs
544 because a decimal point can either be shifted, in accordance
545 with rule R021, or it can reduce rule R014, an empty rule,
546 or null production. The other conflict occurs because a
547 decimal digit can either be shifted, in accordance with rule
548 R022, or it can reduce rule R014, the same rule that gives
549 trouble with the decimal point.
550 <P>
551 The first step in understanding the conflict is to see rules
552 R014, R021, and R022 in context. The Conflicts window is
553 synchronized with the syntax file window. Arrange these
554 windows so you can see them both at once.
555 Then, in the Conflicts window, move the cursor bar up and
556 down. In the syntax file window, the cursor will move
557 between productions for number, unsigned number and integer.
558 <P>
559 Although it is possible to recognize rules R021 and R022 in
560 the grammar, there is no explicit null production in the
561 grammar. To find out for sure what rule R014 is, pop up the
562 Rule Table (listed in the Browse menu) and look for R014. It
563 turns out that R014 is the null production that corresponds
564 to an optional plus sign, written '+'? in the grammar.
565 <P>
566 To get a better idea of what is going on, it is worthwhile to
567 find out what the parser is expecting to see in states S005
568 or S025. To find out, click the right mouse button in the
569 Conflicts window on any line describing a state S005
570 conflict to pop up the Auxiliary Windows menu. Select State
571 Definition to find out what state S005 is all about. It
572 seems that state S005 occurs when the parser has skipped
573 over the initial white space and is about to begin dealing
574 with the actual input. But, looking at this, it is still not
575 clear why a decimal digit or a decimal point is ambiguous.
576 The Expansion Chain windows for rules R021, R022 and R014
577 will show how these rules are derived from the
578 characteristic rule for state S005.
579 <P>
580
581 Return to the Conflicts window. Click the right mouse button
582 on rule R021, the rule that expects the decimal digit,
583 to pop up the Auxiliary Windows menu. Select
584 Expansion Chain. The Expansion Chain window shows how rule
585 R021 derives from the characteristic rule for the state.
586 Each line in this window is a grammar rule produced by the
587 marked token in the rule on
588 the previous line.
589 <P>
590 Now return to the Conflicts window, and get the Expansion
591 Chain window for rule R014. Rearrange the windows on the
592 screen so you can compare the Expansion Chain windows for
593 rules R014 and R021. Note that rule R014 derives from the
594 production
595 <PRE>
596 temperature
597 -&gt; number, 'c' + 'C'
598 </PRE>
599 and rule R021 derives from the production
600 <PRE>
601 temperature
602 -&gt; unsigned number, 'k' + 'K'
603 </PRE>
604 <P>
605 It is now possible to see the nature of the conflict. On the
606 one hand, if the input is a supposed to be a Kelvin
607 temperature, the parser can go right ahead accumulating a
608 number. On the other hand, if the input is supposed to be a
609 Celsius number, the parser has to first acknowledge the
610 optional plus sign.
611 <P>
612 It is the nature of LALR parsers that they can keep track of
613 many threads of possible parses simultaneously as long as
614 they don't have to do any reductions. When they come to the
615 end of a rule, however, they are forced to decide whether
616 the rule has been successfully matched. In this case, the
617 parser is at the end of a null production which arises from
618 the virtual production <CODE> '+'? </CODE> in P6 and is forced to decide
619 whether this null production has been matched. The conflict
620 diagnostics discussed above say that if the next token is a
621 digit or a decimal point, the parser cannot decide between
622 several possibilities. That is, it cannot decide whether or
623 not it has seen an elided plus sign. In effect the parser is
624 being required, because of the null production, to make a
625 premature decision as to whether a Kelvin or Celsius
626 temperature is present in the input.
627 <P>
628 The conflicts can be eliminated by rewriting the grammar so
629 the parser will not come to the end of a rule and be forced
630 to choose among the several threads of the parse until it
631 encounters the determining letter 'f', 'c', or 'k'.
632 <P>
633 Two ways to remove the conflict are illustrated. The first is
634 found in FC4A. In this grammar, production P6 has been
635 replaced with two productions, P6x and P6y. This is a
636 standard method of rewriting a grammar to eliminate null
637 productions. If you analyze <tt>fc4a</tt>, you will find that it no
638 longer has a conflict, so this solves the problem.
639 <P>
640 However, if you look at the <tt>fc4a</tt> grammar closely, you will
641 notice that +23.7K is not acceptable input, although common
642 sense suggests that you ought to be able to use a plus sign
643 on Kelvin temperatures. <tt>fc4b</tt> shows another way to fix the
644 grammar which deals with this quibble. In this grammar,
645 production P3a has been modified to allow an optional plus
646 sign before a Kelvin temperature. If you analyze <tt>fc4b</tt>, you
647 will find that this change also solves the problem.
648 <P>
649 In both these instances, the technique used to resolve the
650 conflicts was to rewrite the grammar so that there are no
651 differences between constructs upstream of the point where
652 they diverge. Another way to put it is this: In the original
653 grammar Kelvin temperatures were distinguished from
654 Fahrenheit and Celsius not only by the 'K' suffix, but also by the
655 optional plus sign. The essence of both fixes to the problem
656 is to remove this distinction.
657 <P>
658 Other AnaGram windows available from the Auxiliary Windows
659 popup menu for the Conflicts window such as Rule Derivation, Token
660 Derivation, Conflict Trace and Problem States are helpful in
661 tracking down conflicts. See the help messages and your
662 AnaGram User's Guide for further details.
663 <P>
664
665 <H3>Testing FC4A and FC4B </H3>
666 Build and compile <tt>fc4a</tt> and <tt>fc4b</tt> in the
667 same way you built
668 previous versions. When you run the parsers, try using
669 Kelvin temperatures with and without leading plus signs.
670 <P>
671 The "test file mask" was changed to use files with the
672 extension ".fc4" in the File Trace. <tt>test.fc4</tt> can be
673 used as
674 input either for the parsers themselves or for the File
675 Trace.
676 <P>
677
678
679 <H2> FC5 </H2>
680 <tt>fc5</tt> illustrates the use of an event driven parser. The
681 grammar is essentially the same as for <tt>fc4b</tt>. The primary
682 difference is in the method of providing input to the
683 parser. The following features of AnaGram are introduced in
684 <tt>fc5</tt>:
685 <UL>
686 <LI> event driven parser </LI>
687 <LI> embedded C </LI>
688 <LI> main program </LI>
689 <LI> initializing and calling the parser </LI>
690 </UL>
691
692 Statement C6 in the configuration section causes AnaGram to
693 build an event driven parser. An event driven parser is
694 first explicitly initialized and then called once for each
695 input unit.
696 <P>
697 A small main program has been included in a block of
698 embedded C at the end of the file. This program calls the
699 initializer and then reads characters from <CODE>stdin</CODE> and passes
700 them on to the parser. Previous <tt>fc</tt> programs did not
701 include a
702 main program, but relied on AnaGram to create one. However,
703 AnaGram does not automatically create a main program for
704 parsers which are event driven, use pointer input, or have
705 any embedded C. Therefore a main program is necessary in
706 this syntax file.
707 <P>
708 Note that the default names for the initializer and parser
709 are <CODE>init_fc5</CODE> and <CODE>fc5</CODE> respectively.
710 Only event driven parsers
711 require that the user explicitly call the initializer
712 function.
713 <P>
714 In addition, a global constant, "zero", was defined in the
715 embedded C to provide the value of absolute zero, and the
716 reduction procedures were modified to refer to "zero"
717 instead of the explicit value.
718 <P>
719 This illustrates an important point about the C parser file
720 that AnaGram builds: All blocks of embedded C precede the
721 reduction procedures, so that the reduction procedures can
722 access all variables and definitions included in embedded
723 C, no matter where they are located in the file.
724 <P>
725
726 <H3>Testing FC5 </H3>
727 Since <tt>fc5</tt> uses the same grammar as <tt>fc4</tt>,
728 you can use the same
729 test files for <tt>fc5</tt> as for <tt>fc4</tt>.
730 </P>
731
732 <BR>
733
734 <IMG ALIGN="bottom" SRC="../images/rbline6j.gif" ALT="----------------------"
735 WIDTH=1010 HEIGHT=2 >
736 <P>
737 <IMG ALIGN="right" SRC="../images/pslrb6d.gif" ALT="Parsifal Software"
738 WIDTH=181 HEIGHT=25>
739 <BR CLEAR="right">
740 <P>
741 Back to <A HREF="../index.html">Index</A>
742 <P>
743 <ADDRESS><FONT SIZE="-1">
744 AnaGram parser generator - examples<BR>
745 Fahrenheit-Celsius Converter<BR>
746 Copyright &copy; 1993-1999, Parsifal Software. <BR>
747 All Rights Reserved.<BR>
748 </FONT></ADDRESS>
749
750 </BODY>
751 </HTML>
752