Mercurial > ~dholland > hg > ag > index.cgi
comparison doc/misc/html/examples/fc.html @ 0:13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
author | David A. Holland |
---|---|
date | Sat, 22 Dec 2007 17:52:45 -0500 |
parents | |
children |
comparison
equal
deleted
inserted
replaced
-1:000000000000 | 0:13d2b8934445 |
---|---|
1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN"> | |
2 <HTML> | |
3 <HEAD> | |
4 <TITLE> Fahrenheit-Celsius Converter</TITLE> | |
5 </HEAD> | |
6 | |
7 | |
8 | |
9 | |
10 <BODY BGCOLOR="#ffffff" BACKGROUND="tilbl6h.gif" | |
11 TEXT="#000000" LINK="#0033CC" | |
12 VLINK="#CC0033" ALINK="#CC0099"> | |
13 | |
14 <P> | |
15 <IMG ALIGN="right" SRC="../images/agrsl6c.gif" ALT="AnaGram" | |
16 WIDTH=124 HEIGHT=30 > | |
17 <BR CLEAR="all"> | |
18 Back to <A HREF="../index.html">Index</A> | |
19 <P> | |
20 <IMG ALIGN="bottom" SRC="../images/rbline6j.gif" ALT="----------------------" | |
21 WIDTH=1010 HEIGHT=2 > | |
22 <P> | |
23 | |
24 <H1> Fahrenheit-Celsius Converter</H1> | |
25 <IMG ALIGN="bottom" SRC="../images/rbline6j.gif" ALT="----------------------" | |
26 WIDTH=1010 HEIGHT=2 > | |
27 <P> | |
28 <H2>Introduction</H2> | |
29 <P> | |
30 Conversion of temperatures from Fahrenheit to Celsius is a | |
31 traditional starting point for learning how to use a | |
32 programming language. This directory contains a graded sequence | |
33 of Fahrenheit to Celsius conversion programs, starting with | |
34 a very simple case and working up to one of some complexity. | |
35 This sequence of programs illustrates an important aspect of | |
36 syntax directed programming: In contrast to conventional | |
37 programming methods it is quite easy to begin with a simple | |
38 case and then extend it to more complex situations. | |
39 <P> | |
40 All of these programs accept input from <tt>stdin</tt> and write | |
41 output to <tt>stdout</tt>. These programs are somewhat exceptional, | |
42 since, except for FC5, they do not have any embedded C and | |
43 therefore do not require explicit definition of a main | |
44 program. | |
45 <P> | |
46 <tt>fc1</tt> is the first and simplest of the Fahrenheit to Celsius | |
47 conversion programs. It expects the user to type a positive | |
48 integer value, assumed to be a Fahrenheit temperature, which | |
49 it converts to Celsius. It then exits. | |
50 <P> | |
51 <tt>fc2</tt>, the next example, is a somewhat more interesting | |
52 Fahrenheit-Celsius converter. This time the input stream may | |
53 contain any number of temperatures, either Fahrenheit or | |
54 Celsius, each terminated by a newline character. The | |
55 program will continue until it encounters an end of file. If it | |
56 encounters a syntax error in the input, it will skip to the | |
57 next newline character and continue. <tt>fc2</tt> has been set up to | |
58 illustrate the usage of the File Trace feature of AnaGram. | |
59 <P> | |
60 <tt>fc3</tt> adds two new features to <tt>fc2</tt>: It uses | |
61 floating point | |
62 arithmetic, so that it can deal with non-integral values, | |
63 and it allows optional white space in the input, except | |
64 within numbers. In addition, it changes the output format, | |
65 so that results are printed in degrees Kelvin, as well as in | |
66 Fahrenheit and Celsius. | |
67 <font size=-1>(Yes, we know that in Newspeak the official | |
68 usage is "Kelvins", not "degrees Kelvin". Shush.)</font> | |
69 <P> | |
70 <tt>fc4</tt> illustrates a shift-reduce conflict which arose when | |
71 modifying the <tt>fc3</tt> grammar to allow input in degrees Kelvin. | |
72 You should probably skip this example until you encounter a | |
73 shift-reduce conflict in one of your own grammars. <tt>fc4a</tt> and | |
74 <tt>fc4b</tt> are two different resolutions of the conflict. | |
75 <P> | |
76 <tt>fc5</tt> illustrates the use of an event driven parser. The | |
77 actual grammar is the same as <tt>fc4b</tt>. The only difference is | |
78 in the method of providing input to the parser. | |
79 <P> | |
80 | |
81 <H2>FC1</H2> | |
82 <tt>fc1</tt> is the first and simplest of the Fahrenheit to Celsius | |
83 conversion programs. It expects the user to type an integer | |
84 value, assumed to be a Fahrenheit temperature, which it | |
85 converts to Celsius. It then exits. | |
86 <P> | |
87 The following features of AnaGram are introduced in <tt>fc1</tt>: | |
88 <UL> | |
89 <LI> recursive definition of tokens </LI> | |
90 <LI> definition of a set as a range of characters</LI> | |
91 <LI> token type declaration </LI> | |
92 <LI> default token type </LI> | |
93 <LI> passing token values to reduction procedures </LI> | |
94 <LI> long and short form reduction procedures </LI> | |
95 </UL> | |
96 | |
97 <P> | |
98 <tt>fc1</tt> defines two nonterminal tokens, "grammar", which | |
99 describes the entire input stream, and "integer", which | |
100 describes a simple unsigned integer value. "grammar", defined | |
101 by one production, describes the input as consisting of an | |
102 "integer" followed by a newline character. There is a | |
103 following reduction procedure to print out both the input | |
104 and converted values. | |
105 <P> | |
106 "integer" is recursively defined by two productions. The | |
107 first production says that an "integer" may be represented | |
108 by a single decimal digit. '0-9' represents the set of ascii | |
109 characters on the range '0' through '9'. The token can be | |
110 matched by any character from this set. The second | |
111 production contains the recursion. It says that the combination of | |
112 any "integer" followed by another decimal digit is also an | |
113 integer. Note that the left side is the same for these two | |
114 productions and it need not be repeated. | |
115 <P> | |
116 Note the type cast preceding "integer". This type cast | |
117 defines the data type of the semantic value of "integer" to | |
118 be int. When the parser stores a token value for "integer" | |
119 on its value stack or retrieves a value from the stack, the | |
120 type of the data transmitted will be int. | |
121 <P> | |
122 Since there is no type cast for the token "grammar", the | |
123 data type for the semantic value of "grammar" is given by | |
124 the "default token type" configuration parameter, which | |
125 defaults to void. | |
126 <P> | |
127 The semantic values of the tokens in a grammar rule may be | |
128 passed to the associated reduction procedure. The name of | |
129 the variable in the reduction procedure is simply appended | |
130 to the token name or expression in the rule with a colon as | |
131 a separator. | |
132 <P> | |
133 All three reduction procedures in <tt>fc1</tt> operate on | |
134 the semantic | |
135 values of tokens on the parser stack as parameters. In | |
136 the first reduction procedure, the variable <tt>f</tt> represents the | |
137 value of the integer typed by the user. It is taken as a | |
138 Fahrenheit temperature and converted to Celsius by the | |
139 reduction procedure. This reduction procedure uses the long | |
140 form, consisting of an equal sign followed by a block of C | |
141 code. | |
142 <P> | |
143 The reduction procedures for the two productions for "integer" | |
144 convert the integer from ascii form as typed by the | |
145 user to binary form. The first reduction procedure | |
146 calculates the value of a single digit integer. The second | |
147 reduction procedure calculates the value for an integer with more | |
148 than one digit. Notice that these reduction procedures both | |
149 use the short form: a C expression terminated by a semicolon. | |
150 The value of the expression is saved as the semantic | |
151 value of the reduced token. | |
152 <P> | |
153 | |
154 <H3>Testing FC1</H3> | |
155 Run AnaGram and build a parser, <tt>fc1.c</tt>. Compile it | |
156 and link it | |
157 with your C compiler. Run <tt>fc1</tt> from the command line. | |
158 Type an integer and press | |
159 Enter. <tt>fc1</tt> will print out Fahrenheit and Celsius | |
160 temperatures. | |
161 <P> | |
162 | |
163 | |
164 <H2>FC2</H2> | |
165 The next example of a syntax file, <tt>fc2.syn</tt>, is a somewhat | |
166 more interesting Fahrenheit-Celsius converter. This time the | |
167 input stream may contain any number of temperatures. The | |
168 program will continue until it encounters an end of file. If | |
169 it encounters a syntax error in the input, it will skip to | |
170 the next newline character and continue. <tt>fc2</tt> has | |
171 been set up | |
172 to illustrate the File Trace feature of AnaGram. | |
173 <P> | |
174 The following features of AnaGram are introduced in <tt>fc2</tt>: | |
175 <UL> | |
176 <LI> configuration section </LI> | |
177 <LI> character set defined by set union </LI> | |
178 <LI> end of file token </LI> | |
179 <LI> virtual productions </LI> | |
180 <LI> use of '?' to define a virtual production </LI> | |
181 <LI> error token resynchronization </LI> | |
182 <LI> File Trace </LI> | |
183 </UL> | |
184 <P> | |
185 <tt>fc2</tt> allows the temperature values to be signed | |
186 integers which | |
187 may be either Fahrenheit or Celsius, as determined by a | |
188 following 'f' or 'c'. Each temperature value to be | |
189 converted must be followed by a newline. Spaces in the input | |
190 are not allowed. | |
191 <P> | |
192 To facilitate testing, a configuration section has been | |
193 added at the beginning of the file to set two configuration | |
194 parameters, discussed below. The lines specifying the | |
195 parameters have been labeled C1 and C2 at the right margin | |
196 to make them easy to refer to in this documentation. | |
197 Similarly, productions have been labeled P1, P2, etc. | |
198 <P> | |
199 After the configuration section, an end of file token, eof, | |
200 is defined. Remember that when using stream I/O, the end of | |
201 file is signalled by a -1. | |
202 <P> | |
203 The first production, P1, describes the entire input file as | |
204 an optional sequence of temperatures followed by an end of | |
205 file. | |
206 <P> | |
207 The expression <CODE>[temperature, '\n']...</CODE> is a "virtual | |
208 production". The square brackets indicate the rule inside the | |
209 brackets is optional. The ellipsis (<CODE>...</CODE>) | |
210 indicates that the | |
211 rule may be repeated an arbitrary number of times. | |
212 <P> | |
213 Productions P2 and P3 define "temperature" in the normal | |
214 case. Production P4 controls error recovery, described | |
215 below.In AnaGram, <CODE>'f' + 'F'</CODE> represents the set | |
216 of characters | |
217 containing both upper and lower case 'f'. (The plus sign is | |
218 the union operator of set theory.) Either an upper or lower | |
219 case 'f' in the input will match this set. <CODE>'c' + | |
220 'C'</CODE> is | |
221 similarly interpreted. Thus a temperature consists of a | |
222 number followed by an 'f' or 'c' which can be either upper | |
223 or lower case. The reduction procedures, of course, are | |
224 different for 'f' and 'c'. | |
225 <P> | |
226 Productions P5 and P6 define "number" as consisting of an | |
227 integer with an optional sign. <CODE>'+'?</CODE> is a virtual production. | |
228 The question mark indicates that the preceding element, in | |
229 this case the plus sign, is optional. These productions | |
230 allow you to write numbers in the form 17, +17, or -17. | |
231 <P> | |
232 Productions P7 and P8 define "integer" exactly as in <tt>fc1</tt>. | |
233 <P> | |
234 | |
235 <H3> Recovering and Continuing after a Syntax Error </H3> | |
236 The production P4 controls error recovery. "error" is a | |
237 special token in an AnaGram grammar, which can be matched by | |
238 any sequence of input which contains a syntax error. If your | |
239 grammar has an error token, when your parser encounters a | |
240 syntax error it looks to see if there is an error token to | |
241 match with the syntax error. If "error" is not admissible in | |
242 the current state, it discards the previous token on the | |
243 input stack and looks again. It continues until it gets back | |
244 to a state where "error" is acceptable input or the stack is | |
245 empty. If the stack is empty, it terminates the parse. | |
246 Otherwise, it then looks to see if the next input token is | |
247 admissible. If so, the parse continues. If not, the token is | |
248 discarded and the parser reads input until it finds an | |
249 acceptable token or the end of file. In this example, the | |
250 parser will read characters until it finds a newline character | |
251 At this point the parse will continue as though nothing | |
252 had happened. This process is called "error token | |
253 resynchronization". It is one of several ways to continue after a | |
254 parser detects a syntax error in its input stream. | |
255 <P> | |
256 | |
257 | |
258 <H3> Configuration Parameters </H3> | |
259 Two configuration parameters have been set in the | |
260 configuration section of <tt>fc2</tt> to facilitate testing | |
261 using the File | |
262 Trace. The first, test file mask, limits the choice of test | |
263 files to be used with the File Trace option to files with | |
264 the extension <tt>.fc2</tt>. The second, traditional engine, turns | |
265 off certain optimizations AnaGram normally builds into its | |
266 parsers. When the traditional engine switch is set, the | |
267 parsers AnaGram builds use only the four traditional parser | |
268 actions: shift, reduce, error, and accept. Otherwise, | |
269 AnaGram parsers use a number of compound actions in order to | |
270 reduce the size of the parsing tables and increase the speed | |
271 of the parser. In this case, the traditional engine switch | |
272 has been turned on in order to make the behavior of the | |
273 parser as seen with the File Trace correspond to textbook | |
274 behavior. | |
275 <P> | |
276 | |
277 <H3> Testing FC2 </H3> | |
278 Test <tt>fc2</tt> just as you did <tt>fc1</tt>: Run AnaGram | |
279 and build a | |
280 parser, <tt>fc2.c</tt>. Compile it and link it with your C | |
281 compiler. Run | |
282 <tt>fc2</tt> from the command line. Type an integer, with or | |
283 without a sign, the letter 'f' | |
284 or 'c', and press Enter. <tt>fc2</tt> will print out Fahrenheit and | |
285 Celsius temperatures. Repeat until you are satisfied the | |
286 program works. Try making a few deliberate typos to test the | |
287 error token resynchronization. Note that the parser | |
288 automatically provides error diagnostics. These diagnostics | |
289 are created by the default <tt>SYNTAX_ERROR</tt> macro. | |
290 <P> | |
291 Type ^Z and Enter (Windows) or ^D (Unix) to generate an end | |
292 of file and end the program. (Of course, you | |
293 could also use ^C or ^Brk.) | |
294 <P> | |
295 Alternatively, you may use the test file, <tt>test.fc2</tt>, | |
296 which has | |
297 been provided for use with the File Trace (see below). | |
298 Simply run <tt>fc2</tt> with input redirection to take input from | |
299 <tt>test.fc2</tt>. At the command prompt, type: | |
300 <PRE> | |
301 fc2 < test.fc2 | |
302 </PRE> | |
303 <P> | |
304 | |
305 <H2> File Trace </H2> | |
306 The File Trace feature of AnaGram allows you to test a | |
307 grammar without actually building a parser. This enables you to | |
308 completely decouple the debugging of the grammar from the | |
309 debugging of the reduction procedures. You can try out test | |
310 files before you have written anything more than the | |
311 grammar. This allows for very early testing in your projects. | |
312 <P> | |
313 | |
314 File Trace allows you to see in fine detail just exactly how | |
315 your parser will analyze an input file. A File Trace consists of a | |
316 window with various panes so you can see what is going on, | |
317 and an interpretive parser which works in the background. | |
318 <P> | |
319 You can select File Trace from the Action | |
320 menu once you have analyzed your grammar. For the <tt>fc2</tt> | |
321 example, you will be offered a choice of test files with | |
322 extension <tt>.fc2</tt>. A good choice would be <tt>test1.fc2</tt>. | |
323 The File Trace window will show a Parser Stack | |
324 pane to your left, a Test File pane, which shows you the | |
325 input file you are parsing, to your right, and a Rule Stack | |
326 pane across the bottom of the window. | |
327 <P> | |
328 The way the File Trace parser works is this: Initially none of | |
329 the test file has been parsed. If you double-click with the | |
330 left mouse button at a point in the Test File pane, the parser | |
331 parses to that point. The unparsed part of the file will be | |
332 colored differently from the parsed part (in the default | |
333 color scheme, parsed characters | |
334 have a lighter background). To back up the parse to a | |
335 previous location, double-click at that spot, or single-click | |
336 and press Enter or the Synch Parse button at the bottom of | |
337 the File Trace window. To check a file for syntax errors, all | |
338 you need to do is to click the Parse File button. If there is | |
339 a syntax error, the parse will not advance beyond the error | |
340 point. Normally, however, you will probably want to proceed | |
341 more deliberately, moving the cursor one character at a | |
342 time. | |
343 <P> | |
344 If you wish to see even finer detail, you may make the | |
345 parser work in single step mode. by clicking on Single Step | |
346 or pressing Enter. Each time you click on the Single Step | |
347 button or press Enter, the parser will perform one parser | |
348 action. Note that in its normal configuration, AnaGram | |
349 produces parsers that use a number of parser actions more | |
350 complex than the traditional shift and reduce actions. | |
351 <P> | |
352 The Parser Stack pane shows the levels of the parser stack, the | |
353 state numbers on the stack and the tokens that have been | |
354 recognized so far. | |
355 As you advance through the test file, you can see by | |
356 looking at the Parser Stack pane how the parser stack changes | |
357 as characters are shifted in and reductions occur. The Rule | |
358 Stack pane is an alternate view of the parser stack showing | |
359 the grammar rules in play at any moment. Notice how the | |
360 syntax file window is synched with the Rule Stack. | |
361 <P> | |
362 If the parse position is not located at the blinking cursor, | |
363 the Single Step button will be changed to read "Synch Parse". | |
364 Clicking on the button will move the parse position to the | |
365 cursor. | |
366 <P> | |
367 If you now click on tokens at various levels in the Token | |
368 Stack, the Test File characters corresponding to these tokens | |
369 will be highlighted. You can restart the parse at any level | |
370 by double-clicking just preceding the highlight. The Rule | |
371 Stack is also synched with the Token Stack and Test File | |
372 panes. | |
373 <P> | |
374 You may interrupt the File Trace at any time to inspect any | |
375 other window without interfering with the File Trace. | |
376 Whenever you come back to it, you can proceed as though | |
377 nothing has happened. | |
378 <P> | |
379 If you have a long file, a complex grammar, or a (very) slow | |
380 computer, it can sometimes take a while for the parse to catch | |
381 up with the cursor. If you have a long test file and press | |
382 Parse File to move the cursor to the end of the file, the | |
383 parser has a lot of computation to do to catch up. | |
384 <P> | |
385 For further details about the File Trace, please refer to the | |
386 AnaGram User's Guide and the on-line documentation. | |
387 <P> | |
388 <BR> | |
389 | |
390 | |
391 <H2> FC3 </H2> | |
392 <tt>fc3</tt> adds two new features to <tt>fc2</tt>: It uses | |
393 floating point | |
394 arithmetic, so that it can deal with non-integral values, | |
395 and it allows optional white space, including comments, in | |
396 the input. In addition, it changes the output format, so | |
397 that results are printed in degrees Kelvin, as well as in | |
398 Fahrenheit and Celsius. | |
399 <P> | |
400 The following features of AnaGram are introduced in <tt>fc3</tt>: | |
401 <UL> | |
402 <LI> Setting the default token type </LI> | |
403 <LI> The "disregard" statement </LI> | |
404 <LI> The "lexeme" statement </LI> | |
405 <LI> Right recursion </LI> | |
406 <LI> Default value of a reduction token </LI> | |
407 </UL> | |
408 <P> | |
409 | |
410 <H3> Floating Point Arithmetic </H3> | |
411 To deal with floating point arithmetic, a number of new | |
412 productions have been added and two productions have been | |
413 changed. Productions P5 and P6 have been changed to define a | |
414 "number" in terms of an "unsigned number" instead of an | |
415 "integer". Productions P6a, P6b, and P6c define "unsigned | |
416 number" in terms of its integer part and its fraction part. | |
417 Productions P9 and P10 define the fraction part of the | |
418 number. Note that "fraction" is described using right | |
419 recursion rather than left recursion. This makes the reduction | |
420 procedures neater. Since reduction does not occur until | |
421 a rule is complete, note that with right recursion each new | |
422 digit causes the stack depth to increase by one. You can | |
423 observe this with the File Trace. | |
424 <P> | |
425 In order to replace integer arithmetic with floating point | |
426 arithmetic, statement C3 was added to the configuration | |
427 section of the grammar. It declares the default token type, | |
428 the type assigned to nonterminal tokens absent a specific | |
429 declaration, to be "double". Since we are not interested in | |
430 values for "grammar" and "temperature", they have been | |
431 explicitly cast to "void". | |
432 <P> | |
433 Note that production P6a does not have a reduction procedure. | |
434 If a rule has no reduction procedure the value of the | |
435 reduction token defaults to the value of the first element | |
436 in the rule. In this case, the value of "unsigned number", | |
437 in the absence of a reduction procedure, is taken to be the | |
438 value of "integer". | |
439 <P> | |
440 | |
441 <H3> Skipping White Space </H3> | |
442 Two statements, C4 and C5, are used to skip over uninteresting | |
443 white space in the input to <tt>fc3</tt>. The "disregard" statement | |
444 instructs AnaGram to rewrite your grammar in a standard | |
445 way so that your parser will skip over any instance of the | |
446 "white space" token that occurs between lexemes, or lexical | |
447 units, in the input to your parser. Of course, you can use | |
448 any token name you wish in a "disregard" statement. You can | |
449 even have multiple "disregard" statements and all of the | |
450 tokens you specify will be disregarded. | |
451 <P> | |
452 The "lexeme" statement is used to declare that certain | |
453 nonterminal tokens are each to be considered as indivisible | |
454 lexical units, or lexemes, from the point of view of lexical | |
455 analysis, so that the "disregard" statement is inoperative | |
456 inside the nonterminal tokens listed. In this case, statement | |
457 C5 simply guarantees that white space will not be | |
458 allowed inside a number. All terminal tokens are automatically | |
459 lexemes. <!-- when not part of a larger lexeme. --> | |
460 <P> | |
461 <!-- | |
462 It would be nice to rewrite this so the example expansion is | |
463 vaguely close to syntactically legal. | |
464 --> | |
465 To make your parser skip white space, AnaGram renames and | |
466 redefines the lexemes in your grammar and defines a number | |
467 of new tokens. For example, if '+' is a lexeme in your | |
468 grammar and your grammar is to disregard "white space", | |
469 AnaGram will rename the plus sign token as '+'%. It then | |
470 introduces a new production in your grammar as follows: | |
471 <PRE> | |
472 '+' | |
473 -> '+'%, white space?... | |
474 </PRE> | |
475 | |
476 This means that a plus sign followed by some white space | |
477 will now be treated the same, syntactically, as a plus sign | |
478 alone. The percent sign (a degree sign in AnaGram 1.x) is | |
479 used to indicate the original, or "pure", | |
480 definition of the token. | |
481 <P> | |
482 Productions P11 and P12 together define what is meant by | |
483 white space in this grammar. P11 defines white space to | |
484 include blanks and tab characters. P12 includes C style | |
485 comments (not nested). | |
486 <P> | |
487 The white space defined in P11 and P12 does not include | |
488 newline characters. There is a good reason for this. The | |
489 grammar uses newline characters as delimiters, marking the | |
490 end of a "temperature". In order to allow blank lines, | |
491 production P1 was modified to make the temperature optional. | |
492 <P> | |
493 To allow "//" style comments, a new token, "end of line", | |
494 defined by production P13, was added. In production P1, '\n' | |
495 was then replaced by "end of line". | |
496 <P> | |
497 | |
498 <H3> Testing FC3 </H3> | |
499 The "test file mask" was changed to use files with the | |
500 extension ".fc3" in the File Trace. TEST.FC3 can be used as | |
501 input for the File Trace. | |
502 <P> | |
503 Build and compile <tt>fc3</tt> in the same way as previous versions. | |
504 When you run it, try using numbers with decimal fractions. | |
505 Try typing blanks in various places to see how the parser | |
506 deals with them. Try redirecting input from <tt>test.fc3</tt>. | |
507 <P> | |
508 | |
509 | |
510 <H2> FC4 </H2> | |
511 <tt>fc4</tt> illustrates a shift-reduce conflict in a grammar. You | |
512 should probably skip this example until you encounter a | |
513 shift-reduce conflict in one of your own grammars. In the | |
514 meantime, skip ahead to <tt>fc5</tt>. | |
515 <P> | |
516 The following features of AnaGram are introduced in <tt>fc4</tt>: | |
517 <UL> | |
518 <LI> Conflicts window </LI> | |
519 <LI> Auxiliary Windows menu </LI> | |
520 <LI> State Definition window </LI> | |
521 <LI> Expansion Chain window </LI> | |
522 </UL> | |
523 <P> | |
524 In <tt>fc3</tt>, the output was changed to provide results in degrees | |
525 Kelvin, as well as in Fahrenheit and Celsius. In <tt>fc4</tt>, a | |
526 production, P3a, is added to accept input in degrees Kelvin. | |
527 Since there is no such thing as a negative temperature on | |
528 the Kelvin scale, it seemed appropriate to require that the | |
529 temperature be an unsigned number. This, however, caused a | |
530 conflict, i.e. an ambiguity, in <tt>fc4</tt>. In fact, it caused four | |
531 conflicts to be diagnosed, all of which have a common | |
532 source. | |
533 <P> | |
534 | |
535 <H3> Finding a Conflict </H3> | |
536 If you run AnaGram and analyze <tt>fc4</tt> you will find that | |
537 AnaGram finds conflicts in the grammar. To determine the nature | |
538 of the conflicts, you should first open the Conflicts | |
539 window. The Conflicts window is available via the Browse Menu | |
540 or a Control Panel button. | |
541 <P> | |
542 The first thing you see is that there are conflicts in | |
543 states S005 and S025. In each state, one conflict occurs | |
544 because a decimal point can either be shifted, in accordance | |
545 with rule R021, or it can reduce rule R014, an empty rule, | |
546 or null production. The other conflict occurs because a | |
547 decimal digit can either be shifted, in accordance with rule | |
548 R022, or it can reduce rule R014, the same rule that gives | |
549 trouble with the decimal point. | |
550 <P> | |
551 The first step in understanding the conflict is to see rules | |
552 R014, R021, and R022 in context. The Conflicts window is | |
553 synchronized with the syntax file window. Arrange these | |
554 windows so you can see them both at once. | |
555 Then, in the Conflicts window, move the cursor bar up and | |
556 down. In the syntax file window, the cursor will move | |
557 between productions for number, unsigned number and integer. | |
558 <P> | |
559 Although it is possible to recognize rules R021 and R022 in | |
560 the grammar, there is no explicit null production in the | |
561 grammar. To find out for sure what rule R014 is, pop up the | |
562 Rule Table (listed in the Browse menu) and look for R014. It | |
563 turns out that R014 is the null production that corresponds | |
564 to an optional plus sign, written '+'? in the grammar. | |
565 <P> | |
566 To get a better idea of what is going on, it is worthwhile to | |
567 find out what the parser is expecting to see in states S005 | |
568 or S025. To find out, click the right mouse button in the | |
569 Conflicts window on any line describing a state S005 | |
570 conflict to pop up the Auxiliary Windows menu. Select State | |
571 Definition to find out what state S005 is all about. It | |
572 seems that state S005 occurs when the parser has skipped | |
573 over the initial white space and is about to begin dealing | |
574 with the actual input. But, looking at this, it is still not | |
575 clear why a decimal digit or a decimal point is ambiguous. | |
576 The Expansion Chain windows for rules R021, R022 and R014 | |
577 will show how these rules are derived from the | |
578 characteristic rule for state S005. | |
579 <P> | |
580 | |
581 Return to the Conflicts window. Click the right mouse button | |
582 on rule R021, the rule that expects the decimal digit, | |
583 to pop up the Auxiliary Windows menu. Select | |
584 Expansion Chain. The Expansion Chain window shows how rule | |
585 R021 derives from the characteristic rule for the state. | |
586 Each line in this window is a grammar rule produced by the | |
587 marked token in the rule on | |
588 the previous line. | |
589 <P> | |
590 Now return to the Conflicts window, and get the Expansion | |
591 Chain window for rule R014. Rearrange the windows on the | |
592 screen so you can compare the Expansion Chain windows for | |
593 rules R014 and R021. Note that rule R014 derives from the | |
594 production | |
595 <PRE> | |
596 temperature | |
597 -> number, 'c' + 'C' | |
598 </PRE> | |
599 and rule R021 derives from the production | |
600 <PRE> | |
601 temperature | |
602 -> unsigned number, 'k' + 'K' | |
603 </PRE> | |
604 <P> | |
605 It is now possible to see the nature of the conflict. On the | |
606 one hand, if the input is a supposed to be a Kelvin | |
607 temperature, the parser can go right ahead accumulating a | |
608 number. On the other hand, if the input is supposed to be a | |
609 Celsius number, the parser has to first acknowledge the | |
610 optional plus sign. | |
611 <P> | |
612 It is the nature of LALR parsers that they can keep track of | |
613 many threads of possible parses simultaneously as long as | |
614 they don't have to do any reductions. When they come to the | |
615 end of a rule, however, they are forced to decide whether | |
616 the rule has been successfully matched. In this case, the | |
617 parser is at the end of a null production which arises from | |
618 the virtual production <CODE> '+'? </CODE> in P6 and is forced to decide | |
619 whether this null production has been matched. The conflict | |
620 diagnostics discussed above say that if the next token is a | |
621 digit or a decimal point, the parser cannot decide between | |
622 several possibilities. That is, it cannot decide whether or | |
623 not it has seen an elided plus sign. In effect the parser is | |
624 being required, because of the null production, to make a | |
625 premature decision as to whether a Kelvin or Celsius | |
626 temperature is present in the input. | |
627 <P> | |
628 The conflicts can be eliminated by rewriting the grammar so | |
629 the parser will not come to the end of a rule and be forced | |
630 to choose among the several threads of the parse until it | |
631 encounters the determining letter 'f', 'c', or 'k'. | |
632 <P> | |
633 Two ways to remove the conflict are illustrated. The first is | |
634 found in FC4A. In this grammar, production P6 has been | |
635 replaced with two productions, P6x and P6y. This is a | |
636 standard method of rewriting a grammar to eliminate null | |
637 productions. If you analyze <tt>fc4a</tt>, you will find that it no | |
638 longer has a conflict, so this solves the problem. | |
639 <P> | |
640 However, if you look at the <tt>fc4a</tt> grammar closely, you will | |
641 notice that +23.7K is not acceptable input, although common | |
642 sense suggests that you ought to be able to use a plus sign | |
643 on Kelvin temperatures. <tt>fc4b</tt> shows another way to fix the | |
644 grammar which deals with this quibble. In this grammar, | |
645 production P3a has been modified to allow an optional plus | |
646 sign before a Kelvin temperature. If you analyze <tt>fc4b</tt>, you | |
647 will find that this change also solves the problem. | |
648 <P> | |
649 In both these instances, the technique used to resolve the | |
650 conflicts was to rewrite the grammar so that there are no | |
651 differences between constructs upstream of the point where | |
652 they diverge. Another way to put it is this: In the original | |
653 grammar Kelvin temperatures were distinguished from | |
654 Fahrenheit and Celsius not only by the 'K' suffix, but also by the | |
655 optional plus sign. The essence of both fixes to the problem | |
656 is to remove this distinction. | |
657 <P> | |
658 Other AnaGram windows available from the Auxiliary Windows | |
659 popup menu for the Conflicts window such as Rule Derivation, Token | |
660 Derivation, Conflict Trace and Problem States are helpful in | |
661 tracking down conflicts. See the help messages and your | |
662 AnaGram User's Guide for further details. | |
663 <P> | |
664 | |
665 <H3>Testing FC4A and FC4B </H3> | |
666 Build and compile <tt>fc4a</tt> and <tt>fc4b</tt> in the | |
667 same way you built | |
668 previous versions. When you run the parsers, try using | |
669 Kelvin temperatures with and without leading plus signs. | |
670 <P> | |
671 The "test file mask" was changed to use files with the | |
672 extension ".fc4" in the File Trace. <tt>test.fc4</tt> can be | |
673 used as | |
674 input either for the parsers themselves or for the File | |
675 Trace. | |
676 <P> | |
677 | |
678 | |
679 <H2> FC5 </H2> | |
680 <tt>fc5</tt> illustrates the use of an event driven parser. The | |
681 grammar is essentially the same as for <tt>fc4b</tt>. The primary | |
682 difference is in the method of providing input to the | |
683 parser. The following features of AnaGram are introduced in | |
684 <tt>fc5</tt>: | |
685 <UL> | |
686 <LI> event driven parser </LI> | |
687 <LI> embedded C </LI> | |
688 <LI> main program </LI> | |
689 <LI> initializing and calling the parser </LI> | |
690 </UL> | |
691 | |
692 Statement C6 in the configuration section causes AnaGram to | |
693 build an event driven parser. An event driven parser is | |
694 first explicitly initialized and then called once for each | |
695 input unit. | |
696 <P> | |
697 A small main program has been included in a block of | |
698 embedded C at the end of the file. This program calls the | |
699 initializer and then reads characters from <CODE>stdin</CODE> and passes | |
700 them on to the parser. Previous <tt>fc</tt> programs did not | |
701 include a | |
702 main program, but relied on AnaGram to create one. However, | |
703 AnaGram does not automatically create a main program for | |
704 parsers which are event driven, use pointer input, or have | |
705 any embedded C. Therefore a main program is necessary in | |
706 this syntax file. | |
707 <P> | |
708 Note that the default names for the initializer and parser | |
709 are <CODE>init_fc5</CODE> and <CODE>fc5</CODE> respectively. | |
710 Only event driven parsers | |
711 require that the user explicitly call the initializer | |
712 function. | |
713 <P> | |
714 In addition, a global constant, "zero", was defined in the | |
715 embedded C to provide the value of absolute zero, and the | |
716 reduction procedures were modified to refer to "zero" | |
717 instead of the explicit value. | |
718 <P> | |
719 This illustrates an important point about the C parser file | |
720 that AnaGram builds: All blocks of embedded C precede the | |
721 reduction procedures, so that the reduction procedures can | |
722 access all variables and definitions included in embedded | |
723 C, no matter where they are located in the file. | |
724 <P> | |
725 | |
726 <H3>Testing FC5 </H3> | |
727 Since <tt>fc5</tt> uses the same grammar as <tt>fc4</tt>, | |
728 you can use the same | |
729 test files for <tt>fc5</tt> as for <tt>fc4</tt>. | |
730 </P> | |
731 | |
732 <BR> | |
733 | |
734 <IMG ALIGN="bottom" SRC="../images/rbline6j.gif" ALT="----------------------" | |
735 WIDTH=1010 HEIGHT=2 > | |
736 <P> | |
737 <IMG ALIGN="right" SRC="../images/pslrb6d.gif" ALT="Parsifal Software" | |
738 WIDTH=181 HEIGHT=25> | |
739 <BR CLEAR="right"> | |
740 <P> | |
741 Back to <A HREF="../index.html">Index</A> | |
742 <P> | |
743 <ADDRESS><FONT SIZE="-1"> | |
744 AnaGram parser generator - examples<BR> | |
745 Fahrenheit-Celsius Converter<BR> | |
746 Copyright © 1993-1999, Parsifal Software. <BR> | |
747 All Rights Reserved.<BR> | |
748 </FONT></ADDRESS> | |
749 | |
750 </BODY> | |
751 </HTML> | |
752 |