comparison anagram/guisupport/helpdata.src @ 0:13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
author | David A. Holland |
---|---|
date | Sat, 22 Dec 2007 17:52:45 -0500 |
parents | |
children |
-1:000000000000 | 0:13d2b8934445 |
---|---|
1 Accept Action | |
2 | |
3 The accept action is one of the four actions of a | |
4 traditional ©parsing engineª. The accept action is | |
5 performed when the ©parserª has succeeded in identifying | |
6 the goal, or ©grammar tokenª for the ©grammarª. When | |
7 the parser executes the accept action, it sets the ©exit_flagª | |
8 field in the ©parser control blockª to AG_SUCCESS_CODE and returns | |
9 to the calling program. The accept action is thus the last action of | |
10 the parsing engine and occurs only once for each successful execution | |
11 of the parser. | |
12 | |
13 If the grammar token has a non-void value, you may | |
14 obtain its value by calling the ©parser value functionª | |
15 whose name is given by <parser name>_value, that is, | |
16 by appending "_value" to the ©parser nameª. | |
17 ## | |
18 | |
19 Parser Value Function, Return Value | |
20 | |
21 The value assigned to the ©grammar tokenª in your parser | |
22 may be retrieved by calling the parser value function after | |
23 the parser has finished. The name of this function is given | |
24 by <©parser nameª>_value. The return type of the function | |
25 is the type assigned to the grammar token. | |
26 | |
27 If you have set the ©reentrant parserª switch, the parser | |
28 value function takes a pointer to the ©parser control blockª | |
29 as its sole argument. Otherwise, it takes no arguments. The | |
30 value function is not defined if the grammar token has type "void". | |
31 ## | |
32 | |
33 AG_PLACEMENT_DELETE_REQUIRED | |
34 | |
35 When the ©wrapperª option is specified, the wrapper | |
36 template class that AnaGram defines uses a "placement | |
37 new" operator to construct the wrapper object on the | |
38 ©parser value stackª. The MSVC++ 6.0 compiler requires, | |
39 in this situation, that a corresponding "placement | |
40 delete" operator be defined. Other C++ compilers, | |
41 notably MSVC++ 5.0, generate an error message if | |
42 they encounter the definition of a "placement delete" | |
43 operator. | |
44 | |
45 Accordingly, AG_PLACEMENT_DELETE_REQUIRED is used to determine | |
46 whether a "placement delete" operator should be defined. | |
47 | |
48 AG_PLACEMENT_DELETE_REQUIRED is defined to be 1 if you are using MSVC++ | |
49 6.0 or greater, 0 otherwise. You can override the automatic definition of | |
50 AG_PLACEMENT_DELETE_REQUIRED by defining it in the ©C prologueª section | |
51 of your grammar. Set it to a non-zero value to force the "placement | |
52 delete" definition, zero to skip the definition. | |
53 | |
54 ## | |
55 | |
56 ag_tcv | |
57 | |
58 ag_tcv is an array AnaGram includes in your ©parserª. | |
59 Your parser uses ag_tcv to translate external codes to | |
60 the internal token numbers that AnaGram uses. It uses | |
61 the actual input code to index the ag_tcv array to | |
62 fetch a ©token numberª. The token number is then used | |
63 to identify the input token. | |
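
The lookup described above can be sketched in C. This is only an illustration: the table contents, the token numbers, and the names demo_tcv, demo_tcv_init and demo_token_for are hypothetical. The real ag_tcv array is generated by AnaGram from your grammar.

```c
#include <assert.h>

/* Hypothetical token numbers, for illustration only. */
enum { TOK_OTHER = 0, TOK_DIGIT = 1, TOK_LETTER = 2 };

#define DEMO_TCV_SIZE 256

/* Stand-in for a token conversion table: one entry per input code. */
static unsigned char demo_tcv[DEMO_TCV_SIZE];

/* Fill the table once so that every input code maps to a token number.
   The letter and digit loops assume an ASCII-compatible character set. */
static void demo_tcv_init(void) {
    int c;
    for (c = 0; c < DEMO_TCV_SIZE; c++) demo_tcv[c] = TOK_OTHER;
    for (c = '0'; c <= '9'; c++) demo_tcv[c] = TOK_DIGIT;
    for (c = 'a'; c <= 'z'; c++) demo_tcv[c] = TOK_LETTER;
    for (c = 'A'; c <= 'Z'; c++) demo_tcv[c] = TOK_LETTER;
}

/* Translate an external input code to an internal token number
   by using the code itself as the index into the table. */
static int demo_token_for(int input_code) {
    assert(input_code >= 0 && input_code < DEMO_TCV_SIZE);
    return demo_tcv[input_code];
}
```

After demo_tcv_init() has run, demo_token_for('7') yields TOK_DIGIT and demo_token_for('q') yields TOK_LETTER.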
64 ## | |
65 | |
66 Allow macros | |
67 | |
68 "Allow macros" is a ©configuration switchª which | |
69 defaults to on. When it is set, i.e., on, ©reduction | |
70 procedureªs will be implemented as macros if they are | |
71 sufficiently simple. This makes your ©parserª somewhat | |
72 more compact but makes it somewhat more difficult to | |
73 debug. It's a good idea to turn this switch off for | |
74 debugging. | |
75 ## | |
76 | |
77 Analyze Grammar | |
78 | |
79 The Analyze Grammar command will scan and | |
80 analyze your ©syntax fileª, and create a number of | |
81 tables summarizing your grammar. | |
82 | |
83 Analyze Grammar does not create any ©output filesª. | |
84 To create a ©parserª, use the ©Build Parserª command. | |
85 You would probably use Analyze Grammar, rather than Build Parser, during | |
86 initial development of your ©grammarª. | |
87 | |
88 You can use ©File Traceª and ©Grammar Traceª as soon as you have | |
89 analyzed your grammar. It is not necessary to build a parser first. | |
90 ## | |
91 | |
92 Attribute Statement | |
93 | |
94 Attribute statements are used in ©configuration | |
95 sectionsª of your ©syntax fileª to specify certain | |
96 properties for ©tokenªs, ©character setªs, or other | |
97 units of your grammar. The attribute statements | |
98 available are: | |
99 ©disregardª | |
100 ©distinguish keywordsª | |
101 ©enumª | |
102 ©extend pcbª | |
103 ©hiddenª | |
104 ©leftª | |
105 ©lexemeª | |
106 ©nonassocª | |
107 ©rename macroª | |
108 ©reserve keywordsª | |
109 ©rightª | |
110 ©stickyª | |
111 ©subgrammarª | |
112 ©wrapperª | |
113 ## | |
114 | |
115 Auto init | |
116 | |
117 Auto init is a ©configuration switchª which defaults to | |
118 on. It controls the initialization of any ©parserª that | |
119 is not ©event drivenª. When it is set to on, your | |
120 parser is automatically initialized every time it is | |
121 called. This is the situation you will normally use. On | |
122 occasion, however, it is desirable to call a parser | |
123 several times without reinitializing it. In this case, | |
124 you may set the auto init parameter to off and then | |
125 call the ©initializerª yourself whenever it is | |
126 appropriate. | |
127 ## | |
128 | |
129 Auto resynch | |
130 | |
131 "Auto resynch" is a ©configuration switchª which | |
132 defaults to off. You may use it to specify ©automatic | |
133 resynchronizationª as an ©error recoveryª mechanism. | |
134 | |
135 Setting the "auto resynch" switch causes AnaGram to | |
136 include an automatic ©resynchronizationª procedure in | |
137 your ©parserª. The resynchronization procedure will be | |
138 invoked when your parser encounters a ©syntax errorª | |
139 and will skip over input until it finds input | |
140 characters or ©tokensª consistent with its state at the | |
141 time of the error. | |
142 | |
143 An alternate technique, ©error token resynchronizationª, | |
144 uses an ©error tokenª which you include in your grammar. | |
145 ## | |
146 | |
147 Automatic Resynchronization | |
148 | |
149 Automatic ©resynchronizationª is one of several ©error | |
150 recoveryª options available as part of parsers built by | |
151 AnaGram. You enable automatic resynchronization by | |
152 setting the ©auto resynchª ©configuration switchª. If | |
153 your parser includes automatic resynchronization it will | |
154 incorporate a heuristic procedure which will skip over | |
155 input tokens until it finds a token which makes sense | |
156 with respect to one or another of the ©productionªs | |
157 active at the time of the ©syntax errorª. | |
158 | |
159 The purpose of the resynchronization procedure is to | |
160 provide a simple way for your parser to proceed in the | |
161 event of syntax errors so that it can find more than one | |
162 syntax error on a given pass. The resynchronization | |
163 procedure uses a heuristic based on your own syntax. | |
164 AnaGram itself uses this technique to resynchronize | |
165 after syntax errors in its input. | |
166 | |
167 A disadvantage to using this resynchronization technique | |
168 is that the resynchronization procedure turns off all | |
169 ©reduction procedureªs. Because of the error, a number | |
170 of reduction procedures, which normally would be | |
171 executed, will be skipped. The parameters for any | |
172 reduction procedures that might be called later would be | |
173 suspect and could cause serious problems. It seems more | |
174 prudent simply to shut them down. | |
175 | |
176 If you use the automatic resynchronization procedure, | |
177 you must also specify an ©eof tokenª so that the | |
178 synchronizer doesn't inadvertently skip over the end of | |
179 file. | |
180 | |
181 An alternative technique for resynchronization is called | |
182 ©error token resynchronizationª. | |
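
The idea of skipping input until something acceptable turns up can be sketched as follows. This is a deliberately simplified illustration, not AnaGram's actual heuristic, which is based on the productions active at the time of the error. The token numbers and the names is_acceptable and resync are hypothetical. The sketch also shows why an eof token is needed: it is the only thing that stops the skipping loop at the end of the input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token numbers; TOK_EOF plays the role of the eof token. */
enum { TOK_EOF = 0, TOK_NUM = 1, TOK_PLUS = 2, TOK_SEMI = 3, TOK_JUNK = 4 };

/* Return nonzero if tok is one of the n tokens that would be
   acceptable in the state the parser was in when the error occurred. */
static int is_acceptable(int tok, const int *acceptable, size_t n) {
    size_t i;
    for (i = 0; i < n; i++)
        if (acceptable[i] == tok) return 1;
    return 0;
}

/* Skip forward from pos until an acceptable token or the eof token
   is found. Returns the index at which parsing could resume.
   The input must be terminated by TOK_EOF. */
static size_t resync(const int *input, size_t pos,
                     const int *acceptable, size_t n) {
    assert(input != NULL && acceptable != NULL);
    while (input[pos] != TOK_EOF && !is_acceptable(input[pos], acceptable, n))
        pos++;
    return pos;
}
```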
183 ## | |
184 | |
185 Auxiliary Trace | |
186 | |
187 An Auxiliary Trace is a pre-built grammar trace which | |
188 you may select from the ©Auxiliary Windowsª popup menu for | |
189 most windows which display parser state information. | |
190 The Auxiliary Trace provides a path to the state | |
191 specified in the highlighted line of the primary window. | |
192 | |
193 When obtained | |
194 from the Parser Stack pane of the ©File Traceª or ©Grammar Traceª, the | |
195 Auxiliary Trace is simply a copy of the current status of these | |
196 traces so you can explore your alternatives while still retaining the | |
197 status of the original trace for reference. | |
198 ## | |
199 | |
200 Auxiliary Windows | |
201 | |
202 From most AnaGram windows you can pop up an Auxiliary Windows | |
203 menu by clicking the right mouse button or by pressing Shift F10. | |
204 Auxiliary Windows may | |
205 have Auxiliary Windows of their own. | |
206 | |
207 Windows with a cursor bar (highlighted line): | |
208 The windows available in the Auxiliary Windows menu depend on the | |
209 grammar elements identified by the cursor bar in the parent window. If | |
210 the cursor bar identifies a ©parser stateª, there will be windows that | |
211 describe the state. If the cursor bar identifies a ©grammar ruleª, | |
212 there will be windows that describe the rule. If the cursor bar | |
213 identifies a ©tokenª, there will be windows that describe the token. In | |
214 the case of a ©marked ruleª, token windows will describe the marked | |
215 token, if any. In some cases, specialized pre-built grammar traces | |
216 such as the ©Conflict Traceª or ©Auxiliary Traceª are on the menu. | |
217 | |
218 Help windows: | |
219 For Help windows, the Auxiliary Windows menu will show all the | |
220 available links to other ©Help topicsª from this window. ©Using Helpª | |
221 is always available. | |
222 ## | |
223 | |
224 Backtrack | |
225 | |
226 If your ©parserª does not continue after encountering a | |
227 ©syntax errorª, you can speed it up and make it a | |
228 little smaller by turning off the backtrack | |
229 ©configuration switchª. If backtrack is on, AnaGram | |
230 configures your parser so that in case of syntax error | |
231 it can undo any ©default reductionsª it might have made | |
232 as a consequence of the erroneous input. The purpose of | |
233 such an undo function is to identify the proper ©error | |
234 frameª and to maximize the probability of being able to | |
235 recover gracefully. | |
236 ## | |
237 | |
238 Empty Recursion | |
239 | |
240 This warning message tells you that the recursive step of the | |
241 specified ©recursive ruleª can be completely matched by ©zero | |
242 lengthª tokens, i.e., by nothing at all. | |
243 The result is potentially an infinite loop in the generated ©parserª. | |
244 The specified rule is an expansion rule of the specified token. | |
245 | |
246 Because of the possibility of encountering an infinite loop while parsing, | |
247 AnaGram turns off its ©keyword anomalyª analysis if empty recursion is | |
248 found. The ©File Traceª function is also disabled for the same reason. | |
249 | |
250 The ©circular definitionª of a token has the same effect as an | |
251 empty recursion, in that no additional input is required to match | |
252 the recursive rule. | |
253 | |
254 ## | |
255 Keyword Anomaly analysis aborted: empty recursion | |
256 | |
257 The ©keyword anomalyª analysis has been turned off, since the presence of | |
258 ©recursive ruleªs with ©empty recursionª can cause infinite loops in the analysis. | |
259 | |
260 ## | |
261 | |
262 Keyword Anomaly analysis aborted: circular definition | |
263 | |
264 The ©keyword anomalyª analysis has been turned off, since the presence of | |
265 a ©circular definitionª can cause infinite loops in the analysis. | |
266 | |
267 ## | |
268 | |
269 File Trace disabled: empty recursion | |
270 | |
271 Because of the presence of ©recursive ruleªs with ©empty recursionª in this grammar and | |
272 the infinite loops that can ensue, the ©File Traceª function has been | |
273 disabled. | |
274 | |
275 ## | |
276 | |
277 File Trace disabled: circular definition | |
278 | |
279 Because of the presence of a ©circular definitionª in this grammar and | |
280 the infinite loops that can ensue, the ©File Traceª function has been | |
281 disabled. | |
282 | |
283 ## | |
284 | |
285 | |
286 | |
287 Both Error Token Resynch and Auto Resynch Specified | |
288 | |
289 | |
290 | |
291 This ©warningª message indicates that your ©grammarª | |
292 defines an ©error tokenª and also requests ©automatic | |
293 resynchronizationª. AnaGram will ignore the request | |
294 for automatic resynchronization and will provide ©error | |
295 token resynchronizationª. If you named a token "error" | |
296 but do not wish ©error token resynchronizationª, you can | |
297 either rename "error", or, in a ©configuration | |
298 sectionª, you may explicitly specify the error token to | |
299 be something you don't otherwise use in your grammar: | |
300 [ error token = not used ] | |
301 ## | |
302 | |
303 Bottom Margin | |
304 | |
305 "Bottom margin" is an ©obsolete configuration parameterª. | |
306 ## | |
307 | |
308 Bright Background | |
309 | |
310 "Bright background" is a ©configuration switchª which | |
311 was used in the DOS version of AnaGram. It is no longer | |
312 used, but is still recognized for the sake of upward | |
313 compatibility with old ©configuration fileªs. | |
314 ## | |
315 | |
316 Build Parser | |
317 | |
318 You use the Build Parser command to create a ©parserª based on your | |
319 ©grammarª. The parser is a C file consisting of the ©embedded Cª (which | |
320 may include C++) code in your ©syntax fileª, your ©reduction | |
321 procedureªs, a number of tables derived from your grammar | |
322 specification, and a ©parsing engineª customized to your requirements. | |
323 | |
324 If you only wish to investigate your grammar and do not | |
325 wish to create ©output filesª, use the ©Analyze | |
326 Grammarª command. | |
327 ## | |
328 | |
329 Build <file name> | |
330 | |
331 This item on the ©Action Menuª is available when you have analyzed a | |
332 ©grammarª but you have not yet built it. It builds the grammar | |
333 without reloading the ©syntax fileª from the disk. | |
334 ## | |
335 | |
336 Cannot Make Wrapper for Default Token Type | |
337 | |
338 This ©warningª message occurs when a ©wrapperª statement lists a | |
339 token type that has previously been defined as the | |
340 ©default token typeª. If a wrapper is needed for a | |
341 particular type, you must specify the ©data typeª explicitly | |
342 for each relevant ©tokenª. | |
343 | |
344 As a result, a wrapper class has not been created for the specified token type. | |
345 ## | |
346 | |
347 Token with Wrapper cannot be Default Token Type | |
348 | |
349 This ©warningª message indicates that an attempt has been made | |
350 to specify a class that has previously been listed in a ©wrapperª | |
351 statement as the ©default token typeª. | |
352 If a wrapper is needed for a particular type, you must specify the | |
353 ©data typeª explicitly for each relevant ©tokenª. | |
354 | |
355 As a result, the default token type has not been set. | |
356 ## | |
357 | |
358 Case Sensitive | |
359 | |
360 "Case sensitive" is a ©configuration switchª which | |
361 defaults to on. When it is on, it instructs AnaGram to | |
362 build a parser for which all input is case sensitive. | |
363 When it is off, AnaGram builds a parser which | |
364 ignores case for all input. | |
365 | |
366 If the ©iso latin 1ª configuration switch is turned | |
367 off, case conversion will be limited to characters | |
368 in the normal ascii range. When it is on, case | |
369 conversion will be done for all iso latin 1 characters. | |
370 | |
371 If you have other requirements for case conversion, | |
372 you may provide your own definition in your ©embedded cª for the | |
373 ©CONVERT_CASEª macro which is invoked to perform case | |
374 conversion on input characters. | |
375 | |
376 Note that the value of an input token is unaffected | |
377 by the case sensitive switch. When case sensitive is | |
378 off, 'a' and 'A' will be treated as the same input | |
379 token by the parser, but the ©token valueªs will | |
380 nevertheless be different. | |
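
A minimal sketch of the behavior described above, under stated assumptions: the names fold_case and classify are hypothetical, the direction of conversion (lower to upper) is an assumption, and the form AnaGram expects for a CONVERT_CASE definition is not shown in this text. The sketch separates the code used for matching from the token value, which is left untouched.

```c
#include <assert.h>

/* Fold a character code to upper case for matching purposes only.
   When latin1 is zero, only the ASCII letters a to z are converted.
   When latin1 is nonzero, ISO Latin-1 lower case letters are also
   converted. */
static int fold_case(int c, int latin1) {
    assert(c >= 0 && c <= 255);
    if (c >= 'a' && c <= 'z') return c - ('a' - 'A');
    /* 0xE0..0xFE are lower case Latin-1 letters, except 0xF7,
       which is the division sign. */
    if (latin1 && c >= 0xE0 && c <= 0xFE && c != 0xF7) return c - 0x20;
    return c;
}

/* The parser identifies the token from match_code, but the value
   handed to the grammar is still the original character. */
struct demo_input { int match_code; int value; };

static struct demo_input classify(int c, int latin1) {
    struct demo_input in;
    in.match_code = fold_case(c, latin1);
    in.value = c;
    return in;
}
```

With this sketch, 'a' and 'A' produce the same match_code but different values.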
381 ## | |
382 | |
383 C Prologue | |
384 | |
385 If you include a block of ©embedded Cª code at the very | |
386 beginning of your syntax file, it is called the "C | |
387 prologue". It will be copied to your ©parser fileª | |
388 before any of the code generated by AnaGram. You can | |
389 use the C prologue to ensure that copyright notices, | |
390 #include directives, or type definitions, for example, | |
391 occur at the very beginning of your parser file. | |
392 | |
393 If you specify a C or C++ type of your own definition, | |
394 you must provide a definition in the C prologue. | |
395 ## | |
396 | |
397 CHANGE_REDUCTION | |
398 | |
399 CHANGE_REDUCTION(t) is a macro which AnaGram defines in | |
400 your ©parser fileª if your ©parserª uses ©semantically | |
401 determined productionsª. In your ©reduction procedureª, | |
402 when you need to change the ©reduction tokenª you can | |
403 easily do so by calling CHANGE_REDUCTION with the name | |
404 of the desired token as the argument. If the token name | |
405 has embedded spaces, replace the embedded spaces with | |
406 underline characters. | |
407 ## | |
408 | |
409 Character Constant | |
410 | |
411 You may represent single characters in your ©grammarª by | |
412 using character constants. The rules for character | |
413 constants are the same as in C. The escape sequences | |
414 are as follows: | |
415 \a alert (bell) character | |
416 \b backspace | |
417 \f formfeed | |
418 \n newline | |
419 \r carriage return | |
420 \t horizontal tab | |
421 \v vertical tab | |
422 \\ backslash | |
423 \? question mark | |
424 \' single quote | |
425 \" double quote | |
426 \ooo octal number | |
427 \xhh hexadecimal number | |
428 | |
429 AnaGram treats a single | |
430 character as a ©character setª | |
431 which contains only the specified character. Therefore you | |
432 can use a character constant in a ©set expressionª. | |
433 ## | |
434 | |
435 Character Map | |
436 | |
437 The Character Map table shows you the mapping of input | |
438 characters to ©token numbersª. The ©ag_tcvª table in | |
439 your parser is based on the information in this table. | |
440 | |
441 The fields in this table are: | |
442 character code | |
443 display character, if any (what Windows displays for this code) | |
444 ©partition set numberª | |
445 ©token numberª | |
446 ©token representationª | |
447 | |
448 The display character will be what Windows displays for the character | |
449 code in the Data Tables font you have chosen. | |
450 ## | |
451 | |
452 Character Range | |
453 | |
454 A "character range" is a simple way to specify a | |
455 ©character setª. There are two ways to represent a | |
456 character range in an AnaGram ©syntax fileª. | |
457 | |
458 The first way is like a ©character constantª: 'a-z'. | |
459 | |
460 The second way allows somewhat greater freedom: | |
461 'a'..'z' | |
462 'a'..255 | |
463 ^Z..037 | |
464 -1..0xff | |
465 Here you use two arbitrary ©character representationsª | |
466 separated by two dots. If the two characters are out of | |
467 order, AnaGram will reverse the order, but will give | |
468 you a ©warningª. | |
469 | |
470 More complex ©character setsª may be specified by using | |
471 ©unionª, ©differenceª, ©intersectionª, or ©complementª | |
472 operators. | |
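
The handling of out-of-order bounds can be sketched in C. The struct and function names below are hypothetical and only illustrate the rule stated above: the bounds are swapped and the caller is told, so that it can issue a warning.

```c
#include <assert.h>
#include <stddef.h>

/* An inclusive range of character codes. */
struct char_range { int first; int last; };

/* Build a range from two character codes. If they are out of order
   they are swapped, and *reversed is set to 1 so that the caller
   can issue a warning. Otherwise *reversed is set to 0. */
static struct char_range make_range(int a, int b, int *reversed) {
    struct char_range r;
    assert(reversed != NULL);
    *reversed = (a > b);
    r.first = *reversed ? b : a;
    r.last  = *reversed ? a : b;
    return r;
}

/* Nonzero if code c lies within the range. */
static int range_contains(struct char_range r, int c) {
    return c >= r.first && c <= r.last;
}
```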
473 ## | |
474 | |
475 Character Representation | |
476 | |
477 In an AnaGram ©syntax fileª you may represent a | |
478 character literally with a ©character constantª or | |
479 numerically using decimal, octal or hexadecimal | |
480 representations following the conventions for C. Thus | |
481 'A', 65, 0101, and 0x41 all represent the same | |
482 character. Control characters can be represented using | |
483 the '^' character and either an upper or lower case | |
484 letter. Thus ^j and ^J are acceptable representations | |
485 of the ascii newline code. The rules for character | |
486 constants are identical to those in C, and the same | |
487 escape sequences are recognized. | |
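
Since the numeric conventions are those of C, the equivalences above can be checked directly in C. The helper control_code is hypothetical; it uses the customary ASCII rule of keeping the low five bits of the letter, which is assumed here to match what the ^ notation denotes.

```c
#include <assert.h>

/* Code of the control character written as ^ followed by a letter.
   Keeping the low five bits means upper and lower case letters
   give the same result. Assumes an ASCII character set. */
static int control_code(int letter) {
    assert((letter >= 'A' && letter <= 'Z') ||
           (letter >= 'a' && letter <= 'z'));
    return letter & 0x1F;
}

/* Nonzero if the literal, decimal, octal and hexadecimal forms
   of the letter A all denote the same code. */
static int same_letter_a(void) {
    return 'A' == 65 && 65 == 0101 && 0101 == 0x41;
}
```

On an ASCII system, control_code('J') and control_code('j') both give 10, the newline code.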
488 ## | |
489 | |
490 Character Set | |
491 | |
492 In AnaGram grammars you can conveniently specify whole | |
493 sets of characters at a time. This avoids | |
494 needless repetition and complexity. | |
495 | |
496 Sets of characters may be defined in an AnaGram ©syntax | |
497 fileª in any of a number of ways. A single character is | |
498 taken to represent a character set consisting of a | |
499 single element. (See ©character representationª.) You | |
500 can also specify a set consisting of a range of | |
501 characters (see ©character rangeª) and perform the | |
502 familiar set operations, union, intersection, difference | |
503 and complement. | |
504 | |
505 All the sets you define in your syntax file are | |
506 summarized in the ©Character Setsª window. | |
507 | |
508 The ©unionª of two character sets, represented by a '+', | |
509 contains all characters that are in one or another of | |
510 the two sets. Thus, 'A-Z' + 'a-z' represents the set of | |
511 all upper and lower case letters. | |
512 | |
513 The ©intersectionª of two character sets, represented | |
514 by a '&', contains all characters that are in both | |
515 sets. Thus, suppose you have the ©definitionsª | |
516 letter = 'A-Z' + 'a-z' | |
517 hex digit = '0-9' + 'A-F' + 'a-f' | |
518 Then (letter & hex digit) contains precisely upper and | |
519 lower case a to f. | |
520 | |
521 The ©differenceª of two character sets, represented by | |
522 a '-', contains all characters that are in the first | |
523 set but not in the second set. Thus, using the same | |
524 definitions as above, (letter - hex digit) contains | |
525 precisely upper and lower case g to z. | |
526 | |
527 The ©complementª of a character set, represented by a | |
528 preceding '~', represents all characters in the | |
529 ©character universeª which are not in the given set. | |
530 Suppose you have defined a set, ©eofª, which consists of | |
531 the characters which represent end of file. Then, in | |
532 your grammar where you wish to accept an arbitrary | |
533 character, what you really want is anything but an end | |
534 of file character. You can define it thus: | |
535 anything = ~eof | |
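
The four set operations can be modeled in C with one membership flag per character code. This is a sketch only: it assumes a fixed universe of codes 0 to 255, and the type charset and the cs_ functions are hypothetical names, not part of AnaGram.

```c
#include <assert.h>
#include <string.h>

#define UNIVERSE 256

/* One membership flag per character code in the universe. */
typedef struct { unsigned char member[UNIVERSE]; } charset;

/* The set of all codes from first to last, inclusive. */
static charset cs_range(int first, int last) {
    charset s;
    int c;
    assert(0 <= first && first <= last && last < UNIVERSE);
    memset(&s, 0, sizeof s);
    for (c = first; c <= last; c++) s.member[c] = 1;
    return s;
}

/* Union, written '+' in a syntax file. */
static charset cs_union(charset a, charset b) {
    int c;
    for (c = 0; c < UNIVERSE; c++) a.member[c] = a.member[c] || b.member[c];
    return a;
}

/* Intersection, written '&'. */
static charset cs_intersection(charset a, charset b) {
    int c;
    for (c = 0; c < UNIVERSE; c++) a.member[c] = a.member[c] && b.member[c];
    return a;
}

/* Difference, written '-': in a but not in b. */
static charset cs_difference(charset a, charset b) {
    int c;
    for (c = 0; c < UNIVERSE; c++) a.member[c] = a.member[c] && !b.member[c];
    return a;
}

/* Complement, written '~': everything in the universe not in a. */
static charset cs_complement(charset a) {
    int c;
    for (c = 0; c < UNIVERSE; c++) a.member[c] = !a.member[c];
    return a;
}

/* Number of characters in the set. */
static int cs_count(charset a) {
    int c, n = 0;
    for (c = 0; c < UNIVERSE; c++) n += a.member[c] ? 1 : 0;
    return n;
}
```

Building letter and hex digit as in the definitions above, the intersection holds the twelve characters a to f and A to F, and the difference holds the remaining forty letters.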
536 ## | |
537 | |
538 Character Sets | |
539 | |
540 This window lists all of the distinct ©character setªs | |
541 which you defined, implicitly or explicitly, in your | |
542 ©grammarª. Each line in the table describes one such | |
543 set. | |
544 | |
545 The description takes the form of the internal set | |
546 number and the defining ©expressionª. The ©Auxiliary | |
547 Windowsª menu will allow you to see the ©Partition | |
548 Setsª which cover the character set, and the ©Set | |
549 Elementsª which it comprises, as well as the ©Token Usageª. | |
550 ## | |
551 | |
552 Character Universe, Universe | |
553 | |
554 The character universe, or set of all expected input | |
555 characters to your parser, is defined as all characters | |
556 in the range given by a particular lower bound and a | |
557 particular upper bound, as described below. | |
558 | |
559 The character universe is used for two things in | |
560 AnaGram. The first use is for calculating the | |
561 ©complementª of a character set. The second use is in | |
562 the input processing of your parser. Input characters | |
563 will be used to index a ©token conversionª table to | |
564 convert character codes to token numbers. The length of | |
565 this table will be given by the size of the character | |
566 universe. If you have set the ©test rangeª | |
567 ©configuration switchª, your parser will verify that the | |
568 input character is within the range of the conversion | |
569 table. Otherwise, the character code will not be | |
570 checked for validity. In this case, an out-of-range | |
571 character will lead to undefined behavior. | |
572 | |
573 If you have not used any characters with negative codes | |
574 in your grammar, the lower bound is zero. Otherwise, it | |
575 is the most negative such character. | |
576 | |
577 If the highest character code you have used is less | |
578 than or equal to 255, the upper bound will be 255. | |
579 | |
580 If you have used a character code greater than 255, the | |
581 upper bound will be the largest such code which appears | |
582 in your syntax file. | |
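
The rules for the bounds, and the kind of check that the test range switch is described as adding, can be sketched in C. The names universe_bounds, universe_size and in_universe are hypothetical, and the code is an illustration of the stated rules rather than what AnaGram generates.

```c
#include <assert.h>
#include <stddef.h>

/* Inclusive bounds of the character universe. */
struct universe { int lower; int upper; };

/* Derive the bounds from the character codes used in a grammar.
   The lower bound is 0 unless a negative code appears. The upper
   bound is 255 unless a larger code appears. */
static struct universe universe_bounds(const int *codes, size_t n) {
    struct universe u = { 0, 255 };
    size_t i;
    assert(codes != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (codes[i] < u.lower) u.lower = codes[i];
        if (codes[i] > u.upper) u.upper = codes[i];
    }
    return u;
}

/* Number of entries a conversion table covering the universe needs. */
static size_t universe_size(struct universe u) {
    return (size_t)(u.upper - u.lower) + 1;
}

/* Range test to apply before indexing the conversion table.
   Skipping it for an out-of-range code is what leads to
   undefined behavior. */
static int in_universe(struct universe u, int code) {
    return code >= u.lower && code <= u.upper;
}
```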
583 ## | |
584 | |
585 Characteristic Rule | |
586 | |
587 Each ©parser stateª is characterized by a particular | |
588 set of ©grammar rulesª, and for each such rule, a | |
589 marked token which is the next ©tokenª expected. The | |
590 combination of a grammar rule and its marked token is often | |
591 called a ©marked ruleª. A marked rule which | |
592 characterizes a state is called a "characteristic | |
593 rule". In the course of doing ©grammar analysisª, | |
594 AnaGram determines the characteristic rules for each | |
595 ©parser stateª. After analyzing your grammar, you may | |
596 inspect the ©State Definition Tableª to see the | |
597 characteristic rules for any state in your parser. | |
598 ## | |
599 | |
600 Characteristic Token | |
601 | |
602 Every state in a ©parserª, except state 0, can be | |
603 characterized by the one, unique ©tokenª which causes a | |
604 jump to that state. That token is called the | |
605 ©characteristic tokenª of the state, because to get to | |
606 that ©parser stateª you must have just seen precisely | |
607 that token in the input. Note that several states could | |
608 have the same characteristic token. | |
609 | |
610 When you have a list of states, such as is given by the | |
611 ©parser state stackª, it is equivalent to a list of | |
612 characteristic tokens. This list of tokens is the list | |
613 of tokens that have been recognized so far by the | |
614 parser. | |
615 ## | |
616 | |
617 Circular Definition | |
618 | |
619 If the ©expansion ruleªs for a ©tokenª contain a ©grammar ruleª that | |
620 consists only of the token itself, the definition of the | |
621 token is circular. A circular definition is an extreme | |
622 case of ©empty recursionª. | |
623 | |
624 As in cases of empty recursion, the generated parser may contain | |
625 infinite loops. When such a condition is detected, therefore, | |
626 ©keyword anomalyª analysis and the ©File Traceª option are disabled. | |
627 | |
628 ## | |
629 | |
630 column | |
631 | |
632 "column" is an integer field in your ©parser control | |
633 blockª used for keeping track of the column number of | |
634 the current character in your input. Line and column | |
635 numbers are tracked only if the ©lines and columnsª | |
636 ©configuration switchª has been set. | |
637 ## | |
638 | |
639 Command Line | |
640 | |
641 If you provide the name of a syntax file on the | |
642 command line when you start AnaGram, it will open | |
643 the file and run either ©Analyze Grammarª or ©Build | |
644 Parserª depending on the setting of the ©Autobuildª | |
645 switch. | |
646 ## | |
647 | |
648 Command Line Version, agcl.exe | |
649 | |
650 The command line version of AnaGram, agcl.exe, can be | |
651 used in make files. It takes the name of a single syntax | |
652 file on the command | |
653 line. Error and ©warningª messages are written to stdout. | |
654 | |
655 Normally you would only use the command line version once you | |
656 have finished developing your ©parserª and are integrating | |
657 it with the rest of your program. | |
658 | |
659 The command line version of AnaGram is not included with | |
660 trial copies. | |
661 ## | |
662 | |
663 Comment | |
664 | |
665 You may incorporate comments in your syntax file using | |
666 either of two conventions. The first is the normal C | |
667 convention for comments which begin with "/*" and end | |
668 with "*/". Such comments may be of arbitrary length. By | |
669 setting or resetting the ©nest commentsª switch, you | |
670 may control whether they may be nested or not. | |
671 | |
672 The second convention for comments is the C++ comment | |
673 convention. In this case the comment begins with "//" | |
674 and ends with a newline. | |
675 | |
676 When writing a ©grammarª, you may wish to allow a user | |
677 to comment his input freely without your having to | |
678 explicitly allow for comments in your grammar. You may | |
679 accomplish this by using the ©disregardª statement. | |
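
The difference that nesting makes to C-style comments can be sketched with a small scanner. The function skip_comment and its nest argument are hypothetical; nest stands in for the setting of the nest comments switch, and the code is not taken from AnaGram.

```c
#include <assert.h>
#include <stddef.h>

/* s must point at the slash-star that opens a comment. Returns a
   pointer just past the matching close, or NULL if the comment is
   never closed. When nest is nonzero, an opener found inside the
   comment increases the depth, so it needs its own close. When nest
   is zero, inner openers are ignored and the first close ends the
   comment. */
static const char *skip_comment(const char *s, int nest) {
    int depth = 0;
    assert(s != NULL && s[0] == '/' && s[1] == '*');
    while (*s != '\0') {
        if (s[0] == '/' && s[1] == '*') {
            if (depth == 0 || nest) depth++;
            s += 2;
        } else if (s[0] == '*' && s[1] == '/') {
            s += 2;
            if (--depth == 0) return s;
        } else {
            s++;
        }
    }
    return NULL;
}
```

For the text "/* a /* b */ c */ d", the scan stops after the first close when nest is zero and after the second close when nest is nonzero.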
680 ## | |
681 | |
682 Compile Command | |
683 | |
684 "Compile command" is a ©configuration parameterª which | |
685 takes a string value. This parameter was used in the | |
686 DOS version of AnaGram, but is ignored in the Windows | |
687 version. | |
688 ## | |
689 | |
690 Complement | |
691 | |
692 In set theory, the complement of a set, S, is the set | |
693 of all elements of the ©universeª which are not members | |
694 of the set S. | |
695 | |
696 In AnaGram, the complement operator for ©character | |
697 setsª is given by '~' and has higher precedence than | |
698 ©differenceª, ©intersectionª, or ©unionª. | |
699 | |
700 In AnaGram, the most useful complement is that of the | |
701 end of file character set. For ordinary ascii files it | |
702 is often convenient to read the entire file into | |
703 memory, append a zero byte to the end, and define the | |
704 end of file set thus: | |
705 eof = 0 + ^Z | |
706 Then, ~©eofª represents all legitimate input characters. | |
707 | |
708 You can then use set differences to specify certain | |
709 useful sets without tedious enumeration. For example, a | |
710 comment that is to be terminated by the end of line | |
711 then consists of characters from the set | |
712 comment char = ~'\n' & ~eof | |
713 This set could also be written | |
714 comment char = ~('\n' + eof) | |
715 ## | |
716 | |
717 Completed Rule | |
718 | |
719 A "completed rule" is a ©characteristic ruleª which has no ©marked | |
720 tokenª. In other words, it has been completely matched and will be | |
721 reduced by the next input. | |
722 | |
723 If there is more than one completed rule in a state, | |
724 the decision as to which to reduce is made based on the | |
725 next input token. If there is only one completed rule | |
726 in a state, it will be reduced by default unless the | |
727 ©default reductionsª switch has been reset, i.e., | |
728 turned off. | |
729 ## | |
730 | |
731 Configuration File | |
732 | |
733 If it can find them, AnaGram reads two configuration | |
734 files to set up ©configuration parameterªs. At program | |
735 initialization, it will first attempt to read a | |
736 configuration file in the directory that contains | |
737 the AnaGram executable file you are running. Then it | |
738 will read a configuration file in your working | |
739 directory. Both files should have the name | |
740 "AnaGram.cfg" if they exist. Neither is necessary. | |
741 | |
742 If a parameter is specified in both files, the | |
743 specification in the file from the working directory | |
744 takes precedence. | |
745 | |
746 The effect of this two stage process is to allow you to | |
747 set your standard preferences in the principal | |
748 directory, with specific overrides in your working | |
749 directories. | |
750 | |
751 The values for configuration parameters in ©syntax | |
752 filesª override those read from configuration files. | |
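
The order of precedence described above can be sketched as a simple layered lookup: start with the built-in default and let each later source override it if it specifies the parameter. The struct, the enum names and the function resolve are hypothetical and only model the stated rules.

```c
#include <assert.h>
#include <stddef.h>

/* One optional setting per source. is_set == 0 means the
   parameter was not specified in that source. */
struct setting { int is_set; int value; };

/* Sources in increasing order of precedence. */
enum {
    SRC_EXE_DIR_CFG,      /* AnaGram.cfg beside the executable */
    SRC_WORKING_DIR_CFG,  /* AnaGram.cfg in the working directory */
    SRC_SYNTAX_FILE,      /* configuration section in the syntax file */
    SRC_COUNT
};

/* Start from the built-in default and apply each source in turn,
   so the last source that specifies the parameter wins. */
static int resolve(int built_in_default,
                   const struct setting layers[SRC_COUNT]) {
    int value = built_in_default;
    int i;
    assert(layers != NULL);
    for (i = 0; i < SRC_COUNT; i++)
        if (layers[i].is_set) value = layers[i].value;
    return value;
}
```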
753 | |
754 AnaGram does not save configuration parameters in | |
755 the Windows registry, nor does it provide any | |
756 mechanism for setting or changing the values of | |
757 configuration parameters within AnaGram itself. | |
758 ## | |
759 | |
760 Configuration Parameter | |
761 | |
762 Configuration parameters may be specified either in | |
763 ©configuration filesª or in your ©syntax fileª. In your | |
764 syntax files, configuration parameters are specified, | |
765 one per line, in a ©configuration sectionª. | |
766 | |
767 AnaGram ignores case when identifying a configuration | |
768 parameter, so that "ALLOW MACROS", "Allow Macros", and | |
769 "allow macros" are all equivalent forms. | |
770 | |
771 There may be any number of configuration sections in a | |
772 ©syntax fileª. Any parameter may be specified any | |
773 number of times. Since AnaGram maintains only one value | |
774 in storage for these parameters, whenever it refers to | |
775 one it will see the most recently specified value. | |
776 Every configuration parameter has a default value which | |
777 has been chosen to correspond to a standard if it | |
778 exists, customary usage if such can be determined, or | |
779 otherwise to the most likely usage. | |
780 | |
781 Before executing an Analyze Grammar or Build Parser command, AnaGram | |
782 resets configuration parameters to their initial values, as | |
783 determined by the built in defaults and the configuration files read | |
784 at program initialization. | |
785 | |
786 The ©Configuration Parameters Windowª shows the current settings of all | |
787 of the configuration parameters. When this window is active you may | |
788 press ©F1ª or click with the ©help cursorª to pop up a help window | |
789 describing the parameter under the cursor bar. | |
790 | |
791 There are several varieties of configuration | |
792 parameters. Some simply set or reset a condition. These | |
793 need simply be stated to set the condition or negated | |
794 with the tilde (~) to reset the condition. Thus | |
795 [ nest comments ] | |
796 causes AnaGram to allow nested comments, and | |
797 [ ~nest comments ] | |
798 causes AnaGram to disallow nested comments. | |
799 | |
800 If you prefer you may explicitly specify a switch value as on or off: | |
801 [ nest comments = on] | |
802 | |
803 A second kind | |
804 of configuration parameter takes a value | |
805 which is the name of a token. Thus | |
806 [ grammar token = c grammar] | |
807 specifies that the token, c grammar, is the ©grammar | |
808 tokenª which is to be analyzed. | |
809 | |
810 A third variety of configuration parameter takes a | |
811 value which is a C data type. Thus | |
812 [ default token type = unsigned char *] | |
813 signifies that the ©semantic valueª of a token, unless | |
814 otherwise specified, is a pointer to an unsigned char. | |
815 | |
816 A fourth variety of configuration parameter takes a | |
817 string value to set some ASCII string used by AnaGram. | |
818 Thus | |
819 [ header file name = "widget.h" ] | |
820 signifies that the header file created by AnaGram | |
821 should be called "widget.h". | |
822 | |
823 In string-valued parameters used to specify the names | |
824 of output files or the name of your parser, you may use | |
825 the '#' character to indicate the name of your syntax | |
826 file: When the string is actually used, AnaGram will | |
827 substitute the syntax file name for the '#'. | |
828 | |
829 In string-valued parameters used to specify the names | |
830 of functions or variables that AnaGram generates, you | |
831 may use '$' to specify the name of your parser. When | |
832 the string is actually used, AnaGram will substitute | |
833 the name of your parser for the '$'. | |
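
The '#' and '$' substitutions described above amount to a simple marker expansion. The following C fragment is only a sketch of that idea: expand_marker is a hypothetical helper, not an AnaGram function, and AnaGram's own handling may differ in detail.

```c
#include <stddef.h>

/* Hypothetical helper, not part of AnaGram: copy `pattern` into `out`,
   replacing each occurrence of `marker` with `name`.
   Returns 1 on success, 0 if `out` is too small. */
static int expand_marker(char *out, size_t size, const char *pattern,
                         char marker, const char *name)
{
    size_t used = 0;

    for (const char *p = pattern; *p != '\0'; p++) {
        if (*p == marker) {
            /* Substitute the whole name for the marker character. */
            for (const char *q = name; *q != '\0'; q++) {
                if (used + 1 >= size)
                    return 0;
                out[used++] = *q;
            }
        } else {
            /* Ordinary characters are copied unchanged. */
            if (used + 1 >= size)
                return 0;
            out[used++] = *p;
        }
    }
    if (size == 0)
        return 0;
    out[used] = '\0';
    return 1;
}
```

With a syntax file name of "widget", for instance, expanding "#.nrc" with the '#' marker yields "widget.nrc", and expanding "$_value" with the '$' marker and a parser name of "widget" yields "widget_value".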
834 | |
835 In the "©enum constant nameª" configuration parameter | |
836 you may use '%' to specify where a token name is to be | |
837 substituted. | |
838 | |
839 The final variety of configuration parameter takes a | |
840 numeric value. The value may be decimal, octal | |
841 or hexadecimal, following the C conventions, and may | |
842 have an optional sign. Thus | |
843 [parser stack size = 50] | |
844 tells AnaGram to allocate space for at least fifty stack entries | |
845 when it creates your parser. | |
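
The C conventions for numeric values are the same ones the standard library function strtol applies when its base argument is 0. As a rough illustration only (parse_numeric_value is a hypothetical helper, not part of AnaGram, and AnaGram's own reader may differ), such a value could be read like this:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper, not part of AnaGram: parse a numeric parameter
   value written in C notation. Returns 1 on success, 0 on failure. */
static int parse_numeric_value(const char *text, long *result)
{
    char *end;

    errno = 0;
    /* Base 0 selects decimal, octal (leading 0) or hexadecimal
       (leading 0x) and accepts an optional sign. */
    long value = strtol(text, &end, 0);
    if (end == text || errno == ERANGE)
        return 0;               /* no digits, or out of range */
    while (*end == ' ' || *end == '\t')
        end++;                  /* allow trailing blanks */
    if (*end != '\0')
        return 0;               /* trailing junk */
    *result = value;
    return 1;
}
```

Under this reading, "50", "062" and "0x32" all denote the same stack size.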
846 ## | |
847 | |
848 Configuration Parameters Window | |
849 | |
850 The Configuration Parameters window lists the | |
851 ©configuration parameterªs AnaGram accepts with their | |
852 current values, as set by the ©configuration filesª it | |
853 has read and by the most recent ©syntax fileª it has | |
854 analyzed. Configuration parameters cannot be changed | |
855 from within AnaGram. | |
856 ## | |
857 | |
858 Configuration Section | |
859 | |
860 A configuration section is one of the main divisions of | |
861 your ©syntax fileª. It begins with a left square | |
862 bracket on a fresh line. It then contains definitions | |
863 of ©configuration parameterªs, ©configuration switchª | |
864 settings and ©attribute statementªs. These | |
865 specifications must each start on a new line. The | |
866 configuration section is closed with a right bracket. | |
867 Any further component of your syntax file, other than a | |
868 ©commentª, must start on a fresh line. | |
869 | |
870 There can be any number of configuration sections in a | |
871 syntax file. | |
872 ## | |
873 | |
874 Configuration Switch | |
875 | |
876 A configuration switch is a ©configuration parameterª | |
877 which can take on only the two values true and false, | |
878 or on and off. You set a configuration switch, or turn | |
879 it on, by simply naming it in your ©configuration fileª | |
880 or in a ©configuration sectionª of your ©syntax fileª. | |
881 You turn it off, or "reset" it, by use of the tilde: | |
882 "~nest comments", for example, resets, or turns off, | |
883 the ©nest commentsª switch. If you prefer, you may | |
884 assign the value "on" to set the switch, or "off" to | |
885 reset it. For example: | |
886 nest comments = on | |
887 ## | |
888 | |
889 Conflict | |
890 | |
891 "Conflicts" arise during the ©grammar analysisª when | |
892 AnaGram cannot determine how to treat a given input | |
893 token. There are two sorts of conflicts: ©shift-reduce | |
894 conflictsª and ©reduce-reduce conflictsª. Conflicts may | |
895 arise either because the grammar is inherently | |
896 ambiguous, or simply because the grammar analyzer | |
897 cannot look far enough ahead to resolve the conflict. | |
898 In the latter case, it is often possible to rewrite the | |
899 grammar in such a way as to eliminate the conflict. In | |
900 particular, ©null productionsª are a common source of | |
901 conflicts. | |
902 | |
903 When AnaGram analyzes your grammar, it lists all | |
904 unresolved conflicts in the ©Conflictsª window. A number | |
905 of ©Auxiliary Windowsª available from the Conflicts window | |
906 provide help in identifying the source of the conflict. | |
907 | |
908 There are a number of ways to deal with conflicts. If | |
909 you understand the conflict well, you may simply choose | |
910 to ignore it. When AnaGram encounters a shift-reduce | |
911 conflict while building parse tables it resolves it by | |
912 choosing the ©shift actionª. When AnaGram encounters a | |
913 reduce-reduce conflict while building parse tables, it | |
914 resolves it by selecting the ©grammar ruleª which | |
915 occurred first in the grammar. | |
916 | |
917 A second way to deal with conflicts is to set ©operator | |
918 precedenceª parameters. If you set these parameters, | |
919 AnaGram will use them preferentially to resolve | |
920 conflicts. Any conflicts so resolved will be listed in | |
921 the ©Resolved Conflictsª window. | |
922 | |
923 A third way to resolve a conflict is to declare some | |
924 tokens as ©stickyª. This is particularly useful for | |
925 ©productionªs whose sole purpose is to skip over | |
926 uninteresting input. | |
927 | |
928 A fourth way to resolve conflicts is to declare a token | |
929 to be a ©subgrammarª. When you do this, AnaGram does | |
930 not look beyond the definition of the subgrammar token | |
931 itself for reducing tokens. This is not a particularly | |
932 selective way to resolve conflicts and should be used | |
933 only when the subgrammar token is naturally defined | |
934 only by internal criteria. The tokens identified by | |
935 lexical scanners are prime examples of this genre. | |
936 | |
937 The fifth way to deal with conflicts is to rewrite the | |
938 grammar to eliminate them. Many people prefer this | |
939 approach since it yields the highest level of | |
940 confidence in the resulting program. | |
941 | |
942 Please refer to the AnaGram User's Guide for more information about | |
943 dealing with conflicts. | |
944 ## | |
945 | |
946 Conflicts | |
947 | |
948 If there are ©conflictªs in your grammar which are not | |
949 resolved by ©precedence rulesª, they will be listed in | |
950 the Conflicts window. The Conflicts window will also be | |
951 listed in the ©Browse Menuª. Conflicts which have been | |
952 resolved by ©precedence rulesª are listed in the | |
953 ©Resolved Conflictsª window. | |
954 | |
955 The Conflicts window lists the conflicts, or | |
956 ambiguities, which AnaGram found in your grammar. The | |
957 table identifies the ©parser statesª in which it found | |
958 conflicts, the ©conflict tokenªs for which it had more | |
959 than one option, and the ©marked rulesª for each such | |
960 option. If one of the rules for a particular conflict | |
961 has a ©marked tokenª, the conflict is | |
962 a ©shift-reduce conflictª. The marked token is the token | |
963 to be shifted. If none of the rules has a marked token the conflict is | |
964 a ©reduce-reduce conflictª. | |
965 | |
966 AnaGram provides a number of ©Auxiliary Windowsª to help | |
967 you find and fix the source of the conflict. The | |
968 ©Conflict Traceª window is a pre-built ©Grammar Traceª | |
969 window which shows you one of perhaps many ways to | |
970 encounter the conflict. The ©Reduction Traceª window | |
971 shows the result of reducing a particular ambiguous | |
972 rule. | |
973 | |
974 In addition, the ©Rule Derivationª and ©Token | |
975 Derivationª windows show you why the conflict token is a | |
976 ©reducing tokenª. They are particularly useful for | |
977 shift-reduce conflicts. | |
978 | |
979 The ©Expansion Chainª window is helpful for understanding | |
980 reduce-reduce conflicts. | |
981 | |
982 Other Auxiliary Windows which are often useful are the | |
983 ©State Definitionª window, the ©Reduction Statesª | |
984 window, and the ©Problem Statesª window. | |
985 | |
986 Please refer to the AnaGram User's Guide for more information on how to | |
987 deal with conflicts. | |
988 ## | |
989 | |
990 Conflicts Resolved by Precedence Rules | |
991 | |
992 This ©warningª message indicates that AnaGram has | |
993 resolved conflicts in your grammar by using ©precedence | |
994 rulesª: guidelines you supplied either by explicit | |
995 ©precedence declarationsª, by using a ©stickyª | |
996 statement or ©distinguish lexemesª statement, or | |
997 implicitly by using a ©disregardª statement. These | |
998 conflicts are listed in the ©Resolved Conflictsª | |
999 window, and are not listed in the ©Conflictsª window. | |
1000 ## | |
1001 | |
1002 Conflict Token | |
1003 | |
1004 In any given ©conflictª, there is a ©tokenª for which | |
1005 an unambiguous ©parser actionª cannot be determined. | |
1006 This token is called the "conflict token". | |
1007 ## | |
1008 | |
1009 Conflict Trace | |
1010 | |
1011 The Conflict Trace is a ready-made ©Grammar Traceª | |
1012 which shows you one of perhaps many ways to get to the | |
1013 state which has the ©conflictª selected by the cursor | |
1014 bar. The Conflict Trace window is an option in the | |
1015 ©Auxiliary Windowsª menu for the ©Conflictsª window and | |
1016 the ©Resolved Conflictsª window. | |
1017 ## | |
1018 | |
1019 Const Data | |
1020 | |
1021 The const data ©configuration switchª controls the use | |
1022 of CONST qualifiers in generated code. If the switch is | |
1023 set, all fixed data arrays in the ©parser fileª will be | |
1024 qualified as CONST, unless the ©old styleª switch is | |
1025 set. The default setting is ON. Other configuration | |
1026 switches which control declaration qualifiers in the | |
1027 parser file are ©near functionsª and ©far tablesª. | |
1028 ## | |
1029 | |
1030 CONTEXT | |
1031 | |
1032 "CONTEXT" is a macro which AnaGram defines for you if | |
1033 you have defined a ©context typeª. It provides access | |
1034 to the top value of the ©context stackª. Your | |
1035 ©GET_CONTEXTª macro may store the current context by | |
1036 assigning a value to CONTEXT. Suppose your parser uses | |
1037 ©pointer inputª, and you wish to know the value of the | |
1038 ©pointerª for every production. You could define | |
1039 GET_CONTEXT thus: | |
1040 #define GET_CONTEXT CONTEXT = PCB.pointer | |
1041 | |
1042 In ©reduction procedureªs, you may use the CONTEXT | |
1043 macro to find the context for the rule you are | |
1044 reducing, that is to say, the value the context | |
1045 variables had when the first token in the rule was | |
1046 encountered. | |
1047 ## | |
1048 | |
1049 Context Stack | |
1050 | |
1051 It is often convenient, when writing ©reduction | |
1052 procedureªs, to know the actual context of the ©grammar | |
1053 ruleª your procedure is reducing. To do this you need | |
1054 to know the values that certain variables, such as | |
1055 stack pointers, or input pointers, in your program had | |
1056 at various stages as your parser matched the rule. You | |
1057 can accomplish this by maintaining a context stack. | |
1058 | |
1059 If you wish, AnaGram will keep track, on a stack, of any | |
1060 context variables you wish. To do so, define a structure | |
1061 which can hold all the values you need to stack. Use the | |
1062 ©context typeª ©configuration parameterª to tell AnaGram | |
1063 how to declare the stack. Then define the ©GET_CONTEXTª | |
1064 macro to gather the appropriate values and store them on | |
1065 the stack. The ©CONTEXTª macro evaluates to the proper | |
1066 location into which the GET_CONTEXT macro should store | |
1067 the context value. AnaGram will invoke the GET_CONTEXT | |
1068 macro whenever necessary to make sure the right values | |
1069 are stacked. In a reduction procedure, you can then use | |
1070 the macro ©RULE_CONTEXTª to find the value of the | |
1071 context structure as of the beginning of each token in | |
1072 the rule you are reducing. | |
1073 | |
1074 If your parser is ©event drivenª, store the context of | |
1075 the input token in PCB.input_context. The default | |
1076 version of GET_CONTEXT will stack the context as | |
1077 appropriate. | |
1078 | |
1079 If your parser should encounter an error, you may use | |
1080 ©ERROR_CONTEXTª to determine the values of the context | |
1081 variables at the beginning of the aborted grammar rule. | |
1082 ## | |
1083 | |
1084 context type | |
1085 | |
1086 "Context type" is a ©configuration parameterª whose | |
1087 value is a C type name, possibly as defined by a | |
1088 typedef statement. By default, "context type" is | |
1089 undefined. If you define it, AnaGram will set up a | |
1090 ©context stackª in your ©parser control blockª so you | |
1091 can track the context of ©productionªs. | |
1092 | |
1093 Each time your parser pushes values onto the state | |
1094 stack and value stack it will invoke the ©GET_CONTEXTª | |
1095 macro to store the current context on the context | |
1096 stack. The macro ©CONTEXTª names the current stack | |
1097 location. In your GET_CONTEXT macro you can use it as | |
1098 the destination for the current context. In a | |
1099 ©reduction procedureª, CONTEXT names the context as of | |
1100 the beginning of the production. Two other macros are | |
1101 available to inspect the values of the context stack. | |
1102 In a reduction procedure, you may use ©RULE_CONTEXTª[k] | |
1103 to determine the value of the context variable as it | |
1104 was as of the (k+1)th token in the rule. In particular, | |
1105 RULE_CONTEXT[0] is the value the context variable had | |
1106 when the first token in the rule was seen. | |
1107 | |
1108 If you enable the ©error frameª ©configuration switchª, | |
1109 you may use ©ERROR_CONTEXTª to determine the context of | |
1110 the production your parser was trying to identify at | |
1111 the time of the error. | |
1112 ## | |
1113 | |
1114 CONVERT_CASE | |
1115 | |
1116 CONVERT_CASE is a user definable macro which AnaGram | |
1117 invokes to convert the case of input characters when | |
1118 the ©case sensitiveª switch has been turned off. If | |
1119 you do not define the macro yourself, AnaGram will | |
1120 provide a macro which will convert case correctly | |
1121 for characters in the ASCII character range and | |
1122 also for ©ISO latin 1ª characters if the corresponding | |
1123 ©configuration switchª is on. | |
1124 | |
1125 ## | |
1126 | |
1127 Coverage File Name | |
1128 | |
1129 If you have set the ©rule coverageª ©configuration | |
1130 switchª to include coverage analysis in your parser, | |
1131 AnaGram uses the value of the coverage file name | |
1132 ©configuration parameterª to find the results of your | |
1133 testing. The value of the parameter is a string. The | |
1134 default value is "#.nrc", where '#' represents the name | |
1135 of your syntax file. | |
1136 ## | |
1137 | |
1138 cs | |
1139 | |
1140 cs is a field in a ©parser control blockª which | |
1141 contains your ©context stackª. cs will be defined only | |
1142 if you have defined the ©configuration parameterª | |
1143 ©context typeª. | |
1144 ## | |
1145 | |
1146 Current Grammar | |
1147 | |
1148 The Current Grammar is the ©grammarª you presently have | |
1149 loaded. Its name is displayed on the title bar of | |
1150 each AnaGram window. | |
1151 | |
1152 A status field at the right center of the ©Control Panelª | |
1153 indicates the state of processing that has been | |
1154 carried out on the grammar. | |
1155 | |
1156 "Loaded" means that the ©syntax fileª has been read | |
1157 into memory, but that syntax errors have been found. | |
1158 | |
1159 "Parsed" means that AnaGram has tried to analyze the | |
1160 grammar, but got into some kind of difficulty and did | |
1161 not complete the job. The explanation should be | |
1162 apparent from the messages in the ©Warningsª window. | |
1163 | |
1164 "Analyzed" means that a ©grammar analysisª has been | |
1165 completed, but no ©output filesª have been written. | |
1166 | |
1167 "Built" means that an analysis has been completed and | |
1168 output files have been written. | |
1169 ## | |
1170 | |
1171 Data Type | |
1172 | |
1173 The ©tokensª in your ©parserª usually have ©semantic | |
1174 valuesª. The data types for these values will be | |
1175 determined by the ©default input typeª and ©default | |
1176 token typeª ©configuration parameterªs unless you | |
1177 explicitly provide ©token declarationsª in your grammar. | |
1178 You may also define the data type for any ©nonterminalª | |
1179 token by preceding the token name with an ordinary C | |
1180 cast when you write a production. For example: | |
1181 | |
1182 (int) integer | |
1183 -> '0-9':d =d-'0'; | |
1184 -> integer:n, '0-9':d =10*n + d - '0'; | |
1185 | |
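To see what the two reduction procedures in the example above compute, here is the same arithmetic written by hand as an ordinary C loop. This is an illustration only, not code generated by AnaGram; integer_value is a hypothetical name, and the function assumes its argument begins with a digit.

```c
/* Hand-written illustration of the arithmetic in the two reduction
   procedures above; this is not code generated by AnaGram.
   Assumes the first character of `digits` is a decimal digit. */
static int integer_value(const char *digits)
{
    /* First rule:  -> '0-9':d  =d-'0'; */
    int n = *digits++ - '0';

    /* Second rule: -> integer:n, '0-9':d  =10*n + d - '0'; */
    while (*digits >= '0' && *digits <= '9')
        n = 10*n + *digits++ - '0';
    return n;
}
```

Because the result of each step has type int, the (int) cast on the token matches the type of the value the reduction procedures return.
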
1186 The data type may be any simple C or C++ data type, with | |
1187 arbitrary indirection and qualification. You may also | |
1188 use any type you have defined by means of typedef, | |
1189 struct or class definitions. Template classes may also | |
1190 be used. If you specify a type of your own definition, | |
1191 you must provide a definition in the ©C prologueª at the | |
1192 beginning of your ©syntax fileª. | |
1193 | |
1194 A token may have the type "void" if its value has no | |
1195 interest for the parser. Since your parser will not | |
1196 stack a value for a void token, your parser may run | |
1197 somewhat faster when tokens are declared as void. | |
1198 ## | |
1199 | |
1200 Declare pcb | |
1201 | |
1202 "Declare pcb" is a ©configuration switchª that defaults | |
1203 to on. If this switch is set when you invoke the ©Build | |
1204 Parserª command, AnaGram will automatically declare a | |
1205 ©parser control blockª for you, at the beginning of | |
1206 your parser file. If you have used data types that you | |
1207 define yourself, the typedef statements need to precede | |
1208 the parser control block declaration. In this case, you | |
1209 should turn "declare pcb" off and declare it yourself. | |
1210 | |
1211 For more information, see the AnaGram User's Guide. | |
1212 ## | |
1213 | |
1214 Default Input Type | |
1215 | |
1216 The default input type is a ©configuration parameterª | |
1217 which determines the ©data typeª for the ©semantic | |
1218 valueªs of ©terminal tokensª if they are not explicitly | |
1219 declared. Normally, you would explicitly declare | |
1220 terminal tokens only when you have set the ©input | |
1221 valuesª ©configuration switchª. If you do not set the | |
1222 default input type, it will default to "int". | |
1223 | |
1224 The default data type for the values of ©nonterminal | |
1225 tokensª is given by the ©default token typeª | |
1226 configuration parameter. | |
1227 ## | |
1228 | |
1229 Default Reduction | |
1230 | |
1231 "Default reductions" is a ©configuration switchª which | |
1232 defaults to on. | |
1233 | |
1234 A "default reduction" is a ©parser actionª which may be | |
1235 used in your parser in any state which has precisely | |
1236 one ©completed ruleª. | |
1237 | |
1238 If a given ©parser stateª has, among its ©characteristic | |
1239 rulesª, exactly one completed rule, it is usually faster | |
1240 to reduce it on any input than to check specifically for | |
1241 correct input before reducing it. The only time this | |
1242 default reduction causes trouble is in the event of a | |
1243 ©syntax errorª. In this situation you may get an | |
1244 erroneous reduction. Normally when you are parsing a | |
1245 file, this is inconsequential because you are not going | |
1246 to continue semantic action in the presence of error. | |
1247 But, if you are using your parser to handle real-time | |
1248 interactive input, you have to be able to continue | |
1249 semantic processing after notifying your user that he | |
1250 has entered erroneous input. In this case you would want | |
1251 default reductions to have been turned off so that | |
1252 ©productionªs are reduced only when there is correct | |
1253 input. | |
1254 ## | |
1255 | |
1256 Default reduction value | |
1257 | |
1258 If a ©grammar ruleª does not have a ©reduction procedureª | |
1259 the ©semantic valueª of the first token in the rule will | |
1260 be taken as the semantic value of the token on the left | |
1261 hand side. If these tokens do not have the same ©data typeª | |
1262 a ©warningª will be given. | |
1263 ## | |
1264 | |
1265 Default Token Type | |
1266 | |
1267 "Default token type" is a ©configuration parameterª | |
1268 which determines the ©data typeª for the ©semantic | |
1269 valueª of a ©nonterminal tokenª if no other type is | |
1270 explicitly specified. It defaults to void. Therefore, if | |
1271 any ©reduction procedureª returns a value, you must | |
1272 either explicitly set the type of the ©reduction tokenª | |
1273 or you must set default token type to an appropriate | |
1274 value. | |
1275 | |
1276 The default token type cannot have a ©wrapperª class | |
1277 defined. | |
1278 | |
1279 The default data type for the value of a ©terminal | |
1280 tokenª is given by the ©default input typeª | |
1281 configuration parameter. | |
1282 ## | |
1283 | |
1284 Definition, Definition Statement | |
1285 | |
1286 AnaGram syntax files may contain definition statements | |
1287 which assign new names to ©character setsª, ©virtual | |
1288 productionsª, ©keyword stringsª, ©immediate actionsª, | |
1289 or ©tokensª. Definitions have the form | |
1290 name = <character set> | |
1291 name = <virtual production> | |
1292 name = <keyword string> | |
1293 name = <immediate action> | |
1294 name = <token name> | |
1295 | |
1296 For example, | |
1297 letter = 'a-z' + 'A-Z' | |
1298 statement list = statement?... | |
1299 include = "include" | |
1300 | |
1301 The symbols thus defined may be used anywhere the | |
1302 expression on the right hand side might be used. Such | |
1303 definitions, in and of themselves, do not define tokens. | |
1304 Tokens are defined only by their usage in productions. | |
1305 | |
1306 ## | |
1307 | |
1308 DELETE_WRAPPERS | |
1309 | |
1310 If your parser uses ©wrapperªs and exits with an error condition, there | |
1311 may be objects remaining on the ©parser value stackª. The DELETE_WRAPPERS macro | |
1312 can be used to delete any remaining objects on the stack. | |
1313 If you have enabled | |
1314 ©auto resynchª, DELETE_WRAPPERS will be invoked automatically. | |
1315 ## | |
1316 | |
1317 Diagnose Errors | |
1318 | |
1319 "Diagnose errors" is a ©configuration switchª which | |
1320 defaults to on. When this switch is on, AnaGram includes a | |
1321 function, ag_diagnose(), in your parser which provides simple | |
1322 syntax error diagnoses. When your parser encounters a | |
1323 syntax error, this function will be called immediately prior | |
1324 to the invocation of the ©SYNTAX_ERRORª macro. A pointer to the message will be | |
1325 stored in the ©error_messageª field of the ©parser control blockª. | |
1326 | |
1327 If you wish to implement your own ©error diagnosisª, you | |
1328 should turn this switch off, and include a call to your | |
1329 own diagnostic procedure in your SYNTAX_ERROR macro. | |
1330 | |
1331 ag_diagnose() provides three possible error messages, | |
1332 governed by three macros: ©MISSING_FORMATª, ©UNEXPECTED_FORMATª, and | |
1333 ©UNNAMED_TOKENª. You may override the definitions of | |
1334 these macros with your own definitions if you wish | |
1335 to provide diagnostics in another language. | |
1336 | |
1337 If you have set the ©error frameª | |
1338 switch, ag_diagnose() will also set the ©error_frame_tokenª field. | |
1339 The "error_frame_token" is the non-terminal token which | |
1340 the parser was trying to complete when the error was | |
1341 encountered. | |
1342 | |
1343 When the "diagnose errors" switch is set, AnaGram also | |
1344 includes a ©token namesª table in the parser which | |
1345 contains the ASCII names of the tokens in the grammar, | |
1346 including entries for character constants and keywords. | |
1347 | |
1348 Use the ©token names onlyª switch to limit the table | |
1349 to explicitly named tokens only. | |
1350 ## | |
1351 | |
1352 MISSING_FORMAT | |
1353 | |
1354 MISSING_FORMAT is a macro that is used by the error | |
1355 diagnostic function created by the ©diagnose errorsª | |
1356 switch. If you do not define it in your parser, | |
1357 AnaGram will define it thus: | |
1358 #define MISSING_FORMAT "Missing %s" | |
1359 | |
1360 This format is used when the diagnostic function can | |
1361 identify a unique terminal or nonterminal token that | |
1362 would satisfy the syntactic rules and is named | |
1363 in the ©token namesª table. | |
1364 ## | |
1365 | |
1366 UNEXPECTED_FORMAT | |
1367 | |
1368 UNEXPECTED_FORMAT is a macro that is used by the error | |
1369 diagnostic function created by the ©diagnose errorsª | |
1370 switch. If you do not define it in your parser, | |
1371 AnaGram will define it thus: | |
1372 #define UNEXPECTED_FORMAT "Unexpected %s" | |
1373 | |
1374 This format is used when the diagnostic function cannot | |
1375 identify a named, unique terminal or nonterminal token that | |
1376 would satisfy the syntactic rules and finds an | |
1377 incorrect token, the name of which can be found | |
1378 in the ©token namesª table. | |
1379 ## | |
1380 | |
1381 UNNAMED_TOKEN | |
1382 | |
1383 UNNAMED_TOKEN is a macro that is used by the error | |
1384 diagnostic function created by the ©diagnose errorsª | |
1385 switch. If you do not define it in your parser, | |
1386 AnaGram will define it thus: | |
1387 #define UNNAMED_TOKEN "input" | |
1388 | |
1389 This macro is used as argument for the ©UNEXPECTED_FORMATª | |
1390 macro when the actual, erroneous input cannot be identified. | |
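
The three macros MISSING_FORMAT, UNEXPECTED_FORMAT and UNNAMED_TOKEN fit together as printf-style pieces. The sketch below shows one plausible way they could be combined; format_diagnostic is a hypothetical helper written for illustration and is not the ag_diagnose() function AnaGram generates.

```c
#include <stdio.h>

/* Defaults as quoted above; define your own first to override them. */
#ifndef MISSING_FORMAT
#define MISSING_FORMAT "Missing %s"
#endif
#ifndef UNEXPECTED_FORMAT
#define UNEXPECTED_FORMAT "Unexpected %s"
#endif
#ifndef UNNAMED_TOKEN
#define UNNAMED_TOKEN "input"
#endif

/* Hypothetical helper, not AnaGram's ag_diagnose(): build a message.
   expected: name of the unique token that would satisfy the grammar,
             or NULL if there is no such named token.
   found:    name of the offending token, or NULL if it cannot be
             identified. */
static void format_diagnostic(char *buf, size_t size,
                              const char *expected, const char *found)
{
    if (expected != NULL)
        snprintf(buf, size, MISSING_FORMAT, expected);
    else
        snprintf(buf, size, UNEXPECTED_FORMAT,
                 found != NULL ? found : UNNAMED_TOKEN);
}
```

To produce messages in another language, it would then be enough to define the three macros before this point, for example with MISSING_FORMAT set to "Es fehlt %s".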
1391 ## | |
1392 | |
1393 Difference | |
1394 | |
1395 In set theory, the difference of two sets, A and B, is | |
1396 defined to be the set of all elements of A that are not | |
1397 elements of B. In an AnaGram ©syntax fileª, you | |
1398 represent the difference of two ©character setsª by | |
1399 using the '-' operator. Thus the difference of A and B | |
1400 is A - B. The difference operator is ©left | |
1401 associativeª. | |
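
As an illustration of the set arithmetic, the C sketch below models a character set as one flag per 8-bit character code. The CharSet type and the function names are hypothetical and say nothing about how AnaGram stores character sets internally.

```c
#include <string.h>

/* Hypothetical representation: one flag per 8-bit character code. */
typedef struct { unsigned char member[256]; } CharSet;

static void set_clear(CharSet *s)
{
    memset(s->member, 0, sizeof s->member);
}

/* Add every character code from first to last, inclusive. */
static void set_add_range(CharSet *s, int first, int last)
{
    for (int c = first; c <= last; c++)
        s->member[c] = 1;
}

/* result = a - b: every element of a that is not an element of b.
   result may be the same object as a or b. */
static void set_difference(CharSet *result, const CharSet *a, const CharSet *b)
{
    for (int c = 0; c < 256; c++)
        result->member[c] = a->member[c] && !b->member[c];
}
```

Left associativity means that A - B - C is evaluated as (A - B) - C: first call set_difference on A and B, then subtract C from that result.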
1402 ## | |
1403 | |
1404 Disregard | |
1405 | |
1406 The purpose of the "disregard" statement is to skip over | |
1407 uninteresting ©white spaceª and comments in your input | |
1408 file. It allows you to specify a token that should be | |
1409 passed over in the input to your parser. The statement | |
1410 takes the form: | |
1411 disregard ws | |
1412 where "ws" is a token name or character set. Disregard | |
1413 statements, like other ©attribute statementªs, may be | |
1414 placed in any ©configuration sectionª. | |
1415 | |
1416 You may have more than one disregard statement in your | |
1417 ©grammarª. If you do, AnaGram will create a shell | |
1418 production. For example, suppose you write: | |
1419 [ disregard alpha | |
1420 disregard beta ] | |
1421 AnaGram will proceed as though you had written: | |
1422 gamma -> alpha | beta | |
1423 [ disregard gamma ] | |
1424 | |
1425 It frequently happens that you wish your ©parserª to | |
1426 disregard blanks or comments, except that ©white spaceª | |
1427 within names, numbers, strings, and other elementary | |
1428 constructs is subject to special rules and thus should | |
1429 not be disregarded blindly. In this case, you can use | |
1430 the "©lexemeª" statement to declare these constructs off | |
1431 limits for the disregard statement. Within these | |
1432 constructs, the disregard statement will be inoperative | |
1433 and the admissibility of white space is determined | |
1434 solely by the productions which define these constructs. | |
1435 | |
1436 Outside those productions which define lexemes, you | |
1437 should not generally use a token which is supposed to be | |
1438 disregarded. If you do, your grammar will have | |
1439 ©conflictªs, since the token could satisfy both the | |
1440 explicit usage, as well as the implicit rules set up by | |
1441 the disregard statement. Such conflicts, however, are | |
1442 resolved automatically in favor of your explicit use of | |
1443 the token. The conflicts will appear in the ©Resolved | |
1444 Conflictsª window. | |
1445 | |
1446 If you have "open ended" lexemes in your grammar such | |
1447 as variable names or numeric constants, AnaGram | |
1448 will detect a conflict if one of these lexemes may | |
1449 follow another such lexeme immediately. To deal with | |
1450 these conflicts, you should turn on the "©Distinguish | |
1451 Lexemesª" configuration switch. It will cause white | |
1452 space to be required as a separator between the | |
1453 lexemes. | |
1454 | |
1455 In order to implement the "disregard" statement AnaGram | |
1456 will redefine some tokens in your grammar. For example, | |
1457 '+' may be redefined to consist of a simple plus sign | |
1458 followed by optional white space: | |
1459 '+' -> '+'%, white space?... | |
1460 The ©percent signª is used to indicate the original, | |
1461 simple plus without the optional white space attached. | |
1462 You will probably notice the percent sign appearing in | |
1463 some windows and traces. | |
1464 ## | |
1465 | |
1466 distinguish keywords | |
1467 | |
1468 "distinguish keywords" is an ©attribute statementª | |
1469 which you may include in a ©configuration sectionª. It | |
1470 is used to tell AnaGram how to distinguish ©keywordªs | |
1471 from similar sequences of characters in your input | |
1472 stream. For example, you may want your parser to | |
1473 recognize "int" as a keyword when it appears in the | |
1474 following context: | |
1475 int x; | |
1476 but not when it appears in the middle of such words as | |
1477 "integral" and "intolerant". The operand of | |
1478 "distinguish keywords" is a list of character set | |
1479 ©expressionªs separated by commas and enclosed in braces | |
1480 ({ }). | |
1481 | |
1482 Once AnaGram has read your entire syntax file, it | |
1483 evaluates all of these character sets and tests each | |
1484 keyword string against the character sets in the order | |
1485 in which they were encountered in the program. If all | |
1486 the characters which constitute a particular keyword | |
1487 are members of the specified set, the keyword logic is | |
1488 set up so that it will recognize the keyword only if | |
1489 the immediately following character is not in the set. | |
1490 | |
1491 In the example above, | |
1492 [distinguish keywords {'a-z'} ] | |
1493 will do the trick. | |
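
The rule described above (accept the keyword only if the character that
immediately follows it is not in the set) can be sketched in C. This is an
illustration of the test itself, not the code AnaGram generates; the function
names are hypothetical and the set 'a-z' is written assuming ASCII.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the character set 'a-z' from the example (assumes ASCII). */
static int in_keyword_set(int c) {
    return c >= 'a' && c <= 'z';
}

/* Returns 1 if keyword appears at the start of input and the character
   immediately following it is not in the set; otherwise returns 0. */
static int matches_keyword(const char *input, const char *keyword) {
    size_t n = strlen(keyword);
    if (strncmp(input, keyword, n) != 0) {
        return 0;
    }
    /* input[n] is safe to read: it is at worst the terminating NUL. */
    return !in_keyword_set((unsigned char)input[n]);
}
```

With this test, "int" is accepted in "int x;" because the following character
is a space, and rejected in "integral" because the following character is 'e'.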
1494 | |
1495 The "©stickyª" statement also affects the recognition | |
1496 of keywords. | |
1497 ## | |
1498 | |
1499 Distinguish Lexemes | |
1500 | |
1501 The "distinguish lexemes" ©configuration switchª is | |
1502 used in conjunction with the "©disregardª" statement | |
1503 and the "©lexemeª" statement to resolve the | |
1504 ©shift-reduce conflictªs which often crop up when | |
1505 suppressing white space. | |
1506 | |
1507 The difficulty with suppressing white space is that you | |
1508 wish it to be optional in cases like "x+y", where it is | |
1509 not necessary in order to parse correctly, but you want | |
1510 to require it in situations such as "mytype x", where | |
1511 it is necessary to separate otherwise indistinguishable | |
1512 constructs. If the white space were optional, it would | |
1513 be necessary to allow for "mytypex", but it would be | |
1514 impossible to determine if this were to be interpreted as | |
1515 "mytype x", "mytyp ex", or any of the many other | |
1516 possibilities. | |
1517 | |
1518 The distinguish lexemes switch causes AnaGram to make | |
1519 the white space optional where doing so causes no | |
1520 ambiguity and makes it mandatory where to make it | |
1521 optional would lead to ambiguity. In the example given | |
1522 above, "mytypex" would be treated as a single name, and | |
1523 another name would have to follow separating white | |
1524 space. | |
1525 | |
1526 The default value for distinguish lexemes is OFF. It is | |
1527 anticipated that this will be changed to ON in future | |
1528 releases of AnaGram. | |
1529 ## | |
1530 | |
1531 Duplicate Production | |
1532 | |
1533 This ©warningª message appears when a ©productionª | |
1534 appears twice in your ©grammarª. You will have a | |
1535 number of ©reduce-reduce conflictªs as a consequence. | |
1536 Eliminate the duplicate, and the conflicts it caused | |
1537 will go away. | |
1538 ## | |
1539 | |
1540 Edit Command | |
1541 | |
1542 "Edit command" is a ©configuration parameterª which | |
1543 accepts a string value. It is no longer used and is | |
1544 retained only for file compatibility with the DOS | |
1545 version of AnaGram. | |
1546 ## | |
1547 | |
1548 Embedded C | |
1549 | |
1550 You may encapsulate pieces of C or C++ code in your ©syntax | |
1551 fileª more or less arbitrarily. Such pieces of code will | |
1552 simply be copied to the ©parser fileª in the order in | |
1553 which they are encountered. Each such piece of code must | |
1554 be enclosed in braces ({}). The left brace must be on a | |
1555 new line, and nothing except comments may follow the | |
1556 right brace. AnaGram does not inspect the interior of | |
1557 such a piece of C code except to identify character | |
1558 constants, strings, comments and blocks surrounded with | |
1559 braces so that it does not identify the end of the | |
1560 embedded C prematurely. Note that AnaGram will use the | |
1561 status of the ©nest commentsª ©configuration switchª in | |
1562 effect at the beginning of the embedded C. | |
1563 | |
1564 AnaGram, of course, can be confused by unterminated | |
1565 strings, unbalanced brackets, and unterminated comments. | |
1566 The most likely outcome, in such a situation, is that | |
1567 AnaGram will encounter an end of file looking for the | |
1568 end of the embedded C. Should this happen, AnaGram will | |
1569 identify the beginning of the piece of embedded C which | |
1570 caused the problem. | |
1571 | |
1572 If your syntax file begins with a block of embedded C, | |
1573 called the "©C prologueª", it will be copied to the very | |
1574 beginning of the parser file, preceding all of AnaGram's | |
1575 output. You may use such an initial block of embedded C | |
1576 to guarantee that program title comments, copyright | |
1577 notices and important definitions are at the very | |
1578 beginning of your parser file. | |
1579 | |
1580 The code you include as embedded C, of course, has to | |
1581 coexist with the code AnaGram generates. In order to | |
1582 keep the potential for name conflicts to a minimum, all | |
1583 variables and functions which AnaGram defines begin with | |
1584 the letters "ag_". You should avoid variable names which | |
1585 begin with these letters. | |
1586 | |
1587 If AnaGram finds no embedded C in a syntax file, and you | |
1588 ask it to build a parser, it will automatically generate | |
1589 a main program that calls your parser. If you don't want | |
1590 it to do this, you may turn off the ©main programª | |
1591 ©configuration switchª. | |
1592 ## | |
1593 | |
1594 Empty Keyword String | |
1595 | |
1596 This ©warningª appears when you have a keyword string | |
1597 that contains no characters whatsoever. ©Keyword | |
1598 stringsª must contain at least one character. If you | |
1599 wish a null match, use a ©null productionª instead. | |
1600 ## | |
1601 | |
1602 Enable Mouse | |
1603 | |
1604 "Enable mouse" is a ©configuration switchª that defaults | |
1605 to on. It is not used in the Windows version of AnaGram | |
1606 and has been retained only for file compatibility with | |
1607 the DOS version. | |
1608 ## | |
1609 | |
1610 Enum Constant Name | |
1611 | |
1612 The "enum constant name" ©configuration parameterª | |
1613 allows you to select the name AnaGram will use for the | |
1614 set of enumeration constants it defines in the ©parser | |
1615 headerª file for your ©parserª. The value of "enum | |
1616 constant name" should be a string containing the '%' | |
1617 character. AnaGram will substitute each token name in | |
1618 turn into this template as it creates the list of | |
1619 enumeration constants. If it finds a '$' character it | |
1620 will substitute the name of your parser. The default | |
1621 value of "enum constant name" is "$_%_token". | |
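
The substitution described above can be sketched in C. The helper below is
hypothetical and only illustrates how a template such as "$_%_token" expands;
it is not AnaGram's own code, and the parser and token names used with it are
made up for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: expand an "enum constant name" template.
   '$' is replaced by the parser name and '%' by the token name.
   Returns 0 on success, -1 if the result does not fit in out. */
static int expand_enum_name(const char *tmpl, const char *parser,
                            const char *token, char *out, size_t out_size) {
    size_t used = 0;
    if (out_size == 0) {
        return -1;
    }
    for (; *tmpl != '\0'; tmpl++) {
        const char *piece = tmpl;   /* by default, copy the character itself */
        size_t len = 1;
        if (*tmpl == '$') {
            piece = parser;
            len = strlen(parser);
        } else if (*tmpl == '%') {
            piece = token;
            len = strlen(token);
        }
        if (used + len >= out_size) {
            return -1;              /* leave room for the terminating NUL */
        }
        memcpy(out + used, piece, len);
        used += len;
    }
    out[used] = '\0';
    return 0;
}
```

For a parser named "calc" with a token named "number", the default template
would give the constant name calc_number_token.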
1622 ## | |
1623 | |
1624 Enumeration Constants | |
1625 | |
1626 In your ©parser headerª file, AnaGram includes a typedef | |
1627 enum statement which provides enumeration constants | |
1628 corresponding to all the named tokens in your | |
1629 grammar. The names of the enumeration constants | |
1630 themselves are defined by the ©enum constant nameª | |
1631 ©configuration parameterª. These constants are useful | |
1632 when dealing with ©semantically determined productionsª. | |
1633 ## | |
1634 | |
1635 Enum | |
1636 | |
1637 Within a ©configuration sectionª, you may use an "enum" | |
1638 statement to define numeric values for any number of | |
1639 tokens just as you define enumeration constants in C. | |
1640 The syntax is effectively the same as the enum statement | |
1641 in C: | |
1642 | |
1643 [ | |
1644 enum { | |
1645 first = 60, | |
1646 second, | |
1647 third, | |
1648 fourth = 'a', | |
1649 fifth, | |
1650 } | |
1651 ] | |
1652 | |
1653 is exactly equivalent to | |
1654 first = 60 | |
1655 second = 61 | |
1656 third = 62 | |
1657 fourth = 'a' | |
1658 fifth = 'b' | |
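
For comparison, the same numbering in plain C. This sketch assumes an ASCII
character set, so that 'a' + 1 is 'b'; the enum tag and the printing function
are names chosen for the example.

```c
#include <stdio.h>

/* Same layout as the AnaGram example above. */
enum example_tokens {
    first = 60,
    second,        /* 61 */
    third,         /* 62 */
    fourth = 'a',
    fifth          /* 'a' + 1, which is 'b' in ASCII */
};

/* Print the five values so the numbering can be checked by eye. */
static void print_example_tokens(void) {
    printf("%d %d %d %d %d\n", first, second, third, fourth, fifth);
}
```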
1659 ## | |
1660 | |
1661 eof | |
1662 | |
1663 "eof" is a quasi reserved word in AnaGram, used to | |
1664 specify an end of file token. You may use another token | |
1665 as an end of file delimiter by setting the ©Eof Tokenª | |
1666 ©configuration parameterª. eof is not required unless | |
1667 you use ©automatic resynchronizationª in your ©parserª. | |
1668 | |
1669 If you have not defined eof or specified an Eof Token | |
1670 parameter, ©File Traceª may show a syntax error when it | |
1671 encounters the end of a test file. | |
1672 | |
1673 There are various values that are commonly used | |
1674 to represent an end of file. The end of a string in | |
1675 memory is commonly 0, DOS uses ^Z, Unix uses ^D, and | |
1676 Unix style stream I/O uses -1. It is often convenient | |
1677 then to define | |
1678 | |
1679 eof = -1 + 0 + ^D + ^Z | |
1680 ## | |
1681 | |
1682 Eof Token | |
1683 | |
1684 "Eof token" is a ©configuration parameterª which accepts | |
1685 a token name as a value. There is no default value. | |
1686 AnaGram does not need a specification for the eof token | |
1687 unless you are using its ©automatic resynchronizationª | |
1688 facility. | |
1689 | |
1690 If you use the ©automatic resynchronizationª capability | |
1691 of AnaGram, you must specify explicitly an end of file | |
1692 token. You can do this either by defining a ©terminal | |
1693 tokenª in your ©grammarª called eof or by using the "eof | |
1694 token" parameter to identify some other terminal token | |
1695 to be used as the end of file marker. You would do this | |
1696 only if you must use the name "©eofª" for some other | |
1697 purpose. | |
1698 | |
1699 Note that "eof" is case sensitive. Neither Eof nor | |
1700 EOF will qualify as end of file tokens unless you | |
1701 explicitly specify them using the eof token parameter. | |
1702 ## | |
1703 | |
1704 Eof Token Not Defined | |
1705 | |
1706 This ©warningª appears if you have requested either | |
1707 ©error token resynchronizationª or ©automatic | |
1708 resynchronizationª and you have not defined an ©eof | |
1709 tokenª. The resynchronization procedure will not work | |
1710 correctly at end of file. | |
1711 ## | |
1712 | |
1713 Error Action | |
1714 | |
1715 The error action is one of the four ©parser actionªs of a | |
1716 traditional ©parsing engineª. The error action is | |
1717 performed when the parser has encountered an input | |
1718 token which is not admissible in the current state. | |
1719 The further behavior of a traditional parser is | |
1720 undefined. | |
1721 ## | |
1722 | |
1723 Error Defining | |
1724 | |
1725 "Error defining TXXX: <token representation>" is a | |
1726 ©warningª message which appears if errors are encountered | |
1727 while attempting to evaluate the ©character setª for | |
1728 the specified ©tokenª. This warning is always generated | |
1729 in addition to more detailed warnings that are made | |
1730 when the actual errors are encountered. | |
1731 ## | |
1732 | |
1733 Error frame | |
1734 | |
1735 "Error frame" is a ©configuration switchª which defaults | |
1736 to off. You use this switch to specify the ©error | |
1737 diagnosisª capabilities of your parser. If both this switch | |
1738 and the ©diagnose errorsª switch are on, | |
1739 your parser will include a function which will | |
1740 determine the "context" of any ©syntax errorª, that is, | |
1741 the token the parser was trying to complete. | |
1742 | |
1743 To determine the context of an error, your parser will | |
1744 scan backwards through the ©parser state stackª, | |
1745 examining ©characteristic rulesª until it finds a state | |
1746 which can accept a unique ©nonterminalª reduction token | |
1747 that you have not marked as ©hiddenª. It will then set | |
1748 PCB.©error_frame_ssxª to the ©parser stack indexª for | |
1749 that level. | |
1750 ## | |
1751 | |
1752 ERROR_CONTEXT | |
1753 | |
1754 ERROR_CONTEXT is a macro AnaGram defines for you. If | |
1755 your parser encounters a ©syntax errorª, you have | |
1756 enabled the ©error frameª ©configuration switchª, and | |
1757 you have defined a ©context typeª, ERROR_CONTEXT will | |
1758 enable you to access the ©contextª as of when the parser | |
1759 encountered the beginning of the ©error_frame_tokenª. | |
1760 ## | |
1761 | |
1762 Error Diagnosis | |
1763 | |
1764 "Error diagnosis" and ©error recoveryª are the two | |
1765 aspects of ©error handlingª. If in the ©embedded Cª | |
1766 portion of your syntax file you define a macro called | |
1767 ©SYNTAX_ERRORª, it will be invoked by the parser when a | |
1768 ©syntax errorª is encountered. If you have set the | |
1769 ©diagnose errorsª ©configuration switchª, the | |
1770 ©error_messageª field of the ©parser control blockª will | |
1771 contain a pointer to a string containing a diagnostic | |
1772 message. The diagnostic is of the form "Missing <token | |
1773 name>" or "Unexpected <token name>". | |
1774 | |
1775 If you do not define SYNTAX_ERROR it will be | |
1776 automatically defined so that a message will be written | |
1777 to stderr. | |
1778 | |
1779 If the ©lines and columnsª switch has been set you will | |
1780 have the current line number and column number available | |
1781 for your diagnostic message. | |
1782 | |
1783 If you have set the ©error frameª switch as well as the | |
1784 diagnose errors switch, the variable | |
1785 PCB.©error_frame_tokenª will identify the ©nonterminal | |
1786 tokenª the parser was trying to recognize when the | |
1787 error was encountered. | |
1788 | |
1789 Of course, if your parser is controlling direct keyboard | |
1790 input, a diagnosis might be unnecessary. In this case | |
1791 you might define SYNTAX_ERROR so that it simply beeps at | |
1792 the user and let it go at that. | |
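
A minimal sketch of a user-defined SYNTAX_ERROR macro that reports the
diagnostic together with its location. The control block here is a mock so
that the fragment compiles on its own; in a real parser AnaGram declares the
control block and the PCB macro for you. Only error_message is named in the
text above. The line and column field names, and the helper function, are
assumptions made for the example.

```c
#include <stdio.h>

/* Mock of the generated parser control block (field names other than
   error_message are assumptions). */
struct mock_pcb {
    const char *error_message;
    int line;
    int column;
};
static struct mock_pcb mock_pcb_instance;
#define PCB mock_pcb_instance

/* Hypothetical helper: build "<message>, line N, column M" in buf. */
static int format_syntax_error(char *buf, size_t size) {
    return snprintf(buf, size, "%s, line %d, column %d",
                    PCB.error_message, PCB.line, PCB.column);
}

/* Write the formatted diagnostic to stderr. */
#define SYNTAX_ERROR \
    do { \
        char ag_example_msg[128]; \
        format_syntax_error(ag_example_msg, sizeof ag_example_msg); \
        fprintf(stderr, "%s\n", ag_example_msg); \
    } while (0)
```

Keeping the formatting in a separate function makes it easy to send the same
text somewhere other than stderr, for example to a status line.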
1793 ## | |
1794 | |
1795 Error Handling | |
1796 | |
1797 Rarely is a parser built to read an arbitrary input | |
1798 file. The normal situation is that the parser is built | |
1799 to read files that conform to the rules specified in a | |
1800 grammar, rules that describe a class of input files | |
1801 rather than all possible input files. If the input file | |
1802 does not conform to the grammar, the parser will detect | |
1803 a ©syntax errorª. | |
1804 | |
1805 There are two aspects to error handling in your parser: | |
1806 ©error diagnosisª and ©error recoveryª. Error diagnosis | |
1807 consists in informing your user that something | |
1808 unexpected has happened. Error recovery consists in | |
1809 either aborting the parse, or getting it started again | |
1810 in some reasonable manner. AnaGram provides several | |
1811 options for both error diagnosis and error recovery. | |
1812 | |
1813 When a syntax error is encountered, first your error | |
1814 diagnosis option is executed and then your error | |
1815 recovery option is executed. | |
1816 ## | |
1817 | |
1818 error_message | |
1819 | |
1820 error_message is a field in a ©parser control blockª to | |
1821 which your ©error handlingª procedures may refer. If you | |
1822 have set the ©diagnose errorsª ©configuration switchª, | |
1823 on encountering a ©syntax errorª your ©parserª will | |
1824 create a string containing an appropriate diagnostic | |
1825 message and store a pointer to it into | |
1826 PCB.error_message. | |
1827 ## | |
1828 | |
1829 Error Trace | |
1830 | |
1831 "Error Trace" is both a ©configuration switchª and the | |
1832 name of an option in the ©Action Menuª. If the switch | |
1833 is on, AnaGram adds code to your parser to capture | |
1834 state information to a file in case of a ©syntax errorª. The Error | |
1835 Trace option can then read this information and prepare a pre-built | |
1836 ©Grammar Traceª showing you the state of the parser at the time of | |
1837 the error. | |
1838 | |
1839 The name of the file is determined by the macro | |
1840 ©AG_TRACE_FILE_NAMEª. AnaGram will provide a default | |
1841 definition for the macro consisting of the name of | |
1842 your ©syntax fileª plus the extension ".etr". You | |
1843 may override this definition by defining AG_TRACE_FILE_NAME | |
1844 in your ©embedded Cª. | |
1845 | |
1846 If error trace is enabled, AnaGram will also enable the | |
1847 Error Trace option on the ©Action Menuª. If you select | |
1848 Error Trace AnaGram will initialize a ©Grammar Traceª | |
1849 window from the error trace file you select. The parser | |
1850 stack of the trace will be as it was when the error | |
1851 occurred. The last line of the parser stack pane will | |
1852 show the ©lookahead tokenª that caused the syntax error. You may | |
1853 then use the Grammar Trace to explore the nature of | |
1854 the syntax error your parser encountered. | |
1855 | |
1856 AnaGram will | |
1857 warn you if the error trace file is older than | |
1858 the syntax file, since under those conditions, the | |
1859 error trace file might be invalid. | |
1860 ## | |
1861 | |
1862 AG_TRACE_FILE_NAME | |
1863 | |
1864 AG_TRACE_FILE_NAME is a C macro used to determine the | |
1865 name of the file your parser will write when it | |
1866 encounters a ©syntax errorª if you have enabled | |
1867 the ©error traceª ©configuration switchª. | |
1868 | |
1869 You may define AG_TRACE_FILE_NAME in your ©embedded Cª. | |
1870 AnaGram provides a default definition given by the | |
1871 name of your ©syntax fileª with the extension ".etr". | |
1872 ## | |
1873 | |
1874 Error Recovery | |
1875 | |
1876 Error recovery is the process of continuing after a | |
1877 ©syntax errorª. AnaGram offers several options. These | |
1878 are controlled by ©configuration parameterªs and by | |
1879 your grammar. | |
1880 | |
1881 If you do not specify any error recovery, your parser | |
1882 will simply return to the calling program when it | |
1883 encounters a syntax error. ©PCBª.©exit_flagª will be set | |
1884 to AG_SYNTAX_ERROR_CODE (2) to indicate termination on syntax error. | |
1885 | |
1886 If you wish your parser to simply ignore the erroneous | |
1887 token and continue, set PCB.exit_flag to zero in your | |
1888 ©SYNTAX_ERRORª macro. You might use this option if your | |
1889 parser is dealing directly with keyboard input. | |
1890 | |
1891 You may wish to use YACC type error handling. To do | |
1892 this, simply incorporate a token called "error" in your | |
1893 grammar, or specify some other token as an ©error | |
1894 tokenª. On syntax error, your parser will back up to | |
1895 the most recent state where "error" was acceptable | |
1896 input, treat the bad input as an instance of error, and | |
1897 then skip all input until it finds an acceptable input | |
1898 token. At that point it will proceed as though nothing | |
1899 had happened. | |
1900 | |
1901 AnaGram also provides an ©automatic resynchronizationª | |
1902 option, which uses a complex heuristic to compare input | |
1903 tokens against all stacked states in order to find the | |
1904 best state from which to continue. | |
1905 ## | |
1906 | |
1907 Error Token Resynchronization | |
1908 | |
1909 One of your options for ©error recoveryª after a ©syntax | |
1910 errorª is a technique similar to that provided in YACC. | |
1911 You include a terminal token called "error" in your | |
1912 grammar. (Or, use the ©error tokenª configuration | |
1913 parameter to specify some other token to serve this | |
1914 purpose.) When the parser encounters an error in the | |
1915 input, after invoking the ©SYNTAX_ERRORª macro, it backs | |
1916 up the ©parser state stackª to the most recent state in | |
1917 which "error" was an acceptable input. It then shifts to | |
1918 the new state as though it had seen an actual "error" | |
1919 token. At this point, it skips over any character in the | |
1920 input which is not an acceptable input character for | |
1921 this state. Once it does find an acceptable input | |
1922 character, it continues processing as though nothing had | |
1923 happened. | |
1924 ## | |
1925 | |
1926 error_frame_ssx | |
1927 | |
1928 error_frame_ssx is a field in a ©parser control blockª | |
1929 to which your ©error handlingª routines may refer. When | |
1930 your ©SYNTAX_ERRORª macro is called, if you have set | |
1931 both the ©diagnose errorsª and ©error frameª | |
1932 configuration switches, error_frame_ssx will contain the | |
1933 value of the ©parser stack indexª at the beginning of | |
1934 the ©error_frame_tokenª. For example, if in a syntax | |
1935 file, you fail to close a comment, AnaGram will | |
1936 encounter an illegal end of file in the comment. In this | |
1937 situation, error_frame_token is the token for a comment, | |
1938 and error_frame_ssx gives the parser stack depth at the | |
1939 beginning of the comment. | |
1940 ## | |
1941 | |
1942 error_frame_token | |
1943 | |
1944 error_frame_token is a field in a ©parser control blockª | |
1945 to which your ©error handlingª routines may refer. If | |
1946 you have set both the ©diagnose errorsª and ©error | |
1947 frameª ©configuration switchªes, when your | |
1948 ©SYNTAX_ERRORª macro is called, it will contain the ©token numberª | |
1949 of the nonterminal token the parser was trying to recognize. | |
1950 ## | |
1951 | |
1952 error, Error Token | |
1953 | |
1954 "Error token" is a ©configuration parameterª that takes | |
1955 a token name for a value. It has no default value. If | |
1956 you do not specify it, and your grammar has a terminal | |
1957 token called "error", it will be used as the error | |
1958 token. If you have an error token defined your parser | |
1959 will presume that you wish to use the ©error token | |
1960 resynchronizationª method of ©error recoveryª. | |
1961 ## | |
1962 | |
1963 Escape Backslashes | |
1964 | |
1965 "©Escape backslashesª" is a ©configuration switchª that | |
1966 defaults to off. When it is turned on, the ©line numbersª switch | |
1967 writes pathnames with doubled backslashes. The switch | |
1968 is no longer necessary, since AnaGram now uses forward slashes | |
1969 in the pathnames in #line directives rather than backslashes. | |
1970 ## | |
1971 | |
1972 Event Driven | |
1973 | |
1974 It is often convenient to configure your parser to be | |
1975 "event driven". In this situation, instead of calling | |
1976 your parser once to process the entire input, you call | |
1977 an ©initializerª to initialize the parser, and then you | |
1978 call the parser once for each input token. Each time you | |
1979 call it, the parser processes the single input token | |
1980 until it can do no more. | |
1981 | |
1982 You can interrogate the ©exit_flagª field of the | |
1983 ©parser control blockª to determine whether the parse is | |
1984 complete or whether the parser encountered an error. | |
1985 | |
1986 Event driven parsers are especially convenient for | |
1987 dealing with terminal input or communications protocols. | |
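
The calling pattern described above can be sketched as follows. The parser
here is a hand-written mock that accepts a run of digits ended by a newline;
it stands in for the code AnaGram would generate. Apart from exit_flag and
the AG_ exit codes, every name (including input_code and the function names)
is an assumption made for the example.

```c
/* Exit codes as listed under exit_flag; a generated header defines these. */
enum {
    AG_RUNNING_CODE = 0,
    AG_SUCCESS_CODE = 1,
    AG_SYNTAX_ERROR_CODE = 2
};

/* Mock control block: one pending input code plus the exit flag. */
struct mock_pcb {
    int exit_flag;
    int input_code;
};
static struct mock_pcb pcb;

/* Mock initializer: put the parser into the running state. */
static void init_mock_parser(void) {
    pcb.exit_flag = AG_RUNNING_CODE;
}

/* Mock parser: consumes exactly one input code per call. */
static void mock_parser(void) {
    if (pcb.input_code == '\n') {
        pcb.exit_flag = AG_SUCCESS_CODE;
    } else if (pcb.input_code < '0' || pcb.input_code > '9') {
        pcb.exit_flag = AG_SYNTAX_ERROR_CODE;
    }
}

/* Driver: initialize once, then call the parser once per input code,
   stopping as soon as exit_flag leaves the running state. */
static int parse_events(const char *input) {
    init_mock_parser();
    while (pcb.exit_flag == AG_RUNNING_CODE && *input != '\0') {
        pcb.input_code = (unsigned char)*input++;
        mock_parser();
    }
    return pcb.exit_flag;
}
```

If the input runs out while the flag is still AG_RUNNING_CODE, the parse is
simply incomplete, and the caller can supply more input later.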
1988 ## | |
1989 | |
1990 Event Driven Parser Cannot Use Pointer Input | |
1991 | |
1992 This ©warningª message appears if you specify pointer | |
1993 input for your ©parserª and also specify that it should | |
1994 be event driven. If you are going to use ©pointer | |
1995 inputª, you should not specify your ©parserª as event | |
1996 driven. Conversely, if you really want an ©event | |
1997 drivenª parser, you cannot specify pointer input. | |
1998 ## | |
1999 | |
2000 Excessive Recursion | |
2001 | |
2002 This ©warningª message appears if an internal stack in | |
2003 AnaGram overflows because of the complexity of an | |
2004 expression in your ©grammarª. Simplify your grammar by | |
2005 using ©definitionª statements to name subexpressions. | |
2006 ## | |
2007 | |
2008 exit_flag | |
2009 | |
2010 exit_flag is a field in the ©parser control blockª. | |
2011 When your parser returns, PCB.exit_flag contains an exit | |
2012 code describing the outcome of the parse. Mnemonic | |
2013 values for the exit codes are defined in the parser | |
2014 header file AnaGram generates. These mnemonics, their | |
2015 values and their meanings are: | |
2016 AG_RUNNING_CODE = 0: Parse is not yet complete | |
2017 AG_SUCCESS_CODE = 1: Parse terminated successfully | |
2018 AG_SYNTAX_ERROR_CODE = 2: Syntax error encountered | |
2019 AG_REDUCTION_ERROR_CODE = 3: Bad reduction token encountered | |
2020 AG_STACK_ERROR_CODE = 4: Parser stack overflowed | |
2021 AG_SEMANTIC_ERROR_CODE = 5: Semantic error, user defined | |
2022 | |
2023 An AnaGram parser checks exit_flag on return | |
2024 from every ©reduction procedureª. The parser will exit with | |
2025 the flag unchanged if it is non-zero. To halt a parse | |
2026 from a reduction procedure, then, you need only set the | |
2027 exit_flag to AG_SEMANTIC_ERROR_CODE, or any other unused value | |
2028 greater than zero that suits your needs. | |
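
A short sketch of how a calling program might interpret exit_flag after the
parser returns. The enum repeats the values from the table above only so that
the fragment compiles on its own; a generated parser header already defines
these mnemonics. The helper function is hypothetical.

```c
/* Values as listed in the table above. */
enum {
    AG_RUNNING_CODE = 0,
    AG_SUCCESS_CODE = 1,
    AG_SYNTAX_ERROR_CODE = 2,
    AG_REDUCTION_ERROR_CODE = 3,
    AG_STACK_ERROR_CODE = 4,
    AG_SEMANTIC_ERROR_CODE = 5
};

/* Hypothetical helper: translate an exit code into a short description. */
static const char *describe_exit_flag(int exit_flag) {
    switch (exit_flag) {
    case AG_RUNNING_CODE:         return "parse is not yet complete";
    case AG_SUCCESS_CODE:         return "parse terminated successfully";
    case AG_SYNTAX_ERROR_CODE:    return "syntax error encountered";
    case AG_REDUCTION_ERROR_CODE: return "bad reduction token encountered";
    case AG_STACK_ERROR_CODE:     return "parser stack overflowed";
    case AG_SEMANTIC_ERROR_CODE:  return "semantic error, user defined";
    default:                      return "user defined exit code";
    }
}
```

The default branch covers any other value that a reduction procedure may have
stored in exit_flag to halt the parse.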
2029 ## | |
2030 | |
2031 Expansion, Expansion Rule | |
2032 | |
2033 In analyzing a ©grammarª, we are often interested in the | |
2034 full range of input that can be expected at a certain | |
2035 point. The expansion of a ©tokenª or state shows us | |
2036 all the expected input. An expansion yields a set of | |
2037 ©marked ruleªs. The ©marked tokenª in each rule | |
2038 shows us what input to expect. | |
2039 | |
2040 The set of expansion rules of a (©nonterminalª) token | |
2041 shows all the expected input that can occur whenever the | |
2042 token appears in the grammar. The set consists of all | |
2043 the ©grammar ruleªs produced by the token, plus all the | |
2044 rules produced by the first token of any rule in the | |
2045 set. A ©marked tokenª for an expansion rule of a token | |
2046 is the first element in the rule. | |
2047 | |
2048 The expansion of a state consists of its ©characteristic | |
2049 ruleªs plus the expansion rules of the marked token in each | |
2050 characteristic rule. | |
2051 ## | |
2052 | |
2053 Expansion Chain | |
2054 | |
2055 You may select an Expansion Chain window from the | |
2056 ©Auxiliary Windowsª popup menu of most windows that contain | |
2057 ©expansion ruleªs. | |
2058 | |
2059 The Expansion Chain window is extremely useful for | |
2060 indicating why a particular ©grammar ruleª is an | |
2061 ©expansion ruleª in a particular state. To see a chain | |
2062 of productions that produces a desired expansion rule, | |
2063 select the expansion rule with the cursor bar, press | |
2064 the right mouse button for the Auxiliary Windows menu, and select | |
2065 Expansion Chain. | |
2066 | |
2067 The Expansion Chain window will then present a sequence | |
2068 of expansion rules, using the same format as the | |
2069 Expansion Rules window, but subject to the constraint | |
2070 that each rule is produced by the ©marked tokenª in the previous line. | |
2071 | |
2072 The first rule in the window is a ©characteristic ruleª | |
2073 for the given state. The last rule in the window is | |
2074 the rule selected by the cursor bar in the window from | |
2075 which you chose the Expansion Chain. It should be noted | |
2076 that this expansion is not unique. There may be other | |
2077 derivations. | |
2078 ## | |
2079 | |
2080 Expansion Rules | |
2081 | |
2082 You may select an Expansion Rules window from the | |
2083 ©Auxiliary Windowsª popup menu of most windows which display | |
2084 ©marked rulesª. The Expansion Rules window shows the | |
2085 complete set of ©expansion ruleªs for the ©marked | |
2086 tokenª in the highlighted rule. | |
2087 | |
2088 In other windows, including all trace windows, the | |
2089 Expansion Rules window shows the expansion of the token | |
2090 on the highlighted line. | |
2091 ## | |
2092 | |
2093 F1 | |
2094 | |
2095 Use the F1 key to bring up a context sensitive help window. Because of | |
2096 various peculiarities of the Windows API, there are a few contexts | |
2097 where the F1 key does not work; however, generally the ©help cursorª | |
2098 works where F1 does not and vice versa. | |
2099 | |
2100 ©Helpª windows have hypertext links to related help windows. | |
2101 In a help window, the right mouse button pops up a menu of | |
2102 all the links for the window. | |
2103 ## | |
2104 | |
2105 extend pcb | |
2106 | |
2107 The "extend pcb" statement is an ©attribute statementª that allows you to | |
2108 add declarations of your own to the ©parser control blockª. With this | |
2109 feature, data needed by ©reduction procedureªs can be stored in the pcb | |
2110 rather than in global or static storage. This capability greatly | |
2111 facilitates the construction of ©thread safe parsersª. | |
2112 | |
2113 The extend pcb statement may be used in any configuration section. | |
2114 The format is as follows: | |
2115 extend pcb { <C or C++ declaration>... } | |
2116 | |
2117 It may, of course, extend over multiple lines and may contain any | |
2118 C or C++ declarations. AnaGram will append it to the end of the parser | |
2119 control block declaration in the generated parser ©header fileª. There may | |
2120 be any number of extend pcb statements. The extensions are appended to | |
2121 the pcb in the order in which they occur in the syntax file. | |
2122 | |
2123 The extend pcb statement is compatible with both C and C++ parsers. Note | |
2124 that even if you are deriving your own class from the parser control | |
2125 block, you might want to use the extend pcb to provide virtual function | |
2126 definitions or other declarations appropriate to a base class. | |
2127 ## | |
2128 | |
2129 Far Tables | |
2130 | |
2131 "Far tables" is a ©configuration switchª which defaults | |
2132 to off. If it is set, when AnaGram builds a ©parserª it | |
2133 will declare the larger tables it builds as FAR. This | |
2134 can be a convenience when using some memory models with | |
2135 8086 architecture. | |
2136 ## | |
2137 | |
2138 Fatal Syntax Errors | |
2139 | |
2140 This ©warningª message occurs when AnaGram cannot | |
2141 complete the ©Analyze Grammarª command on your ©syntax | |
2142 fileª because of errors in your syntax file. | |
2143 ## | |
2144 | |
2145 File Trace | |
2146 | |
2147 You can use the File Trace facility to verify your grammar, | |
2148 even before you have implemented ©reduction proceduresª or | |
2149 any other code. Thus you can defer writing procedural code | |
2150 until you have the grammar working to your specifications. | |
2151 | |
2152 To run File Trace, select | |
2153 File Trace from the ©Action Menuª or click on the File Trace button. | |
2154 | |
2155 Select a test file. When the ©File Trace Windowª appears, | |
2156 double click at any point in the ©test file paneª, or | |
2157 click the ©Parse Fileª button to parse the entire file. | |
2158 AnaGram will parse up to the point you have selected | |
2159 according to the rules in your ©grammarª. If the test file does not | |
2160 conform to the rules of the grammar, the parse will halt with a | |
2161 ©syntax errorª. You can then inspect the ©Parser Stack paneª and the | |
2162 ©Rule Stack paneª to get an idea of the nature of the problem. | |
2163 | |
2164 | |
2165 AnaGram uses different colors to | |
2166 distinguish the portion of the test file that has | |
2167 been parsed from the portion that has not been parsed, | |
2168 so the location of the error should be readily apparent. | |
2169 | |
2170 Since the syntax error often occurs somewhat downstream | |
2171 from the actual error, you may need to back the parse up | |
2172 and approach the error slowly. In the Test File pane, | |
2173 double click at any point prior to the error to back | |
2174 the parse up to that point. You can then click on the | |
2175 ©Single Stepª button to perform a single parser action. | |
2176 | |
2177 You may also use the cursor keys to control the parse. | |
2178 As long as no error is encountered, the parse is locked | |
2179 to the blinking cursor. If you cursor past the syntax | |
2180 error, however, the parse can no longer track the cursor | |
2181 so the cursor location will differ from the parse location. The | |
2182 cursor and parse locations will also differ after you single click | |
2183 at any point other than the current parse location. | |
2184 | |
2185 When the cursor and the parse location are thus out of synch, the | |
2186 Single Step button is replaced with a ©Synch Parseª button. You | |
2187 can click on Synch Parse to get the parse back in synch with the | |
2188 cursor. | |
2189 | |
2190 The File Trace option will be greyed out on the ©Action Menuª | |
2191 if your grammar has ©empty recursionª, since | |
2192 such a grammar may cause infinite loops in the parser. | |
2193 | |
2194 Because a File Trace is based on character codes, it will also be greyed out | |
2195 on the ©Action Menuª if your parser uses ©token inputª rather than | |
2196 character input. | |
2197 | |
2198 All parser actions performed by a File Trace update the ©trace | |
2199 coverageª counts, enabling you to verify the extent to which | |
2200 your test files exercise your parser. | |
2201 | |
2202 Normally, AnaGram reads test files in "text" mode, | |
2203 discarding carriage return characters. If your parser | |
2204 needs to recognize carriage return characters | |
2205 explicitly, you should turn the "©test file binaryª" | |
2206 switch on. | |
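
What "text" mode does to a test file can be pictured with a short C sketch. The function name strip_carriage_returns is hypothetical and is not part of AnaGram; it only illustrates what discarding carriage return characters means for a buffer that has been read from a test file.

```c
#include <stddef.h>

/* Hypothetical helper, not AnaGram code: removes every '\r' from a
   buffer in place, roughly as a "text" mode read would, and returns
   the new length of the data. */
size_t strip_carriage_returns(char *buf, size_t len) {
    size_t out = 0;
    for (size_t in = 0; in < len; in++) {
        if (buf[in] != '\r') {
            buf[out++] = buf[in];   /* keep everything except CR */
        }
    }
    return out;
}
```

A parser whose grammar mentions carriage return explicitly would never see that character after such filtering, which is the case the binary switch is meant for.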
2207 ## | |
2208 | |
2209 File Trace Window | |
2210 | |
2211 The ©File Traceª window normally consists of three panes: | |
2212 The ©Parser Stack paneª | |
2213 The ©Test File paneª | |
2214 The ©Rule Stack paneª | |
2215 | |
2216 If your grammar uses ©semantically determined productionsª, | |
2217 the ©Reduction Choices paneª will appear when necessary | |
2218 to allow you to select a ©reduction tokenª. The choice that | |
2219 you make will be remembered and reused if you should back up | |
2220 the parse and parse past this point again. The remembered choice | |
2221 is not made automatically when you use ©Single Stepª. Thus, | |
2222 if you wish to | |
2223 change your choice, position the cursor at the location where | |
2224 the choice must be made and Single Step past the choice. | |
2225 | |
2226 If you ©reloadª the test file, the choices you have made will | |
2227 be discarded. | |
2228 | |
2229 The active pane has | |
2230 a distinctively colored title panel and cursor bar. You can | |
2231 use the tab key to tab among the panes. The function of | |
2232 other keyboard keys depends on which pane is active. | |
2233 | |
2234 Along the bottom of the File Trace Window is a toolbar with | |
2235 two status boxes: | |
2236 ©Parse Locationª | |
2237 ©Parse Statusª | |
2238 and five buttons: | |
2239 ©Single Stepª | |
2240 ©Parse Fileª | |
2241 ©Resetª | |
2242 ©Reloadª | |
2243 ©Helpª | |
2244 | |
2245 If the blinking cursor loses synch with the current | |
2246 parse location, the Single Step button is replaced with | |
2247 the ©Synch Parseª button. | |
2248 ## | |
2249 | |
2250 Grammar Trace Window | |
2251 | |
2252 The ©Grammar Traceª window normally consists of three panes: | |
2253 The ©Parser Stack paneª | |
2254 The ©Allowable Input paneª | |
2255 The ©Rule Stack paneª | |
2256 | |
2257 If your grammar uses ©semantically determined productionsª, | |
2258 the ©Reduction Choices paneª will appear when necessary | |
2259 to allow you to select a ©reduction tokenª. | |
2260 | |
2261 The active pane has | |
2262 a distinctively colored column header and cursor bar. You | |
2263 can use the tab key to tab among the panes. The function of other | |
2264 keyboard keys depends on which pane is active. | |
2265 | |
2266 Along the bottom of the Grammar Trace Window is a toolbar with | |
2267 a ©Parse Statusª box, a ©text entryª field | |
2268 and four buttons: | |
2269 ©Proceedª | |
2270 ©Single Stepª | |
2271 ©Resetª | |
2272 ©Helpª | |
2273 | |
2274 In the ©Parser Stack paneª you can see a | |
2275 representation of the ©parser state stackª and ©parser stateª as they | |
2276 might appear in the course of execution of your ©parserª. You can | |
2277 examine the ©allowable inputª tokens and see the changes to the | |
2278 state and the state stack caused by any input token you | |
2279 choose. The ©Rule Stack paneª shows the relationship between the | |
2280 contents of the parser stack and your ©grammarª. If your grammar | |
2281 uses ©semantically determined productionsª, you can select the | |
2282 appropriate ©reduction tokenª from the ©Reduction Choices paneª. | |
2283 | |
2284 You can enter text characters directly in the ©text entryª | |
2285 field. This means you can run a Grammar Trace like a ©File Traceª | |
2286 where the test file is replaced by the characters you type in the | |
2287 text entry field. This is a very convenient way to check out your | |
2288 grammar. | |
2289 ## | |
2290 | |
2291 Test File, Test File Pane | |
2292 | |
2293 In the ©File Traceª, the file under test is displayed in the | |
2294 upper right pane. To parse to a specific point, double | |
2295 click at that point. | |
2296 | |
2297 As long as the parse location and the cursor are synchronized, | |
2298 when you use the cursor keys to | |
2299 move the cursor, the parse will track the cursor. | |
2300 | |
2301 If the parse encounters a ©syntax errorª, it will not be able | |
2302 to go beyond the location of the error. In this situation, | |
2303 moving the cursor right or down will cause the cursor position to | |
2304 differ from the parse location. The parse and cursor positions can also | |
2305 differ if you single click anywhere in the Test File pane. | |
2306 | |
2307 If the | |
2308 parse location and the cursor are thus not synchronized, the | |
2309 ©Single Stepª button will be replaced with a ©Synch Parseª | |
2310 button. Click on the Synch Parse button to get the cursor | |
2311 and the parse back in synch. Of course, the parse will still | |
2312 not be able to proceed past a syntax error. | |
2313 | |
2314 In the default color scheme, parsed text is shown on a lighter | |
2315 background than is unparsed text. | |
2316 | |
2317 If your grammar uses ©semantically determined productionªs, | |
2318 the parse will halt when one is encountered and the ©reduction | |
2319 choices paneª will be displayed so you may select the appropriate | |
2320 ©reduction tokenª. | |
2321 | |
2322 At any time you can click on the ©Reset buttonª to reset the parse to | |
2323 the beginning of the test file. If you modify the test file, you | |
2324 can click on the ©Reload buttonª to load the modified file and | |
2325 reset the parse. | |
2326 | |
2327 Normally, AnaGram reads test files in "text" mode, discarding carriage | |
2328 return characters. If your parser needs to recognize carriage return | |
2329 characters explicitly, you should turn the ©test file binaryª | |
2330 ©configuration switchª on. | |
2331 | |
2332 Sample test files are provided with the FFCALC and FC ©examplesª. | |
2333 ## | |
2334 | |
2335 Parse Location | |
2336 | |
2337 The current location of the ©File Traceª parser in the | |
2338 ©test file paneª. The format is <line number>:<column number>. | |
2339 ## | |
2340 | |
2341 Parse Status | |
2342 | |
2343 The current state of the ©File Traceª or ©Grammar Traceª parser. | |
2344 | |
2345 Ready: The parser is ready for input. | |
2346 Running: The parser is processing input. | |
2347 Parse Complete: The parser has reached the end of the input. Click | |
2348 on ©resetª or ©reloadª to restart the parse. | |
2349 Syntax error: A syntax error has been encountered. The parser cannot | |
2350 go any further. | |
2351 Unexpected end of file: The parser has reached the end of the actual | |
2352 input but the grammar still expects more. | |
2353 Select reduction token: The parser encountered a ©semantically determined | |
2354 productionª. Select a ©reduction tokenª from the ©Reduction Choices paneª. | |
2355 Selection error: The reduction token selected from the Reduction Choices | |
2356 pane was not allowable input in the present state. Select another | |
2357 reduction token. | |
2358 ## | |
2359 | |
2360 Parse File | |
2361 | |
2362 Use the Parse File button in the ©File Traceª to parse all the way | |
2363 to the end of file. The parse will not stop until it encounters a | |
2364 ©syntax errorª, a ©semantically determined productionª, or the end of file. | |
2365 ## | |
2366 | |
2367 Reset | |
2368 | |
2369 Use the Reset button in the ©File Traceª or ©Grammar Traceª to reset | |
2370 the parse to its initial state. This is most convenient when using | |
2371 a ©Conflict Traceª, ©Error Traceª, or other ©Auxiliary Traceª | |
2372 since these traces seldom begin at state 0. | |
2373 ## | |
2374 | |
2375 Reload | |
2376 | |
2377 The Reload button in the ©File Trace Windowª rereads the test file. | |
2378 This is convenient if you modify the test file while you are testing | |
2379 the ©grammarª. | |
2380 ## | |
2381 | |
2382 Lookahead Token | |
2383 | |
2384 In an ©LALR-1 parserª the "lookahead token" is the next token to be | |
2385 processed. For each ©parser stateª there is a list of tokens that | |
2386 may be seen in this state. For each token there is a corresponding | |
2387 ©parser actionª. The parser scans the list looking for the lookahead | |
2388 token and then performs the corresponding parser action. If the | |
2389 lookahead token cannot be found and there is no ©default reductionª, | |
2390 the parser signals a ©syntax errorª. | |
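
The lookup described above can be sketched in C as follows. This is a minimal illustration, not AnaGram's actual table layout: the types action_kind, action, token_entry and state, and the function select_action, are all names invented for this example.

```c
#include <stddef.h>

/* All names here are illustrative; they are not AnaGram's tables. */
enum action_kind { ACT_SHIFT, ACT_REDUCE, ACT_ACCEPT, ACT_ERROR };

struct action {
    enum action_kind kind;
    int operand;                 /* target state or rule number */
};

struct token_entry {
    int token;                   /* token that may be seen in this state */
    struct action action;        /* what to do when it is the lookahead */
};

struct state {
    const struct token_entry *entries;
    size_t n_entries;
    int has_default_reduction;
    int default_rule;
};

/* Scan the state's token list for the lookahead token. If it is not
   listed, fall back to the default reduction when there is one;
   otherwise report a syntax error. */
struct action select_action(const struct state *s, int lookahead) {
    for (size_t i = 0; i < s->n_entries; i++) {
        if (s->entries[i].token == lookahead) {
            return s->entries[i].action;
        }
    }
    if (s->has_default_reduction) {
        struct action reduce = { ACT_REDUCE, s->default_rule };
        return reduce;
    }
    struct action error = { ACT_ERROR, 0 };
    return error;
}
```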
2391 | |
2392 In File Trace, and in some circumstances in Grammar Trace, the | |
2393 lookahead token can be seen on the last line of the | |
2394 ©Parser Stack paneª. | |
2395 ## | |
2396 | |
2397 GET_CONTEXT | |
2398 | |
2399 If you have defined a "©context typeª" ©configuration | |
2400 parameterª, and wish to maximize the performance of your | |
2401 parser, you should write a GET_CONTEXT macro which | |
2402 stores the context of the input token directly in | |
2403 ©CONTEXTª, the current stack location. Otherwise, you | |
2404 can write your ©GET_INPUTª macro so that it stores | |
2405 context into ©PCBª.©input_contextª. The default | |
2406 definition for GET_CONTEXT will then copy | |
2407 PCB.input_context to the ©context stackª at the | |
2408 appropriate time. | |
2409 ## | |
2410 | |
2411 GET_INPUT | |
2412 | |
2413 GET_INPUT is a macro which you should define to control | |
2414 ©parser inputª if your | |
2415 parser is not ©event drivenª and you are not using | |
2416 ©pointer inputª. If you don't define it, AnaGram will | |
2417 define it by default to read a single character from | |
2418 stdin: | |
2419 | |
2420 #define GET_INPUT (PCB.input_code = getchar()) | |
2421 | |
2422 ©PCBª.©input_codeª is an integer field in the ©parser control blockª | |
2423 which is used to hold the current character code. You | |
2424 may also want GET_INPUT to set the values of ©input_contextª or | |
2425 ©input_valueª. It may call an input function, or it may execute | |
2426 in-line code when it is invoked. | |
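
As a sketch of a GET_INPUT that calls an input function, the following C fragment reads character codes from an in-memory string instead of stdin. The struct mock_pcb, the variable input_text and the function next_input_char are hypothetical stand-ins made up for this example; in a real parser PCB refers to the parser control block declared by AnaGram, which has more fields than the one shown.

```c
#include <stdio.h>

/* Stand-in for the generated parser control block (assumption: only
   the input_code field matters for this illustration). */
struct mock_pcb { int input_code; };
static struct mock_pcb mock_pcb_instance;
#define PCB mock_pcb_instance

/* Hypothetical input source: text held in memory. */
static const char *input_text = "";

/* Hypothetical input function: returns the next character code,
   or EOF once the text is exhausted. */
static int next_input_char(void) {
    if (*input_text == '\0') {
        return EOF;
    }
    return (unsigned char)*input_text++;
}

/* Same shape as the default definition, with a different source. */
#define GET_INPUT (PCB.input_code = next_input_char())
```

Keeping the macro as a single expression, like the default definition, lets it be used wherever the default could be used.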
2427 ## | |
2428 | |
2429 iso latin 1 | |
2430 | |
2431 The "iso latin 1" ©configuration switchª controls case | |
2432 conversion on input characters when the ©case sensitiveª | |
2433 switch is set to off. When "iso latin 1" is set, the | |
2434 default ©CONVERT_CASEª macro is defined to convert | |
2435 correctly all characters in the latin 1 character set. | |
2436 When the switch is off, only characters in the ASCII | |
2437 range (0-127) are converted. | |
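
The difference between the two settings can be illustrated with a small C function. This is only a sketch: to_upper_code is a hypothetical name, it is not the CONVERT_CASE macro that AnaGram defines, and it assumes that conversion is toward upper case and that the execution character set is ASCII compatible.

```c
/* Hypothetical helper, not AnaGram's CONVERT_CASE. Converts a
   character code to upper case. When iso_latin_1 is zero, only the
   ASCII letters a-z are converted. */
int to_upper_code(int c, int iso_latin_1) {
    if (c >= 'a' && c <= 'z') {
        return c - ('a' - 'A');
    }
    /* Latin 1 lower case letters occupy 0xE0..0xFE, except 0xF7 (the
       division sign); each upper case form is 0x20 lower. 0xDF and
       0xFF have no upper case form within Latin 1. */
    if (iso_latin_1 && c >= 0xE0 && c <= 0xFE && c != 0xF7) {
        return c - 0x20;
    }
    return c;
}
```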
2438 ## | |
2439 | |
2440 Dragon Book | |
2441 | |
2442 The "dragon book" is the classic reference on formal parsing: | |
2443 Compilers: Principles, Techniques, and Tools | |
2444 Aho, Sethi, and Ullman | |
2445 Addison-Wesley, 1986. | |
2446 | |
2447 It is called the "dragon book" because of its | |
2448 colorful cover illustration showing a knight in | |
2449 armour ("data flow analysis") armed with sword | |
2450 ("©LALR parser generatorª") and shield ("syntax | |
2451 directed translation") at his PC attacking a | |
2452 bright red dragon ("complexity of compiler design"). | |
2453 ## | |
2454 | |
2455 LALR-1 Parser | |
2456 | |
2457 An LALR-1 parser is a ©parserª created from a | |
2458 ©grammarª by an ©LALR parser generatorª. | |
2459 ## | |
2460 | |
2461 LALR Parser Generator | |
2462 | |
2463 LALR(k) (LookAhead Left-to-right Rightmost derivation) | |
2464 parser generators are | |
2465 programs that create parsers algorithmically from | |
2466 formal grammars. The (k) refers to the number of | |
2467 lookahead symbols used to make parsing decisions. | |
2468 Normally, k = 1. | |
2469 | |
2470 LALR parsers are a subset of the class of | |
2471 so-called LR parsers. Compared with general LR parsers, they are | |
2472 generally more compact and less costly to create. These advantages are | |
2473 obtained at a slight sacrifice in generality. Although | |
2474 it is possible to contrive an LR grammar which has | |
2475 ©conflictªs when analyzed with the LALR algorithm, | |
2476 such situations rarely occur in practice, and can | |
2477 be easily resolved by rewriting a few rules. | |
2478 | |
2479 In the ©dragon bookª, section 4.7, the authors list the following | |
2480 attractive properties of LR parsing: | |
2481 LR parsers can be constructed to recognize virtually | |
2482 all programming-language constructs for which context-free | |
2483 grammars can be written. | |
2484 The LR parsing method is the most general nonbacktracking | |
2485 shift-reduce parsing method known, yet it can be implemented as | |
2486 efficiently as other shift-reduce methods. | |
2487 The class of grammars that can be parsed using LR methods is | |
2488 a superset of the class of grammars that can be parsed with | |
2489 predictive parsers. | |
2490 An LR parser can detect a syntactic error as soon as it is | |
2491 possible to do so on a left-to-right scan of the input. | |
2492 ## | |
2493 | |
2494 Getting Started | |
2495 | |
2496 AnaGram is an ©LALR parser generatorª. Its input is | |
2497 a ©syntax fileª, which you prepare with an ordinary | |
2498 programming editor. Its output is a ©parser fileª, which | |
2499 you can compile with a C or C++ compiler on any platform | |
2500 and link into your program. To compile on Unix platforms, set | |
2501 the ©no crª ©configuration switchª. | |
2502 | |
2503 AnaGram has extensive context-sensitive hypertext | |
2504 ©helpª. In any AnaGram window, press ©F1ª or select an item with the | |
2505 ©Help Cursorª. Further documentation in HTML format, including | |
2506 documentation of examples, is found in the html subdirectory. AnaGram | |
2507 also has a comprehensive hard-copy manual, the AnaGram User's Guide. | |
2508 | |
2509 If you are new to AnaGram, you might begin by reviewing the Help | |
2510 Topics ©How AnaGram Worksª and ©Program Developmentª, and looking at | |
2511 An Annotated Example and Summary of AnaGram Notation in the HTML | |
2512 documentation. | |
2513 | |
2514 If you are not already familiar with formal parsing techniques, you | |
2515 may want to read Introduction to Syntax Directed Parsing in the HTML | |
2516 documentation. Note also the Fahrenheit to Celsius conversion | |
2517 examples in the examples/fc directory, which comprise a graded | |
2518 sequence of syntax files illustrating most of the basic | |
2519 principles of ©syntax directed parsingª in easy steps. Documentation | |
2520 is in html/fc.html. | |
2521 | |
2522 AnaGram has many features, a number of which are not | |
2523 commonly found in parser generators: | |
2524 the ©configuration sectionª | |
2525 ©thread safe parsersª | |
2526 C++ support | |
2527 the ©disregardª and ©lexemeª statements | |
2528 ©event drivenª parsers | |
2529 ©character setsª | |
2530 ©virtual productionsª | |
2531 ©File Traceª, ©Grammar Traceª | |
2532 ©automatic resynchronizationª | |
2533 ©error token resynchronizationª | |
2534 | |
2535 To familiarize yourself with the many options available for configuring | |
2536 your parsers, select ©Configuration Parametersª from the ©Browse Menuª. | |
2537 Use ©F1ª or the ©Help Cursorª to pop up explanations of the various | |
2538 parameters. | |
2539 | |
2540 | |
2541 If you don't find the information you need, please visit the | |
2542 AnaGram web page at http://www.parsifalsoft.com for further | |
2543 information and support. | |
2544 | |
2545 ## | |
2546 | |
2547 How AnaGram Works | |
2548 | |
2549 AnaGram contains an ©LALR Parser Generatorª which creates a | |
2550 table driven ©LALR-1 parserª from a ©grammarª written in a variant | |
2551 of Backus-Naur Form. AnaGram works in two steps. In the | |
2552 first step, or analysis phase, it reads a ©syntax fileª and | |
2553 compiles a number of tables describing the grammar. In the | |
2554 second step, or build phase, it writes two output files: | |
2555 a ©parser fileª written in C or C++ and a ©header fileª. | |
2556 | |
2557 Syntax files normally have the extension .syn. The rules for | |
2558 writing syntax files are given in the AnaGram User's Guide | |
2559 and in the Summary of AnaGram Notation in the HTML documentation. | |
2560 | |
2561 The header file contains definitions and declarations, including | |
2562 the definition of a ©parser control blockª. | |
2563 | |
2564 The parser file consists of: | |
2565 The ©C prologueª, if any. | |
2566 Definitions and declarations provided by AnaGram. | |
2567 ©Reduction procedureªs. | |
2568 A customized ©parsing engineª. | |
2569 A ©parse functionª to be called when input is to be parsed. | |
2570 | |
2571 The name of the parser file is controlled by the ©parser | |
2572 file nameª ©configuration parameterª. By default, the | |
2573 parser file has the same name as the syntax file, with | |
2574 the extension .c. | |
2575 | |
2576 The name of the parse function is controlled by the ©parser nameª | |
2577 parameter, which defaults to the name of the syntax file. | |
2578 ## | |
2579 | |
2580 Examples | |
2581 | |
2582 The EXAMPLES directory of the AnaGram distribution disk | |
2583 contains a number of examples to help you get started. | |
2584 Documentation for the examples, in HTML format, is located | |
2585 in the html directory (start at index.html or examples.html). | |
2586 | |
2587 The traditional Hello, World, in examples/hw, is a good | |
2588 example for getting familiar with the mechanical | |
2589 procedures of building both C and C++ parsers from | |
2590 ©syntax fileªs. | |
2591 | |
2592 The Fahrenheit/Celsius conversion examples in the | |
2593 examples/fc directory on your AnaGram diskette comprise | |
2594 a graded sequence of syntax files which illustrate | |
2595 most of the basic principles of ©syntax directed | |
2596 parsingª in easy steps. In addition, these examples | |
2597 demonstrate many features of AnaGram which are not | |
2598 found in other parser generators: | |
2599 the ©configuration sectionª | |
2600 ©character setsª | |
2601 ©virtual productionsª | |
2602 ©error token resynchronizationª | |
2603 ©File Traceª | |
2604 the ©disregardª and ©lexemeª statements | |
2605 ©event drivenª parsers | |
2606 | |
2607 The Four Function Calculator (examples/ffcalc) is used | |
2608 traditionally to demonstrate parser generators. If you | |
2609 are already familiar with ©syntax directed parsingª this | |
2610 example will give you a good overview of the basics of | |
2611 AnaGram. An annotated version of this example may be | |
2612 found in AnaGram's HTML documentation. | |
2613 The FFCALC example illustrates the use of ©precedence | |
2614 rulesª to resolve ©conflictsª. | |
2615 | |
2616 Other examples are available to demonstrate additional | |
2617 features of AnaGram. | |
2618 | |
2619 RCALC (examples/rcalc) is a simple four function | |
2620 calculator which accepts roman numeral input. It | |
2621 illustrates the following AnaGram features: | |
2622 ©pointer inputª | |
2623 ©SYNTAX_ERRORª macro | |
2624 ©context stackª | |
2625 | |
2626 DSL (examples/dsl) is a complete DOS script language, | |
2627 which provides capabilities well in excess of DOS batch | |
2628 files. DSL is a complete working program, used in the | |
2629 past to create AnaGram's install program. Some of the | |
2630 specific features of AnaGram which it illustrates are: | |
2631 ©distinguish lexemesª | |
2632 ©distinguish keywordsª | |
2633 ©far tablesª | |
2634 | |
2635 MPP is a fully functional macro preprocessor for C or | |
2636 C++. Included with MPP are two C grammars, either of | |
2637 which may be incorporated into MPP. MPP uses several | |
2638 parsers that work together: | |
2639 TS.SYN is the primary token scanner parser that | |
2640 identifies tokens, and handles preprocessor | |
2641 commands. | |
2642 MAS.SYN is used to do macro argument substitution. | |
2643 CT.SYN is used to identify tokens that result from | |
2644 string concatenation during macro argument | |
2645 substitution. | |
2646 EX.SYN is used to evaluate constant expressions in | |
2647 #if preprocessor statements. | |
2648 | |
2649 Among the more powerful features of AnaGram that MPP | |
2650 illustrates are: | |
2651 ©semantically determined productionsª | |
2652 ©event drivenª parsers | |
2653 ## | |
2654 | |
2655 Goal, Goal Token, Start Token | |
2656 | |
2657 The ©grammar tokenª is the token which represents the | |
2658 "top level" in your grammar. Some people refer to it as | |
2659 the "goal" or "goal token" and others as the "start | |
2660 token". Whatever it is called, it is the single token | |
2661 which describes the complete input to your parser. | |
2662 | |
2663 The most common way to specify a grammar token is as | |
2664 follows: | |
2665 grammar -> statements?..., eof | |
2666 This production tells AnaGram that the input to your | |
2667 parser consists of a (possibly empty) sequence of | |
2668 statements followed by an end of file token. | |
2669 | |
2670 There are a number of ways of specifying which token in | |
2671 your ©syntax fileª represents the top level of your | |
2672 grammar. You may simply name it "grammar", or you may | |
2673 tag it with a '$' character when you define it, or you | |
2674 may set the ©grammar tokenª ©configuration parameterª. | |
2675 | |
2676 If you should inadvertently tag several tokens with the | |
2677 '$' character and/or set the grammar token parameter, | |
2678 it is the last such specification in the file which | |
2679 wins. Some people develop their grammars bottom up, | |
2680 gradually adding new levels of complexity. In the | |
2681 course of development, they may specify a number of | |
2682 tokens as grammar tokens and forget to remove the old | |
2683 specifications. | |
2684 | |
2685 Notice that if you define the token "grammar" anywhere | |
2686 in your syntax and specify the grammar token otherwise, | |
2687 "grammar" will not be the grammar token. This is to | |
2688 keep "grammar" from being a reserved word. If you need | |
2689 to use it in your syntax for something other than the | |
2690 whole grammar, you are free to do so. | |
2691 ## | |
2692 | |
2693 Grammar | |
2694 | |
2695 Traditionally, a "grammar" is a set of ©productionªs | |
2696 which taken together specify precisely a set of | |
2697 acceptable input streams, in terms of an abstract set | |
2698 of ©terminal tokensª. The set of acceptable input | |
2699 streams is often called the "language" defined by the | |
2700 grammar. | |
2701 | |
2702 In AnaGram, the term "grammar" also includes | |
2703 ©configuration sectionsª as well as the ©definitionsª | |
2704 of ©character setsª and ©virtual productionsª which | |
2705 augment the collection of productions. The term is | |
2706 often used in contrast to the term "©syntax fileª" | |
2707 which is used to signify the complete AnaGram source | |
2708 file including reduction procedures and embedded C, or | |
2709 the term "©parserª" which refers to AnaGram's output | |
2710 file. | |
2711 | |
2712 A grammar is often called a "syntax", and the rules of | |
2713 the grammar are often called syntactic rules. | |
2714 ## | |
2715 | |
2716 Grammar Analysis | |
2717 | |
2718 The major function of AnaGram is the analysis of | |
2719 context-free grammars written in a particular variant | |
2720 of Backus-Naur Form. | |
2721 | |
2722 The analysis of a grammar proceeds in four stages. In | |
2723 the first, the input grammar is analyzed and a number | |
2724 of tables are built which describe all of the | |
2725 ©productionªs and components of the ©grammarª. | |
2726 | |
2727 In the second stage, AnaGram analyzes all of the | |
2728 character sets defined in the grammar, and where | |
2729 necessary, defines auxiliary tokens and productions. | |
2730 | |
2731 In the third stage, AnaGram identifies all of the | |
2732 states of the parser and builds the go-to table for the | |
2733 parser. | |
2734 | |
2735 In the fourth stage, AnaGram identifies ©reduction | |
2736 tokensª for each completed ©grammar ruleª in each state | |
2737 and checks for ©conflictªs. | |
2738 | |
2739 Use the ©Analyze Grammarª command to cause AnaGram to | |
2740 analyze your grammar. | |
2741 ## | |
2742 | |
2743 Grammar Is Ambiguous | |
2744 | |
2745 This ©warningª message appears if your ©grammarª | |
2746 contains ©conflictªs. AnaGram will resolve ©shift-reduce | |
2747 conflictsª by selecting the shift option. It will | |
2748 resolve ©reduce-reduce conflictsª by selecting from the | |
2749 conflicting ©grammar ruleªs the one which appears first | |
2750 in the ©syntax fileª. | |
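
The two default resolutions can be summarized in a short C sketch. It is illustrative only: the names resolution_kind, resolution and resolve_conflict are invented here, and the sketch assumes that rules are numbered in the order in which they appear in the syntax file, so that the lowest number is the rule that appears first.

```c
#include <assert.h>
#include <stddef.h>

enum resolution_kind { RESOLVE_SHIFT, RESOLVE_REDUCE };

struct resolution {
    enum resolution_kind kind;
    int rule;                    /* meaningful only for RESOLVE_REDUCE */
};

/* Illustrative only. rules[] holds the numbers of the conflicting
   grammar rules (assumption: lower number = earlier in the file). */
struct resolution resolve_conflict(int shift_possible,
                                   const int *rules, size_t n_rules) {
    struct resolution r = { RESOLVE_SHIFT, -1 };
    if (shift_possible) {
        return r;                /* shift-reduce: the shift wins */
    }
    assert(n_rules > 0);
    r.kind = RESOLVE_REDUCE;     /* reduce-reduce: earliest rule wins */
    r.rule = rules[0];
    for (size_t i = 1; i < n_rules; i++) {
        if (rules[i] < r.rule) {
            r.rule = rules[i];
        }
    }
    return r;
}
```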
2751 ## | |
2752 | |
2753 Grammar Rule | |
2754 | |
2755 A "grammar rule" is the right hand side of a production. | |
2756 It is a sequence of ©rule elementsª. Each rule element | |
2757 identifies some token, which can be either a ©terminal | |
2758 tokenª or ©nonterminal tokenª. | |
2759 | |
2760 A grammar rule is "matched" by a | |
2761 corresponding sequence of tokens in the input stream to | |
2762 the parser. The rule elements in the grammar rule may be | |
2763 ©token nameªs, ©set expressionsª, ©character constantsª, | |
2764 ©immediate actionªs, ©keyword stringsª, or ©virtual | |
2765 productionsª. | |
2766 | |
2767 A grammar rule may be followed by an | |
2768 optional ©reduction procedureª. The ©semantic valuesª of | |
2769 the tokens that comprise the rule may be passed to the | |
2770 reduction procedure by using ©parameter assignmentsª. | |
2771 | |
2772 A grammar rule always makes up the right hand side of a | |
2773 production. The left hand side of the production | |
2774 identifies one or more ©nonterminal tokensª, or | |
2775 ©reduction tokensª, to which the rule reduces when | |
2776 matched. If there is more than one reduction token, | |
2777 the production is called a ©semantically determined productionª and | |
2778 there should be a ©reduction procedureª to select | |
2779 the correct reduction token at run time. | |
2780 ## | |
2781 | |
2782 Grammar Token | |
2783 | |
2784 The "grammar token" ©configuration parameterª may be | |
2785 used to specify the ©goalª, or "start" token for the | |
2786 syntax analyzer portion of AnaGram. Alternatively, you | |
2787 could simply call the token "grammar", or you could | |
2788 append a '$' character to it when you define it. | |
2789 | |
2790 Each grammar must have a grammar token specified before | |
2791 it can be analyzed or before a parser can be built. The | |
2792 grammar token is the single token to which the grammar | |
2793 finally condenses. When this token is identified by the | |
2794 parser, the parse is complete. | |
2795 ## | |
2796 | |
2797 Grammar Trace | |
2798 | |
2799 AnaGram's Grammar Trace facility lets you examine the workings of your | |
2800 ©parserª in detail. You can use the Grammar Trace as soon as you have | |
2801 analyzed your ©grammarª, even before you have written any ©reduction | |
2802 procedureªs or other code. Thus you can defer writing procedural code | |
2803 until you have the grammar working to your specifications. | |
2804 | |
2805 Select the ©Grammar Trace Windowª | |
2806 from the ©Action Menuª or click on the Grammar Trace | |
2807 button. | |
2808 | |
2809 In the ©Parser Stack paneª you can see a representation of the | |
2810 ©parser state stackª and ©parser stateª as they might appear in the | |
2811 course of execution of your ©parserª. The ©Rule Stack paneª shows the | |
2812 relationship between the contents of the parser stack and your | |
2813 ©grammarª. If your grammar uses ©semantically determined | |
2814 productionsª, you can select the appropriate ©reduction tokenª from | |
2815 the ©Reduction Choices paneª. | |
2816 | |
2817 At any stage, the ©Parser Stackª represents a parse | |
2818 in progress. It shows the sequence of ©tokenªs that have | |
2819 been input so far and the states in which they were | |
2820 seen. When a production is complete and the grammar rule | |
2821 is reduced, the tokens that make up the rule are removed | |
2822 from the stack and replaced by the token on the left | |
2823 side of the production. Initially, the Parser Stack contains | |
2824 only a ©lookahead lineª. | |
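
The stack behavior described above can be sketched in C. This is a simplified model, not AnaGram's implementation: token_stack, push_token and reduce are hypothetical names, and the sketch records only tokens, whereas the real Parser Stack also shows the state in which each token was seen.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_MAX 32

/* Simplified model: tokens only, no states. */
struct token_stack {
    int tokens[STACK_MAX];
    size_t depth;
};

/* A token has been input: it goes on top of the stack. */
void push_token(struct token_stack *s, int token) {
    assert(s->depth < STACK_MAX);
    s->tokens[s->depth++] = token;
}

/* A grammar rule of rule_length tokens is complete: remove those
   tokens and replace them by the token on the left side of the
   production. */
void reduce(struct token_stack *s, size_t rule_length, int lhs_token) {
    assert(rule_length <= s->depth);
    s->depth -= rule_length;
    push_token(s, lhs_token);
}
```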
2825 | |
2826 To explore your grammar, choose ©tokenªs one by one from | |
2827 the ©Allowable Inputª | |
2828 pane. This pane shows the tokens allowable at the current state of the | |
2829 grammar, and the actions that result when the tokens are chosen. | |
2830 | |
2831 You can also enter text characters directly in the ©text entryª | |
2832 field. This means you can run a Grammar Trace like a ©File Traceª | |
2833 where the test file is replaced by the characters you type in the | |
2834 text entry field. This is a very convenient way to check out your | |
2835 grammar. Text entry is, of course, not appropriate for grammars that | |
2836 expect ©token inputª. | |
2837 | |
2838 In a ©File Traceª you can advance the parse no matter which pane is | |
2839 active. In a Grammar Trace there is a question as to whether input is | |
2840 intended to come from the Allowable Input pane or the text entry | |
2841 field. Therefore the parse can only be advanced when one of these | |
2842 two is active to indicate that it is the source of input. | |
2843 | |
2844 Specialized prebuilt Grammar Traces such as the ©Conflict Traceª and | |
2845 the ©Auxiliary Traceª can be selected from ©Auxiliary Windowsª popup | |
2846 menus where appropriate. | |
2847 | |
2848 All Grammar Trace activity updates the ©trace coverageª counts. | |
2849 ## | |
2850 | |
2851 Text Entry | |
2852 | |
2853 It is sometimes more convenient to enter text in the | |
2854 text entry box on the ©Grammar Traceª toolbar than to | |
2855 select individual tokens from the ©Allowable Input paneª. | |
2856 | |
2857 By entering text you can proceed quickly to a troublesome | |
2858 state without having to choose each individual token | |
2859 en route. | |
2860 | |
2861 After entering text, press Enter or click on the Proceed | |
2862 button to parse the text. Click on the Single Step button | |
2863 to work slowly through the text step by step. | |
2864 ## | |
2865 | |
2866 header file name | |
2867 | |
2868 The "header file name" parameter names the ©parser | |
2869 headerª file that AnaGram will generate when it builds | |
2870 your parser. This header file can be used with your | |
2871 parser or with other modules in your program. The | |
2872 header file contains a number of typedef statements and | |
2873 a number of macro definitions which are needed in your | |
2874 parser and may be useful in other modules. | |
2875 | |
2876 If the value of this parameter contains a '#' character, | |
2877 AnaGram will substitute the name of your syntax file for | |
2878 the '#'. The default value of "header file name" is | |
2879 "#.h". | |
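
The '#' substitution can be pictured with the following C sketch. The function expand_file_name is hypothetical and not part of AnaGram. The sketch assumes that "the name of your syntax file" means the name without its extension, and it replaces only the first '#' it finds.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not AnaGram code: copies pattern to out,
   replacing the first '#' with syntax_name (assumed to be the syntax
   file name without its extension). Returns 0 on success, -1 if out
   is too small. */
int expand_file_name(const char *pattern, const char *syntax_name,
                     char *out, size_t out_size) {
    const char *hash = strchr(pattern, '#');
    size_t prefix_len = hash ? (size_t)(hash - pattern) : strlen(pattern);
    size_t name_len = hash ? strlen(syntax_name) : 0;
    const char *suffix = hash ? hash + 1 : "";
    size_t suffix_len = strlen(suffix);

    if (prefix_len + name_len + suffix_len + 1 > out_size) {
        return -1;
    }
    memcpy(out, pattern, prefix_len);
    memcpy(out + prefix_len, syntax_name, name_len);
    /* the final copy includes the terminating null character */
    memcpy(out + prefix_len + name_len, suffix, suffix_len + 1);
    return 0;
}
```

Under these assumptions, the default value "#.h" applied to a syntax file named ffcalc yields ffcalc.h, while a value without '#' is used unchanged.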
2880 ## | |
2881 | |
2882 Help, Using Help | |
2883 | |
2884 There are 3 main ways to access AnaGram Online Help: | |
2885 Press F1 for context-sensitive help from most windows and menu items. | |
2886 Similarly, use the ©Help Cursorª from most windows and menu items. | |
2887 From the Help menu, you can bring up ©Help Topicsª and choose a topic. | |
2888 | |
2889 You can also get fly-over help for the toolbar buttons on the ©Control | |
2890 Panelª. File and Grammar Traces have a Help button. | |
2891 | |
2892 AnaGram's Help windows, unlike most others, remain on-screen until you | |
2893 dismiss them. This means you can refer to several topics at once. They | |
2894 have hypertext links to other Help topics. Also, right-clicking | |
2895 the mouse on a Help window or pressing F1 will pop up an Auxiliary | |
2896 Windows menu of all linked topics in the window. "Using Help" is always | |
2897 available from this popup menu. | |
2898 | |
2899 Note that, for the ©Warningsª, ©Configuration Parameterªs and ©Help | |
2900 Topicsª windows, F1 will give you help for the item | |
2901 on the highlighted line, whereas the Help Cursor allows you | |
2902 to select any line by clicking on it. | |
2903 | |
2904 AnaGram also has documentation in HTML format, indexed in the index.html | |
2905 file. This documentation covers Getting Started, examples, and some | |
2906 further topics mainly condensed from the User's Guide. Hard copy | |
2907 documentation is in the AnaGram User's Guide, which has the most | |
2908 detail. | |
2909 ## | |
2910 | |
2911 Hidden | |
2912 | |
2913 In a ©configuration sectionª of your grammar you may use | |
2914 an ©attribute statementª to declare one or more tokens | |
2915 to be "hidden". Tokens that are "hidden" do not appear | |
2916 in the ©token namesª table, and thus do not appear in syntax error | |
2917 diagnoses. When your parser attempts to determine the | |
2918 ©error frameª of a ©syntax errorª, it will disregard the | |
2919 tokens that have been declared hidden. The hidden | |
2920 declaration consists simply of the keyword hidden | |
2921 followed by a list of tokens, separated by commas and | |
2922 enclosed in braces ({ }): | |
2923 [ hidden { widget, wombat, foo, bar } ] | |
2924 | |
2925 You would use the "hidden" attribute primarily for | |
2926 tokens whose name would not mean anything to your users. | |
2927 ## | |
2928 | |
2929 Immediate Action | |
2930 | |
2931 Immediate actions are snippets of C code which are to | |
2932 be executed in the middle of a ©grammar ruleª. Immediate | |
2933 actions are denoted by a '!' character followed by | |
2934 either a C expression, terminated by a semicolon; or a | |
2935 block of C code enclosed in braces. For example, in a | |
2936 simple desk calculator example one might write the | |
2937 following: | |
2938 transaction | |
2939 -> !printf("#");, expression:x =printf("%d\n",x); | |
2940 | |
2941 Notice that the only apparent difference between an | |
2942 immediate action and a ©reduction procedureª is that the | |
2943 immediate action is preceded by '!' instead of '='. | |
2944 Notice that the immediate action must be followed by a | |
2945 comma to separate it from the following ©rule elementª. | |
2946 | |
2947 Immediate actions may also be used in ©definitionªs: | |
2948 prompt = !printf("#"); | |
2949 | |
2950 The above example, using this definition would then be: | |
2951 transaction | |
2952 -> prompt, expression:x =printf("%d\n",x); | |
2953 | |
2954 You could accomplish the same result by writing a ©null | |
2955 productionª and a reduction procedure: | |
2956 prompt | |
2957 -> =printf("#"); | |
2958 | |
2959 This is exactly how AnaGram implements immediate | |
2960 actions. | |
2961 ## | |
2962 | |
2963 Implementation Errors | |
2964 | |
2965 "Implementation errors" are errors your parser detects | |
2966 which are not the immediate result of bad input. When | |
2967 it encounters an implementation error, your parser will | |
2968 call a macro which you can define to deal with the | |
2969 problem in a manner suitable to your needs. If you don't | |
2970 provide these macros, AnaGram will make default | |
2971 definitions. There are two macros corresponding to two | |
2972 implementation errors: | |
2973 ©PARSER_STACK_OVERFLOWª | |
2974 ©REDUCTION_TOKEN_ERRORª | |
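
As a rough sketch, your own definitions of these macros might look
like the following. It assumes the macros are object-like and expand
to a single statement, which is not stated above. The counter
implementation_errors and the function push_state are hypothetical
stand-ins for this example, not AnaGram code.

```c
#include <stdio.h>

/* Stand-in state for this sketch only. */
static int implementation_errors = 0;

/* Assumption: the parser invokes these macros as plain statements. */
#define PARSER_STACK_OVERFLOW \
    do { fputs("parser stack overflow\n", stderr); \
         implementation_errors++; } while (0)

#define REDUCTION_TOKEN_ERROR \
    do { fputs("reduction token error\n", stderr); \
         implementation_errors++; } while (0)

/* Hypothetical stand-in for the place in a parser where a full
   stack is detected. Returns 0 on success, -1 on overflow. */
static int push_state(int *stack, int *depth, int limit, int state)
{
    if (*depth >= limit) {
        PARSER_STACK_OVERFLOW;
        return -1;
    }
    stack[(*depth)++] = state;
    return 0;
}
```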
2975 ## | |
2976 | |
2977 Inappropriate Value | |
2978 | |
2979 This ©warningª message appears when the value assigned to | |
2980 a ©configuration parameterª is not appropriate to that | |
2981 parameter. Check the definition of the parameter, by | |
2982 opening the ©Configuration Parameters Windowª, | |
2983 selecting the parameter and pressing F1. | |
2984 ## | |
2985 | |
2986 Initializer | |
2987 | |
2988 For every ©parserª it generates, AnaGram generates an | |
2989 "initializer" function to initialize the parser. AnaGram | |
2990 names the initializer by prefixing the ©parser nameª | |
2991 with "init_". If your parser is ©event drivenª, you must | |
2992 call the initializer before you call the parser. | |
2993 | |
2994 If your parser is not event driven, AnaGram will | |
2995 normally include a call to the initializer in the | |
2996 parser. If you wish to be able to call your parser more | |
2997 than once without its being re-initialized, you may turn | |
2998 off the ©auto initª ©configuration switchª. When you do | |
2999 this, you assume responsibility for calling the | |
3000 initializer. If your parser is event driven, you must | |
3001 always call the initializer function. | |
3002 | |
3003 If the ©reentrant parserª switch is set, the initializer takes | |
3004 a pointer to the ©parser control blockª as its sole argument. Otherwise | |
3005 it takes no arguments. The initializer returns no value. All | |
3006 communication is by means of the ©parser control blockª. | |
3007 ## | |
3008 | |
3009 Input Character | |
3010 | |
3011 The actual unit of ©parser inputª is usually a | |
3012 single character. Note that you are not limited to | |
3013 eight-bit characters. Your parser will use the input | |
3014 character to index a translation table, ©ag_tcvª, to | |
3015 determine the ©token numberª for that character. The | |
3016 ©token numberª identifies the actual syntactic token. | |
3017 The character code itself will be the ©semantic valueª | |
3018 of the token. Note that AnaGram groups together all | |
3019 input characters that are syntactically | |
3020 indistinguishable into a single input token. | |
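
The lookup described above can be sketched in C as follows. The
table tcv only stands in for a translation table such as ag_tcv. The
token numbers, the function names and the ASCII character ranges are
assumptions made for the example.

```c
/* Hypothetical token numbers for this sketch. */
enum { TOK_OTHER = 0, TOK_DIGIT = 1, TOK_LETTER = 2 };

/* Stand-in for a translation table: one entry per character code. */
static unsigned char tcv[256];

/* Fill the table. All digits share one token number and all letters
   share another, since the grammar imagined here never needs to tell
   them apart. Assumes an ASCII-compatible character set. */
static void build_table(void)
{
    int c;
    for (c = 0; c < 256; c++) tcv[c] = TOK_OTHER;
    for (c = '0'; c <= '9'; c++) tcv[c] = TOK_DIGIT;
    for (c = 'a'; c <= 'z'; c++) tcv[c] = TOK_LETTER;
    for (c = 'A'; c <= 'Z'; c++) tcv[c] = TOK_LETTER;
}

struct input_token { int token_number; int value; };

/* The token number comes from the table; the character code itself
   becomes the semantic value. */
static struct input_token classify(unsigned char ch)
{
    struct input_token t;
    t.token_number = tcv[ch];
    t.value = ch;
    return t;
}
```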
3021 ## | |
3022 | |
3023 input_code | |
3024 | |
3025 input_code is a field in the ©parser control blockª | |
3026 which contains the current ©input characterª, or, if your | |
3027 ©GET_INPUTª macro supplies ©token numberªs directly, the | |
3028 token number. | |
3029 | |
3030 If you write your own ©GET_INPUTª macro, you must make | |
3031 sure that you store the input character, or token | |
3032 number, you get into ©PCBª.input_code. | |
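
A minimal sketch of such a macro, reading characters from a string
in memory, might look like this. The structure demo_pcb stands in
for the real parser control block. The variable input_text, the
function next_input and the use of 0 to mark end of input are all
assumptions made for the example.

```c
/* Stand-in for the parser control block; a generated parser
   defines its own. */
static struct { int input_code; } demo_pcb;
#define PCB demo_pcb

/* Hypothetical input source: a string in memory. */
static const char *input_text = "";

/* Store the next character in PCB.input_code. Using 0 at the end
   of the string is an assumption of this sketch. */
#define GET_INPUT \
    (PCB.input_code = (*input_text != '\0') \
        ? (unsigned char)*input_text++ : 0)

/* Hypothetical wrapper showing where a parser would use the macro. */
static int next_input(void)
{
    GET_INPUT;
    return PCB.input_code;
}
```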
3033 ## | |
3034 | |
3035 INPUT_CODE(t) | |
3036 | |
3037 If you set both the ©pointer inputª and the ©input | |
3038 valuesª ©configuration parameterªs, you must provide an | |
3039 INPUT_CODE macro for your parser. In this situation, | |
3040 your parser will use the pointer to load the | |
3041 ©input_valueª field of the ©parser control blockª and | |
3042 use the INPUT_CODE macro to extract the appropriate | |
3043 value for the ©input_codeª field. For example, if the | |
3044 input_value is a structure and the appropriate member | |
3045 field is called "id" you would write: | |
3046 | |
3047 #define INPUT_CODE(t) (t).id | |
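
The sequence described above, loading input_value through the
pointer and then deriving input_code, can be sketched as follows.
The structure input_unit, the cut-down control block demo_pcb and
the function load_input are hypothetical; a generated parser
supplies its own equivalents.

```c
/* Hypothetical input unit: a token id plus a semantic value. */
struct input_unit { int id; double number; };

#define INPUT_CODE(t) (t).id

/* Stand-in for the two relevant fields of the parser control block. */
struct demo_pcb {
    struct input_unit input_value;
    int input_code;
};

/* Copy the unit the pointer refers to into input_value, advance the
   pointer, then extract input_code with the INPUT_CODE macro. */
static void load_input(struct demo_pcb *pcb,
                       const struct input_unit **pointer)
{
    pcb->input_value = *(*pointer)++;
    pcb->input_code = INPUT_CODE(pcb->input_value);
}
```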
3048 ## | |
3049 | |
3050 input_context | |
3051 | |
3052 "input_context" is a field which AnaGram adds to the | |
3053 definition of the ©parser control blockª structure when | |
3054 you define a ©context typeª ©configuration parameterª. | |
3055 If you choose, you can write your GET_INPUT macro so | |
3056 that it stores the context value in ©PCBª.input_context. | |
3057 The default definition for ©GET_CONTEXTª will then stack | |
3058 the context value at the appropriate time. You can think | |
3059 of PCB.input_context as a sort of temporary "parking | |
3060 place" for the context value. | |
3061 ## | |
3062 | |
3063 Input Scan Aborted | |
3064 | |
3065 This ©warningª message appears if AnaGram is unable to | |
3066 finish scanning your ©syntax fileª because of previous | |
3067 errors. | |
3068 ## | |
3069 | |
3070 input values | |
3071 | |
3072 "Input values" is a ©configuration switchª which | |
3073 defaults to off. If your ©parser inputª includes | |
3074 explicit ©token valueªs which are not simply the ASCII | |
3075 values of corresponding ASCII input characters, you must | |
3076 set the "input values" switch to inform AnaGram. Unless | |
3077 your parser is ©event drivenª or uses ©pointer inputª, | |
3078 you must also provide your own ©GET_INPUTª macro. | |
3079 | |
3080 If your parser uses pointer input, you must provide an | |
3081 ©INPUT_CODE(t)ª macro. | |
3082 | |
3083 The semantic value of an input token is to be stored in the | |
3084 ©input_valueª field of the parser control block. | |
3085 ## | |
3086 | |
3087 input_value | |
3088 | |
3089 input_value is a field in the ©parser control blockª | |
3090 which is used to store the semantic value of the input | |
3091 token. | |
3092 | |
3093 If you write your own ©GET_INPUTª macro, and you have | |
3094 set the ©input valuesª ©configuration switchª, you | |
3095 should make sure that you store the value of the ©input | |
3096 characterª or token into ©PCBª.input_value. | |
3097 ## | |
3098 | |
3099 Internal Error | |
3100 | |
3101 "AnaGram internal error: ..." is a ©warningª message which | |
3102 appears if one of AnaGram's internal consistency tests | |
3103 fails. This message should never appear if AnaGram is | |
3104 working properly. Usually AnaGram will abort on | |
3105 encountering an internal error, although under | |
3106 a small set of circumstances it may continue. Should | |
3107 this happen, it would be wise to close AnaGram and | |
3108 restart it. | |
3109 | |
3110 If you do get an internal error, please note the complete | |
3111 message identifying the problem and file a bug report, | |
3112 following the directions posted on the AnaGram web page | |
3113 at http://www.parsifalsoft.com. | |
3114 A copy of the relevant | |
3115 syntax file and a summary of the circumstances surrounding | |
3116 the problem would be greatly appreciated. | |
3117 ## | |
3118 | |
3119 Intersection | |
3120 | |
3121 In set theory, the intersection of two sets, A and B, is | |
3122 defined to be the set of all elements of A which are | |
3123 also elements of B. In an AnaGram ©syntax fileª, the | |
3124 intersection of two ©character setsª is represented with | |
3125 the '&' operator. The intersection operator has lower | |
3126 ©precedenceª than the ©complementª operator, but higher | |
3127 precedence than the ©unionª and ©differenceª operators. | |
3128 The intersection operator is ©left associativeª. | |
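
To see how these precedence rules group an expression, consider a
set expression of the form ~a & b + c, where '+' denotes union. It
groups as ((~a) & b) + c. The C sketch below models small sets of
digit characters as bit masks to make the grouping concrete. The
type charset and all function names are hypothetical, and C's own
bit operators merely stand in for the set operators.

```c
/* Bit i set means the digit character '0' + i is a member. */
typedef unsigned int charset;

/* The ten digits form the universe for this sketch. */
#define UNIVERSE 0x3FFu

/* Build the set of digit characters from lo through hi. */
static charset range(char lo, char hi)
{
    charset s = 0;
    char c;
    for (c = lo; c <= hi; c++) s |= 1u << (c - '0');
    return s;
}

static charset set_complement(charset a) { return ~a & UNIVERSE; }
static charset set_intersection(charset a, charset b) { return a & b; }
static charset set_union(charset a, charset b) { return a | b; }

/* ~a & b + c: complement first, then intersection, then union. */
static charset example(charset a, charset b, charset c)
{
    return set_union(set_intersection(set_complement(a), b), c);
}
```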
3129 ## | |
3130 | |
3131 Keyboard Support | |
3132 | |
3133 AnaGram can be controlled entirely from the keyboard. In the Control | |
3134 Panel, you | |
3135 can tab to any button and press Enter to select it. In addition to | |
3136 the conventional | |
3137 Windows keyboard functions, the following keys have been implemented: | |
3138 Escape closes any AnaGram window except the Control Panel. | |
3139 F8 toggles between an active AnaGram window and the Control Panel. | |
3140 F10 accesses the Control Panel menu from any | |
3141 AnaGram Window. | |
3142 Shift F10 pops up the Auxiliary Windows menu. | |
3143 ## | |
3144 | |
3145 Keyword, Keyword String | |
3146 | |
3147 Keywords are a very important feature of AnaGram. They | |
3148 provide an easy way to pick up special character | |
3149 sequences in your input, thereby eliminating the need | |
3150 for a lot of tedious ©productionªs. | |
3151 | |
3152 If AnaGram finds, on the right hand side of one of your | |
3153 ©grammarª productions, a string enclosed in double | |
3154 quotes, such as "IF", it automatically creates from the | |
3155 string a "keyword" which is incorporated into your | |
3156 parser. You may have any number of keywords. A keyword | |
3157 is treated as a single terminal token. Recognition of | |
3158 keywords is governed by the ©case sensitiveª switch. | |
3159 | |
3160 Your parser will look for a keyword in its input stream | |
3161 wherever you have defined this particular keyword to be | |
3162 legitimate input. It will do whatever lookahead is | |
3163 necessary in order to pick up the entire keyword. If | |
3164 several keywords match the input, such as IF and IFF, | |
3165 it will select the longest match, IFF in this example. | |
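
The longest-match rule can be illustrated with a short C sketch.
This is a plain linear search written for the example, not the
lookahead logic AnaGram generates. The function longest_keyword is
hypothetical, and the keyword list stands for the keywords that are
legitimate in the current state. Matching here is case sensitive.

```c
#include <stddef.h>
#include <string.h>

/* Return the index of the longest keyword that is a prefix of
   `input`, or -1 if no keyword matches. */
static int longest_keyword(const char *input,
                           const char *const *keywords, size_t count)
{
    int best = -1;
    size_t best_len = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        size_t len = strlen(keywords[i]);
        if (len > best_len && strncmp(input, keywords[i], len) == 0) {
            best = (int)i;
            best_len = len;
        }
    }
    return best;
}
```

Given the keywords IF and IFF, input beginning "IFF" selects IFF,
while input beginning "IF " selects IF.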
3166 | |
3167 Important points to notice about keywords: | |
3168 1) Keywords take precedence over ordinary | |
3169 characters in the input stream - thus if the character | |
3170 I and the keyword IF are both legitimate input at some | |
3171 point, IF will be selected, if present, in preference | |
3172 to I. | |
3173 2) Keywords are not reserved words. Your parser | |
3174 will only look for a keyword when it is in a state | |
3175 where that keyword is legitimate input. | |
3176 3) Keywords do not participate in character sets | |
3177 and should not appear in definitions of character sets. | |
3178 In particular, they are not considered as belonging to | |
3179 the complement of a character set. Thus | |
3180 a keyword would not be considered legitimate input | |
3181 for the production | |
3182 next char -> ~( '/' + '*' ) | |
3183 | |
3184 4) Keywords may appear in virtual productions. | |
3185 | |
3186 5) Keywords may be named by means of a definition. | |
3187 | |
3188 AnaGram will list all the keywords in your grammar in | |
3189 the ©Keywordsª window. In addition, in numerous | |
3190 windows where the cursor bar selects a state, the | |
3191 ©Auxiliary Windowsª popup menu will list a Keywords option. | |
3192 This window will provide a list of the keywords | |
3193 acceptable in the selected ©parser stateª. | |
3194 | |
3195 On occasion, a kind of conflict, called a ©keyword | |
3196 anomalyª may occur. If so, such conflicts will be listed | |
3197 in the ©Keyword Anomaliesª window. The "©stickyª" | |
3198 ©attribute statementª is useful in dealing with keyword | |
3199 anomalies. | |
3200 ## | |
3201 | |
3202 Keyword Anomalies Found | |
3203 | |
3204 This ©warningª message indicates that AnaGram has found | |
3205 at least one ©keyword anomalyª in your ©grammarª. Open | |
3206 the ©Keyword Anomaliesª window to see a list of those | |
3207 that have been found. | |
3208 ## | |
3209 | |
3210 Keyword Anomaly | |
3211 | |
3212 In ©syntax directed parsingª, it is assumed that input | |
3213 ©tokenªs can be uniquely identified. In the case of | |
3214 ©keywordªs, however, there is the possibility that the | |
3215 individual characters making up the keyword, as well as | |
3216 the keyword taken as a whole, could constitute | |
3217 legitimate input under some circumstances. Thus | |
3218 ©keywordsª, though a powerful and useful tool, are not | |
3219 completely consistent with the assumptions that underlie | |
3220 ©syntax directed parsingª. This can occasionally give | |
3221 rise to a type of conflict, diagnosed by AnaGram, | |
3222 called a "keyword anomaly". AnaGram is quite | |
3223 conservative in its diagnoses, so that many keyword | |
3224 anomalies it reports are actually innocuous and can be | |
3225 safely ignored. | |
3226 | |
3227 Basically, a keyword anomaly is a situation where a | |
3228 keyword is recognized, causes a reduction, and the | |
3229 parser arrives in a state where the keyword is not | |
3230 legal input. If the keyword, seen simply as a sequence | |
3231 of characters, might have been legal input in the | |
3232 original state, AnaGram notes the existence of a | |
3233 keyword anomaly. | |
3234 | |
3235 If you have a keyword that causes a keyword anomaly and | |
3236 it is actually a reserved word in your grammar, the | |
3237 anomaly is by definition innocuous. You should use the | |
3238 ©reserve keywordsª statement to inform AnaGram that the | |
3239 keyword is reserved and the anomaly need not be | |
3240 diagnosed. | |
3241 | |
3242 To help identify and correct any problems associated | |
3243 with keyword anomalies, AnaGram provides the ©Keyword | |
3244 Anomaliesª window to identify the anomalies, and the | |
3245 ©Keyword Anomaly Traceª to help you understand a | |
3246 particular anomaly. | |
3247 ## | |
3248 | |
3249 Keyword Anomaly Trace | |
3250 | |
3251 A Keyword Anomaly Trace is a ready made ©grammar traceª | |
3252 window which you may select from the ©Auxiliary Windowsª | |
3253 menu of the ©Keyword Anomaliesª window. The anomaly | |
3254 trace provides a path to a state which illustrates the | |
3255 ©keyword anomalyª. In this state, the keyword is a | |
3256 reducing token, but after the reduction, it is not | |
3257 allowable input. | |
3258 ## | |
3259 | |
3260 Keyword Anomalies | |
3261 | |
3262 The Keyword Anomalies window is available only if your | |
3263 grammar has ©keywordª anomalies. | |
3264 | |
3265 Each entry in the Keyword Anomalies window consists of | |
3266 two lines. The first line identifies the ©parser stateª | |
3267 at which the ©keyword anomalyª occurs and the offending | |
3268 keyword. The second line identifies the ©grammar ruleª | |
3269 which the keyword may erroneously reduce. | |
3270 | |
3271 The ©Auxiliary Windowsª menu provides three auxiliary | |
3272 windows keyed directly to the anomaly to help you | |
3273 determine the nature of the problem: The ©Keyword | |
3274 Anomaly Traceª window, the ©Reduction Traceª window, and | |
3275 the ©Rule Derivationª window. Three other windows provide | |
3276 supporting information: the ©Reduction Statesª window, | |
3277 the ©Rule Contextª window and the ©State Definitionª | |
3278 window. | |
3279 ## | |
3280 | |
3281 Keywords | |
3282 | |
3283 The Keywords entry in the ©Browse Menuª pops up a | |
3284 window which lists all of the keywords defined in your | |
3285 ©grammarª. The ©token numberª is also specified. | |
3286 | |
3287 A Keywords window is also an option in the ©Auxiliary | |
3288 Windowsª popup menu for any window which distinguishes | |
3289 various states of your parser. The Keywords window will | |
3290 show all of the ©keywordªs which will be recognized in | |
3291 the state selected by the cursor bar in the parent | |
3292 window. | |
3293 | |
3294 The ©Auxiliary Windowsª menu for a Keywords window | |
3295 provides a ©Token Usageª option which will allow you to | |
3296 see all the uses of a particular keyword in your grammar. | |
3297 ## | |
3298 | |
3299 left | |
3300 | |
3301 "left" controls a ©precedence declarationª, indicating | |
3302 that all of the listed ©rule elementsª are to be | |
3303 considered ©left associativeª. | |
3304 ## | |
3305 | |
3306 Left Associative | |
3307 | |
3308 A binary operator is said to be left associative if | |
3309 an expression with repeated instances of the operator | |
3310 is to be evaluated from the left. Thus, for example, | |
3311 x = a/b/c | |
3312 | |
3313 is normally taken to mean x = (a/b)/c. The division | |
3314 operator is said to be left associative. | |
3315 | |
3316 In ©grammarªs with ©conflictªs, you may use ©precedence | |
3317 declarationªs to specify that an operator should be left | |
3318 associative. | |
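
C's own division operator is left associative, so a small C sketch
can show why the grouping matters. The function names are made up
for the example.

```c
/* The expression as written, with no parentheses. */
static int as_written(int a, int b, int c) { return a / b / c; }

/* Explicit left grouping: what left associativity means. */
static int left_grouping(int a, int b, int c) { return (a / b) / c; }

/* Explicit right grouping, for contrast. */
static int right_grouping(int a, int b, int c) { return a / (b / c); }
```

With a = 100, b = 10 and c = 5, the unparenthesized form and the
left grouping both give 2, whereas the right grouping gives 50.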
3319 ## | |
3320 | |
3321 Lexeme | |
3322 | |
3323 The "lexeme" ©attribute statementª is used to fine-tune | |
3324 the "©disregardª" statement. The lexeme statement takes | |
3325 the form: | |
3326 lexeme { T1, T2,....Tn } | |
3327 | |
3328 where T1,...Tn is a list of ©nonterminalª tokens | |
3329 separated by commas. Lexeme statements may be placed in | |
3330 any ©configuration sectionª, and there may be any number | |
3331 of them. | |
3332 | |
3333 When you specify that a ©tokenª is to be disregarded, | |
3334 AnaGram rewrites your ©grammarª so that the token will be | |
3335 passed over whenever it occurs at the beginning of a | |
3336 file or following a lexical unit, or "lexeme". If you | |
3337 have no lexeme statement, then the lexemes in your | |
3338 grammar are just the terminal tokens. | |
3339 | |
3340 The lexeme statement allows you to specify that certain | |
3341 nonterminal tokens are also to be treated as lexemes. | |
3342 This means that the disregard token will be skipped | |
3343 following the lexeme, but not between the characters | |
3344 that constitute the lexeme. | |
3345 | |
3346 Lexemes correspond to the tokens that a lexical scanner, | |
3347 if you were using one, would commonly identify and pass | |
3348 to a parser as single tokens. You don't usually wish to | |
3349 disregard ©white spaceª within these tokens. For | |
3350 example, in a grammar for a conventional programming | |
3351 language where blank characters are to be disregarded, | |
3352 you might include: | |
3353 [ | |
3354 lexeme {string, character constant, name, number} | |
3355 ] | |
3356 | |
3357 since blank characters must not be overlooked within | |
3358 strings and constants, and should not be permitted | |
3359 within names or numbers. | |
3360 | |
3361 If your grammar allows for situations where successive | |
3362 lexemes could run together if they were not separated | |
3363 by space, a name followed by a number, for example, you | |
3364 may use the "©distinguish lexemesª" ©configuration | |
3365 switchª to force a separation between the tokens. | |
3366 | |
3367 White space may be used explicitly within definitions of | |
3368 lexeme tokens in your grammar if desired, without | |
3369 causing conflicts. Thus, if you wish to allow embedded | |
3370 space in variable names, you might write: | |
3371 [ | |
3372 disregard space | |
3373 lexeme {variable name} | |
3374 ] | |
3375 space = ' ' + '\t' | |
3376 letter = 'a-z' + 'A-Z' | |
3377 digit = '0-9' | |
3378 | |
3379 variable name | |
3380 -> letter | |
3381 -> variable name, letter + digit | |
3382 -> variable name, space..., letter + digit | |
3383 ## | |
3384 | |
3385 line | |
3386 | |
3387 line is a field in your ©parser control blockª used for | |
3388 keeping track of the line number of the current | |
3389 character in your input. Line and column numbers are | |
3390 tracked only if the ©lines and columnsª ©configuration | |
3391 switchª has been set. | |
3392 ## | |
3393 | |
3394 line length | |
3395 | |
3396 Line length is an ©obsolete configuration parameterª. | |
3397 ## | |
3398 | |
3399 Line Numbers | |
3400 | |
3401 "Line numbers" is a ©configuration switchª which | |
3402 defaults to off. If it is on, the ©Build Parserª | |
3403 command will put "#line" directives into the generated | |
3404 C code file so that your compiler diagnostics will | |
3405 refer to lines in the ©syntax fileª rather than in the | |
3406 generated C code file. For more information on the | |
3407 "#line" directive, see Kernighan and Ritchie, second | |
3408 edition, section A12.6. | |
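
The effect of a "#line" directive can be seen in a small C sketch.
The line number 42 and the file name calc.syn are made up for the
example, as are the function names. The exact form of the
directives AnaGram writes is not shown here.

```c
/* After this directive the compiler treats the next source line as
   line 42 of "calc.syn", so diagnostics point there. */
#line 42 "calc.syn"
static int reported_line(void) { return __LINE__; }
static const char *reported_file(void) { return __FILE__; }
```

Here reported_line returns 42 and reported_file returns "calc.syn",
whatever the real name and layout of the C file.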
3409 | |
3410 If the "line numbers" switch is off, AnaGram will put | |
3411 comments into your parser file to help you find | |
3412 reduction procedures and embedded C in your syntax | |
3413 file. | |
3414 | |
3415 Prior to AnaGram 2.01, if your C or C++ compiler required that the | |
3416 backslashes in the pathname in the #line directive be doubled, you | |
3417 would have used AnaGram's ©escape backslashesª switch to make this | |
3418 happen. Although you may still use ©escape backslashesª, it should no | |
3419 longer be necessary because AnaGram now puts forward slashes into #line | |
3420 pathnames instead of backslashes. | |
3421 | |
3422 If you wish, you may specify the pathname in the #line | |
3423 directives explicitly by using the ©Line Numbers Pathª | |
3424 configuration parameter. | |
3425 | |
3426 You may also wish to change the "©parser file nameª" | |
3427 parameter to provide a full path name for your parser | |
3428 file. | |
3429 ## | |
3430 | |
3431 Line Numbers Path | |
3432 | |
3433 "Line Numbers Path" is a ©configuration parameterª | |
3434 which takes a string value. It defaults to NULL. | |
3435 | |
3436 When you have set the ©Line Numbersª ©configuration | |
3437 switchª and Line Numbers Path is not NULL, AnaGram | |
3438 uses it in the #line directive in place of the full | |
3439 path name of your ©syntax fileª. | |
3440 | |
3441 Note that Line Numbers Path should be the complete | |
3442 pathname for your syntax file. | |
3443 | |
3444 Line Numbers Path is useful when using AnaGram in cross | |
3445 platform development. When parsers are to be compiled | |
3446 and tested on a platform different from that used to run | |
3447 AnaGram, you may use Line Numbers Path to provide a | |
3448 pathname on the platform used for compiling and | |
3449 testing. | |
3450 ## | |
3451 | |
3452 Lines and Columns | |
3453 | |
3454 "Lines and columns" is a ©configuration switchª which | |
3455 defaults to on. When set, i.e., on, it causes the | |
3456 ©Build Parserª command to incorporate code into your | |
3457 parser which will automatically track the line number | |
3458 and column number of the input token. | |
3459 | |
3460 You would normally set the "lines and columns" switch | |
3461 when you are planning to build a parser which will read | |
3462 an input file and which will need to diagnose ©syntax | |
3463 errorsª with some precision. | |
3464 | |
3465 Your parser will store the line and column numbers in | |
3466 the ©lineª and ©columnª fields respectively in the | |
3467 ©parser control blockª. | |
3468 | |
3469 If the input to your parser includes tab characters, you | |
3470 should either set the ©tab spacingª ©configuration | |
3471 parameterª appropriately or provide a ©TAB_SPACINGª | |
3472 macro for your parser. | |
3473 | |
3474 Your parser will count line and column numbers beginning | |
3475 with one. | |
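
One plausible way to track positions, including tabs, is sketched
below. This is not the code AnaGram generates. The structure
position, the functions track and position_after, and the fallback
tab spacing of 8 are assumptions made for the example.

```c
#ifndef TAB_SPACING
#define TAB_SPACING 8   /* assumed fallback for this sketch */
#endif

struct position { int line; int column; };

/* Advance `pos` past character `ch`. Lines and columns count
   from one. */
static void track(struct position *pos, int ch)
{
    if (ch == '\n') {
        pos->line++;
        pos->column = 1;
    } else if (ch == '\t') {
        /* Move to the column just after the next tab stop. */
        pos->column += TAB_SPACING - (pos->column - 1) % TAB_SPACING;
    } else {
        pos->column++;
    }
}

/* Position of the character that would follow `text`. */
static struct position position_after(const char *text)
{
    struct position pos = { 1, 1 };
    for (; *text != '\0'; text++) track(&pos, (unsigned char)*text);
    return pos;
}
```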
3476 ## | |
3477 | |
3478 Main Program | |
3479 | |
3480 The "main program" ©configuration switchª determines | |
3481 what AnaGram does if you invoke the ©Build Parserª | |
3482 command, but have no ©embedded Cª in your ©syntax | |
3483 fileª. If the switch is on and you have not specified | |
3484 ©pointer inputª or an ©event drivenª parser, AnaGram | |
3485 creates a main program which does nothing but call your | |
3486 ©parserª. The "main program" switch defaults to on. | |
3487 | |
3488 This feature, along with the default definitions for | |
3489 ©GET_INPUTª and ©error handlingª, makes it possible | |
3490 to write a grammar with no ©embedded Cª or ©reduction | |
3491 procedureªs whatsoever and still get an executable | |
3492 program which will read input from stdin and parse it | |
3493 according to your grammar. | |
3494 ## | |
3495 | |
3496 Marked Rule | |
3497 | |
3498 A "marked rule" is a ©grammar ruleª together with a | |
3499 marked token that indicates how much of the rule has already | |
3500 been matched. The ©marked tokenª and any tokens following it | |
3501 indicate the input that should be expected if the | |
3502 remainder of the rule is to be matched. | |
3503 | |
3504 When marked rules are displayed in AnaGram windows, the | |
3505 marked token is represented by a difference in the font. The token may | |
3506 be in bold face, underlined, italicized, shown with a different point | |
3507 size, or in a different font altogether. Since AnaGram allows you to | |
3508 change fonts to suit your own preferences, you should be careful that | |
3509 the font you choose for the marked tokens allows them to be readily | |
3510 distinguished from the other tokens in your grammar rules. An | |
3511 underlined font is often suitable. | |
3512 ## | |
3513 | |
3514 Max conflicts | |
3515 | |
3516 The "max conflicts" ©configuration parameterª limits the | |
3517 number of ©conflictªs AnaGram will record. Sometimes, a | |
3518 simple error editing your syntax file can cause hundreds | |
3519 of conflicts, which you don't need to see in gory | |
3520 detail. The default value of max conflicts is 50. If you | |
3521 have a grammar that is in serious trouble and you want | |
3522 to see more conflicts, you may change max conflicts to | |
3523 suit your needs. | |
3524 ## | |
3525 | |
3526 Missing | |
3527 | |
3528 The ©warningª message "Missing <element 1> in <element 2>" | |
3529 indicates that AnaGram expects to see an instance of | |
3530 syntactic element 1 at the specified location, internal | |
3531 to an instance of syntactic element 2. AnaGram cannot | |
3532 reliably continue parsing its input after an error of | |
3533 this type. Therefore, it limits further analysis of | |
3534 your grammar to scanning for syntax errors. | |
3535 ## | |
3536 | |
3537 Missing Production | |
3538 | |
3539 "Missing production, TXXX: <token name>" is a ©warningª | |
3540 message which indicates that the specified ©tokenª | |
3541 appears to be defined recursively, but there is no | |
3542 initial ©productionª to get the recursion started. If | |
3543 you get this warning, check your ©grammarª closely. | |
3544 ## | |
3545 | |
3546 Missing Reduction Procedure | |
3547 | |
3548 "Missing reduction procedure, RXXX" is a ©warningª | |
3549 message which appears either when the ©grammar ruleª indicated | |
3550 specifies a ©parameter assignmentª but does not have a | |
3551 ©reduction procedureª to use it, or when the rule has no reduction | |
3552 procedure but the value of the token on the left hand side is used | |
3553 as an argument for some other reduction procedure and the ©default reduction valueª | |
3554 does not have the same type as the token on the left hand side. | |
3555 In this latter case, a reduction procedure may be needed to effect | |
3556 correct type conversion. | |
3557 | |
3558 This warning is | |
3559 provided in case the lack of a reduction procedure is an | |
3560 oversight. | |
3561 ## | |
3562 | |
3563 Multiple Definitions | |
3564 | |
3565 "Multiple definitions for TXXX: <token name>" is a | |
3566 ©warningª message which indicates that the specified | |
3567 ©tokenª has been defined both as a ©character setª and | |
3568 as a ©nonterminal tokenª. It cannot be both. | |
3569 ## | |
3570 | |
3571 Near Functions | |
3572 | |
3573 "Near Functions" is a ©configuration switchª that | |
3574 defaults to off. It controls the use of the "near" | |
3575 keyword for static functions in your parser. If your | |
3576 parser is to run on an 80x86 processor you might wish | |
3577 to turn it on. Your parser will then be slightly | |
3578 smaller and will run a little faster. | |
3579 | |
3580 If you are going to run your parser on some other | |
3581 processor or use a C or C++ compiler that does not | |
3582 support the "near" keyword you should make sure "near | |
3583 functions" is set to off. | |
3584 ## | |
3585 | |
3586 Negative Character Code in Pointer Mode | |
3587 | |
3588 This ©warningª message appears if your ©grammarª defines | |
3589 negative character codes and uses ©pointer inputª. If | |
3590 your grammar uses the default definition for ©pointer | |
3591 typeª it will be reading unsigned characters so that | |
3592 the parser will never see the negative codes that have | |
3593 been defined. You may correct the problem by providing | |
3594 your own definition of pointer type. | |
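
The underlying C behavior can be shown in a few lines. The function
names are made up for the example, and the signed result assumes
the usual two's complement representation.

```c
/* Read one byte as an unsigned character: never negative. */
static int read_unsigned(const void *p)
{
    return *(const unsigned char *)p;
}

/* Read the same byte as a signed character: may be negative. */
static int read_signed(const void *p)
{
    return *(const signed char *)p;
}
```

A byte with all bits set reads as 255 through the unsigned pointer
and as -1 through the signed one, so a grammar that defines the code
-1 would never see it through an unsigned pointer.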
3595 ## | |
3596 | |
3597 Nest Comments | |
3598 | |
3599 "Nest comments" is a ©configuration switchª which | |
3600 controls the treatment of ©commentsª | |
3601 while scanning your ©syntax fileª. It defaults to off, | |
3602 in accordance with the ANSI standard for C which | |
3603 disallows ©nested commentsª. Note that AnaGram scans | |
3604 comments in any ©embedded Cª code as well as in the | |
3605 grammar specification. You may turn this switch on and | |
3606 off as many times as necessary in a single file. | |
3607 ## | |
3608 | |
3609 Nested Comment | |
3610 | |
3611 As delivered, AnaGram treats C style ©commentsª | |
3612 according to the ANSI standard: They do not nest. For | |
3613 those who prefer nested comments, however, the ©nest | |
3614 commentsª ©configuration switchª allows them to nest. | |
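
The difference between the two treatments can be sketched with a
small C scanner. The function comment_length and its nest argument
are hypothetical and only mimic the effect of the switch. This is
not AnaGram's scanner.

```c
#include <assert.h>
#include <stddef.h>

/* `s` must begin with a comment opener. Returns the offset just
   past the end of the comment, or 0 if the comment is never
   closed. When `nest` is nonzero, each inner opener must be
   matched by its own closer. */
static size_t comment_length(const char *s, int nest)
{
    size_t i = 2;   /* skip the opening delimiter */
    int depth = 1;

    assert(s[0] == '/' && s[1] == '*');
    while (s[i] != '\0') {
        if (s[i] == '*' && s[i + 1] == '/') {
            i += 2;
            if (--depth == 0) return i;
        } else if (nest && s[i] == '/' && s[i + 1] == '*') {
            i += 2;
            depth++;
        } else {
            i++;
        }
    }
    return 0;
}
```

For a text with one comment opened inside another, the non-nesting
scan stops at the first closer, while the nesting scan continues to
the closer that matches the outermost opener.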
3615 ## | |
3616 | |
3617 Nesting too deep | |
3618 | |
3619 This ©warningª message indicates that ©set | |
3620 expressionªs or ©virtual productionsª are | |
3621 nested so deeply they have exhausted the available | |
3622 stack space and AnaGram cannot continue its analysis. | |
3623 | |
3624 Use a ©definitionª statement to name an intermediate | |
3625 level. | |
3626 ## | |
3627 | |
3628 no cr | |
3629 | |
3630 "no cr" is a ©configuration switchª which | |
3631 defaults to off. When this switch is set, it will | |
3632 cause the ©parser fileª and ©header fileª to be | |
3633 written without carriage returns. This is convenient | |
3634 if you wish to use the generated parser files in a | |
3635 Unix environment. | |
3636 ## | |
3637 | |
3638 No Grammar Token Specified | |
3639 | |
3640 This ©warningª message appears if your ©grammarª does not | |
3641 specify a ©grammar tokenª. Edit your ©syntax fileª to | |
3642 specify one. | |
3643 ## | |
3644 | |
3645 No Productions in Syntax File | |
3646 | |
3647 This ©warningª message appears if AnaGram did not find | |
3648 any ©productionsª at all in your ©syntax fileª. Check | |
3649 to see that you have the right file. | |
3650 ## | |
3651 | |
3652 No Such Parameter | |
3653 | |
3654 This ©warningª message appears when AnaGram does not | |
3655 recognize the name of a ©configuration parameterª you | |
3656 have tried to set in your ©syntax fileª. Check the | |
3657 spelling of the parameter you wish to set in the | |
3658 ©Configuration Parameters Windowª. | |
3659 ## | |
3660 | |
3661 No Terminal Tokens in Expansion | |
3662 | |
3663 No terminal tokens in expansion of TXXX is a ©warningª | |
3664 message indicating that there are no terminal tokens | |
3665 to be found in an expansion of the specified token. | |
3666 Although there are a few circumstances where this could | |
3667 be legitimate, it is more likely that there is a missing | |
3668 rule in the grammar. | |
3669 ## | |
3670 | |
3671 Not a Character Set | |
3672 | |
3673 "Not a character set, TXXX: <token name>" is a ©warningª | |
3674 message which indicates that the specified ©tokenª has | |
3675 been used both on the left side of a ©productionª and in | |
3676 a ©character setª expression defining some other token. | |
3677 AnaGram will use an empty set in place of the | |
3678 specified token in evaluating the ©character setª. You | |
3679 will get another warning, ©Error definingª token, when | |
3680 AnaGram finishes its evaluation of the character set. | |
3681 ## | |
3682 | |
3683 Nothing Reduces | |
3684 | |
3685 "Nothing reduces TXXX -> RYYY" is a ©warningª message | |
3686 which indicates that the ©grammarª does not specify any | |
3687 input to follow an instance of the indicated ©grammar | |
3688 ruleª. In all probability, the grammar does not have | |
3689 any explicit end of file, or ©eof tokenª. If the grammar | |
3690 does not have any conflicts with ©tokenª T000, then an | |
3691 explicit end of file indicator is not necessary. | |
3692 Otherwise you should modify your grammar to require an | |
3693 explicit end of file. | |
3694 ## | |
3695 | |
3696 Null Character in String | |
3697 | |
3698 This ©warningª message appears when AnaGram finds an | |
3699 explicit null character in a quoted string. If you must | |
3700 allow for a null in a ©keyword stringª | |
3701 you will have to rewrite your | |
3702 ©grammar ruleª. For instance, instead of | |
3703 | |
3704 widget | |
3705 -> "abc\0def" | |
3706 | |
3707 write | |
3708 | |
3709 widget | |
3710 -> "abc", 0, "def" | |
3711 ## | |
3712 | |
3713 nonassoc | |
3714 | |
3715 "nonassoc" controls a ©precedence declarationª, | |
3716 indicating that all of the listed ©rule elementsª are | |
3717 to be considered non-associative. | |
3718 ## | |
3719 | |
3720 Nonterminal Token, Nonterminal | |
3721 | |
3722 A nonterminal token is one which is constructed from a | |
3723 series of other tokens as specified by one or more | |
3724 ©productionªs. Nonterminal tokens are to be | |
3725 distinguished from ©terminal tokenªs, which are the | |
3726 basic input units appearing in your input stream. | |
3727 Terminal tokens most often represent single characters | |
3728 or a character belonging to a ©character setª such as | |
3729 'a-z'. | |
3730 ## | |
3731 | |
3732 Null Production | |
3733 | |
3734 A "null production" is one that has no tokens on the | |
3735 right hand side whatsoever. Null ©productionªs | |
3736 essentially are identified by the first following input | |
3737 token. Null productions are extremely convenient | |
3738 syntactic elements when you wish to make some input | |
3739 optional. For example, suppose that you wish to allow an | |
3740 optional semicolon at some point in your grammar. You | |
3741 could write the following pair of productions: | |
3742 optional semicolon -> | ';' | |
3743 Note that a null production can never follow a '|'. | |
3744 | |
3745 This could also be written on multiple lines thus: | |
3746 optional semicolon | |
3747 -> | |
3748 -> ';' | |
3749 | |
3750 You can always rewrite your grammar to eliminate null | |
3751 productions if you wish, but you usually pay a price in | |
3752 conciseness and clarity. Sometimes, however, it is | |
3753 necessary to do such a rewrite in order to avoid | |
3754 ©conflictªs, to which null productions are especially | |
3755 prone. For example suppose you have the following | |
3756 production: | |
3757 foo -> wombat, optional semicolon, widget | |
3758 | |
3759 You can rewrite this as two productions: | |
3760 foo | |
3761 -> wombat, widget | |
3762 -> wombat, ';', widget | |
3763 | |
3764 This rewrite specifies exactly the same input language, | |
3765 but is less prone to conflicts. On the other hand, it | |
3766 may require significantly more table space in your | |
3767 parser. | |
3768 | |
3769 If you have a null production with no ©reduction | |
3770 procedureª specified, your parser will automatically | |
3771 assign the value zero to the ©reduction tokenª. | |
3772 | |
3773 Null productions can also be generated by ©virtual | |
3774 productionsª. | |
3775 | |
3776 A token that has a null production is a "©zero lengthª" | |
3777 token. | |
3778 ## | |
3779 | |
3780 Old Style | |
3781 | |
3782 "Old Style" is a ©configuration switchª which defaults | |
3783 to off. It controls the function definitions in the code | |
3784 AnaGram generates. When "old style" is off, it generates | |
3785 ANSI style calling sequences with prototypes as | |
3786 necessary. When "old style" is on, it generates old | |
3787 style function definitions. | |
3788 ## | |
3789 | |
3790 Output Files | |
3791 | |
3792 When you use the ©Build Parserª command to request | |
3793 output from AnaGram, it creates two files: a ©parser | |
3794 fileª and a ©parser headerª file. | |
3795 ## | |
3796 | |
3797 Page Length | |
3798 | |
3799 "Page length" is an ©obsolete configuration parameterª. | |
3800 ## | |
3801 | |
3802 Obsolete Configuration Parameter, Obsolete Configuration Switch | |
3803 | |
3804 A number of ©configuration parameterªs and ©configuration switchªes | |
3805 which were used in the DOS version of AnaGram are no longer | |
3806 used, but are still recognized for the sake of upward | |
3807 compatibility. These parameters include: | |
3808 ©bottom marginª | |
3809 ©line lengthª | |
3810 ©page lengthª | |
3811 ©top marginª | |
3812 ©quick referenceª | |
3813 ©video modeª | |
3814 | |
3815 ## | |
3816 | |
3817 Parameter | |
3818 | |
3819 "Parameter <name> has type void" is a ©warningª message | |
3820 which appears when a ©parameter assignmentª is attached | |
3821 to a ©tokenª that has been defined to have the void | |
3822 ©data typeª. | |
3823 ## | |
3824 | |
3825 Parameter Assignment | |
3826 | |
3827 In any ©grammar ruleª, the ©semantic valueª of any | |
3828 ©rule elementª may be passed to a ©reduction procedureª | |
3829 by means of a parameter assignment. Simply follow the | |
3830 rule element with a colon and a C variable name. The C | |
3831 variable name can then be used in the reduction | |
3832 procedure to reference the semantic value of the token | |
3833 it is attached to. AnaGram will automatically provide | |
3834 necessary declarations. | |
3835 | |
3836 Here are some examples of rule elements with parameter | |
3837 assignments: | |
3838 | |
3839 '0-9':d | |
3840 integer:n | |
3841 expression:x | |
3842 declaration : declaration_descriptor | |
3843 | |
3844 ## | |
3845 | |
3846 Parameter Not Defined | |
3847 | |
3848 AnaGram does not have a ©configuration parameterª | |
3849 with the specified name. Please check the spelling. | |
3850 ## | |
3851 | |
3852 Parameter Takes Integer Value | |
3853 The specified ©configuration parameterª takes | |
3854 an integer value only. | |
3855 ## | |
3856 | |
3857 | |
3858 Parameter Takes String Value | |
3859 | |
3860 The specified ©configuration parameterª takes | |
3861 a string value only. | |
3862 ## | |
3863 | |
3864 Parse Function | |
3865 | |
3866 To run your parser, you call the parse function. | |
3867 The name of the parse function is given by | |
3868 the ©parser nameª ©configuration parameterª and defaults to the | |
3869 name of your parser file. | |
3870 | |
3871 If your parser uses ©pointer inputª, you should set the ©pointerª | |
3872 field of the ©parser control blockª before calling the parse | |
3873 function. | |
3874 | |
3875 If your parser is ©event drivenª, you should first call the | |
3876 ©initializerª, and then you should call the parse function | |
3877 for each input token. | |
3878 | |
3879 If the ©reentrant parserª switch is set, the parse function takes | |
3880 a pointer to the ©parser control blockª as its sole argument. Otherwise | |
3881 it takes no arguments. The parse function returns no value. All | |
3882 communication is by means of the ©parser control blockª. | |
3883 | |
3884 To retrieve the value of the ©grammar tokenª, once the parse is complete, | |
3885 use the ©parser value functionª. | |
3886 ## | |
3887 | |
3888 Parser | |
3889 | |
3890 A parser is a program or, more commonly, a procedure within | |
3891 a program, which scans a sequence of ©input charactersª | |
3892 or input tokens and accumulates them in an input | |
3893 buffer or stack as determined by a set of ©productionªs | |
3894 which constitute a ©grammarª. | |
3895 | |
3896 When the parser discovers | |
3897 a sequence of tokens as defined by a ©grammar ruleª, or | |
3898 right hand side of a production, it "reduces" the | |
3899 sequence to a single ©reduction tokenª as defined by the | |
3900 left hand side of the grammar rule. This ©nonterminal | |
3901 tokenª now replaces the tokens which matched the grammar | |
3902 rule and the search for matches continues. | |
3903 | |
3904 If an input | |
3905 token is encountered which will not yield a match for | |
3906 any rule, it is considered a ©syntax errorª and some | |
3907 kind of ©error recoveryª may be required to continue. If | |
3908 a match, or ©reduce actionª, yields the ©grammar tokenª, | |
3909 sometimes called the ©goal tokenª or ©start tokenª, the | |
3910 parser deems its work complete and returns to whatever | |
3911 procedure may have called it. | |
3912 | |
3913 The ©Grammar Traceª and ©File Traceª functions in | |
3914 AnaGram provide a convenient means for understanding the | |
3915 detailed operation of a syntax directed parser. | |
3916 | |
3917 ©Tokensª may have ©semantic valuesª. If the ©input | |
3918 valuesª ©configuration switchª is on, your parser will | |
3919 expect semantic values to be provided by the input | |
3920 process along with the token identification code. If the | |
3921 input values switch is off, your parser will take the | |
3922 ASCII value of the input character, that is, the actual | |
3923 input code, as the value of the character. | |
3924 | |
3925 When the | |
3926 parser reduces a production, it can call a ©reduction | |
3927 procedureª or ©semantic actionª to analyze the values of | |
3928 the constituent tokens. This reduction procedure can | |
3929 then return a value which characterizes the reduced | |
3930 token. | |
3931 ## | |
3932 | |
3933 Parser Control Block | |
3934 | |
3935 A "Parser Control Block" is a structure which contains | |
3936 all of the data necessary to describe the instantaneous | |
3937 state of a parser. The typedef statement which defines | |
3938 the structure is included in the ©parser headerª file | |
3939 for your parser. AnaGram creates the name of the data | |
3940 type for the structure by appending "_pcb_type" to the | |
3941 ©parser nameª. | |
3942 | |
3943 You may add your own declarations to the parser control | |
3944 block by using the ©extend pcbª statement. | |
3945 | |
3946 If the ©declare pcbª ©configuration switchª is on, its | |
3947 normal state, AnaGram will declare a parser control | |
3948 block for you at the beginning of your parser file. | |
3949 AnaGram will determine the name of the parser control | |
3950 block by appending "_pcb" to the ©parser nameª. AnaGram | |
3951 will also define the macro PCB as a short hand notation | |
3952 for use within the parser. All references to the parser | |
3953 control block within the code that AnaGram generates | |
3954 are made using the PCB macro. | |
3955 | |
3956 If you wish to declare your own parser control block, | |
3957 you must include the ©parser headerª file for your | |
3958 parser before your declaration. Then you declare a | |
3959 control block and define PCB to refer to the control | |
3960 block you have declared. | |
3961 | |
3962 Suppose your grammar is called widget. You would then | |
3963 write the following statements in your ©embedded Cª: | |
3964 #include "widget.h" | |
3965 widget_pcb_type widget_control_pcb; | |
3966 #define PCB widget_control_pcb | |
3967 | |
3968 Alternatively, you could write the following: | |
3969 #include "widget.h" | |
3970 widget_pcb_type *widget_control_pcb_pointer; | |
3971 #define PCB (*widget_control_pcb_pointer) | |
3972 | |
3973 and then allocate storage for the structure when | |
3974 necessary. | |
3975 | |
3976 Some fields of interest in the parser control block are | |
3977 as follows: | |
3978 ©input_codeª | |
3979 ©input_valueª | |
3980 ©input_contextª | |
3981 ©pointerª | |
3982 ©token_numberª | |
3983 ©reduction_tokenª | |
3984 ©ssxª | |
3985 ©snª | |
3986 ©ssª[©parser stack sizeª] | |
3987 ©vsª[parser stack size]; | |
3988 ©csª[parser stack size]; | |
3989 ©lineª | |
3990 ©columnª | |
3991 *©error_messageª | |
3992 ©error_frame_ssxª | |
3993 ©error_frame_tokenª | |
3994 ## | |
3995 | |
3996 PCB | |
3997 | |
3998 "PCB" is a macro AnaGram defines for use in the code it | |
3999 generates to refer to the ©parser control blockª for | |
4000 your ©parserª. Normally, AnaGram automatically declares | |
4001 storage for a parser control block and defines PCB for | |
4002 you. If you turn off the ©declare PCBª switch, you may | |
4003 define PCB yourself. | |
4004 ## | |
4005 | |
4006 PCB_TYPE | |
4007 | |
4008 If you are writing your parser in C++, you may prefer to derive | |
4009 a class from the ©parser control blockª rather than use the | |
4010 ©extend pcbª statement. In this case you may define the | |
4011 PCB_TYPE macro in your syntax file to specify your derived | |
4012 class. | |
4013 | |
4014 For instance, you have defined | |
4015 | |
4016 class MyPcb : public parser_pcb_type {...}; | |
4017 | |
4018 You would then add the following line: | |
4019 | |
4020 #define PCB_TYPE MyPcb | |
4021 | |
4022 If you do not define PCB_TYPE, AnaGram will define it as the | |
4023 type of your parser control block. | |
4024 ## | |
4025 | |
4026 Parser File | |
4027 | |
4028 The "parser file" is the C (or C++) file output by AnaGram when | |
4029 you execute the ©Build Parserª command. It contains all | |
4030 of the ©embedded Cª from your ©syntax fileª, all of the | |
4031 ©reduction procedureªs defined in your ©grammarª, | |
4032 syntax tables which represent, in a condensed form, all | |
4033 of the intricacies of your grammar, and a customized | |
4034 ©parsing engineª. The name of the parser file is given | |
4035 by the ©parser file nameª ©configuration parameterª. The | |
4036 name of the ©parserª itself is given by the ©parser | |
4037 nameª configuration parameter. | |
4038 | |
4039 If you wish the parser file to be written without carriage | |
4040 returns, suitable for a Unix environment, set the ©no crª | |
4041 configuration switch. | |
4042 ## | |
4043 | |
4044 Parser File Name | |
4045 | |
4046 "Parser file name" is a ©configuration parameterª which | |
4047 takes a string value. The default value is "#.c". | |
4048 AnaGram uses this parameter to generate the name of the | |
4049 output C file, or ©parser fileª, created by the ©Build | |
4050 Parserª command. The '#' character is used in this | |
4051 string as a wild card to indicate the name of the | |
4052 current ©syntax fileª. If the first character of the | |
4053 parser file name string is a '.' character, AnaGram | |
4054 will substitute the name of the current working | |
4055 directory for the dot. Thus ".\\#.c" will create the | |
4056 file name as a complete path. This can sometimes be | |
4057 important when using the ©line numbersª switch to | |
4058 enable a debugger to find code in your parser file. | |
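
The '#' substitution described above can be sketched in C as follows.
This is an illustration only, not AnaGram's own code: the function name
expand_file_name and its signature are hypothetical, and the sketch does
not handle the leading '.' (working directory) case.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `pattern` into `out`, replacing each '#'
   with `stem` (the syntax file name without its extension).
   Returns 0 on success, -1 if the result does not fit in `out_size`. */
int expand_file_name(const char *pattern, const char *stem,
                     char *out, size_t out_size)
{
    size_t used = 0;
    size_t stem_len = strlen(stem);

    if (out_size == 0) return -1;

    for (const char *p = pattern; *p != '\0'; ++p) {
        if (*p == '#') {
            /* Leave room for the terminating null character. */
            if (used + stem_len >= out_size) return -1;
            memcpy(out + used, stem, stem_len);
            used += stem_len;
        } else {
            if (used + 1 >= out_size) return -1;
            out[used++] = *p;
        }
    }
    out[used] = '\0';
    return 0;
}
```

With the default value "#.c" and a syntax file named widget, this
sketch produces "widget.c".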
4059 | |
4060 Note that the parser file name is not the same as the | |
4061 ©parser nameª. | |
4062 ## | |
4063 | |
4064 Parser Generator | |
4065 | |
4066 A parser generator, such as AnaGram, is a program that | |
4067 converts a ©grammarª, a rule-based description of the | |
4068 input to a program, into a conventional, procedural | |
4069 module called a ©parserª. The parsers AnaGram generates | |
4070 are simple C modules which can be compiled on almost | |
4071 any platform. AnaGram parsers are also compatible with | |
4072 C++. | |
4073 ## | |
4074 | |
4075 Header File, Parser Header | |
4076 | |
4077 When you use the command ©Build Parserª to generate | |
4078 source code for a parser, AnaGram creates two files, a | |
4079 header file and a C source file. Unless different | |
4080 paths are specified in the ©parser file nameª and | |
4081 ©header file nameª parameters, both files will be | |
4082 written to the directory that contains the ©syntax fileª. | |
4083 | |
4084 The header file contains a number of typedef statements, | |
4085 including the definition of the ©parser control blockª, | |
4086 and a number of macro | |
4087 definitions which may be useful in your parser | |
4088 or in other modules of your program. | |
4089 | |
4090 If you do not alter | |
4091 the ©header file nameª parameter, the | |
4092 name of the header file will be the same as the name of | |
4093 your ©syntax fileª and it will have the extension ".h". | |
4094 | |
4095 If you wish the header file to be written without carriage | |
4096 returns, suitable for a Unix environment, set the ©no crª | |
4097 configuration switch. | |
4098 ## | |
4099 | |
4100 Parser Input | |
4101 | |
4102 AnaGram ©parserªs may be configured to accept input in any of | |
4103 three different ways: | |
4104 | |
4105 By default, a ©parse functionª gets its input by invoking the | |
4106 ©GET_INPUTª macro each time it is ready for another input token. The | |
4107 default implementation of GET_INPUT reads ©input characterªs from stdin. For | |
4108 most practical problems, you will want to override this definition of | |
4109 GET_INPUT, storing the current input character in PCB.input_code. | |
4110 | |
4111 Alternatively, you may configure a parser to read input from an | |
4112 array in memory. Set the ©pointer inputª switch and load the | |
4113 ©pointerª field of the parser control block before calling the | |
4114 parse function. The parser will then run, incrementing the | |
4115 pointer, until it finishes or encounters an error. | |
4116 | |
4117 The third alternative is to set the ©event drivenª switch. The | |
4118 parser will be configured as a callback routine. Begin by calling | |
4119 the ©initializerª. Then, for each input character, store the | |
4120 character in the ©input_codeª field of the parser control block and | |
4121 call the parse function. Each time | |
4122 you call the parse function it will continue until it needs more | |
4123 input. You can check its status by inspecting the ©exit_flagª in the | |
4124 parser control block. | |
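
The calling sequence for an event driven parser can be sketched as
follows. Everything in this sketch is a hand-written stand-in, not
generated code: the parser name "demo", the DEMO_* status codes, and
the value field are assumptions made for illustration, and the stand-in
parse function merely accumulates decimal digits until it sees a null
character. The sketch also assumes the ©reentrant parserª calling
convention, in which a pointer to the control block is passed in.

```c
/* Hypothetical status codes; a real parser header defines its own. */
enum { DEMO_RUNNING, DEMO_SUCCESS, DEMO_SYNTAX_ERROR };

/* Hypothetical stand-in for a generated parser control block. */
typedef struct {
    int  input_code;  /* next input character, stored by the caller */
    int  exit_flag;   /* status the caller inspects after each call */
    long value;       /* number accumulated so far (illustration only) */
} demo_pcb_type;

/* Stand-in for the initializer: prepare the control block. */
void init_demo(demo_pcb_type *pcb)
{
    pcb->input_code = 0;
    pcb->exit_flag = DEMO_RUNNING;
    pcb->value = 0;
}

/* Stand-in for the parse function: consume one input, then return. */
void demo(demo_pcb_type *pcb)
{
    int c = pcb->input_code;

    if (pcb->exit_flag != DEMO_RUNNING) return;
    if (c >= '0' && c <= '9')
        pcb->value = pcb->value * 10 + (c - '0');
    else if (c == '\0')
        pcb->exit_flag = DEMO_SUCCESS;       /* end of input */
    else
        pcb->exit_flag = DEMO_SYNTAX_ERROR;  /* unexpected character */
}

/* Caller side: initialize once, then store each character in
   input_code and call the parse function until it stops running.
   Returns the final exit flag. */
int feed_string(demo_pcb_type *pcb, const char *text)
{
    init_demo(pcb);
    for (int i = 0; ; ++i) {
        pcb->input_code = (unsigned char)text[i];
        demo(pcb);
        if (pcb->exit_flag != DEMO_RUNNING) break;
    }
    return pcb->exit_flag;
}
```

The point of the sketch is the shape of feed_string: the caller, not
the parser, owns the input loop, which is why this configuration suits
input that arrives piecemeal.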
4125 | |
4126 The input to your parser may be either text characters or ©tokensª | |
4127 accumulated by a pre-processor, or ©lexical scannerª. The latter | |
4128 case is referred to as ©token inputª. If you use a lexical scanner, | |
4129 you may find it convenient to configure your parser as event driven. | |
4130 | |
4131 Although lexical scanners are often not necessary | |
4132 when you use AnaGram, if you do need one you can write it in AnaGram. | |
4133 ## | |
4134 | |
4135 Parser Name | |
4136 | |
4137 "Parser Name" is a ©configuration parameterª which | |
4138 defaults to "#", where "#" represents the name of your | |
4139 ©syntax fileª. AnaGram uses this parameter to name your | |
4140 ©parse functionª. The ©initializerª for your parser will have the | |
4141 same name preceded by "init_". Note that "©parser file | |
4142 nameª" is not the same configuration parameter as "parser | |
4143 name". | |
4144 ## | |
4145 | |
4146 Parser Stack | |
4147 | |
4148 Your ©parserª uses a "parser stack" to keep track of the | |
4149 ©grammar rulesª it is trying to match and its progress | |
4150 in matching them. Normally, there are two separate | |
4151 stacks defined by AnaGram: ©PCBª.©ssª, the ©parser state | |
4152 stackª which maintains ©parser stateª numbers, and | |
4153 PCB.©vsª, the ©parser value stackª which maintains the | |
4154 ©semantic valueªs of tokens that have been identified so | |
4155 far. If you wish to maintain a stack tracking other | |
4156 variables you may set the ©context typeª ©configuration | |
4157 parameterª, and AnaGram will define a third stack, | |
4158 PCB.©csª. All are indexed by the same stack index, | |
4159 PCB.©ssxª. | |
4160 | |
4161 To see how tokens accumulate on the parser stack, run | |
4162 the ©Grammar Traceª or the ©File Traceª. | |
4163 | |
4164 Normally, when the return value of a ©reduction procedureª | |
4165 is stored on the parser value stack, it is stored by | |
4166 simply coercing the stack pointer to the correct type. | |
4167 If the return value is a C++ object, this can cause | |
4168 serious problems. These problems can be avoided by | |
4169 using the ©wrapperª statement. | |
4170 ## | |
4171 | |
4172 Parser stack alignment | |
4173 | |
4174 Parser stack alignment is a ©configuration parameterª whose | |
4175 value is a C or C++ data type. It defaults to "long". If | |
4176 any tokens have type "double", it will be automatically set | |
4177 to double. Thus, you will normally not need to change this | |
4178 parameter if your parser is to run on a PC or compatible | |
4179 processor. It provides alignment control for processors | |
4180 which restrict addresses for multibyte data access. The | |
4181 default setting provides for correct operation on 64 bit | |
4182 processors. | |
4183 | |
4184 To control byte alignment of the parser stack, | |
4185 ©PCBª.©vsª, AnaGram normally adds a field of the | |
4186 specified data type to the "union" statement which | |
4187 defines the data type for the ©parser stackª. This | |
4188 parameter can be used to deal with byte alignment | |
4189 problems when a ©parserª is to be run on a processor | |
4190 with byte alignment restrictions. For instance, if your | |
4191 ©grammarª has ©tokenªs of type "long double" and your | |
4192 processor requires long double variables to be | |
4193 properly aligned, you can include the following | |
4194 statement in a ©configuration sectionª in your grammar | |
4195 or in your ©configuration fileª: | |
4196 | |
4197 parser stack alignment = long double | |
4198 | |
4199 If the data type specified is "void", no alignment declaration | |
4200 will be made. | |
4201 ## | |
4202 | |
4203 Parser Stack Index, Stack Index | |
4204 | |
4205 The parser stack index, ©PCBª.©ssxª, tracks the depth | |
4206 of the ©parser state stackª, the ©parser value stackª, | |
4207 and the ©context stackª if you defined one. The parser | |
4208 stack index is incremented by ©shift actionsª and | |
4209 reduced by ©reduce actionsª. | |
4210 ## | |
4211 | |
4212 Parser Stack Overflow | |
4213 | |
4214 Your ©parserª uses a ©parser stackª to keep track of the | |
4215 ©grammar rulesª it is trying to match and its progress | |
4216 in matching them. If your grammar has any ©recursive | |
4217 ruleªs that are not strictly left recursive, then no | |
4218 matter how big you make the parser stack, it will be | |
4219 possible to create a syntactically correct input which | |
4220 will cause the stack to overflow. As a practical matter, | |
4221 however, it is usually possible to set the ©parser stack | |
4222 sizeª to a value large enough so that an overflow is a | |
4223 freak occurrence. Nevertheless, it is necessary to check | |
4224 for overflow, and in the case overflow should occur, | |
4225 your parser has to do something. What it does is invoke | |
4226 the ©PARSER_STACK_OVERFLOWª macro. If you don't define | |
4227 it, AnaGram will define it for you, although not | |
4228 necessarily to your taste. | |
4229 ## | |
4230 | |
4231 Recursive rule, Recursion | |
4232 | |
4233 A ©grammar ruleª is said to be "recursive" if the ©tokenª on the left side | |
4234 of the rule also appears on the right side of the rule, or | |
4235 in an ©expansion ruleª of any token on the right side of the rule. | |
4236 | |
4237 If the token on the left side is the | |
4238 first token on the right side, the rule is said to be "left recursive". | |
4239 If it is the last token on the right side, the rule is said to be | |
4240 "right recursive". Otherwise, the rule is "center recursive". | |
4241 | |
4242 For example: | |
4243 statement list | |
4244 -> statement | |
4245 -> statement list, statement // left recursive | |
4246 | |
4247 fraction part | |
4248 -> digit | |
4249 -> digit, fraction part // right recursive | |
4250 | |
4251 expression | |
4252 -> factor | |
4253 -> expression, '+' + '-', factor | |
4254 | |
4255 factor | |
4256 -> primary | |
4257 -> factor, '*' + '/', primary | |
4258 | |
4259 primary | |
4260 -> number | |
4261 -> name | |
4262 -> '(', expression, ')' // center recursive | |
4263 | |
4264 Note that if all the tokens in the rule other than the recursive token itself | |
4265 are ©zero lengthª tokens, it is possible for the | |
4266 rule to be matched arbitrarily many times without any input whatsoever. In | |
4267 other words, such a rule creates an infinite loop in the parser. AnaGram can | |
4268 detect this condition and issues an ©empty recursionª diagnostic if it occurs. | |
4269 | |
4270 ## | |
4271 | |
4272 PARSER_STACK_OVERFLOW | |
4273 | |
4274 PARSER_STACK_OVERFLOW is a user definable macro. If you | |
4275 do not define it, AnaGram will define it so that it | |
4276 will print a message on stderr and abort the ©parserª in | |
4277 case of a ©parser stack overflowª. | |
4278 ## | |
4279 | |
4280 Parser Stack Size | |
4281 | |
4282 "Parser stack size" is a ©configuration parameterª with | |
4283 a default value of 128. It is used to define the sizes | |
4284 of your ©parser stacksª in your ©parser control blockª. | |
4285 When analyzing your grammar, AnaGram will determine the | |
4286 minimum amount of stack space required for the deepest | |
4287 left ©recursionª. To this depth it will add one half the | |
4288 value of the parser stack size parameter. It will then | |
4289 set the actual stack size to the larger of this value | |
4290 and the parser stack size parameter. | |
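
The calculation described above amounts to the following arithmetic,
shown here as a small C sketch. The function name and parameter names
are hypothetical, and integer division is assumed for "one half".

```c
/* min_depth:        stack space AnaGram found necessary for the
                     deepest left recursion
   stack_size_param: value of the parser stack size parameter
   Returns the stack size that would actually be used. */
int effective_stack_size(int min_depth, int stack_size_param)
{
    int padded = min_depth + stack_size_param / 2;
    return padded > stack_size_param ? padded : stack_size_param;
}
```

For instance, with the default parameter value of 128, a minimum depth
of 10 leaves the stack size at 128, while a minimum depth of 100
raises it to 164.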
4291 ## | |
4292 | |
4293 Parser State, State Number | |
4294 | |
4295 The essential part of your ©parserª is a group of tables | |
4296 which describe in detail what to do for each "state" of | |
4297 the parser. | |
4298 | |
4299 The states of a parser are determined by sets of | |
4300 "©characteristic rulesª". The ©State Definition Tableª | |
4301 shows the characteristic rules for each state of your | |
4302 parser. | |
4303 | |
4304 AnaGram numbers the states of a parser as it identifies | |
4305 them, beginning with zero. In all windows, state numbers | |
4306 are displayed as three digit numbers prefixed with the | |
4307 letter 'S'. | |
4308 ## | |
4309 | |
4310 Parser State Stack, State Stack | |
4311 | |
4312 The parser state stack is a stack, maintained by your | |
4313 ©parserª, that is an integral part of the parsing | |
4314 process. At any point in the parse of your input | |
4315 stream, the parser state stack provides a summary of | |
4316 what has been found so far. The parser state stack is | |
4317 stored in ©PCBª.©ssª and is indexed by PCB.©ssxª, the | |
4318 ©parser stack indexª. | |
4319 ## | |
4320 | |
4321 Parser Value Stack, Value Stack | |
4322 | |
4323 In parallel with the ©parser state stackª, your parser | |
4324 maintains a "value stack", ©PCBª.©vsª, each entry of | |
4325 which corresponds to the ©semantic valueª of the token | |
4326 identified at that state. Since the semantic values of | |
4327 different tokens might well have different ©data typeªs, | |
4328 AnaGram gives you the opportunity, in your ©syntax | |
4329 fileª, to define the data type for any token. AnaGram | |
4330 then builds a typedef statement creating a data type | |
4331 which is a union of all the types you have defined. | |
4332 AnaGram creates the name for this ©data typeª by | |
4333 appending "_vs_type" to the ©parser nameª. AnaGram uses | |
4334 this data type to define the value stack. | |
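
As a rough illustration, a value stack type for a parser named
"widget" whose tokens have types int, double and char * might look
something like the sketch below. The field names are assumptions made
for this example; the code AnaGram actually generates may differ.

```c
/* Hypothetical value stack type for a parser named "widget". */
typedef union {
    long   alignment;     /* see "parser stack alignment" */
    int    int_value;     /* for tokens declared as int */
    double double_value;  /* for tokens declared as double */
    char  *string_value;  /* for tokens declared as char * */
} widget_vs_type;

/* One slot holds the value of one token, whatever its type.
   Store a double in a slot and read it back. */
double store_and_fetch_double(widget_vs_type *slot, double v)
{
    slot->double_value = v;
    return slot->double_value;
}
```

Because the type is a union, every slot is large enough for the
largest of the listed types.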
4335 ## | |
4336 | |
4337 Parser Action | |
4338 | |
4339 In a traditional LR parser, there are only four actions: the ©shift | |
4340 actionª, the ©reduce actionª, the ©accept actionª and the ©error | |
4341 actionª. AnaGram, in doing its ©grammar analysisª, identifies a | |
4342 number of special cases, and creates a number of extra actions which | |
4343 make for faster processing, but which can be represented as | |
4344 combinations of these primitive actions. | |
4345 | |
4346 When a shift action is performed, the current state | |
4347 number is pushed onto the ©parser state stackª and the | |
4348 new state number is determined by the current state | |
4349 number and the current input token. Different tokens | |
4350 cause different new states. | |
4351 | |
4352 When a reduce action is performed, the length of the | |
4353 rule being reduced is subtracted from the ©parser stack | |
4354 indexª and the new state number is read from the top of | |
4355 the parser state stack. The ©reduction tokenª for the | |
4356 rule being reduced is then used as an input token. | |
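
The stack manipulation for the shift and reduce actions can be
sketched in C as follows. This is a simplified model, not the
generated parsing engine: the type mini_pcb and the functions shift
and reduce are hypothetical, the field names sn, ssx and ss merely
echo those of the ©parser control blockª, and the table lookup that
selects the new state is left out.

```c
#define STACK_SIZE 128

/* Hypothetical, pared-down control block. */
typedef struct {
    int sn;              /* current state number */
    int ssx;             /* parser stack index */
    int ss[STACK_SIZE];  /* parser state stack */
} mini_pcb;

/* Shift: push the current state number, then enter new_state.
   Returns 0 on success, -1 if the stack would overflow. */
int shift(mini_pcb *pcb, int new_state)
{
    if (pcb->ssx >= STACK_SIZE) return -1;
    pcb->ss[pcb->ssx++] = pcb->sn;
    pcb->sn = new_state;
    return 0;
}

/* Reduce: subtract the rule length from the stack index and read
   the state number found there. A rule of length zero leaves the
   current state unchanged. Returns 0 on success, -1 on underflow. */
int reduce(mini_pcb *pcb, int rule_length)
{
    if (rule_length < 0 || rule_length > pcb->ssx) return -1;
    pcb->ssx -= rule_length;
    if (rule_length > 0) pcb->sn = pcb->ss[pcb->ssx];
    return 0;
}
```

After a reduce, the engine would treat the reduction token as input
in the uncovered state, which normally leads to another shift.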
4357 ## | |
4358 | |
4359 Parsing Engine | |
4360 | |
4361 A parser consists of three basic components: A set of | |
4362 syntax tables, a set of ©reduction procedureªs and a | |
4363 parsing engine. The parsing engine is the body of code | |
4364 that interprets the parsing table, invokes input | |
4365 functions, and calls the reduction procedures. The | |
4366 ©Build Parserª command configures a parsing engine | |
4367 according to the implicit requirements of the syntax | |
4368 specification and according to the explicit values of | |
4369 the ©configuration parameterªs. | |
4370 | |
4371 The parsing engine itself is a simple automaton, | |
4372 characterized by a set of states and a set of inputs. | |
4373 The inputs are the tokens of your grammar. Each state | |
4374 is represented by a list of tokens which are admissible | |
4375 in that state and for each token a ©parser actionª to perform | |
4376 and a parameter which further defines the action. | |
4377 | |
4378 Each state in the grammar, with the exception of state | |
4379 zero, has a ©characteristic tokenª which must have been | |
4380 recognized in order to jump to that state. Therefore, | |
4381 the ©parser state stackª, which is essentially a list | |
4382 of state numbers, can also be thought of as a list of | |
4383 token numbers. This is the list of tokens that have | |
4384 been seen so far in the parse of your input stream. | |
4385 ## | |
4386 | |
4387 Partition | |
4388 | |
4389 If you use ©character setsª in your grammar, AnaGram | |
4390 will compute a "partition" of the ©character universeª. | |
4391 This partition is a collection of non-overlapping | |
4392 character sets such that every one of the sets you have | |
4393 defined can be written as a ©unionª of partition sets. | |
4394 | |
4395 Each partition set is assigned a unique ©tokenª. If one | |
4396 of your character sets requires more than one partition | |
4397 set to represent it, AnaGram will create appropriate | |
4398 ©productionªs and add them to your grammar so your parser | |
4399 can make the necessary distinctions. | |
4400 | |
4401 To see how AnaGram has partitioned the character | |
4402 universe, you may inspect the ©Partition Setsª window | |
4403 found in the ©Browse Menuª. | |
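
The idea of a partition can be illustrated with a rough sketch. This is not AnaGram's own algorithm; it assumes a 128-character universe and three made-up, overlapping character sets, and it groups characters by which of those sets contain them. The names in_set, compute_partition, UNIVERSE and NSETS are hypothetical.

```c
#include <assert.h>

#define UNIVERSE 128   /* assumed size of the character universe */
#define NSETS 3        /* number of (hypothetical) user-defined sets */

/* Membership tests for three overlapping example sets. */
int in_set(int set, int c) {
    switch (set) {
    case 0:  return c >= 'a' && c <= 'z';                 /* letters    */
    case 1:  return (c >= 'a' && c <= 'f') ||
                    (c >= '0' && c <= '9');               /* hex digits */
    default: return c >= '0' && c <= '9';                 /* digits     */
    }
}

/* Give every character that belongs to at least one set a partition
   set number. Characters with the same membership signature share a
   partition set, so each original set is a union of partition sets.
   part[c] is -1 for characters that belong to no set.
   Returns the number of partition sets found. */
int compute_partition(int part[UNIVERSE]) {
    unsigned sig_of_part[1 << NSETS];
    int nparts = 0;
    for (int c = 0; c < UNIVERSE; c++) {
        unsigned sig = 0;
        for (int s = 0; s < NSETS; s++)
            if (in_set(s, c)) sig |= 1u << s;
        part[c] = -1;
        if (sig == 0) continue;
        int p;
        for (p = 0; p < nparts; p++)
            if (sig_of_part[p] == sig) break;
        if (p == nparts) {
            assert(nparts < (1 << NSETS));
            sig_of_part[nparts++] = sig;
        }
        part[c] = p;
    }
    return nparts;
}
```

With these example sets the letters set needs two partition sets (a-f and g-z), which is the situation in which AnaGram would add productions to the grammar.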
4404 ## | |
4405 | |
4406 Partition Set Number | |
4407 | |
4408 Each ©partitionª set is identified by a unique | |
4409 reference number called the partition set number. | |
4410 Partition set numbers are displayed in the form Pxxx. | |
4411 Partition sets are numbered starting with zero, so the | |
4412 first set is P000. | |
4413 | |
4414 To see the elements of a given partition set, call up | |
4415 the ©Partition Setsª window from the ©Browse Menuª, | |
4416 then, after selecting a partition set, call up the ©Set | |
4417 Elementsª window from the ©Auxiliary Windowsª popup menu. | |
4418 ## | |
4419 | |
4420 Partition Sets | |
4421 | |
4422 The Partition Sets option in the ©Browse Menuª pops up | |
4423 a window which shows the complete ©partitionª of the | |
4424 ©character universeª for your parser. | |
4425 | |
4426 The Partition Sets option in the ©Auxiliary Windowsª popup menu | |
4427 for the ©Character Setsª window lets you see the | |
4428 partition sets which cover the specified character set. | |
4429 | |
4430 Each entry in a Partition Sets window identifies a | |
4431 token number and a ©partition set numberª. The ©Auxiliary | |
4432 Windowsª menu provides a ©Set Elementsª entry which | |
4433 enables you to see precisely which characters belong to | |
4434 the partition set. It also has a Token Usage entry to show you | |
4435 what rules the set is used in. | |
4436 ## | |
4437 | |
4438 PCONTEXT | |
4439 | |
4440 PCONTEXT is an alternate form of the ©CONTEXTª macro | |
4441 which takes an explicit argument to specify the | |
4442 ©parser control blockª. PCONTEXT is defined in the ©parser | |
4443 headerª file. | |
4444 ## | |
4445 | |
4446 PERROR_CONTEXT | |
4447 | |
4448 PERROR_CONTEXT is an alternative form of the | |
4449 ©ERROR_CONTEXTª macro. It differs only in that it takes | |
4450 an argument so you can specify the appropriate | |
4451 ©parser control blockª explicitly. PERROR_CONTEXT is defined in | |
4452 the ©parser headerª file. | |
4453 ## | |
4454 | |
4455 pointer | |
4456 | |
4457 "pointer" is a field which will be included in the | |
4458 ©parser control blockª for your parser if you have set | |
4459 the ©pointer inputª ©configuration switchª. Your main | |
4460 program should set PCB.pointer before it calls your | |
4461 parser. Thereafter, your parser will increment it | |
4462 appropriately. When you are executing a ©reduction | |
4463 procedureª or a ©SYNTAX_ERRORª macro, PCB.pointer will | |
4464 always point to the next input character to be read. | |
4465 ## | |
4466 | |
4467 Pointer input | |
4468 | |
4469 "Pointer input" is a ©configuration switchª which you | |
4470 may set to control ©parser inputª. It defaults to off. When you set | |
4471 pointer input, you tell AnaGram that the input to your parser is in | |
4472 memory and can be scanned simply by incrementing a pointer. Before | |
4473 calling your parser you should make sure that ©PCBª.©pointerª is | |
4474 properly initialized to point to the first character or token in your | |
4475 input. | |
4476 | |
4477 Use the ©configuration parameterª "©pointer typeª" to | |
4478 specify the type of the pointer. The default value of | |
4479 "pointer type" is "unsigned char *". | |
4480 | |
4481 There is no particular reason why pointer type should | |
4482 be limited to variants on char. It could define a | |
4483 pointer to int or a structure just as well. | |
4484 | |
4485 If you use pointer input with structures or C++ | |
4486 classes, you should set the ©input valuesª switch and | |
4487 define an ©INPUT_CODEª(t) macro. | |
4488 | |
4489 If you are using a 16 bit compiler and your input array | |
4490 is so large that you need "huge" | |
4491 pointers, make sure that "pointer type" is properly | |
4492 defined. | |
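
The calling sequence for pointer input might look roughly like the sketch below. It does not use generated code: demo_pcb_type, demo_pcb, demo_parse and parse_string are stand-ins invented for illustration, and the stand-in "parser" merely advances the pointer to a terminating zero. A real parser supplies its own control block and parse function.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a generated parser control block. Only the field
   discussed here is modeled; this is not AnaGram output. */
typedef struct {
    unsigned char *pointer;   /* declared with the default pointer type */
} demo_pcb_type;

demo_pcb_type demo_pcb;
#define PCB demo_pcb

/* Stand-in for the parse function: scans the input simply by
   incrementing PCB.pointer until it reaches a terminating zero,
   and returns the number of characters consumed. */
size_t demo_parse(void) {
    size_t count = 0;
    while (*PCB.pointer != 0) {
        PCB.pointer++;
        count++;
    }
    return count;
}

/* What the calling program does: point PCB.pointer at the first
   input character, then call the parser. */
size_t parse_string(char *text) {
    assert(text != NULL);
    PCB.pointer = (unsigned char *) text;
    return demo_parse();
}
```

After parse_string returns, PCB.pointer has been advanced past everything the stand-in parser read.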
4493 ## | |
4494 | |
4495 Pointer Type | |
4496 | |
4497 "Pointer type" is a ©configuration parameterª which | |
4498 defaults to "unsigned char *". When you have specified | |
4499 ©pointer inputª, AnaGram uses the value of pointer type | |
4500 to declare a pointer field in your ©parser control | |
4501 blockª. | |
4502 ## | |
4503 | |
4504 Precedence, Operator Precedence | |
4505 | |
4506 In expressions of the form a+b*c, the convention is to | |
4507 perform the multiplication before the addition. | |
4508 Multiplication is said to take precedence over | |
4509 addition. In general the rank order in which operations | |
4510 are to be performed if there are no parentheses forcing | |
4511 an order of computation is called the precedence of the | |
4512 operators. | |
4513 | |
4514 If you have an ambiguous ©grammarª, that is, a grammar | |
4515 with a number of ©conflictªs, you may use ©precedence | |
4516 declarationªs to resolve the conflicts and to set | |
4517 operator precedence. | |
4518 ## | |
4519 | |
4520 Precedence Declaration | |
4521 | |
4522 Precedence declarations are ©attribute statementsª which | |
4523 may be used to resolve ©conflictªs in your grammar by | |
4524 assigning precedence and associativity to operators. | |
4525 Precedence declarations must be made inside | |
4526 ©configuration sectionsª. Each declaration consists of | |
4527 the keyword ©leftª, ©rightª, or ©nonassocª followed by a | |
4528 list of ©rule elementsª. The rule elements in the list | |
4529 must be separated by commas and the entire list must be | |
4530 enclosed in braces ({ }). | |
4531 | |
4532 Each of the rule elements is assigned the same | |
4533 precedence level, which is higher than that assigned in | |
4534 all previous precedence declarations and lower than that | |
4535 in all subsequent declarations. The rule elements are | |
4536 defined to be left, right, or nonassociative, | |
4537 depending on whether the keyword was "left", "right", or | |
4538 "nonassoc". | |
4539 | |
4540 All conflicts which are resolved by precedence | |
4541 declarations are listed in the ©Resolved Conflictsª | |
4542 window. | |
4543 ## | |
4544 | |
4545 Precedence Rules | |
4546 | |
4547 AnaGram can resolve certain types of ©conflictªs in your | |
4548 grammar by applying precedence rules. There are three | |
4549 classes of rules available: explicit ©precedence | |
4550 declarationsª, the "©stickyª" statement, and the | |
4551 implicit rule associated with the use of a "©disregardª" | |
4552 token outside a ©lexemeª. | |
4553 | |
4554 Whenever AnaGram uses a precedence rule of any kind to | |
4555 resolve a conflict, it produces a ©warningª message and | |
4556 lists the conflict in the ©Resolved Conflictsª window. | |
4557 ## | |
4558 | |
4559 Previous States | |
4560 | |
4561 The Previous States window can be accessed via the | |
4562 ©Auxiliary Windowsª popup menu from any window that identifies | |
4563 ©parser stateªs. It shows the ©characteristic ruleªs | |
4564 for all of the states which jump to the presently | |
4565 selected state. | |
4566 ## | |
4567 | |
4568 Print File Name | |
4569 | |
4570 "Print file name" is a configuration parameter which | |
4571 is not used in the Windows version of AnaGram. It is | |
4572 retained only for compatibility with pre-existing | |
4573 ©configuration fileªs. | |
4574 ## | |
4575 | |
4576 Problem States | |
4577 | |
4578 The Problem States window is essentially a trimmed | |
4579 version of the ©Reduction Statesª window. It is | |
4580 available in the ©Auxiliary Windowsª popup menu for the | |
4581 ©Conflictsª and ©Resolved Conflictsª windows. | |
4582 | |
4583 The Problem States window has the same format as the | |
4584 Reduction States window, and differs only in that it | |
4585 shows only those reduction states for which the | |
4586 ©conflict tokenª is acceptable input. | |
4587 ## | |
4588 | |
4589 Production | |
4590 | |
4591 Productions are the mechanism you use to describe how | |
4592 complex input structures are built up out of simpler | |
4593 ones. Each production has a left hand side and a right | |
4594 hand side. The right hand side, or ©grammar ruleª, is a | |
4595 sequence of ©rule elementsª, which may represent either | |
4596 ©terminal tokensª or ©nonterminal tokensª. The left | |
4597 hand side is a list of ©reduction tokensª. In most | |
4598 cases there would be only a single reduction token. | |
4599 Productions with more than one ©tokenª on the left hand | |
4600 side are called ©semantically determined productionsª. | |
4601 | |
4602 The "->" symbol is used to separate the left hand side | |
4603 from the right hand side. If you have several | |
4604 productions with the same left hand side, you can avoid | |
4605 rewriting the left hand side either by using '|' or by | |
4606 using another "->". | |
4607 | |
4608 A ©null productionª, or empty right hand side, cannot | |
4609 follow a '|'. | |
4610 | |
4611 Productions may be written thus: | |
4612 name | |
4613 -> letter | |
4614 -> name, digit | |
4615 | |
4616 This could also be written | |
4617 name -> letter | name, digit | |
4618 | |
4619 In order to accommodate semantic analysis of the data, | |
4620 you may attach to any grammar rule a ©reduction | |
4621 procedureª which will be executed when the rule is | |
4622 identified. Each token may have a ©semantic valueª. By | |
4623 using ©parameter assignmentªs, you may provide the | |
4624 reduction procedure with access to the semantic values of | |
4625 tokens that comprise the grammar rule. When it finishes, the | |
4626 reduction procedure may return a value which will be | |
4627 saved on the ©parser value stackª as the semantic value of the | |
4628 ©reduction tokenª. | |
4629 ## | |
4630 | |
4631 Productions | |
4632 | |
4633 The ©Productionªs window is available via the ©Auxiliary | |
4634 Windowsª popup menu in any window which identifies tokens. | |
4635 If the token identified by the highlighted line is | |
4636 ©nonterminalª, the Productions window will show the | |
4637 rules produced by that ©tokenª. | |
4638 ## | |
4639 | |
4640 PRULE_CONTEXT | |
4641 | |
4642 PRULE_CONTEXT is an alternative form of the | |
4643 ©RULE_CONTEXTª macro. It differs only in that it takes | |
4644 an argument so you can specify the appropriate ©parser control blockª | |
4645 explicitly. PRULE_CONTEXT is defined in | |
4646 the ©parser headerª file. | |
4647 ## | |
4648 | |
4649 Quick Reference | |
4650 | |
4651 "Quick reference" is an ©obsolete configuration switchª. | |
4652 ## | |
4653 | |
4654 Range Bounds Out of Order | |
4655 | |
4656 This is a ©warningª message that appears when you have a | |
4657 ©character rangeª of the form 'z-a'. AnaGram interprets | |
4658 this range as being equal to 'a-z', but provides a | |
4659 warning in case the unusual order was the result of a | |
4660 clerical error. | |
4661 ## | |
4662 | |
4663 Recursive Definition of Char Set | |
4664 | |
4665 This ©warningª appears when AnaGram discovers a | |
4666 recursively defined ©character setª. Character sets | |
4667 cannot be defined recursively. | |
4668 ## | |
4669 | |
4670 Redefinition | |
4671 | |
4672 "Redefinition of <name>" is a ©warningª message which | |
4673 appears when AnaGram discovers a redefinition of a | |
4674 ©symbolª. The new ©definitionª is ignored. | |
4675 ## | |
4676 | |
4677 Redefinition of Grammar Token | |
4678 | |
4679 This ©warningª appears when AnaGram encounters a new | |
4680 definition of the ©grammar tokenª. AnaGram discards the | |
4681 old definition. The last definition in the syntax file | |
4682 wins. If you get this warning, check your ©syntax fileª | |
4683 to make sure you have the grammar token you want. | |
4684 ## | |
4685 | |
4686 Redefinition of token | |
4687 | |
4688 "Redefinition of token, TXXX: <name>" is a ©warningª | |
4689 message which occurs when AnaGram encounters a | |
4690 ©definitionª statement and the specified ©grammar tokenª | |
4691 has already been seen on the left side of a | |
4692 ©productionª. AnaGram will ignore the definition | |
4693 statement. | |
4694 ## | |
4695 | |
4696 Reduce Action, Reduction | |
4697 | |
4698 The reduce action, or reduction, is one of the four | |
4699 actions of a traditional ©parsing engineª. The reduce | |
4700 action is performed when the parser has succeeded in | |
4701 matching all the elements of a ©grammar ruleª, and the | |
4702 next input token is not erroneous. Reducing the grammar | |
4703 rule amounts to subtracting the length of the rule from | |
4704 the ©parser stack indexª, identifying the ©reduction | |
4705 tokenª, stacking its ©semantic valueª and then doing a | |
4706 ©shift actionª with the reduction token as though it had | |
4707 been input directly. | |
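
The stack arithmetic of a reduce action can be sketched as follows. This is a toy model, not the generated engine: the toy_pcb layout, the next_state table and the token numbers are all invented for illustration, and stacking of the semantic value is left out.

```c
#include <assert.h>

#define STACK_MAX 32

/* Toy control block: a state stack and its index (hypothetical layout). */
typedef struct {
    int ssx;               /* parser stack index */
    int ss[STACK_MAX];     /* parser state stack */
} toy_pcb;

/* Made-up transition table: next_state[state][token]. */
enum { TOK_EXPR = 0, TOK_TERM = 1, NTOKENS = 2, NSTATES = 4 };
static const int next_state[NSTATES][NTOKENS] = {
    /* state 0 */ {1, 2},
    /* state 1 */ {1, 3},
    /* state 2 */ {1, 2},
    /* state 3 */ {1, 3},
};

/* Shift: the new state depends on the current state and the token. */
void toy_shift(toy_pcb *pcb, int token) {
    int state = pcb->ss[pcb->ssx];
    assert(pcb->ssx + 1 < STACK_MAX);
    pcb->ss[++pcb->ssx] = next_state[state][token];
}

/* Reduce: subtract the rule length from the stack index, then treat
   the reduction token as though it had been input directly. */
void toy_reduce(toy_pcb *pcb, int rule_length, int reduction_token) {
    assert(pcb->ssx >= rule_length);
    pcb->ssx -= rule_length;
    toy_shift(pcb, reduction_token);
}
```

Starting from state 0, shifting TOK_TERM and then reducing a rule of length one to TOK_EXPR leaves the stack at the same depth, with the state reached from state 0 on TOK_EXPR on top.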
4708 ## | |
4709 | |
4710 Reduce-Reduce Conflict | |
4711 | |
4712 A grammar has a "reduce-reduce" ©conflictª at some | |
4713 state if a single token turns out to be a ©reducing | |
4714 tokenª for more than one ©completed ruleª. | |
4715 ## | |
4716 | |
4717 Reducing Token | |
4718 | |
4719 In a ©parser stateª with more than one ©completed ruleª, | |
4720 your parser must be able to determine which one was | |
4721 actually found. Therefore, during analysis of your | |
4722 grammar, AnaGram examines each completed rule in order | |
4723 to determine all the states the ©parserª will branch to | |
4724 once the rule is reduced. These states are called the | |
4725 "reduction states" for the rule. In any window that | |
4726 displays ©marked ruleªs, these states may be found in | |
4727 the ©Reduction Statesª window listed in the ©Auxiliary | |
4728 Windowsª popup menu. | |
4729 | |
4730 The acceptable input tokens for those states are the | |
4731 "reducing tokens" for the completed rules in the state | |
4732 under investigation. If there is a single token which is | |
4733 a reducing token for more than one rule, then the | |
4734 grammar is said to have a ©reduce-reduce conflictª at | |
4735 that state. If in a particular state there is both a | |
4736 ©shift actionª and a ©reduce actionª for the same token | |
4737 the grammar is said to have a ©shift-reduce conflictª in | |
4738 that state. | |
4739 | |
4740 Note that a "reducing token" is not the same as a | |
4741 "©reduction tokenª". | |
4742 ## | |
4743 | |
4744 Reduction Choices | |
4745 | |
4746 "Reduction choices" is a ©configuration switchª which | |
4747 defaults to off. If it is set, AnaGram will include in | |
4748 your ©parser fileª a function which will identify the | |
4749 acceptable choices for ©reduction tokenª in the current | |
4750 state. This function, of course, is useful only if you | |
4751 are using ©semantically determined productionsª. The | |
4752 prototype of this function is: | |
4753 int $_reduction_choices(int *); | |
4754 where '$' represents the name of your parser. You must | |
4755 provide an integer array whose length is at least as | |
4756 long as the maximum number of reduction choices you | |
4757 might have. The function will fill the array with | |
4758 the numbers of the tokens that are acceptable in the | |
4759 current state and will return a count of the number of | |
4760 acceptable choices it found. | |
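
A caller might use such a function along the lines of the sketch below. The function demo_reduction_choices is a stand-in with made-up token numbers, written only so that the example is complete; the real function is generated by AnaGram under the name described above. MAX_CHOICES and reduction_token_is_acceptable are likewise hypothetical.

```c
#include <assert.h>

#define MAX_CHOICES 8   /* assumed upper bound on reduction choices */

/* Stand-in for the generated reduction choices function. A real one
   consults the parser's tables; this one reports two made-up token
   numbers and returns the count. */
int demo_reduction_choices(int *choices) {
    assert(choices != 0);
    choices[0] = 17;
    choices[1] = 23;
    return 2;
}

/* Returns 1 if `token` is among the reduction tokens acceptable in
   the current state, 0 otherwise. */
int reduction_token_is_acceptable(int token) {
    int choices[MAX_CHOICES];   /* must be long enough for all choices */
    int n = demo_reduction_choices(choices);
    for (int i = 0; i < n; i++)
        if (choices[i] == token) return 1;
    return 0;
}
```

A reduction procedure could make a check of this kind before storing a token number in PCB.reduction_token.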
4761 ## | |
4762 | |
4763 reduction_token | |
4764 | |
4765 "reduction_token" is a field in your ©parser control | |
4766 blockª. If your grammar uses ©semantically determined | |
4767 productionsª, your ©reduction procedureªs need a | |
4768 mechanism to specify which token the rule is to reduce | |
4769 to. ©PCBª.reduction_token names the variable which | |
4770 contains the ©token numberª of the ©reduction tokenª. | |
4771 Prior to calling your reduction procedure, your parser | |
4772 will set this field to the token number of the default | |
4773 ©reduction tokenª, i.e., the leftmost syntactically correct token in the | |
4774 reduction token list for the production being reduced. | |
4775 If the reduction procedure establishes that a different | |
4776 reduction token is appropriate, it should store the | |
4777 appropriate token number in PCB.reduction_token. | |
4778 ## | |
4779 | |
4780 Reduction Procedures | |
4781 | |
4782 The Reduction Procedures window lists the C function | |
4783 prototypes for the ©reduction procedureªs in your grammar. | |
4784 | |
4785 When this window is active, the ©syntax fileª window, if | |
4786 visible, is synchronized with it so you can see the body of | |
4787 the reduction procedure as well as its usage. | |
4788 ## | |
4789 | |
4790 REDUCTION_TOKEN_ERROR | |
4791 | |
4792 REDUCTION_TOKEN_ERROR is a user definable macro which your ©parserª | |
4793 invokes when it encounters an inadmissible reduction | |
4794 token. This error should occur only if your parser uses | |
4795 ©semantically determined productionsª and your | |
4796 ©reduction procedureª provides an incorrect ©token | |
4797 numberª. If you do not define it, AnaGram will define | |
4798 it so that it will print an error message on stderr and | |
4799 abort the parse. | |
4800 | |
4801 ## | |
4802 | |
4803 Reduction Procedure, Semantic Action | |
4804 | |
4805 A "reduction procedure", or "semantic action", is a | |
4806 function you write which your ©parserª executes when it | |
4807 has identified the grammar rule to which the reduction | |
4808 procedure is attached in your grammar. | |
4809 | |
4810 When your parser has identified a particular ©grammar | |
4811 ruleª, that is to say, a particular sequence of ©tokensª | |
4812 that you have specified in your grammar, it "reduces" | |
4813 the production to the token at the head of the | |
4814 production, or ©reduction tokenª. | |
4815 | |
4816 If you choose, you can | |
4817 specify a "reduction procedure" which your parser will | |
4818 call so that your program can do semantic analysis on | |
4819 the production just identified. Your reduction procedure | |
4820 will be called using, as arguments, the ©semantic | |
4821 valuesª of tokens on the right side of the production. | |
4822 | |
4823 Your reduction procedure may, if you choose, return a | |
4824 value which will become the semantic value of the | |
4825 reduction token. Since many of the tokens in | |
4826 ©productionªs are there for only syntactic purposes, you | |
4827 may specify, when you write your grammar, the tokens | |
4828 whose values are needed as arguments for your reduction | |
4829 procedure. | |
4830 | |
4831 To attach a reduction procedure to a grammar rule, just | |
4832 write it immediately following the rule. There | |
4833 are two formats for reduction procedures, | |
4834 depending on the size and complexity of the procedure. | |
4835 | |
4836 The first form consists of an equal sign followed by a C | |
4837 expression and a semicolon. When the rule is matched the | |
4838 expression will be evaluated and its value will be | |
4839 stacked on the ©parser value stackª as | |
4840 the value of the reduction token. For example: | |
4841 =-a; | |
4842 =myProcedure(x, q); | |
4843 | |
4844 The second form consists of an equal sign followed by a | |
4845 block of C code enclosed in curly braces. If you wish to | |
4846 return a value for the reduction token you have to use a | |
4847 return statement. For example: | |
4848 ={ | |
4849 if (x > y) return x; | |
4850 return x+2*y; | |
4851 } | |
4852 | |
4853 In both forms of the reduction procedure, ©parameter | |
4854 assignmentªs may be attached to ©rule elementªs in | |
4855 order to make their semantic values available to the reduction | |
4856 procedure. When the reduction procedure is executed, | |
4857 local variables | |
4858 will be defined with the names specified in the parameter | |
4859 assignments. The values of these variables | |
4860 will have been set to the value of the corresponding | |
4861 token. | |
4862 | |
4863 If the return value of your reduction procedure is a | |
4864 C++ object, you may wish to specify that AnaGram | |
4865 enclose it in a ©wrapperª so that constructor calls | |
4866 and destructor calls are made. Otherwise the object is | |
4867 pushed onto and popped from the parser value stack simply by | |
4868 coercing the stack pointer to the appropriate type. | |
4869 | |
4870 The reduction procedures in your grammar are summarized | |
4871 in the ©Reduction Proceduresª window. | |
4872 ## | |
4873 | |
4874 Reduction States | |
4875 | |
4876 The Reduction States window can be accessed via the | |
4877 ©Auxiliary Windowsª popup menu from any window which displays | |
4878 ©parser stateª numbers and ©marked ruleªs. If the highlighted | |
4879 ©grammar ruleª has no marked token, the Reduction States window will | |
4880 show the states the parse could reach by reducing the rule and | |
4881 processing the resultant ©reduction tokenª. | |
4882 ## | |
4883 | |
4884 Reduction Token | |
4885 | |
4886 A ©tokenª which appears on the left hand side of a | |
4887 ©productionª is called a reduction token. It is so | |
4888 called because when the ©grammar ruleª on the right side | |
4889 of the production is matched in the input stream, your | |
4890 ©parserª will "reduce" the sequence of tokens which | |
4891 matches the rule by replacing the sequence of tokens | |
4892 with the reduction token. | |
4893 | |
4894 If more than one | |
4895 reduction token is specified, | |
4896 the production is called a ©semantically determined productionª | |
4897 and your ©reduction procedureª | |
4898 should choose the appropriate reduction token. If it does not, your parser | |
4899 will use the first token in the list that is syntactically | |
4900 correct as the default. | |
4901 | |
4902 The ©CHANGE_REDUCTIONª macro can be used to specify the reduction | |
4903 token. | |
4904 | |
4905 Note that a "reduction token" is not the same as a | |
4906 "©reducing tokenª". | |
4907 ## | |
4908 | |
4909 Reduction Trace | |
4910 | |
4911 The Reduction Trace window is available from the | |
4912 ©Conflictsª window and the ©Resolved Conflictsª window. | |
4913 It can be used in conjunction with the ©Conflict Traceª | |
4914 to study ©conflictªs. The Reduction Trace represents the | |
4915 result of taking the reduce option in the conflict state | |
4916 of the Conflict Trace. | |
4917 ## | |
4918 | |
4919 Reentrant Parser | |
4920 | |
4921 "Reentrant parser" is a ©configuration switchª which defaults to off. | |
4922 If it is on when AnaGram builds a parser, AnaGram will generate code that | |
4923 passes the pointer to the ©parser control blockª via calling sequences, | |
4924 rather than using static references to the pcb. | |
4925 | |
4926 You can use the reentrant parser switch to help make ©thread safe | |
4927 parsersª. | |
4928 | |
4929 The reentrant parser switch is compatible with both C and C++. | |
4930 | |
4931 The reentrant parser switch cannot be used in conjunction with | |
4932 the ©old styleª switch. | |
4933 | |
4934 When you have enabled the reentrant parser switch, the ©parse functionª, | |
4935 the ©initializerª function, and the ©parser value functionª | |
4936 will be defined to take a pointer to the parser control block as | |
4937 their sole argument. | |
4938 ## | |
4939 | |
4940 Reload Button | |
4941 | |
4942 The ©File Traceª window includes a reload button to allow | |
4943 you to reread your ©test fileª after you have modified | |
4944 it without having to start a new file trace. After the | |
4945 file has been reread, the file trace is reset. | |
4946 ## | |
4947 | |
4948 rename macro | |
4949 | |
4950 AnaGram uses a number of macros in its generated code. | |
4951 It is possible, therefore, to run into naming | |
4952 collisions with other components of your program. The | |
4953 rename macro ©attribute statementª allows you to change | |
4954 the name AnaGram uses for a particular macro to avoid | |
4955 these problems. | |
4956 | |
4957 For example, in the Microsoft | |
4958 Foundation Classes, V4.2, there is a class called | |
4959 "CONTEXT". If you use the ©context stackª option in | |
4960 AnaGram, your ©parserª will have a macro called | |
4961 ©CONTEXTª. To avoid the name collision, add the | |
4962 following attribute statement to any configuration | |
4963 section in your grammar: | |
4964 rename macro CONTEXT AG_CONTEXT | |
4965 Then, simply use "AG_CONTEXT" where you would otherwise | |
4966 have used "CONTEXT". | |
4967 ## | |
4968 | |
4969 reserve keywords | |
4970 | |
4971 "reserve keywords" is an ©attribute statementª which | |
4972 can be used to specify a list of ©keywordªs that are | |
4973 reserved and cannot be used except as explicitly | |
4974 specified in the grammar. In particular this switch | |
4975 enables AnaGram to avoid issuing meaningless ©keyword | |
4976 anomalyª warnings. | |
4977 | |
4978 AnaGram does not automatically presume that keywords | |
4979 are also reserved words, since in many grammars there | |
4980 is no need to specify reserved words. | |
4981 | |
4982 Reserve keywords statements must be made inside | |
4983 ©configuration sectionsª. Each statement consists of | |
4984 the keyword "reserve keywords" followed by a list of | |
4985 keyword ©tokensª. The tokens must be separated by | |
4986 commas and the list must be enclosed in braces ({ }). | |
4987 Each keyword listed will then be treated as a reserved | |
4988 word. | |
4989 ## | |
4990 | |
4991 Reset Button | |
4992 | |
4993 The Reset button, found on ©File Traceª and ©Grammar | |
4994 Traceª windows, restores the initial configuration of | |
4995 the trace. This is especially convenient for ©Conflict | |
4996 Traceª or other ©Auxiliary Traceªs. | |
4997 ## | |
4998 | |
4999 Resolved Conflicts | |
5000 | |
5001 AnaGram creates the Resolved Conflicts window only when | |
5002 the grammar it is analyzing has ©conflictªs and when | |
5003 those conflicts have been resolved by ©precedence | |
5004 declarationªs, by "©stickyª" statements, or in | |
5005 connection with the explicit use of a token specified in | |
5006 a ©disregardª statement. The Resolved Conflicts window | |
5007 shows the conflicts that have been resolved, using the | |
5008 same format as that of the ©Conflictsª Window. The rule | |
5009 chosen is marked with an asterisk in the leftmost column | |
5010 of the window. | |
5011 ## | |
5012 | |
5013 Resynchronization | |
5014 | |
5015 "Resynchronization" is the process of getting your | |
5016 parser back in step with its input after encountering a | |
5017 ©syntax errorª. As such, it is one method of ©error | |
5018 recoveryª. Of course, you would resynchronize only if it | |
5019 is necessary to continue after the error. There are | |
5020 several options available when using AnaGram. You could | |
5021 use the ©auto resynchª option, which causes AnaGram to | |
5022 incorporate an automatic resynchronizing procedure into | |
5023 your parser, or you could use the ©error token | |
5024 resynchronizationª option, which is similar to the | |
5025 technique used by YACC programmers. | |
5026 ## | |
5027 | |
5028 right | |
5029 | |
5030 "right" controls a ©precedence declarationª, indicating | |
5031 that all of the listed ©rule elementsª are to be | |
5032 considered ©right associativeª. | |
5033 ## | |
5034 | |
5035 Right Associative | |
5036 | |
5037 A binary operator is said to be right associative if | |
5038 an expression with repeated instances of the operator | |
5039 is to be evaluated from the right. Thus, for example, | |
5040 when '=' is used as an assignment operator | |
5041 x = a = b | |
5042 is normally taken to mean a = b followed by x = a. The | |
5043 assignment operator is said to be right associative. | |
5044 | |
5045 In ©grammarªs with ©conflictªs, you may use ©precedence | |
5046 declarationªs to specify that an operator should be | |
5047 right associative. | |
5048 ## | |
5049 | |
5050 Rule Context | |
5051 | |
5052 The Rule Context window can be accessed via the | |
5053 ©Auxiliary Windowsª menu in any window that displays | |
5054 ©grammar ruleªs. AnaGram displays all occurrences in the | |
5055 ©grammarª of all the ©reduction tokenªs for the rule. | |
5056 ## | |
5057 | |
5058 RULE_CONTEXT | |
5059 | |
5060 RULE_CONTEXT is a macro you may use if you have defined | |
5061 a ©context stackª. In any reduction procedure, | |
5062 RULE_CONTEXT will be a pointer to the context value | |
5063 stacked before the first token of the rule being | |
5064 reduced. Since the context stack contains an entry for | |
5065 each token in the rule, you may inspect the context | |
5066 value for each token in the rule by subscripting | |
5067 RULE_CONTEXT. RULE_CONTEXT[k] is the context of the | |
5068 (k+1)th token in the rule. | |
5069 ## | |
5070 | |
5071 Rule Coverage | |
5072 | |
5073 "Rule Coverage" is the name of both a ©configuration | |
5074 switchª and a window. The configuration switch | |
5075 defaults to off. If you set it, AnaGram will include | |
5076 code in your ©parserª to count the number of times your | |
5077 parser identifies each ©grammar ruleª in your grammar. | |
5078 To maintain the counts, AnaGram declares, at the | |
5079 beginning of your parser, an integer array, whose name | |
5080 is created by appending "_nrc" to your ©parser nameª. | |
5081 The array contains one counter for each rule you have | |
5082 defined in your grammar. There are no entries for the | |
5083 auxiliary rules that AnaGram creates to deal with set | |
5084 overlaps or ©disregardª statements. In order to identify | |
5085 positively all the rules that the parser reduces, | |
5086 AnaGram has to turn off certain optimization features in | |
5087 your parser. Therefore a parser that has rule coverage | |
5088 enabled will run slightly slower than one with the | |
5089 switch off. | |
5090 | |
5091 In addition, AnaGram creates a pair of functions to | |
5092 write the counters to a file and to initialize the | |
5093 counters from a file. The names of these functions are | |
5094 given by appending "_write_counts" and "_read_counts" to | |
5095 the name of your parser. The name of the file is given by the | |
5096 ©coverage file nameª parameter which defaults | |
5097 to the name of your ©syntax fileª but with the extension ".nrc". | |
5098 | |
5099 If rule coverage is enabled, AnaGram will also enable the | |
5100 Rule Coverage option on the ©Browse Menuª. If you select | |
5101 Rule Coverage, AnaGram will initialize a ©Rule Coverageª | |
5102 window from the rule count file you select. | |
5103 | |
5104 AnaGram will | |
5105 warn you if the rule count file is older than | |
5106 the syntax file, since under those conditions, the | |
5107 coverage file might be invalid. | |
5108 ## | |
5109 | |
5110 Rule Derivation, Token Derivation | |
5111 | |
5112 You can use the Rule Derivation and Token Derivation | |
5113 windows to understand the nature of ©conflictªs in your | |
5114 grammar. To create these windows, open the ©Conflictsª | |
5115 window. Move the cursor bar to a ©completed ruleª, that | |
5116 is, one which has no marked token. Press the right mouse button to pop | |
5117 up the ©Auxiliary Windowsª menu. You may then select the Rule | |
5118 Derivation or the Token Derivation. | |
5119 | |
5120 The Rule Derivation window and the Token Derivation | |
5121 window, together, show how a ©conflictª, or ambiguity, | |
5122 has arisen in your grammar. Both windows contain a | |
5123 sequence of rules, and both begin with the same rule, | |
5124 the rule which is the root cause of the conflict. | |
5125 | |
5126 Each subsequent line in the rule derivation is an | |
5127 ©expansionª of the marked token in | |
5128 the previous rule. The last rule in the derivation | |
5129 window is the rule you selected in the Conflicts | |
5130 window. Thus the rule derivation window shows you how | |
5131 the rule involved in the conflict derives from the | |
5132 root. | |
5133 | |
5134 Each subsequent line in the token derivation window | |
5135 shows an expansion of the marked token in the previous rule. The first | |
5136 token of the last rule in the derivation window is the token that | |
5137 causes the conflict. This is the usage that is inconsistent with other | |
5138 usages of this token in the conflict state. | |
5139 | |
5140 The Rule Derivation and Token Derivation windows each | |
5141 have five auxiliary windows. The ©Rule Contextª window | |
5142 is keyed to the highlighted rule. The other four | |
5143 windows, the ©Expansion Rulesª window, the | |
5144 ©Productionsª window, the ©Set Elementsª window and the | |
5145 ©Token Usageª window are keyed to the marked token. | |
5146 Remember that there is no marked token on the last | |
5147 line of the Rule Derivation window. | |
5148 ## | |
5149 | |
5150 Rule Element | |
5151 | |
5152 A ©grammar ruleª is a list of "rule elements", separated | |
5153 by commas. Rule elements may be ©token nameªs, | |
5154 ©character setsª, ©keywordªs, ©immediate actionªs, or | |
5155 ©virtual productionsª. When AnaGram encounters a rule | |
5156 element for which no token presently exists, it creates | |
5157 one. | |
5158 | |
5159 Any rule element may be followed by a ©parameter assignmentª | |
5160 in order to make the ©semantic valueª of | |
5161 the rule element available to a ©reduction procedureª. | |
5162 ## | |
5163 | |
5164 Rule Number | |
5165 | |
5166 AnaGram assigns a unique rule number to each ©grammar | |
5167 ruleª that you specify in your grammar. Rules are | |
5168 numbered sequentially as they are encountered in the | |
5169 ©syntax fileª. AnaGram constructs rule 0 itself. Rule | |
5170 zero has a single ©rule elementª, the ©grammar tokenª, | |
5171 unless you have a ©disregardª statement in your | |
5172 grammar. In this case, there will be two elements. | |
5173 | |
5174 In AnaGram displays, rule numbers are displayed with a | |
5175 prefixed 'R' and a three-digit decimal number. | |
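
As a rough illustration of this display format, the following C sketch formats a rule number in the same style. The function name format_rule_number is hypothetical and is not part of AnaGram, and the zero padding is an assumption about how the three digits are filled.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of AnaGram: writes a rule number as
   'R' followed by a three digit decimal number. Zero padding is an
   assumption about the display format. */
int format_rule_number(char *buf, size_t size, int rule_number) {
  assert(buf != NULL && rule_number >= 0);
  return snprintf(buf, size, "R%03d", rule_number);
}
```

Under these assumptions rule 7 would be shown as R007.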
5176 ## | |
5177 | |
5178 Rule Stack, Rule Stack Pane | |
5179 | |
5180 The Rule Stack pane appears across the bottom of a ©Grammar | |
5181 Traceª or ©File Traceª window. It provides an alternate view of the | |
5182 parser stack for the trace, showing, for each state, rules instead of | |
5183 the tokens that you see in the ©Parser Stack paneª. Because it is | |
5184 synched with the syntax file window, the Rule Stack makes it easy to | |
5185 see the relationship between the trace and your grammar. | |
5186 | |
5187 For each level of the parser stack, the Rule Stack shows the ©parser | |
5188 stateª number and all the active rules. The active rules at any | |
5189 state consist of all the ©expansion ruleªs for the state that are | |
5190 consistent with the input at all subsequent states. | |
5191 | |
5192 Except for the last level | |
5193 of the stack, each rule has a ©marked tokenª, which in the default | |
5194 configuration is displayed in bold, italic type. The significance of | |
5195 the marked token is that all tokens in the rule to the left of the | |
5196 marked token have already been matched in the input, and the input | |
5197 in subsequent levels is consistent so far with the marked | |
5198 token. As more input is processed, rules | |
5199 that are inconsistent with the new input are deleted from the display. | |
5200 | |
5201 The last level of the stack shows the current state of the parser and | |
5202 the rules against which the ©lookahead tokenª will be matched. At | |
5203 this level, there may be rules with no marked tokens. These are | |
5204 rules which have been matched exactly in the input. If there is | |
5205 more than one such rule, at the next parser step the parser will use | |
5206 the lookahead token to determine which rule to reduce. | |
5207 | |
5208 In the last level of the stack, marked tokens represent the input the | |
5209 parser expects to see. | |
5210 | |
5211 The Rule Stack pane is synched with the ©syntax fileª window if it is | |
5212 visible so that the rule highlighted in the Rule Stack can be seen | |
5213 in context in the syntax file. | |
5214 For rules that AnaGram | |
5215 generated automatically (to implement ©virtual productionsª | |
5216 or the ©disregardª statement), the cursor bar will move to the | |
5217 top of the syntax file window. | |
5218 | |
5219 The Rule Stack pane is also synched with the other panes in the trace. | |
5220 As you move the cursor bar in the Rule Stack, the cursor bar in the | |
5221 Parser Stack pane will track the stack level in the Rule Stack. In | |
5222 a File Trace, text will be highlighted in the ©Test Fileª pane | |
5223 corresponding to the selected token in the Parser Stack pane. In a | |
5224 Grammar Trace, the marked token in the highlighted rule will be | |
5225 highlighted in the ©Allowable Input paneª. | |
5226 | |
5227 Clicking the right mouse button pops up an ©Auxiliary Windowsª menu to | |
5228 give you more information about the highlighted rule. | |
5229 ## | |
5230 | |
5231 Rule Table | |
5232 | |
5233 The Rule Table lists, in numerical order, all the | |
5234 ©grammar ruleªs defined in your ©grammarª. Each rule is | |
5235 preceded by the ©nonterminalª tokens which produce it. | |
5236 If you are not using ©semantically determined | |
5237 productionªs, then there will be precisely one token | |
5238 line per rule. The Rule Table is synched to your ©syntax | |
5239 fileª to show the rule in context. | |
5240 ## | |
5241 | |
5242 Semantic Value, Token Value | |
5243 | |
5244 A ©tokenª generally has a "semantic value", or "token | |
5245 value", as well as the ©token numberª which identifies | |
5246 it syntactically. Each instance of the token in the | |
5247 input stream can have a different value. For example, | |
5248 you might have a token called "variable name". In one | |
5249 instance the variable name might be "widget" and in | |
5250 another, "wombat". Then "widget" and "wombat" would be | |
5251 the semantic values in the two instances. Another token | |
5252 might have numeric semantic values. | |
5253 | |
5254 You can specify the C or C++ ©data typeª of the token value. | |
5255 The data type of "variable name" could be "char *" | |
5256 where the value is a pointer to a string holding the name. There | |
5257 are separate default types for the values of ©terminalª | |
5258 and ©nonterminalª tokens. In the usual case of ordinary | |
5259 character input, the value of a terminal token is just | |
5260 the ascii character code. | |
5261 | |
5262 The value of a nonterminal token is determined by the ©reduction procedureªs | |
5263 attached to the rules the token produces. If there is no reduction | |
5264 procedure, the value of the token is the value of the first token | |
5265 in the rule. | |
5266 | |
5267 It should be noted that the stack operations have been | |
5268 implemented in such a way that a C++ object that belongs | |
5269 to a class for which the assignment operator has been | |
5270 overridden will encounter serious problems. This shortcoming | |
5271 will be addressed in a future version of AnaGram. Note that | |
5272 there is no problem with using a pointer to any C++ object. | |
5273 ## | |
5274 | |
5275 Semantically Determined Production | |
5276 | |
5277 A "semantically determined production" is one which has | |
5278 more than one ©reduction tokenª specified on the left | |
5279 side of the ©productionª. You would write such a | |
5280 production when the reduction tokens are syntactically | |
5281 indistinguishable. The ©reduction procedureª may then | |
5282 specify which of the listed reduction tokens the grammar | |
5283 rule is to reduce to based on semantic considerations. | |
5284 If there is no reduction procedure, or the reduction | |
5285 procedure does not specify a reduction token, the parser | |
5286 will use the first syntactically correct one in the list. | |
5287 | |
5288 To simplify changing the reduction token, AnaGram | |
5289 provides a predefined macro, ©CHANGE_REDUCTIONª. | |
5290 | |
5291 The ©semantic valueªs of all the reduction tokens for a | |
5292 given semantically determined production must have the | |
5293 same ©data typeª. | |
5294 | |
5295 ©File Traceª and ©Grammar Traceª have a ©Reduction Choices paneª which | |
5296 appears when a semantically determined production is invoked and | |
5297 you need to choose a reduction token. | |
5298 ## | |
5299 | |
5300 Set Elements | |
5301 | |
5302 The Set Elements window is available via the ©Auxiliary | |
5303 Windowsª popup menu from windows which specify character sets, | |
5304 partition sets or tokens. It displays the actual | |
5305 characters which make up the set, or which map to the | |
5306 specified token. For each character, the numeric code as | |
5307 well as its display symbol is given. | |
5308 ## | |
5309 | |
5310 Set Expression, Expression | |
5311 | |
5312 A set expression is an algebraic expression used to | |
5313 define a ©character setª in terms of individual | |
5314 characters, ranges of characters, or other sets of | |
5315 characters as constructed using ©complementsª, ©unionsª, | |
5316 ©intersectionsª, and ©differencesª. | |
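
The set algebra described above can be pictured with a small C sketch that models a character set as a membership table. Everything here is illustrative: the CharSet type, the function names and the universe size of 256 are assumptions, and this is not how AnaGram itself stores sets.

```c
#include <assert.h>
#include <string.h>

#define UNIVERSE 256  /* assumed size of the character universe */

/* Hypothetical model of a character set: one flag per character code. */
typedef struct { unsigned char member[UNIVERSE]; } CharSet;

/* A range of characters, lo through hi inclusive. */
CharSet set_range(int lo, int hi) {
  CharSet s;
  int c;
  assert(0 <= lo && lo <= hi && hi < UNIVERSE);
  memset(&s, 0, sizeof s);
  for (c = lo; c <= hi; c++) s.member[c] = 1;
  return s;
}

CharSet set_union(CharSet a, CharSet b) {
  int c;
  for (c = 0; c < UNIVERSE; c++) a.member[c] = a.member[c] || b.member[c];
  return a;
}

CharSet set_intersection(CharSet a, CharSet b) {
  int c;
  for (c = 0; c < UNIVERSE; c++) a.member[c] = a.member[c] && b.member[c];
  return a;
}

CharSet set_difference(CharSet a, CharSet b) {
  int c;
  for (c = 0; c < UNIVERSE; c++) a.member[c] = a.member[c] && !b.member[c];
  return a;
}

CharSet set_complement(CharSet a) {
  int c;
  for (c = 0; c < UNIVERSE; c++) a.member[c] = !a.member[c];
  return a;
}
```

For instance, the union of the ranges 'a'..'z' and 'A'..'Z' models a set of letters, and the difference of an alphanumeric set and '0'..'9' gives the letters back.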
5317 ## | |
5318 | |
5319 Shift Action | |
5320 | |
5321 The shift action is one of the four actions of a | |
5322 traditional ©parsing engineª. The shift action is | |
5323 performed when the input token matches one of the | |
5324 acceptable input tokens for the current ©parser stateª. | |
5325 The ©semantic valueª of the token and the current | |
5326 ©state numberª are stacked, the ©parser stack indexª is | |
5327 incremented and the state number is set to a value | |
5328 determined by the previous state and the input token. | |
5329 ## | |
5330 | |
5331 Shift-Reduce Conflict | |
5332 | |
5333 A "shift-reduce" ©conflictª occurs if in some ©parser | |
5334 stateª there exists a ©terminal tokenª that should be | |
5335 shifted, because it is legitimate input for one of the | |
5336 ©grammar ruleªs of the state, but should also be used to | |
5337 reduce some other rule because it is a ©reducing tokenª | |
5338 for that rule. | |
5339 ## | |
5340 | |
5341 sn | |
5342 | |
5343 sn is a field in a ©parser control blockª to which your | |
5344 ©error handlingª routines and your ©reduction | |
5345 procedureªs may refer. Its value is the current ©state | |
5346 numberª of your ©parserª. sn is modified every time | |
5347 your parser "shifts" (performs a ©shift actionª on) a | |
5348 token or "reduces" (performs a ©reduce actionª on) a | |
5349 ©productionª. | |
5350 ## | |
5351 | |
5352 ss | |
5353 | |
5354 ss is a field in a ©parser control blockª to which your | |
5355 ©error handlingª and ©reduction procedureªs may refer. | |
5356 It is the ©state stackª for your ©parserª. Before every | |
5357 ©shift actionª, the current ©state numberª, ©snª, is | |
5358 stored in PCB.ss[PCB.ssx], where ©ssxª is the ©parser | |
5359 stack indexª. PCB.ssx is then incremented. | |
5360 ## | |
5361 | |
5362 ssx | |
5363 | |
5364 ssx is a field in a ©parser control blockª to which | |
5365 your ©error handlingª routines and ©reduction | |
5366 procedureªs may refer. It is the ©parser stack indexª | |
5367 for your ©parserª. On every ©shift actionª it is | |
5368 incremented. On every ©reduce actionª the length of | |
5369 the ©grammar ruleª being reduced is subtracted from | |
5370 PCB.ssx. | |
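
The interplay of sn, ss and ssx can be sketched in C as follows. MiniPcb is a hypothetical miniature of a parser control block, not the structure AnaGram generates. Restoring sn from the stack on a reduce is an assumption added for illustration, and the transition that a real parser makes on the reduced token is left out.

```c
#include <assert.h>

#define STACK_SIZE 32   /* assumed stack depth */

/* Hypothetical miniature of a parser control block. */
typedef struct {
  int sn;              /* current state number */
  int ssx;             /* parser stack index */
  int ss[STACK_SIZE];  /* state stack */
} MiniPcb;

/* Shift: store the current state at ss[ssx], increment ssx,
   then enter the next state. */
void mini_shift(MiniPcb *pcb, int next_state) {
  assert(pcb->ssx < STACK_SIZE);
  pcb->ss[pcb->ssx] = pcb->sn;
  pcb->ssx++;
  pcb->sn = next_state;
}

/* Reduce: subtract the rule length from ssx. For a non-empty rule the
   state saved when its first element was shifted is uncovered again
   (an assumption, see the text above). */
void mini_reduce(MiniPcb *pcb, int rule_length) {
  assert(rule_length >= 0 && rule_length <= pcb->ssx);
  pcb->ssx -= rule_length;
  if (rule_length > 0) pcb->sn = pcb->ss[pcb->ssx];
}
```

Three shifts followed by a reduce of a rule of length two leave ssx at 1 in this model.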
5371 ## | |
5372 | |
5373 State Definition | |
5374 | |
5375 The State Definition window can be accessed via the | |
5376 ©Auxiliary Windowsª popup menu from any window that specifies | |
5377 states. It displays the ©characteristic rulesª that | |
5378 define the state. The rules are displayed with a marked token, which is | |
5379 the next token needed in the input if the particular ©grammar ruleª is | |
5380 to be matched. If the rule is a completed rule, no token will be | |
5381 marked. | |
5382 | |
5383 Each line contains the state number, blank if it is the | |
5384 same as the state number of the previous line, the ©rule | |
5385 numberª, and finally the ©marked ruleª. | |
5386 | |
5387 The ©State Definition Tableª, found in the ©Browse | |
5388 Menuª, displays the characteristic rules for all states | |
5389 in the ©grammarª. | |
5390 ## | |
5391 | |
5392 State Definition Table | |
5393 | |
5394 The State Definition Table lists, for each ©parser | |
5395 stateª, all of the ©characteristic rulesª which define | |
5396 that state. The rules are displayed with a ©marked tokenª, which is the | |
5397 next token needed in the input if the particular ©grammar ruleª is to | |
5398 be matched. If the rule is a completed rule, no token will be | |
5399 marked. | |
5400 | |
5401 Each line contains the state number, blank if it is the | |
5402 same as the state number of the previous line, the ©rule | |
5403 numberª, and finally the ©marked ruleª. | |
5404 | |
5405 In the ©Auxiliary Windowsª menu for many states there is | |
5406 a ©State Definitionª entry which provides the | |
5407 characteristic rules for the ©parser stateª identified by | |
5408 the cursor bar. | |
5409 ## | |
5410 | |
5411 State Expansion | |
5412 | |
5413 The State Expansion window may be accessed using the | |
5414 ©Auxiliary Windowsª menu from any window that identifies | |
5415 a particular ©parser stateª. It shows the complete set | |
5416 of ©expansion ruleªs for the state, consisting of the | |
5417 union of the set of ©characteristic ruleªs and, for each | |
5418 characteristic rule, the set of expansion rules for the | |
5419 marked token. Thus the State | |
5420 Expansion window shows all possible legal input to your | |
5421 parser in the given state. | |
5422 ## | |
5423 | |
5424 Sticky | |
5425 | |
5426 "Sticky" statements are ©attribute statementªs and may | |
5427 be used just like a ©precedence declarationª to resolve | |
5428 ©conflictªs. If a ©shift-reduce conflictª occurs in a | |
5429 state where the ©characteristic tokenª is "sticky", the | |
5430 shift action will always be chosen. | |
5431 | |
5432 Sticky statements must be made inside ©configuration | |
5433 sectionsª. Each statement consists of the keyword | |
5434 "sticky" followed by a list of ©tokensª. The tokens must | |
5435 be separated by commas and the list must be enclosed in | |
5436 braces ({ }). Each token will then be treated as sticky. | |
5437 | |
5438 All conflicts which are resolved by sticky statements | |
5439 are listed in the ©Resolved Conflictsª window. | |
5440 ## | |
5441 | |
5442 subgrammar | |
5443 | |
5444 Declaring a nonterminal token to be a "subgrammar" | |
5445 changes the way AnaGram searches for reducing tokens. | |
5446 | |
5447 Normally, if there is a completed rule in a particular | |
5448 state, AnaGram investigates all states to which the | |
5449 parser could jump on reducing the rule. It then | |
5450 considers all terminal tokens that are acceptable input | |
5451 in these states to be reducing tokens for the given | |
5452 rule. If this set of tokens overlaps the set of tokens | |
5453 for which there are shift actions, or the set of tokens | |
5454 which reduce a different rule, there is a ©conflictª. | |
5455 | |
5456 Now consider a particular nonterminal token T and all | |
5457 the rules it produces, whether directly or indirectly. | |
5458 What the preceding remarks mean is that in determining | |
5459 the reducing tokens for any of these rules, AnaGram | |
5460 considers not only the definition, but also the usage | |
5461 of T. | |
5462 | |
5463 There are circumstances when it is inappropriate to | |
5464 consider the usage of T. The most common example occurs | |
5465 when building a lexical scanner for a language such as | |
5466 C. In this case, you can write a complete grammar for a | |
5467 C token with no difficulty. But if you try to extend it | |
5468 to a sequence of tokens, you get scores of conflicts. | |
5469 This situation arises because you specify that any C | |
5470 token can follow another, when in actual practice, an | |
5471 identifier, for example, cannot follow another | |
5472 identifier without some intervening space or | |
5473 punctuation. While it is theoretically possible to write | |
5474 a grammar for a sequence of tokens that has no | |
5475 conflicts, it is not usually pretty. | |
5476 | |
5477 The subgrammar declaration resolves this problem by | |
5478 telling AnaGram that when it is looking for reducing | |
5479 tokens for any rule produced directly or indirectly by a | |
5480 subgrammar token, it should disregard the usage of the | |
5481 token and only consider usage internal to the definition | |
5482 of the subgrammar token, as though the subgrammar token | |
5483 were the start token of the grammar. | |
5484 | |
5485 The subgrammar declaration is made in a ©configuration | |
5486 sectionª and consists of the keyword "subgrammar" | |
5487 followed by a list of token names separated by | |
5488 commas and enclosed in braces ({ }). For example: | |
5489 subgrammar { name, number} | |
5490 ## | |
5491 | |
5492 Suspicious Production | |
5493 | |
5494 This ©warningª message appears when AnaGram finds a | |
5495 ©productionª of the form x -> x. There is probably a | |
5496 typo somewhere in your ©syntax fileª. This production | |
5497 causes a ©conflictª in your grammar. AnaGram leaves | |
5498 this production in your ©grammarª, but if you build a | |
5499 parser, it will never succeed in recognizing this | |
5500 production. | |
5501 ## | |
5502 | |
5503 Switch Takes on/off Values Only | |
5504 | |
5505 The specified parameter is a ©configuration switchª. The | |
5506 only values it may be assigned are ON and OFF. | |
5507 | |
5508 ## | |
5509 | |
5510 Symbol | |
5511 | |
5512 In writing your ©grammarª you use symbols, or names, to | |
5513 represent most of your ©tokensª. You may also use | |
5514 symbols to represent ©character setªs, ©virtual | |
5515 productionªs, ©immediate actionªs, or ©keywordªs. | |
5516 | |
5517 A symbol, or name, must begin with a letter or an | |
5518 underscore. It may then contain any number of these | |
5519 characters as well as digits and embedded white space | |
5520 (including comments). For identification purposes all | |
5521 adjacent white space characters within a symbol name | |
5522 are considered to be a single blank. | |
5523 | |
5524 Upper case and lower case letters are considered to be | |
5525 different. | |
5526 | |
5527 Examples: | |
5528 token name | |
5529 token/*embedded comment*/name | |
5530 | |
5531 All symbols used in your grammar are listed in | |
5532 the ©Symbol Tableª window found in the ©Browse Menuª. | |
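
The white space rule for symbol names can be sketched in C. The functions normalize_symbol and same_symbol are hypothetical and are not AnaGram's own code. The sketch assumes that a comment counts as white space, so that the two examples above name the same symbol.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies a symbol name into out, replacing each
   run of white space and comments with a single blank. Leading and
   trailing runs are dropped. */
void normalize_symbol(const char *in, char *out, size_t size) {
  size_t n = 0;
  int pending_blank = 0;
  assert(in != NULL && out != NULL && size > 0);
  while (*in != '\0') {
    if (in[0] == '/' && in[1] == '*') {
      /* Skip the comment body and its closing delimiter, if present. */
      in += 2;
      while (*in != '\0' && !(in[0] == '*' && in[1] == '/')) in++;
      if (*in != '\0') in += 2;
      pending_blank = 1;
    } else if (isspace((unsigned char)*in)) {
      pending_blank = 1;
      in++;
    } else {
      if (pending_blank && n > 0 && n + 1 < size) out[n++] = ' ';
      pending_blank = 0;
      if (n + 1 < size) out[n++] = *in;
      in++;
    }
  }
  out[n] = '\0';
}

/* Two spellings name the same symbol if they normalize identically.
   The comparison is case sensitive. */
int same_symbol(const char *a, const char *b) {
  char na[128], nb[128];
  normalize_symbol(a, na, sizeof na);
  normalize_symbol(b, nb, sizeof nb);
  return strcmp(na, nb) == 0;
}
```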
5533 ## | |
5534 | |
5535 Symbol Table | |
5536 | |
5537 The Symbol Table lists all the symbols, or names, you | |
5538 used in your grammar. ©Symbolªs may be used, of course, | |
5539 to identify ©tokensª, ©definitionsª, ©virtual | |
5540 productionsª, ©immediate actionªs, or ©keywordªs. | |
5541 | |
5542 Each line in this table identifies a single symbol. The | |
5543 first field is the token number, if any. This is | |
5544 followed by the name. If the name identifies an | |
5545 ©expressionª or virtual production, it is followed by an | |
5546 equal sign and the expression or virtual production. | |
5547 ## | |
5548 | |
5549 Syntax Analysis Aborted | |
5550 | |
5551 This ©warningª message appears if, because of previous | |
5552 errors, AnaGram is unable to complete the ©Analyze | |
5553 Grammarª command on your ©syntax fileª. | |
5554 ## | |
5555 | |
5556 Syntax Directed Parsing | |
5557 | |
5558 Syntax directed parsing, or formal parsing, is an | |
5559 approach to building ©parsersª based on formal language | |
5560 theory. Given a suitable description of a language, | |
5561 called a ©grammarª, there are algorithms which can be | |
5562 used to create parsers for the language automatically. | |
5563 In this context, the set of all possible inputs to a | |
5564 program may be considered to constitute a language, and | |
5565 the rules for formulating the input to the program | |
5566 constitute the grammar for the language. | |
5567 | |
5568 The parsers built from a grammar have the advantage | |
5569 that they can recognize any input that conforms to the | |
5570 rules, and can reject as erroneous any input that fails | |
5571 to conform. | |
5572 | |
5573 Since the program logic necessary to parse input is | |
5574 often extremely intricate, programs which use formal | |
5575 parsing are usually much more reliable than those built | |
5576 by hand. They are also much easier to maintain, since | |
5577 it is much easier to modify a grammar specification | |
5578 than it is to modify complex program logic. | |
5579 ## | |
5580 | |
5581 Syntax Error | |
5582 | |
5583 When you specify a ©grammarª, you specify a set of | |
5584 input character or token sequences which your ©parserª | |
5585 will "recognize". Usually it is possible for there to | |
5586 be other sequences of input tokens which deviate from | |
5587 the rules set down by your grammar. Should your parser | |
5588 find such a sequence in its input which is not | |
5589 explicitly allowed for in your grammar, it is said to | |
5590 have found a "syntax error". The general treatment of | |
5591 syntax errors is called ©error handlingª, of which there | |
5592 are two distinct aspects: ©error diagnosisª and ©error | |
5593 recoveryª. AnaGram allows you to make provision for | |
5594 error handling to fit your needs, but should you not do | |
5595 so, it will provide simple default error handling. | |
5596 ## | |
5597 | |
5598 Statements | |
5599 | |
5600 AnaGram source files, or ©syntax fileªs, consist of | |
5601 the following types of statements: | |
5602 ©productionªs | |
5603 ©configuration sectionªs | |
5604 ©embedded Cª | |
5605 ©definitionªs | |
5606 ©token declarationªs | |
5607 | |
5608 Statements may be in any order. Each statement must | |
5609 begin on a new line. If a statement cannot be | |
5610 construed as complete, it may continue onto another | |
5611 line. | |
5612 | |
5613 Statements may contain spaces, tabs or comments, but | |
5614 may not contain blank lines. | |
5615 ## | |
5616 | |
5617 Syntax File | |
5618 | |
5619 Input files to AnaGram are called syntax files. The | |
5620 default extension for syntax files is .syn. A | |
5621 syntax file contains a "©grammarª" and supporting C or | |
5622 C++ code. The file consists of several distinct types | |
5623 of statements. These are ©token declarationsª, | |
5624 ©productionªs, ©definitionsª, ©embedded Cª, and | |
5625 ©configuration sectionsª. There may be as many of each | |
5626 as you need, in whatever order you find convenient. | |
5627 | |
5628 Each such statement begins on a new line. | |
5629 ## | |
5630 | |
5631 SYNTAX_ERROR | |
5632 | |
5633 SYNTAX_ERROR is a macro which your parser will invoke | |
5634 when it encounters a syntax error in its input stream. | |
5635 If you have set the ©diagnose errorsª ©configuration | |
5636 switchª, the static variable ©PCBª.©syntax_errorª will | |
5637 contain a pointer to a diagnostic message when | |
5638 SYNTAX_ERROR is invoked. If you have also set the | |
5639 ©error frameª switch, ©PCBª.©error_frame_ssxª and | |
5640 ©PCBª.©error_frame_tokenª will also be set | |
5641 appropriately. | |
5642 ## | |
5643 | |
5644 Tab Spacing | |
5645 | |
5646 "tab spacing" is a ©configuration parameterª which | |
5647 controls the expansion of tabs when AnaGram displays | |
5648 your source file or test files in the ©File Traceª window. | |
5649 | |
5650 The value of "tab spacing" is also used to set the | |
5651 default value of the ©TAB_SPACINGª macro in your parser. | |
5652 | |
5653 The default value of "tab spacing" is 8. If you prefer | |
5654 a different value, you should probably include an | |
5655 appropriate statement in your ©configuration fileª. For | |
5656 example: | |
5657 | |
5658 tab spacing = 2 | |
5659 ## | |
5660 | |
5661 TAB_SPACING | |
5662 | |
5663 If you have enabled the ©lines and columnsª switch, your | |
5664 parser needs to know tab spacing in order to increment | |
5665 the column count when it encounters a tab character. It | |
5666 is set up to use the value given by the TAB_SPACING | |
5667 macro. If you do not define TAB_SPACING in your parser, | |
5668 AnaGram will provide a default definition, setting it to | |
5669 the value of the ©tab spacingª ©configuration | |
5670 parameterª. | |
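
A minimal C sketch of the column arithmetic follows. The helper next_column is hypothetical, the fallback value 8 merely stands in for the default that AnaGram would supply, and columns are assumed to be counted from 1 with tab stops every TAB_SPACING columns.

```c
#include <assert.h>

#ifndef TAB_SPACING
#define TAB_SPACING 8   /* stand-in for the default definition */
#endif

/* Hypothetical helper: returns the column reached after reading
   character c at column col. Columns are assumed to start at 1. */
int next_column(int col, int c) {
  assert(col >= 1);
  if (c == '\t') return col + TAB_SPACING - (col - 1) % TAB_SPACING;
  return col + 1;
}
```

With a spacing of 8, a tab read anywhere in columns 1 through 8 moves to column 9 in this model.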
5671 ## | |
5672 | |
5673 Terminal, Terminal Token | |
5674 | |
5675 A "terminal token" is a token which does not appear on | |
5676 the left side of a ©productionª. It represents, | |
5677 therefore, a basic unit of input to your ©parserª. If | |
5678 the input to your parser consists of ascii characters, | |
5679 you may define terminal tokens explicitly as ascii | |
5680 characters or as sets of ascii characters. If you have a | |
5681 lexical scanner, or preprocessor, which produces numeric | |
5682 codes, you may define the terminal tokens directly in | |
5683 terms of these numeric codes. | |
5684 ## | |
5685 | |
5686 Test File Binary | |
5687 | |
5688 "Test file binary" is a ©configuration switchª which | |
5689 defaults to off. When it is off, and you select the | |
5690 ©File Traceª option, AnaGram will read your test files | |
5691 in "text" mode, discarding carriage return characters. | |
5692 When "test file binary" is on, AnaGram will read test | |
5693 files in "binary" mode, preserving carriage return | |
5694 characters. | |
5695 | |
5696 If your parser needs to recognize carriage return | |
5697 characters explicitly, you should turn "test file | |
5698 binary" on. | |
5699 ## | |
5700 | |
5701 Test File Mask | |
5702 | |
5703 "Test file mask" is a string-valued ©configuration | |
5704 parameterª which AnaGram uses to set up the file dialog | |
5705 for the ©File Traceª command. It defaults to "*.*". If | |
5706 there is a conventional file name format for the input | |
5707 to the ©parserª you are developing, you will probably | |
5708 want to set "test file mask" in a ©configuration | |
5709 sectionª in your ©syntax fileª so it is easier to pick | |
5710 out your test files. | |
5711 ## | |
5712 | |
5713 Test range | |
5714 | |
5715 "Test range" is a ©configuration switchª which defaults | |
5716 to on. When it is set, i.e., on, AnaGram will configure | |
5717 your parser so that it checks input characters to | |
5718 verify that they are within the range given by the | |
5719 ©character universeª before it indexes the ©token | |
5720 conversionª table. If range testing is not necessary | |
5721 for your parser, you may turn test range off and get a | |
5722 slight improvement in the performance of your parser. | |
5723 ## | |
5724 | |
5725 Thread Safe Parsers | |
5726 | |
5727 AnaGram 2.01 incorporates several changes designed to make it | |
5728 easier to write thread safe parsers. | |
5729 | |
5730 First, the ©parserªs generated by AnaGram 2.01 no longer use static or global | |
5731 variables to store temporary data. All nonconstant data have been | |
5732 moved to the ©parser control blockª. | |
5733 | |
5734 Second, two new features which make it substantially | |
5735 easier to build thread safe parsers have been added. The ©reentrant parserª switch | |
5736 makes the entire parser reentrant, by passing the pointer to the parser control | |
5737 block as an argument on all function calls. The ©extend pcbª statement allows | |
5738 you to add your own variable declarations to the ©parser control | |
5739 blockª so you can avoid references to global or static variables in | |
5740 your ©reduction procedureªs. | |
5741 | |
5742 Third, new support has been added for C++ classes, including | |
5743 the ©wrapperª statement and the ©PCB_TYPEª macro. | |
5744 ## | |
5745 | |
5746 token_number | |
5747 | |
5748 token_number is a field in a ©parser control blockª to | |
5749 which your ©error handlingª procedures and ©reduction | |
5750 procedureªs may refer. It contains the actual ©token | |
5751 numberª of the current input token. If you are supplying | |
5752 token numbers directly, it is the result of using the | |
5753 actual input character to index the ©token conversionª | |
5754 array, ag_tcv. | |
5755 ## | |
5756 | |
5757 Token | |
5758 | |
5759 Tokens are the units with which your parser works. | |
5760 There are two kinds of tokens: ©terminal tokensª and | |
5761 ©nonterminal tokensª. These latter are identified by the | |
5762 parser as sequences of tokens. The grouping of tokens | |
5763 into more complex tokens is governed by the ©grammar | |
5764 rulesª, or ©productionªs in your grammar. In your | |
5765 grammar, tokens are denoted by ©token nameªs, ©virtual | |
5766 productionsª, explicit ©character representationsª, | |
5767 ©keywordªs, ©immediate actionªs, or ©expressionªs which | |
5768 yield ©character setsª. | |
5769 ## | |
5770 | |
5771 Token Conversion | |
5772 | |
5773 By using ©character setª ©expressionªs, you may in your | |
5774 ©syntax fileª define a number of input characters as | |
5775 being syntactically equivalent. When your ©parserª gets | |
5776 an input character, it uses the character code to index | |
5777 a table called ©ag_tcvª. The value it extracts from this | |
5778 table is the ©token numberª for the input character. The | |
5779 actual character code of the input character becomes the | |
5780 ©token valueª. | |
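
The lookup described above can be sketched in C. The table, the token numbers and the function names are all hypothetical. In a generated parser the table is ag_tcv and its contents come from the grammar. The sketch also assumes an ASCII-compatible character set.

```c
#include <assert.h>

/* Hypothetical token numbers for three sets of equivalent characters. */
enum { TOKEN_OTHER = 1, TOKEN_DIGIT = 2, TOKEN_LETTER = 3 };

typedef struct {
  int token_number;  /* syntactic identity, taken from the table */
  int token_value;   /* semantic value: the character code itself */
} InputToken;

static unsigned char conversion_table[256];

/* Fill the table so that all digits share one token number and all
   letters share another. */
void build_conversion_table(void) {
  int c;
  for (c = 0; c < 256; c++) conversion_table[c] = TOKEN_OTHER;
  for (c = '0'; c <= '9'; c++) conversion_table[c] = TOKEN_DIGIT;
  for (c = 'a'; c <= 'z'; c++) conversion_table[c] = TOKEN_LETTER;
  for (c = 'A'; c <= 'Z'; c++) conversion_table[c] = TOKEN_LETTER;
}

/* Convert one input character into a token number and a token value. */
InputToken convert_input(unsigned char code) {
  InputToken t;
  assert(conversion_table[code] != 0);  /* table must have been built */
  t.token_number = conversion_table[code];
  t.token_value = code;
  return t;
}
```

In this model 'q' and 'Z' receive the same token number but keep different token values.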
5781 ## | |
5782 | |
5783 Token Declaration | |
5784 | |
5785 A token declaration is simply a ©productionª with no | |
5786 right hand side. Token declarations can be used to | |
5787 define the ©data typeªs of tokens. To define the data type | |
5788 of a token, simply put the data type in parentheses | |
5789 preceding the name of the token. You can use a list of | |
5790 tokens joined by commas, if you wish. Thus: | |
5791 (char *) variable name, function name | |
5792 could be used to specify that the ©semantic valueªs of | |
5793 the tokens "variable name" and "function name" are both | |
5794 character pointers. | |
5795 | |
5796 Of course, token types may be specified as part of any | |
5797 production the token generates, but sometimes, in the | |
5798 interest of clarity, it is advisable to group all | |
5799 declarations together. | |
5800 ## | |
5801 | |
5802 Token Name | |
5803 | |
5804 All ©nonterminal tokensª that you define in your | |
5805 ©grammarª by means of explicit ©productionªs must have | |
5806 names by which they may be referenced. Token names are | |
5807 ©symbolsª which represent the token syntactically in | |
5808 your grammar specification. | |
5809 ## | |
5810 | |
5811 Token Names | |
5812 | |
5813 "Token names" is a ©configuration switchª that defaults | |
5814 to off. If it is set, it causes AnaGram to include in | |
5815 the ©parser fileª a static array of character strings, indexed by | |
5816 token number, which provides ascii representations of token | |
5817 names. The name of this array is given by "<parser name>_token_names", | |
5818 where <parser name> is the name of the parser function as | |
5819 given by the value of the ©parser nameª parameter. | |
5820 | |
5821 AnaGram also defines a macro, ©TOKEN_NAMESª, which evaluates | |
5822 to the name of the array. | |
5823 | |
5824 The array contains strings for all grammar tokens which have | |
5825 been explicitly named in the syntax file as well as tokens | |
5826 which represent ©keywordªs or single character constants. | |
5827 | |
5828 The array is useful in creating ©syntax errorª diagnostics. | |
5829 | |
5830 Prior to version 2.01 of AnaGram, the TOKEN_NAMES array contained | |
5831 strings only for explicitly named tokens. If this restriction | |
5832 is required, set the ©token names onlyª switch. | |
5833 | |
5834 Token names are also included if the ©diagnose errorsª | |
5835 switch is set. | |
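As an illustration of the diagnostic use mentioned above, here is a minimal sketch in C. It assumes a parser named "demo". The array demo_token_names and its entries are hand-written stand-ins for what AnaGram would generate, and describe_token is a hypothetical helper, not part of AnaGram.

```c
#include <stdio.h>
#include <stddef.h>

/* Stand-in for the array AnaGram would place in the parser file.
   The parser name "demo" and all entries are invented for illustration. */
static const char *const demo_token_names[] = {
  "",            /* token 0: no name available */
  "expression",  /* explicitly named token */
  "\"while\"",   /* keyword */
  "'+'"          /* single character constant */
};
#define TOKEN_NAMES demo_token_names
#define N_TOKENS (sizeof demo_token_names / sizeof demo_token_names[0])

/* Hypothetical helper: format a syntax error message for a token number.
   Falls back to the bare number when the array has no usable string. */
static int describe_token(char *buf, size_t size, unsigned token) {
  if (token < N_TOKENS && TOKEN_NAMES[token][0] != '\0')
    return snprintf(buf, size, "unexpected %s", TOKEN_NAMES[token]);
  return snprintf(buf, size, "unexpected token %u", token);
}
```

In a real parser the array and the macro come from the generated parser file, so only a helper along the lines of describe_token would be written by hand.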
5836 ## | |
5837 | |
5838 TOKEN_NAMES | |
5839 | |
5840 "TOKEN_NAMES" is the name of a macro that AnaGram defines to | |
5841 provide access to a static array of character strings indexed by | |
5842 token number, which provides ASCII representations of token | |
5843 names. The array is generated if any of the ©token namesª, | |
5844 ©token names onlyª or ©diagnose errorsª switches are ON. | |
5845 | |
5846 If ©token names onlyª is set, the array contains non-empty | |
5847 strings only for those tokens which are explicitly named | |
5848 in the syntax file. Otherwise, the array also contains | |
5849 strings for tokens which represent keywords or single | |
5850 character constants. | |
5851 ## | |
5852 | |
5853 | |
5854 token names only | |
5855 | |
5856 "Token names only" is a ©configuration switchª that defaults to | |
5857 off. If it is set, it will cause AnaGram to include in the | |
5858 parser file a static array containing the names of the tokens | |
5859 in your grammar. This array will include only those tokens | |
5860 to which you have assigned names explicitly and will not | |
5861 include character constants or keywords. "Token names only" | |
5862 takes precedence over ©token namesª. | |
5863 ## | |
5864 | |
5865 Token Not Used | |
5866 | |
5867 "Token not used, TXXX: <token name>" is a ©warningª | |
5868 message which appears if AnaGram finds an unused ©tokenª | |
5869 in your ©grammarª. Often an unused token is the result | |
5870 of an oversight of some kind and indicates a problem in | |
5871 the grammar. | |
5872 ## | |
5873 | |
5874 Token Number | |
5875 | |
5876 AnaGram assigns a unique number, called the "token | |
5877 number" to each token in the grammar, no matter whether | |
5878 it is a ©terminal tokenª or a ©nonterminal tokenª. Your | |
5879 parser does all of its analysis of your input stream | |
5880 using token numbers as its primary material. | |
5881 | |
5882 You may need to know the values of token numbers that | |
5883 AnaGram has assigned, either so a lexical scanner can | |
5884 output correct token numbers, or so a ©reduction | |
5885 procedureª can correctly resolve a ©semantically | |
5886 determined productionª. | |
5887 | |
5888 To help you, AnaGram defines enumeration constants for | |
5889 each of the named tokens in your grammar. The definition | |
5890 of these constants is in the ©parser headerª file. | |
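The following C sketch shows how a hand-written lexical scanner might use such constants. The enumeration here is a hypothetical stand-in: the constant names and values are invented, and the real ones should be taken from the parser header that AnaGram writes. The function classify is likewise only an illustration.

```c
#include <ctype.h>

/* Hypothetical stand-in for the enumeration in a parser header.
   Real constant names and values come from the file AnaGram writes. */
enum demo_token {
  DEMO_EOF_TOKEN = 1,
  DEMO_NAME_TOKEN = 2,
  DEMO_NUMBER_TOKEN = 3,
  DEMO_OTHER_TOKEN = 4
};

/* Hypothetical scanner step: map the first character of a lexeme
   to the token number the parser expects. */
static enum demo_token classify(int c) {
  if (c == '\0') return DEMO_EOF_TOKEN;
  if (isalpha((unsigned char)c) || c == '_') return DEMO_NAME_TOKEN;
  if (isdigit((unsigned char)c)) return DEMO_NUMBER_TOKEN;
  return DEMO_OTHER_TOKEN;
}
```

Using the named constants rather than literal numbers keeps the scanner correct if AnaGram renumbers tokens after the grammar changes.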
5891 ## | |
5892 | |
5893 Token Representation | |
5894 | |
5895 Not all of the ©tokensª in your grammar have a ©token | |
5896 nameª. Some of the tokens may represent ©character setsª | |
5897 which you spelled out explicitly, ©virtual productionsª, | |
5898 ©immediate actionªs, or ©keywordªs. In its analysis | |
5899 tables, AnaGram tries to provide a meaningful | |
5900 representation for tokens whenever it can. Its first | |
5901 choice is to use the name, if it has one. Otherwise it | |
5902 will use the set definition or the definition of the | |
5903 virtual production if one exists. If AnaGram cannot | |
5904 otherwise represent your token, it will resort to using | |
5905 the token number which it normally represents using the | |
5906 letter T followed by a three digit, zero-padded token | |
5907 number. | |
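The order of preference described above can be sketched in C as follows. This is an illustration only, not AnaGram's own code. The function token_representation and its parameters are hypothetical.

```c
#include <stdio.h>
#include <stddef.h>

/* Choose a representation for a token: its name if it has one,
   otherwise its set or virtual production definition if one exists,
   otherwise 'T' followed by a three digit, zero-padded token number. */
static int token_representation(char *buf, size_t size, const char *name,
                                const char *definition, unsigned number) {
  if (name != NULL && name[0] != '\0')
    return snprintf(buf, size, "%s", name);
  if (definition != NULL && definition[0] != '\0')
    return snprintf(buf, size, "%s", definition);
  return snprintf(buf, size, "T%03u", number);
}
```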
5908 ## | |
5909 | |
5910 Token Table | |
5911 | |
5912 The Token Table lists all the tokens of your grammar. | |
5913 The first field is the token number. It is followed by a | |
5914 flag field which is "zl" if the token is a ©nonterminal | |
5915 tokenª and is ©zero lengthª. If the token is nonterminal | |
5916 and not zero length, the flag field contains "nt". If | |
5917 the token is a ©terminal tokenª, the field is blank. | |
5918 | |
5919 The next field is blank unless the token has been | |
5920 declared ©stickyª or has had a ©precedenceª level | |
5921 assigned. If the token is sticky, this field will | |
5922 contain 's'. If a precedence level has been assigned, | |
5923 this field will contain the letter 'l', 'r', or 'n' to | |
5924 indicate associativity followed by the precedence | |
5925 level. Finally there is the ©data typeª of the ©semantic | |
5926 valueª of this token and the ©token representationª. | |
5927 ## | |
5928 | |
5929 Token Usage | |
5930 | |
5931 The Token Usage table may be accessed via the ©Auxiliary | |
5932 Windowsª menu from any window that identifies tokens. It | |
5933 shows all the rules in the grammar that use the token. | |
5934 ## | |
5935 | |
5936 Top Margin | |
5937 | |
5938 "Top margin" is an ©obsolete configuration parameterª. | |
5939 ## | |
5940 | |
5941 Trace Coverage | |
5942 | |
5943 Trace Coverage is a table which is built whenever you | |
5944 run ©Grammar Traceª, one of its pre-built versions, or a ©File | |
5945 Traceª. You can access it from the ©Browse Menuª. It shows the number | |
5946 of times each rule in your grammar has been reduced. Unless you have | |
5947 set the ©Rule Coverageª ©configuration switchª, some ©null productionªs | |
5948 and some rules that consist of only one element will not be counted | |
5949 because of speed optimizations in the parser tables. | |
5950 | |
5951 The Trace Coverage tables are reset to zero when you load a new syntax | |
5952 file or start AnaGram. | |
5953 ## | |
5954 | |
5955 Compound Action | |
5956 | |
5957 Traditionally, ©LALR-1 parserªs use only four simple | |
5958 ©parser actionªs: shift, reduce, accept and error. | |
5959 AnaGram parsers use a number of compound actions | |
5960 in order to reduce the size of parse tables and | |
5961 speed up processing. A single compound action | |
5962 may replace several simple shift or reduce actions. | |
5963 | |
5964 The ©Traditional Engineª ©configuration switchª may | |
5965 be used to force AnaGram to use only the simple | |
5966 actions. | |
5967 ## | |
5968 | |
5969 Traditional Engine | |
5970 | |
5971 "Traditional engine" is a ©configuration switchª that | |
5972 defaults to off. Traditional ©LALR-1 parserªs use a | |
5973 ©parsing engineª which has only four actions: | |
5974 ©shift actionª | |
5975 ©reduce actionª | |
5976 ©accept actionª | |
5977 ©error actionª | |
5978 | |
5979 | |
5980 AnaGram, in the interest of | |
5981 faster execution and more compact parse tables, | |
5982 uses a parsing engine with a number of | |
5983 short-cut, or ©compound actionªs. The "traditional engine" switch tells | |
5984 AnaGram not to use the short-cut actions. | |
5985 | |
5986 You would turn this switch on if you wished to use the ©Grammar Traceª | |
5987 or ©File Traceª to see how the standard four parser actions work for | |
5988 a particular combination of grammar and input. Note that to see the | |
5989 effects of single parser actions, you must use the ©Single Stepª | |
5990 button. Remember that in the Grammar Trace, when you single step and | |
5991 the token you have selected causes a reduce action, it will appear | |
5992 on the ©lookahead lineª of the ©parser stack paneª and will be preselected | |
5993 in the ©allowable input paneª until it is finally shifted in to | |
5994 the parser stack. | |
5995 | |
5996 Normally, you should leave the "traditional engine" switch off. Then | |
5997 AnaGram will, whenever possible, compress several parsing actions into | |
5998 one compound action in order to speed execution of the parser. | |
5999 | |
6000 Unfortunately, use of the term "traditional" has sometimes created the | |
6001 impression that there is a conservative aspect to the operation of | |
6002 traditional engine parsers. This is not the case. They have the same | |
6003 effect, but are slower and have much larger tables. | |
6004 ## | |
6005 | |
6006 Type Redefinition | |
6007 | |
6008 "Type Redefinition of TXXX: <token name>" is a ©warningª | |
6009 message which appears when AnaGram finds a conflicting | |
6010 ©data typeª definition for a ©tokenª in your ©grammarª. | |
6011 The new definition will override the previous one. If | |
6012 you intend to use different type definitions, you should | |
6013 use extreme caution and check the generated code to | |
6014 verify that your ©reduction procedureªs are getting the | |
6015 values you intended. | |
6016 ## | |
6017 | |
6018 Undefined Symbol | |
6019 | |
6020 "Undefined symbol: <name>" is a ©warningª message which | |
6021 appears when AnaGram encounters an undefined ©symbolª | |
6022 while evaluating a ©character setª expression. The | |
6023 following warning in the ©Warningsª window identifies | |
6024 the particular ©tokenª AnaGram was trying to evaluate. | |
6025 ## | |
6026 | |
6027 Undefined Token | |
6028 | |
6029 "Undefined token TXXX: <name>" is a ©warningª message | |
6030 which appears when the indicated ©tokenª has been used | |
6031 in the ©grammarª, but there is no definition of it as a | |
6032 ©terminal tokenª nor does any ©productionª define it as | |
6033 a ©nonterminal tokenª. | |
6034 ## | |
6035 | |
6036 Unexpected | |
6037 | |
6038 "Unexpected <element 1> in <element 2>" is a ©warningª | |
6039 message which you may get when AnaGram analyzes your | |
6040 grammar. It appears when AnaGram unexpectedly encounters an instance of | |
6041 syntactic element 1 at the specified location in an instance of | |
6042 syntactic element 2. AnaGram cannot reliably continue parsing its | |
6043 input. Therefore, it limits further analysis to scanning for syntax | |
6044 errors. If this error is not the result of a prior error, you should | |
6045 correct your ©syntax fileª. Remember that this error could result from | |
6046 something missing just as well as from something extraneous. | |
6047 | |
6048 If element 1 is ©eofª, it often means that you have | |
6049 an unbalanced brace or comment delimiter in the code | |
6050 following the indicated location. | |
6051 ## | |
6052 | |
6053 Union | |
6054 | |
6055 The union of two sets is the set of all elements that | |
6056 are to be found in one or another of the two sets. In an | |
6057 AnaGram syntax file the union of two ©character setsª A | |
6058 and B is represented using the plus sign, as in A + B. | |
6059 The union operator has the same precedence as the | |
6060 ©differenceª operator: lower than that of ©intersectionª | |
6061 and ©complementª. The union operator is ©left | |
6062 associativeª. | |
6063 | |
6064 Watch out! In an AnaGram syntax file 65 + 97 represents | |
6065 the character set which consists of the lower case 'a' | |
6066 and upper case 'A'. It does not represent 162, the sum | |
6067 of 65 and 97. | |
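The difference between set union and arithmetic addition can be made concrete with a small C sketch. The charset type and both functions are hypothetical and are not part of AnaGram. They simply model a character set as one membership flag per character code.

```c
#include <string.h>

/* A character set over codes 0..255, one membership flag per code. */
typedef struct { unsigned char member[256]; } charset;

/* The set containing a single character code. */
static charset set_of(int code) {
  charset s;
  memset(&s, 0, sizeof s);
  s.member[code & 0xFF] = 1;
  return s;
}

/* The union: every code found in one or the other of the two sets. */
static charset set_union(charset a, charset b) {
  charset r;
  int i;
  for (i = 0; i < 256; i++)
    r.member[i] = (unsigned char)(a.member[i] | b.member[i]);
  return r;
}
```

With these definitions, set_union(set_of(65), set_of(97)) has exactly two members, 'A' and 'a', and code 162 is not among them.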
6068 ## | |
6069 | |
6070 Video mode | |
6071 | |
6072 "Video mode" is an ©obsolete configuration parameterª. | |
6073 ## | |
6074 | |
6075 Virtual Production | |
6076 | |
6077 Virtual productions are a special shorthand | |
6078 representation of ©grammar rulesª which can be used to | |
6079 indicate a choice of inputs. They are an important | |
6080 convenience, especially useful when you are first | |
6081 building a grammar. | |
6082 | |
6083 Here are some examples of virtual productions: | |
6084 name? // optional name | |
6085 name?... // 0 or more instances of name | |
6086 {name | number} // exactly one name or number | |
6087 {name | number}... // one or more instances of name or number | |
6088 [name | number] // optional choice of name or number | |
6089 [name | number]... // zero or more instances of name or number | |
6090 | |
6091 AnaGram rewrites virtual productions, so that when you | |
6092 look at the syntax tables in AnaGram, there will be | |
6093 actual ©productionªs replacing the virtual productions. | |
6094 | |
6095 A virtual production appears as one of the rule | |
6096 elements in a grammar rule, i.e. as one of the members | |
6097 of the list on the right side of a production. | |
6098 | |
6099 The simplest virtual production is the "optional" | |
6100 token. If x is an arbitrary token, x? can be used to | |
6101 indicate an optional x. | |
6102 | |
6103 Related virtual productions are x... and x?... where | |
6104 the three dots indicate repetition. x... represents an | |
6105 arbitrary number of occurrences of x, but at least one. | |
6106 x?... represents zero or more occurrences of x. | |
6107 | |
6108 The remaining virtual productions use curly or square | |
6109 brackets to enclose a sequence of rules. The brackets | |
6110 may be followed variously by nothing, a string of three | |
6111 dots, or a slash, to indicate the choices to be made | |
6112 from the rules. Note that rules may be used, not merely | |
6113 tokens. | |
6114 | |
6115 If r1 through rn are a set of ©grammar rulesª, then | |
6116 {r1 | r2 | ... | rn} | |
6117 is a virtual production that allows a choice of exactly | |
6118 one of the rules. Similarly, | |
6119 {r1 | r2 | ... | rn}... | |
6120 is a virtual production that allows a choice of one or | |
6121 more of the rules. And, finally, | |
6122 {r1 | r2 | ... | rn}/... | |
6123 is a virtual production that allows a choice of one or | |
6124 more of the rules subject to the side condition that | |
6125 rules must alternate, that is, that no rule can follow | |
6126 itself immediately without the interposition of some | |
6127 other rule. This is a case that is not particularly | |
6128 easy to write by hand, but is quite useful in a number | |
6129 of contexts. | |
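The side condition on {r1 | r2 | ... | rn}/... can be stated precisely with a short C sketch. It is an illustration of the condition only, not of how AnaGram rewrites the virtual production. The function alternates is hypothetical, and the rules are identified by arbitrary integer indices.

```c
#include <stddef.h>

/* Returns 1 if the sequence of rule indices satisfies the condition
   for {r1 | ... | rn}/... : at least one rule is present and no rule
   immediately follows itself. Returns 0 otherwise. */
static int alternates(const int *rules, size_t count) {
  size_t i;
  if (count == 0) return 0;
  for (i = 1; i < count; i++)
    if (rules[i] == rules[i - 1]) return 0;
  return 1;
}
```

For example, the sequence r1, r2, r1, r3 is accepted, while r1, r2, r2 is rejected because r2 follows itself.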
6130 | |
6131 If the above virtual productions are written with [] | |
6132 instead of {}, they all become optional. [] is an | |
6133 optional choice, []... is zero or more choices, and | |
6134 []/... is zero or more alternating choices. | |
6135 | |
6136 Null productions are not permitted in virtual | |
6137 productions in those cases where they would cause an | |
6138 intrinsic ambiguity. | |
6139 | |
6140 You may use a ©definitionª statement to assign a name to | |
6141 a virtual production. | |
6142 ## | |
6143 | |
6144 Void token | |
6145 | |
6146 "Void token, <token name>, used as parameter" is a | |
6147 ©warningª message which appears if AnaGram encounters a | |
6148 ©data typeª definition declaring a ©tokenª to have type | |
6149 void when the token has previously been used in a | |
6150 ©parameter assignmentª for a ©reduction procedureª. Your | |
6151 C or C++ compiler will complain when it tries to compile | |
6152 the call to the reduction procedure. | |
6153 ## | |
6154 | |
6155 vs | |
6156 | |
6157 vs is a field in a ©parser control blockª to which your | |
6158 ©error handlingª procedures and ©reduction procedureªs | |
6159 may refer. It is the ©parser value stackª for your | |
6160 parser. The ©semantic valuesª of the ©tokensª identified | |
6161 by the parser are stored in the value stack. The value | |
6162 stack, like the other ©parser stacksª, is indexed by | |
6163 ©PCBª.©ssxª. When you are executing a reduction | |
6164 procedure, PCB.vs[PCB.ssx] contains the semantic value | |
6165 of the first token in the grammar rule you are reducing, | |
6166 PCB.vs[PCB.ssx+1] contains the second, and so forth. The | |
6167 return value from your reduction procedure will be | |
6168 stored in turn in PCB.vs[PCB.ssx]. | |
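The indexing described above can be sketched in C with a deliberately simplified model. The demo_pcb structure is a hypothetical stand-in: a real parser control block has many more fields, and its vs holds values of the union type described below rather than plain ints. The rule and both functions are also invented for illustration.

```c
/* Hypothetical, simplified stand-in for a parser control block. */
typedef struct {
  int ssx;      /* stack index of the first token of the rule */
  int vs[16];   /* value stack, here holding plain ints */
} demo_pcb;

/* Reduction procedure for a hypothetical rule:  sum -> term, '+', term */
static int reduce_sum(int a, int b) { return a + b; }

/* What happens around the call, following the description above. */
static void reduce(demo_pcb *pcb) {
  int first = pcb->vs[pcb->ssx];       /* value of the first rule element */
  int third = pcb->vs[pcb->ssx + 2];   /* value of the third rule element */
  pcb->vs[pcb->ssx] = reduce_sum(first, third);  /* result replaces the first */
}
```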
6169 | |
6170 vs is defined to be of type $_vt, where "$" represents | |
6171 the name of your parser. AnaGram defines $_vt to | |
6172 be a union of fields of sizes corresponding to all the | |
6173 different data types declared in your syntax for the | |
6174 semantic values of your tokens. In order to avoid | |
6175 restrictions on the use of C++ classes, the fields are | |
6176 defined as character arrays. On some processors which | |
6177 have byte alignment restrictions for multibyte data, | |
6178 you might encounter a bus error. To correct this | |
6179 problem, set the ©parser stack alignmentª parameter to | |
6180 an appropriate data type. | |
6181 ## | |
6182 | |
6183 Warning | |
6184 | |
6185 If, while analyzing your syntax file, AnaGram finds | |
6186 something suspicious, it is likely to issue a warning. | |
6187 The Warnings window will pop up automatically when the | |
6188 analysis has been completed. If the warning is for a | |
6189 ©syntax errorª in your input file, you will have to fix | |
6190 it, because AnaGram cannot successfully interpret it. | |
6191 Otherwise, AnaGram will be able to create a ©parserª for | |
6192 you, if you wish, no matter how serious the warnings may | |
6193 be. | |
6194 | |
6195 You can bring up the Help topic associated with a highlighted warning | |
6196 by pressing F1 or by clicking with a ©Help Cursorª. | |
6197 | |
6198 If you have syntax errors, AnaGram will synchronize the | |
6199 cursor in the ©syntax fileª window with the cursor in the | |
6200 Warnings window so that whenever the Warnings window is | |
6201 active, the cursor bar in the syntax file window will | |
6202 identify the location of the error. | |
6203 | |
6204 ## | |
6205 | |
6206 What's New | |
6207 | |
6208 Changes in AnaGram 2.40 | |
6209 | |
6210 Most of the changes in AnaGram 2.40 are under the hood - cleanup of | |
6211 source files, reorganization of the source tree, revision of build and | |
6212 test procedures, and so forth, in preparation for the open source | |
6213 release. All of this will, with luck, be invisible to the end user. | |
6214 | |
6215 Open Source | |
6216 | |
6217 AnaGram is now ©open sourceª. AnaGram itself | |
6218 uses the 4-clause BSD ©licenseª; the ©parsing engineª, and thus the output | |
6219 files, are licensed with the less restrictive zlib ©licenseª. Source | |
6220 distributions are available from http://www.parsifalsoft.com. | |
6221 | |
6222 The manual has been re-typeset using LaTeX instead of WordPerfect. | |
6223 The typographic consistency and formatting have been considerably | |
6224 improved; unfortunately, the pagination is now completely different, | |
6225 so page numbers are not portable to the new version. | |
6226 | |
6227 All the logic dealing with registration, trial copies, serial numbers, | |
6228 and so forth has been removed. | |
6229 | |
6230 Unix Support | |
6231 | |
6232 The Unix build of the ©command line versionª of AnaGram (agcl) is now | |
6233 supported and available to the public. There is at present no GUI for | |
6234 the Unix version. The long-term goal is to migrate the AnaGram GUI | |
6235 away from the closed (and orphaned) IBM Visual Age class library to | |
6236 something else, probably GTK, so as to support both Windows and Unix. | |
6237 | |
6238 Improved Functionality | |
6239 | |
6240 Examples. The examples have been adjusted to the current dialect of | |
6241 C++ and are now compilable again. The legacy "classlib" code some | |
6242 still depend on is being phased out. | |
6243 | |
6244 Increased Convenience | |
6245 | |
6246 File names. File names in the AnaGram distribution and source | |
6247 tree are no longer limited to 8+3 characters, and quite a few now have | |
6248 less cryptic names. Additionally, all HTML files are now named ".html", | |
6249 not ".htm". | |
6250 | |
6251 Installed files. The AnaGram.cgb and AnaGram.hlp files found in | |
6252 older releases of AnaGram no longer exist; their contents are compiled | |
6253 into the AnaGram executables instead. | |
6254 | |
6255 Bug Fixes | |
6256 | |
6257 Engine compiler error. The ©error_messageª field of the PCB has | |
6258 been changed to const char * so current C++ compilers will accept the | |
6259 code generated when ©diagnose errorsª is turned off. | |
6260 | |
6261 Multiple output header files. Including more than one AnaGram | |
6262 output header file at once used to cause some compilers to issue a | |
6263 warning, because an #ifndef directive was checking the wrong | |
6264 symbol. This has been corrected. | |
6265 | |
6266 Wrappers and error tokens. AnaGram 2.01 generated uncompilable | |
6267 code if you tried to use the ©wrapperª feature and error token | |
6268 resynchronization at the same time. This has been corrected. | |
6269 | |
6270 More than 256 keywords. Build 8 of AnaGram 2.01 fixed certain | |
6271 problems with large keyword tables, but in the process introduced | |
6272 another, which is now fixed. | |
6273 | |
6274 For changes in the previous versions of AnaGram, see ©What's New in AnaGram | |
6275 2.01ª and ©What's New in AnaGram 2.0ª. | |
6276 | |
6277 ## | |
6278 | |
6279 What's New in AnaGram 2.01 | |
6280 | |
6281 Changes in AnaGram 2.01 | |
6282 | |
6283 Improved Functionality | |
6284 | |
6285 Improved support for building ©thread safe parsersª. All | |
6286 nonconstant parser data previously declared as static variables has been | |
6287 moved to the ©parser control blockª. When the ©reentrant parserª switch | |
6288 is set, all references to the parser control block are passed to functions | |
6289 via calling sequences. The ©extend pcbª switch provides a mechanism to | |
6290 add user-defined variables to the parser control block. | |
6291 | |
6292 Improved support for C++ parsers. The ©wrapperª statement | |
6293 provides C++ wrapper classes for objects to be stored on the ©parser value stackª. | |
6294 The ©PCB_TYPEª macro allows you to derive a C++ class from the parser control | |
6295 block and to access its members from your ©reduction proceduresª. | |
6296 | |
6297 Support for the ©ISO Latin 1ª character set. When using | |
6298 the ©case sensitiveª switch, case conversion is performed for all ISO-Latin-1 | |
6299 characters, not just those in the ASCII range. | |
6300 | |
6301 Improved support for error diagnostics. It is now possible for users | |
6302 to provide their own text for the error messages created by the ©diagnose errorsª | |
6303 switch. In addition, the ©token namesª table option now includes ASCII representation | |
6304 of individual characters and keywords instead of only named tokens. The ©token names | |
6305 onlyª switch can be used for compatibility with previous versions of AnaGram. | |
6306 | |
6307 More precise determination of error context. The tables used by the ©error frameª | |
6308 option to provide the context of a syntax error have been reworked and now provide | |
6309 a substantially more precise localization of the error. | |
6310 | |
6311 Improved error diagnostics in AnaGram | |
6312 | |
6313 ©Missing reduction procedureª diagnostic. | |
6314 In addition to warning that there is a ©parameter assignmentª | |
6315 without a ©reduction procedureª, this | |
6316 diagnostic is now provided if the ©default reduction valueª | |
6317 does not have the same ©data typeª as the ©reduction tokenª. | |
6318 | |
6319 ©Command line versionª. Diagnostics have been reformatted so | |
6320 they can be recognized by the Microsoft Visual C++ IDE. | |
6321 | |
6322 Refined ©keyword anomalyª diagnostics. There should | |
6323 now be fewer false alarms. | |
6324 | |
6325 Increased Convenience | |
6326 | |
6327 ©File Traceª. If your grammar uses ©semantically determined productionsª, | |
6328 the File Trace feature will now remember the choices you have | |
6329 made for ©reduction tokenªs, so that you do not have to make | |
6330 the same choices over and over again as you work with an example. | |
6331 | |
6332 File Paths. The file paths in the #line directives created by the ©line numbersª | |
6333 switch now use forward slashes instead of backslashes. | |
6334 | |
6335 Changed Defaults | |
6336 | |
6337 ©Parser stack alignmentª. Now defaults to long instead of int. | |
6338 ©Parser stack sizeª. Now defaults to 128 instead of 32. | |
6339 | |
6340 Bug Fixes | |
6341 | |
6342 Interaction between context tracking and error token. In previous | |
6343 versions of AnaGram, if the first token in a rule was the ©error tokenª, | |
6344 the value of ©CONTEXTª was the value that corresponded to the location | |
6345 of the error. CONTEXT now correctly shows the context at which the | |
6346 aborted rule began. For instance, in the following example, if a | |
6347 syntax error is encountered while parsing the expression, the error | |
6348 rule will skip over remaining characters to the terminating semicolon. | |
6349 When invoked from handleError(), the CONTEXT macro will return the | |
6350 context as it was at the beginning of the expression. | |
6351 expression statement | |
6352 -> expression, ';' | |
6353 -> error, ~(eof + ';')?..., ';' =handleError(); | |
6354 | |
6355 ©Distinguish lexemesª. Several minor bugs in the implementation of distinguish lexemes have been | |
6356 corrected. | |
6357 | |
6358 Set partition logic. Corrected problems in the interaction between the set ©partitionª logic | |
6359 and the implementation of the ©disregardª statement. | |
6360 | |
6361 Table size. Fixed a data sizing problem which occurred when one particular parse table | |
6362 had precisely 256 entries. | |
6363 | |
6364 Keyword recognition. Fixed a problem that could cause difficulties with ©keywordª | |
6365 recognition when the ©case sensitiveª switch was turned off. | |
6366 | |
6367 Default conflict resolution. With unresolved ©shift-reduce conflictªs, the shift case was | |
6368 not always being selected. This problem has been corrected. | |
6369 | |
6370 Lockup. It was possible to write an erroneous grammar that would cause | |
6371 AnaGram to lock up. This problem has been corrected. | |
6372 | |
6373 Potential bus error. The error diagnostic function created by the ©diagnose errorsª | |
6374 switch could, under some circumstances, access an uninitialized value | |
6375 on the ©parser value stackª. This problem has been corrected. | |
6376 | |
6377 Internal errors. Fixed a number of minor bugs which could cause ©internal errorªs | |
6378 while running ©File Traceª. | |
6379 | |
6380 For changes in the previous version of AnaGram, see ©What's New in AnaGram 2.0ª. | |
6381 ## | |
6382 | |
6383 What's New in AnaGram 2.0 | |
6384 | |
6385 AnaGram's user interface has been completely revamped to make it more | |
6386 convenient and easier to use. However, the same tried and true AnaGram | |
6387 algorithms are still in place to build your parsers. The rules for | |
6388 syntax files are also unchanged. | |
6389 | |
6390 The ©File Traceª and ©Grammar Traceª facilities have each had their | |
6391 windows combined into a single unit, and a ©Rule Stackª synched with | |
6392 these windows and with your syntax file window has been added. The | |
6393 Rule Stack is particularly convenient for relating the progress of the | |
6394 parse to the ©grammar rulesª in your ©syntax fileª. | |
6395 | |
6396 A ©text entryª field has also been added to the Grammar Trace. This | |
6397 means you can provide character input to your parser in much the same | |
6398 way you can with a ©test fileª in File Trace, but with instant control | |
6399 over the input. | |
6400 | |
6401 Some further controls have been added to both File and Grammar Traces. | |
6402 In particular there is a Reset button to reset the trace to its initial | |
6403 state. This is particularly useful for ©Conflict Traceªs. | |
6404 | |
6405 AnaGram now has a small ©Control Panelª (default position is at the | |
6406 upper right of the screen) from which you can conveniently control | |
6407 operation. A menu bar provides access to the various commands and | |
6408 tables. There are toolbar buttons for Analyze Grammar, Build Parser, | |
6409 File Trace, and so on. The panel also has a data entry field for | |
6410 entering search keys. | |
6411 | |
6412 You can set both colors and fonts in AnaGram windows to suit your own | |
6413 preferences. We suggest you check Help for ©Colorsª or ©Fontsª before | |
6414 making changes to make sure that all information will still be properly | |
6415 displayed. | |
6416 | |
6417 AnaGram's ©Helpª has been updated to provide hypertext-type links. But | |
6418 you can still keep multiple Help windows on view at once. A popup menu | |
6419 shows all the links in a window. New topics have been added. Also, | |
6420 further documentation topics are provided in HTML format in the html | |
6421 subdirectory. | |
6422 | |
6423 A ©Help Cursorª on the Control Panel toolbar can be used to get help for | |
6424 most AnaGram windows, buttons and menu items. F1 can also be used. | |
6425 | |
6426 On the ©Action Menuª you will find a list of your most recently used | |
6427 syntax files. Just click on the file of your choice to have AnaGram | |
6428 analyze it (or build it if ©Autobuildª is on). | |
6429 ## | |
6430 | |
6431 White Space | |
6432 | |
6433 In many grammars it is desirable to pass over blanks, | |
6434 tabs, and similar characters, as well as comments, | |
6435 collectively termed "white space", as though they were | |
6436 not there. The "©disregardª" statement in AnaGram may | |
6437 be optionally used to accomplish this. The "©lexemeª" | |
6438 statement may be used to exercise fine control over the | |
6439 scope of the disregard statement. | |
6440 ## | |
6441 | |
6442 Wrapper | |
6443 | |
6444 The wrapper ©attribute statementª provides correct handling of C++ | |
6445 objects returned by ©reduction procedureªs. | |
6446 | |
6447 If you specify a wrapper for a C++ object, then, when a reduction | |
6448 procedure returns an instance of the object, a copy of the object will | |
6449 be constructed on the ©parser value stackª and the destructor will be | |
6450 called when the object is removed from the stack. | |
6451 | |
6452 Without a wrapper, objects are stored on the value stack simply | |
6453 by coercing the stack pointer to the appropriate type. | |
6454 There is no constructor call when the object is stored nor | |
6455 a destructor call when it is removed from the stack. | |
6456 | |
6457 Classes which use reference counts or otherwise overload the | |
6458 assignment operator should always have wrappers in order to | |
6459 function correctly. | |
6460 | |
6461 Wrapper statements, like other ©attribute statementsª, must appear in | |
6462 configuration sections. The syntax is simply | |
6463 wrapper { <comma delimited list of data types> } | |
6464 | |
6465 For example: | |
6466 [ | |
6467 wrapper {CString, CFont} | |
6468 ] | |
6469 | |
6470 You cannot specify a wrapper for the ©default token typeª. | |
6471 | |
6472 If your parser exits with an error condition, there may be | |
6473 objects remaining on the stack. The ©DELETE_WRAPPERSª macro | |
6474 may be used to delete these objects. If you have enabled | |
6475 ©auto resynchª, DELETE_WRAPPERS will be invoked automatically. | |
6476 | |
6477 The ©AG_PLACEMENT_DELETE_REQUIREDª macro is used to control | |
6478 definition of a "placement delete" operator in the wrapper | |
6479 class AnaGram defines. | |
6480 ## | |
6481 | |
6482 Zero Length | |
6483 | |
6484 A zero length ©tokenª is a ©reduction tokenª which can | |
6485 be matched by a void, i.e. by nothing at all. It | |
6486 represents an optional item, or a sequence of optional | |
6487 items, in the input. Since the matching process can | |
6488 involve several levels of reductions, it is most precise | |
6489 to use the following recursive definition: A zero length | |
6490 token is one which either has at least one ©null | |
6491 productionª or has at least one grammar rule defining it | |
6492 such that all the tokens in the rule are zero length | |
6493 tokens. | |
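The recursive definition amounts to a fixed-point computation, which the following C sketch illustrates. It is not AnaGram's implementation. The rule structure, the MAX_RHS limit, and find_zero_length are all hypothetical, and tokens are identified by small integer numbers.

```c
#define MAX_RHS 4

/* One grammar rule: lhs -> rhs[0] ... rhs[length-1].
   A rule with length 0 is a null production. */
typedef struct {
  int lhs;
  int length;
  int rhs[MAX_RHS];
} rule;

/* Sets zero_length[t] to 1 for every zero length token t and to 0
   for all others. Passes over the rules repeat until nothing changes. */
static void find_zero_length(const rule *rules, int n_rules,
                             int *zero_length, int n_tokens) {
  int changed, r, k;
  for (k = 0; k < n_tokens; k++) zero_length[k] = 0;
  do {
    changed = 0;
    for (r = 0; r < n_rules; r++) {
      int all_zero = 1;
      if (zero_length[rules[r].lhs]) continue;   /* already known */
      for (k = 0; k < rules[r].length; k++)
        if (!zero_length[rules[r].rhs[k]]) { all_zero = 0; break; }
      if (all_zero) { zero_length[rules[r].lhs] = 1; changed = 1; }
    }
  } while (changed);
}
```

A null production has no elements, so the inner loop is skipped and its token is marked on the first pass. Tokens whose rules consist only of marked tokens are picked up on later passes.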
6494 | |
6495 Care should be taken when using ©zero lengthª tokens in | |
6496 ©recursive ruleªs. If all the tokens in the rule other than | |
6497 the recursive token itself are zero length tokens, | |
6498 the rule will generate an infinite loop in the generated | |
6499 parser. | |
6500 | |
6501 The ©Token Tableª identifies zero length tokens because | |
6502 the use of such tokens sometimes inadvertently causes | |
6503 ©conflictªs. | |
6504 ## | |
6505 | |
6506 Control Panel | |
6507 | |
6508 The AnaGram Control Panel appears at the upper right of your monitor | |
6509 when you start AnaGram. It has a menu bar, command buttons, a button | |
6510 which enables a ©help cursorª, and a ©status indicatorª. At the lower | |
6511 left you will see a data entry field for entering ©searchª | |
6512 keys, with neighboring search forward and search backward buttons. | |
6513 | |
6514 Notice that the ©Options Menuª has a "Stay On Top" entry which | |
6515 allows you to specify whether the Control Panel stays on top of | |
6516 other AnaGram windows. | |
6517 ## | |
6518 | |
6519 Status Indicator | |
6520 | |
6521 The status indicator at the right of the AnaGram | |
6522 Control Panel shows the status of the ©current grammarª: | |
6523 Ready | |
6524 Loaded | |
6525 Error | |
6526 Parsed | |
6527 Analyzed | |
6528 Built | |
6529 | |
6530 "Ready" appears only when no grammar has been selected. | |
6531 | |
6532 "Loaded" and "Parsed" are normally transitory. | |
6533 | |
6534 "Error" means at least one syntax error has been detected | |
6535 in your grammar and AnaGram cannot continue. Check the | |
6536 Warnings window to determine the nature of the problem. | |
6537 | |
6538 "Analyzed" means that a ©grammar analysisª has been | |
6539 completed, but no ©output filesª have been written. | |
6540 | |
6541 "Built" means that an analysis has been completed and | |
6542 output files have been written. | |
6543 ## | |
6544 | |
6545 Help Cursor | |
6546 | |
6547 The Help Cursor is accessed via the button with the question mark on | |
6548 AnaGram's ©Control Panelª. It is convenient for getting help on | |
6549 ©Warningªs, browse tables, menu items and so on. | |
6550 | |
6551 If you click on the button you enable the Help Cursor, which you can | |
6552 then drag with the mouse. A further mouse click will provide help | |
6553 for the item underneath the cursor. | |
6554 | |
6555 Note further that AnaGram also has F1 help which you may find | |
6556 simpler and faster than the Help Cursor. | |
6557 ## | |
6558 | |
6559 Search | |
6560 | |
6561 AnaGram has a simple search facility to let you search for text strings | |
6562 in AnaGram windows. A data entry field on the ©Control Panelª is | |
6563 provided for you to enter text. Left-clicking on the neighboring | |
6564 buttons lets you search either forward or backward for a line in the | |
6565 active window which contains at least one instance of the text. | |
6566 | |
6567 Note that a forward search begins at the line following the | |
6568 highlighted line, and a backward search begins at the line | |
6569 preceding the highlighted line. | |
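
The starting rule for forward and backward searches can be sketched in C as follows. This is an illustration only, not AnaGram's implementation: find_line is a hypothetical name, and the sketch assumes a case-sensitive match that does not wrap around at the end of the window, neither of which this topic specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the index of the first line containing key, or -1 if none.
   direction > 0: search forward, starting at the line after `current`.
   otherwise:     search backward, starting at the line before `current`.
   The highlighted line itself is never examined. */
long find_line(const char *const *lines, size_t n_lines,
               size_t current, const char *key, int direction)
{
    assert(lines != NULL && key != NULL);
    if (current > n_lines)
        current = n_lines;              /* clamp an out-of-range position */

    if (direction > 0) {
        for (size_t i = current + 1; i < n_lines; i++)
            if (strstr(lines[i], key) != NULL)
                return (long)i;
    } else {
        for (size_t i = current; i-- > 0; )   /* visits current-1 down to 0 */
            if (strstr(lines[i], key) != NULL)
                return (long)i;
    }
    return -1;                          /* assumption: no wrap-around */
}
```

Because the highlighted line is skipped, repeating the same search steps through successive matches instead of finding the same line again.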
6570 ## | |
6571 | |
6572 Search Key | |
6573 | |
6574 To find a text string in an AnaGram window, enter the | |
6575 string in the Search Key field in the ©Control Panelª | |
6576 and press Enter. | |
6577 | |
6578 To find another instance of the string, click on the | |
6579 ©Find Nextª button or press F3. | |
6580 | |
6581 To find a previous instance of the string, click on | |
6582 the ©Find Previousª button or press F4. | |
6583 | |
6584 In windows that have a cursor bar, a forward search | |
6585 begins on the line following the cursor and a backward | |
6586 search begins on the line preceding the cursor. | |
6587 ## | |
6588 | |
6589 Find Next | |
6590 | |
6591 The Find Next key, on the ©Control Panelª immediately | |
6592 to the right of the ©Search Keyª field, locates | |
6593 the next instance of the search key in the most recently | |
6594 active AnaGram window. F3 is the keyboard equivalent. | |
6595 ## | |
6596 | |
6597 Find Previous | |
6598 | |
6599 The Find Previous key, on the ©Control Panelª immediately | |
6600 to the right of the ©Find Nextª key, searches | |
6601 backwards for the search key in the most recently | |
6602 active AnaGram window. F4 is the keyboard equivalent. | |
6603 ## | |
6604 | |
6605 Fonts, Set Fonts | |
6606 | |
6607 The Set Fonts dialog allows you to use the fonts of your choice in | |
6608 AnaGram windows. You should make sure that the font for ©marked tokenªs is | |
6609 very distinctive so that marked tokens will show up clearly even if | |
6610 they are only 1 or 2 characters long. Sometimes it is helpful to use an | |
6611 underlined font for marked tokens. | |
6612 | |
6613 A Default button at the bottom of the dialog lets you revert to | |
6614 AnaGram's original fonts if you wish. | |
6615 ## | |
6616 | |
6617 Colors, Set Colors | |
6618 | |
6619 The Set Colors dialog allows you to change the colors of | |
6620 AnaGram windows. Notice that in the ©File Traceª the ©test file paneª | |
6621 requires three different sets of text and background colors. You | |
6622 should make sure that the backgrounds, at least, can be easily | |
6623 distinguished from each other so the trace information can be | |
6624 properly displayed. You also want to take care that an active pane in | |
6625 a File Trace or Grammar Trace can be distinguished from inactive | |
6626 panes. | |
6627 | |
6628 The Default button at the bottom of the dialog lets you revert to | |
6629 AnaGram's original colors if you wish. | |
6630 | |
6631 Color changes pertain only to the client areas of AnaGram windows. The | |
6632 remaining parts of your windows will have the customary colors you have | |
6633 chosen for your system. | |
6634 ## | |
6635 | |
6636 Marked Token | |
6637 | |
6638 Some tables and trace panes display each rule with one token marked to | |
6639 show how far parsing has progressed in the rule. The marked token is | |
6640 the next input expected in the input stream. It is shown in a different | |
6641 font to distinguish it from other tokens in the rule. If no token is | |
6642 marked, the rule is a ©completed ruleª, i.e. it has been completely | |
6643 matched and will be reduced by the next input. | |
6644 | |
6645 You can set the font for marked tokens by choosing Fonts from the | |
6646 ©Options Menuª. You should make sure that the font is very distinctive so | |
6647 that marked tokens will show up clearly even if they are only 1 or 2 | |
6648 characters long. Sometimes it is helpful to use an underlined font for | |
6649 marked tokens. | |
6650 ## | |
6651 | |
6652 Synch Parse | |
6653 | |
6654 The Synch Parse button replaces the ©Single Stepª button on the | |
6655 toolbar of the ©File Trace windowª when, for some reason, the | |
6656 location of the blinking cursor in the ©test file paneª differs from | |
6657 the current parse position. This can occur when you single click in | |
6658 the test file pane or when the parse cannot track the cursor because | |
6659 of a ©syntax errorª or a ©semantically determined productionª. | |
6660 | |
6661 Click the Synch Parse button to resynch the parse with the cursor. | |
6662 ## | |
6663 | |
6664 | |
6665 Single Step | |
6666 | |
6667 The Single Step button is one of the control buttons for the ©File | |
6668 Traceª and ©Grammar Traceª. It advances the parse one ©parser | |
6669 actionª at a time. In the File Trace, it is replaced with the "©Synch | |
6670 Parseª" button whenever the blinking cursor loses synch with | |
6671 the current parse location. | |
6672 | |
6673 In the Grammar Trace, the Single Step button takes its input from the | |
6674 Allowable Input pane, the Reduction Choices pane, or the ©text entryª | |
6675 field, depending on which is active. | |
6676 ## | |
6677 | |
6678 Proceed | |
6679 | |
6680 The Proceed button is one of the control buttons for the | |
6681 ©Grammar Traceª. If the ©Reduction Choices paneª or the ©Allowable | |
6682 Input paneª is active, Proceed parses the highlighted token | |
6683 until it is shifted in to the ©parser stackª. If the ©text entryª | |
6684 field is active, Proceed parses all text in the field. If a | |
6685 ©syntax errorª is encountered, the parse stops and all ©reduce | |
6686 actionªs are undone. | |
6687 | |
6688 Note that selecting a token in Allowable Input can cause a syntax | |
6689 error under certain circumstances. This can happen only if the | |
6690 following conditions are all true: | |
6691 the indicated operation is a ©reductionª, | |
6692 the reduction token for the rule being reduced has been used in | |
6693 several different contexts in the grammar, and | |
6694 the specified token may follow it in some contexts and not in | |
6695 others. | |
6696 ## | |
6697 | |
6698 Reduction Choices Pane | |
6699 | |
6700 The ©File Traceª and ©Grammar Traceª display a Reduction Choices | |
6701 pane when they need to reduce a ©semantically determined productionª. | |
6702 | |
6703 The rule to be reduced is highlighted in the ©rule stack paneª. | |
6704 If the ©syntax fileª window is visible, it shows the rule in | |
6705 context in your grammar. | |
6706 | |
6707 The Reduction Choices pane lists all possible ©reduction tokenªs for | |
6708 the specified rule. The first reduction token that is admissible in | |
6709 the current context is highlighted and it appears | |
6710 as the ©lookahead tokenª in the ©parser stack paneª. The text that | |
6711 comprises the entire rule is highlighted in the ©test file paneª. | |
6712 | |
6713 Select the desired reduction token before continuing with the parse. | |
6714 | |
6715 If you select a token and it does not appear as the lookahead token, | |
6716 it is not syntactically correct in the current context. If you try | |
6717 to proceed with the parse, you will get a ©selection errorª. | |
6718 ## | |
6719 | |
6720 Selection Error | |
6721 | |
6722 The ©Parse Statusª field indicates a "selection error" if you | |
6723 choose a ©reduction tokenª from the ©Reduction Choices paneª of | |
6724 a ©File Traceª or ©Grammar Traceª and the selected token is not | |
6725 syntactically correct in the current context. | |
6726 ## | |
6727 | |
6728 Parser Stack Pane | |
6729 | |
6730 The Parser Stack pane, the upper left pane of the ©File Traceª and | |
6731 ©Grammar Traceª windows, displays the ©parser stackª for the current | |
6732 trace. | |
6733 | |
6734 Each line corresponds to one level in the parser state stack. It shows | |
6735 the stack index, the ©parser stateª for that level, and the ©tokenª which | |
6736 was seen at that state. The last line of the stack, the ©lookahead | |
6737 lineª, corresponds to the current state of the parser. Since no input | |
6738 has yet been processed for this state, the token, if any, which | |
6739 appears at this level is a ©lookahead tokenª. | |
6740 | |
6741 If you move the cursor in the Parser Stack pane of a File Trace, | |
6742 the text that makes up the selected token will be | |
6743 highlighted in the ©Test File paneª. You can back the parse up to | |
6744 any desired stack level by double clicking at the beginning of the | |
6745 token text in the Test File pane. | |
6746 | |
6747 Similarly, if you move the cursor bar in the Parser Stack pane of a | |
6748 Grammar Trace, the ©Allowable Input paneª will change to display the | |
6749 allowable tokens in the selected state. The previously | |
6750 selected token will be highlighted. Then, double click on any token in | |
6751 the Allowable Input pane to back the parse up and choose a token | |
6752 a second time. | |
6753 | |
6754 The ©Rule Stack paneª of the File or Grammar Trace is also synched | |
6755 to the Parser Stack pane. If the ©syntax fileª window is visible, it | |
6756 will be synched to show the rule currently selected in the rule | |
6757 stack pane. Note that rules that have been automatically generated | |
6758 by the expansion of ©virtual productionsª cannot be synched, so the | |
6759 top line of the syntax file will be highlighted instead. | |
6760 | |
6761 In the Grammar Trace, the last line of the Parser Stack may or may not | |
6762 display a ©lookahead tokenª, depending on the last ©parser actionª | |
6763 performed. If input was taken from Allowable Input and the last | |
6764 action was a simple ©reduce actionª, the last input token selected | |
6765 will be displayed as the lookahead input. But if the last action | |
6766 performed shifted the token in, the lookahead field will be empty. | |
6767 | |
6768 If you right-click on a highlighted line in the Parser Stack pane, you will | |
6769 get a pop-up menu to give you more information. In particular you can | |
6770 get an ©Auxiliary Traceª starting at the current point in your File or | |
6771 Grammar Trace, so you can explore various possibilities without losing | |
6772 your position in the old trace. | |
6773 ## | |
6774 | |
6775 Exit | |
6776 | |
6777 Select this entry from the ©Action Menuª to terminate AnaGram. | |
6778 ## | |
6779 | |
6780 Allowable Input, Allowable Input Pane | |
6781 | |
6782 The upper right pane of the ©Grammar Traceª window lists the | |
6783 allowable input tokens for the current state of the ©grammarª. | |
6784 | |
6785 The tokens in the Allowable Input pane are listed in two groups: | |
6786 first, the ©terminal tokensª allowable in this state, and | |
6787 second, the ©nonterminal tokensª. Between these two groups of tokens | |
6788 is inserted a line which is either an option for a ©default reductionª, | |
6789 or declares that there is no default action. | |
6790 | |
6791 Double click, press Enter, or click the ©Proceedª button to | |
6792 parse the highlighted token. When all parse actions triggered | |
6793 by the highlighted token have been completed, all panes of the trace | |
6794 will be redrawn to show the new state of the parser. | |
6795 | |
6796 Note that selecting a token in Allowable Input can cause a syntax | |
6797 error under certain circumstances. This can happen only if the | |
6798 following conditions are all true: | |
6799 the indicated operation is a ©reductionª, | |
6800 the reduction token for the rule being reduced has been used in | |
6801 several different contexts in the grammar, and | |
6802 the specified token may follow it in some contexts and not in | |
6803 others. | |
6804 | |
6805 If you wish to see the results of a single parser action, click | |
6806 on the ©single stepª button. The parser will perform a single | |
6807 parser action. If the | |
6808 token you selected was not shifted in, it will now be displayed | |
6809 as the ©lookahead tokenª on the last line, the ©lookahead lineª, in | |
6810 the ©Parser Stack paneª, and will be preselected in the Allowable | |
6811 Input pane. | |
6812 | |
6813 Because AnaGram, by default, uses a number of compound | |
6814 parser actions, this situation does not arise very often unless you | |
6815 have set the ©traditional engineª switch or reset the ©default | |
6816 reductionsª switch. Usually you will want to select the same token to | |
6817 proceed, but it is not necessary. | |
6818 | |
6819 The Allowable Input pane also displays | |
6820 the ©parser actionª associated with a specific token. If it is | |
6821 not a ©compound actionª, the action and its result are also shown. | |
6822 | |
6823 The ©parser actionª field for a token may be interpreted as follows: If | |
6824 this token would cause a shift to a new state, the action field is ">>" | |
6825 followed by the new state number. If the token would cause a | |
6826 ©reductionª, the action field is "<<" followed by a ©rule numberª to | |
6827 show the rule reduced. If the parser action is a compound action, the | |
6828 action field is blank. If the token would cause the grammar to be | |
6829 accepted, the action field is "Accept". | |
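
The interpretation of the action field can be summarized in a small C sketch. It is illustrative only: ActionKind and format_action_field are hypothetical names, and the exact spacing between the marker and the number is an assumption, not something this topic specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* The four cases described for the parser action field. */
typedef enum { ACT_SHIFT, ACT_REDUCE, ACT_COMPOUND, ACT_ACCEPT } ActionKind;

/* Writes the action field text into buf.
   number is the new state number for a shift, or the rule number
   for a reduction; it is ignored for the other kinds. */
void format_action_field(ActionKind kind, int number, char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    switch (kind) {
    case ACT_SHIFT:                     /* shift to a new state */
        snprintf(buf, size, ">>%d", number);
        break;
    case ACT_REDUCE:                    /* reduce by a rule */
        snprintf(buf, size, "<<%d", number);
        break;
    case ACT_ACCEPT:                    /* grammar accepted */
        snprintf(buf, size, "Accept");
        break;
    case ACT_COMPOUND:                  /* compound action: field is blank */
    default:
        buf[0] = '\0';
        break;
    }
}
```

For example, a shift to state 12 would be shown as ">>12" and a reduction of rule 7 as "<<7" under these assumptions.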
6830 | |
6831 | |
6832 The ©text entryª field at the bottom of the Grammar Trace can be | |
6833 used as a convenient alternative to the Allowable Input pane. It | |
6834 accepts characters rather than tokens. Most non-printing characters | |
6835 such as newline are only available from Allowable Input. | |
6836 ## | |
6837 | |
6838 Copy | |
6839 | |
6840 The Copy command on the ©Windows Menuª copies the currently active | |
6841 table or Help topic to the clipboard. | |
6842 ## | |
6843 | |
6844 Statistical Summary | |
6845 | |
6846 While your grammar is being analyzed, a Statistical Summary window | |
6847 pops up to show you the progress of the analysis. Unless you have | |
6848 turned off ©Show Statisticsª on the ©Options Menuª, this window will remain | |
6849 on-screen for your reference. Among other things, it shows you the | |
6850 number of rules and states in your grammar, and the number of conflicts | |
6851 and warnings, if any. | |
6852 | |
6853 Note that if your grammar is small and you have Show Statistics turned | |
6854 off, the appearance of this window on your monitor may be exceedingly | |
6855 brief - you may just see a flash. | |
6856 | |
6857 If the window is turned off or you have closed it, you can get it from | |
6858 the ©Browse Menuª. | |
6859 ## | |
6860 | |
6861 Stay On Top | |
6862 | |
6863 The Stay On Top entry in the ©Options Menuª allows you to specify whether | |
6864 the ©Control Panelª stays on top of other AnaGram windows. | |
6865 ## | |
6866 | |
6867 Show Syntax | |
6868 | |
6869 If this entry in the ©Options Menuª is checked, AnaGram will display the | |
6870 ©syntax fileª when it has analyzed your ©grammarª. If this entry is not checked | |
6871 or you have closed the syntax file window, you can select the window | |
6872 from the ©Browse Menuª. | |
6873 ## | |
6874 | |
6875 Show Statistics | |
6876 | |
6877 If this entry in the ©Options Menuª is checked, AnaGram will leave the | |
6878 ©Statistical Summaryª on the screen after it has analyzed your ©grammarª. If | |
6879 this entry is not checked or you have closed the Statistical Summary | |
6880 window, you can select the window from the ©Browse Menuª. | |
6881 ## | |
6882 | |
6883 About AnaGram | |
6884 | |
6885 Select this entry from the ©Help Menuª to find out the version and | |
6886 serial numbers of your copy of AnaGram, and how to contact Parsifal | |
6887 Software. | |
6888 ## | |
6889 | |
6890 Help Topics | |
6891 | |
6892 Select Help Topics from the ©Help Menuª to get a complete list of AnaGram | |
6893 Help Topics titles. You can bring up the window for a highlighted topic | |
6894 by double-clicking with the left mouse button, pressing F1, or using | |
6895 the ©Help Cursorª. | |
6896 ## | |
6897 | |
6898 Cascade Windows | |
6899 | |
6900 Select this entry from the ©Windows Menuª to cascade your open windows | |
6901 starting at the top left of the screen. | |
6902 ## | |
6903 | |
6904 Close Windows | |
6905 | |
6906 Select this entry from the ©Windows Menuª to close all open windows | |
6907 except the ©Control Panelª. You may also close the active window | |
6908 by pressing the Escape key. | |
6909 ## | |
6910 | |
6911 Hide Windows | |
6912 | |
6913 Select this entry from the ©Windows Menuª to hide all open windows | |
6914 except the ©Control Panelª. Restore them to the screen with ©Restore | |
6915 Windowsª. | |
6916 ## | |
6917 | |
6918 Restore Windows | |
6919 | |
6920 Use this command on the ©Windows Menuª to restore to the screen | |
6921 any windows you have previously hidden with ©Hide Windowsª. | |
6922 ## | |
6923 | |
6924 Token Input, Preprocessor, Lexical Scanner | |
6925 | |
6926 AnaGram makes it unnecessary, in most cases, to have a separate | |
6927 preprocessor to provide the ©tokensª which are fed to your parser. | |
6928 | |
6929 However, in some cases you may want to use a preprocessor, or lexical | |
6930 scanner, to provide input to your parser. The preprocessor may | |
6931 or may not be written in AnaGram. If it sends the parser token | |
6932 numbers, as opposed to character codes, this is referred to as token | |
6933 input, as opposed to character input. Please refer to the AnaGram | |
6934 User's Guide for information on identifying the tokens to the parser | |
6935 and providing their semantic values, if any. | |
6936 | |
6937 Since a ©File Traceª is based on character codes, it will be greyed out | |
6938 on the ©Action Menuª if you have token input. For a ©Grammar Traceª, | |
6939 entering characters in the ©text entryª field is not appropriate and | |
6940 will simply cause a syntax error. | |
6941 ## | |
6942 | |
6943 Lookahead Line | |
6944 | |
6945 The last line of the ©Parser Stack paneª, the "lookahead" line, | |
6946 will sometimes show a ©lookahead | |
6947 tokenª, and sometimes not. In a ©File Traceª, you will always see a | |
6948 lookahead token because it is available from the ©test fileª. | |
6949 | |
6950 In a ©Grammar Traceª you will usually see a lookahead token only when | |
6951 you have used the ©Single Stepª button or if there is available | |
6952 input in the ©text entryª field. In the latter case the token | |
6953 corresponding to the first character of the input will appear on the | |
6954 lookahead line. | |
6955 | |
6956 If you click Single Step after selecting a token from ©Allowable | |
6957 Inputª and it causes only a simple ©reduce actionª (as opposed to a | |
6958 shift or a compound action), then, upon completion of the reduction, | |
6959 the token you selected will appear on the lookahead line and also | |
6960 will be preselected in Allowable Input. | |
6961 | |
6962 Usually you would select | |
6963 this token for the next parse step. However, if there are other | |
6964 possible inputs in this state, the parse theoretically could have | |
6965 arrived at this state by a different sequence of input tokens. Thus, | |
6966 if you are more interested in the behavior of the parser at this | |
6967 state than in the response of the parser to a particular sequence of | |
6968 inputs, it is perfectly valid to select a different input token, and | |
6969 AnaGram will let you do it. | |
6970 | |
6971 Note that if you have enabled the ©traditional engineª switch or | |
6972 disabled the ©default reductionsª switch, the | |
6973 probability of finding a token which does a simple reduction is | |
6974 noticeably higher than otherwise. | |
6975 ## | |
6976 | |
6977 Action Menu | |
6978 | |
6979 The Action menu begins with the ©Analyze Grammarª and ©Build Parserª | |
6980 commands. If a grammar has already been analyzed, but not yet built, | |
6981 there will also be an extra Build command bearing the name of your | |
6982 syntax file. | |
6983 | |
6984 There are also ©Reanalyzeª and ©Rebuildª commands which are | |
6985 initially greyed out. They become available if you change the | |
6986 current syntax file. | |
6987 | |
6988 The next section has ©File Traceª and ©Grammar Traceª | |
6989 commands. If you have enabled the ©Error Traceª | |
6990 ©configuration switchª, this section also shows an | |
6991 Error Trace command. | |
6992 | |
6993 The menu ends with an ©Exitª command | |
6994 and a list of recently used syntax files, if any. Just | |
6995 click on a syntax file name to have AnaGram analyze it, or | |
6996 build it if the ©Autobuildª option is on. | |
6997 ## | |
6998 | |
6999 Browse Menu | |
7000 | |
7001 Initially, the Browse Menu shows only a single entry: | |
7002 ©Configuration Parametersª, which lets you see the | |
7003 current state of configuration parameters before any | |
7004 may have been set by your syntax file. Once you have | |
7005 analyzed a grammar, this menu fills up with many tables | |
7006 containing information about your grammar. You can also | |
7007 bring up a window showing your ©syntax fileª from this menu. | |
7008 If your grammar has generated ©syntax errorªs or warnings, or | |
7009 contains conflicts, there will be ©Warningªs or ©Conflictªs | |
7010 entries. | |
7011 ## | |
7012 | |
7013 Options Menu | |
7014 | |
7015 From this menu you can select a ©Fontsª or ©Colorsª dialog so you can | |
7016 set AnaGram's fonts and colors to suit your own tastes. You can set | |
7017 ©Autobuildª if you want AnaGram to automatically build your ©grammarª | |
7018 when you select a ©syntax fileª from the ©Action Menuª. You can also | |
7019 choose whether or not to automatically show the ©Statistical Summaryª | |
7020 window or your syntax file window when you open a grammar, or make | |
7021 the ©Control Panelª stay on top of other AnaGram windows. | |
7022 ## | |
7023 | |
7024 Windows Menu | |
7025 | |
7026 The Windows menu lets you cascade, close, or hide all AnaGram | |
7027 windows except the ©Control Panelª, or restore them if they | |
7028 have been hidden. It also has a list of open windows (even | |
7029 if hidden) so you can select the one you want. The Copy command will | |
7030 copy most windows to the clipboard. | |
7031 ## | |
7032 | |
7033 Help Menu | |
7034 | |
7035 The Help Menu has the following entries: | |
7036 | |
7037 ©Getting Startedª provides a brief description of AnaGram and | |
7038 introductory suggestions. | |
7039 | |
7040 ©Help Topicsª brings up a list of all help topics. | |
7041 | |
7042 ©Using Helpª tells you how to use AnaGram's help facilities. | |
7043 | |
7044 ©What's Newª has information on new features of this version of AnaGram. | |
7045 | |
7046 ©About AnaGramª tells you what version of AnaGram you are using, and also | |
7047 provides contact information for Parsifal Software. | |
7048 ## | |
7049 | |
7050 Autobuild | |
7051 | |
7052 When Autobuild (©Options Menuª) is checked, selecting a file | |
7053 from the list of most recently used files on the ©Action Menuª | |
7054 invokes the ©Build Parserª command. Otherwise, the ©Analyze | |
7055 Grammarª command is invoked. | |
7056 ## | |
7057 | |
7058 Reanalyze, Rebuild | |
7059 | |
7060 Reanalyze and Rebuild commands on the ©Action Menuª are | |
7061 initially greyed out. | |
7062 | |
7063 Reanalyze becomes available if | |
7064 you have a syntax file currently analyzed or built | |
7065 in AnaGram and change it while AnaGram is still running. | |
7066 | |
7067 Rebuild becomes available if | |
7068 you have a syntax file currently built | |
7069 and change it while AnaGram is still running. | |
7070 ## | |
7071 | |
7072 Percent Sign | |
7073 | |
7074 The percent sign ( % ) is used to mark certain tokens in your grammar | |
7075 which AnaGram must redefine in order to implement the ©disregardª | |
7076 statement. If you have used this statement in your grammar, you will | |
7077 probably notice the percent sign appearing in some windows and traces. | |
7078 | |
7079 The percent sign indicates the original token, without the optional | |
7080 white space attached. Early versions of AnaGram used the degree sign | |
7081 instead, but this character is not generally available in Windows. | |
7082 ## | |
7083 | |
7084 Program Development | |
7085 | |
7086 The first step in writing a program is to write a ©grammarª in | |
7087 AnaGram notation which describes the input the program expects. | |
7088 | |
7089 The file containing the grammar, called the ©syntax fileª, should | |
7090 have the extension ".syn". You could also make up a few sample input | |
7091 files at this time, but it is not necessary to write ©reduction | |
7092 procedureªs at this stage. | |
7093 | |
7094 Run AnaGram and use the ©Analyze Grammarª command to create parse | |
7095 tables. If there are ©syntax errorsª in the grammar at this point, | |
7096 you will have to correct them before proceeding, but you do not | |
7097 necessarily have to eliminate ©conflictsª, if there are any, at this | |
7098 time. There are, however, many aids available to help you with | |
7099 conflicts. These aids are described in the AnaGram User's Guide, and | |
7100 somewhat more briefly in the Online Help topics. | |
7101 | |
7102 Once syntax errors are corrected, you can try out your grammar on the | |
7103 sample input files using the ©File Traceª facility. | |
7104 With File Trace, you can see interactively just how your grammar | |
7105 operates on your test files. You can also use ©Grammar Traceª to | |
7106 answer "what if" questions concerning input to the grammar. The | |
7107 Grammar Trace does not use a test file, but rather allows you to make | |
7108 input choices interactively. | |
7109 | |
7110 At any time, you can write ©reduction procedureªs to process your | |
7111 input data as its components are identified in the input stream. Each | |
7112 procedure is associated with a ©grammar ruleª. The reduction | |
7113 procedures will be incorporated into your parser when you create it | |
7114 with the ©Build Parserª command. | |
7115 | |
7116 By default, unless you specify an input procedure, ©parser inputª | |
7117 will be read from stdin, using the default ©GET_INPUTª macro. | |
7118 You will probably wish to redefine GET_INPUT, or configure your | |
7119 parser to use ©pointer inputª or ©event drivenª input. | |
7120 ## | |
7121 | |
7122 License, Copyright, Copying, Open Source, Warranty, No Warranty | |
7123 | |
7124 AnaGram, A System for Syntax Directed Programming | |
7125 | |
7126 Copyright 1993-2002 Parsifal Software | |
7127 | |
7128 Copyright 2006, 2007 David A. Holland | |
7129 | |
7130 All Rights Reserved. | |
7131 | |
7132 AnaGram itself is released to the public under the traditional 4-clause BSD | |
7133 license: | |
7134 | |
7135 Redistribution and use in source and binary forms, with or without | |
7136 modification, are permitted provided that the following conditions are | |
7137 met: | |
7138 | |
7139 1. Redistributions of source code must retain the above copyright notice, | |
7140 this list of conditions and the following disclaimer. | |
7141 | |
7142 2. Redistributions in binary form must reproduce the above copyright | |
7143 notice, this list of conditions and the following disclaimer in the | |
7144 documentation and/or other materials provided with the distribution. | |
7145 | |
7146 3. All advertising materials mentioning features or use of this software | |
7147 must display the following acknowledgement: | |
7148 This product includes software developed by Parsifal Software, | |
7149 Jerome T. Holland, and their contributors. | |
7150 | |
7151 4. Neither the name of Parsifal Software nor the name of Jerome T. | |
7152 Holland nor the names of their contributors may be used to endorse or | |
7153 promote products derived from this software without specific prior written | |
7154 permission. | |
7155 | |
7156 THIS SOFTWARE IS PROVIDED BY PARSIFAL SOFTWARE, | |
7157 JEROME T. HOLLAND, AND CONTRIBUTORS ``AS IS'' AND ANY | |
7158 EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT | |
7159 LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY | |
7160 AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. | |
7161 IN NO EVENT SHALL PARSIFAL SOFTWARE, JEROME T. | |
7162 HOLLAND, OR THE CONTRIBUTORS BE LIABLE FOR ANY DIRECT, | |
7163 INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR | |
7164 CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, | |
7165 PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF | |
7166 USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) | |
7167 HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, | |
7168 WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT | |
7169 (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY | |
7170 OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE | |
7171 POSSIBILITY OF SUCH DAMAGE. | |
7172 | |
7173 The AnaGram ©parsing engineª, that is, the code that is emitted by | |
7174 AnaGram and incorporated into programs developed using AnaGram, uses | |
7175 this less restrictive zlib-style license: | |
7176 | |
7177 This software is provided 'as-is', without any express or implied warranty. | |
7178 In no event will the authors be held liable for any damages arising from | |
7179 the use of this software. | |
7180 | |
7181 Permission is granted to anyone to use this software for any purpose, | |
7182 including commercial applications, and to alter it and redistribute it | |
7183 freely, subject to the following restrictions: | |
7184 | |
7185 1. The origin of this software must not be misrepresented; you must not | |
7186 claim that you wrote the original software. If you use this software in a | |
7187 product, an acknowledgment in the product documentation would be | |
7188 appreciated but is not required. | |
7189 | |
7190 2. Altered source versions must be plainly marked as such, and must not | |
7191 be misrepresented as being the original software. | |
7192 | |
7193 3. This notice may not be removed or altered from any source distribution. | |
7194 | |
7195 ## |