comparison anagram/guisupport/helpdata.src @ 0:13d2b8934445

Import AnaGram (near-)release tree into Mercurial.
author David A. Holland
date Sat, 22 Dec 2007 17:52:45 -0500
1 Accept Action
2
3 The accept action is one of the four actions of a
4 traditional ©parsing engineª. The accept action is
5 performed when the ©parserª has succeeded in identifying
6 the goal, or ©grammar tokenª for the ©grammarª. When
7 the parser executes the accept action, it sets the ©exit_flagª
8 field in the ©parser control blockª to AG_SUCCESS_CODE and returns
9 to the calling program. The accept action is thus the last action of
10 the parsing engine and occurs only once for each successful execution
11 of the parser.
12
13 If the grammar token has a non-void value, you may
14 obtain its value by calling the ©parser value functionª
15 whose name is given by <parser name>_value, that is,
16 by appending "_value" to the ©parser nameª.
17 ##
18
19 Parser Value Function, Return Value
20
21 The value assigned to the ©grammar tokenª in your parser
22 may be retrieved by calling the parser value function after
23 the parser has finished. The name of this function is given
24 by <©parser nameª>_value. The return type of the function
25 is the type assigned to the grammar token.
26
27 If you have set the ©reentrant parserª switch, the parser
28 value function takes a pointer to the ©parser control blockª
29 as its sole argument. Otherwise, it takes no arguments. The
30 value function is not defined if the grammar token has type "void".
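
As a rough illustration of the calling sequence described above, the following C sketch uses hand-written stand-ins for what AnaGram would generate. The parser name "calc", the int grammar token type, the calc_pcb variable, the numeric value of AG_SUCCESS_CODE and the helper run_calc are all assumptions made for this example; in a real project the definitions come from the generated parser file and its header. The reentrant parser switch is assumed to be off, so the value function takes no arguments.

```c
/* Stand-ins for generated code. Everything in this block is hypothetical;
   a real parser file and header supply these definitions. */
enum { DEMO_NOT_FINISHED, AG_SUCCESS_CODE };  /* values chosen arbitrarily */

typedef struct { int exit_flag; } calc_pcb_type;
static calc_pcb_type calc_pcb;
static int calc_result;

/* Pretend parser: reads no input and simply reports success with value 42. */
void calc(void) {
  calc_result = 42;
  calc_pcb.exit_flag = AG_SUCCESS_CODE;
}

/* Parser value function: <parser name>_value. */
int calc_value(void) { return calc_result; }

/* Caller: run the parser, then fetch the grammar token value only if the
   accept action was reached. Returns 1 on success, 0 otherwise. */
int run_calc(int *result) {
  calc();
  if (calc_pcb.exit_flag != AG_SUCCESS_CODE) return 0;
  *result = calc_value();
  return 1;
}
```

The point of the sketch is the order of operations: the value function is called only after the parser has returned with exit_flag set to AG_SUCCESS_CODE.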
31 ##
32
33 AG_PLACEMENT_DELETE_REQUIRED
34
35 When the ©wrapperª option is specified, the wrapper
36 template class that AnaGram defines uses a "placement
37 new" operator to construct the wrapper object on the
38 ©parser value stackª. The MSVC++ 6.0 compiler requires,
39 in this situation, that a corresponding "placement
40 delete" operator be defined. Other C++ compilers,
41 notably MSVC++ 5.0, generate an error message if
42 they encounter the definition of a "placement delete"
43 operator.
44
45 Accordingly, AG_PLACEMENT_DELETE_REQUIRED is used to determine
46 whether a "placement delete" operator should be defined.
47
48 AG_PLACEMENT_DELETE_REQUIRED is defined to be 1 if you are using MSVC++
49 6.0 or greater, 0 otherwise. You can override the automatic definition of
50 AG_PLACEMENT_DELETE_REQUIRED by defining it in the ©C prologueª section
51 of your grammar. Set it to a non-zero value to force the "placement
52 delete" definition, zero to skip the definition.
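
A minimal sketch of the override, written as it might appear in a C prologue. The conditional that follows the override is only a guess at what an automatic definition could look like (it relies on the fact that MSVC++ 6.0 reports _MSC_VER as 1200); it is not taken from AnaGram's generated code. The function placement_delete_required is a hypothetical helper added so the result can be inspected.

```c
/* C prologue: force the "placement delete" definition off,
   regardless of which compiler is in use. */
#define AG_PLACEMENT_DELETE_REQUIRED 0

/* Assumed shape of an automatic definition. It is skipped here because
   the prologue has already defined the macro. */
#ifndef AG_PLACEMENT_DELETE_REQUIRED
#if defined(_MSC_VER) && _MSC_VER >= 1200
#define AG_PLACEMENT_DELETE_REQUIRED 1
#else
#define AG_PLACEMENT_DELETE_REQUIRED 0
#endif
#endif

/* Hypothetical helper: reports whether the definition would be emitted. */
int placement_delete_required(void) {
  return AG_PLACEMENT_DELETE_REQUIRED != 0;
}
```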
53
54 ##
55
56 ag_tcv
57
58 ag_tcv is an array AnaGram includes in your ©parserª.
59 Your parser uses ag_tcv to translate external codes to
60 the internal token numbers that AnaGram uses. It uses
61 the actual input code to index the ag_tcv array to
62 fetch a ©token numberª. The token number is then used
63 to identify the input token.
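
The lookup can be pictured with a small C sketch. The table demo_tcv, the token numbers and both functions are invented for illustration; the real ag_tcv is generated by AnaGram with contents that depend on your grammar. The sketch assumes an ASCII-compatible character set.

```c
/* Hypothetical token numbers; a real parser uses the numbers
   that AnaGram assigns. */
enum { TOK_OTHER = 1, TOK_DIGIT = 2, TOK_LETTER = 3 };

/* Stand-in for ag_tcv: one entry per possible input code. */
static unsigned char demo_tcv[256];

/* Fill the stand-in table. A generated parser would instead contain
   the table as initialized data. */
void demo_tcv_init(void) {
  int c;
  for (c = 0; c < 256; c++) demo_tcv[c] = TOK_OTHER;
  for (c = '0'; c <= '9'; c++) demo_tcv[c] = TOK_DIGIT;
  for (c = 'a'; c <= 'z'; c++) demo_tcv[c] = TOK_LETTER;
  for (c = 'A'; c <= 'Z'; c++) demo_tcv[c] = TOK_LETTER;
}

/* The input code indexes the table; the entry is the token number. */
int demo_token_number(unsigned char input_code) {
  return demo_tcv[input_code];
}
```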
64 ##
65
66 Allow macros
67
68 "Allow macros" is a ©configuration switchª which
69 defaults to on. When it is set, i.e., on, ©reduction
70 procedureªs will be implemented as macros if they are
71 sufficiently simple. This makes your ©parserª somewhat
72 more compact but makes it somewhat more difficult to
73 debug. It's a good idea to turn this switch off for
74 debugging.
75 ##
76
77 Analyze Grammar
78
79 The Analyze Grammar command will scan and
80 analyze your ©syntax fileª, and create a number of
81 tables summarizing your grammar.
82
83 Analyze Grammar does not create any ©output filesª.
84 To create a ©parserª, use the ©Build Parserª command.
85 You would probably use Analyze Grammar, rather than Build Parser, during
86 initial development of your ©grammarª.
87
88 You can use ©File Traceª and ©Grammar Traceª as soon as you have
89 analyzed your grammar. It is not necessary to build a parser first.
90 ##
91
92 Attribute Statement
93
94 Attribute statements are used in ©configuration
95 sectionsª of your ©syntax fileª to specify certain
96 properties for ©tokenªs, ©character setªs, or other
97 units of your grammar. The attribute statements
98 available are:
99 ©disregardª
100 ©distinguish keywordsª
101 ©enumª
102 ©extend pcbª
103 ©hiddenª
104 ©leftª
105 ©lexemeª
106 ©nonassocª
107 ©rename macroª
108 ©reserve keywordsª
109 ©rightª
110 ©stickyª
111 ©subgrammarª
112 ©wrapperª
113 ##
114
115 Auto init
116
117 Auto init is a ©configuration switchª which defaults to
118 on. It controls the initialization of any ©parserª that
119 is not ©event drivenª. When it is set to on, your
120 parser is automatically initialized every time it is
121 called. This is the setting you will normally use. On
122 occasion, however, it is desirable to call a parser
123 several times without reinitializing it. In this case,
124 you may set the auto init parameter to off and then
125 call the ©initializerª yourself whenever it is
126 appropriate.
127 ##
128
129 Auto resynch
130
131 "Auto resynch" is a ©configuration switchª which
132 defaults to off. You may use it to specify ©automatic
133 resynchronizationª as an ©error recoveryª mechanism.
134
135 Setting the "auto resynch" switch causes AnaGram to
136 include an automatic ©resynchronizationª procedure in
137 your ©parserª. The resynchronization procedure will be
138 invoked when your parser encounters a ©syntax errorª
139 and will skip over input until it finds input
140 characters or ©tokensª consistent with its state at the
141 time of the error.
142
143 An alternate technique, ©error token resynchronizationª,
144 uses an ©error tokenª which you include in your grammar.
145 ##
146
147 Automatic Resynchronization
148
149 Automatic ©resynchronizationª is one of several ©error
150 recoveryª options available as part of parsers built by
151 AnaGram. You enable automatic resynchronization by
152 setting the ©auto resynchª ©configuration switchª. If
153 your parser includes automatic resynchronization it will
154 incorporate a heuristic procedure which will skip over
155 input tokens until it finds a token which makes sense
156 with respect to one or another of the ©productionªs
157 active at the time of the ©syntax errorª.
158
159 The purpose of the resynchronization procedure is to
160 provide a simple way for your parser to proceed in the
161 event of syntax errors so that it can find more than one
162 syntax error on a given pass. The resynchronization
163 procedure uses a heuristic based on your own syntax.
164 AnaGram itself uses this technique to resynchronize
165 after syntax errors in its input.
166
167 A disadvantage to using this resynchronization technique
168 is that the resynchronization procedure turns off all
169 ©reduction procedureªs. Because of the error, a number
170 of reduction procedures, which normally would be
171 executed, will be skipped. The parameters for any
172 reduction procedures that might be called later would be
173 suspect and could cause serious problems. It seems more
174 prudent simply to shut them down.
175
176 If you use the automatic resynchronization procedure,
177 you must also specify an ©eof tokenª so that the
178 synchronizer doesn't inadvertently skip over the end of
179 file.
180
181 An alternative technique for resynchronization is called
182 ©error token resynchronizationª.
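
The skipping behavior, and the reason an eof token is required, can be illustrated with a deliberately simplified C sketch. It is not AnaGram's procedure: the real heuristic works from the parser state and the active productions, whereas this hypothetical demo_resynch is simply handed a table of acceptable token numbers.

```c
#include <stddef.h>

/* Skip forward from position pos until a token that is acceptable in the
   current context, or the eof token, is found. acceptable[t] is non-zero
   if token number t makes sense at the point of the error. Returns the
   index at which parsing could resume. */
size_t demo_resynch(const int *tokens, size_t count, size_t pos,
                    const unsigned char *acceptable, int eof_token) {
  while (pos < count) {
    int t = tokens[pos];
    /* Stopping at eof keeps the scan from running past the end of input. */
    if (t == eof_token || acceptable[t]) break;
    pos++;
  }
  return pos;
}
```

Without the eof test, a scan that finds no acceptable token would have nothing to stop it at the end of the input.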
183 ##
184
185 Auxiliary Trace
186
187 An Auxiliary Trace is a pre-built grammar trace which
188 you may select from the ©Auxiliary Windowsª popup menu for
189 most windows which display parser state information.
190 The Auxiliary Trace provides a path to the state
191 specified in the highlighted line of the primary window.
192
193 When obtained
194 from the Parser Stack pane of the ©File Traceª or ©Grammar Traceª, the
195 Auxiliary Trace is simply a copy of the current status of these
196 traces so you can explore your alternatives while still retaining the
197 status of the original trace for reference.
198 ##
199
200 Auxiliary Windows
201
202 From most AnaGram windows you can pop up an Auxiliary Windows
203 menu by clicking the right mouse button or by pressing Shift F10.
204 Auxiliary Windows may
205 have Auxiliary Windows of their own.
206
207  Windows with a cursor bar (highlighted line):
208 The windows available in the Auxiliary Windows menu depend on the
209 grammar elements identified by the cursor bar in the parent window. If
210 the cursor bar identifies a ©parser stateª, there will be windows that
211 describe the state. If the cursor bar identifies a ©grammar ruleª,
212 there will be windows that describe the rule. If the cursor bar
213 identifies a ©tokenª, there will be windows that describe the token. In
214 the case of a ©marked ruleª, token windows will describe the marked
215 token, if any. In some cases, specialized pre-built grammar traces
216 such as the ©Conflict Traceª or ©Auxiliary Traceª are on the menu.
217
218  Help windows:
219 For Help windows, the Auxiliary Windows menu will show all the
220 available links to other ©Help topicsª from this window. ©Using Helpª
221 is always available.
222 ##
223
224 Backtrack
225
226 If your ©parserª does not continue after encountering a
227 ©syntax errorª, you can speed it up and make it a
228 little smaller by turning off the backtrack
229 ©configuration switchª. If backtrack is on, AnaGram
230 configures your parser so that in case of syntax error
231 it can undo any ©default reductionsª it might have made
232 as a consequence of the erroneous input. The purpose of
233 such an undo function is to identify the proper ©error
234 frameª and to maximize the probability of being able to
235 recover gracefully.
236 ##
237
238 Empty Recursion
239
240 This warning message tells you that the recursive step of the
241 specified ©recursive ruleª can be completely matched by ©zero
242 lengthª tokens, i.e., by nothing at all.
243 The result is potentially an infinite loop in the generated ©parserª.
244 The specified rule is an expansion rule of the specified token.
245
246 Because of the possibility of encountering an infinite loop while parsing,
247 AnaGram turns off its ©keyword anomalyª analysis if empty recursion is
248 found. The ©File Traceª function is also disabled for the same reason.
249
250 The ©circular definitionª of a token has the same effect as an
251 empty recursion, in that no additional input is required to match
252 the recursive rule.
253
254 ##
255 Keyword Anomaly analysis aborted: empty recursion
256
257 The ©keyword anomalyª analysis has been turned off, since the presence of
258 ©recursive ruleªs with ©empty recursionª can cause infinite loops in the analysis.
259
260 ##
261
262 Keyword Anomaly analysis aborted: circular definition
263
264 The ©keyword anomalyª analysis has been turned off, since the presence of
265 a ©circular definitionª can cause infinite loops in the analysis.
266
267 ##
268
269 File Trace disabled: empty recursion
270
271 Because of the presence of ©recursive ruleªs with ©empty recursionª in this grammar and
272 the infinite loops that can ensue, the ©File Traceª function has been
273 disabled.
274
275 ##
276
277 File Trace disabled: circular definition
278
279 Because of the presence of a ©circular definitionª in this grammar and
280 the infinite loops that can ensue, the ©File Traceª function has been
281 disabled.
282
283 ##
284
285
286
287 Both Error Token Resynch and Auto Resynch Specified
288
289
290
291 This ©warningª message indicates that your ©grammarª
292 defines an ©error tokenª and also requests ©automatic
293 resynchronizationª. AnaGram will ignore the request
294 for automatic resynchronization and will provide ©error
295 token resynchronizationª. If you named a token "error"
296 but do not wish ©error token resynchronizationª, you can
297 either rename "error", or, in a ©configuration
298 sectionª, you may explicitly specify the error token to
299 be something you don't otherwise use in your grammar:
300 [ error token = not used ]
301 ##
302
303 Bottom Margin
304
305 "Bottom margin" is an ©obsolete configuration parameterª.
306 ##
307
308 Bright Background
309
310 "Bright background" is a ©configuration switchª which
311 was used in the DOS version of AnaGram. It is no longer
312 used, but is still recognized for the sake of upward
313 compatibility with old ©configuration fileªs.
314 ##
315
316 Build Parser
317
318 You use the Build Parser command to create a ©parserª based on your
319 ©grammarª. The parser is a C file consisting of the ©embedded Cª (which
320 may include C++) code in your ©syntax fileª, your ©reduction
321 procedureªs, a number of tables derived from your grammar
322 specification, and a ©parsing engineª customized to your requirements.
323
324 If you only wish to investigate your grammar and do not
325 wish to create ©output filesª, use the ©Analyze
326 Grammarª command.
327 ##
328
329 Build <file name>
330
331 This item on the ©Action Menuª is available when you have analyzed a
332 ©grammarª but you have not yet built it. It builds the grammar
333 without reloading the ©syntax fileª from the disk.
334 ##
335
336 Cannot Make Wrapper for Default Token Type
337
338 This ©warningª message occurs when a ©wrapperª statement lists a token
339 type that has previously been defined as the ©default token typeª.
340 If a wrapper is needed for a
341 particular type, you must specify the ©data typeª explicitly
342 for each relevant ©tokenª.
343
344 As a result, a wrapper class has not been created for the specified token type.
345 ##
346
347 Token with Wrapper cannot be Default Token Type
348
349 This ©warningª message indicates that an attempt has been made
350 to specify a class that has previously been listed in a ©wrapperª
351 statement as the ©default token typeª.
352 If a wrapper is needed for a particular type, you must specify the
353 ©data typeª explicitly for each relevant ©tokenª.
354
355 As a result, the default token type has not been set.
356 ##
357
358 Case Sensitive
359
360 "Case sensitive" is a ©configuration switchª which
361 defaults to on. When it is on, it instructs AnaGram to
362 build a parser for which all input is case sensitive.
363 When it is off, AnaGram builds a parser which
364 ignores case for all input.
365
366 If the ©iso latin 1ª configuration switch is turned
367 off, case conversion will be limited to characters
368 in the normal ascii range. When it is on, case
369 conversion will be done for all iso latin 1 characters.
370
371 If you have other requirements for case conversion,
372 you may provide your own definition in your ©embedded cª for the
373 ©CONVERT_CASEª macro which is invoked to perform case
374 conversion on input characters.
375
376 Note that the value of an input token is unaffected
377 by the case sensitive switch. When case sensitive is
378 off, 'a' and 'A' will be treated as the same input
379 token by the parser, but the ©token valueªs will
380 nevertheless be different.
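
A custom case conversion might look like the following sketch. The one-argument form CONVERT_CASE(c), the choice of converting to upper case, and the helper demo_latin1_upper are all assumptions made for illustration; check the generated parser file for the form the macro actually takes before overriding it.

```c
/* Upper case conversion for ISO Latin 1 codes in the range 0..255.
   Hypothetical helper, not part of AnaGram. Assumes an ASCII-compatible
   execution character set. */
int demo_latin1_upper(int c) {
  if (c >= 'a' && c <= 'z') return c - 'a' + 'A';
  /* 0xE0..0xFE are lower case letters, except 0xF7 (division sign).
     0xDF and 0xFF have no upper case form in Latin 1 and are left alone. */
  if (c >= 0xE0 && c <= 0xFE && c != 0xF7) return c - 0x20;
  return c;
}

/* Assumed form of the override placed in embedded C. */
#define CONVERT_CASE(c) demo_latin1_upper(c)
```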
381 ##
382
383 C Prologue
384
385 If you include a block of ©embedded Cª code at the very
386 beginning of your syntax file, it is called the "C
387 prologue". It will be copied to your ©parser fileª
388 before any of the code generated by AnaGram. You can
389 use the C prologue to ensure that copyright notices,
390 #include directives, or type definitions, for example,
391 occur at the very beginning of your parser file.
392
393 If you specify a C or C++ type of your own definition,
394 you must provide a definition in the C prologue.
395 ##
396
397 CHANGE_REDUCTION
398
399 CHANGE_REDUCTION(t) is a macro which AnaGram defines in
400 your ©parser fileª if your ©parserª uses ©semantically
401 determined productionsª. In your ©reduction procedureª,
402 when you need to change the ©reduction tokenª you can
403 easily do so by calling CHANGE_REDUCTION with the name
404 of the desired token as the argument. If the token name
405 has embedded spaces, replace the embedded spaces with
406 underline characters.
407 ##
408
409 Character Constant
410
411 You may represent single characters in your ©grammarª by
412 using character constants. The rules for character
413 constants are the same as in C. The escape sequences
414 are as follows:
415 \a alert (bell) character
416 \b backspace
417 \f formfeed
418 \n newline
419 \r carriage return
420 \t horizontal tab
421 \v vertical tab
422 \\ backslash
423 \? question mark
424 \' single quote
425 \" double quote
426 \ooo octal number
427 \xhh hexadecimal number
428
429 AnaGram treats a single
430 character as a ©character setª
431 which contains only the specified character. Therefore you
432 can use a character constant in a ©set expressionª.
433 ##
434
435 Character Map
436
437 The Character Map table shows you the mapping of input
438 characters to ©token numbersª. The ©ag_tcvª table in
439 your parser is based on the information in this table.
440
441 The fields in this table are:
442 character code
443 display character, if any (what Windows displays for this code)
444 ©partition set numberª
445 ©token numberª
446 ©token representationª
447
448 The display character will be what Windows displays for the character
449 code in the Data Tables font you have chosen.
450 ##
451
452 Character Range
453
454 A "character range" is a simple way to specify a
455 ©character setª. There are two ways to represent a
456 character range in an AnaGram ©syntax fileª.
457
458 The first way is like a ©character constantª: 'a-z'.
459
460 The second way allows somewhat greater freedom:
461 'a'..'z'
462 'a'..255
463 ^Z..037
464 -1..0xff
465 Here you use two arbitrary ©character representationsª
466 separated by two dots. If the two characters are out of
467 order, AnaGram will reverse the order, but will give
468 you a ©warningª.
469
470 More complex ©character setsª may be specified by using
471 ©unionª, ©differenceª, ©intersectionª, or ©complementª
472 operators.
473 ##
474
475 Character Representation
476
477 In an AnaGram ©syntax fileª you may represent a
478 character literally with a ©character constantª or
479 numerically using decimal, octal or hexadecimal
480 representations following the conventions for C. Thus
481 'A', 65, 0101, and 0x41 all represent the same
482 character. Control characters can be represented using
483 the '^' character and either an upper or lower case
484 letter. Thus ^j and ^J are acceptable representations
485 of the ascii newline code. The rules for character
486 constants are identical to those in C, and the same
487 escape sequences are recognized.
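
The equivalences above can be checked in plain C on an ASCII system. The helper names are invented for this sketch, and the masking used for control characters is the usual ASCII convention, assumed here to match what the '^' notation denotes.

```c
/* 'A', 65, 0101 and 0x41 denote the same character code in ASCII. */
int demo_same_character(void) {
  return 'A' == 65 && 65 == 0101 && 0101 == 0x41;
}

/* Control character for a letter, as in ^J or ^j: keep the low five bits
   of the letter's ASCII code. */
int demo_ctrl(int letter) {
  return letter & 0x1F;
}
```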
488 ##
489
490 Character Set
491
492 In AnaGram grammars you can conveniently specify whole
493 sets of characters at a time. This avoids
494 needless repetition and complexity.
495
496 Sets of characters may be defined in an AnaGram ©syntax
497 fileª in any of a number of ways. A single character is
498 taken to represent a character set consisting of a
499 single element. (See ©character representationª.) You
500 can also specify a set consisting of a range of
501 characters (see ©character rangeª) and perform the
502 familiar set operations, union, intersection, difference
503 and complement.
504
505 All the sets you define in your syntax file are
506 summarized in the ©Character Setsª window.
507
508 The ©unionª of two character sets, represented by a '+',
509 contains all characters that are in one or another of
510 the two sets. Thus, 'A-Z' + 'a-z' represents the set of
511 all upper and lower case letters.
512
513 The ©intersectionª of two character sets, represented
514 by a '&', contains all characters that are in both
515 sets. Thus, suppose you have the ©definitionsª
516 letter = 'A-Z' + 'a-z'
517 hex digit = '0-9' + 'A-F' + 'a-f'
518 Then (letter & hex digit) contains precisely upper and
519 lower case a to f.
520
521 The ©differenceª of two character sets, represented by
522 a '-', contains all characters that are in the first
523 set but not in the second set. Thus, using the same
524 definitions as above, (letter - hex digit) contains
525 precisely upper and lower case g to z.
526
527 The ©complementª of a character set, represented by a
528 preceding '~', represents all characters in the
529 ©character universeª which are not in the given set.
530 Suppose you have defined a set, ©eofª, which consists of
531 the characters which represent end of file. Then, in
532 your grammar where you wish to accept an arbitrary
533 character, what you really want is anything but an end
534 of file character. You can define it thus:
535 anything = ~eof
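
The four set operations can be modeled in C with a membership array per set. This is a sketch for experimenting with the definitions above, not a description of how AnaGram stores sets; the type demo_set, its functions, and the fixed universe of codes 0 to 255 are assumptions.

```c
#define DEMO_UNIVERSE 256

typedef struct { unsigned char member[DEMO_UNIVERSE]; } demo_set;

/* Set containing the codes lo..hi inclusive, like 'a-z'. */
demo_set demo_range(int lo, int hi) {
  demo_set s;
  int c;
  for (c = 0; c < DEMO_UNIVERSE; c++) s.member[c] = (c >= lo && c <= hi);
  return s;
}

/* a + b */
demo_set demo_union(demo_set a, demo_set b) {
  int c;
  for (c = 0; c < DEMO_UNIVERSE; c++) a.member[c] = a.member[c] || b.member[c];
  return a;
}

/* a & b */
demo_set demo_intersection(demo_set a, demo_set b) {
  int c;
  for (c = 0; c < DEMO_UNIVERSE; c++) a.member[c] = a.member[c] && b.member[c];
  return a;
}

/* a - b */
demo_set demo_difference(demo_set a, demo_set b) {
  int c;
  for (c = 0; c < DEMO_UNIVERSE; c++) a.member[c] = a.member[c] && !b.member[c];
  return a;
}

/* ~a, relative to the universe 0..255 */
demo_set demo_complement(demo_set a) {
  int c;
  for (c = 0; c < DEMO_UNIVERSE; c++) a.member[c] = !a.member[c];
  return a;
}

int demo_contains(const demo_set *s, int code) {
  return code >= 0 && code < DEMO_UNIVERSE && s->member[code];
}
```

Building letter and hex digit with demo_range and demo_union, then taking demo_intersection and demo_difference, reproduces the a to f and g to z results described above.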
536 ##
537
538 Character Sets
539
540 This window lists all of the distinct ©character setªs
541 which you defined, implicitly or explicitly, in your
542 ©grammarª. Each line in the table describes one such
543 set.
544
545 The description takes the form of the internal set
546 number and the defining ©expressionª. The ©Auxiliary
547 Windowsª menu will allow you to see the ©Partition
548 Setsª which cover the character set, and the ©Set
549 Elementsª which it comprises, as well as the ©Token Usageª.
550 ##
551
552 Character Universe, Universe
553
554 The character universe, or set of all expected input
555 characters to your parser, is defined as all characters
556 in the range given by a particular lower bound and a
557 particular upper bound, as described below.
558
559 The character universe is used for two things in
560 AnaGram. The first use is for calculating the
561 ©complementª of a character set. The second use is in
562 the input processing of your parser. Input characters
563 will be used to index a ©token conversionª table to
564 convert character codes to token numbers. The length of
565 this table will be given by the size of the character
566 universe. If you have set the ©test rangeª
567 ©configuration switchª, your parser will verify that the
568 input character is within the range of the conversion
569 table. Otherwise, the character code will not be
570 checked for validity. In this case, an out-of-range
571 character will lead to undefined behavior.
572
573 If you have not used any characters with negative codes
574 in your grammar, the lower bound is zero. Otherwise, it
575 is the most negative such character.
576
577 If the highest character code you have used is less
578 than or equal to 255, the upper bound will be 255.
579
580 If you have used a character code greater than 255, the
581 upper bound will be the largest such code which appears
582 in your syntax file.
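
The bound rules can be restated as a short C sketch. The type and function names are hypothetical, and the table length formula assumes one table entry per code in the universe, which is how the text above describes the conversion table.

```c
typedef struct { int lower; int upper; } demo_universe;

/* Apply the rules above to the lowest and highest character codes
   that appear in a grammar. */
demo_universe demo_universe_bounds(int lowest_code_used, int highest_code_used) {
  demo_universe u;
  u.lower = lowest_code_used < 0 ? lowest_code_used : 0;
  u.upper = highest_code_used <= 255 ? 255 : highest_code_used;
  return u;
}

/* Number of entries a conversion table would need. */
int demo_table_length(demo_universe u) {
  return u.upper - u.lower + 1;
}

/* The kind of check the test range switch asks for: is the input code
   inside the table? */
int demo_in_range(demo_universe u, int code) {
  return code >= u.lower && code <= u.upper;
}
```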
583 ##
584
585 Characteristic Rule
586
587 Each ©parser stateª is characterized by a particular
588 set of ©grammar rulesª, and for each such rule, a
589 marked token which is the next ©tokenª expected. The
590 combination of a grammar rule and its marked token is often
591 called a ©marked ruleª. A marked rule which
592 characterizes a state is called a "characteristic
593 rule". In the course of doing ©grammar analysisª,
594 AnaGram determines the characteristic rules for each
595 ©parser stateª. After analyzing your grammar, you may
596 inspect the ©State Definition Tableª to see the
597 characteristic rules for any state in your parser.
598 ##
599
600 Characteristic Token
601
602 Every state in a ©parserª, except state 0, can be
603 characterized by the one, unique ©tokenª which causes a
604 jump to that state. That token is called the
605 ©characteristic tokenª of the state, because to get to
606 that ©parser stateª you must have just seen precisely
607 that token in the input. Note that several states could
608 have the same characteristic token.
609
610 When you have a list of states, such as is given by the
611 ©parser state stackª, it is equivalent to a list of
612 characteristic tokens. This list of tokens is the list
613 of tokens that have been recognized so far by the
614 parser.
615 ##
616
617 Circular Definition
618
619 If the ©expansion ruleªs for a ©tokenª contain a ©grammar ruleª that
620 consists only of the token itself, the definition of the
621 token is circular. A circular definition is an extreme
622 case of ©empty recursionª.
623
624 As in cases of empty recursion, the generated parser may contain
625 infinite loops. When such a condition is detected, therefore,
626 ©keyword anomalyª analysis and the ©File Traceª option are disabled.
627
628 ##
629
630 column
631
632 "column" is an integer field in your ©parser control
633 blockª used for keeping track of the column number of
634 the current character in your input. Line and column
635 numbers are tracked only if the ©lines and columnsª
636 ©configuration switchª has been set.
637 ##
638
639 Command Line
640
641 If you provide the name of a syntax file on the
642 command line when you start AnaGram, it will open
643 the file and run either ©Analyze Grammarª or ©Build
644 Parserª depending on the setting of the ©Autobuildª
645 switch.
646 ##
647
648 Command Line Version, agcl.exe
649
650 The command line version of AnaGram, agcl.exe, can be
651 used in make files. It takes the name of a single syntax
652 file on the command
653 line. Error and ©warningª messages are written to stdout.
654
655 Normally you would only use the command line version once you
656 have finished developing your ©parserª and are integrating
657 it with the rest of your program.
658
659 The command line version of AnaGram is not included with
660 trial copies.
661 ##
662
663 Comment
664
665 You may incorporate comments in your syntax file using
666 either of two conventions. The first is the normal C
667 convention for comments which begin with "/*" and end
668 with "*/". Such comments may be of arbitrary length. By
669 setting or resetting the ©nest commentsª switch, you
670 may control whether they may be nested or not.
671
672 The second convention for comments is the C++ comment
673 convention. In this case the comment begins with "//"
674 and ends with a newline.
675
676 When writing a ©grammarª, you may wish to allow a user
677 to comment his input freely without your having to
678 explicitly allow for comments in your grammar. You may
679 accomplish this by using the ©disregardª statement.
680 ##
681
682 Compile Command
683
684 "Compile command" is a ©configuration parameterª which
685 takes a string value. This parameter was used in the
686 DOS version of AnaGram, but is ignored in the Windows
687 version.
688 ##
689
690 Complement
691
692 In set theory, the complement of a set, S, is the set
693 of all elements of the ©universeª which are not members
694 of the set S.
695
696 In AnaGram, the complement operator for ©character
697 setsª is given by '~' and has higher precedence than
698 ©differenceª, ©intersectionª, or ©unionª.
699
700 In AnaGram, the most useful complement is that of the
701 end of file character set. For ordinary ascii files it
702 is often convenient to read the entire file into
703 memory, append a zero byte to the end, and define the
704 end of file set thus:
705 eof = 0 + ^Z.
706 Then, ~©eofª represents all legitimate input characters.
707
708 You can then use set differences to specify certain
709 useful sets without tedious enumeration. For example, a
710 comment that is to be terminated by the end of line
711 then consists of characters from the set
712 comment char = ~'\n' & ~eof
713 This set could also be written
714 comment char = ~('\n' + eof)
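
That the two forms of comment char describe the same set follows from De Morgan's law. The C sketch below checks this over the codes 0 to 255, taking eof to be 0 + ^Z as defined above (^Z is code 26 in ASCII). The function names are invented for the example.

```c
/* eof = 0 + ^Z */
int demo_is_eof(int c) { return c == 0 || c == 26; }

/* comment char = ~'\n' & ~eof */
int demo_comment_char_a(int c) { return (c != '\n') && !demo_is_eof(c); }

/* comment char = ~('\n' + eof) */
int demo_comment_char_b(int c) { return !((c == '\n') || demo_is_eof(c)); }

/* Returns 1 if both forms agree for every code in 0..255. */
int demo_forms_agree(void) {
  int c;
  for (c = 0; c < 256; c++) {
    if (demo_comment_char_a(c) != demo_comment_char_b(c)) return 0;
  }
  return 1;
}
```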
715 ##
716
717 Completed Rule
718
719 A "completed rule" is a ©characteristic ruleª which has no ©marked
720 tokenª. In other words, it has been completely matched and will be
721 reduced by the next input.
722
723 If there is more than one completed rule in a state,
724 the decision as to which to reduce is made based on the
725 next input token. If there is only one completed rule
726 in a state, it will be reduced by default unless the
727 ©default reductionsª switch has been reset, i.e.,
728 turned off.
729 ##
730
731 Configuration File
732
733 If it can find them, AnaGram reads two configuration
734 files to set up ©configuration parameterªs. At program
735 initialization, it will first attempt to read a
736 configuration file in the directory that contains
737 the AnaGram executable file you are running. Then it
738 will read a configuration file in your working
739 directory. Both files should have the name
740 "AnaGram.cfg" if they exist. Neither is necessary.
741
742 If a parameter is specified in both files, the
743 specification in the file from the working directory
744 takes precedence.
745
746 The effect of this two stage process is to allow you to
747 set your standard preferences in the principal
748 directory, with specific overrides in your working
749 directories.
750
751 The values for configuration parameters in ©syntax
752 filesª override those read from configuration files.
753
754 AnaGram does not save configuration parameters in
755 the Windows registry, nor does it provide any
756 mechanism for setting or changing the values of
757 configuration parameters within AnaGram itself.
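
The order of precedence described in this topic can be summarized in a small C sketch. It models only the outcome (which specification wins), not how AnaGram reads the files; the function name and the convention that NULL means "not specified at this level" are assumptions.

```c
#include <stddef.h>

/* Pick the effective value of one parameter. Each argument is the value
   given at that level, or NULL if the parameter is not mentioned there.
   Later levels override earlier ones:
   built-in default < AnaGram.cfg in the executable's directory
                    < AnaGram.cfg in the working directory
                    < syntax file. */
const char *demo_effective_value(const char *builtin_default,
                                 const char *exe_dir_cfg,
                                 const char *working_dir_cfg,
                                 const char *syntax_file) {
  if (syntax_file != NULL) return syntax_file;
  if (working_dir_cfg != NULL) return working_dir_cfg;
  if (exe_dir_cfg != NULL) return exe_dir_cfg;
  return builtin_default;
}
```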
758 ##
759
760 Configuration Parameter
761
762 Configuration parameters may be specified either in
763 ©configuration filesª or in your ©syntax fileª. In your
764 syntax files, configuration parameters are specified,
765 one per line, in a ©configuration sectionª.
766
767 AnaGram ignores case when identifying a configuration
768 parameter, so that "ALLOW MACROS", "Allow Macros", and
769 "allow macros" are all equivalent forms.
770
771 There may be any number of configuration sections in a
772 ©syntax fileª. Any parameter may be specified any
773 number of times. Since AnaGram maintains only one value
774 in storage for these parameters, whenever it refers to
775 one it will see the most recently specified value.
776 Every configuration parameter has a default value which
777 has been chosen to correspond to a standard if it
778 exists, customary usage if such can be determined, or
779 otherwise to the most likely usage.
780
781 Before executing an Analyze Grammar or Build Parser command, AnaGram
782 resets configuration parameters to their initial values, as
783 determined by the built in defaults and the configuration files read
784 at program initialization.
785
786 The ©Configuration Parameters Windowª shows the current settings of all
787 of the configuration parameters. When this window is active you may
788 press ©F1ª or click with the ©help cursorª to pop up a help window
789 describing the parameter under the cursor bar.
790
791 There are several varieties of configuration
792 parameters. Some simply set or reset a condition. These
793 need simply be stated to set the condition or negated
794 with the tilde (~) to reset the condition. Thus
795 [ nest comments ]
796 causes AnaGram to allow nested comments, and
797 [ ~nest comments ]
798 causes AnaGram to disallow nested comments.
799
800 If you prefer you may explicitly specify a switch value as on or off:
801 [ nest comments = on]
802
803 A second kind
804 of configuration parameter takes a value
805 which is the name of a token. Thus
806 [ grammar token = c grammar]
807 specifies that the token, c grammar, is the ©grammar
808 tokenª which is to be analyzed.
809
810 A third variety of configuration parameter takes a
811 value which is a C data type. Thus
812 [ default token type = unsigned char *]
813 signifies that the ©semantic valueª of a token, unless
814 otherwise specified, is a pointer to an unsigned char.
815
816 A fourth variety of configuration parameter takes a
817 string value to set some ascii string used by AnaGram.
818 Thus
819 [ header file name = "widget.h" ]
820 signifies that the header file created by AnaGram
821 should be called "widget.h".
822
823 In string-valued parameters used to specify the names
824 of output files or the name of your parser, you may use
825 the '#' character to indicate the name of your syntax
826 file: When the string is actually used, AnaGram will
827 substitute the syntax file name for the '#'.
828
829 In string-valued parameters used to specify the names
830 of functions or variables that AnaGram generates, you
831 may use '$' to specify the name of your parser. When
832 the string is actually used, AnaGram will substitute
833 the name of your parser for the '$'.
834
835 In the "©enum constant nameª" configuration parameter
836 you may use '%' to specify where a token name is to be
837 substituted.
838
839 The final variety of configuration parameter takes a
840 numeric value. The value may be decimal, octal
841 or hexadecimal, following the C conventions, and may
842 have an optional sign. Thus
843 [parser stack size = 50]
844 tells AnaGram to allocate space for at least fifty stack entries
845 when it creates your parser.
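
As a rough illustration of the numeric conventions just
described, the standard C function strtol, called with base 0,
accepts the same forms: decimal, octal with a leading 0,
hexadecimal with a leading 0x, and an optional sign. The helper
name parse_numeric_value is hypothetical and this is not
AnaGram's own code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: parse a numeric parameter value written in
   C notation. Returns 1 and stores the result in *out on success;
   returns 0 if the text is not a complete number. */
int parse_numeric_value(const char *text, long *out) {
    char *end;
    assert(text != NULL && out != NULL);
    *out = strtol(text, &end, 0);  /* base 0: 50, 062 and 0x32 all parse */
    return end != text && *end == '\0';
}
```

With this sketch, "50", "062" and "0x32" all produce the same
value.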
846 ##
847
848 Configuration Parameters Window
849
850 The Configuration Parameters window lists the
851 ©configuration parameterªs AnaGram accepts with their
852 current values, as set by the ©configuration filesª it
853 has read and by the most recent ©syntax fileª it has
854 analyzed. Configuration parameters cannot be changed
855 from within AnaGram.
856 ##
857
858 Configuration Section
859
860 A configuration section is one of the main divisions of
861 your ©syntax fileª. It begins with a left square
862 bracket on a fresh line. It then contains definitions
863 of ©configuration parameterªs, ©configuration switchª
864 settings and ©attribute statementªs. These
865 specifications must each start on a new line. The
866 configuration section is closed with a right bracket.
867 Any further component of your syntax file, other than a
868 ©commentª, must start on a fresh line.
869
870 There can be any number of configuration sections in a
871 syntax file.
872 ##
873
874 Configuration Switch
875
876 A configuration switch is a ©configuration parameterª
877 which can take on only the two values true and false,
878 or on and off. You set a configuration switch, or turn
879 it on, by simply naming it in your ©configuration fileª
880 or in a ©configuration sectionª of your ©syntax fileª.
881 You turn it off, or "reset" it, by use of the tilde:
882 "~nest comments", for example, resets, or turns off,
883 the ©nest commentsª switch. If you prefer, you may
884 assign the value "on" to set the switch, or "off" to
885 reset it. For example:
886 nest comments = on
887 ##
888
889 Conflict
890
891 "Conflicts" arise during the ©grammar analysisª when
892 AnaGram cannot determine how to treat a given input
893 token. There are two sorts of conflicts: ©shift-reduce
894 conflictsª and ©reduce-reduce conflictsª. Conflicts may
895 arise either because the grammar is inherently
896 ambiguous, or simply because the grammar analyzer
897 cannot look far enough ahead to resolve the conflict.
898 In the latter case, it is often possible to rewrite the
899 grammar in such a way as to eliminate the conflict. In
900 particular, ©null productionsª are a common source of
901 conflicts.
902
903 When AnaGram analyzes your grammar, it lists all
904 unresolved conflicts in the ©Conflictsª window. A number
905 of ©Auxiliary Windowsª available from the Conflicts window
906 provide help in identifying the source of the conflict.
907
908 There are a number of ways to deal with conflicts. If
909 you understand the conflict well, you may simply choose
910 to ignore it. When AnaGram encounters a shift-reduce
911 conflict while building parse tables it resolves it by
912 choosing the ©shift actionª. When AnaGram encounters a
913 reduce-reduce conflict while building parse tables, it
914 resolves it by selecting the ©grammar ruleª which
915 occurred first in the grammar.
916
917 A second way to deal with conflicts is to set ©operator
918 precedenceª parameters. If you set these parameters,
919 AnaGram will use them preferentially to resolve
920 conflicts. Any conflicts so resolved will be listed in
921 the ©Resolved Conflictsª window.
922
923 A third way to resolve a conflict is to declare some
924 tokens as ©stickyª. This is particularly useful for
925 ©productionªs whose sole purpose is to skip over
926 uninteresting input.
927
928 A fourth way to resolve conflicts is to declare a token
929 to be a ©subgrammarª. When you do this, AnaGram does
930 not look beyond the definition of the subgrammar token
931 itself for reducing tokens. This is not a particularly
932 selective way to resolve conflicts and should be used
933 only when the subgrammar token is naturally defined
934 only by internal criteria. The tokens identified by
935 lexical scanners are prime examples of this genre.
936
937 The fifth way to deal with conflicts is to rewrite the
938 grammar to eliminate them. Many people prefer this
939 approach since it yields the highest level of
940 confidence in the resulting program.
941
942 Please refer to the AnaGram User's Guide for more information about
943 dealing with conflicts.
944 ##
945
946 Conflicts
947
948 If there are ©conflictªs in your grammar which are not
949 resolved by ©precedence rulesª, they will be listed in
950 the Conflicts window. The Conflicts window will also be
951 listed in the ©Browse Menuª. Conflicts which have been
952 resolved by ©precedence rulesª are listed in the
953 ©Resolved Conflictsª window.
954
955 The Conflicts window lists the conflicts, or
956 ambiguities, which AnaGram found in your grammar. The
957 table identifies the ©parser statesª in which it found
958 conflicts, the ©conflict tokenªs for which it had more
959 than one option, and the ©marked rulesª for each such
960 option. If one of the rules for a particular conflict
961 has a ©marked tokenª, the conflict is
962 a ©shift-reduce conflictª. The marked token is the token
963 to be shifted. If none of the rules has a marked token the conflict is
964 a ©reduce-reduce conflictª.
965
966 AnaGram provides a number of ©Auxiliary Windowsª to help
967 you find and fix the source of the conflict. The
968 ©Conflict Traceª window is a pre-built ©Grammar Traceª
969 window which shows you one of perhaps many ways to
970 encounter the conflict. The ©Reduction Traceª window
971 shows the result of reducing a particular ambiguous
972 rule.
973
974 In addition, the ©Rule Derivationª and ©Token
975 Derivationª windows show you why the conflict token is a
976 ©reducing tokenª. They are particularly useful for
977 shift-reduce conflicts.
978
979 The ©Expansion Chainª window is helpful for understanding
980 reduce-reduce conflicts.
981
982 Other Auxiliary Windows which are often useful are the
983 ©State Definitionª window, the ©Reduction Statesª
984 window, and the ©Problem Statesª window.
985
986 Please refer to the AnaGram User's Guide for more information on how to
987 deal with conflicts.
988 ##
989
990 Conflicts Resolved by Precedence Rules
991
992 This ©warningª message indicates that AnaGram has
993 resolved conflicts in your grammar by using ©precedence
994 rulesª: guidelines you supplied either by explicit
995 ©precedence declarationsª, by using a ©stickyª
996 statement or ©distinguish lexemesª statement, or
997 implicitly by using a ©disregardª statement. These
998 conflicts are listed in the ©Resolved Conflictsª
999 window, and are not listed in the ©Conflictsª window.
1000 ##
1001
1002 Conflict Token
1003
1004 In any given ©conflictª, there is a ©tokenª for which
1005 an unambiguous ©parser actionª cannot be determined.
1006 This token is called the "conflict token".
1007 ##
1008
1009 Conflict Trace
1010
1011 The Conflict Trace is a ready-made ©Grammar Traceª
1012 which shows you one of perhaps many ways to get to the
1013 state which has the ©conflictª selected by the cursor
1014 bar. The Conflict Trace window is an option in the
1015 ©Auxiliary Windowsª menu for the ©Conflictsª window and
1016 the ©Resolved Conflictsª window.
1017 ##
1018
1019 Const Data
1020
1021 The const data ©configuration switchª controls the use
1022 of CONST qualifiers in generated code. If the switch is
1023 set, all fixed data arrays in the ©parser fileª will be
1024 qualified as CONST, unless the ©old styleª switch is
1025 set. The default setting is ON. Other configuration
1026 switches which control declaration qualifiers in the
1027 parser file are ©near functionsª and ©far tablesª.
1028 ##
1029
1030 CONTEXT
1031
1032 "CONTEXT" is a macro which AnaGram defines for you if
1033 you have defined a ©context typeª. It provides access
1034 to the top value of the ©context stackª. Your
1035 ©GET_CONTEXTª macro may store the current context by
1036 assigning a value to CONTEXT. Suppose your parser uses
1037 ©pointer inputª, and you wish to know the value of the
1038 ©pointerª for every production. You could define
1039 GET_CONTEXT thus:
1040 #define GET_CONTEXT CONTEXT = PCB.pointer
1041
1042 In ©reduction procedureªs, you may use the CONTEXT
1043 macro to find the context for the rule you are
1044 reducing, that is to say, the value the context
1045 variables had when the first token in the rule was
1046 encountered.
1047 ##
1048
1049 Context Stack
1050
1051 It is often convenient, when writing ©reduction
1052 procedureªs, to know the actual context of the ©grammar
1053 ruleª your procedure is reducing. To do this you need
1054 to know the values that certain variables, such as
1055 stack pointers, or input pointers, in your program had
1056 at various stages as your parser matched the rule. You
1057 can accomplish this by maintaining a context stack.
1058
1059 If you wish, AnaGram will keep track, on a stack, of any
1060 context variables you wish. To do so, define a structure
1061 which can hold all the values you need to stack. Use the
1062 ©context typeª ©configuration parameterª to tell AnaGram
1063 how to declare the stack. Then define the ©GET_CONTEXTª
1064 macro to gather the appropriate values and store them on
1065 the stack. The ©CONTEXTª macro evaluates to the proper
1066 location into which the GET_CONTEXT macro should store
1067 the context value. AnaGram will invoke the GET_CONTEXT
1068 macro whenever necessary to make sure the right values
1069 are stacked. In a reduction procedure, you can then use
1070 the macro ©RULE_CONTEXTª to find the value of the
1071 context structure as of the beginning of each token in
1072 the rule you are reducing.
1073
1074 If your parser is ©event drivenª, store the context of
1075 the input token in PCB.input_context. The default
1076 version of GET_CONTEXT will stack the context as
1077 appropriate.
1078
1079 If your parser should encounter an error, you may use
1080 ©ERROR_CONTEXTª to determine the values of the context
1081 variables at the beginning of the aborted grammar rule.
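
The following self-contained mock shows the general idea of a
context stack. It is only a sketch: the mock_pcb structure, its
stack and depth fields, and mock_push_token are assumptions made
for this illustration and do not reproduce the code AnaGram
generates. Only the macro names CONTEXT and GET_CONTEXT and the
PCB.pointer field come from the text.

```c
#include <assert.h>

/* Context type: what we want to remember at the start of a token. */
typedef struct {
    const unsigned char *pointer;  /* input position */
    int line;                      /* line number */
} parse_context;

/* Stand-in for a parser control block (assumed layout). */
typedef struct {
    const unsigned char *pointer;
    int line;
    parse_context stack[32];       /* mock context stack */
    int depth;                     /* index of the current slot */
} mock_pcb;

static mock_pcb PCB;

/* CONTEXT names the current slot; GET_CONTEXT fills it in. */
#define CONTEXT (PCB.stack[PCB.depth])
#define GET_CONTEXT (CONTEXT.pointer = PCB.pointer, CONTEXT.line = PCB.line)

/* Mock of the step a parser takes when it stacks a token. */
void mock_push_token(void) {
    assert(PCB.depth + 1 < 32);
    PCB.depth++;
    GET_CONTEXT;
}
```

Because the context type here is a structure, GET_CONTEXT copies
each field in turn rather than performing a single assignment.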
1082 ##
1083
1084 context type
1085
1086 "Context type" is a ©configuration parameterª whose
1087 value is a C type name, possibly as defined by a
1088 typedef statement. By default, "context type" is
1089 undefined. If you define it, AnaGram will set up a
1090 ©context stackª in your ©parser control blockª so you
1091 can track the context of ©productionªs.
1092
1093 Each time your parser pushes values onto the state
1094 stack and value stack it will invoke the ©GET_CONTEXTª
1095 macro to store the current context on the context
1096 stack. The macro ©CONTEXTª names the current stack
1097 location. In your GET_CONTEXT macro you can use it as
1098 the destination for the current context. In a
1099 ©reduction procedureª, CONTEXT names the context as of
1100 the beginning of the production. Two other macros are
1101 available to inspect the values of the context stack.
1102 In a reduction procedure, you may use ©RULE_CONTEXTª[k]
1103 to determine the value of the context variable as it
1104 was as of the (k+1)th token in the rule. In particular,
1105 RULE_CONTEXT[0] is the value the context variable had
1106 when the first token in the rule was seen.
1107
1108 If you enable the ©error frameª ©configuration switchª,
1109 you may use ©ERROR_CONTEXTª to determine the context of
1110 the production your parser was trying to identify at
1111 the time of the error.
1112 ##
1113
1114 CONVERT_CASE
1115
1116 CONVERT_CASE is a user definable macro which AnaGram
1117 invokes to convert the case of input characters when
1118 the ©case sensitiveª switch has been turned off. If
1119 you do not define the macro yourself, AnaGram will
1120 provide a macro which will convert case correctly
1121 for characters in the ASCII character range and
1122 also for ©ISO latin 1ª characters if the corresponding
1123 ©configuration switchª is on.
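
If you decide to supply your own definition, something along
these lines might serve as a starting point. This sketch assumes
that the macro takes the input character as its single argument
and yields the converted character, and that folding toward
upper case is acceptable; neither assumption is stated above.
The function name fold_case_latin1 is hypothetical.

```c
#include <assert.h>

/* Hypothetical replacement: fold lower case to upper case for
   ASCII and ISO Latin-1. Any other value, such as -1 for end of
   file, passes through unchanged. */
int fold_case_latin1(int c) {
    if (c >= 'a' && c <= 'z')
        return c - 0x20;                       /* ASCII letters */
    if (c >= 0xE0 && c <= 0xFE && c != 0xF7)
        return c - 0x20;                       /* Latin-1 letters */
    return c;  /* 0xF7 (division sign) and 0xFF have no upper case here */
}

#define CONVERT_CASE(c) fold_case_latin1(c)
```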
1124
1125 ##
1126
1127 Coverage File Name
1128
1129 If you have set the ©rule coverageª ©configuration
1130 switchª to include coverage analysis in your parser,
1131 AnaGram uses the value of the coverage file name
1132 ©configuration parameterª to find the results of your
1133 testing. The value of the parameter is a string. The
1134 default value is "#.nrc", where '#' represents the name
1135 of your syntax file.
1136 ##
1137
1138 cs
1139
1140 cs is a field in a ©parser control blockª which
1141 contains your ©context stackª. cs will be defined only
1142 if you have defined the ©configuration parameterª
1143 ©context typeª.
1144 ##
1145
1146 Current Grammar
1147
1148 The Current Grammar is the ©grammarª you presently have
1149 loaded. Its name is displayed on the title bar of
1150 each AnaGram window.
1151
1152 A status field at the right center of the ©Control Panelª
1153 indicates the state of processing that has been
1154 carried out on the grammar.
1155
1156 "Loaded" means that the ©syntax fileª has been read
1157 into memory, but that syntax errors have been found.
1158
1159 "Parsed" means that AnaGram has tried to analyze the
1160 grammar, but got into some kind of difficulty and did
1161 not complete the job. The explanation should be
1162 apparent from the messages in the ©Warningsª window.
1163
1164 "Analyzed" means that a ©grammar analysisª has been
1165 completed, but no ©output filesª have been written.
1166
1167 "Built" means that an analysis has been completed and
1168 output files have been written.
1169 ##
1170
1171 Data Type
1172
1173 The ©tokensª in your ©parserª usually have ©semantic
1174 valuesª. The data types for these values will be
1175 determined by the ©default input typeª and ©default
1176 token typeª ©configuration parameterªs unless you
1177 explicitly provide ©token declarationsª in your grammar.
1178 You may also define the data type for any ©nonterminalª
1179 token by preceding the token name with an ordinary C
1180 cast when you write a production. For example:
1181
1182 (int) integer
1183 -> '0-9':d =d-'0';
1184 -> integer:n, '0-9':d =10*n + d - '0';
1185
1186 The data type may be any simple C or C++ data type, with
1187 arbitrary indirection and qualification. You may also
1188 use any type you have defined by means of typedef,
1189 struct or class definitions. Template classes may also
1190 be used. If you specify a type of your own definition,
1191 you must provide a definition in the ©C prologueª at the
1192 beginning of your ©syntax fileª.
1193
1194 A token may have the type "void" if its value has no
1195 interest for the parser. Since your parser will not
1196 stack a value for a void token, your parser may run
1197 somewhat faster when tokens are declared as void.
1198 ##
1199
1200 Declare pcb
1201
1202 "Declare pcb" is a ©configuration switchª that defaults
1203 to on. If this switch is set when you invoke the ©Build
1204 Parserª command, AnaGram will automatically declare a
1205 ©parser control blockª for you, at the beginning of
1206 your parser file. If you have used data types that you
1207 define yourself, the typedef statements need to precede
1208 the parser control block declaration. In this case, you
1209 should turn "declare pcb" off and declare it yourself.
1210
1211 For more information, see the AnaGram User's Guide.
1212 ##
1213
1214 Default Input Type
1215
1216 The default input type is a ©configuration parameterª
1217 which determines the ©data typeª for the ©semantic
1218 valueªs of ©terminal tokensª if they are not explicitly
1219 declared. Normally, you would explicitly declare
1220 terminal tokens only when you have set the ©input
1221 valuesª ©configuration switchª. If you do not set the
1222 default input type, it will default to "int".
1223
1224 The default data type for the values of ©nonterminal
1225 tokensª is given by the ©default token typeª
1226 configuration parameter.
1227 ##
1228
1229 Default Reduction
1230
1231 "Default reductions" is a ©configuration switchª which
1232 defaults to on.
1233
1234 A "default reduction" is a ©parser actionª which may be
1235 used in your parser in any state which has precisely
1236 one ©completed ruleª.
1237
1238 If a given ©parser stateª has, among its ©characteristic
1239 rulesª, exactly one completed rule, it is usually faster
1240 to reduce it on any input than to check specifically for
1241 correct input before reducing it. The only time this
1242 default reduction causes trouble is in the event of a
1243 ©syntax errorª. In this situation you may get an
1244 erroneous reduction. Normally when you are parsing a
1245 file, this is inconsequential because you are not going
1246 to continue semantic action in the presence of error.
1247 But, if you are using your parser to handle real-time
1248 interactive input, you have to be able to continue
1249 semantic processing after notifying your user that he
1250 has entered erroneous input. In this case you would want
1251 default reductions to have been turned off so that
1252 ©productionªs are reduced only when there is correct
1253 input.
1254 ##
1255
1256 Default reduction value
1257
1258 If a ©grammar ruleª does not have a ©reduction procedureª
1259 the ©semantic valueª of the first token in the rule will
1260 be taken as the semantic value of the token on the left
1261 hand side. If these tokens do not have the same ©data typeª
1262 a ©warningª will be given.
1263 ##
1264
1265 Default Token Type
1266
1267 "Default token type" is a ©configuration parameterª
1268 which determines the ©data typeª for the ©semantic
1269 valueª of a ©nonterminal tokenª if no other type is
1270 explicitly specified. It defaults to void. Therefore, if
1271 any ©reduction procedureª returns a value, you must
1272 either explicitly set the type of the ©reduction tokenª
1273 or you must set default token type to an appropriate
1274 value.
1275
1276 The default token type cannot have a ©wrapperª class
1277 defined.
1278
1279 The default data type for the value of a ©terminal
1280 tokenª is given by the ©default input typeª
1281 configuration parameter.
1282 ##
1283
1284 Definition, Definition Statement
1285
1286 AnaGram syntax files may contain definition statements
1287 which assign new names to ©character setsª, ©virtual
1288 productionsª, ©keyword stringsª, ©immediate actionsª,
1289 or ©tokensª. Definitions have the form
1290 name = <character set>
1291 name = <virtual production>
1292 name = <keyword string>
1293 name = <immediate action>
1294 name = <token name>
1295
1296 For example,
1297 letter = 'a-z' + 'A-Z'
1298 statement list = statement?...
1299 include = "include"
1300
1301 The symbols thus defined may be used anywhere the
1302 expression on the right hand side might be used. Such
1303 definitions, in and of themselves, do not define tokens.
1304 Tokens are defined only by their usage in productions.
1305
1306 ##
1307
1308 DELETE_WRAPPERS
1309
1310 If your parser uses ©wrapperªs and exits with an error condition, there
1311 may be objects remaining on the ©parser value stackª. The DELETE_WRAPPERS macro
1312 can be used to delete any remaining objects on the stack.
1313 If you have enabled
1314 ©auto resynchª, DELETE_WRAPPERS will be invoked automatically.
1315 ##
1316
1317 Diagnose Errors
1318
1319 "Diagnose errors" is a ©configuration switchª which
1320 defaults to on. When this switch is on, AnaGram includes a
1321 function, ag_diagnose(), in your parser which provides simple
1322 syntax error diagnoses. When your parser encounters a
1323 syntax error, this function will be called immediately prior
1324 to the invocation of the ©SYNTAX_ERRORª macro. A pointer to the message will be
1325 stored in the ©error_messageª field of the ©parser control blockª.
1326
1327 If you wish to implement your own ©error diagnosisª, you
1328 should turn this switch off, and include a call to your
1329 own diagnostic procedure in your SYNTAX_ERROR macro.
1330
1331 ag_diagnose() provides three possible error messages,
1332 governed by three macros: ©MISSING_FORMATª, ©UNEXPECTED_FORMATª, and
1333 ©UNNAMED_TOKENª. You may override the definitions of
1334 these macros with your own definitions if you wish
1335 to provide diagnostics in another language.
1336
1337 If you have set the ©error frameª
1338 switch, it will also set the ©error_frame_tokenª field.
1339 The "error_frame_token" is the non-terminal token which
1340 the parser was trying to complete when the error was
1341 encountered.
1342
1343 When the "diagnose errors" switch is set, AnaGram also
1344 includes a ©token namesª table in the parser which
1345 contains the ASCII names of the tokens in the grammar,
1346 including entries for character constants and keywords.
1347
1348 Use the ©token names onlyª switch to limit the table
1349 to explicitly named tokens only.
1350 ##
1351
1352 MISSING_FORMAT
1353
1354 MISSING_FORMAT is a macro that is used by the error
1355 diagnostic function created by the ©diagnose errorsª
1356 switch. If you do not define it in your parser,
1357 AnaGram will define it thus:
1358 #define MISSING_FORMAT "Missing %s"
1359
1360 This format is used when the diagnostic function can
1361 identify a unique terminal or nonterminal token that
1362 would satisfy the syntactic rules and is named
1363 in the ©token namesª table.
1364 ##
1365
1366 UNEXPECTED_FORMAT
1367
1368 UNEXPECTED_FORMAT is a macro that is used by the error
1369 diagnostic function created by the ©diagnose errorsª
1370 switch. If you do not define it in your parser,
1371 AnaGram will define it thus:
1372 #define UNEXPECTED_FORMAT "Unexpected %s"
1373
1374 This format is used when the diagnostic function cannot
1375 identify a named, unique terminal or nonterminal token that
1376 would satisfy the syntactic rules and finds an
1377 incorrect token, the name of which can be found
1378 in the ©token namesª table.
1379 ##
1380
1381 UNNAMED_TOKEN
1382
1383 UNNAMED_TOKEN is a macro that is used by the error
1384 diagnostic function created by the ©diagnose errorsª
1385 switch. If you do not define it in your parser,
1386 AnaGram will define it thus:
1387 #define UNNAMED_TOKEN "input"
1388
1389 This macro is used as argument for the ©UNEXPECTED_FORMATª
1390 macro when the actual, erroneous input cannot be identified.
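
The sketch below shows how printf-style format macros of this
kind can be applied. It is not the code of ag_diagnose(); the
function format_diagnostic and its parameters are hypothetical.
The #ifndef guards mirror the rule that your own definition, if
present, takes the place of the default.

```c
#include <assert.h>
#include <stdio.h>

#ifndef MISSING_FORMAT
#define MISSING_FORMAT "Missing %s"
#endif
#ifndef UNEXPECTED_FORMAT
#define UNEXPECTED_FORMAT "Unexpected %s"
#endif
#ifndef UNNAMED_TOKEN
#define UNNAMED_TOKEN "input"
#endif

/* Hypothetical helper: build a message in buf. If a unique
   expected token is known, report it as missing; otherwise report
   the offending token, or UNNAMED_TOKEN if it has no name.
   Returns the length of the full message, as snprintf does. */
int format_diagnostic(char *buf, size_t size,
                      const char *expected, const char *found) {
    assert(buf != NULL && size > 0);
    if (expected != NULL)
        return snprintf(buf, size, MISSING_FORMAT, expected);
    return snprintf(buf, size, UNEXPECTED_FORMAT,
                    found != NULL ? found : UNNAMED_TOKEN);
}
```

To produce messages in another language, define the three macros
before this point with the translated text, keeping the %s
placeholder in the two format strings.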
1391 ##
1392
1393 Difference
1394
1395 In set theory, the difference of two sets, A and B, is
1396 defined to be the set of all elements of A that are not
1397 elements of B. In an AnaGram ©syntax fileª, you
1398 represent the difference of two ©character setsª by
1399 using the '-' operator. Thus the difference of A and B
1400 is A - B. The difference operator is ©left
1401 associativeª.
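
A small C model of character sets makes the effect of left
associativity concrete. The CharSet type and both functions are
hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <string.h>

typedef struct { unsigned char member[256]; } CharSet;  /* 1 = in set */

/* Build the set of all characters from lo to hi inclusive. */
CharSet charset_range(int lo, int hi) {
    CharSet s;
    int c;
    assert(0 <= lo && lo <= hi && hi <= 255);
    memset(&s, 0, sizeof s);
    for (c = lo; c <= hi; c++)
        s.member[c] = 1;
    return s;
}

/* A - B: every element of a that is not an element of b. */
CharSet charset_difference(CharSet a, CharSet b) {
    int c;
    for (c = 0; c < 256; c++)
        if (b.member[c])
            a.member[c] = 0;
    return a;
}
```

With A = 'a-z', B = 'a-e' and C = 'c-e', the expression A - B - C
means (A - B) - C, which excludes 'c'. The other grouping,
A - (B - C), would remove only 'a' and 'b' and so would still
contain 'c'.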
1402 ##
1403
1404 Disregard
1405
1406 The purpose of the "disregard" statement is to skip over
1407 uninteresting ©white spaceª and comments in your input
1408 file. It allows you to specify a token that should be
1409 passed over in the input to your parser. The statement
1410 takes the form:
1411 disregard ws
1412 where "ws" is a token name or character set. Disregard
1413 statements, like other ©attribute statementªs, may be
1414 placed in any ©configuration sectionª.
1415
1416 You may have more than one disregard statement in your
1417 ©grammarª. If you do, AnaGram will create a shell
1418 production. For example, suppose you write:
1419 [ disregard alpha
1420 disregard beta ]
1421 AnaGram will proceed as though you had written:
1422 gamma -> alpha | beta
1423 [ disregard gamma ]
1424
1425 It frequently happens that you wish your ©parserª to
1426 disregard blanks or comments, except that ©white spaceª
1427 within names, numbers, strings, and other elementary
1428 constructs is subject to special rules and thus should
1429 not be disregarded blindly. In this case, you can use
1430 the "©lexemeª" statement to declare these constructs off
1431 limits for the disregard statement. Within these
1432 constructs, the disregard statement will be inoperative
1433 and the admissibility of white space is determined
1434 solely by the productions which define these constructs.
1435
1436 Outside those productions which define lexemes, you
1437 should not generally use a token which is supposed to be
1438 disregarded. If you do, your grammar will have
1439 ©conflictªs, since the token could satisfy both the
1440 explicit usage, as well as the implicit rules set up by
1441 the disregard statement. Such conflicts, however, are
1442 resolved automatically in favor of your explicit use of
1443 the token. The conflicts will appear in the ©Resolved
1444 Conflictsª window.
1445
1446 If you have "open ended" lexemes in your grammar such
1447 as variable names or numeric constants, your grammar
1448 will detect a conflict if one of these lexemes may
1449 follow another such lexeme immediately. To deal with
1450 these conflicts, you should turn on the "©Distinguish
1451 Lexemesª" configuration switch. It will cause white
1452 space to be required as a separator between the
1453 lexemes.
1454
1455 In order to implement the "disregard" statement AnaGram
1456 will redefine some tokens in your grammar. For example,
1457 '+' may be redefined to consist of a simple plus sign
1458 followed by optional white space:
1459 '+' -> '+'%, white space?...
1460 The ©percent signª is used to indicate the original,
1461 simple plus without the optional white space attached.
1462 You will probably notice the percent sign appearing in
1463 some windows and traces.
1464 ##
1465
1466 distinguish keywords
1467
1468 "distinguish keywords" is an ©attribute statementª
1469 which you may include in a ©configuration sectionª. It
1470 is used to tell AnaGram how to distinguish ©keywordªs
1471 from similar sequences of characters in your input
1472 stream. For example, you may want your parser to
1473 recognize "int" as a keyword when it appears in the
1474 following context:
1475 int x;
1476 but not when it appears in the middle of such words as
1477 "integral" and "intolerant". The operand of
1478 "distinguish keywords" is a list of character set
1479 ©expressionªs separated by commas and enclosed in braces
1480 ({ }).
1481
1482 Once AnaGram has read your entire syntax file, it
1483 evaluates all of these character sets and tests each
1484 keyword string against the character sets in the order
1485 in which they were encountered in the program. If all
1486 the characters which constitute a particular keyword
1487 are members of the specified set, the keyword logic is
1488 set up so that it will recognize the keyword only if
1489 the immediately following character is not in the set.
1490
1491 In the example above,
1492 [distinguish keywords {'a-z'} ]
1493 will do the trick.
1494
1495 The "©stickyª" statement also affects the recognition
1496 of keywords.
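
The test described above can be modeled in a few lines of C. This
is only a sketch of the rule for the single set 'a-z'; the
function names are hypothetical, and the keyword logic AnaGram
actually builds into a parser is not shown here.

```c
#include <assert.h>
#include <string.h>

/* The distinguishing set from the example: 'a-z'. */
static int in_set(int c) { return c >= 'a' && c <= 'z'; }

/* Returns 1 if kw is recognized at the start of input. If every
   character of kw belongs to the set, the character that follows
   the keyword must not belong to it. */
int keyword_matches(const char *input, const char *kw) {
    size_t i, n;
    assert(input != NULL && kw != NULL);
    n = strlen(kw);
    if (strncmp(input, kw, n) != 0)
        return 0;                      /* text does not match at all */
    for (i = 0; i < n; i++)
        if (!in_set((unsigned char)kw[i]))
            return 1;                  /* rule does not apply to kw */
    return !in_set((unsigned char)input[n]);
}
```

Under this model "int" is recognized in "int x;" but not at the
start of "integral" or "intolerant".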
1497 ##
1498
1499 Distinguish Lexemes
1500
1501 The "distinguish lexemes" ©configuration switchª is
1502 used in conjunction with the "©disregardª" statement
1503 and the "©lexemeª" statement to resolve the
1504 ©shift-reduce conflictªs which often crop up when
1505 suppressing white space.
1506
1507 The difficulty with suppressing white space is that you
1508 wish it to be optional in cases like "x+y", where it is
1509 not necessary in order to parse correctly, but you want
1510 to require it in situations such as "mytype x", where
1511 it is necessary to separate otherwise indistinguishable
1512 constructs. If the white space were optional, it would
1513 be necessary to allow for "mytypex", but it would be
1514 impossible to determine if this were to be interpreted as
1515 "mytype x", "mytyp ex", or any of the many other
1516 possibilities.
1517
1518 The distinguish lexemes switch causes AnaGram to make
1519 the white space optional where doing so causes no
1520 ambiguity and mandatory where making it
1521 optional would lead to ambiguity. In the example given
1522 above, "mytypex" would be treated as a single name, and
1523 another name would have to follow separating white
1524 space.
1525
1526 The default value for distinguish lexemes is OFF. It is
1527 anticipated that this will be changed to ON in future
1528 releases of AnaGram.
1529 ##
1530
1531 Duplicate Production
1532
1533 This ©warningª message appears when a ©productionª
1534 appears twice in your ©grammarª. You will have a
1535 number of ©reduce-reduce conflictªs as a consequence.
1536 Eliminate the duplicate, and the conflicts it caused
1537 will go away.
1538 ##
1539
1540 Edit Command
1541
1542 "Edit command" is a ©configuration parameterª which
1543 accepts a string value. It is no longer used and is
1544 retained only for file compatibility with the DOS
1545 version of AnaGram.
1546 ##
1547
1548 Embedded C
1549
1550 You may encapsulate pieces of C or C++ code in your ©syntax
1551 fileª more or less arbitrarily. Such pieces of code will
1552 simply be copied to the ©parser fileª in the order in
1553 which they are encountered. Each such piece of code must
1554 be enclosed with braces ({}). The left brace must be on a
1555 new line, and nothing except comments may follow the
1556 right brace. AnaGram does not inspect the interior of
1557 such a piece of C code except to identify character
1558 constants, strings, comments and blocks surrounded with
1559 braces so that it does not identify the end of the
1560 embedded C prematurely. Note that AnaGram will use the
1561 status of the ©nest commentsª ©configuration switchª in
1562 effect at the beginning of the embedded C.
1563
1564 AnaGram, of course, can be confused by unterminated
1565 strings, unbalanced brackets, and unterminated comments.
1566 The most likely outcome, in such a situation, is that
1567 AnaGram will encounter an end of file looking for the
1568 end of the embedded C. Should this happen, AnaGram will
1569 identify the beginning of the piece of embedded C which
1570 caused the problem.
1571
1572 If your syntax file begins with a block of embedded C,
1573 called the "©C prologueª", it will be copied to the very
1574 beginning of the parser file, preceding all of AnaGram's
1575 output. You may use such an initial block of embedded C
1576 to guarantee that program title comments, copyright
1577 notices and important definitions are at the very
1578 beginning of your parser file.
1579
1580 The code you include as embedded C, of course, has to
1581 coexist with the code AnaGram generates. In order to
1582 keep the potential for name conflicts to a minimum, all
1583 variables and functions which AnaGram defines begin with
1584 the letters "ag_". You should avoid variable names which
1585 begin with these letters.
1586
1587 If AnaGram finds no embedded C in a syntax file, and you
1588 ask it to build a parser, it will automatically generate
1589 a main program that calls your parser. If you don't want
1590 it to do this, you may turn off the ©main programª
1591 ©configuration switchª.
1592 ##
1593
1594 Empty Keyword String
1595
1596 This ©warningª appears when you have a keyword string
1597 that contains no characters whatsoever. ©Keyword
1598 stringsª must contain at least one character. If you
1599 wish a null match, use a ©null productionª instead.
1600 ##
1601
1602 Enable Mouse
1603
1604 "Enable mouse" is a ©configuration switchª that defaults
1605 to on. It is not used in the Windows version of AnaGram
1606 and has been retained only for file compatibility with
1607 the DOS version.
1608 ##
1609
1610 Enum Constant Name
1611
1612 The "enum constant name" ©configuration parameterª
1613 allows you to select the name AnaGram will use for the
1614 set of enumeration constants it defines in the ©parser
1615 headerª file for your ©parserª. The value of "enum
1616 constant name" should be a string containing the '%'
1617 character. AnaGram will substitute each token name in
1618 turn into this template as it creates the list of
1619 enumeration constants. If it finds a '$' character it
1620 will substitute the name of your parser. The default
1621 value of "enum constant name" is "$_%_token".
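
The substitution can be pictured with a short C sketch. This is
only an illustration of the template rule described above, not
AnaGram's own code. The function name expand_enum_name and the
parser and token names used with it are invented for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: expand an "enum constant name" template.
   '%' is replaced by the token name, '$' by the parser name.
   Returns 0 on success, -1 if the result does not fit in out. */
int expand_enum_name(const char *tmpl, const char *parser,
                     const char *token, char *out, size_t size)
{
  size_t used = 0;
  for (; *tmpl != '\0'; tmpl++) {
    char one[2] = { *tmpl, '\0' };   /* ordinary character */
    const char *piece = one;
    size_t len;
    if (*tmpl == '%') piece = token;
    else if (*tmpl == '$') piece = parser;
    len = strlen(piece);
    if (used + len + 1 > size) return -1;
    memcpy(out + used, piece, len);
    used += len;
  }
  if (size == 0) return -1;
  out[used] = '\0';
  return 0;
}
```

With the default template "$_%_token", a parser named "calc" and
a token named "number", the sketch produces calc_number_token.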
1622 ##
1623
1624 Enumeration Constants
1625
1626 In your ©parser headerª file, AnaGram includes a typedef
1627 enum statement which provides enumeration constants
1628 corresponding to all the named constants in your
1629 grammar. The names of the enumeration constants
1630 themselves are defined by the ©enum constant nameª
1631 ©configuration parameterª. These constants are useful
1632 when dealing with ©semantically determined productionsª.
1633 ##
1634
1635 Enum
1636
1637 Within a ©configuration sectionª, you may use an "enum"
1638 statement to define numeric values for any number of
1639 tokens just as you define enumeration constants in C.
1640 The syntax is effectively the same as the enum statement
1641 in C:
1642
1643 [
1644 enum {
1645 first = 60,
1646 second,
1647 third,
1648 fourth = 'a',
1649 fifth,
1650 }
1651 ]
1652
1653 is exactly equivalent to
1654 first = 60
1655 second = 61
1656 third = 62
1657 fourth = 'a'
1658 fifth = 'b'
1659 ##
1660
1661 eof
1662
1663 "eof" is a quasi reserved word in AnaGram, used to
1664 specify an end of file token. You may use another token
1665 as an end of file delimiter by setting the ©Eof Tokenª
1666 ©configuration parameterª. eof is not required unless
1667 you use ©automatic resynchronizationª in your ©parserª.
1668
1669 If you have not defined eof or specified an Eof Token
1670 parameter, ©File Traceª may show a syntax error when it
1671 encounters the end of a test file.
1672
1673 There are various character codes that are commonly used
1674 to represent an end of file. The end of a string in
1675 memory is commonly 0, DOS uses ^Z, Unix uses ^D, and
1676 Unix style stream I/O uses -1. It is often convenient
1677 then to define
1678
1679 eof = -1 + 0 + ^D + ^Z
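
Reading the "+" signs in this definition as set union, the eof
token matches any one of the four codes. A C sketch of the
equivalent test follows. The function name is_eof_code is
invented for the example and is not part of AnaGram.

```c
/* Hypothetical helper: nonzero if code is one of the four end of
   file markers named in the definition
       eof = -1 + 0 + ^D + ^Z                                   */
int is_eof_code(int code)
{
  switch (code) {
  case -1:   /* the value the text gives for stream I/O */
  case 0:    /* end of a string in memory */
  case 4:    /* ^D, control-D */
  case 26:   /* ^Z, control-Z */
    return 1;
  default:
    return 0;
  }
}
```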
1680 ##
1681
1682 Eof Token
1683
1684 "Eof token" is a ©configuration parameterª which accepts
1685 a token name as a value. There is no default value.
1686 AnaGram does not need a specification for the eof token
1687 unless you are using its ©automatic resynchronizationª
1688 facility.
1689
1690 If you use the ©automatic resynchronizationª capability
1691 of AnaGram, you must specify explicitly an end of file
1692 token. You can do this either by defining a ©terminal
1693 tokenª in your ©grammarª called eof or by using the "eof
1694 token" parameter to identify some other terminal token
1695 to be used as the end of file marker. You would do this
1696 only if you must use the name "©eofª" for some other
1697 purpose.
1698
1699 Note that "eof" is case sensitive. Neither Eof nor
1700 EOF will qualify as end of file tokens unless you
1701 explicitly specify them using the eof token parameter.
1702 ##
1703
1704 Eof Token Not Defined
1705
1706 This ©warningª appears if you have requested either
1707 ©error token resynchronizationª or ©automatic
1708 resynchronizationª and you have not defined an ©eof
1709 tokenª. The resynchronization procedure will not work
1710 correctly at end of file.
1711 ##
1712
1713 Error Action
1714
1715 The error action is one of the four ©parser actionªs of a
1716 traditional ©parsing engineª. The error action is
1717 performed when the parser has encountered an input
1718 token which is not admissible in the current state.
1719 The further behavior of a traditional parser is
1720 undefined.
1721 ##
1722
1723 Error Defining
1724
1725 "Error defining TXXX: <token representation>" is a
1726 ©warningª message which appears if errors are encountered
1727 while attempting to evaluate the ©character setª for
1728 the specified ©tokenª. This warning is always generated
1729 in addition to more detailed warnings that are made
1730 when the actual errors are encountered.
1731 ##
1732
1733 Error frame
1734
1735 "Error frame" is a ©configuration switchª which defaults
1736 to off. You use this switch to specify the ©error
1737 diagnosisª capabilities of your parser. If this switch
1738 is set and the ©diagnose errorsª switch is set, i.e.,
1739 on, your parser will include a function which will
1740 determine the "context" of any ©syntax errorª, that is,
1741 the token the parser was trying to complete.
1742
1743 To determine the context of an error, your parser will
1744 scan backwards through the ©parser state stackª,
1745 examining ©characteristic rulesª until it finds a state
1746 which can accept a unique ©nonterminalª reduction token
1747 that you have not marked as ©hiddenª. It will then set
1748 PCB.©error_frame_ssxª to the ©parser stack indexª for
1749 that level.
1750 ##
1751
1752 ERROR_CONTEXT
1753
1754 ERROR_CONTEXT is a macro AnaGram defines for you. If
1755 your parser encounters a ©syntax errorª, you have
1756 enabled the ©error frameª ©configuration switchª, and
1757 you have defined a ©context typeª, ERROR_CONTEXT gives
1758 you access to the ©contextª in effect when the parser
1759 encountered the beginning of the ©error_frame_tokenª.
1760 ##
1761
1762 Error Diagnosis
1763
1764 "Error diagnosis" and ©error recoveryª are the two
1765 aspects of ©error handlingª. If in the ©embedded Cª
1766 portion of your syntax file you define a macro called
1767 ©SYNTAX_ERRORª, it will be invoked by the parser when a
1768 ©syntax errorª is encountered. If you have set the
1769 ©diagnose errorsª ©configuration switchª, the
1770 ©error_messageª field of the ©parser control blockª will
1771 contain a pointer to a string containing a diagnostic
1772 message. The diagnostic is of the form "Missing <token
1773 name>" or "Unexpected <token name>".
1774
1775 If you do not define SYNTAX_ERROR it will be
1776 automatically defined so that a message will be written
1777 to stderr.
1778
1779 If the ©lines and columnsª switch has been set you will
1780 have the current line number and column number available
1781 for your diagnostic message.
1782
1783 If you have set the ©error frameª switch as well as the
1784 diagnose errors switch, the variable
1785 PCB.©error_frame_tokenª will identify the ©nonterminal
1786 tokenª the parser was trying to recognize when the
1787 error was encountered.
1788
1789 Of course, if your parser is controlling direct keyboard
1790 input, a diagnosis might be unnecessary. In this case
1791 you might define SYNTAX_ERROR so that it simply beeps at
1792 the user and let it go at that.
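
A SYNTAX_ERROR macro that combines the diagnostic message with
the line and column might look like the sketch below. So that
it compiles on its own, the sketch supplies a stand-in control
block and defines PCB itself; in a real syntax file AnaGram
provides PCB. The names sample_pcb and last_diagnostic, and the
field names line and column, are assumptions made for the
example.

```c
#include <stdio.h>

/* Stand-in for the generated parser control block, with only
   the fields this sketch uses. The field names line and column
   are assumed. */
struct sample_pcb {
  const char *error_message;
  int line, column;
};

static struct sample_pcb sample_pcb_instance;
#define PCB sample_pcb_instance

/* Hypothetical buffer that receives the formatted diagnostic. */
static char last_diagnostic[128];

/* Format the diagnostic into a buffer, so the caller decides
   where it goes: stderr, a log file, a status line. */
#define SYNTAX_ERROR \
  snprintf(last_diagnostic, sizeof last_diagnostic, \
           "%s, line %d, column %d", \
           PCB.error_message, PCB.line, PCB.column)
```

After the parser returns, the calling program could write
last_diagnostic to stderr or display it in whatever way suits
the application.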
1793 ##
1794
1795 Error Handling
1796
1797 Rarely is a parser built to read an arbitrary input
1798 file. The normal situation is that the parser is built
1799 to read files that conform to the rules specified in a
1800 grammar, rules that describe a class of input files
1801 rather than all possible input files. If the input file
1802 does not conform to the grammar, the parser will detect
1803 a ©syntax errorª.
1804
1805 There are two aspects to error handling in your parser:
1806 ©error diagnosisª and ©error recoveryª. Error diagnosis
1807 consists in informing your user that something
1808 unexpected has happened. Error recovery consists in
1809 either aborting the parse, or getting it started again
1810 in some reasonable manner. AnaGram provides several
1811 options for both error diagnosis and error recovery.
1812
1813 When a syntax error is encountered, first your error
1814 diagnosis option is executed and then your error
1815 recovery option is executed.
1816 ##
1817
1818 error_message
1819
1820 error_message is a field in a ©parser control blockª to
1821 which your ©error handlingª procedures may refer. If you
1822 have set the ©diagnose errorsª ©configuration switchª,
1823 on encountering a ©syntax errorª your ©parserª will
1824 create a string containing an appropriate diagnostic
1825 message and store a pointer to it into
1826 PCB.error_message.
1827 ##
1828
1829 Error Trace
1830
1831 "Error Trace" is both a ©configuration switchª and the
1832 name of an option in the ©Action Menuª. If the switch
1833 is on, AnaGram adds code to your parser to capture
1834 state information to a file in case of a ©syntax errorª. The Error
1835 Trace option can then read this information and prepare a pre-built
1836 ©Grammar Traceª showing you the state of the parser at the time of
1837 the error.
1838
1839 The name of the file is determined by the macro
1840 ©AG_TRACE_FILE_NAMEª. AnaGram will provide a default
1841 definition for the macro consisting of the name of
1842 your ©syntax fileª plus the extension ".etr". You
1843 may override this definition by defining AG_TRACE_FILE_NAME
1844 in your ©embedded Cª.
1845
1846 If error trace is enabled, AnaGram will also enable the
1847 Error Trace option on the ©Action Menuª. If you select
1848 Error Trace AnaGram will initialize a ©Grammar Traceª
1849 window from the error trace file you select. The parser
1850 stack of the trace will be as it was when the error
1851 occurred. The last line of the parser stack pane will
1852 show the ©lookahead tokenª that caused the syntax error. You may
1853 then use the Grammar Trace to explore the nature of
1854 the syntax error your parser encountered.
1855
1856 AnaGram will
1857 warn you if the error trace file is older than
1858 the syntax file, since under those conditions, the
1859 error trace file might be invalid.
1860 ##
1861
1862 AG_TRACE_FILE_NAME
1863
1864 AG_TRACE_FILE_NAME is a C macro used to determine the
1865 name of the file your parser will write when it
1866 encounters a ©syntax errorª if you have enabled
1867 the ©error traceª ©configuration switchª.
1868
1869 You may define AG_TRACE_FILE_NAME in your ©embedded Cª.
1870 AnaGram provides a default definition given by the
1871 name of your ©syntax fileª with the extension ".etr".
1872 ##
1873
1874 Error Recovery
1875
1876 Error recovery is the process of continuing after a
1877 ©syntax errorª. AnaGram offers several options. These
1878 are controlled by ©configuration parameterªs and by
1879 your grammar.
1880
1881 If you do not specify any error recovery, your parser
1882 will simply return to the calling program when it
1883 encounters a syntax error. ©PCBª.©exit_flagª will be set
1884 to AG_SYNTAX_ERROR_CODE (2), to indicate termination on syntax error.
1885
1886 If you wish your parser to simply ignore the erroneous
1887 token and continue, set PCB.exit_flag to zero in your
1888 ©SYNTAX_ERRORª macro. You might use this option if your
1889 parser is dealing directly with keyboard input.
1890
1891 You may wish to use YACC type error handling. To do
1892 this, simply incorporate a token called "error" in your
1893 grammar, or specify some other token as an ©error
1894 tokenª. On syntax error, your parser will back up to
1895 the most recent state where "error" was acceptable
1896 input, treat the bad input as an instance of error, and
1897 then skip all input until it finds an acceptable input
1898 token. At that point it will proceed as though nothing
1899 had happened.
1900
1901 AnaGram also provides an ©automatic resynchronizationª
1902 option, which uses a complex heuristic to compare input
1903 tokens against all stacked states in order to find the
1904 best state from which to continue.
1905 ##
1906
1907 Error Token Resynchronization
1908
1909 One of your options for ©error recoveryª after a ©syntax
1910 errorª is a technique similar to that provided in YACC.
1911 You include a terminal token called "error" in your
1912 grammar. (Or, use the ©error tokenª configuration
1913 parameter to specify some other token to serve this
1914 purpose.) When the parser encounters an error in the
1915 input, after invoking the ©SYNTAX_ERRORª macro, it backs
1916 up the ©parser state stackª to the most recent state in
1917 which "error" was an acceptable input. It then shifts to
1918 the new state as though it had seen an actual "error"
1919 token. At this point, it skips over any character in the
1920 input which is not an acceptable input character for
1921 this state. Once it does find an acceptable input
1922 character, it continues processing as though nothing had
1923 happened.
1924 ##
1925
1926 error_frame_ssx
1927
1928 error_frame_ssx is a field in a ©parser control blockª
1929 to which your ©error handlingª routines may refer. When
1930 your ©SYNTAX_ERRORª macro is called, if you have set
1931 both the ©diagnose errorsª and ©error frameª
1932 configuration switches, error_frame_ssx will contain the
1933 value of the ©parser stack indexª at the beginning of
1934 the ©error_frame_tokenª. For example, if in a syntax
1935 file, you fail to close a comment, AnaGram will
1936 encounter an illegal end of file in the comment. In this
1937 situation, error_frame_token is the token for a comment,
1938 and error_frame_ssx gives the parser stack depth at the
1939 beginning of the comment.
1940 ##
1941
1942 error_frame_token
1943
1944 error_frame_token is a field in a ©parser control blockª
1945 to which your ©error handlingª routines may refer. If
1946 you have set both the ©diagnose errorsª and ©error
1947 frameª ©configuration switchªes, when your
1948 ©SYNTAX_ERRORª macro is called, it will contain the
1949 ©token numberª of the token the parser was trying to complete.
1950 ##
1951
1952 error, Error Token
1953
1954 "Error token" is a ©configuration parameterª that takes
1955 a token name for a value. It has no default value. If
1956 you do not specify it, and your grammar has a terminal
1957 token called "error", it will be used as the error
1958 token. If you have an error token defined your parser
1959 will presume that you wish to use the ©error token
1960 resynchronizationª method of ©error recoveryª.
1961 ##
1962
1963 Escape Backslashes
1964
1965 "©Escape backslashesª" is a ©configuration switchª that
1966 defaults to off. When turned on, the ©line numbersª switch
1967 will write pathnames with doubled backslashes. The switch
1968 is no longer necessary, since AnaGram now uses forward slashes
1969 in the pathnames in #line directives rather than backslashes.
1970 ##
1971
1972 Event Driven
1973
1974 It is often convenient to configure your parser to be
1975 "event driven". In this situation, instead of calling
1976 your parser once to process the entire input, you call
1977 an ©initializerª to initialize the parser, and then you
1978 call the parser once for each input token. Each time you
1979 call it, the parser processes the single input token
1980 until it can do no more.
1981
1982 You can interrogate the ©exit_flagª field of the
1983 ©parser control blockª to determine whether the parse is
1984 complete or whether the parser encountered an error.
1985
1986 Event driven parsers are especially convenient for
1987 dealing with terminal input or communications protocols.
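
The calling pattern can be sketched in C as follows. The parser
here is a toy stand-in, not AnaGram output: it accepts one or
more digits followed by a newline. All the names (toy_pcb,
toy_init, toy_parse, toy_run, and the TOY_ codes) are invented
for the example. Only the shape of the loop is the point:
initialize once, hand the parser one token per call, and check
the exit flag after each call.

```c
enum { TOY_RUNNING = 0, TOY_SUCCESS = 1, TOY_SYNTAX_ERROR = 2 };

/* Stand-in control block. */
struct toy_pcb {
  int input_code;   /* token handed to the parser for this call */
  int exit_flag;    /* outcome so far */
  int digits;       /* digits seen, kept for illustration */
};

/* Stand-in initializer. */
void toy_init(struct toy_pcb *pcb)
{
  pcb->input_code = 0;
  pcb->exit_flag = TOY_RUNNING;
  pcb->digits = 0;
}

/* Stand-in parser: process the one token in input_code. */
void toy_parse(struct toy_pcb *pcb)
{
  int c = pcb->input_code;
  if (c >= '0' && c <= '9') pcb->digits++;
  else if (c == '\n' && pcb->digits > 0) pcb->exit_flag = TOY_SUCCESS;
  else pcb->exit_flag = TOY_SYNTAX_ERROR;
}

/* Driver: one call to the parser per input token. */
int toy_run(const char *text)
{
  struct toy_pcb pcb;
  toy_init(&pcb);
  while (pcb.exit_flag == TOY_RUNNING && *text != '\0') {
    pcb.input_code = (unsigned char)*text++;
    toy_parse(&pcb);
  }
  return pcb.exit_flag;
}
```

If the input runs out while the flag still shows that the parse
is running, the parser is simply waiting for more tokens. That
is what makes the arrangement convenient for terminal input.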
1988 ##
1989
1990 Event Driven Parser Cannot Use Pointer Input
1991
1992 This ©warningª message appears if you specify pointer
1993 input for your ©parserª and also specify that it should
1994 be event driven. If you are going to use ©pointer
1995 inputª, you should not specify your ©parserª as event
1996 driven. Conversely, if you really want an ©event
1997 drivenª parser, you cannot specify pointer input.
1998 ##
1999
2000 Excessive Recursion
2001
2002 This ©warningª message appears if an internal stack in
2003 AnaGram overflows because of the complexity of an
2004 expression in your ©grammarª. Simplify your grammar by
2005 using ©definitionª statements to name subexpressions.
2006 ##
2007
2008 exit_flag
2009
2010 exit_flag is a field in the ©parser control blockª.
2011 When your parser returns, PCB.exit_flag contains an exit
2012 code describing the outcome of the parse. Mnemonic
2013 values for the exit codes are defined in the parser
2014 header file AnaGram generates. These mnemonics, their
2015 values and their meanings are:
2016 AG_RUNNING_CODE = 0: Parse is not yet complete
2017 AG_SUCCESS_CODE = 1: Parse terminated successfully
2018 AG_SYNTAX_ERROR_CODE = 2: Syntax error encountered
2019 AG_REDUCTION_ERROR_CODE = 3: Bad reduction token encountered
2020 AG_STACK_ERROR_CODE = 4: Parser stack overflowed
2021 AG_SEMANTIC_ERROR_CODE = 5: Semantic error, user defined
2022
2023 An AnaGram parser checks exit_flag on return
2024 from every ©reduction procedureª. The parser will exit with
2025 the flag unchanged if it is non-zero. To halt a parse
2026 from a reduction procedure, then, you need only set the
2027 exit_flag to AG_SEMANTIC_ERROR_CODE, or any other unused value
2028 greater than zero that suits your needs.
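
The sketch below shows one way a calling program might report
the exit code, and how a reduction procedure might halt the
parse. The codes are repeated from the list above only so that
the sketch compiles on its own; a real parser header supplies
them. The names describe_exit, sample_pcb and checked_percent
are invented for the example.

```c
/* Exit codes as listed above. */
enum {
  AG_RUNNING_CODE = 0,
  AG_SUCCESS_CODE = 1,
  AG_SYNTAX_ERROR_CODE = 2,
  AG_REDUCTION_ERROR_CODE = 3,
  AG_STACK_ERROR_CODE = 4,
  AG_SEMANTIC_ERROR_CODE = 5
};

/* Hypothetical helper: text for an exit code. */
const char *describe_exit(int exit_flag)
{
  switch (exit_flag) {
  case AG_RUNNING_CODE:         return "parse is not yet complete";
  case AG_SUCCESS_CODE:         return "parse terminated successfully";
  case AG_SYNTAX_ERROR_CODE:    return "syntax error encountered";
  case AG_REDUCTION_ERROR_CODE: return "bad reduction token encountered";
  case AG_STACK_ERROR_CODE:     return "parser stack overflowed";
  case AG_SEMANTIC_ERROR_CODE:  return "semantic error";
  default:                      return "user defined exit code";
  }
}

/* Stand-in control block with only the field used here. */
struct sample_pcb { int exit_flag; };

/* Sample reduction procedure: halt the parse by setting the
   exit flag when the value is out of range. */
int checked_percent(struct sample_pcb *pcb, int value)
{
  if (value < 0 || value > 100)
    pcb->exit_flag = AG_SEMANTIC_ERROR_CODE;
  return value;
}
```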
2029 ##
2030
2031 Expansion, Expansion Rule
2032
2033 In analyzing a ©grammarª, we are often interested in the
2034 full range of input that can be expected at a certain
2035 point. The expansion of a ©tokenª or state shows us
2036 all the expected input. An expansion yields a set of
2037 ©marked ruleªs. The ©marked tokenª in each rule
2038 shows us what input to expect.
2039
2040 The set of expansion rules of a (©nonterminalª) token
2041 shows all the expected input that can occur whenever the
2042 token appears in the grammar. The set consists of all
2043 the ©grammar ruleªs produced by the token, plus all the
2044 rules produced by the first token of any rule in the
2045 set. A ©marked tokenª for an expansion rule of a token
2046 is the first element in the rule.
2047
2048 The expansion of a state consists of its ©characteristic
2049 ruleªs plus the expansion rules of the marked token in each
2050 characteristic rule.
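
The set of expansion rules of a token can be computed by
repeating the step described above until nothing new is added.
The C sketch below does this for a small sample grammar. It
illustrates the definition only and is not AnaGram's own
algorithm. The grammar, the token names and the function name
expansion_rules are invented for the example.

```c
#define MAX_RHS 3

enum { EXPR, TERM, FACTOR, PLUS, STAR, LPAREN, RPAREN, NUMBER };

struct rule { int lhs; int length; int rhs[MAX_RHS]; };

/* Sample grammar:
     expr   -> term | expr + term
     term   -> factor | term * factor
     factor -> NUMBER | ( expr )                                */
static const struct rule sample_rules[] = {
  { EXPR,   1, { TERM } },
  { EXPR,   3, { EXPR, PLUS, TERM } },
  { TERM,   1, { FACTOR } },
  { TERM,   3, { TERM, STAR, FACTOR } },
  { FACTOR, 1, { NUMBER } },
  { FACTOR, 3, { LPAREN, EXPR, RPAREN } }
};
enum { RULE_COUNT = sizeof sample_rules / sizeof sample_rules[0] };

/* Set in_set[r] to 1 for every expansion rule r of token and to
   0 for every other rule. Returns the number of rules in the set. */
int expansion_rules(int token, int in_set[RULE_COUNT])
{
  int r, count = 0, changed;
  /* Start with the rules produced by the token itself. */
  for (r = 0; r < RULE_COUNT; r++)
    in_set[r] = (sample_rules[r].lhs == token);
  /* Add the rules produced by the first token of any rule
     already in the set, until no more can be added. */
  do {
    changed = 0;
    for (r = 0; r < RULE_COUNT; r++) {
      int s, first;
      if (!in_set[r] || sample_rules[r].length == 0) continue;
      first = sample_rules[r].rhs[0];
      for (s = 0; s < RULE_COUNT; s++)
        if (!in_set[s] && sample_rules[s].lhs == first) {
          in_set[s] = 1;
          changed = 1;
        }
    }
  } while (changed);
  for (r = 0; r < RULE_COUNT; r++) count += in_set[r];
  return count;
}
```

For this sample grammar the expansion of expr contains all six
rules, while the expansion of factor contains only its own two,
since neither of them begins with a nonterminal token.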
2051 ##
2052
2053 Expansion Chain
2054
2055 You may select an Expansion Chain window from the
2056 ©Auxiliary Windowsª popup menu of most windows that contain
2057 ©expansion ruleªs.
2058
2059 The Expansion Chain window is extremely useful for
2060 indicating why a particular ©grammar ruleª is an
2061 ©expansion ruleª in a particular state. To see a chain
2062 of productions that produces a desired expansion rule,
2063 select the expansion rule with the cursor bar, press
2064 the right mouse button for the Auxiliary Windows menu, and select
2065 Expansion Chain.
2066
2067 The Expansion Chain window will then present a sequence
2068 of expansion rules, using the same format as the
2069 Expansion Rules window, but subject to the constraint
2070 that each rule is produced by the ©marked tokenª in the previous line.
2071
2072 The first rule in the window is a ©characteristic ruleª
2073 for the given state. The last rule in the window is
2074 the rule selected by the cursor bar in the window from
2075 which you chose the Expansion Chain. Note that this
2076 chain is not necessarily unique. There may be other
2077 derivations.
2078 ##
2079
2080 Expansion Rules
2081
2082 You may select an Expansion Rules window from the
2083 ©Auxiliary Windowsª popup menu of most windows which display
2084 ©marked rulesª. The Expansion Rules window shows the
2085 complete set of ©expansion ruleªs for the ©marked
2086 tokenª in the highlighted rule.
2087
2088 In other windows, including all trace windows, the
2089 Expansion Rules window shows the expansion of the token
2090 on the highlighted line.
2091 ##
2092
2093 F1
2094
2095 Use the F1 key to bring up a context sensitive help window. Because of
2096 various peculiarities of the Windows API, there are a few contexts
2097 where the F1 key does not work; however, generally the ©help cursorª
2098 works where F1 does not and vice versa.
2099
2100 ©Helpª windows have hypertext links to related help windows.
2101 In a help window, the right mouse button pops up a menu of
2102 all the links for the window.
2103 ##
2104
2105 extend pcb
2106
2107 The "extend pcb" statement is an ©attribute statementª that allows you to
2108 add declarations of your own to the ©parser control blockª. With this
2109 feature, data needed by ©reduction procedureªs can be stored in the pcb
2110 rather than in global or static storage. This capability greatly
2111 facilitates the construction of ©thread safe parsersª.
2112
2113 The extend pcb statement may be used in any configuration section.
2114 The format is as follows:
2115 extend pcb { <C or C++ declaration>... }
2116
2117 It may, of course, extend over multiple lines and may contain any
2118 C or C++ declarations. AnaGram will append it to the end of the parser
2119 control block declaration in the generated parser ©header fileª. There may
2120 be any number of extend pcb statements. The extensions are appended to
2121 the pcb in the order in which they occur in the syntax file.
2122
2123 The extend pcb statement is compatible with both C and C++ parsers. Note
2124 that even if you are deriving your own class from the parser control
2125 block, you might want to use the extend pcb to provide virtual function
2126 definitions or other declarations appropriate to a base class.
2127 ##
2128
2129 Far Tables
2130
2131 "Far tables" is a ©configuration switchª which defaults
2132 to off. If it is set, when AnaGram builds a ©parserª it
2133 will declare the larger tables it builds as FAR. This
2134 can be a convenience when using some memory models with
2135 8086 architecture.
2136 ##
2137
2138 Fatal Syntax Errors
2139
2140 This ©warningª message occurs when AnaGram cannot
2141 complete the ©Analyze Grammarª command on your ©syntax
2142 fileª because of errors in your syntax file.
2143 ##
2144
2145 File Trace
2146
2147 You can use the File Trace facility to verify your grammar,
2148 even before you have implemented ©reduction proceduresª or
2149 any other code. Thus you can defer writing procedural code
2150 until you have the grammar working to your specifications.
2151
2152 To run File Trace, select
2153 File Trace from the ©Action Menuª or click on the File Trace button.
2154
2155 Select a test file. When the ©File Trace Windowª appears,
2156 double click at any point in the ©test file paneª, or
2157 click the ©Parse Fileª button to parse the entire file.
2158 AnaGram will parse up to the point you have selected
2159 according to the rules in your ©grammarª. If the test file does not
2160 conform to the rules of the grammar, the parse will halt with a
2161 ©syntax errorª. You can then inspect the ©Parser Stack paneª and the
2162 ©Rule Stack paneª to get an idea of the nature of the problem.
2163
2164
2165 AnaGram uses different colors to
2166 distinguish the portion of the test file that has
2167 been parsed from the portion that has not been parsed,
2168 so the location of the error should be readily apparent.
2169
2170 Since the syntax error often occurs somewhat downstream
2171 from the actual error, you may need to back the parse up
2172 and approach the error slowly. In the Test File pane,
2173 double click at any point prior to the error to back
2174 the parse up to that point. You can then click on the
2175 ©Single Stepª button to perform a single parser action.
2176
2177 You may also use the cursor keys to control the parse.
2178 As long as no error is encountered, the parse is locked
2179 to the blinking cursor. If you cursor past the syntax
2180 error, however, the parse can no longer track the cursor
2181 so the cursor location will differ from the parse location. The
2182 cursor and parse locations will also differ after you single click
2183 at any point other than the current parse location.
2184
2185 When the cursor and the parse location are thus out of synch, the
2186 Single Step button is replaced with a ©Synch Parseª button. You
2187 can click on Synch Parse to get the parse back in synch with the
2188 cursor.
2189
2190 The File Trace option will be greyed out on the ©Action Menuª
2191 if your grammar has ©empty recursionª, since
2192 such a grammar may cause infinite loops in the parser.
2193
2194 Because a File Trace is based on character codes, it will also be greyed out
2195 on the ©Action Menuª if your parser uses ©token inputª rather than
2196 character input.
2197
2198 All parser actions performed by a File Trace update the ©trace
2199 coverageª counts, enabling you to verify the extent to which
2200 your test files exercise your parser.
2201
2202 Normally, AnaGram reads test files in "text" mode,
2203 discarding carriage return characters. If your parser
2204 needs to recognize carriage return characters
2205 explicitly, you should turn the "©test file binaryª"
2206 switch on.
2207 ##
2208
2209 File Trace Window
2210
2211 The ©File Traceª window normally consists of three panes:
2212 The ©Parser Stack paneª
2213 The ©Test File paneª
2214 The ©Rule Stack paneª
2215
2216 If your grammar uses ©semantically determined productionsª,
2217 the ©Reduction Choices paneª will appear when necessary
2218 to allow you to select a ©reduction tokenª. The choice that
2219 you make will be remembered and reused if you should back up
2220 the parse and parse past this point again. The remembered choice
2221 is not made automatically when you use ©Single Stepª. Thus,
2222 if you wish to
2223 change your choice, position the cursor at the location where
2224 the choice must be made and Single Step past the choice.
2225
2226 If you ©reloadª the test file, the choices you have made will
2227 be discarded.
2228
2229 The active pane has
2230 a distinctively colored title panel and cursor bar. You can
2231 use the tab key to tab among the panes. The function of
2232 other keyboard keys depends on which pane is active.
2233
2234 Along the bottom of the File Trace Window is a toolbar with
2235 two status boxes:
2236 ©Parse Locationª
2237 ©Parse Statusª
2238 and five buttons:
2239 ©Single Stepª
2240 ©Parse Fileª
2241 ©Resetª
2242 ©Reloadª
2243 ©Helpª
2244
2245 If the blinking cursor loses synch with the current
2246 parse location, the Single Step button is replaced with
2247 the ©Synch Parseª button.
2248 ##
2249
2250 Grammar Trace Window
2251
2252 The ©Grammar Traceª window normally consists of three panes:
2253 The ©Parser Stack paneª
2254 The ©Allowable Input paneª
2255 The ©Rule Stack paneª
2256
2257 If your grammar uses ©semantically determined productionsª,
2258 the ©Reduction Choices paneª will appear when necessary
2259 to allow you to select a ©reduction tokenª.
2260
2261 The active pane has
2262 a distinctively colored column header and cursor bar. You
2263 can use the tab key to tab among the panes. The function of other
2264 keyboard keys depends on which pane is active.
2265
2266 Along the bottom of the Grammar Trace Window is a toolbar with
2267 a ©Parse Statusª box, a ©text entryª field
2268 and four buttons:
2269 ©Proceedª
2270 ©Single Stepª
2271 ©Resetª
2272 ©Helpª
2273
2274 In the ©Parser Stack paneª you can see a
2275 representation of the ©parser state stackª and ©parser stateª as they
2276 might appear in the course of execution of your ©parserª. You can
2277 examine the ©allowable inputª tokens and see the changes to the
2278 state and the state stack caused by any input token you
2279 choose. The ©Rule Stack paneª shows the relationship between the
2280 contents of the parser stack and your ©grammarª. If your grammar
2281 uses ©semantically determined productionsª, you can select the
2282 appropriate ©reduction tokenª from the ©Reduction Choices paneª.
2283
2284 You can enter text characters directly in the ©text entryª
2285 field. This means you can run a Grammar Trace like a ©File Traceª
2286 where the test file is replaced by the characters you type in the
2287 text entry field. This is a very convenient way to check out your
2288 grammar.
2289 ##
2290
2291 Test File, Test File Pane
2292
2293 In the ©File Traceª, the file under test is displayed in the
2294 upper right pane. To parse to a specific point, double
2295 click at that point.
2296
2297 As long as the parse location and the cursor are synchronized,
2298 when you use the cursor keys to
2299 move the cursor, the parse will track the cursor.
2300
2301 If the parse encounters a ©syntax errorª, it will not be able
2302 to go beyond the location of the error. In this situation,
2303 moving the cursor right or down will cause the cursor position to
2304 differ from the parse location. The parse and cursor positions can also
2305 differ if you single click anywhere in the Test File pane.
2306
2307 If the
2308 parse location and the cursor are thus not synchronized, the
2309 ©Single Stepª button will be replaced with a ©Synch Parseª
2310 button. Click on the Synch Parse button to get the cursor
2311 and the parse back in synch. Of course, the parse will still
2312 not be able to proceed past a syntax error.
2313
2314 In the default color scheme, parsed text is shown on a lighter
2315 background than is unparsed text.
2316
2317 If your grammar uses ©semantically determined productionªs,
2318 the parse will halt when one is encountered and the ©reduction
2319 choices paneª will be displayed so you may select the appropriate
2320 ©reduction tokenª.
2321
2322 At any time you can click on the ©Reset buttonª to reset the parse to
2323 the beginning of the test file. If you modify the test file, you
2324 can click on the ©Reload buttonª to load the modified file and
2325 reset the parse.
2326
2327 Normally, AnaGram reads test files in "text" mode, discarding carriage
2328 return characters. If your parser needs to recognize carriage return
2329 characters explicitly, you should turn the ©test file binaryª
2330 ©configuration switchª on.
2331
2332 Sample test files are provided with the FFCALC and FC ©examplesª.
2333 ##
2334
2335 Parse Location
2336
2337 The current location of the ©File Traceª parser in the
2338 ©test file paneª. The format is <line number>:<column number>.
2339 ##
2340
2341 Parse Status
2342
2343 The current state of the ©File Traceª or ©Grammar Traceª parser.
2344
2345  Ready: The parser is ready for input.
2346  Running: The parser is processing input.
2347  Parse Complete: The parser has reached the end of the input. Click
2348 on ©resetª or ©reloadª to restart the parse.
2349  Syntax error: A syntax error has been encountered. The parser cannot
2350 go any further.
2351  Unexpected end of file: The parser has reached the end of the actual
2352 input but the grammar still expects more.
2353  Select reduction token: The parser encountered a ©semantically determined
2354 productionª. Select a ©reduction tokenª from the ©Reduction Choices paneª.
2355  Selection error: The reduction token selected from the Reduction Choices
2356 pane was not allowable input in the present state. Select another
2357 reduction token.
2358 ##
2359
2360 Parse File
2361
2362 Use the Parse File button in the ©File Traceª to parse all the way
2363 to the end of file. The parse will not stop until it encounters a
2364 ©syntax errorª, a ©semantically determined productionª, or the end of file.
2365 ##
2366
2367 Reset
2368
2369 Use the Reset button in the ©File Traceª or ©Grammar Traceª to reset
2370 the parse to its initial state. This is most convenient when using
2371 a ©Conflict Traceª, ©Error Traceª, or other ©Auxiliary Traceª
2372 since these traces seldom begin at state 0.
2373 ##
2374
2375 Reload
2376
2377 The Reload button in the ©File Trace Windowª rereads the test file.
2378 This is convenient if you modify the test file while you are testing
2379 the ©grammarª.
2380 ##
2381
2382 Lookahead Token
2383
2384 In an ©LALR-1 parserª the "lookahead token" is the next token to be
2385 processed. For each ©parser stateª there is a list of tokens that
2386 may be seen in this state. For each token there is a corresponding
2387 ©parser actionª. The parser scans the list looking for the lookahead
2388 token and then performs the corresponding parser action. If the
2389 lookahead token cannot be found and there is no ©default reductionª,
2390 the parser signals a ©syntax errorª.
2391
2392 In File Trace, and in some circumstances in Grammar Trace, the
2393 lookahead token can be seen on the last line of the
2394 ©Parser Stack paneª.
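
The lookup described in the first paragraph can be sketched in
C as follows. The data layout and all the names (entry, state,
find_action, and the ACT_ constants) are invented for the
example; the tables in a generated parser are organized
differently.

```c
enum action { ACT_SHIFT, ACT_REDUCE, ACT_ACCEPT, ACT_ERROR };

/* One token that may be seen in a state, with its action. */
struct entry { int token; enum action action; };

/* A parser state: its list of tokens, and whether it has a
   default reduction. */
struct state {
  const struct entry *entries;
  int count;
  int has_default_reduction;
};

/* Scan the list for the lookahead token. If it is not there,
   fall back on the default reduction, or signal an error. */
enum action find_action(const struct state *st, int lookahead)
{
  int i;
  for (i = 0; i < st->count; i++)
    if (st->entries[i].token == lookahead)
      return st->entries[i].action;
  return st->has_default_reduction ? ACT_REDUCE : ACT_ERROR;
}
```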
2395 ##
2396
2397 GET_CONTEXT
2398
2399 If you have defined a "©context typeª" ©configuration
2400 parameterª, and wish to maximize the performance of your
2401 parser, you should write a GET_CONTEXT macro which
2402 stores the context of the input token directly in
2403 ©CONTEXTª, the current stack location. Otherwise, you
2404 can write your ©GET_INPUTª macro so that it stores
2405 context into ©PCBª.©input_contextª. The default
2406 definition for GET_CONTEXT will then copy
2407 PCB.input_context to the ©context stackª at the
2408 appropriate time.
2409 ##
2410
2411 GET_INPUT
2412
2413 GET_INPUT is a macro which you should define to control
2414 ©parser inputª if your
2415 parser is not ©event drivenª and you are not using
2416 ©pointer inputª. If you don't define it, AnaGram will
2417 define it by default to read a single character from
2418 stdin:
2419
2420 #define GET_INPUT (PCB.input_code = getchar())
2421
2422 ©PCBª.©input_codeª is an integer field in the ©parser control blockª
2423 which is used to hold the current character code. You
2424 may also want GET_INPUT to set the values of ©input_contextª or
2425 ©input_valueª. It may call an input function, or it may execute
2426 in-line code when it is invoked.
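
As a rough sketch, a GET_INPUT macro that reads from a string in memory instead of stdin might look like the following. The structure pcb_sketch stands in for the generated parser control block, and next_char, set_input, input_text and input_pos are hypothetical names invented for this illustration. The sketch also assumes that your grammar treats character code 0 as end of input.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the generated parser control block; only the field
   used by this sketch is declared. */
struct pcb_sketch { int input_code; };
struct pcb_sketch PCB;

/* Hypothetical input source: a NUL-terminated buffer and a cursor. */
static const char *input_text = "";
static size_t input_pos = 0;

/* Return the next character code, or 0 once the buffer is exhausted. */
int next_char(void)
{
    unsigned char c = (unsigned char)input_text[input_pos];
    if (c != 0)
        input_pos++;
    return c;
}

/* Same shape as the default definition, with getchar() replaced. */
#define GET_INPUT (PCB.input_code = next_char())

/* Point the input source at a new buffer and rewind the cursor. */
void set_input(const char *text)
{
    assert(text != NULL);
    input_text = text;
    input_pos = 0;
}
```

Because the macro calls a function, the input source can be changed without touching the parser. An in-line expression would work equally well.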
2427 ##
2428
2429 iso latin 1
2430
2431 The "iso latin 1" ©configuration switchª controls case
2432 conversion on input characters when the ©case sensitiveª
2433 switch is set to off. When "iso latin 1" is set, the
2434 default ©CONVERT_CASEª macro is defined to convert
2435 correctly all characters in the latin 1 character set.
2436 When the switch is off, only characters in the ASCII
2437 range (0-127) are converted.
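
The difference between the two settings can be illustrated in C. The functions below are hypothetical and are not the text of the default CONVERT_CASE macro. They assume that conversion folds lower case to upper case and that character codes are in the range 0-255.

```c
#include <assert.h>

/* Fold lower case to upper case for ASCII letters only. */
int convert_case_ascii(int c)
{
    assert(c >= 0 && c <= 255);
    return (c >= 'a' && c <= 'z') ? c - 32 : c;
}

/* Fold lower case to upper case across the Latin-1 character set. */
int convert_case_latin1(int c)
{
    assert(c >= 0 && c <= 255);
    if (c >= 'a' && c <= 'z')
        return c - 32;
    /* Codes 0xE0-0xFE are lower case letters, except 0xF7, which is
       the division sign. Each letter lies 32 above its upper case form. */
    if (c >= 0xE0 && c <= 0xFE && c != 0xF7)
        return c - 32;
    /* 0xDF (sharp s) and 0xFF (y with diaeresis) have no upper case
       form in Latin-1 and are left unchanged. */
    return c;
}
```

With the ASCII-only version, an accented letter such as code 0xE9 passes through unchanged, so an accented keyword would match only in its original case.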
2438 ##
2439
2440 Dragon Book
2441
2442 The "dragon book" is the classic reference on formal parsing:
2443 Compilers: Principles, Techniques, and Tools
2444 Aho, Sethi, and Ullman
2445 Addison-Wesley, 1986.
2446
2447 It is called the "dragon book" because of its
2448 colorful cover illustration showing a knight in
2449 armour ("data flow analysis") armed with sword
2450 ("©LALR parser generatorª") and shield ("syntax
2451 directed translation") at his PC attacking a
2452 bright red dragon ("complexity of compiler design").
2453 ##
2454
2455 LALR-1 Parser
2456
2457 An LALR-1 parser is a ©parserª created from a
2458 ©grammarª by an ©LALR parser generatorª.
2459 ##
2460
2461 LALR Parser Generator
2462
2463 LALR(k) (LookAhead Left-to-right Rightmost derivation)
2464 parser generators are
2465 programs that create parsers algorithmically from
2466 formal grammars. The (k) refers to the number of
2467 lookahead symbols used to make parsing decisions.
2468 Normally, k = 1.
2469
2470 LALR parsers are a subset of the class of
2471 so-called LR parsers. Compared with full LR parsers, LALR parsers
2472 are generally more compact and less costly to create. These advantages are
2473 obtained at a slight sacrifice in generality. Although
2474 it is possible to contrive an LR grammar which has
2475 ©conflictªs when analyzed with the LALR algorithm,
2476 such situations rarely occur in practice, and can
2477 be easily resolved by rewriting a few rules.
2478
2479 In the ©dragon bookª, section 4.7, the authors list the following
2480 attractive properties of LR parsing:
2481  LR parsers can be constructed to recognize virtually
2482 all programming-language constructs for which context-free
2483 grammars can be written.
2484  The LR parsing method is the most general nonbacktracking
2485 shift-reduce parsing method known, yet it can be implemented as
2486 efficiently as other shift-reduce methods.
2487  The class of grammars that can be parsed using LR methods is
2488 a superset of the class of grammars that can be parsed with
2489 predictive parsers.
2490  An LR parser can detect a syntactic error as soon as it is
2491 possible to do so on a left-to-right scan of the input.
2492 ##
2493
2494 Getting Started
2495
2496 AnaGram is an ©LALR parser generatorª. Its input is
2497 a ©syntax fileª, which you prepare with an ordinary
2498 programming editor. Its output is a ©parser fileª, which
2499 you can compile with a C or C++ compiler on any platform
2500 and link into your program. To compile on Unix platforms, set
2501 the ©no crª ©configuration switchª.
2502
2503 AnaGram has extensive context-sensitive hypertext
2504 ©helpª. In any AnaGram window, press ©F1ª or select an item with the
2505 ©Help Cursorª. Further documentation in HTML format, including
2506 documentation of examples, is found in the html subdirectory. AnaGram
2507 also has a comprehensive hard-copy manual, the AnaGram User's Guide.
2508
2509 If you are new to AnaGram, you might begin by reviewing the Help
2510 Topics ©How AnaGram Worksª and ©Program Developmentª, and looking at
2511 An Annotated Example and Summary of AnaGram Notation in the HTML
2512 documentation.
2513
2514 If you are not already familiar with formal parsing techniques, you
2515 may want to read Introduction to Syntax Directed Parsing in the HTML
2516 documentation. Note also the Fahrenheit to Celsius conversion
2517 examples in the examples/fc directory, which comprise a graded
2518 sequence of syntax files illustrating most of the basic
2519 principles of ©syntax directed parsingª in easy steps. Documentation
2520 is in html/fc.html.
2521
2522 AnaGram has many features, many of which are not
2523 commonly found in parser generators:
2524  the ©configuration sectionª
2525  ©thread safe parsersª
2526  C++ support
2527  the ©disregardª and ©lexemeª statements
2528  ©event drivenª parsers
2529  ©character setsª
2530  ©virtual productionsª
2531  ©File Traceª, ©Grammar Traceª
2532  ©automatic resynchronizationª
2533  ©error token resynchronizationª
2534
2535 To familiarize yourself with the many options available for configuring
2536 your parsers, select ©Configuration Parametersª from the ©Browse Menuª.
2537 Use ©F1ª or the ©Help Cursorª to pop up explanations of the various
2538 parameters.
2539
2540
2541 If you don't find the information you need, please visit the
2542 AnaGram web page at http://www.parsifalsoft.com for further
2543 information and support.
2544
2545 ##
2546
2547 How AnaGram Works
2548
2549 AnaGram contains an ©LALR Parser Generatorª which creates a
2550 table driven ©LALR-1 parserª from a ©grammarª written in a variant
2551 of Backus-Naur Form. AnaGram works in two steps. In the
2552 first step, or analysis phase, it reads a ©syntax fileª and
2553 compiles a number of tables describing the grammar. In the
2554 second step, or build phase, it writes two output files:
2555 a ©parser fileª written in C or C++ and a ©header fileª.
2556
2557 Syntax files normally have the extension .syn. The rules for
2558 writing syntax files are given in the AnaGram User's Guide
2559 and in the Summary of AnaGram Notation in the HTML documentation.
2560
2561 The header file contains definitions and declarations, including
2562 the definition of a ©parser control blockª.
2563
2564 The parser file consists of:
2565  The ©C prologueª, if any.
2566  Definitions and declarations provided by AnaGram.
2567  ©Reduction procedureªs.
2568  a customized ©parsing engineª.
2569  a ©parse functionª to be called when input is to be parsed.
2570
2571 The name of the parser file is controlled by the ©parser
2572 file nameª ©configuration parameterª. By default, the
2573 parser file has the same name as the syntax file, with
2574 the extension .c. The name of the parse function is
2575 controlled by the ©parser nameª configuration
2576 parameter, which defaults to the name of the syntax
2577 file.
2578 ##
2579
2580 Examples
2581
2582 The EXAMPLES directory of the AnaGram distribution disk
2583 contains a number of examples to help you get started.
2584 Documentation for the examples, in HTML format, is located
2585 in the html directory (start at index.html or examples.html).
2586
2587 The traditional Hello, World, in examples/hw, is a good
2588 example for getting familiar with the mechanical
2589 procedures of building both C and C++ parsers from
2590 ©syntax fileªs.
2591
2592 The Fahrenheit/Celsius conversion examples in the
2593 examples/fc directory on your AnaGram diskette comprise
2594 a graded sequence of syntax files which illustrate
2595 most of the basic principles of ©syntax directed
2596 parsingª in easy steps. In addition, these examples
2597 demonstrate many features of AnaGram which are not
2598 found in other parser generators:
2599  the ©configuration sectionª
2600  ©character setsª
2601  ©virtual productionsª
2602  ©error token resynchronizationª
2603  ©File Traceª
2604  the ©disregardª and ©lexemeª statements
2605  ©event drivenª parsers
2606
2607 The Four Function Calculator (examples/ffcalc) is used
2608 traditionally to demonstrate parser generators. If you
2609 are already familiar with ©syntax directed parsingª this
2610 example will give you a good overview of the basics of
2611 AnaGram. An annotated version of this example may be
2612 found in AnaGram's HTML documentation.
2613 The FFCALC example illustrates the use of ©precedence
2614 rulesª to resolve ©conflictsª.
2615
2616 Other examples are available to demonstrate additional
2617 features of AnaGram.
2618
2619 RCALC (examples/rcalc) is a simple four function
2620 calculator which accepts roman numeral input. It
2621 illustrates the following AnaGram features:
2622  ©pointer inputª
2623  ©SYNTAX_ERRORª macro
2624  ©context stackª
2625
2626 DSL (examples/dsl) is a complete DOS script language,
2627 which provides capabilities well in excess of DOS batch
2628 files. DSL is a complete working program, used in the
2629 past to create AnaGram's install program. Some of the
2630 specific features of AnaGram which it illustrates are:
2631  ©distinguish lexemesª
2632  ©distinguish keywordsª
2633  ©far tablesª
2634
2635 MPP is a fully functional macro preprocessor for C or
2636 C++. Included with MPP are two C grammars, either of
2637 which may be incorporated into MPP. MPP uses several
2638 parsers that work together:
2639  TS.SYN is the primary token scanner parser that
2640 identifies tokens, and handles preprocessor
2641 commands.
2642  MAS.SYN is used to do macro argument substitution.
2643  CT.SYN is used to identify tokens that result from
2644 string concatenation during macro argument
2645 substitution.
2646  EX.SYN is used to evaluate constant expressions in
2647 #if preprocessor statements.
2648
2649 Among the more powerful features of AnaGram that MPP
2650 illustrates are:
2651  ©semantically determined productionsª
2652  ©event drivenª parsers
2653 ##
2654
2655 Goal, Goal Token, Start Token
2656
2657 The ©grammar tokenª is the token which represents the
2658 "top level" in your grammar. Some people refer to it as
2659 the "goal" or "goal token" and others as the "start
2660 token". Whatever it is called, it is the single token
2661 which describes the complete input to your parser.
2662
2663 The most common way to specify a grammar token is as
2664 follows:
2665 grammar -> statements?..., eof
2666 This production tells AnaGram that the input to your
2667 parser consists of a (possibly empty) sequence of
2668 statements followed by an end of file token.
2669
2670 There are a number of ways of specifying which token in
2671 your ©syntax fileª represents the top level of your
2672 grammar. You may simply name it "grammar", or you may
2673 tag it with a '$' character when you define it, or you
2674 may set the ©grammar tokenª ©configuration parameterª.
2675
2676 If you should inadvertently tag several tokens with the
2677 '$' character and/or set the grammar token parameter,
2678 it is the last such specification in the file which
2679 wins. Some people develop their grammars bottom up,
2680 gradually adding new levels of complexity. In the
2681 course of development, they may specify a number of
2682 tokens as grammar tokens and forget to remove the old
2683 specifications.
2684
2685 Notice that if you define the token "grammar" anywhere
2686 in your syntax and specify the grammar token otherwise,
2687 "grammar" will not be the grammar token. This is to
2688 keep "grammar" from being a reserved word. If you need
2689 to use it in your syntax for something other than the
2690 whole grammar, you are free to do so.
2691 ##
2692
2693 Grammar
2694
2695 Traditionally, a "grammar" is a set of ©productionªs
2696 which taken together specify precisely a set of
2697 acceptable input streams, in terms of an abstract set
2698 of ©terminal tokensª. The set of acceptable input
2699 streams is often called the "language" defined by the
2700 grammar.
2701
2702 In AnaGram, the term "grammar" also includes
2703 ©configuration sectionsª as well as the ©definitionsª
2704 of ©character setsª and ©virtual productionsª which
2705 augment the collection of productions. The term is
2706 often used in contrast to the term "©syntax fileª",
2707 which signifies the complete AnaGram source
2708 file including reduction procedures and embedded C, or
2709 the term "©parserª", which refers to AnaGram's output
2710 file.
2711
2712 A grammar is often called a "syntax", and the rules of
2713 the grammar are often called syntactic rules.
2714 ##
2715
2716 Grammar Analysis
2717
2718 The major function of AnaGram is the analysis of
2719 context-free grammars written in a particular variant
2720 of Backus-Naur Form.
2721
2722 The analysis of a grammar proceeds in four stages. In
2723 the first, the input grammar is analyzed and a number
2724 of tables are built which describe all of the
2725 ©productionªs and components of the ©grammarª.
2726
2727 In the second stage, AnaGram analyzes all of the
2728 character sets defined in the grammar, and where
2729 necessary, defines auxiliary tokens and productions.
2730
2731 In the third stage, AnaGram identifies all of the
2732 states of the parser and builds the go-to table for the
2733 parser.
2734
2735 In the fourth stage, AnaGram identifies ©reduction
2736 tokensª for each completed ©grammar ruleª in each state
2737 and checks for ©conflictªs.
2738
2739 Use the ©Analyze Grammarª command to cause AnaGram to
2740 analyze your grammar.
2741 ##
2742
2743 Grammar Is Ambiguous
2744
2745 This ©warningª message appears if your ©grammarª
2746 contains ©conflictªs. AnaGram will resolve ©shift-reduce
2747 conflictsª by selecting the shift option. It will
2748 resolve ©reduce-reduce conflictsª by selecting from the
2749 conflicting ©grammar ruleªs the one which appears first
2750 in the ©syntax fileª.
2751 ##
2752
2753 Grammar Rule
2754
2755 A "grammar rule" is the right hand side of a production.
2756 It is a sequence of ©rule elementsª. Each rule element
2757 identifies some token, which can be either a ©terminal
2758 tokenª or ©nonterminal tokenª.
2759
2760 A grammar rule is "matched" by a
2761 corresponding sequence of tokens in the input stream to
2762 the parser. The rule elements in the grammar rule may be
2763 ©token nameªs, ©set expressionsª, ©character constantsª,
2764 ©immediate actionªs, ©keyword stringsª, or ©virtual
2765 productionsª.
2766
2767 A grammar rule may be followed by an
2768 optional ©reduction procedureª. The ©semantic valuesª of
2769 the tokens that comprise the rule may be passed to the
2770 reduction procedure by using ©parameter assignmentsª.
2771
2772 A grammar rule always makes up the right hand side of a
2773 production. The left hand side of the production
2774 identifies one or more ©nonterminal tokensª, or
2775 ©reduction tokensª, to which the rule reduces when
2776 matched. If there is more than one reduction token,
2777 the production is called a ©semantically determined productionª and
2778 there should be a ©reduction procedureª to select
2779 the correct reduction token at run time.
2780 ##
2781
2782 Grammar Token
2783
2784 The "grammar token" ©configuration parameterª may be
2785 used to specify the ©goalª, or "start" token for the
2786 syntax analyzer portion of AnaGram. Alternatively, you
2787 could simply call the token "grammar", or you could
2788 append a '$' character to it when you define it.
2789
2790 Each grammar must have a grammar token specified before
2791 it can be analyzed or before a parser can be built. The
2792 grammar token is the single token to which the grammar
2793 finally condenses. When this token is identified by the
2794 parser, the parse is complete.
2795 ##
2796
2797 Grammar Trace
2798
2799 AnaGram's Grammar Trace facility lets you examine the workings of your
2800 ©parserª in detail. You can use the Grammar Trace as soon as you have
2801 analyzed your ©grammarª, even before you have written any ©reduction
2802 procedureªs or other code. Thus you can defer writing procedural code
2803 until you have the grammar working to your specifications.
2804
2805 Select the ©Grammar Trace Windowª
2806 from the ©Action Menuª or click on the Grammar Trace
2807 button.
2808
2809 In the ©Parser Stack paneª you can see a representation of the
2810 ©parser state stackª and ©parser stateª as they might appear in the
2811 course of execution of your ©parserª. The ©Rule Stack paneª shows the
2812 relationship between the contents of the parser stack and your
2813 ©grammarª. If your grammar uses ©semantically determined
2814 productionsª, you can select the appropriate ©reduction tokenª from
2815 the ©Reduction Choices paneª.
2816
2817 At any stage, the ©Parser Stackª represents a parse
2818 in progress. It shows the sequence of ©tokenªs that have
2819 been input so far and the states in which they were
2820 seen. When a production is complete and the grammar rule
2821 is reduced, the tokens that make up the rule are removed
2822 from the stack and replaced by the token on the left
2823 side of the production. Initially, the Parser Stack contains
2824 only a ©lookahead lineª.
2825
2826 To explore your grammar, choose ©tokenªs one by one from
2827 the ©Allowable Inputª
2828 pane. This pane shows the tokens allowable at the current state of the
2829 grammar, and the actions that result when the tokens are chosen.
2830
2831 You can also enter text characters directly in the ©text entryª
2832 field. This means you can run a Grammar Trace like a ©File Traceª
2833 where the test file is replaced by the characters you type in the
2834 text entry field. This is a very convenient way to check out your
2835 grammar. Text entry is, of course, not appropriate for grammars that
2836 expect ©token inputª.
2837
2838 In a ©File Traceª you can advance the parse no matter which pane is
2839 active. In a Grammar Trace there is a question as to whether input is
2840 intended to come from the Allowable Input pane or the text entry
2841 field. Therefore the parse can only be advanced when one of these
2842 two is active to indicate that it is the source of input.
2843
2844 Specialized prebuilt Grammar Traces such as the ©Conflict Traceª and
2845 the ©Auxiliary Traceª can be selected from ©Auxiliary Windowsª popup
2846 menus where appropriate.
2847
2848 All Grammar Trace activity updates the ©trace coverageª counts.
2849 ##
2850
2851 Text Entry
2852
2853 It is sometimes more convenient to enter text in the
2854 text entry box on the ©Grammar Traceª toolbar than to
2855 select individual tokens from the ©Allowable Input paneª.
2856
2857 By entering text you can proceed quickly to a troublesome
2858 state without having to choose each individual token
2859 en route.
2860
2861 After entering text, press Enter or click on the Proceed
2862 button to parse the text. Click on the single step button
2863 to work slowly through the text step by step.
2864 ##
2865
2866 header file name
2867
2868 The "header file name" parameter names the ©parser
2869 headerª file that AnaGram will generate when it builds
2870 your parser. This header file can be used with your
2871 parser or with other modules in your program. The
2872 header file contains a number of typedef statements and
2873 a number of macro definitions which are needed in your
2874 parser and may be useful in other modules.
2875
2876 If the value of this parameter contains a '#' character,
2877 AnaGram will substitute the name of your syntax file for
2878 the '#'. The default value of "header file name" is
2879 "#.h".
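
The substitution can be pictured with a short C sketch. The function expand_file_name is hypothetical and is not part of AnaGram. It assumes that the syntax file name is substituted without its extension and that every '#' in the parameter value is replaced.

```c
#include <assert.h>
#include <stddef.h>

/* Copy pattern to out, replacing each '#' with stem. At most size
   bytes are written, including the terminating NUL.
   Returns 0 on success, or -1 if the result does not fit. */
int expand_file_name(const char *pattern, const char *stem,
                     char *out, size_t size)
{
    assert(pattern != NULL && stem != NULL && out != NULL);
    size_t n = 0;
    for (const char *p = pattern; *p != '\0'; p++) {
        if (*p == '#') {
            for (const char *s = stem; *s != '\0'; s++) {
                if (n + 1 >= size)
                    return -1;
                out[n++] = *s;
            }
        } else {
            if (n + 1 >= size)
                return -1;
            out[n++] = *p;
        }
    }
    if (n >= size)
        return -1;
    out[n] = '\0';
    return 0;
}
```

Under these assumptions, the default value "#.h" applied to a syntax file named calc.syn yields calc.h.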
2880 ##
2881
2882 Help, Using Help
2883
2884 There are 3 main ways to access AnaGram Online Help:
2885  Press F1 for context-sensitive help from most windows and menu items.
2886  Similarly, use the ©Help Cursorª from most windows and menu items.
2887  From the Help menu, you can bring up ©Help Topicsª and choose a topic.
2888
2889 You can also get fly-over help for the toolbar buttons on the ©Control
2890 Panelª. File and Grammar Traces have a Help button.
2891
2892 AnaGram's Help windows, unlike most others, remain on-screen until you
2893 dismiss them. This means you can refer to several topics at once. They
2894 have hypertext links to other Help topics. Also, right-clicking
2895 the mouse on a Help window or pressing F1 will pop up an Auxiliary
2896 Windows menu of all linked topics in the window. "Using Help" is always
2897 available from this popup menu.
2898
2899 Note that, for the ©Warningsª, ©Configuration Parameterªs and ©Help
2900 Topicsª windows, F1 will give you help for the item
2901 on the highlighted line, whereas the Help Cursor allows you
2902 to select any line by clicking on it.
2903
2904 AnaGram also has documentation in HTML format, indexed in the index.html
2905 file. This documentation covers Getting Started, examples, and some
2906 further topics mainly condensed from the User's Guide. Hard copy
2907 documentation is in the AnaGram User's Guide, which has the most
2908 detail.
2909 ##
2910
2911 Hidden
2912
2913 In a ©configuration sectionª of your grammar you may use
2914 an ©attribute statementª to declare one or more tokens
2915 to be "hidden". Tokens that are "hidden" do not appear
2916 in the ©token namesª table, and thus do not appear in syntax error
2917 diagnoses. When your parser attempts to determine the
2918 ©error frameª of a ©syntax errorª, it will disregard the
2919 tokens that have been declared hidden. The hidden
2920 declaration consists simply of the keyword hidden
2921 followed by a list of tokens, separated by commas and
2922 enclosed in braces ({ }):
2923 [ hidden { widget, wombat, foo, bar } ]
2924
2925 You would use the "hidden" attribute primarily for
2926 tokens whose name would not mean anything to your users.
2927 ##
2928
2929 Immediate Action
2930
2931 Immediate actions are snippets of C code which are to
2932 be executed in the middle of a ©grammar ruleª. Immediate
2933 actions are denoted by a '!' character followed by
2934 either a C expression, terminated by a semicolon; or a
2935 block of C code enclosed in braces. For example, in a
2936 simple desk calculator example one might write the
2937 following:
2938 transaction
2939 -> !printf('#');, expression:x =printf("%d\n",x);
2940
2941 Notice that the only apparent difference between an
2942 immediate action and a ©reduction procedureª is that the
2943 immediate action is preceded by '!' instead of '='.
2944 Notice that the immediate action must be followed by a
2945 comma to separate it from the following ©rule elementª.
2946
2947 Immediate actions may also be used in ©definitionªs:
2948 prompt = !printf('#');
2949
2950 The above example, using this definition would then be:
2951 transaction
2952 -> prompt, expression:x =printf("%d\n",x);
2953
2954 You could accomplish the same result by writing a ©null
2955 productionª and a reduction procedure:
2956 prompt
2957 -> =printf('#');
2958
2959 This is exactly how AnaGram implements immediate
2960 actions.
2961 ##
2962
2963 Implementation Errors
2964
2965 "Implementation errors" are errors your parser detects
2966 which are not the immediate result of bad input. When
2967 it encounters an implementation error, your parser will
2968 call a macro which you can define to deal with the
2969 problem in a manner suitable to your needs. If you don't
2970 provide these macros, AnaGram will make default
2971 definitions. There are two macros corresponding to two
2972 implementation errors:
2973 ©PARSER_STACK_OVERFLOWª
2974 ©REDUCTION_TOKEN_ERRORª
2975 ##
2976
2977 Inappropriate Value
2978
2979 This ©warningª message appears when the value assigned to
2980 a ©configuration parameterª is not appropriate to that
2981 parameter. Check the definition of the parameter, by
2982 opening the ©Configuration Parameters Windowª,
2983 selecting the parameter and pressing F1.
2984 ##
2985
2986 Initializer
2987
2988 For every ©parserª it generates, AnaGram generates an
2989 "initializer" function to initialize the parser. AnaGram
2990 names the initializer by prefixing the ©parser nameª
2991 with "init_". If your parser is ©event drivenª, you must
2992 call the initializer before you call the parser.
2993
2994 If your parser is not event driven, AnaGram will
2995 normally include a call to the initializer in the
2996 parser. If you wish to be able to call your parser more
2997 than once without its being re-initialized, you may turn
2998 off the ©auto initª ©configuration switchª. When you do
2999 this, you assume responsibility for calling the
3000 initializer. If your parser is event driven, you must
3001 always call the initializer function.
3002
3003 If the ©reentrant parserª switch is set, the initializer takes
3004 a pointer to the ©parser control blockª as its sole argument. Otherwise
3005 it takes no arguments. The initializer returns no value. All
3006 communication is by means of the ©parser control blockª.
3007 ##
3008
3009 Input Character
3010
3011 The actual unit of ©parser inputª is usually a
3012 single character. Note that you are not limited to
3013 eight-bit characters. Your parser will use the input
3014 character to index a translation table, ©ag_tcvª, to
3015 determine the ©token numberª for that character. The
3016 ©token numberª identifies the actual syntactic token.
3017 The character code itself will be the ©semantic valueª
3018 of the token. Note that AnaGram groups together all
3019 input characters that are syntactically
3020 indistinguishable into a single input token.
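
The translation step can be sketched in C as a table lookup. The table tcv_sketch merely stands in for ©ag_tcvª. Its contents, the token numbers and the function names are hypothetical, chosen to show how all digits, or all letters, collapse into a single token when the grammar never distinguishes between them.

```c
#include <assert.h>

/* Hypothetical token numbers for a grammar that distinguishes only
   digits, letters, and everything else. */
enum { TOK_OTHER, TOK_DIGIT, TOK_LETTER };

/* Stand-in for the generated translation table, indexed by character code. */
static unsigned char tcv_sketch[256];

/* Fill the table: every character in a group maps to the same token. */
void build_table(void)
{
    for (int c = 0; c < 256; c++)
        tcv_sketch[c] = TOK_OTHER;
    for (int c = '0'; c <= '9'; c++)
        tcv_sketch[c] = TOK_DIGIT;
    for (int c = 'a'; c <= 'z'; c++)
        tcv_sketch[c] = TOK_LETTER;
    for (int c = 'A'; c <= 'Z'; c++)
        tcv_sketch[c] = TOK_LETTER;
}

/* Look up the token number for a character code. The code itself
   remains available as the semantic value of the token. */
int token_number(int input_code)
{
    assert(input_code >= 0 && input_code < 256);
    return tcv_sketch[input_code];
}
```

Here '0' and '9' yield the same token number, while their character codes still differ, so a reduction procedure can recover which digit was actually seen.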
3021 ##
3022
3023 input_code
3024
3025 input_code is a field in the ©parser control blockª
3026 which contains the current ©input characterª, or, if your
3027 ©GET_INPUTª macro supplies ©token numberªs directly, the
3028 token number.
3029
3030 If you write your own ©GET_INPUTª macro, you must make
3031 sure that you store the input character, or token
3032 number, you get into ©PCBª.input_code.
3033 ##
3034
3035 INPUT_CODE(t)
3036
3037 If you set both the ©pointer inputª and the ©input
3038 valuesª ©configuration parameterªs, you must provide an
3039 INPUT_CODE macro for your parser. In this situation,
3040 your parser will use the pointer to load the
3041 ©input_valueª field of the ©parser control blockª and
3042 use the INPUT_CODE macro to extract the appropriate
3043 value for the ©input_codeª field. For example, if the
3044 input_value is a structure and the appropriate member
3045 field is called "id" you would write:
3046
3047 #define INPUT_CODE(t) (t).id
3048 ##
3049
3050 input_context
3051
3052 "input_context" is a field which AnaGram adds to the
3053 definition of the ©parser control blockª structure when
3054 you define a ©context typeª ©configuration parameterª.
3055 If you choose, you can write your GET_INPUT macro so
3056 that it stores the context value in ©PCBª.input_context.
3057 The default definition for ©GET_CONTEXTª will then stack
3058 the context value at the appropriate time. You can think
3059 of PCB.input_context as a sort of temporary "parking
3060 place" for the context value.
3061 ##
3062
3063 Input Scan Aborted
3064
3065 This ©warningª message appears if AnaGram is unable to
3066 finish scanning your ©syntax fileª because of previous
3067 errors.
3068 ##
3069
3070 input values
3071
3072 "Input values" is a ©configuration switchª which
3073 defaults to off. If your ©parser inputª includes
3074 explicit ©token valueªs which are not simply the ASCII
3075 values of corresponding ASCII input characters, you must
3076 set the "input values" switch to inform AnaGram. Unless
3077 your parser is ©event drivenª or uses ©pointer inputª,
3078 you must also provide your own ©GET_INPUTª macro.
3079
3080 If your parser uses pointer input, you must provide an
3081 ©INPUT_CODE(t)ª macro.
3082
3083 The semantic value of an input token is to be stored in the
3084 ©input_valueª field of the parser control block.
3085 ##
3086
3087 input_value
3088
3089 input_value is a field in the ©parser control blockª
3090 which is used to store the semantic value of the input
3091 token.
3092
3093 If you write your own ©GET_INPUTª macro, and you have
3094 set the ©input valuesª ©configuration switchª, you
3095 should make sure that you store the value of the ©input
3096 characterª or token into ©PCBª.input_value.
3097 ##
3098
3099 Internal Error
3100
3101 "AnaGram internal error: ..." is a ©warningª message which
3102 appears if one of AnaGram's internal consistency tests
3103 fails. This message should never appear if AnaGram is
3104 working properly. Usually AnaGram will abort on
3105 encountering an internal error, although under
3106 a small set of circumstances it may continue. Should
3107 this happen, it would be wise to close AnaGram and
3108 restart it.
3109
3110 If you do get an internal error, please note the complete
3111 message identifying the problem and file a bug report,
3112 following the directions posted on the AnaGram web page
3113 at http://www.parsifalsoft.com.
3114 A copy of the relevant
3115 syntax file and a summary of the circumstances surrounding
3116 the problem would be greatly appreciated.
3117 ##
3118
3119 Intersection
3120
3121 In set theory, the intersection of two sets, A and B, is
3122 defined to be the set of all elements of A which are
3123 also elements of B. In an AnaGram ©syntax fileª, the
3124 intersection of two ©character setsª is represented with
3125 the '&' operator. The intersection operator has lower
3126 ©precedenceª than the ©complementª operator, but higher
3127 precedence than the ©unionª and ©differenceª operators.
3128 The intersection operator is ©left associativeª.
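
The set-theoretic definition can be made concrete with a small C sketch that represents a character set as a 256-bit map. The type charset and its functions are hypothetical. They illustrate the operation only and say nothing about how AnaGram stores character sets internally.

```c
#include <assert.h>
#include <string.h>

/* One bit per character code, 256 codes in all. */
typedef struct { unsigned char bits[32]; } charset;

/* Make the set empty. */
void charset_clear(charset *s)
{
    assert(s != NULL);
    memset(s->bits, 0, sizeof s->bits);
}

/* Add every character code from lo to hi, inclusive. */
void charset_add_range(charset *s, int lo, int hi)
{
    assert(s != NULL && lo >= 0 && hi <= 255);
    for (int c = lo; c <= hi; c++)
        s->bits[c >> 3] |= (unsigned char)(1u << (c & 7));
}

/* Return 1 if c is a member of the set, 0 otherwise. */
int charset_has(const charset *s, int c)
{
    assert(s != NULL && c >= 0 && c <= 255);
    return (s->bits[c >> 3] >> (c & 7)) & 1;
}

/* The intersection holds exactly the codes present in both a and b. */
charset charset_intersect(const charset *a, const charset *b)
{
    assert(a != NULL && b != NULL);
    charset r;
    for (int i = 0; i < 32; i++)
        r.bits[i] = a->bits[i] & b->bits[i];
    return r;
}
```

Intersecting the range 'a' through 'm' with the range 'k' through 'z' leaves only 'k', 'l' and 'm'.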
3129 ##
3130
3131 Keyboard Support
3132
3133 AnaGram can be controlled entirely from the keyboard. In the Control
3134 Panel, you
3135 can tab to any button and press Enter to select it. In addition to
3136 the conventional
3137 Windows keyboard functions, the following keys have been implemented:
3138  Escape closes any AnaGram window except the Control Panel.
3139  F8 toggles between an active AnaGram window and the Control Panel
3140  F10 accesses the Control Panel menu from any
3141 AnaGram Window.
3142  Shift F10 pops up the Auxiliary Windows menu
3143 ##
3144
3145 Keyword, Keyword String
3146
3147 Keywords are a very important feature of AnaGram. They
3148 provide an easy way to pick up special character
3149 sequences in your input, thereby eliminating the need
3150 for a lot of tedious ©productionªs.
3151
3152 If AnaGram finds, on the right hand side of one of your
3153 ©grammarª productions, a string enclosed in double
3154 quotes, such as "IF", it automatically creates from the
3155 string a "keyword" which is incorporated into your
3156 parser. You may have any number of keywords. A keyword
3157 is treated as a single terminal token. Recognition of
3158 keywords is governed by the ©case sensitiveª switch.
3159
3160 Your parser will look for a keyword in its input stream
3161 wherever you have defined this particular keyword to be
3162 legitimate input. It will do whatever lookahead is
3163 necessary in order to pick up the entire keyword. If
3164 several keywords match the input, such as IF and IFF,
3165 it will select the longest match, IFF in this example.
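
The longest-match rule can be sketched in C. The function longest_keyword is hypothetical. A generated parser works from tables rather than string comparisons, and it considers only the keywords that are legitimate in the current state, which this sketch ignores, along with the ©case sensitiveª switch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the index of the longest keyword that is a prefix of input,
   or -1 if no keyword matches. */
int longest_keyword(const char *input,
                    const char *const keywords[], size_t count)
{
    assert(input != NULL && keywords != NULL);
    int best = -1;
    size_t best_len = 0;
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(keywords[i]);
        /* Keep a candidate only if it is longer than the best so far
           and matches the input character for character. */
        if (len > best_len && strncmp(input, keywords[i], len) == 0) {
            best = (int)i;
            best_len = len;
        }
    }
    return best;
}
```

Given the keywords IF and IFF, input beginning with IFF selects IFF, while input beginning with IF followed by a blank selects IF.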
3166
3167 Important points to notice about keywords:
3168  1) Keywords take precedence over ordinary
3169 characters in the input stream - thus if the character
3170 I and the keyword IF are both legitimate input at some
3171 point, IF will be selected, if present, in preference
3172 to I.
3173  2) Keywords are not reserved words. Your parser
3174 will only look for a keyword when it is in a state
3175 where that keyword is legitimate input.
3176  3) Keywords do not participate in character sets
3177 and should not appear in definitions of character sets.
3178 In particular, they are not considered as belonging to
3179 the complement of a character set. Thus
3180 a keyword would not be considered legitimate input
3181 for the production
3182 next char -> ~( '/' + '*' )
3183
3184  4) Keywords may appear in virtual productions.
3185
3186  5) Keywords may be named by means of a definition.
3187
3188 AnaGram will list all the keywords in your grammar in
3189 the ©Keywordsª window. In addition, in numerous
3190 windows where the cursor bar selects a state, the
3191 ©Auxiliary Windowsª popup menu will list a Keywords option.
3192 This window will provide a list of the keywords
3193 acceptable in the selected ©parser stateª.
3194
3195 On occasion, a kind of conflict, called a ©keyword
3196 anomalyª, may occur. If so, such conflicts will be listed
3197 in the ©Keyword Anomaliesª window. The "©stickyª"
3198 ©attribute statementª is useful in dealing with keyword
3199 anomalies.
3200 ##
3201
3202 Keyword Anomalies Found
3203
3204 This ©warningª message indicates that AnaGram has found
3205 at least one ©keyword anomalyª in your ©grammarª. Open
3206 the ©Keyword Anomaliesª window to see a list of those
3207 that have been found.
3208 ##
3209
3210 Keyword Anomaly
3211
3212 In ©syntax directed parsingª, it is assumed that input
3213 ©tokenªs can be uniquely identified. In the case of
3214 ©keywordªs, however, there is the possibility that the
3215 individual characters making up the keyword, as well as
3216 the keyword taken as a whole, could constitute
3217 legitimate input under some circumstances. Thus
3218 ©keywordsª, though a powerful and useful tool, are not
3219 completely consistent with the assumptions that underlie
3220 ©syntax directed parsingª. This can occasionally give
3221 rise to a type of conflict, diagnosed by AnaGram,
3222 called a "keyword anomaly". AnaGram is quite
3223 conservative in its diagnoses, so that many keyword
3224 anomalies it reports are actually innocuous and can be
3225 safely ignored.
3226
3227 Basically, a keyword anomaly is a situation where a
3228 keyword is recognized, causes a reduction, and the
3229 parser arrives in a state where the keyword is not
3230 legal input. If the keyword, seen simply as a sequence
3231 of characters, might have been legal input in the
3232 original state, AnaGram notes the existence of a
3233 keyword anomaly.
3234
3235 If you have a keyword that causes a keyword anomaly and
3236 it is actually a reserved word in your grammar, the
3237 anomaly is by definition innocuous. You should use the
3238 ©reserve keywordsª statement to inform AnaGram that the
3239 keyword is reserved and the anomaly need not be
3240 diagnosed.
3241
3242 To help identify and correct any problems associated
3243 with keyword anomalies, AnaGram provides the ©Keyword
3244 Anomaliesª window to identify the anomalies, and the
3245 ©Keyword Anomaly Traceª to help you understand a
3246 particular anomaly.
3247 ##
3248
3249 Keyword Anomaly Trace
3250
3251 A Keyword Anomaly Trace is a ready made ©grammar traceª
3252 window which you may select from the ©Auxiliary Windowsª
3253 menu of the ©Keyword Anomaliesª window. The anomaly
3254 trace provides a path to a state which illustrates the
3255 ©keyword anomalyª. In this state, the keyword is a
3256 reducing token, but after the reduction, it is not
3257 allowable input.
3258 ##
3259
3260 Keyword Anomalies
3261
3262 The Keyword Anomalies window is available only if your
3263 grammar has ©keywordª anomalies.
3264
3265 Each entry in the Keyword Anomalies window consists of
3266 two lines. The first line identifies the ©parser stateª
3267 at which the ©keyword anomalyª occurs and the offending
3268 keyword. The second line identifies the ©grammar ruleª
3269 which the keyword may erroneously reduce.
3270
3271 The ©Auxiliary Windowsª menu provides three auxiliary
3272 windows keyed directly to the anomaly to help you
3273 determine the nature of the problem: The ©Keyword
3274 Anomaly Traceª window, the ©Reduction Traceª window, and
3275 the ©Rule Derivationª window. Three other windows provide
3276 supporting information: the ©Reduction Statesª window,
3277 the ©Rule Contextª window and the ©State Definitionª
3278 window.
3279 ##
3280
3281 Keywords
3282
3283 The Keywords entry in the ©Browse Menuª pops up a
3284 window which lists all of the keywords defined in your
3285 ©grammarª. The ©token numberª is also specified.
3286
3287 A Keywords window is also an option in the ©Auxiliary
3288 Windowsª popup menu for any window which distinguishes
3289 various states of your parser. The Keywords window will
3290 show all of the ©keywordªs which will be recognized in
3291 the state selected by the cursor bar in the parent
3292 window.
3293
3294 The ©Auxiliary Windowsª menu for a Keywords window
3295 provides a ©Token Usageª option which will allow you to
3296 see all the uses of a particular keyword in your grammar.
3297 ##
3298
3299 left
3300
3301 "left" controls a ©precedence declarationª, indicating
3302 that all of the listed ©rule elementsª are to be
3303 considered ©left associativeª.
3304 ##
3305
3306 Left Associative
3307
3308 A binary operator is said to be left associative if
3309 an expression with repeated instances of the operator
3310 is to be evaluated from the left. Thus, for example,
3311 x = a/b/c
3312
3313 is normally taken to mean x = (a/b)/c. The division
3314 operator is said to be left associative.
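
The division operator in C is itself left associative, so the effect
is easy to check. The following sketch evaluates a chain of divisions
from the left; the function name divide_left is hypothetical and
serves only as an illustration.

```c
#include <stddef.h>

/* Evaluate operands[0] / operands[1] / ... / operands[n-1] from the
   left. The caller must supply at least one operand. */
double divide_left(const double *operands, size_t n)
{
    double result = operands[0];
    size_t i;

    for (i = 1; i < n; i++)
        result /= operands[i];   /* ((a / b) / c) / ... */
    return result;
}
```

With the operands 24, 4 and 2, evaluation from the left gives
(24/4)/2 = 3, whereas grouping from the right would give
24/(4/2) = 12.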
3315
3316 In ©grammarªs with ©conflictªs, you may use ©precedence
3317 declarationªs to specify that an operator should be left
3318 associative.
3319 ##
3320
3321 Lexeme
3322
3323 The "lexeme" ©attribute statementª is used to fine-tune
3324 the "©disregardª" statement. The lexeme statement takes
3325 the form:
3326 lexeme { T1, T2, ..., Tn }
3327
3328 where T1, ..., Tn is a list of ©nonterminalª tokens
3329 separated by commas. Lexeme statements may be placed in
3330 any ©configuration sectionª, and there may be any number
3331 of them.
3332
3333 When you specify that a ©tokenª is to be disregarded,
3334 AnaGram rewrites your ©grammarª so that the token will be
3335 passed over whenever it occurs at the beginning of a
3336 file or following a lexical unit, or "lexeme". If you
3337 have no lexeme statement, then the lexemes in your
3338 grammar are just the terminal tokens.
3339
3340 The lexeme statement allows you to specify that certain
3341 nonterminal tokens are also to be treated as lexemes.
3342 This means that the disregard token will be skipped
3343 following the lexeme, but not between the characters
3344 that constitute the lexeme.
3345
3346 Lexemes correspond to the tokens that a lexical scanner,
3347 if you were using one, would commonly identify and pass
3348 to a parser as single tokens. You don't usually wish to
3349 disregard ©white spaceª within these tokens. For
3350 example, in a grammar for a conventional programming
3351 language where blank characters are to be disregarded,
3352 you might include:
3353 [
3354 lexeme {string, character constant, name, number}
3355 ]
3356
3357 since blank characters must not be overlooked within
3358 strings and constants, and should not be permitted
3359 within names or numbers.
3360
3361 If your grammar allows for situations where successive
3362 lexemes could run together if they were not separated
3363 by space, a name followed by a number, for example, you
3364 may use the "©distinguish lexemesª" ©configuration
3365 switchª to force a separation between the tokens.
3366
3367 White space may be used explicitly within definitions of
3368 lexeme tokens in your grammar if desired, without
3369 causing conflicts. Thus, if you wish to allow embedded
3370 space in variable names, you might write:
3371 [
3372 disregard space
3373 lexeme {variable name}
3374 ]
3375 space = ' ' + '\t'
3376 letter = 'a-z' + 'A-Z'
3377 digit = '0-9'
3378
3379 variable name
3380 -> letter
3381 -> variable name, letter + digit
3382 -> variable name, space..., letter + digit
3383 ##
3384
3385 line
3386
3387 line is a field in your ©parser control blockª used for
3388 keeping track of the line number of the current
3389 character in your input. Line and column numbers are
3390 tracked only if the ©lines and columnsª ©configuration
3391 switchª has been set.
3392 ##
3393
3394 line length
3395
3396 Line length is an ©obsolete configuration parameterª.
3397 ##
3398
3399 Line Numbers
3400
3401 "Line numbers" is a ©configuration switchª which
3402 defaults to off. If it is on, the ©Build Parserª
3403 command will put "#line" directives into the generated
3404 C code file so that your compiler diagnostics will
3405 refer to lines in the ©syntax fileª rather than in the
3406 generated C code file. For more information on the
3407 "#line" directive, see Kernighan and Ritchie, second
3408 edition, section A12.6.
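
As a brief illustration of what a "#line" directive does, the
following C fragment tells the compiler to treat the line that follows
the directive as line 120 of a file called widget.syn. The file name,
the line number and the two function names are hypothetical; they are
not taken from any generated parser.

```c
#line 120 "widget.syn"
int reported_line(void) { return __LINE__; }
const char *reported_file(void) { return __FILE__; }
```

Here reported_line returns 120 and reported_file returns "widget.syn".
Compiler diagnostics for these lines would use the same file name and
line numbers.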
3409
3410 If the "line numbers" switch is off, AnaGram will put
3411 comments into your parser file to help you find
3412 reduction procedures and embedded C in your syntax
3413 file.
3414
3415 Prior to AnaGram 2.01, if your C or C++ compiler required that the
3416 backslashes in the pathname in the #line directive be doubled, you
3417 would have used AnaGram's ©escape backslashesª switch to make this
3418 happen. Although you may still use ©escape backslashesª, it should no
3419 longer be necessary because AnaGram now puts forward slashes into #line
3420 pathnames instead of backslashes.
3421
3422 If you wish, you may specify the pathname in the #line
3423 directives explicitly by using the ©Line Numbers Pathª
3424 configuration parameter.
3425
3426 You may also wish to change the "©parser file nameª"
3427 parameter to provide a full path name for your parser
3428 file.
3429 ##
3430
3431 Line Numbers Path
3432
3433 "Line Numbers Path" is a ©configuration parameterª
3434 which takes a string value. It defaults to NULL.
3435
3436 When you have set the ©Line Numbersª ©configuration
3437 switchª and Line Numbers Path is not NULL, AnaGram
3438 uses it in the #line directive in place of the full
3439 path name of your ©syntax fileª.
3440
3441 Note that Line Numbers Path should be the complete
3442 pathname for your syntax file.
3443
3444 Line Numbers Path is useful when using AnaGram in cross
3445 platform development. When parsers are to be compiled
3446 and tested on a platform different from that used to run
3447 AnaGram, you may use Line Numbers Path to provide a
3448 pathname on the platform used for compiling and
3449 testing.
3450 ##
3451
3452 Lines and Columns
3453
3454 "Lines and columns" is a ©configuration switchª which
3455 defaults to on. When set, i.e., on, it causes the
3456 ©Build Parserª command to incorporate code into your
3457 parser which will automatically track the line number
3458 and column number of the input token.
3459
3460 You would normally set the "lines and columns" switch
3461 when you are planning to build a parser which will read
3462 an input file and which will need to diagnose ©syntax
3463 errorsª with some precision.
3464
3465 Your parser will store the line and column numbers in
3466 the ©lineª and ©columnª fields respectively in the
3467 ©parser control blockª.
3468
3469 If the input to your parser includes tab characters, you
3470 should either set the ©tab spacingª ©configuration
3471 parameterª appropriately or provide a ©TAB_SPACINGª
3472 macro for your parser.
3473
3474 Your parser will count line and column numbers beginning
3475 with one.
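
The bookkeeping involved can be sketched in a few lines of C. This is
an illustration only, not the code AnaGram generates: the position
structure and the track function are hypothetical, and the rule used
for tab characters (advance to the next tab stop) is an assumption.
The position recorded is that of the next character to be read.

```c
/* Hypothetical position record. A generated parser keeps the
   corresponding values in its parser control block. */
typedef struct {
    int line;
    int column;
} position;

/* Advance pos past the character c. tab_spacing is the distance
   between tab stops. Lines and columns are counted from one. */
void track(position *pos, int c, int tab_spacing)
{
    if (c == '\n') {
        pos->line++;
        pos->column = 1;
    } else if (c == '\t') {
        /* Assumed rule: move to the column after the next tab stop. */
        pos->column += tab_spacing - (pos->column - 1) % tab_spacing;
    } else {
        pos->column++;
    }
}
```

Starting from line 1, column 1, with a tab spacing of 8, the input
"ab" followed by a tab leaves the position at line 1, column 9.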
3476 ##
3477
3478 Main Program
3479
3480 The "main program" ©configuration switchª determines
3481 what AnaGram does if you invoke the ©Build Parserª
3482 command, but have no ©embedded Cª in your ©syntax
3483 fileª. If the switch is on and you have not specified
3484 ©pointer inputª or an ©event drivenª parser, AnaGram
3485 creates a main program which does nothing but call your
3486 ©parserª. The "main program" switch defaults to on.
3487
3488 This feature, along with the default definitions for
3489 ©GET_INPUTª and ©error handlingª, makes it possible
3490 to write a grammar with no ©embedded Cª or ©reduction
3491 procedureªs whatsoever and still get an executable
3492 program which will read input from stdin and parse it
3493 according to your grammar.
3494 ##
3495
3496 Marked Rule
3497
3498 A "marked rule" is a ©grammar ruleª together with a
3499 marked token that indicates how much of the rule has already
3500 been matched. The ©marked tokenª and any tokens following it
3501 indicate the input that should be expected if the
3502 remainder of the rule is to be matched.
3503
3504 When marked rules are displayed in AnaGram windows, the
3505 marked token is represented by a difference in the font. The token may
3506 be in bold face, underlined, italicized, shown with a different point
3507 size, or in a different font altogether. Since AnaGram allows you to
3508 change fonts to suit your own preferences, you should be careful that
3509 the font you choose for the marked tokens allows them to be readily
3510 distinguished from the other tokens in your grammar rules. An
3511 underlined font is often suitable.
3512 ##
3513
3514 Max conflicts
3515
3516 The "max conflicts" ©configuration parameterª limits the
3517 number of ©conflictªs AnaGram will record. Sometimes, a
3518 simple error editing your syntax file can cause hundreds
3519 of conflicts, which you don't need to see in gory
3520 detail. The default value of max conflicts is 50. If you
3521 have a grammar that is in serious trouble and you want
3522 to see more conflicts, you may change max conflicts to
3523 suit your needs.
3524 ##
3525
3526 Missing
3527
3528 The ©warningª message Missing <element 1> in <element 2>
3529 indicates that AnaGram expects to see an instance of
3530 syntactic element 1 at the specified location, internal
3531 to an instance of syntactic element 2. AnaGram cannot
3532 reliably continue parsing its input after an error of
3533 this type. Therefore, it limits further analysis of
3534 your grammar to scanning for syntax errors.
3535 ##
3536
3537 Missing Production
3538
3539 "Missing production, TXXX: <token name>" is a ©warningª
3540 message which indicates that the specified ©tokenª
3541 appears to be defined recursively, but there is no
3542 initial ©productionª to get the recursion started. If
3543 you get this warning, check your ©grammarª closely.
3544 ##
3545
3546 Missing Reduction Procedure
3547
3548 "Missing reduction procedure, RXXX" is a ©warningª
3549 message which appears either when the ©grammar ruleª indicated
3550 specifies a ©parameter assignmentª but does not have a
3551 ©reduction procedureª to use it, or when the rule has no reduction
3552 procedure but the value of the token on the left hand side is used in
3553 as an argument for some other reduction procedure and the ©default reduction valueª
3554 does not have the same type as the token on the left hand side.
3555 In this latter case, a reduction procedure may be needed to effect
3556 correct type conversion.
3557
3558 This warning is
3559 provided in case the lack of a reduction procedure is an
3560 oversight.
3561 ##
3562
3563 Multiple Definitions
3564
3565 "Multiple definitions for TXXX: <token name>" is a
3566 ©warningª message which indicates that the specified
3567 ©tokenª has been defined both as a ©character setª and
3568 as a ©nonterminal tokenª. It cannot be both.
3569 ##
3570
3571 Near Functions
3572
3573 "Near Functions" is a ©configuration switchª that
3574 defaults to off. It controls the use of the "near"
3575 keyword for static functions in your parser. If your
3576 parser is to run on an 80x86 processor you might wish
3577 to turn it on. Your parser will then be slightly
3578 smaller and will run a little faster.
3579
3580 If you are going to run your parser on some other
3581 processor or use a C or C++ compiler that does not
3582 support the "near" keyword you should make sure "near
3583 functions" is set to off.
3584 ##
3585
3586 Negative Character Code in Pointer Mode
3587
3588 This ©warningª message appears if your ©grammarª defines
3589 negative character codes and uses ©pointer inputª. If
3590 your grammar uses the default definition for ©pointer
3591 typeª it will be reading unsigned characters so that
3592 the parser will never see the negative codes that have
3593 been defined. You may correct the problem by providing
3594 your own definition of pointer type.
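
The underlying issue can be seen in plain C. The sketch below reads
the same byte through an unsigned and a signed character pointer. The
function names are hypothetical, and the signed result assumes a
two's complement machine.

```c
/* Read one byte through an unsigned character pointer: 0 to 255. */
int read_unsigned(const void *p) { return *(const unsigned char *)p; }

/* Read the same byte through a signed character pointer: -128 to 127. */
int read_signed(const void *p) { return *(const signed char *)p; }
```

For a byte with all bits set, read_unsigned yields 255 and read_signed
yields -1. A grammar that defines the character code -1 can therefore
match this byte only if the input is read as signed characters.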
3595 ##
3596
3597 Nest Comments
3598
3599 "Nest comments" is a ©configuration switchª which
3600 defaults to off. It controls the treatment of ©commentsª
3601 while scanning your ©syntax fileª. The default follows
3602 the ANSI standard for C, which
3603 disallows ©nested commentsª. Note that AnaGram scans
3604 comments in any ©embedded Cª code as well as in the
3605 grammar specification. You may turn this switch on and
3606 off as many times as necessary in a single file.
3607 ##
3608
3609 Nested Comment
3610
3611 As delivered, AnaGram treats C style ©commentsª
3612 according to the ANSI standard: They do not nest. For
3613 those who prefer nested comments, however, the ©nest
3614 commentsª ©configuration switchª allows them to nest.
3615 ##
3616
3617 Nesting too deep
3618
3619 This ©warningª message indicates that ©set
3620 expressionªs or ©virtual productionsª are
3621 nested so deeply they have exhausted the available
3622 stack space and AnaGram cannot continue its analysis.
3623
3624 Use a ©definitionª statement to name an intermediate
3625 level.
3626 ##
3627
3628 no cr
3629
3630 "no cr" is a ©configuration switchª which
3631 defaults to off. When this switch is set, it will
3632 cause the ©parser fileª and ©header fileª to be
3633 written without carriage returns. This is convenient
3634 if you wish to use the generated parser files in a
3635 Unix environment.
3636 ##
3637
3638 No Grammar Token Specified
3639
3640 This ©warningª message appears if your ©grammarª does not
3641 specify a ©grammar tokenª. Edit your ©syntax fileª to
3642 specify one.
3643 ##
3644
3645 No Productions in Syntax File
3646
3647 This ©warningª message appears if AnaGram did not find
3648 any ©productionsª at all in your ©syntax fileª. Check
3649 to see that you have the right file.
3650 ##
3651
3652 No Such Parameter
3653
3654 This ©warningª message appears when AnaGram does not
3655 recognize the name of a ©configuration parameterª you
3656 have tried to set in your ©syntax fileª. Check the
3657 spelling of the parameter you wish to set in the
3658 ©Configuration Parameters Windowª.
3659 ##
3660
3661 No Terminal Tokens in Expansion
3662
3663 No terminal tokens in expansion of TXXX is a ©warningª
3664 message indicating that there are no terminal tokens
3665 to be found in an expansion of the specified token.
3666 Although there are a few circumstances where this could
3667 be legitimate, it is more likely that there is a missing
3668 rule in the grammar.
3669 ##
3670
3671 Not a Character Set
3672
3673 "Not a character set, TXXX: <token name>" is a ©warningª
3674 message which indicates that the specified ©tokenª has
3675 been used both on the left side of a ©productionª and in
3676 a ©character setª expression defining some other token.
3677 AnaGram will use an empty set in place of the
3678 specified token in evaluating the ©character setª. You
3679 will get another warning, ©Error definingª token, when
3680 AnaGram finishes its evaluation of the character set.
3681 ##
3682
3683 Nothing Reduces
3684
3685 "Nothing reduces TXXX -> RYYY" is a ©warningª message
3686 which indicates that the ©grammarª does not specify any
3687 input to follow an instance of the indicated ©grammar
3688 ruleª. In all probability, the grammar does not have
3689 any explicit end of file, or ©eof tokenª. If the grammar
3690 does not have any conflicts with ©tokenª T000, then an
3691 explicit end of file indicator is not necessary.
3692 Otherwise you should modify your grammar to require an
3693 explicit end of file.
3694 ##
3695
3696 Null Character in String
3697
3698 This ©warningª message appears when AnaGram finds an
3699 explicit null character in a quoted string. If you must
3700 allow for a null in a ©keyword stringª
3701 you will have to rewrite your
3702 ©grammar ruleª. For instance, instead of
3703
3704 widget
3705 -> "abc\0def"
3706
3707 write
3708
3709 widget
3710 -> "abc", 0, "def"
3711 ##
3712
3713 nonassoc
3714
3715 "nonassoc" controls a ©precedence declarationª,
3716 indicating that all of the listed ©rule elementsª are
3717 to be considered non-associative.
3718 ##
3719
3720 Nonterminal Token, Nonterminal
3721
3722 A nonterminal token is one which is constructed from a
3723 series of other tokens as specified by one or more
3724 ©productionªs. Nonterminal tokens are to be
3725 distinguished from ©terminal tokenªs, which are the
3726 basic input units appearing in your input stream.
3727 Terminal tokens most often represent single characters
3728 or a character belonging to a ©character setª such as
3729 'a-z'.
3730 ##
3731
3732 Null Production
3733
3734 A "null production" is one that has no tokens on the
3735 right hand side whatsoever. Null ©productionªs
3736 essentially are identified by the first following input
3737 token. Null productions are extremely convenient
3738 syntactic elements when you wish to make some input
3739 optional. For example, suppose that you wish to allow an
3740 optional semicolon at some point in your grammar. You
3741 could write the following pair of productions:
3742 optional semicolon -> | ';'
3743 Note that a null production can never follow a '|'.
3744
3745 This could also be written on multiple lines thus:
3746 optional semicolon
3747 ->
3748 -> ';'
3749
3750 You can always rewrite your grammar to eliminate null
3751 productions if you wish, but you usually pay a price in
3752 conciseness and clarity. Sometimes, however, it is
3753 necessary to do such a rewrite in order to avoid
3754 ©conflictªs, to which null productions are especially
3755 prone. For example suppose you have the following
3756 production:
3757 foo -> wombat, optional semicolon, widget
3758
3759 You can rewrite this as two productions:
3760 foo
3761 -> wombat, widget
3762 -> wombat, ';', widget
3763
3764 This rewrite specifies exactly the same input language,
3765 but is less prone to conflicts. On the other hand, it
3766 may require significantly more table space in your
3767 parser.
3768
3769 If you have a null production with no ©reduction
3770 procedureª specified, your parser will automatically
3771 assign the value zero to the ©reduction tokenª.
3772
3773 Null productions can also be generated by ©virtual
3774 productionsª.
3775
3776 A token that has a null production is a "©zero lengthª"
3777 token.
3778 ##
3779
3780 Old Style
3781
3782 "Old Style" is a ©configuration switchª which defaults
3783 to off. It controls the function definitions in the code
3784 AnaGram generates. When "old style" is off, it generates
3785 ANSI style calling sequences with prototypes as
3786 necessary. When "old style" is on, it generates old
3787 style function definitions.
3788 ##
3789
3790 Output Files
3791
3792 When you use the ©Build Parserª command to request
3793 output from AnaGram, it creates two files: a ©parser
3794 fileª and a ©parser headerª file.
3795 ##
3796
3797 Page Length
3798
3799 "Page length" is an ©obsolete configuration parameterª.
3800 ##
3801
3802 Obsolete Configuration Parameter, Obsolete Configuration Switch
3803
3804 A number of ©configuration parameterªs and ©configuration switchªes
3805 which were used in the DOS version of AnaGram are no longer
3806 used, but are still recognized for the sake of upward
3807 compatibility. These parameters include:
3808  ©bottom marginª
3809  ©line lengthª
3810  ©page lengthª
3811  ©top marginª
3812  ©quick referenceª
3813  ©video modeª
3814
3815 ##
3816
3817 Parameter
3818
3819 "Parameter <name> has type void" is a ©warningª message
3820 which appears when a ©parameter assignmentª is attached
3821 to a ©tokenª that has been defined to have the void
3822 ©data typeª.
3823 ##
3824
3825 Parameter Assignment
3826
3827 In any ©grammar ruleª, the ©semantic valueª of any
3828 ©rule elementª may be passed to a ©reduction procedureª
3829 by means of a parameter assignment. Simply follow the
3830 rule element with a colon and a C variable name. The C
3831 variable name can then be used in the reduction
3832 procedure to reference the semantic value of the token
3833 it is attached to. AnaGram will automatically provide
3834 necessary declarations.
3835
3836 Here are some examples of rule elements with parameter
3837 assignments:
3838
3839 '0-9':d
3840 integer:n
3841 expression:x
3842 declaration : declaration_descriptor
3843
3844 ##
3845
3846 Parameter Not Defined
3847
3848 AnaGram does not have a ©configuration parameterª
3849 with the specified name. Please check the spelling.
3850 ##
3851
3852 Parameter Takes Integer Value
3853 The specified ©configuration parameterª takes
3854 an integer value only.
3855 ##
3856
3857
3858 Parameter Takes String Value
3859
3860 The specified ©configuration parameterª takes
3861 a string value only.
3862 ##
3863
3864 Parse Function
3865
3866 To run your parser, you call the parse function.
3867 The name of the parse function is given by
3868 the ©parser nameª ©configuration parameterª and defaults to the
3869 name of your parser file.
3870
3871 If your parser uses ©pointer inputª, you should set the ©pointerª
3872 field of the ©parser control blockª before calling the parse
3873 function.
3874
3875 If your parser is ©event drivenª, you should first call the
3876 ©initializerª, and then you should call the parse function
3877 for each input token.
3878
3879 If the ©reentrant parserª switch is set, the parse function takes
3880 a pointer to the ©parser control blockª as its sole argument. Otherwise
3881 it takes no arguments. The parse function returns no value. All
3882 communication is by means of the ©parser control blockª.
3883
3884 To retrieve the value of the ©grammar tokenª, once the parse is complete,
3885 use the ©parser value functionª.
3886 ##
3887
3888 Parser
3889
3890 A parser is a program or, more commonly, a procedure within
3891 a program, which scans a sequence of ©input charactersª
3892 or input tokens and accumulates them in an input
3893 buffer or stack as determined by a set of ©productionªs
3894 which constitute a ©grammarª.
3895
3896 When the parser discovers
3897 a sequence of tokens as defined by a ©grammar ruleª, or
3898 right hand side of a production, it "reduces" the
3899 sequence to a single ©reduction tokenª as defined by the
3900 left hand side of the grammar rule. This ©nonterminal
3901 tokenª now replaces the tokens which matched the grammar
3902 rule and the search for matches continues.
3903
3904 If an input
3905 token is encountered which will not yield a match for
3906 any rule, it is considered a ©syntax errorª and some
3907 kind of ©error recoveryª may be required to continue. If
3908 a match, or ©reduce actionª, yields the ©grammar tokenª,
3909 sometimes called the ©goal tokenª or ©start tokenª, the
3910 parser deems its work complete and returns to whatever
3911 procedure may have called it.
3912
3913 The ©Grammar Traceª and ©File Traceª functions in
3914 AnaGram provide a convenient means for understanding the
3915 detailed operation of a syntax directed parser.
3916
3917 ©Tokensª may have ©semantic valuesª. If the ©input
3918 valuesª ©configuration switchª is on, your parser will
3919 expect semantic values to be provided by the input
3920 process along with the token identification code. If the
3921 input values switch is off, your parser will take the
3922 ASCII value of the input character, that is, the actual
3923 input code, as the value of the character.
3924
3925 When the
3926 parser reduces a production, it can call a ©reduction
3927 procedureª or ©semantic actionª to analyze the values of
3928 the constituent tokens. This reduction procedure can
3929 then return a value which characterizes the reduced
3930 token.
3931 ##
3932
3933 Parser Control Block
3934
3935 A "Parser Control Block" is a structure which contains
3936 all of the data necessary to describe the instantaneous
3937 state of a parser. The typedef statement which defines
3938 the structure is included in the ©parser headerª file
3939 for your parser. AnaGram creates the name of the data
3940 type for the structure by appending "_pcb_type" to the
3941 ©parser nameª.
3942
3943 You may add your own declarations to the parser control
3944 block by using the ©extend pcbª statement.
3945
3946 If the ©declare pcbª ©configuration switchª is on, its
3947 normal state, AnaGram will declare a parser control
3948 block for you at the beginning of your parser file.
3949 AnaGram will determine the name of the parser control
3950 block by appending "_pcb" to the ©parser nameª. AnaGram
3951 will also define the macro PCB as a shorthand notation
3952 for use within the parser. All references to the parser
3953 control block within the code that AnaGram generates
3954 are made using the PCB macro.
3955
3956 If you wish to declare your own parser control block,
3957 you must include the ©parser headerª file for your
3958 parser before your declaration. Then you declare a
3959 control block and define PCB to refer to the control
3960 block you have declared.
3961
3962 Suppose your grammar is called widget. You would then
3963 write the following statements in your ©embedded Cª:
3964 #include "widget.h"
3965 widget_pcb_type widget_control_pcb;
3966 #define PCB widget_control_pcb
3967
3968 Alternatively, you could write the following:
3969 #include "widget.h"
3970 widget_pcb_type *widget_control_pcb_pointer;
3971 #define PCB (*widget_control_pcb_pointer)
3972
3973 and then allocate storage for the structure when
3974 necessary.
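
A minimal sketch of the allocation step follows. It is not generated
code: the two-field typedef is only a stand-in for the definition that
widget.h would supply, and the functions allocate_pcb and release_pcb
are hypothetical names chosen for this illustration.

```c
#include <stdlib.h>

/* Stand-in for the typedef supplied by the generated widget.h.
   The real structure has many more fields. */
typedef struct {
    int line;
    int column;
} widget_pcb_type;

widget_pcb_type *widget_control_pcb_pointer;
#define PCB (*widget_control_pcb_pointer)

/* Allocate zeroed storage for the control block.
   Returns nonzero on success, zero on failure. */
int allocate_pcb(void)
{
    widget_control_pcb_pointer =
        calloc(1, sizeof *widget_control_pcb_pointer);
    return widget_control_pcb_pointer != NULL;
}

/* Release the control block when the parser is no longer needed. */
void release_pcb(void)
{
    free(widget_control_pcb_pointer);
    widget_control_pcb_pointer = NULL;
}
```

Once allocate_pcb has succeeded, an expression such as PCB.line refers
to a field of the allocated structure.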
3975
3976 Some fields of interest in the parser control block are
3977 as follows:
3978 ©input_codeª
3979 ©input_valueª
3980 ©input_contextª
3981 ©pointerª
3982 ©token_numberª
3983 ©reduction_tokenª
3984 ©ssxª
3985 ©snª
3986 ©ssª[©parser stack sizeª]
3987 ©vsª[parser stack size]
3988 ©csª[parser stack size]
3989 ©lineª
3990 ©columnª
3991 *©error_messageª
3992 ©error_frame_ssxª
3993 ©error_frame_tokenª
3994 ##
3995
3996 PCB
3997
3998 "PCB" is a macro AnaGram defines for use in the code it
3999 generates to refer to the ©parser control blockª for
4000 your ©parserª. Normally, AnaGram automatically declares
4001 storage for a parser control block and defines PCB for
4002 you. If you turn off the ©declare PCBª switch, you may
4003 define PCB yourself.
4004 ##
4005
4006 PCB_TYPE
4007
4008 If you are writing your parser in C++, you may prefer to derive
4009 a class from the ©parser control blockª rather than use the
4010 ©extend pcbª statement. In this case you may define the
4011 PCB_TYPE macro in your syntax file to specify your derived
4012 class.
4013
4014 For instance, suppose you have defined
4015
4016 class MyPcb : public parser_pcb_type {...};
4017
4018 You would then add the following line:
4019
4020 #define PCB_TYPE MyPcb
4021
4022 If you do not define PCB_TYPE, AnaGram will define it as the
4023 type of your parser control block.
4024 ##
4025
4026 Parser File
4027
4028 The "parser file" is the C (or C++) file output by AnaGram when
4029 you execute the ©Build Parserª command. It contains all
4030 of the ©embedded Cª from your ©syntax fileª, all of the
4031 ©reduction procedureªs defined in your ©grammarª,
4032 syntax tables which represent, in a condensed form, all
4033 of the intricacies of your grammar, and a customized
4034 ©parsing engineª. The name of the parser file is given
4035 by the ©parser file nameª ©configuration parameterª. The
4036 name of the ©parserª itself is given by the ©parser
4037 nameª configuration parameter.
4038
4039 If you wish the parser file to be written without carriage
4040 returns, suitable for a Unix environment, set the ©no crª
4041 configuration switch.
4042 ##
4043
4044 Parser File Name
4045
4046 "Parser file name" is a ©configuration parameterª which
4047 takes a string value. The default value is "#.c".
4048 AnaGram uses this parameter to generate the name of the
4049 output C file, or ©parser fileª, created by the ©Build
4050 Parserª command. The '#' character is used in this
4051 string as a wild card to indicate the name of the
4052 current ©syntax fileª. If the first character of the
4053 parser file name string is a '.' character, AnaGram
4054 will substitute the name of the current working
4055 directory for the dot. Thus ".\\#.c" will create the
4056 file name as a complete path. This can sometimes be
4057 important when using the ©line numbersª switch to
4058 enable a debugger to find code in your parser file.
4059
4060 Note that the parser file name is not the same as the
4061 ©parser nameª.
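
The '#' substitution described above can be pictured with a small C
helper. This is only a sketch, not AnaGram's own code: the function
name expand_file_name and its error convention are assumptions made
for illustration, and the leading '.' (working directory) rule is
not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not AnaGram source: expand each '#' in a file-name
 * pattern to the syntax file's base name, e.g. "#.c" + "calc" -> "calc.c".
 * Returns 0 on success, -1 if the result does not fit in out. */
int expand_file_name(const char *pattern, const char *base,
                     char *out, size_t out_size)
{
    size_t used = 0;
    size_t base_len = strlen(base);

    assert(out != NULL && out_size > 0);
    for (; *pattern != '\0'; pattern++) {
        if (*pattern == '#') {
            /* Leave room for the terminating NUL. */
            if (used + base_len >= out_size) return -1;
            memcpy(out + used, base, base_len);
            used += base_len;
        } else {
            if (used + 1 >= out_size) return -1;
            out[used++] = *pattern;
        }
    }
    out[used] = '\0';
    return 0;
}
```

With a syntax file named calc, the default pattern "#.c" expands to
"calc.c" under this sketch.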
4062 ##
4063
4064 Parser Generator
4065
4066 A parser generator, such as AnaGram, is a program that
4067 converts a ©grammarª, a rule-based description of the
4068 input to a program, into a conventional, procedural
4069 module called a ©parserª. The parsers AnaGram generates
4070 are simple C modules which can be compiled on almost
4071 any platform. AnaGram parsers are also compatible with
4072 C++.
4073 ##
4074
4075 Header File, Parser Header
4076
4077 When you use the command ©Build Parserª to generate
4078 source code for a parser, AnaGram creates two files, a
4079 header file and a C source file. Unless different
4080 paths are specified in the ©parser file nameª and
4081 ©header file nameª parameters, both files will be
4082 written to the directory that contains the ©syntax fileª.
4083
4084 The header file contains a number of typedef statements,
4085 including the definition of the ©parser control blockª,
4086 and a number of macro
4087 definitions which may be useful in your parser
4088 or in other modules of your program.
4089
4090 If you do not alter
4091 the ©header file nameª parameter, the
4092 name of the header file will be the same as the name of
4093 your ©syntax fileª and it will have the extension ".h".
4094
4095 If you wish the header file to be written without carriage
4096 returns, suitable for a Unix environment, set the ©no crª
4097 configuration switch.
4098 ##
4099
4100 Parser Input
4101
4102 AnaGram ©parserªs may be configured to accept input in any of
4103 three different ways:
4104
4105  By default, a ©parse functionª gets its input by invoking the
4106 ©GET_INPUTª macro each time it is ready for another input token. The
4107 default implementation of GET_INPUT reads ©input characterªs from stdin. For
4108 most practical problems, you will want to override this definition of
4109 GET_INPUT, storing the current input character in PCB.input_code.
4110
4111  Alternatively, you may configure a parser to read input from an
4112 array in memory. Set the ©pointer inputª switch and load the
4113 ©pointerª field of the parser control block before calling the
4114 parse function. The parser will then run, incrementing the
4115 pointer, until it finishes or encounters an error.
4116
4117  The third alternative is to set the ©event drivenª switch. The
4118 parser will be configured as a callback routine. Begin by calling
4119 the ©initializerª. Then, for each input character, store the
4120 character in the ©input_codeª field of the parser control block and
4121 call the parse function. Each time
4122 you call the parse function it will continue until it needs more
4123 input. You can check its status by inspecting the ©exit_flagª in the
4124 parser control block.
4125
4126 The input to your parser may be either text characters or ©tokensª
4127 accumulated by a pre-processor, or ©lexical scannerª. The latter
4128 case is referred to as ©token inputª. If you use a lexical scanner,
4129 you may find it convenient to configure your parser as event driven.
4130
4131 Although lexical scanners are often not necessary
4132 when you use AnaGram, if you do need one you can write it in AnaGram.
4133 ##
4134
4135 Parser Name
4136
4137 "Parser Name" is a ©configuration parameterª which
4138 defaults to "#", where "#" represents the name of your
4139 ©syntax fileª. AnaGram uses this parameter to name your
4140 ©parse functionª. The ©initializerª for your parser will have the
4141 same name preceded by "init_". Note that "©parser file
4142 nameª" is not the same configuration parameter as "parser
4143 name".
4144 ##
4145
4146 Parser Stack
4147
4148 Your ©parserª uses a "parser stack" to keep track of the
4149 ©grammar rulesª it is trying to match and its progress
4150 in matching them. Normally, there are two separate
4151 stacks defined by AnaGram: ©PCBª.©ssª, the ©parser state
4152 stackª which maintains ©parser stateª numbers, and
4153 PCB.©vsª, the ©parser value stackª which maintains the
4154 ©semantic valueªs of tokens that have been identified so
4155 far. If you wish to maintain a stack tracking other
4156 variables you may set the ©context typeª ©configuration
4157 parameterª, and AnaGram will define a third stack,
4158 PCB.©csª. All are indexed by the same stack index,
4159 PCB.©ssxª.
4160
4161 To see how tokens accumulate on the parser stack, run
4162 the ©Grammar Traceª or the ©File Traceª.
4163
4164 Normally, when the return value of a ©reduction procedureª
4165 is stored on the parser value stack, it is stored by
4166 simply coercing the stack pointer to the correct type.
4167 If the return value is a C++ object, this can cause
4168 serious problems. These problems can be avoided by
4169 using the ©wrapperª statement.
4170 ##
4171
4172 Parser stack alignment
4173
4174 Parser stack alignment is a ©configuration parameterª whose
4175 value is a C or C++ data type. It defaults to "long". If
4176 any tokens have type "double", it will be automatically set
4177 to double. Thus, you will normally not need to change this
4178 parameter if your parser is to run on a PC or compatible
4179 processor. It provides alignment control for processors
4180 which restrict the addresses used for multibyte data access. The
4181 default setting provides for correct operation on 64 bit
4182 processors.
4183
4184 To control byte alignment of the parser stack,
4185 ©PCBª.©vsª, AnaGram normally adds a field of the
4186 specified data type to the "union" statement which
4187 defines the data type for the ©parser stackª. This
4188 parameter can be used to deal with byte alignment
4189 problems when a ©parserª is to be run on a processor
4190 with byte alignment restrictions. For instance, if your
4191 ©grammarª has ©tokenªs of type "long double" and your
4192 processor requires long double variables to be
4193 properly aligned, you can include the following
4194 statement in a ©configuration sectionª in your grammar
4195 or in your ©configuration fileª:
4196
4197 parser stack alignment = long double
4198
4199 If the data type specified is "void", no alignment declaration
4200 will be made.
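
The effect of the alignment field can be illustrated with a pair of
C unions. This is a sketch under assumptions: the type names
calc_vs_type and calc_vs_unaligned, the member names, and the token
types are invented for the example, and the layout AnaGram actually
generates may differ. It uses the C11 _Alignof operator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical value-stack union for a parser named "calc" with token
 * types int and char *. The first member exists only to force alignment. */
typedef union {
    long double alignment;  /* from: parser stack alignment = long double */
    int ag_int;
    char *ag_text;
} calc_vs_type;

/* The same union without the alignment member, for comparison. */
typedef union {
    int ag_int;
    char *ag_text;
} calc_vs_unaligned;

/* Startup sanity check: every entry of a calc_vs_type array starts at
 * an address suitable for storing a long double. */
void check_value_stack_alignment(void)
{
    assert(_Alignof(calc_vs_type) >= _Alignof(long double));
    assert(sizeof(calc_vs_type) % _Alignof(long double) == 0);
}
```

Because a union is aligned at least as strictly as its most demanding
member, adding the extra member is enough to align every stack entry.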
4201 ##
4202
4203 Parser Stack Index, Stack Index
4204
4205 The parser stack index, ©PCBª.©ssxª, tracks the depth
4206 of the ©parser state stackª, the ©parser value stackª,
4207 and the ©context stackª if you defined one. The parser
4208 stack index is incremented by ©shift actionsª and
4209 reduced by ©reduce actionsª.
4210 ##
4211
4212 Parser Stack Overflow
4213
4214 Your ©parserª uses a ©parser stackª to keep track of the
4215 ©grammar rulesª it is trying to match and its progress
4216 in matching them. If your grammar has any ©recursive
4217 ruleªs that are not strictly left recursive, then no
4218 matter how big you make the parser stack, it will be
4219 possible to create a syntactically correct input which
4220 will cause the stack to overflow. As a practical matter,
4221 however, it is usually possible to set the ©parser stack
4222 sizeª to a value large enough so that an overflow is a
4223 freak occurrence. Nevertheless, it is necessary to check
4224 for overflow, and in the case overflow should occur,
4225 your parser has to do something. What it does is invoke
4226 the ©PARSER_STACK_OVERFLOWª macro. If you don't define
4227 it, AnaGram will define it for you, although not
4228 necessarily to your taste.
4229 ##
4230
4231 Recursive rule, Recursion
4232
4233 A ©grammar ruleª is said to be "recursive" if the ©tokenª on the left side
4234 of the rule also appears on the right side of the rule, or
4235 in an ©expansion ruleª of any token on the right side of the rule.
4236
4237 If the token on the left side is the
4238 first token on the right side, the rule is said to be "left recursive".
4239 If it is the last token on the right side, the rule is said to be
4240 "right recursive". Otherwise, the rule is "center recursive".
4241
4242 For example:
4243 statement list
4244 -> statement
4245 -> statement list, statement // left recursive
4246
4247 fraction part
4248 -> digit
4249 -> digit, fraction part // right recursive
4250
4251 expression
4252 -> factor
4253 -> expression, '+' + '-', factor
4254
4255 factor
4256 -> primary
4257 -> factor, '*' + '/', primary
4258
4259 primary
4260 -> number
4261 -> name
4262 -> '(', expression, ')' // center recursive
4263
4264 Note that if all the tokens in the rule other than the recursive token itself
4265 are ©zero lengthª tokens, it is possible for the
4266 rule to be matched arbitrarily many times without any input whatsoever. In
4267 other words, such a rule creates an infinite loop in the parser. AnaGram can
4268 detect this condition and issues an ©empty recursionª diagnostic if it occurs.
4269
4270 ##
4271
4272 PARSER_STACK_OVERFLOW
4273
4274 PARSER_STACK_OVERFLOW is a user definable macro. If you
4275 do not define it, AnaGram will define it so that it
4276 will print a message on stderr and abort the ©parserª in
4277 case of a ©parser stack overflowª.
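
A user definition of the macro might look like the following C
sketch. The stack, the push_state function and the overflow_count
variable are toy stand-ins invented for this example; only the idea
of supplying your own PARSER_STACK_OVERFLOW comes from the text
above. Here the macro records the problem instead of printing to
stderr.

```c
#include <assert.h>

#define STACK_SIZE 4           /* tiny on purpose; stands in for parser stack size */

static int overflow_count;     /* hypothetical application-level error counter */

/* User-supplied definition: count the overflow and let the caller decide. */
#define PARSER_STACK_OVERFLOW  (overflow_count++)

static int ss[STACK_SIZE];     /* toy state stack */
static int ssx;                /* toy stack index */

/* Toy push, standing in for the check a parsing engine would make.
 * Returns 1 if the state was pushed, 0 on overflow. */
int push_state(int state)
{
    assert(ssx >= 0);
    if (ssx >= STACK_SIZE) {
        PARSER_STACK_OVERFLOW;
        return 0;
    }
    ss[ssx++] = state;
    return 1;
}
```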
4278 ##
4279
4280 Parser Stack Size
4281
4282 "Parser stack size" is a ©configuration parameterª with
4283 a default value of 128. It is used to define the sizes
4284 of your ©parser stacksª in your ©parser control blockª.
4285 When analyzing your grammar, AnaGram will determine the
4286 minimum amount of stack space required for the deepest
4287 left ©recursionª. To this depth it will add one half the
4288 value of the parser stack size parameter. It will then
4289 set the actual stack size to the larger of this value
4290 and the parser stack size parameter.
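
The sizing rule amounts to a one-line computation. The following C
sketch restates it; the function name actual_stack_size is an
assumption, and integer division is assumed for "one half".

```c
#include <assert.h>

/* Stack size rule as described above (a sketch, not AnaGram source):
 * the larger of the parameter and (minimum depth + parameter / 2). */
int actual_stack_size(int parameter, int min_depth)
{
    int padded;

    assert(parameter > 0 && min_depth >= 0);
    padded = min_depth + parameter / 2;
    return padded > parameter ? padded : parameter;
}
```

With the default parameter of 128, a grammar needing a minimum depth
of 10 keeps a stack of 128 entries, while one needing a depth of 100
gets 100 + 64 = 164 entries.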
4291 ##
4292
4293 Parser State, State Number
4294
4295 The essential part of your ©parserª is a group of tables
4296 which describe in detail what to do for each "state" of
4297 the parser.
4298
4299 The states of a parser are determined by sets of
4300 "©characteristic rulesª". The ©State Definition Tableª
4301 shows the characteristic rules for each state of your
4302 parser.
4303
4304 AnaGram numbers the states of a parser as it identifies
4305 them, beginning with zero. In all windows, state numbers
4306 are displayed as three digit numbers prefixed with the
4307 letter 'S'.
4308 ##
4309
4310 Parser State Stack, State Stack
4311
4312 The parser state stack is a stack maintained by your
4313 ©parserª as an integral part of the parsing
4314 process. At any point in the parse of your input
4315 stream, the parser state stack provides a summary of
4316 what has been found so far. The parser state stack is
4317 stored in ©PCBª.©ssª and is indexed by PCB.©ssxª, the
4318 ©parser stack indexª.
4319 ##
4320
4321 Parser Value Stack, Value Stack
4322
4323 In parallel with the ©parser state stackª, your parser
4324 maintains a "value stack", ©PCBª.©vsª, each entry of
4325 which corresponds to the ©semantic valueª of the token
4326 identified at that state. Since the semantic values of
4327 different tokens might well have different ©data typeªs,
4328 AnaGram gives you the opportunity, in your ©syntax
4329 fileª, to define the data type for any token. AnaGram
4330 then builds a typedef statement creating a data type
4331 which is a union of all the types you have defined.
4332 AnaGram creates the name for this ©data typeª by
4333 appending "_vs_type" to the ©parser nameª. AnaGram uses
4334 this data type to define the value stack.
4335 ##
4336
4337 Parser Action
4338
4339 In a traditional LR parser, there are only four actions: the ©shift
4340 actionª, the ©reduce actionª, the ©accept actionª and the ©error
4341 actionª. AnaGram, in doing its ©grammar analysisª, identifies a
4342 number of special cases, and creates a number of extra actions which
4343 make for faster processing, but which can be represented as
4344 combinations of these primitive actions.
4345
4346 When a shift action is performed, the current state
4347 number is pushed onto the ©parser state stackª and the
4348 new state number is determined by the current state
4349 number and the current input token. Different tokens
4350 cause different new states.
4351
4352 When a reduce action is performed, the length of the
4353 rule being reduced is subtracted from the ©parser stack
4354 indexª and the new state number is read from the top of
4355 the parser state stack. The ©reduction tokenª for the
4356 rule being reduced is then used as an input token.
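
The stack bookkeeping for the shift and reduce actions can be
sketched in C. The toy_engine structure and both function names are
assumptions made for illustration; a generated parsing engine also
maintains the value stack and consults its syntax tables, which this
sketch leaves out.

```c
#include <assert.h>

#define STACK_SIZE 16

typedef struct {
    int ss[STACK_SIZE];  /* state stack */
    int ssx;             /* stack index */
    int state;           /* current state number */
} toy_engine;

/* Shift: save the current state, then move to the state the tables
 * select for the current input token (passed in by the caller here). */
void shift(toy_engine *e, int new_state)
{
    assert(e->ssx < STACK_SIZE);
    e->ss[e->ssx++] = e->state;
    e->state = new_state;
}

/* Reduce: drop rule_length entries and resume from the state exposed at
 * the top of the stack. A rule of length zero leaves the state unchanged.
 * The caller then treats the reduction token as input, typically with
 * another shift. */
void reduce(toy_engine *e, int rule_length)
{
    assert(rule_length >= 0 && rule_length <= e->ssx);
    if (rule_length > 0) {
        e->ssx -= rule_length;
        e->state = e->ss[e->ssx];
    }
}
```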
4357 ##
4358
4359 Parsing Engine
4360
4361 A parser consists of three basic components: A set of
4362 syntax tables, a set of ©reduction procedureªs and a
4363 parsing engine. The parsing engine is the body of code
4364 that interprets the parsing table, invokes input
4365 functions, and calls the reduction procedures. The
4366 ©Build Parserª command configures a parsing engine
4367 according to the implicit requirements of the syntax
4368 specification and according to the explicit values of
4369 the ©configuration parameterªs.
4370
4371 The parsing engine itself is a simple automaton,
4372 characterized by a set of states and a set of inputs.
4373 The inputs are the tokens of your grammar. Each state
4374 is represented by a list of tokens which are admissible
4375 in that state and for each token a ©parser actionª to perform
4376 and a parameter which further defines the action.
4377
4378 Each state in the grammar, with the exception of state
4379 zero, has a ©characteristic tokenª which must have been
4380 recognized in order to jump to that state. Therefore,
4381 the ©parser state stackª, which is essentially a list
4382 of state numbers, can also be thought of as a list of
4383 token numbers. This is the list of tokens that have
4384 been seen so far in the parse of your input stream.
4385 ##
4386
4387 Partition
4388
4389 If you use ©character setsª in your grammar, AnaGram
4390 will compute a "partition" of the ©character universeª.
4391 This partition is a collection of non-overlapping
4392 character sets such that every one of the sets you have
4393 defined can be written as a ©unionª of partition sets.
4394
4395 Each partition set is assigned a unique ©tokenª. If one
4396 of your character sets requires more than one partition
4397 set to represent it, AnaGram will create appropriate
4398 ©productionªs and add them to your grammar so your parser
4399 can make the necessary distinctions.
4400
4401 To see how AnaGram has partitioned the character
4402 universe, you may inspect the ©Partition Setsª window
4403 found in the ©Browse Menuª.
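
One way to compute such a partition is to group characters by the
combination of sets that contain them. The C sketch below does this
for a 256 character universe. It illustrates the idea only and is
not AnaGram's algorithm; the function name, the table layout and the
limit of 32 sets are assumptions.

```c
#include <assert.h>

#define UNIVERSE 256
#define MAX_SETS 32

/* Sketch: partition characters by which of the given sets contain them.
 * sets[i][c] is nonzero if set i contains character c. On return,
 * part[c] is the partition set number for c, or -1 if c belongs to none
 * of the sets. Returns the number of partition sets found. */
int partition(unsigned char sets[][UNIVERSE], int n_sets, int part[UNIVERSE])
{
    unsigned long seen[UNIVERSE];   /* distinct signatures found so far */
    int n_parts = 0;
    int c, i, p;

    assert(n_sets >= 0 && n_sets <= MAX_SETS);
    for (c = 0; c < UNIVERSE; c++) {
        unsigned long sig = 0;      /* bit i set: character c is in set i */

        for (i = 0; i < n_sets; i++)
            if (sets[i][c]) sig |= 1UL << i;
        part[c] = -1;
        if (sig == 0) continue;
        /* Look for an existing partition set with the same signature. */
        for (p = 0; p < n_parts && seen[p] != sig; p++)
            ;
        if (p == n_parts) seen[n_parts++] = sig;
        part[c] = p;
    }
    return n_parts;
}
```

For example, given the sets 'a-z' and '0-9' + 'a-f', this yields
three partition sets: the digits, the letters a through f, and the
letters g through z. The first character set is then the union of
the last two partition sets.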
4404 ##
4405
4406 Partition Set Number
4407
4408 Each ©partitionª set is identified by a unique
4409 reference number called the partition set number.
4410 Partition set numbers are displayed in the form Pxxx.
4411 Partition sets are numbered starting with zero, so the
4412 first set is P000.
4413
4414 To see the elements of a given partition set, call up
4415 the ©Partition Setsª window from the ©Browse Menuª,
4416 then, after selecting a partition set, call up the ©Set
4417 Elementsª window from the ©Auxiliary Windowsª popup menu.
4418 ##
4419
4420 Partition Sets
4421
4422 The Partition Sets option in the ©Browse Menuª pops up
4423 a window which shows the complete ©partitionª of the
4424 ©character universeª for your parser.
4425
4426 The Partition Sets option in the ©Auxiliary Windowsª popup menu
4427 for the ©Character Setsª window lets you see the
4428 partition sets which cover the specified character set.
4429
4430 Each entry in a Partition Sets window identifies a
4431 token number and a ©partition set numberª. The ©Auxiliary
4432 Windowsª menu provides a ©Set Elementsª entry which
4433 enables you to see precisely which characters belong to
4434 the partition set. It also has a Token Usage entry to show you
4435 what rules the set is used in.
4436 ##
4437
4438 PCONTEXT
4439
4440 PCONTEXT is an alternate form of the ©CONTEXTª macro
4441 which takes an explicit argument to specify the
4442 ©parser control blockª. PCONTEXT is defined in the ©parser
4443 headerª file.
4444 ##
4445
4446 PERROR_CONTEXT
4447
4448 PERROR_CONTEXT is an alternative form of the
4449 ©ERROR_CONTEXTª macro. It differs only in that it takes
4450 an argument so you can specify the appropriate
4451 ©parser control blockª explicitly. PERROR_CONTEXT is defined in
4452 the ©parser headerª file.
4453 ##
4454
4455 pointer
4456
4457 "pointer" is a field which will be included in the
4458 ©parser control blockª for your parser if you have set
4459 the ©pointer inputª ©configuration switchª. Your main
4460 program should set PCB.pointer before it calls your
4461 parser. Thereafter, your parser will increment it
4462 appropriately. When you are executing a ©reduction
4463 procedureª or a ©SYNTAX_ERRORª macro PCB.pointer will
4464 always point to the next input character to be read.
4465 ##
4466
4467 Pointer input
4468
4469 "Pointer input" is a ©configuration switchª which you
4470 may set to control ©parser inputª. It defaults to off. When you set
4471 pointer input, you tell AnaGram that the input to your parser is in
4472 memory and can be scanned simply by incrementing a pointer. Before
4473 calling your parser you should make sure that ©PCBª.©pointerª is
4474 properly initialized to point to the first character or token in your
4475 input.
4476
4477 Use the ©configuration parameterª "©pointer typeª" to
4478 specify the type of the pointer. The default value of
4479 "pointer type" is "unsigned char *".
4480
4481 There is no particular reason why pointer type should
4482 be limited to variants on char. It could define a
4483 pointer to int or a structure just as well.
4484
4485 If you use pointer input with structures or C++
4486 classes, you should set the ©input valuesª switch and
4487 define an ©INPUT_CODEª(t) macro.
4488
4489 If you are using a 16 bit compiler and your input array
4490 is so large that you need "huge"
4491 pointers, make sure that "pointer type" is properly
4492 defined.
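
Pointer input over an array of structures might be set up along the
lines of the following C sketch. The input_token structure, the
convention that code 0 ends the input, and the scan_tokens loop are
all assumptions for illustration; INPUT_CODE(t) is assumed here to
map an input value to its token code, and the loop merely stands in
for what a generated parser does with PCB.pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical input structure produced by a lexical scanner. */
typedef struct {
    int code;     /* token identification code; 0 marks end of input here */
    long value;   /* semantic value carried along with the token */
} input_token;

/* Assumed role of the macro: extract the token code from an input value. */
#define INPUT_CODE(t) ((t).code)

/* Toy stand-in for the parser's input loop: walk the array by
 * incrementing the pointer and sum the values seen. */
long scan_tokens(const input_token *pointer)
{
    long total = 0;

    assert(pointer != NULL);
    while (INPUT_CODE(*pointer) != 0) {
        total += pointer->value;
        pointer++;
    }
    return total;
}
```

In this arrangement the "pointer type" parameter would name a
pointer to the structure, for example "input_token *".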
4493 ##
4494
4495 Pointer Type
4496
4497 "Pointer Type" is a ©configuration parameterª which
4498 defaults to "unsigned char *". When you have specified
4499 ©pointer inputª, AnaGram uses the value of pointer type
4500 to declare a pointer field in your ©parser control
4501 blockª.
4502 ##
4503
4504 Precedence, Operator Precedence
4505
4506 In expressions of the form a+b*c, the convention is to
4507 perform the multiplication before the addition.
4508 Multiplication is said to take precedence over
4509 addition. In general the rank order in which operations
4510 are to be performed if there are no parentheses forcing
4511 an order of computation is called the precedence of the
4512 operators.
4513
4514 If you have an ambiguous ©grammarª, that is, a grammar
4515 with a number of ©conflictªs, you may use ©precedence
4516 declarationªs to resolve the conflicts and to set
4517 operator precedence.
4518 ##
4519
4520 Precedence Declaration
4521
4522 Precedence declarations are ©attribute statementsª which
4523 may be used to resolve ©conflictªs in your grammar by
4524 assigning precedence and associativity to operators.
4525 Precedence declarations must be made inside
4526 ©configuration sectionsª. Each declaration consists of
4527 the keyword ©leftª, ©rightª, or ©nonassocª followed by a
4528 list of ©rule elementsª. The rule elements in the list
4529 must be separated by commas and the entire list must be
4530 enclosed in braces ({ }).
4531
4532 Each of the rule elements is assigned the same
4533 precedence level, which is higher than that assigned in
4534 all previous precedence declarations and lower than that
4535 in all subsequent declarations. The rule elements are
4536 defined to be left, right, or nonassociative,
4537 depending on whether the keyword was "left", "right", or
4538 "nonassoc".
4539
4540 All conflicts which are resolved by precedence
4541 declarations are listed in the ©Resolved Conflictsª
4542 window.
4543 ##
4544
4545 Precedence Rules
4546
4547 AnaGram can resolve certain types of ©conflictªs in your
4548 grammar by applying precedence rules. There are three
4549 classes of rules available: explicit ©precedence
4550 declarationsª, the "©stickyª" statement, and the
4551 implicit rule associated with the use of a "©disregardª"
4552 token outside a ©lexemeª.
4553
4554 Whenever AnaGram uses a precedence rule of any kind to
4555 resolve a conflict, it produces a ©warningª message and
4556 lists the conflict in the ©Resolved Conflictsª window.
4557 ##
4558
4559 Previous States
4560
4561 The Previous States window can be accessed via the
4562 ©Auxiliary Windowsª popup menu from any window that identifies
4563 ©parser stateªs. It shows the ©characteristic ruleªs
4564 for all of the states which jump to the presently
4565 selected state.
4566 ##
4567
4568 Print File Name
4569
4570 "Print file name" is a configuration parameter which
4571 is not used in the Windows version of AnaGram. It is
4572 retained only for compatibility with pre-existing
4573 ©configuration fileªs.
4574 ##
4575
4576 Problem States
4577
4578 The Problem States window is essentially a trimmed
4579 version of the ©Reduction Statesª window. It is
4580 available in the ©Auxiliary Windowsª popup menu for the
4581 ©Conflictsª and ©Resolved Conflictsª windows.
4582
4583 The Problem States window has the same format as the
4584 Reduction States window, and differs only in that it
4585 shows only those reduction states for which the
4586 ©conflict tokenª is acceptable input.
4587 ##
4588
4589 Production
4590
4591 Productions are the mechanism you use to describe how
4592 complex input structures are built up out of simpler
4593 ones. Each production has a left hand side and a right
4594 hand side. The right hand side, or ©grammar ruleª, is a
4595 sequence of ©rule elementsª, which may represent either
4596 ©terminal tokensª or ©nonterminal tokensª. The left
4597 hand side is a list of ©reduction tokensª. In most
4598 cases there would be only a single reduction token.
4599 Productions with more than one ©tokenª on the left hand
4600 side are called ©semantically determined productionsª.
4601
4602 The "->" symbol is used to separate the left hand side
4603 from the right hand side. If you have several
4604 productions with the same left hand side, you can avoid
4605 rewriting the left hand side either by using '|' or by
4606 using another "->".
4607
4608 A ©null productionª, or empty right hand side, cannot
4609 follow a '|'.
4610
4611 Productions may be written thus:
4612 name
4613 -> letter
4614 -> name, digit
4615
4616 This could also be written
4617 name -> letter | name, digit
4618
4619 In order to accommodate semantic analysis of the data,
4620 you may attach to any grammar rule a ©reduction
4621 procedureª which will be executed when the rule is
4622 identified. Each token may have a ©semantic valueª. By
4623 using ©parameter assignmentªs, you may provide the
4624 reduction procedure with access to the semantic values of
4625 tokens that comprise the grammar rule. When it finishes, the
4626 reduction procedure may return a value which will be
4627 saved on the ©parser value stackª as the semantic value of the
4628 ©reduction tokenª.
4629 ##
4630
4631 Productions
4632
4633 The ©Productionªs window is available via the ©Auxiliary
4634 Windowsª popup menu in any window which identifies tokens.
4635 If the token identified by the highlighted line is
4636 ©nonterminalª, the Productions window will show the
4637 rules produced by that ©tokenª.
4638 ##
4639
4640 PRULE_CONTEXT
4641
4642 PRULE_CONTEXT is an alternative form of the
4643 ©RULE_CONTEXTª macro. It differs only in that it takes
4644 an argument so you can specify the appropriate ©parser control blockª
4645 explicitly. PRULE_CONTEXT is defined in
4646 the ©parser headerª file.
4647 ##
4648
4649 Quick Reference
4650
4651 "Quick reference" is an ©obsolete configuration switchª.
4652 ##
4653
4654 Range Bounds Out of Order
4655
4656 This is a ©warningª message that appears when you have a
4657 ©character rangeª of the form 'z-a'. AnaGram interprets
4658 this range as being equal to 'a-z', but provides a
4659 warning in case the unusual order was the result of a
4660 clerical error.
4661 ##
4662
4663 Recursive Definition of Char Set
4664
4665 This ©warningª appears when AnaGram discovers a
4666 recursively defined ©character setª. Character sets
4667 cannot be defined recursively.
4668 ##
4669
4670 Redefinition
4671
4672 "Redefinition of <name>" is a ©warningª message which
4673 appears when AnaGram discovers a redefinition of a
4674 ©symbolª. The new ©definitionª is ignored.
4675 ##
4676
4677 Redefinition of Grammar Token
4678
4679 This ©warningª appears when AnaGram encounters a new
4680 definition of the ©grammar tokenª. AnaGram discards the
4681 old definition. The last definition in the syntax file
4682 wins. If you get this warning, check your ©syntax fileª
4683 to make sure you have the grammar token you want.
4684 ##
4685
4686 Redefinition of token
4687
4688 "Redefinition of token, TXXX: <name>" is a ©warningª
4689 message which occurs when AnaGram encounters a
4690 ©definitionª statement and the specified ©grammar tokenª
4691 has already been seen on the left side of a
4692 ©productionª. AnaGram will ignore the definition
4693 statement.
4694 ##
4695
4696 Reduce Action, Reduction
4697
4698 The reduce action, or reduction, is one of the four
4699 actions of a traditional ©parsing engineª. The reduce
4700 action is performed when the parser has succeeded in
4701 matching all the elements of a ©grammar ruleª, and the
4702 next input token is not erroneous. Reducing the grammar
4703 rule amounts to subtracting the length of the rule from
4704 the ©parser stack indexª, identifying the ©reduction
4705 tokenª, stacking its ©semantic valueª and then doing a
4706 ©shift actionª with the reduction token as though it had
4707 been input directly.
4708 ##
4709
4710 Reduce-Reduce Conflict
4711
4712 A grammar has a "reduce-reduce" ©conflictª at some
4713 state if a single token turns out to be a ©reducing
4714 tokenª for more than one ©completed ruleª.
4715 ##
4716
4717 Reducing Token
4718
4719 In a ©parser stateª with more than one ©completed ruleª,
4720 your parser must be able to determine which one was
4721 actually found. Therefore, during analysis of your
4722 grammar, AnaGram examines each completed rule in order
4723 to determine all the states the ©parserª will branch to
4724 once the rule is reduced. These states are called the
4725 "reduction states" for the rule. In any window that
4726 displays ©marked ruleªs, these states may be found in
4727 the ©Reduction Statesª window listed in the ©Auxiliary
4728 Windowsª popup menu.
4729
4730 The acceptable input tokens for those states are the
4731 "reducing tokens" for the completed rules in the state
4732 under investigation. If there is a single token which is
4733 a reducing token for more than one rule, then the
4734 grammar is said to have a ©reduce-reduce conflictª at
4735 that state. If in a particular state there is both a
4736 ©shift actionª and a ©reduce actionª for the same token
4737 the grammar is said to have a ©shift-reduce conflictª in
4738 that state.
4739
4740 Note that a "reducing token" is not the same as a
4741 "©reduction tokenª".
4742 ##
4743
4744 Reduction Choices
4745
4746 "Reduction choices" is a ©configuration switchª which
4747 defaults to off. If it is set, AnaGram will include in
4748 your ©parser fileª a function which will identify the
4749 acceptable choices for ©reduction tokenª in the current
4750 state. This function, of course, is useful only if you
4751 are using ©semantically determined productionsª. The
4752 prototype of this function is:
4753 int $_reduction_choices(int *);
4754 where '$' represents the name of your parser. You must
4755 provide an integer array whose length is at least as
4756 long as the maximum number of reduction choices you
4757 might have. The function will fill the array with
4758 the numbers of the tokens which are acceptable in the
4759 current state and will return a count of the number of
4760 acceptable choices it found.
4761 ##
4762
4763 reduction_token
4764
4765 "reduction_token" is a field in your ©parser control
4766 blockª. If your grammar uses ©semantically determined
4767 productionsª, your ©reduction procedureªs need a
4768 mechanism to specify which token the rule is to reduce
4769 to. ©PCBª.reduction_token names the variable which
4770 contains the ©token numberª of the ©reduction tokenª.
4771 Prior to calling your reduction procedure, your parser
4772 will set this field to the token number of the default
4773 ©reduction tokenª, i.e., the leftmost syntactically correct token in the
4774 reduction token list for the production being reduced.
4775 If the reduction procedure establishes that a different
4776 reduction token is appropriate, it should store the
4777 appropriate token number in PCB.reduction_token.
4778 ##
4779
4780 Reduction Procedures
4781
4782 The Reduction Procedures window lists the C function
4783 prototypes for the ©reduction procedureªs in your grammar.
4784
4785 When this window is active, the ©syntax fileª window, if
4786 visible, is synchronized with it so you can see the body of
4787 the reduction procedure as well as its usage.
4788 ##
4789
4790 REDUCTION_TOKEN_ERROR
4791
4792 REDUCTION_TOKEN_ERROR is a user definable macro which your ©parserª
4793 invokes when it encounters an inadmissible reduction
4794 token. This error should occur only if your parser uses
4795 ©semantically determined productionsª and your
4796 ©reduction procedureª provides an incorrect ©token
4797 numberª. If you do not define it, AnaGram will define
4798 it so that it will print an error message on stderr and
4799 abort the parse.
4800
4801 ##
4802
4803 Reduction Procedure, Semantic Action
4804
4805 A "reduction procedure", or "semantic action", is a
4806 function you write which your ©parserª executes when it
4807 has identified the grammar rule to which the reduction
4808 procedure is attached in your grammar.
4809
4810 When your parser has identified a particular ©grammar
4811 ruleª, that is to say, a particular sequence of ©tokensª
4812 that you have specified in your grammar, it "reduces"
4813 the production to the token at the head of the
4814 production, or ©reduction tokenª.
4815
4816 If you choose, you can
4817 specify a "reduction procedure" which your parser will
4818 call so that your program can do semantic analysis on
4819 the production just identified. Your reduction procedure
4820 will be called using, as arguments, the ©semantic
4821 valuesª of tokens on the right side of the production.
4822
4823 Your reduction procedure may, if you choose, return a
4824 value which will become the semantic value of the
4825 reduction token. Since many of the tokens in
4826 ©productionªs are there for only syntactic purposes, you
4827 may specify, when you write your grammar, the tokens
4828 whose values are needed as arguments for your reduction
4829 procedure.
4830
4831 To attach a reduction procedure to a grammar rule, just
4832 write it immediately following the rule. There
4833 are two formats for reduction procedures,
4834 depending on the size and complexity of the procedure.
4835
4836 The first form consists of an equal sign followed by a C
4837 expression and a semicolon. When the rule is matched the
4838 expression will be evaluated and its value will be
4839 stacked on the ©parser value stackª as
4840 the value of the reduction token. For example:
4841 =-a;
4842 =myProcedure(x, q);
4843
4844 The second form consists of an equal sign followed by a
4845 block of C code enclosed in curly braces. If you wish to
4846 return a value for the reduction token you have to use a
4847 return statement. For example:
4848 ={
4849 if (x > y) return x;
4850 return x+2*y;
4851 }
4852
4853 In both forms of the reduction procedure, ©parameter
4854 assignmentªs may be attached to ©rule elementªs in
4855 order to make their semantic values available to the reduction
4856 procedure. When the reduction procedure is executed,
4857 local variables
4858 will be defined with the names specified in the parameter
4859 assignments. The values of these variables
4860 will have been set to the value of the corresponding
4861 token.
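
As a rough illustration of the mechanism described above, the following C
sketch shows what happens around a reduction procedure for a rule such as
"sum -> sum:x, '+', term:y =x + y;". It is not AnaGram's generated code.
The names value_stack, push_value, reduce_sum and reduce_sum_rule are
hypothetical, and bounds checking is omitted for brevity.

```c
/* Hypothetical sketch, not AnaGram's generated code: a value stack and
   one reduction for a rule such as
       sum -> sum:x, '+', term:y   =x + y;
   where x and y are parameter assignments. */

#define STACK_MAX 32

static int value_stack[STACK_MAX];  /* semantic values of stacked tokens */
static int depth = 0;               /* number of values on the stack */

static void push_value(int v) { value_stack[depth++] = v; }

/* The reduction procedure proper: it receives only the values named in
   the rule and returns the value of the reduction token. */
static int reduce_sum(int x, int y) { return x + y; }

/* Reduce the three-element rule: pick out the named values, discard all
   three entries, and stack the result for the reduction token. */
static int reduce_sum_rule(void) {
    int x = value_stack[depth - 3];  /* value of "sum"  */
    int y = value_stack[depth - 1];  /* value of "term" */
    depth -= 3;                      /* the value of '+' is not needed */
    push_value(reduce_sum(x, y));
    return value_stack[depth - 1];
}
```

After pushing the values 4, '+' and 5 and calling reduce_sum_rule, the
stack holds a single entry, the value of the reduction token.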
4862
4863 If the return value of your reduction procedure is a
4864 C++ object, you may wish to specify that AnaGram
4865 enclose it in a ©wrapperª so that constructor calls
4866 and destructor calls are made. Otherwise the object
4867 is pushed onto and popped from the parser value stack simply by
4868 coercing the stack pointer to the appropriate type.
4869
4870 The reduction procedures in your grammar are summarized
4871 in the ©Reduction Proceduresª window.
4872 ##
4873
4874 Reduction States
4875
4876 The Reduction States window can be accessed via the
4877 ©Auxiliary Windowsª popup menu from any window which displays
4878 ©parser stateª numbers and ©marked ruleªs. If the highlighted
4879 ©grammar ruleª has no marked token, the Reduction States window will
4880 show the states the parse could reach by reducing the rule and
4881 processing the resultant ©reduction tokenª.
4882 ##
4883
4884 Reduction Token
4885
4886 A ©tokenª which appears on the left hand side of a
4887 ©productionª is called a reduction token. It is so
4888 called because when the ©grammar ruleª on the right side
4889 of the production is matched in the input stream, your
4890 ©parserª will "reduce" the sequence of tokens which
4891 matches the rule by replacing the sequence of tokens
4892 with the reduction token.
4893
4894 If more than one
4895 reduction token is specified,
4896 the production is called a ©semantically determined productionª
4897 and your ©reduction procedureª
4898 should choose the appropriate reduction token. If it does not, your parser
4899 will use the first token in the list that is syntactically
4900 correct as the default.
4901
4902 The ©CHANGE_REDUCTIONª macro can be used to specify the reduction
4903 token.
4904
4905 Note that a "reduction token" is not the same as a
4906 "©reducing tokenª".
4907 ##
4908
4909 Reduction Trace
4910
4911 The Reduction Trace window is available from the
4912 ©Conflictsª window and the ©Resolved Conflictsª window.
4913 It can be used in conjunction with the ©Conflict Traceª
4914 to study ©conflictªs. The Reduction Trace represents the
4915 result of taking the reduce option in the conflict state
4916 of the Conflict Trace.
4917 ##
4918
4919 Reentrant Parser
4920
4921 "Reentrant parser" is a ©configuration switchª which defaults to off.
4922 If it is on when AnaGram builds a parser, AnaGram will generate code that
4923 passes the pointer to the ©parser control blockª via calling sequences,
4924 rather than using static references to the pcb.
4925
4926 You can use the reentrant parser switch to help make ©thread safe
4927 parsersª.
4928
4929 The reentrant parser switch is compatible with both C and C++.
4930
4931 The reentrant parser switch cannot be used in conjunction with
4932 the ©old styleª switch.
4933
4934 When you have enabled the reentrant parser switch, the ©parse functionª,
4935 the ©initializerª function, and the ©parser value functionª
4936 will be defined to take a pointer to the parser control block as
4937 their sole argument.
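
The difference between the two calling styles can be sketched in C as
follows. This is only an illustration under stated assumptions: demo_pcb,
parse_shared, parse_with_pcb and demo_value are hypothetical stand-ins, and
the "parse" merely counts digits. A real parser control block has many
more fields.

```c
/* Hypothetical stand-in for a parser control block. */
typedef struct {
    int exit_flag;   /* nonzero once the stand-in parse has finished */
    int digits;      /* result: number of digit characters seen */
} demo_pcb;

/* Switch off: one static control block is shared by every call. */
static demo_pcb shared_pcb;

static void parse_shared(const char *input) {
    shared_pcb.digits = 0;
    for (; *input; input++)
        if (*input >= '0' && *input <= '9') shared_pcb.digits++;
    shared_pcb.exit_flag = 1;
}

/* Switch on: the caller supplies the control block, so several parses
   can be in progress without disturbing one another. */
static void parse_with_pcb(demo_pcb *pcb, const char *input) {
    pcb->digits = 0;
    for (; *input; input++)
        if (*input >= '0' && *input <= '9') pcb->digits++;
    pcb->exit_flag = 1;
}

/* Counterpart of a value function that takes the control block. */
static int demo_value(const demo_pcb *pcb) { return pcb->digits; }
```

With the second style, each thread or nested call can own its own control
block, which is why the switch helps in building thread safe parsers.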
4938 ##
4939
4940 Reload Button
4941
4942 The ©File Traceª window includes a reload button to allow
4943 you to reread your ©test fileª after you have modified
4944 it without having to start a new file trace. After the
4945 file has been reread, the file trace is reset.
4946 ##
4947
4948 rename macro
4949
4950 AnaGram uses a number of macros in its generated code.
4951 It is possible, therefore, to run into naming
4952 collisions with other components of your program. The
4953 rename macro ©attribute statementª allows you to change
4954 the name AnaGram uses for a particular macro to avoid
4955 these problems.
4956
4957 For example, in the Microsoft
4958 Foundation Classes, V4.2, there is a class called
4959 "CONTEXT". If you use the ©context stackª option in
4960 AnaGram, your ©parserª will have a macro called
4961 ©CONTEXTª. To avoid the name collision, add the
4962 following attribute statement to any configuration
4963 section in your grammar:
4964 rename macro CONTEXT AG_CONTEXT
4965 Then, simply use "AG_CONTEXT" where you would otherwise
4966 have used "CONTEXT".
4967 ##
4968
4969 reserve keywords
4970
4971 "reserve keywords" is an ©attribute statementª which
4972 can be used to specify a list of ©keywordªs that are
4973 reserved and cannot be used except as explicitly
4974 specified in the grammar. In particular this switch
4975 enables AnaGram to avoid issuing meaningless ©keyword
4976 anomalyª warnings.
4977
4978 AnaGram does not automatically presume that keywords
4979 are also reserved words, since in many grammars there
4980 is no need to specify reserved words.
4981
4982 Reserve keywords statements must be made inside
4983 ©configuration sectionsª. Each statement consists of
4984 the keyword "reserve keywords" followed by a list of
4985 keyword ©tokensª. The tokens must be separated by
4986 commas and the list must be enclosed in braces ({ }).
4987 Each keyword listed will then be treated as a reserved
4988 word.
4989 ##
4990
4991 Reset Button
4992
4993 The Reset button, found on ©File Traceª and ©Grammar
4994 Traceª windows, restores the initial configuration of
4995 the trace. This is especially convenient for ©Conflict
4996 Traceª or other ©Auxiliary Traceªs.
4997 ##
4998
4999 Resolved Conflicts
5000
5001 AnaGram creates the Resolved Conflicts window only when
5002 the grammar it is analyzing has ©conflictªs and when
5003 those conflicts have been resolved by ©precedence
5004 declarationªs, by "©stickyª" statements, or in
5005 connection with the explicit use of a token specified in
5006 a ©disregardª statement. The Resolved Conflicts window
5007 shows the conflicts that have been resolved, using the
5008 same format as that of the ©Conflictsª Window. The rule
5009 chosen is marked with an asterisk in the leftmost column
5010 of the window.
5011 ##
5012
5013 Resynchronization
5014
5015 "Resynchronization" is the process of getting your
5016 parser back in step with its input after encountering a
5017 ©syntax errorª. As such, it is one method of ©error
5018 recoveryª. Of course, you would resynchronize only if it
5019 is necessary to continue after the error. There are
5020 several options available when using AnaGram. You could
5021 use the ©auto resynchª option, which causes AnaGram to
5022 incorporate an automatic resynchronizing procedure into
5023 your parser, or you could use the ©error token
5024 resynchronizationª option, which is similar to the
5025 technique used by YACC programmers.
5026 ##
5027
5028 right
5029
5030 "right" controls a ©precedence declarationª, indicating
5031 that all of the listed ©rule elementsª are to be
5032 considered ©right associativeª.
5033 ##
5034
5035 Right Associative
5036
5037 A binary operator is said to be right associative if
5038 an expression with repeated instances of the operator
5039 is to be evaluated from the right. Thus, for example,
5040 when '=' is used as an assignment operator
5041 x = a = b
5042 is normally taken to mean a = b followed by x = a. The
5043 assignment operator is said to be right associative.
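
The effect of the grouping is easiest to see with an operator for which
the two choices give different answers. The following C sketch uses
subtraction purely as an illustration. The function names are
hypothetical and the code is not part of AnaGram.

```c
/* Evaluate v[0] - v[1] - ... grouping from the left:
   ((v[0] - v[1]) - v[2]) ...  Requires n >= 1. */
static int fold_left_minus(const int *v, int n) {
    int acc = v[0];
    for (int i = 1; i < n; i++)
        acc = acc - v[i];
    return acc;
}

/* Evaluate the same operands grouping from the right:
   v[0] - (v[1] - (v[2] ...)).  Requires n >= 1. */
static int fold_right_minus(const int *v, int n) {
    int acc = v[n - 1];
    for (int i = n - 2; i >= 0; i--)
        acc = v[i] - acc;
    return acc;
}
```

For the operands 10, 4, 3 the left grouping gives (10 - 4) - 3 = 3, while
the right grouping gives 10 - (4 - 3) = 9.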
5044
5045 In ©grammarªs with ©conflictªs, you may use ©precedence
5046 declarationªs to specify that an operator should be
5047 right associative.
5048 ##
5049
5050 Rule Context
5051
5052 The Rule Context window can be accessed via the
5053 ©Auxiliary Windowsª menu in any window that displays
5054 ©grammar ruleªs. AnaGram displays all occurrences in the
5055 ©grammarª of all the ©reduction tokenªs for the rule.
5056 ##
5057
5058 RULE_CONTEXT
5059
5060 RULE_CONTEXT is a macro you may use if you have defined
5061 a ©context stackª. In any reduction procedure,
5062 RULE_CONTEXT will be a pointer to the context value
5063 stacked before the first token of the rule being
5064 reduced. Since the context stack contains an entry for
5065 each token in the rule, you may inspect the context
5066 value for each token in the rule by subscripting
5067 RULE_CONTEXT. RULE_CONTEXT[k] is the context of the
5068 (k+1)th token in the rule.
5069 ##
5070
5071 Rule Coverage
5072
5073 "Rule Coverage" is the name of both a ©configuration
5074 switchª and a window. The configuration switch
5075 defaults to off. If you set it, AnaGram will include
5076 code in your ©parserª to count the number of times your
5077 parser identifies each ©grammar ruleª in your grammar.
5078 To maintain the counts, AnaGram declares, at the
5079 beginning of your parser, an integer array, whose name
5080 is created by appending "_nrc" to your ©parser nameª.
5081 The array contains one counter for each rule you have
5082 defined in your grammar. There are no entries for the
5083 auxiliary rules that AnaGram creates to deal with set
5084 overlaps or ©disregardª statements. In order to identify
5085 positively all the rules that the parser reduces,
5086 AnaGram has to turn off certain optimization features in
5087 your parser. Therefore a parser that has rule coverage
5088 enabled will run slightly slower than one with the
5089 switch off.
5090
5091 In addition, AnaGram creates a pair of functions to
5092 write the counters to a file and to initialize the
5093 counters from a file. The names of these functions are
5094 given by appending "_write_counts" and "_read_counts" to
5095 the name of your parser. The name of the file is given by the
5096 ©coverage file nameª parameter which defaults
5097 to the name of your ©syntax fileª but with the extension ".nrc".
5098
5099 If rule coverage is enabled, AnaGram will also enable the
5100 Rule Coverage option on the ©Browse Menuª. If you select
5101 Rule Coverage, AnaGram will initialize a ©Rule Coverageª
5102 window from the rule count file you select.
5103
5104 AnaGram will
5105 warn you if the rule count file is older than
5106 the syntax file, since under those conditions, the
5107 coverage file might be invalid.
5108 ##
5109
5110 Rule Derivation, Token Derivation
5111
5112 You can use the Rule Derivation and Token Derivation
5113 windows to understand the nature of ©conflictªs in your
5114 grammar. To create these windows, open the ©Conflictsª
5115 window. Move the cursor bar to a ©completed ruleª, that
5116 is, one which has no marked token. Press the right mouse button to pop
5117 up the ©Auxiliary Windowsª menu. You may then select the Rule
5118 Derivation or the Token Derivation.
5119
5120 The Rule Derivation window and the Token Derivation
5121 window, together, show how a ©conflictª, or ambiguity,
5122 has arisen in your grammar. Both windows contain a
5123 sequence of rules, and both begin with the same rule,
5124 the rule which is the root cause of the conflict.
5125
5126 Each subsequent line in the rule derivation is an
5127 ©expansionª of the marked token in
5128 the previous rule. The last rule in the derivation
5129 window is the rule you selected in the Conflicts
5130 window. Thus the rule derivation window shows you how
5131 the rule involved in the conflict derives from the
5132 root.
5133
5134 Each subsequent line in the token derivation window
5135 shows an expansion of the marked token in the previous rule. The first
5136 token of the last rule in the derivation window is the token that
5137 causes the conflict. This is the usage that is inconsistent with other
5138 usages of this token in the conflict state.
5139
5140 The Rule Derivation and Token Derivation windows each
5141 have five auxiliary windows. The ©Rule Contextª window
5142 is keyed to the highlighted rule. The other four
5143 windows, the ©Expansion Rulesª window, the
5144 ©Productionsª window, the ©Set Elementsª window and the
5145 ©Token Usageª window are keyed to the marked token.
5146 Remember that there is no marked token on the last
5147 line of the Rule Derivation window.
5148 ##
5149
5150 Rule Element
5151
5152 A ©grammar ruleª is a list of "rule elements", separated
5153 by commas. Rule elements may be ©token nameªs,
5154 ©character setsª, ©keywordªs, ©immediate actionªs, or
5155 ©virtual productionsª. When AnaGram encounters a rule
5156 element for which no token presently exists, it creates
5157 one.
5158
5159 Any rule element may be followed by a ©parameter assignmentª
5160 in order to make the ©semantic valueª of
5161 the rule element available to a ©reduction procedureª.
5162 ##
5163
5164 Rule Number
5165
5166 AnaGram assigns a unique rule number to each ©grammar
5167 ruleª that you specify in your grammar. Rules are
5168 numbered sequentially as they are encountered in the
5169 ©syntax fileª. AnaGram constructs rule 0 itself. Rule
5170 zero has a single ©rule elementª, the ©grammar tokenª,
5171 unless you have a ©disregardª statement in your
5172 grammar. In this case, there will be two elements.
5173
5174 In AnaGram displays, rule numbers are shown as a
5175 prefixed 'R' followed by a three digit decimal number.
5176 ##
5177
5178 Rule Stack, Rule Stack Pane
5179
5180 The Rule Stack pane appears across the bottom of a ©Grammar
5181 Traceª or ©File Traceª window. It provides an alternate view of the
5182 parser stack for the trace, showing, for each state, rules instead of
5183 the tokens that you see in the ©Parser Stack paneª. Because it is
5184 synched with the syntax file window, the Rule Stack makes it easy to
5185 see the relationship between the trace and your grammar.
5186
5187 For each level of the parser stack, the Rule Stack shows the ©parser
5188 stateª number and all the active rules. The active rules at any
5189 state consist of all the ©expansion ruleªs for the state that are
5190 consistent with the input at all subsequent states.
5191
5192 Except for the last level
5193 of the stack, each rule has a ©marked tokenª, which in the default
5194 configuration is displayed in bold, italic type. The significance of
5195 the marked token is that all tokens in the rule to the left of the
5196 marked token have already been matched in the input, and the input
5197 in subsequent levels is consistent so far with the marked
5198 token. As more input is processed, rules
5199 that are inconsistent with the new input are deleted from the display.
5200
5201 The last level of the stack shows the current state of the parser and
5202 the rules against which the ©lookahead tokenª will be matched. At
5203 this level, there may be rules with no marked tokens. These are
5204 rules which have been matched exactly in the input. If there is
5205 more than one such rule, at the next parser step the parser will use
5206 the lookahead token to determine which rule to reduce.
5207
5208 In the last level of the stack, marked tokens represent the input the
5209 parser expects to see.
5210
5211 The Rule Stack pane is synched with the ©syntax fileª window if it is
5212 visible so that the rule highlighted in the Rule Stack can be seen
5213 in context in the syntax file.
5214 For rules that AnaGram
5215 generated automatically (to implement ©virtual productionsª
5216 or the ©disregardª statement), the cursor bar will move to the
5217 top of the syntax file window.
5218
5219 The Rule Stack pane is also synched with the other panes in the trace.
5220 As you move the cursor bar in the Rule Stack, the cursor bar in the
5221 Parser Stack pane will track the stack level in the Rule Stack. In
5222 a File Trace, text will be highlighted in the ©Test Fileª pane
5223 corresponding to the selected token in the Parser Stack pane. In a
5224 Grammar Trace, the marked token in the highlighted rule will be
5225 highlighted in the ©Allowable Input paneª.
5226
5227 Clicking the right mouse button pops up an ©Auxiliary Windowsª menu to
5228 give you more information about the highlighted rule.
5229 ##
5230
5231 Rule Table
5232
5233 The Rule Table lists, in numerical order, all the
5234 ©grammar ruleªs defined in your ©grammarª. Each rule is
5235 preceded by the ©nonterminalª tokens which produce it.
5236 If you are not using ©semantically determined
5237 productionªs, then there will be precisely one token
5238 line per rule. The Rule Table is synched to your ©syntax
5239 fileª to show the rule in context.
5240 ##
5241
5242 Semantic Value, Token Value
5243
5244 A ©tokenª generally has a "semantic value", or "token
5245 value", as well as the ©token numberª which identifies
5246 it syntactically. Each instance of the token in the
5247 input stream can have a different value. For example,
5248 you might have a token called "variable name". In one
5249 instance the variable name might be "widget" and in
5250 another, "wombat". Then "widget" and "wombat" would be
5251 the semantic values in the two instances. Another token
5252 might have numeric semantic values.
5253
5254 You can specify the C or C++ ©data typeª of the token value.
5255 The data type of "variable name" could be "char *"
5256 where the value is a pointer to a string holding the name. There
5257 are separate default types for the values of ©terminalª
5258 and ©nonterminalª tokens. In the usual case of ordinary
5259 character input, the value of a terminal token is just
5260 the ASCII character code.
5261
5262 The value of a nonterminal token is determined by the ©reduction procedureªs
5263 attached to the rules the token produces. If there is no reduction
5264 procedure, the value of the token is the value of the first token
5265 in the rule.
5266
5267 It should be noted that the stack operations have been
5268 implemented in such a way that a C++ object that belongs
5269 to a class for which the assignment operator has been
5270 overridden will encounter serious problems. This shortcoming
5271 will be addressed in a future version of AnaGram. Note that
5272 there is no problem with using a pointer to any C++ object.
5273 ##
5274
5275 Semantically Determined Production
5276
5277 A "semantically determined production" is one which has
5278 more than one ©reduction tokenª specified on the left
5279 side of the ©productionª. You would write such a
5280 production when the reduction tokens are syntactically
5281 indistinguishable. The ©reduction procedureª may then
5282 specify which of the listed reduction tokens the grammar
5283 rule is to reduce to based on semantic considerations.
5284 If there is no reduction procedure, or the reduction
5285 procedure does not specify a reduction token, the parser
5286 will use the first syntactically correct one in the list.
5287
5288 To simplify changing the reduction token, AnaGram
5289 provides a predefined macro, ©CHANGE_REDUCTIONª.
5290
5291 The ©semantic valueªs of all the reduction tokens for a
5292 given semantically determined production must have the
5293 same ©data typeª.
5294
5295 ©File Traceª and ©Grammar Traceª have a ©Reduction Choices paneª which
5296 appears when a semantically determined production is invoked and
5297 you need to choose a reduction token.
5298 ##
5299
5300 Set Elements
5301
5302 The Set Elements window is available via the ©Auxiliary
5303 Windowsª popup menu from windows which specify character sets,
5304 partition sets or tokens. It displays the actual
5305 characters which make up the set, or which map to the
5306 specified token. For each character, the numeric code as
5307 well as its display symbol is given.
5308 ##
5309
5310 Set Expression, Expression
5311
5312 A set expression is an algebraic expression used to
5313 define a ©character setª in terms of individual
5314 characters, ranges of characters, or other sets of
5315 characters as constructed using ©complementsª, ©unionsª,
5316 ©intersectionsª, and ©differencesª.
5317 ##
5318
5319 Shift Action
5320
5321 The shift action is one of the four actions of a
5322 traditional ©parsing engineª. The shift action is
5323 performed when the input token matches one of the
5324 acceptable input tokens for the current ©parser stateª.
5325 The ©semantic valueª of the token and the current
5326 ©state numberª are stacked, the ©parser stack indexª is
5327 incremented and the state number is set to a value
5328 determined by the previous state and the input token.
5329 ##
5330
5331 Shift-Reduce Conflict
5332
5333 A "shift-reduce" ©conflictª occurs if in some ©parser
5334 stateª there exists a ©terminal tokenª that should be
5335 shifted, because it is legitimate input for one of the
5336 ©grammar ruleªs of the state, but should also be used to
5337 reduce some other rule because it is a ©reducing tokenª
5338 for that rule.
5339 ##
5340
5341 sn
5342
5343 sn is a field in a ©parser control blockª to which your
5344 ©error handlingª routines and your ©reduction
5345 procedureªs may refer. Its value is the current ©state
5346 numberª of your ©parserª. sn is modified every time
5347 your parser "shifts" (performs a ©shift actionª on) a
5348 token or reduces (performs a ©reduce actionª on) a
5349 ©productionª.
5350 ##
5351
5352 ss
5353
5354 ss is a field in a ©parser control blockª to which your
5355 ©error handlingª and ©reduction procedureªs may refer.
5356 It is the ©state stackª for your ©parserª. Before every
5357 ©shift actionª, the current ©state numberª, ©snª, is
5358 stored in PCB.ss[PCB.ssx], where ©ssxª is the ©parser
5359 stack indexª. PCB.ssx is then incremented.
5360 ##
5361
5362 ssx
5363
5364 ssx is a field in a ©parser control blockª to which
5365 your ©error handlingª routines and ©reduction
5366 procedureªs may refer. It is the ©parser stack indexª
5367 for your ©parserª. On every ©shift actionª it is
5368 incremented. On every ©reduce actionª the length of
5369 the ©grammar ruleª being reduced is subtracted from
5370 PCB.ssx.
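
The bookkeeping described for sn, ss and ssx can be sketched in C as
follows. This is a simplified illustration, not the code AnaGram
generates: mini_pcb, shift and reduce are hypothetical names, stack
overflow is not checked, and restoring sn from the uncovered stack entry
on a reduction is an assumption based on conventional LR parsing.

```c
#define STATE_STACK_MAX 64

/* Hypothetical miniature of the three fields discussed here. */
typedef struct {
    int sn;                    /* current state number */
    int ssx;                   /* parser stack index */
    int ss[STATE_STACK_MAX];   /* state stack */
} mini_pcb;

/* Shift: store the current state at ss[ssx], increment ssx, then
   enter the state selected by the previous state and the token. */
static void shift(mini_pcb *pcb, int next_state) {
    pcb->ss[pcb->ssx] = pcb->sn;
    pcb->ssx++;
    pcb->sn = next_state;
}

/* Reduce: subtract the rule length from ssx. For a non-empty rule the
   entry now at ss[ssx] is the state in which the rule began
   (assumption: the parser resumes from that state). */
static void reduce(mini_pcb *pcb, int rule_length) {
    pcb->ssx -= rule_length;
    if (rule_length > 0)
        pcb->sn = pcb->ss[pcb->ssx];
}
```

Starting in state 0, shifting into states 3 and 7 leaves ssx at 2 with 0
and 3 on the state stack. Reducing a rule of length 2 then returns ssx to
0 and sn to 0.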
5371 ##
5372
5373 State Definition
5374
5375 The State Definition window can be accessed via the
5376 ©Auxiliary Windowsª popup menu from any window that specifies
5377 states. It displays the ©characteristic rulesª that
5378 define the state. The rules are displayed with a marked token, which is
5379 the next token needed in the input if the particular ©grammar ruleª is
5380 to be matched. If the rule is a completed rule, no token will be
5381 marked.
5382
5383 Each line contains the state number, blank if it is the
5384 same as the state number of the previous line, the ©rule
5385 numberª, and finally the ©marked ruleª.
5386
5387 The ©State Definition Tableª, found in the ©Browse
5388 Menuª, displays the characteristic rules for all states
5389 in the ©grammarª.
5390 ##
5391
5392 State Definition Table
5393
5394 The State Definition Table lists, for each ©parser
5395 stateª, all of the ©characteristic rulesª which define
5396 that state. The rules are displayed with a ©marked tokenª, which is the
5397 next token needed in the input if the particular ©grammar ruleª is to
5398 be matched. If the rule is a completed rule, no token will be
5399 marked.
5400
5401 Each line contains the state number, blank if it is the
5402 same as the state number of the previous line, the ©rule
5403 numberª, and finally the ©marked ruleª.
5404
5405 In the ©Auxiliary Windowsª menu for many states there is
5406 a ©State Definitionª entry which provides the
5407 characteristic rules for the ©parser stateª identified by
5408 the cursor bar.
5409 ##
5410
5411 State Expansion
5412
5413 The State Expansion window may be accessed using the
5414 ©Auxiliary Windowsª menu from any window that identifies
5415 a particular ©parser stateª. It shows the complete set
5416 of ©expansion ruleªs for the state, consisting of the
5417 union of the set of ©characteristic ruleªs and, for each
5418 characteristic rule, the set of expansion rules for the
5419 marked token. Thus the State
5420 Expansion window shows all possible legal input to your
5421 parser in the given state.
5422 ##
5423
5424 Sticky
5425
5426 "Sticky" statements are ©attribute statementªs and may
5427 be used just like a ©precedence declarationª to resolve
5428 ©conflictªs. If a ©shift-reduce conflictª occurs in a
5429 state where the ©characteristic tokenª is "sticky", the
5430 shift action will always be chosen.
5431
5432 Sticky statements must be made inside ©configuration
5433 sectionsª. Each statement consists of the keyword
5434 "sticky" followed by a list of ©tokensª. The tokens must
5435 be separated by commas and the list must be enclosed in
5436 braces ({ }). Each token will then be treated as sticky.
5437
5438 All conflicts which are resolved by sticky statements
5439 are listed in the ©Resolved Conflictsª window.
5440 ##
5441
5442 subgrammar
5443
5444 Declaring a nonterminal token to be a "subgrammar"
5445 changes the way AnaGram searches for reducing tokens.
5446
5447 Normally, if there is a completed rule in a particular
5448 state, AnaGram investigates all states to which the
5449 parser could jump on reducing the rule. It then
5450 considers all terminal tokens that are acceptable input
5451 in these states to be reducing tokens for the given
5452 rule. If this set of tokens overlaps the set of tokens
5453 for which there are shift actions, or the set of tokens
5454 which reduce a different rule, there is a ©conflictª.
5455
5456 Now consider a particular nonterminal token T and all
5457 the rules it produces, whether directly or indirectly.
5458 What the preceding remarks mean is that in determining
5459 the reducing tokens for any of these rules, AnaGram
5460 considers not only the definition, but also the usage
5461 of T.
5462
5463 There are circumstances when it is inappropriate to
5464 consider the usage of T. The most common example occurs
5465 when building a lexical scanner for a language such as
5466 C. In this case, you can write a complete grammar for a
5467 C token with no difficulty. But if you try to extend it
5468 to a sequence of tokens, you get scores of conflicts.
5469 This situation arises because you specify that any C
5470 token can follow another, when in actual practice, an
5471 identifier, for example, cannot follow another
5472 identifier without some intervening space or
5473 punctuation. While it is theoretically possible to write
5474 a grammar for a sequence of tokens that has no
5475 conflicts, it is not usually pretty.
5476
5477 The subgrammar declaration resolves this problem by
5478 telling AnaGram that when it is looking for reducing
5479 tokens for any rule produced directly or indirectly by a
5480 subgrammar token, it should disregard the usage of the
5481 token and only consider usage internal to the definition
5482 of the subgrammar token, as though the subgrammar token
5483 were the start token of the grammar.
5484
5485 The subgrammar declaration is made in a ©configuration
5486 sectionª and consists of the keyword "subgrammar"
5487 followed by a list of token names separated by
5488 commas and enclosed in braces ({ }). For example:
5489 subgrammar { name, number}
5490 ##
5491
5492 Suspicious Production
5493
5494 This ©warningª message appears when AnaGram finds a
5495 ©productionª of the form x -> x. There is probably a
5496 typo somewhere in your ©syntax fileª. This production
5497 causes a ©conflictª in your grammar. AnaGram leaves
5498 this production in your ©grammarª, but if you build a
5499 parser, it will never succeed in recognizing this
5500 production.
5501 ##
5502
5503 Switch Takes on/off Values Only
5504
5505 The specified parameter is a ©configuration switchª. The
5506 only values it may be assigned are ON and OFF.
5507
5508 ##
5509
5510 Symbol
5511
5512 In writing your ©grammarª you use symbols, or names, to
5513 represent most of your ©tokensª. You may also use
5514 symbols to represent ©character setªs, ©virtual
5515 productionªs, ©immediate actionªs, or ©keywordªs.
5516
5517 A symbol, or name, must begin with a letter or an
5518 underscore. It may then contain any number of these
5519 characters as well as digits and embedded white space
5520 (including comments). For identification purposes all
5521 adjacent white space characters within a symbol name
5522 are considered to be a single blank.
5523
5524 Upper case and lower case letters are considered to be
5525 different.
5526
5527 Examples:
5528 token name
5529 token/*embedded comment*/name
5530
5531 All symbols used in your grammar are listed in
5532 the ©Symbol Tableª window found in the ©Browse Menuª.
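
The white space rule for symbol names can be sketched in C as follows.
This is an illustration only, not AnaGram's own code: normalize_symbol is
a hypothetical helper, and it assumes that C-style comments are the only
comment form that needs to be treated as white space.

```c
#include <ctype.h>
#include <stddef.h>

/* Copy src to dst, replacing each run of white space (blanks, tabs,
   newlines and C-style comments) with a single blank. Leading and
   trailing white space is dropped. dst must have room for
   strlen(src) + 1 bytes. */
static void normalize_symbol(const char *src, char *dst) {
    size_t out = 0;
    int pending_blank = 0;

    while (*src) {
        if (isspace((unsigned char)*src)) {
            pending_blank = 1;
            src++;
        } else if (src[0] == '/' && src[1] == '*') {
            src += 2;                       /* skip the opening delimiter */
            while (*src && !(src[0] == '*' && src[1] == '/'))
                src++;
            if (*src)
                src += 2;                   /* skip the closing delimiter */
            pending_blank = 1;
        } else {
            if (pending_blank && out > 0)
                dst[out++] = ' ';           /* one blank per run */
            pending_blank = 0;
            dst[out++] = *src++;
        }
    }
    dst[out] = '\0';
}
```

Under this rule, the two example names above identify the same symbol,
"token name".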
5533 ##
5534
5535 Symbol Table
5536
5537 The Symbol Table lists all the symbols, or names, you
5538 used in your grammar. ©Symbolªs may be used, of course,
5539 to identify ©tokensª, ©definitionsª, ©virtual
5540 productionsª, ©immediate actionªs, or ©keywordªs.
5541
5542 Each line in this table identifies a single symbol. The
5543 first field is the token number, if any. This is
5544 followed by the name. If the name identifies an
5545 ©expressionª or virtual production, it is followed by an
5546 equal sign and the expression or virtual production.
5547 ##
5548
5549 Syntax Analysis Aborted
5550
5551 This ©warningª message appears if, because of previous
5552 errors, AnaGram is unable to complete the ©Analyze
5553 Grammarª command on your ©syntax fileª.
5554 ##
5555
5556 Syntax Directed Parsing
5557
5558 Syntax directed parsing, or formal parsing, is an
5559 approach to building ©parsersª based on formal language
5560 theory. Given a suitable description of a language,
5561 called a ©grammarª, there are algorithms which can be
5562 used to create parsers for the language automatically.
5563 In this context, the set of all possible inputs to a
5564 program may be considered to constitute a language, and
5565 the rules for formulating the input to the program
5566 constitute the grammar for the language.
5567
5568 The parsers built from a grammar have the advantage
5569 that they can recognize any input that conforms to the
5570 rules, and can reject as erroneous any input that fails
5571 to conform.
5572
5573 Since the program logic necessary to parse input is
5574 often extremely intricate, programs which use formal
5575 parsing are usually much more reliable than those built
5576 by hand. They are also much easier to maintain, since
5577 it is much easier to modify a grammar specification
5578 than it is to modify complex program logic.
5579 ##
5580
5581 Syntax Error
5582
5583 When you specify a ©grammarª, you specify a set of
5584 input character or token sequences which your ©parserª
5585 will "recognize". Usually it is possible for there to
5586 be other sequences of input tokens which deviate from
5587 the rules set down by your grammar. Should your parser
5588 find such a sequence in its input which is not
5589 explicitly allowed for in your grammar, it is said to
5590 have found a "syntax error". The general treatment of
5591 syntax errors is called ©error handlingª, of which there
5592 are two distinct aspects: ©error diagnosisª and ©error
5593 recoveryª. AnaGram allows you to make provision for
5594 error handling to fit your needs, but should you not do
5595 so, it will provide simple default error handling.
5596 ##
5597
5598 Statements
5599
5600 AnaGram source files, or ©syntax fileªs, consist of
5601 the following types of statements:
5602 ©productionªs
5603 ©configuration sectionªs
5604 ©embedded Cª
5605 ©definitionªs
5606 ©token declarationªs
5607
5608 Statements may be in any order. Each statement must
5609 begin on a new line. If a statement cannot be
5610 construed as complete, it may continue onto another
5611 line.
5612
5613 Statements may contain spaces, tabs or comments, but
5614 may not contain blank lines.
5615 ##
5616
5617 Syntax File
5618
5619 Input files to AnaGram are called syntax files. The
5620 default extension for syntax files is .syn. A
5621 syntax file contains a "©grammarª" and supporting C or
5622 C++ code. The file consists of several distinct types
5623 of statements. These are ©token declarationsª,
5624 ©productionªs, ©definitionsª, ©embedded Cª, and
5625 ©configuration sectionsª. There may be as many of each
5626 as you need, in whatever order you find convenient.
5627
5628 Each such statement begins on a new line.
5629 ##
5630
5631 SYNTAX_ERROR
5632
5633 SYNTAX_ERROR is a macro which your parser will invoke
5634 when it encounters a syntax error in its input stream.
5635 If you have set the ©diagnose errorsª ©configuration
5636 switchª, the static variable ©PCBª.©syntax_errorª will
5637 contain a pointer to a diagnostic message when
5638 SYNTAX_ERROR is invoked. If you have also set the
5639 ©error frameª switch, ©PCBª.©error_frame_ssxª and
5640 ©PCBª.©error_frame_tokenª will also be set
5641 appropriately.
5642 ##
5643
5644 Tab Spacing
5645
5646 "tab spacing" is a ©configuration parameterª which
5647 controls the expansion of tabs when AnaGram displays
5648 your source file or test files in the ©File Traceª window.
5649
5650 The value of "tab spacing" is also used to set the
5651 default value of the ©TAB_SPACINGª macro in your parser.
5652
5653 The default value of "tab spacing" is 8. If you prefer
5654 a different value, you should probably include an
5655 appropriate statement in your ©configuration fileª. For
5656 example:
5657
5658 tab spacing = 2
5659 ##
5660
5661 TAB_SPACING
5662
5663 If you have enabled the ©lines and columnsª switch, your
5664 parser needs to know tab spacing in order to increment
5665 the column count when it encounters a tab character. It
5666 is set up to use the value given by the TAB_SPACING
5667 macro. If you do not define TAB_SPACING in your parser,
5668 AnaGram will provide a default definition, setting it to
5669 the value of the ©tab spacingª ©configuration
5670 parameterª.
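
As a rough sketch of the column arithmetic involved, the following shows how a parser might advance a column count to the next tab stop. This is not the code AnaGram generates: the helper name advance_column and the use of zero-based columns are assumptions made for illustration.

```c
/* Illustrative only: not the code AnaGram generates. */
#ifndef TAB_SPACING
#define TAB_SPACING 8   /* matches the default "tab spacing" value */
#endif

/* Hypothetical helper: returns the zero-based column after reading ch. */
static int advance_column(int column, int ch) {
    if (ch == '\t') {
        /* Jump to the next multiple of TAB_SPACING. */
        return column + TAB_SPACING - (column % TAB_SPACING);
    }
    return column + 1;
}
```

Because the macro is wrapped in #ifndef, a definition of TAB_SPACING supplied earlier in the file takes priority over the default, which mirrors the behavior described above.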
5671 ##
5672
5673 Terminal, Terminal Token
5674
5675 A "terminal token" is a token which does not appear on
5676 the left side of a ©productionª. It represents,
5677 therefore, a basic unit of input to your ©parserª. If
5678 the input to your parser consists of ascii characters,
5679 you may define terminal tokens explicitly as ascii
5680 characters or as sets of ascii characters. If you have a
5681 lexical scanner, or preprocessor, which produces numeric
5682 codes, you may define the terminal tokens directly in
5683 terms of these numeric codes.
5684 ##
5685
5686 Test File Binary
5687
5688 "Test file binary" is a ©configuration switchª which
5689 defaults to off. When it is off, and you select the
5690 ©File Traceª option, AnaGram will read your test files
5691 in "text" mode, discarding carriage return characters.
5692 When "test file binary" is on, AnaGram will read test
5693 files in "binary" mode, preserving carriage return
5694 characters.
5695
5696 If your parser needs to recognize carriage return
5697 characters explicitly, you should turn "test file
5698 binary" on.
5699 ##
5700
5701 Test File Mask
5702
5703 "Test file mask" is a string-valued ©configuration
5704 parameterª which AnaGram uses to set up the file dialog
5705 for the ©File Traceª command. It defaults to "*.*". If
5706 there is a conventional file name format for the input
5707 to the ©parserª you are developing, you will probably
5708 want to set "test file mask" in a ©configuration
5709 sectionª in your ©syntax fileª so it is easier to pick
5710 out your test files.
5711 ##
5712
5713 Test range
5714
5715 "Test range" is a ©configuration switchª which defaults
5716 to on. When it is set, i.e., on, AnaGram will configure
5717 your parser so that it checks input characters to
5718 verify that they are within the range given by the
5719 ©character universeª before it indexes the ©token
5720 conversionª table. If range testing is not necessary
5721 for your parser, you may turn test range off and get a
5722 slight improvement in the performance of your parser.
5723 ##
5724
5725 Thread Safe Parsers
5726
5727 AnaGram 2.01 incorporates several changes designed to make it
5728 easier to write thread safe parsers.
5729
5730 First, the ©parserªs generated by AnaGram 2.01 no longer use static or global
5731 variables to store temporary data. All nonconstant data have been
5732 moved to the ©parser control blockª.
5733
5734 Second, two new features which make it substantially
5735 easier to build thread safe parsers have been added. The ©reentrant parserª switch
5736 makes the entire parser reentrant, by passing the pointer to the parser control
5737 block as an argument on all function calls. The ©extend pcbª statement allows
5738 you to add your own variable declarations to the ©parser control
5739 blockª so you can avoid references to global or static variables in
5740 your ©reduction procedureªs.
5741
5742 Third, new support has been added for C++ classes, including
5743 the ©wrapperª statement and the ©PCB_TYPEª macro.
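
The general idea behind the first two points can be sketched in plain C. The structure and function names below (demo_pcb, demo_init, demo_feed) are hypothetical and are not part of AnaGram's output; the sketch only shows why keeping all mutable state in a control block passed by pointer lets several parses proceed independently.

```c
/* Hypothetical control block: every piece of mutable parser state lives here. */
typedef struct {
    long value;        /* number accumulated so far */
    int digit_count;   /* user-defined field, of the kind "extend pcb" adds */
} demo_pcb;

static void demo_init(demo_pcb *pcb) {
    pcb->value = 0;
    pcb->digit_count = 0;
}

/* Consumes one character; touches only *pcb, never static or global data. */
static void demo_feed(demo_pcb *pcb, int ch) {
    if (ch >= '0' && ch <= '9') {
        pcb->value = pcb->value * 10 + (ch - '0');
        pcb->digit_count++;
    }
}
```

Two threads, each with its own demo_pcb, can call demo_feed concurrently without interfering with each other, since no state is shared between them.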
5744 ##
5745
5746 token_number
5747
5748 token_number is a field in a ©parser control blockª to
5749 which your ©error handlingª procedures and ©reduction
5750 procedureªs may refer. It contains the actual ©token
5751 numberª of the current input token. If you are supplying
5752 token numbers directly, it is the result of using the
5753 actual input character to index the ©token conversionª
5754 array, ag_tcv.
5755 ##
5756
5757 Token
5758
5759 Tokens are the units with which your parser works.
5760 There are two kinds of tokens: ©terminal tokensª and
5761 ©nonterminal tokensª. These latter are identified by the
5762 parser as sequences of tokens. The grouping of tokens
5763 into more complex tokens is governed by the ©grammar
5764 rulesª, or ©productionªs in your grammar. In your
5765 grammar, tokens are denoted by ©token nameªs, ©virtual
5766 productionsª, explicit ©character representationsª,
5767 ©keywordªs, ©immediate actionªs, or ©expressionªs which
5768 yield ©character setsª.
5769 ##
5770
5771 Token Conversion
5772
5773 By using ©character setª ©expressionªs, you may in your
5774 ©syntax fileª define a number of input characters as
5775 being syntactically equivalent. When your ©parserª gets
5776 an input character, it uses the character code to index
5777 a table called ©ag_tcvª. The value it extracts from this
5778 table is the ©token numberª for the input character. The
5779 actual character code of the input character becomes the
5780 ©token valueª.
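
The lookup described above can be sketched as follows. The table demo_tcv stands in for the generated ag_tcv table, and the token numbers, structure, and function names are all hypothetical; AnaGram assigns the real token numbers and builds the real table. The sketch assumes an ASCII character set.

```c
/* Hypothetical token numbers; AnaGram assigns the real ones. */
enum { TOK_OTHER = 0, TOK_DIGIT = 1, TOK_LETTER = 2 };

/* Stand-in for the generated ag_tcv table: one entry per character code. */
static unsigned char demo_tcv[256];

/* Fills the table so that syntactically equivalent characters share a token. */
static void init_demo_tcv(void) {
    int c;
    for (c = 0; c < 256; c++) demo_tcv[c] = TOK_OTHER;
    for (c = '0'; c <= '9'; c++) demo_tcv[c] = TOK_DIGIT;
    for (c = 'a'; c <= 'z'; c++) demo_tcv[c] = TOK_LETTER;
    for (c = 'A'; c <= 'Z'; c++) demo_tcv[c] = TOK_LETTER;
}

struct demo_token {
    int number;   /* token number, taken from the table */
    int value;    /* token value: the character code itself */
};

/* Converts one input character into a token number and token value. */
static struct demo_token convert(unsigned char ch) {
    struct demo_token t;
    t.number = demo_tcv[ch];
    t.value = ch;
    return t;
}
```

Here 'q' and 'Q' yield the same token number but different token values, so the grammar can treat them alike while a reduction procedure can still tell them apart.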
5781 ##
5782
5783 Token Declaration
5784
5785 A token declaration is simply a ©productionª with no
5786 right hand side. Token declarations can be used to
5787 define the ©data typeªs of tokens. To define the data type
5788 of a token, simply put the data type in parentheses
5789 preceding the name of the token. You can use a list of
5790 tokens joined by commas, if you wish. Thus:
5791 (char *) variable name, function name
5792 could be used to specify that the ©semantic valueªs of
5793 the tokens "variable name" and "function name" are both
5794 character pointers.
5795
5796 Of course, token types may be specified as part of any
5797 production the token generates, but sometimes, in the
5798 interest of clarity, it is advisable to group all
5799 declarations together.
5800 ##
5801
5802 Token Name
5803
5804 All ©nonterminal tokensª that you define in your
5805 ©grammarª by means of explicit ©productionªs must have
5806 names by which they may be referenced. Token names are
5807 ©symbolsª which represent the token syntactically in
5808 your grammar specification.
5809 ##
5810
5811 Token Names
5812
5813 "Token names" is a ©configuration switchª that defaults
5814 to off. If it is set, it causes AnaGram to include in
5815 the ©parser fileª a static array of character strings, indexed by
5816 token number, which provides ascii representations of token
5817 names. The name of this array is given by "<parser name>_token_names",
5818 where <parser name> is the name of the parser function as
5819 given by the value of the ©parser nameª parameter.
5820
5821 AnaGram also defines a macro, ©TOKEN_NAMESª, which evaluates
5822 to the name of the array.
5823
5824 The array contains strings for all grammar tokens which have
5825 been explicitly named in the syntax file as well as tokens
5826 which represent ©keywordªs or single character constants.
5827
5828 The array is useful in creating ©syntax errorª diagnostics.
5829
5830 Prior to version 2.01 of AnaGram, the TOKEN_NAMES array contained
5831 strings only for explicitly named tokens. If this restriction
5832 is required, set the ©token names onlyª switch.
5833
5834 Token names are also included if the ©diagnose errorsª
5835 switch is set.
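
One way such an array might be used in a diagnostic is sketched below. The array demo_token_names is a hypothetical stand-in for the generated <parser name>_token_names array, its contents are invented for the example, and the fallback to a "T" followed by a three digit number for unnamed tokens is an assumption borrowed from the way AnaGram displays unnamed tokens in its tables.

```c
#include <stdio.h>

/* Hypothetical stand-in for the generated <parser name>_token_names array. */
static const char *const demo_token_names[] = {
    "", "eof", "expression", "\"if\""
};
#define TOKEN_NAMES demo_token_names

/* Writes a diagnostic for token_number into buf; returns its length. */
static int describe_token(char *buf, size_t size, int token_number) {
    const char *name = TOKEN_NAMES[token_number];
    if (name[0] == '\0') {
        /* Unnamed token: fall back to the token number. */
        return snprintf(buf, size, "Unexpected token T%03d", token_number);
    }
    return snprintf(buf, size, "Unexpected %s", name);
}
```

A SYNTAX_ERROR handler could call a function like this with the current token number to report what the parser saw.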
5836 ##
5837
5838 TOKEN_NAMES
5839
5840 "TOKEN_NAMES" is the name of a macro that AnaGram defines to
5841 provide access to a static array of character strings indexed by
5842 token number, which provides ascii representation of token
5843 names. The array is generated if any of the ©token namesª,
5844 ©token names onlyª or ©diagnose errorsª switches are ON.
5845
5846 If ©token names onlyª is set, the array contains non-empty
5847 strings only for those tokens which are explicitly named
5848 in the syntax file. Otherwise, the array also contains
5849 strings for tokens which represent keywords or single
5850 character constants.
5851 ##
5852
5853
5854 token names only
5855
5856 "Token names only" is a ©configuration switchª that defaults to
5857 off. If it is set, it will cause AnaGram to include in the
5858 parser file a static array containing the names of the tokens
5859 in your grammar. This array will include only those tokens
5860 to which you have assigned names explicitly and will not
5861 include character constants or keywords. "Token names only"
5862 takes precedence over ©token namesª.
5863 ##
5864
5865 Token Not Used
5866
5867 "Token not used, TXXX: <token name> is a ©warningª
5868 message which appears if AnaGram finds an unused ©tokenª
5869 in your ©grammarª. Often an unused token is the result
5870 of an oversight of some kind and indicates a problem in
5871 the grammar.
5872 ##
5873
5874 Token Number
5875
5876 AnaGram assigns a unique number, called the "token
5877 number" to each token in the grammar, no matter whether
5878 it is a ©terminal tokenª or a ©nonterminal tokenª. Your
5879 parser does all of its analysis of your input stream
5880 using token numbers as its primary material.
5881
5882 You may need to know the values of token numbers that
5883 AnaGram has assigned, either so a lexical scanner can
5884 output correct token numbers, or so a ©reduction
5885 procedureª can correctly resolve a ©semantically
5886 determined productionª.
5887
5888 To help you, AnaGram defines enumeration constants for
5889 each of the named tokens in your grammar. The definition
5890 of these constants is in the ©parser headerª file.
5891 ##
5892
5893 Token Representation
5894
5895 Not all of the ©tokensª in your grammar have a ©token
5896 nameª. Some of the tokens may represent ©character setsª
5897 which you spelled out explicitly, ©virtual productionsª,
5898 ©immediate actionªs, or ©keywordªs. In its analysis
5899 tables, AnaGram tries to provide a meaningful
5900 representation for tokens whenever it can. Its first
5901 choice is to use the name, if it has one. Otherwise it
5902 will use the set definition or the definition of the
5903 virtual production if one exists. If AnaGram cannot
5904 otherwise represent your token, it will resort to using
5905 the token number which it normally represents using the
5906 letter T followed by a three digit, zero-padded token
5907 number.
5908 ##
5909
5910 Token Table
5911
5912 The Token Table lists all the tokens of your grammar.
5913 The first field is the token number. It is followed by a
5914 flag field which is "zl" if the token is a ©nonterminal
5915 tokenª and is ©zero lengthª. If the token is nonterminal
5916 and not zero length, the flag field contains "nt". If
5917 the token is a ©terminal tokenª, the field is blank.
5918
5919 The next field is blank unless the token has been
5920 declared ©stickyª or has had a ©precedenceª level
5921 assigned. If the token is sticky, this field will
5922 contain 's'. If a precedence level has been assigned,
5923 this field will contain the letter 'l', 'r', or 'n' to
5924 indicate associativity followed by the precedence
5925 level. Finally there is the ©data typeª of the ©semantic
5926 valueª of this token and the ©token representationª.
5927 ##
5928
5929 Token Usage
5930
5931 The Token Usage table may be accessed via the ©Auxiliary
5932 Windowsª menu from any window that identifies tokens. It
5933 shows all the rules in the grammar that use the token.
5934 ##
5935
5936 Top Margin
5937
5938 "Top margin" is an ©obsolete configuration parameterª.
5939 ##
5940
5941 Trace Coverage
5942
5943 Trace Coverage is a table which is built whenever you
5944 run ©Grammar Traceª, one of its pre-built versions, or a ©File
5945 Traceª. You can access it from the ©Browse Menuª. It shows the number
5946 of times each rule in your grammar has been reduced. Unless you have
5947 set the ©Rule Coverageª ©configuration switchª, some ©null productionªs
5948 and some rules that consist of only one element will not be counted
5949 because of speed optimizations in the parser tables.
5950
5951 The Trace Coverage tables are reset to zero when you load a new syntax
5952 file or start AnaGram.
5953 ##
5954
5955 Compound Action
5956
5957 Traditionally, ©LALR-1 parserªs use only four simple
5958 ©parser actionªs: shift, reduce, accept and error.
5959 AnaGram parsers use a number of compound actions
5960 in order to reduce the size of parse tables and
5961 speed up processing. A single compound action
5962 may replace several simple shift or reduce actions.
5963
5964 The ©Traditional Engineª ©configuration switchª may
5965 be used to force AnaGram to use only the simple
5966 actions.
5967 ##
5968
5969 Traditional Engine
5970
5971 "Traditional engine" is a ©configuration switchª that
5972 defaults to off. Traditional ©LALR-1 parserªs use a
5973 ©parsing engineª which has only four actions:
5974  ©shift actionª
5975  ©reduce actionª
5976  ©accept actionª
5977  ©error actionª
5978
5979
5980 AnaGram, in the interest of
5981 faster execution and more compact parse tables,
5982 uses a parsing engine with a number of
5983 short-cut, or ©compound actionªs. The "traditional engine" switch tells
5984 AnaGram not to use the short-cut actions.
5985
5986 You would turn this switch on if you wished to use the ©Grammar Traceª
5987 or ©File Traceª to see how the standard four parser actions work for
5988 a particular combination of grammar and input. Note that to see the
5989 effects of single parser actions, you must use the ©Single Stepª
5990 button. Remember that in the Grammar Trace, when you single step and
5991 the token you have selected causes a reduce action, it will appear
5992 on the ©lookahead lineª of the ©parser stack paneª and will be preselected
5993 in the ©allowable input paneª until it is finally shifted in to
5994 the parser stack.
5995
5996 Normally, you should leave the "traditional engine" switch off. Then
5997 AnaGram will, whenever possible, compress several parsing actions into
5998 one compound action in order to speed execution of the parser.
5999
6000 Unfortunately, use of the term "traditional" has sometimes created the
6001 impression that there is a conservative aspect to the operation of
6002 traditional engine parsers. This is not the case. They have the same
6003 effect, but are slower and have much larger tables.
6004 ##
6005
6006 Type Redefinition
6007
6008 "Type Redefinition of TXXX: <token name>" is a ©warningª
6009 message which appears when AnaGram finds a conflicting
6010 ©data typeª definition for a ©tokenª in your ©grammarª.
6011 The new definition will override the previous one. If
6012 you intend to use different type definitions, you should
6013 use extreme caution and check the generated code to
6014 verify that your ©reduction procedureªs are getting the
6015 values you intended.
6016 ##
6017
6018 Undefined Symbol
6019
6020 "Undefined symbol: <name>" is a ©warningª message which
6021 appears when AnaGram encounters an undefined ©symbolª
6022 while evaluating a ©character setª expression. The
6023 following warning in the ©Warningsª window identifies
6024 the particular ©tokenª AnaGram was trying to evaluate.
6025 ##
6026
6027 Undefined Token
6028
6029 "Undefined token TXXX: <name>" is a ©warningª message
6030 which appears when the indicated ©tokenª has been used
6031 in the ©grammarª, but there is no definition of it as a
6032 ©terminal tokenª nor does any ©productionª define it as
6033 a ©nonterminal tokenª.
6034 ##
6035
6036 Unexpected
6037
6038 "Unexpected <element 1> in <element 2>" is a ©warningª
6039 message which you may get when AnaGram analyzes your
6040 grammar. It appears when AnaGram unexpectedly encounters an instance of
6041 syntactic element 1 at the specified location in an instance of
6042 syntactic element 2. AnaGram cannot reliably continue parsing its
6043 input. Therefore, it limits further analysis to scanning for syntax
6044 errors. If this error is not the result of a prior error, you should
6045 correct your ©syntax fileª. Remember that this error could result from
6046 something missing just as well as from something extraneous.
6047
6048 If element 1 is ©eofª, it often means that you have
6049 an unbalanced brace or comment delimiter in the code
6050 following the indicated location.
6051 ##
6052
6053 Union
6054
6055 The union of two sets is the set of all elements that
6056 are to be found in one or another of the two sets. In an
6057 AnaGram syntax file the union of two ©character setsª A
6058 and B is represented using the plus sign, as in A + B.
6059 The union operator has the same precedence as the
6060 ©differenceª operator: lower than that of ©intersectionª
6061 and ©complementª. The union operator is ©left
6062 associativeª.
6063
6064 Watch out! In an AnaGram syntax file 65 + 97 represents
6065 the character set which consists of the lower case 'a'
6066 and upper case 'A'. It does not represent 162, the sum
6067 of 65 and 97.
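
To make the distinction concrete, here is a small sketch of set union over character codes. It models a character set as a 256-bit bitmap; the type and function names are hypothetical and say nothing about how AnaGram represents sets internally.

```c
/* A character set over codes 0..255, stored as a 256-bit bitmap. */
typedef struct { unsigned char bits[32]; } charset;

/* Returns the set containing only the given character code. */
static charset cs_single(int code) {
    charset s = {{0}};
    s.bits[code >> 3] |= (unsigned char)(1u << (code & 7));
    return s;
}

/* Returns the union of a and b: every code found in either set. */
static charset cs_union(charset a, charset b) {
    int i;
    for (i = 0; i < 32; i++) a.bits[i] |= b.bits[i];
    return a;
}

/* Returns 1 if code is a member of s, otherwise 0. */
static int cs_contains(charset s, int code) {
    return (s.bits[code >> 3] >> (code & 7)) & 1;
}
```

With these definitions, cs_union(cs_single(65), cs_single(97)) contains codes 65 and 97 and nothing else; in particular it does not contain 162.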
6068 ##
6069
6070 Video mode
6071
6072 "Video mode" is an ©obsolete configuration parameterª.
6073 ##
6074
6075 Virtual Production
6076
6077 Virtual productions are a special short hand
6078 representation of ©grammar rulesª which can be used to
6079 indicate a choice of inputs. They are an important
6080 convenience, especially useful when you are first
6081 building a grammar.
6082
6083 Here are some examples of virtual productions:
6084 name? // optional name
6085 name?... // 0 or more instances of name
6086 {name | number} // exactly one name or number
6087 {name | number}... // one or more instances of name or number
6088 [name | number] // optional choice of name or number
6089 [name | number]... // zero or more instances of name or number
6090
6091 AnaGram rewrites virtual productions, so that when you
6092 look at the syntax tables in AnaGram, there will be
6093 actual ©productionªs replacing the virtual productions.
6094
6095 A virtual production appears as one of the rule
6096 elements in a grammar rule, i.e. as one of the members
6097 of the list on the right side of a production.
6098
6099 The simplest virtual production is the "optional"
6100 token. If x is an arbitrary token, x? can be used to
6101 indicate an optional x.
6102
6103 Related virtual productions are x... and x?... where
6104 the three dots indicate repetition. x... represents an
6105 arbitrary number of occurrences of x, but at least one.
6106 x?... represents zero or more occurrences of x.
6107
6108 The remaining virtual productions use curly or square
6109 brackets to enclose a sequence of rules. The brackets
6110 may be followed variously by nothing, a string of three
6111 dots, or a slash and three dots, to indicate the choices to be made
6112 from the rules. Note that rules may be used, not merely
6113 tokens.
6114
6115 If r1 through rn are a set of ©grammar rulesª, then
6116 {r1 | r2 | ... | rn}
6117 is a virtual production that allows a choice of exactly
6118 one of the rules. Similarly,
6119 {r1 | r2 | ... | rn}...
6120 is a virtual production that allows a choice of one or
6121 more of the rules. And, finally,
6122 {r1 | r2 | ... | rn}/...
6123 is a virtual production that allows a choice of one or
6124 more of the rules subject to the side condition that
6125 rules must alternate, that is, that no rule can follow
6126 itself immediately without the interposition of some
6127 other rule. This is a case that is not particularly
6128 easy to write by hand, but is quite useful in a number
6129 of contexts.
6130
6131 If the above virtual productions are written with []
6132 instead of {}, they all become optional. [] is an
6133 optional choice, []... is zero or more choices, and
6134 []/... is zero or more alternating choices.
6135
6136 Null productions are not permitted in virtual
6137 productions in those cases where they would cause an
6138 intrinsic ambiguity.
6139
6140 You may use a ©definitionª statement to assign a name to
6141 a virtual production.
6142 ##
6143
6144 Void token
6145
6146 "Void token, <token name>, used as parameter" is a
6147 ©warningª message which appears if AnaGram encounters a
6148 ©data typeª definition declaring a ©tokenª to have type
6149 void when the token has previously been used in a
6150 ©parameter assignmentª for a ©reduction procedureª. Your
6151 C or C++ compiler will complain when it tries to compile
6152 the call to the reduction procedure.
6153 ##
6154
6155 vs
6156
6157 vs is a field in a ©parser control blockª to which your
6158 ©error handlingª procedures and ©reduction procedureªs
6159 may refer. It is the ©parser value stackª for your
6160 parser. The ©semantic valuesª of the ©tokensª identified
6161 by the parser are stored in the value stack. The value
6162 stack, like the other ©parser stacksª, is indexed by
6163 ©PCBª.©ssxª. When you are executing a reduction
6164 procedure, PCB.vs[PCB.ssx] contains the semantic value
6165 of the first token in the grammar rule you are reducing,
6166 PCB.vs[PCB.ssx+1] contains the second, and so forth. The
6167 return value from your reduction procedure will be
6168 stored in turn in PCB.vs[PCB.ssx].
6169
6170 vs is defined to be of type $_vt, where "$" represents
6171 the name of your parser. AnaGram defines $_vt to
6172 be a union of fields of sizes corresponding to all the
6173 different data types declared in your syntax for the
6174 semantic values of your tokens. In order to avoid
6175 restrictions on the use of C++ classes, the fields are
6176 defined as character arrays. On some processors which
6177 have byte alignment restrictions for multibyte data,
6178 you might encounter a bus error. To correct this
6179 problem, set the ©parser stack alignmentª parameter to
6180 an appropriate data type.
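
The indexing convention and the character-array layout can be sketched together. Everything below is a hypothetical stand-in: demo_vt plays the role of $_vt, demo_pcb the role of the parser control block, and the long member imitates the effect of the parser stack alignment parameter. Values are copied in and out with memcpy, which is one way to avoid misaligned access; it is not necessarily what the generated code does.

```c
#include <string.h>

/* Hypothetical stand-in for $_vt: a union of char arrays, one per data type. */
typedef union {
    long alignment;                  /* imitates "parser stack alignment" */
    char ag_int[sizeof(int)];
    char ag_double[sizeof(double)];
} demo_vt;

typedef struct {
    int ssx;                         /* index of the first token of the rule */
    demo_vt vs[16];                  /* the value stack */
} demo_pcb;

static int get_int(const demo_vt *slot) {
    int v;
    memcpy(&v, slot->ag_int, sizeof v);
    return v;
}

static void put_int(demo_vt *slot, int v) {
    memcpy(slot->ag_int, &v, sizeof v);
}

/* Reduces "sum -> term, '+', term": operands sit at ssx and ssx+2,
   and the result replaces the first of them at ssx. */
static void reduce_sum(demo_pcb *pcb) {
    int left = get_int(&pcb->vs[pcb->ssx]);
    int right = get_int(&pcb->vs[pcb->ssx + 2]);
    put_int(&pcb->vs[pcb->ssx], left + right);
}
```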
6181 ##
6182
6183 Warning
6184
6185 If, while analyzing your syntax file, AnaGram finds
6186 something suspicious, it is likely to issue a warning.
6187 The Warnings window will pop up automatically when the
6188 analysis has been completed. If the warning is for a
6189 ©syntax errorª in your input file, you will have to fix
6190 it, because AnaGram cannot successfully interpret it.
6191 Otherwise, AnaGram will be able to create a ©parserª for
6192 you, if you wish, no matter how serious the warnings may
6193 be.
6194
6195 You can bring up the Help topic associated with a highlighted warning
6196 by pressing F1 or by clicking with a ©Help Cursorª.
6197
6198 If you have syntax errors, AnaGram will synchronize the
6199 cursor in the ©syntax fileª window with the cursor in the
6200 Warnings window so that whenever the Warnings window is
6201 active, the cursor bar in the syntax file window will
6202 identify the location of the error.
6203
6204 ##
6205
6206 What's New
6207
6208 Changes in AnaGram 2.40
6209
6210 Most of the changes in AnaGram 2.40 are under the hood - cleanup of
6211 source files, reorganization of the source tree, revision of build and
6212 test procedures, and so forth, in preparation for the open source
6213 release. All of this will, with luck, be invisible to the end user.
6214
6215 Open Source
6216
6217 AnaGram is now ©open sourceª. AnaGram itself
6218 uses the 4-clause BSD ©licenseª; the ©parsing engineª, and thus the output
6219 files, are licensed with the less restrictive zlib ©licenseª. Source
6220 distributions are available from http://www.parsifalsoft.com.
6221
6222 The manual has been re-typeset using LaTeX instead of WordPerfect.
6223 The typographic consistency and formatting have been considerably
6224 improved; unfortunately, the pagination is now completely different,
6225 so page numbers are not portable to the new version.
6226
6227 All the logic dealing with registration, trial copies, serial numbers,
6228 and so forth has been removed.
6229
6230 Unix Support
6231
6232 The Unix build of the ©command line versionª of AnaGram (agcl) is now
6233 supported and available to the public. There is at present no GUI for
6234 the Unix version. The long-term goal is to migrate the AnaGram GUI
6235 away from the closed (and orphaned) IBM Visual Age class library to
6236 something else, probably GTK, so as to support both Windows and Unix.
6237
6238 Improved Functionality
6239
6240  Examples. The examples have been adjusted to the current dialect of
6241 C++ and are now compilable again. The legacy "classlib" code some
6242 still depend on is being phased out.
6243
6244 Increased Convenience
6245
6246  File names. File names in the AnaGram distribution and source
6247 tree are no longer limited to 8+3 characters, and quite a few now have
6248 less cryptic names. Additionally, all HTML files are now named ".html",
6249 not ".htm".
6250
6251  Installed files. The AnaGram.cgb and AnaGram.hlp files found in
6252 older releases of AnaGram no longer exist; their contents are compiled
6253 into the AnaGram executables instead.
6254
6255 Bug Fixes
6256
6257  Engine compiler error. The ©error_messageª field of the PCB has
6258 been changed to const char * so current C++ compilers will accept the
6259 code generated when ©diagnose errorsª is turned off.
6260
6261  Multiple output header files. Including more than one AnaGram
6262 output header file at once used to cause some compilers to issue a
6263 warning, because an #ifndef directive was checking the wrong
6264 symbol. This has been corrected.
6265
6266  Wrappers and error tokens. AnaGram 2.01 generated uncompilable
6267 code if you tried to use the ©wrapperª feature and error token
6268 resynchronization at the same time. This has been corrected.
6269
6270  More than 256 keywords. Build 8 of AnaGram 2.01 fixed certain
6271 problems with large keyword tables, but in the process introduced
6272 another, which is now fixed.
6273
6274 For changes in the previous versions of AnaGram, see ©What's New in AnaGram
6275 2.01ª and ©What's New in AnaGram 2.0ª.
6276
6277 ##
6278
6279 What's New in AnaGram 2.01
6280
6281 Changes in AnaGram 2.01
6282
6283 Improved Functionality
6284
6285  Improved support for building ©thread safe parsersª. All
6286 nonconstant parser data previously declared as static variables has been
6287 moved to the ©parser control blockª. When the ©reentrant parserª switch
6288 is set, all references to the parser control block are passed to functions
6289 via calling sequences. The ©extend pcbª statement provides a mechanism to
6290 add user-defined variables to the parser control block.
6291
6292  Improved support for C++ parsers. The ©wrapperª statement
6293 provides C++ wrapper classes for objects to be stored on the ©parser value stackª.
6294 The ©PCB_TYPEª macro allows you to derive a C++ class from the parser control
6295 block and to access its members from your ©reduction proceduresª.
6296
6297  Support for the ©ISO Latin 1ª character set. When using
6298 the ©case sensitiveª switch, case conversion is performed for all ISO-Latin-1
6299 characters, not just those in the ASCII range.
6300
6301  Improved support for error diagnostics. It is now possible for users
6302 to provide their own text for the error messages created by the ©diagnose errorsª
6303 switch. In addition, the ©token namesª table option now includes ascii representation
6304 of individual characters and keywords instead of only named tokens. The ©token names
6305 onlyª switch can be used for compatibility with previous versions of AnaGram.
6306
6307  More precise determination of error context. The tables used by the ©error frameª
6308 option to provide the context of a syntax error have been reworked and now provide
6309 a substantially more precise localization of the error.
6310
6311 Improved error diagnostics in AnaGram
6312
6313  ©Missing reduction procedureª diagnostic.
6314 In addition to warning that there is a ©parameter assignmentª
6315 without a ©reduction procedureª, this
6316 diagnostic is now provided if the ©default reduction valueª
6317 does not have the same ©data typeª as the ©reduction tokenª.
6318
6319  ©Command line versionª. Diagnostics have been reformatted so
6320 they can be recognized by the Microsoft Visual C++ IDE.
6321
6322  Refined ©keyword anomalyª diagnostics. There should
6323 now be fewer false alarms.
6324
6325 Increased Convenience
6326
6327  ©File Traceª. If your grammar uses ©semantically determined productionsª,
6328 the File Trace feature will now remember the choices you have
6329 made for ©reduction tokenªs, so that you do not have to make
6330 the same choices over and over again as you work with an example.
6331
6332  File Paths. The file paths in the #line directives created by the ©line numbersª
6333 switch now use forward slashes instead of backslashes.
6334
6335 Changed Defaults
6336
6337  ©Parser stack alignmentª. Now defaults to long instead of int.
6338  ©Parser stack sizeª. Now defaults to 128 instead of 32.
6339
6340 Bug Fixes
6341
6342  Interaction between context tracking and error token. In previous
6343 versions of AnaGram, if the first token in a rule was the ©error tokenª,
6344 the value of ©CONTEXTª was the value that corresponded to the location
6345 of the error. CONTEXT now correctly shows the context at which the
6346 aborted rule began. For instance, in the following example, if a
6347 syntax error is encountered while parsing the expression, the error
6348 rule will skip over remaining characters to the terminating semicolon.
6349 When invoked from handleError(), the CONTEXT macro will return the
6350 context as it was at the beginning of the expression.
6351 expression statement
6352 -> expression, ';'
6353 -> error, ~(eof + ';')?..., ';' =handleError();
6354
6355  ©Distinguish lexemesª. Several minor bugs in the implementation of distinguish lexemes have been
6356 corrected.
6357
6358  Set partition logic. Corrected problems in the interaction between the set ©partitionª logic
6359 and the implementation of the ©disregardª statement.
6360
6361  Table size. Fixed a data sizing problem which occurred when one particular parse table
6362 had precisely 256 entries.
6363
6364  Keyword recognition. Fixed a problem that could cause difficulties with ©keywordª
6365 recognition when the ©case sensitiveª switch was turned off.
6366
6367  Default conflict resolution. With unresolved ©shift-reduce conflictªs, the shift case was
6368 not always being selected. This problem has been corrected.
6369
6370  Lockup. It was possible to write an erroneous grammar that would cause
6371 AnaGram to lock up. This problem has been corrected.
6372
6373  Potential bus error. The error diagnostic function created by the ©diagnose errorsª
6374 switch could, under some circumstances, access an uninitialized value
6375 on the ©parser value stackª. This problem has been corrected.
6376
6377  Internal errors. Fixed a number of minor bugs which could cause ©internal errorªs
6378 while running ©File Traceª.
6379
6380 For changes in the previous version of AnaGram, see ©What's New in AnaGram 2.0ª.
6381 ##
6382
6383 What's New in AnaGram 2.0
6384
6385 AnaGram's user interface has been completely revamped to make it more
6386 convenient and easier to use. However, the same tried and true AnaGram
6387 algorithms are still in place to build your parsers. The rules for
6388 syntax files are also unchanged.
6389
6390 The ©File Traceª and ©Grammar Traceª facilities have each had their
6391 windows combined into a single unit, and a ©Rule Stackª synched with
6392 these windows and with your syntax file window has been added. The
6393 Rule Stack is particularly convenient for relating the progress of the
6394 parse to the ©grammar rulesª in your ©syntax fileª.
6395
6396 A ©text entryª field has also been added to the Grammar Trace. This
6397 means you can provide character input to your parser in much the same
6398 way you can with a ©test fileª in File Trace, but with instant control
6399 over the input.
6400
6401 Some further controls have been added to both File and Grammar Traces.
6402 In particular there is a Reset button to reset the trace to its initial
6403 state. This is particularly useful for ©Conflict Traceªs.
6404
6405 AnaGram now has a small ©Control Panelª (default position is at the
6406 upper right of the screen) from which you can conveniently control
6407 operation. A menu bar provides access to the various commands and
6408 tables. There are toolbar buttons for Analyze Grammar, Build Parser,
6409 File Trace, and so on. The panel also has a data entry field for
6410 entering search keys.
6411
6412 You can set both colors and fonts in AnaGram windows to suit your own
6413 preferences. We suggest you check Help for ©Colorsª or ©Fontsª before
6414 making changes to make sure that all information will still be properly
6415 displayed.
6416
6417 AnaGram's ©Helpª has been updated to provide hypertext-type links. But
6418 you can still keep multiple Help windows on view at once. A popup menu
6419 shows all the links in a window. New topics have been added. Also,
6420 further documentation topics are provided in HTML format in the html
6421 subdirectory.
6422
6423 A ©Help Cursorª on the Control Panel toolbar can be used to get help for
6424 most AnaGram windows, buttons and menu items. F1 can also be used.
6425
6426 On the ©Action Menuª you will find a list of your most recently used
6427 syntax files. Just click on the file of your choice to have AnaGram
6428 analyze it (or build it if ©Autobuildª is on).
6429 ##
6430
6431 White Space
6432
6433 In many grammars it is desirable to pass over blanks,
6434 tabs, and similar characters, as well as comments,
6435 collectively termed "white space", as though they were
6436 not there. The "©disregardª" statement in AnaGram may
6437 be optionally used to accomplish this. The "©lexemeª"
6438 statement may be used to exercise fine control over the
6439 scope of the disregard statement.
6440 ##
6441
6442 Wrapper
6443
6444 The wrapper ©attribute statementª provides correct handling of C++
6445 objects returned by ©reduction procedureªs.
6446
6447 If you specify a wrapper for a C++ object, then, when a reduction
6448 procedure returns an instance of the object, a copy of the object will
6449 be constructed on the ©parser value stackª and the destructor will be
6450 called when the object is removed from the stack.
6451
6452 Without a wrapper, objects are stored on the value stack simply
6453 by coercing the stack pointer to the appropriate type.
6454 There is no constructor call when the object is stored nor
6455 a destructor call when it is removed from the stack.
6456
6457 Classes which use reference counts or otherwise overload the
6458 assignment operator should always have wrappers in order to
6459 function correctly.
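
The difference between the two storage disciplines can be sketched in
C++. This is only an illustration, not the wrapper class AnaGram
actually generates: Tracked, StackSlot, storeWrapped, removeWrapped and
demoLiveCount are hypothetical names, and the instance counter exists
only to make the constructor and destructor calls observable.

```cpp
#include <new>

// Hypothetical class standing in for something like CString. It counts
// live instances so construction and destruction can be observed.
struct Tracked {
    static int live;
    int value;
    explicit Tracked(int v) : value(v) { ++live; }
    Tracked(const Tracked &other) : value(other.value) { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// Hypothetical stand-in for one slot of a parser value stack: raw,
// suitably aligned storage with no constructor or destructor of its own.
struct StackSlot {
    alignas(Tracked) unsigned char bytes[sizeof(Tracked)];
};

// Wrapper-like store: copy-construct the object in place on the stack.
Tracked *storeWrapped(StackSlot &slot, const Tracked &obj) {
    return new (slot.bytes) Tracked(obj);   // placement new
}

// Wrapper-like removal: run the destructor explicitly.
void removeWrapped(Tracked *p) {
    p->~Tracked();
}

// Returns how many instances were released by removing the stack copy.
int demoLiveCount() {
    StackSlot slot;
    Tracked original(42);
    Tracked *onStack = storeWrapped(slot, original);
    int during = Tracked::live;             // original plus stack copy
    removeWrapped(onStack);
    return during - Tracked::live;          // the stack copy was destroyed
}
```

If the copy were instead placed in the slot by a raw byte copy, neither
the copy constructor nor the destructor would run, and a
reference-counted class would lose track of its references.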
6460
6461 Wrapper statements, like other ©attribute statementsª, must appear in
6462 configuration sections. The syntax is simply
6463 wrapper { <comma delimited list of data types> }
6464
6465 For example:
6466 [
6467 wrapper {CString, CFont}
6468 ]
6469
6470 You cannot specify a wrapper for the ©default token typeª.
6471
6472 If your parser exits with an error condition, there may be
6473 objects remaining on the stack. The ©DELETE_WRAPPERSª macro
6474 may be used to delete these objects. If you have enabled
6475 ©auto resynchª, DELETE_WRAPPERS will be invoked automatically.
6476
6477 The ©AG_PLACEMENT_DELETE_REQUIREDª macro is used to control
6478 definition of a "placement delete" operator in the wrapper
6479 class AnaGram defines.
6480 ##
6481
6482 Zero Length
6483
6484 A zero length ©tokenª is a ©reduction tokenª which can
6485 be matched by a void, i.e. by nothing at all. It
6486 represents an optional item, or a sequence of optional
6487 items, in the input. Since the matching process can
6488 involve several levels of reductions, it is most precise
6489 to use the following recursive definition: A zero length
6490 token is one which either has at least one ©null
6491 productionª or has at least one grammar rule defining it
6492 such that all the tokens in the rule are zero length
6493 tokens.
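
The recursive definition above amounts to a fixed-point computation. A
minimal C++ sketch follows; it is not AnaGram's own implementation, and
the Rule structure and the zeroLengthTokens function are names assumed
for the illustration. A rule with an empty right side stands for a null
production.

```cpp
#include <set>
#include <string>
#include <vector>

// One grammar rule: the token it defines and the tokens on its right
// side. An empty right side represents a null production.
struct Rule {
    std::string lhs;
    std::vector<std::string> rhs;
};

// Repeatedly scan the rules, marking a token as zero length when some
// rule defining it consists only of tokens already marked (or of no
// tokens at all). Stop when a full pass adds nothing new.
std::set<std::string> zeroLengthTokens(const std::vector<Rule> &rules) {
    std::set<std::string> zero;
    bool changed = true;
    while (changed) {
        changed = false;
        for (const Rule &r : rules) {
            if (zero.count(r.lhs)) continue;     // already marked
            bool allZero = true;                 // true for a null production
            for (const std::string &t : r.rhs) {
                if (!zero.count(t)) { allZero = false; break; }
            }
            if (allZero) {
                zero.insert(r.lhs);
                changed = true;
            }
        }
    }
    return zero;
}
```

Tokens that never appear on the left side of a rule are terminals and
are never marked, so a rule that contains one cannot make its token
zero length.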
6494
6495 Care should be taken when using ©zero lengthª tokens in
6496 ©recursive ruleªs. If all the tokens in the rule other than
6497 the recursive token itself are zero length tokens,
6498 the rule will cause an infinite loop in the generated
6499 parser.
6500
6501 The ©Token Tableª identifies zero length tokens because
6502 the use of such tokens sometimes inadvertently causes
6503 ©conflictªs.
6504 ##
6505
6506 Control Panel
6507
6508 The AnaGram Control Panel appears at the upper right of your monitor
6509 when you start AnaGram. It has a menu bar, command buttons, a button
6510 which enables a ©help cursorª, and a ©status indicatorª. At the lower
6511 left you will see a data entry field for entering ©searchª
6512 keys, with neighboring search forward and search backward buttons.
6513
6514 Notice that the ©Options Menuª has a "Stay On Top" entry which
6515 allows you to specify whether the Control Panel stays on top of
6516 other AnaGram windows.
6517 ##
6518
6519 Status Indicator
6520
6521 The status indicator at the right of the AnaGram
6522 Control Panel shows the status of the ©current grammarª:
6523 Ready
6524 Loaded
6525 Error
6526 Parsed
6527 Analyzed
6528 Built
6529
6530 "Ready" appears only when no grammar has been selected.
6531
6532 "Loaded" and "Parsed" are normally transitory.
6533
6534 "Error" means at least one syntax error has been detected
6535 in your grammar and AnaGram cannot continue. Check the
6536 Warnings window to determine the nature of the problem.
6537
6538 "Analyzed" means that a ©grammar analysisª has been
6539 completed, but no ©output filesª have been written.
6540
6541 "Built" means that an analysis has been completed and
6542 output files have been written.
6543 ##
6544
6545 Help Cursor
6546
6547 The Help Cursor is accessed via the button with the question mark on
6548 AnaGram's ©Control Panelª. It is convenient for getting help on
6549 ©Warningªs, browse tables, menu items and so on.
6550
6551 If you click on the button you enable the Help Cursor, which you can
6552 then drag with the mouse. A further mouse click will provide help
6553 for the item underneath the cursor.
6554
6555 Note further that AnaGram also has F1 help which you may find
6556 simpler and faster than the Help Cursor.
6557 ##
6558
6559 Search
6560
6561 AnaGram has a simple search facility to let you search for text strings
6562 in AnaGram windows. A data entry field on the ©Control Panelª is
6563 provided for you to enter text. Left-clicking on the neighboring
6564 buttons lets you search either forward or backward for a line in the
6565 active window which contains at least one instance of the text.
6566
6567 Note that a forward search begins at the line after the highlighted
6568 line, and a backward search begins at the line preceding the
6569 highlighted line.
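
The starting-line rule can be sketched in C++ as follows. This is an
illustration only: searchLines is a hypothetical function, and the
sketch assumes a case-sensitive match with no wraparound at the ends of
the window, neither of which is stated in this topic.

```cpp
#include <string>
#include <vector>

// Returns the index of the first line containing key, scanning from
// the line after (forward) or before (backward) the highlighted line.
// Returns -1 if no such line exists. Assumes no wraparound.
int searchLines(const std::vector<std::string> &lines,
                const std::string &key,
                int highlighted,
                bool forward) {
    const int step = forward ? 1 : -1;
    const int count = static_cast<int>(lines.size());
    for (int i = highlighted + step; i >= 0 && i < count; i += step) {
        if (lines[i].find(key) != std::string::npos) {
            return i;
        }
    }
    return -1;
}
```

Because the scan starts one line away from the highlighted line,
repeating the same search moves on to the next match rather than
finding the current line again.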
6570 ##
6571
6572 Search Key
6573
6574 To find a text string in an AnaGram window, enter the
6575 string in the Search Key field in the ©Control Panelª
6576 and press Enter.
6577
6578 To find another instance of the string click on the
6579 ©Find Nextª button or press F3.
6580
6581 To find a previous instance of the string click on
6582 the ©Find Previousª button or press F4.
6583
6584 In windows that have a cursor bar, a forward search
6585 begins on the line following the cursor and a backward
6586 search begins on the line preceding the cursor.
6587 ##
6588
6589 Find Next
6590
6591 The Find Next key, on the ©Control Panelª immediately
6592 to the right of the ©Search Keyª field, locates
6593 the next instance of the search key in the most recently
6594 active AnaGram window. F3 is the keyboard equivalent.
6595 ##
6596
6597 Find Previous
6598
6599 The Find Previous key, on the ©Control Panelª immediately
6600 to the right of the ©Find Nextª key, searches
6601 backwards for the search key in the most recently
6602 active AnaGram window. F4 is the keyboard equivalent.
6603 ##
6604
6605 Fonts, Set Fonts
6606
6607 The Set Fonts dialog allows you to use the fonts of your choice in
6608 AnaGram windows. You should make sure that the ©marked tokenªs font is
6609 very distinctive so that marked tokens will show up clearly even if
6610 they are only 1 or 2 characters long. Sometimes it is helpful to use an
6611 underlined font for marked tokens.
6612
6613 A Default button at the bottom of the dialog lets you revert to
6614 AnaGram's original fonts if you wish.
6615 ##
6616
6617 Colors, Set Colors
6618
6619 The Set Colors dialog allows you to change the colors of
6620 AnaGram windows. Notice that in the ©File Traceª the ©test file paneª
6621 requires three different sets of text and background colors. You
6622 should make sure that the backgrounds, at least, can be easily
6623 distinguished from each other so the trace information can be
6624 properly displayed. You also want to take care that an active pane in
6625 a File Trace or Grammar Trace can be distinguished from inactive
6626 panes.
6627
6628 The Default button at the bottom of the dialog lets you revert to
6629 AnaGram's original colors if you wish.
6630
6631 Color changes pertain only to the client areas of AnaGram windows. The
6632 remaining parts of your windows will have the customary colors you have
6633 chosen for your system.
6634 ##
6635
6636 Marked Token
6637
6638 Some tables and trace panes display each rule with one token marked to
6639 show how far parsing has progressed in the rule. The marked token is
6640 the next input expected in the input stream. It is shown in a different
6641 font to distinguish it from other tokens in the rule. If no token is
6642 marked, the rule is a ©completed ruleª, i.e. it has been completely
6643 matched and will be reduced by the next input.
6644
6645 You can set the font for marked tokens by choosing Fonts from the
6646 ©Options Menuª. You should make sure that the font is very distinctive so
6647 that marked tokens will show up clearly even if they are only 1 or 2
6648 characters long. Sometimes it is helpful to use an underlined font for
6649 marked tokens.
6650 ##
6651
6652 Synch Parse
6653
6654 The Synch Parse button replaces the ©Single Stepª button on the
6655 toolbar of the ©File Trace windowª when, for some reason, the
6656 location of the blinking cursor in the ©test file paneª differs from
6657 the current parse position. This can occur when you single click in
6658 the test file pane or when the parse cannot track the cursor because
6659 of a ©syntax errorª or a ©semantically determined productionª.
6660
6661 Click the synch parse button to resynch the parse with the cursor.
6662 ##
6663
6664
6665 Single Step
6666
6667 The Single Step button is one of the control buttons for the ©File
6668 Traceª and ©Grammar Traceª. It advances the parse one ©parser
6669 actionª at a time. In the File Trace, it is replaced with the "©Synch
6670 Parseª" button whenever the blinking cursor loses synch with
6671 the current parse location.
6672
6673 In the Grammar Trace, the Single Step button takes its input from the
6674 Allowable Input pane, the Reduction Choices pane, or the ©text entryª
6675 field, depending on which is active.
6676 ##
6677
6678 Proceed
6679
6680 The Proceed button is one of the control buttons for the
6681 ©Grammar Traceª. If the ©Reduction Choices paneª or the ©Allowable
6682 Input paneª is active, Proceed parses the highlighted token
6683 until it is shifted in to the ©parser stackª. If the ©text entryª
6684 field is active, Proceed parses all text in the field. If a
6685 ©syntax errorª is encountered, the parse stops and all ©reduce
6686 actionªs are undone.
6687
6688 Note that selecting a token in Allowable Input can cause a syntax
6689 error under certain circumstances. This can happen only if the
6690 following conditions are all true:
6691  the indicated operation is a ©reductionª,
6692  the reduction token for the rule being reduced has been used in several
6693 different contexts in the grammar
6694  and the specified token may
6695 follow it in some contexts and not in others.
6696 ##
6697
6698 Reduction Choices Pane
6699
6700 The ©File Traceª and ©Grammar Traceª display a Reduction Choices
6701 pane when they need to reduce a ©semantically determined productionª.
6702
6703 The rule to be reduced is highlighted in the ©rule stack paneª.
6704 If the ©syntax fileª window is visible, it shows the rule in
6705 context in your grammar.
6706
6707 The Reduction Choices pane lists all possible ©reduction tokenªs for
6708 the specified rule. The first reduction token that is admissible in
6709 the current context is highlighted and it appears
6710 as the ©lookahead tokenª in the ©parser stack paneª. The text that
6711 comprises the entire rule is highlighted in the ©test file paneª.
6712
6713 Select the desired reduction token before continuing with the parse.
6714
6715 If you select a token and it does not appear as the lookahead token,
6716 it is not syntactically correct in the current context. If you try
6717 to proceed with the parse, you will get a ©selection errorª.
6718 ##
6719
6720 Selection Error
6721
6722 The ©Parse Statusª field indicates a "selection error" if you
6723 choose a ©reduction tokenª from the ©Reduction Choices paneª of
6724 a ©File Traceª or ©Grammar Traceª and the selected token is not
6725 syntactically correct in the current context.
6726 ##
6727
6728 Parser Stack Pane
6729
6730 The Parser Stack pane, the upper left pane of the ©File Traceª and
6731 ©Grammar Traceª windows, displays the ©parser stackª for the current
6732 trace.
6733
6734 Each line corresponds to one level in the parser state stack. It shows
6735 the stack index, the ©parser stateª for that level, and the ©tokenª which
6736 was seen at that state. The last line of the stack, the ©lookahead
6737 lineª, corresponds to the current state of the parser. Since no input
6738 has yet been processed for this state, the token, if any, which
6739 appears at this level is a ©lookahead tokenª.
6740
6741 If you move the cursor in the Parser Stack pane of a File Trace,
6742 the text that makes up the selected token will be
6743 highlighted in the ©Test File paneª. You can back the parse up to
6744 any desired stack level by double clicking at the beginning of the
6745 token text in the Test File pane.
6746
6747 Similarly, if you move the cursor bar in the Parser Stack pane of a
6748 Grammar Trace, the ©Allowable Input paneª will change to display the
6749 allowable tokens in the selected state. The previously
6750 selected token will be highlighted. Then, double click on any token in
6751 the Allowable Input pane to back the parse up and choose a token
6752 a second time.
6753
6754 The ©Rule Stack paneª of the File or Grammar Trace is also synched
6755 to the Parser Stack pane. If the ©syntax fileª window is visible, it
6756 will be synched to show the rule currently selected in the rule
6757 stack pane. Note that rules that have been automatically generated
6758 by the expansion of ©virtual productionsª cannot be synched, so the
6759 top line of the syntax file will be highlighted instead.
6760
6761 In the Grammar Trace, the last line of the Parser Stack may or may not
6762 display a ©lookahead tokenª, depending on the last ©parser actionª
6763 performed. If input was taken from Allowable Input and the last
6764 action was a simple ©reduce actionª, the last input token selected
6765 will be displayed as the lookahead input. But if the last action
6766 performed shifted the token in, the lookahead field will be empty.
6767
6768 If you right-click on a highlighted line in the Parser Stack pane, you will
6769 get a pop-up menu to give you more information. In particular you can
6770 get an ©Auxiliary Traceª starting at the current point in your File or
6771 Grammar Trace, so you can explore various possibilities without losing
6772 your position in the old trace.
6773 ##
6774
6775 Exit
6776
6777 Select this entry from the ©Action Menuª to terminate AnaGram.
6778 ##
6779
6780 Allowable Input, Allowable Input Pane
6781
6782 The upper right pane of the ©Grammar Traceª window lists the
6783 allowable input tokens for the current state of the ©grammarª.
6784
6785 The tokens in the Allowable Input pane are listed in two groups:
6786 first, the ©terminal tokensª allowable in this state, and
6787 second, the ©nonterminal tokensª. Between these two groups of tokens
6788 is inserted a line which is either an option for a ©default reductionª,
6789 or declares that there is no default action.
6790
6791 Double click, press Enter, or click the ©Proceedª button to
6792 parse the highlighted token. When all parse actions triggered
6793 by the highlighted token have been completed, all panes of the trace
6794 will be redrawn to show the new state of the parser.
6795
6796 Note that selecting a token in Allowable Input can cause a syntax
6797 error under certain circumstances. This can happen only if the
6798 following conditions are all true:
6799  the indicated operation is a ©reductionª,
6800  the reduction token for the rule being reduced has been used in several
6801 different contexts in the grammar
6802  and the specified token may
6803 follow it in some contexts and not in others.
6804
6805 If you wish to see the results of a single parser action, click
6806 on the ©single stepª button. The parser will perform a single
6807 parser action. If the
6808 token you selected was not shifted in, it will now be displayed
6809 as the ©lookahead tokenª on the last line, the ©lookahead lineª in
6810 the ©Parser Stack paneª, and will be preselected in the Allowable
6811 Input pane.
6812
6813 Because AnaGram, by default, uses a number of compound
6814 parser actions, this situation does not arise very often unless you
6815 have set the ©traditional engineª switch or reset the ©default
6816 reductionsª switch. Usually you will want to select the same token to
6817 proceed, but it is not necessary.
6818
6819 The Allowable Input pane also displays
6820 the ©parser actionª associated with a specific token. If it is
6821 not a ©compound actionª, the action and its result are also shown.
6822
6823 The ©parser actionª field for a token may be interpreted as follows: If
6824 this token would cause a shift to a new state, the action field is ">>"
6825 followed by the new state number. If the token would cause a
6826 ©reductionª, the action field is "<<" followed by a ©rule numberª to
6827 show the rule reduced. If the parser action is a compound action, the
6828 action field is blank. If the token would cause the grammar to be
6829 accepted, the action field is "Accept".
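
The interpretation rules above can be summarized in a short C++ sketch.
ActionKind and actionField are hypothetical names used only for this
illustration, and the sketch assumes that the number follows the ">>"
or "<<" marker with no separator.

```cpp
#include <string>

// The four cases described for the parser action field.
enum class ActionKind { Shift, Reduce, Compound, Accept };

// Builds the action field text for a token. For Shift, number is the
// new state number; for Reduce, it is the rule number. It is ignored
// for Compound and Accept.
std::string actionField(ActionKind kind, int number = 0) {
    switch (kind) {
    case ActionKind::Shift:
        return ">>" + std::to_string(number);
    case ActionKind::Reduce:
        return "<<" + std::to_string(number);
    case ActionKind::Compound:
        return "";                  // compound actions show a blank field
    case ActionKind::Accept:
        return "Accept";
    }
    return "";
}
```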
6830
6831
6832 The ©text entryª field at the bottom of the Grammar Trace can be
6833 used as a convenient alternative to the Allowable Input pane. It
6834 accepts characters rather than tokens. Most non-printing characters
6835 such as newline are only available from Allowable Input.
6836 ##
6837
6838 Copy
6839
6840 The Copy command on the ©Windows Menuª copies the currently active
6841 table or Help topic to the clipboard.
6842 ##
6843
6844 Statistical Summary
6845
6846 While your grammar is being analyzed, a Statistical Summary window
6847 pops up to show you the progress of the analysis. Unless you have
6848 turned off ©Show Statisticsª on the ©Options Menuª, this window will remain
6849 on-screen for your reference. Among other things, it shows you the
6850 number of rules and states in your grammar, and the number of conflicts
6851 and warnings, if any.
6852
6853 Note that if your grammar is small and you have Show Statistics turned
6854 off, the appearance of this window on your monitor may be exceedingly
6855 brief - you may just see a flash.
6856
6857 If the window is turned off or you have closed it, you can get it from
6858 the ©Browse Menuª.
6859 ##
6860
6861 Stay On Top
6862
6863 The Stay On Top entry in the ©Options Menuª allows you to specify whether
6864 the ©Control Panelª stays on top of other AnaGram windows.
6865 ##
6866
6867 Show Syntax
6868
6869 If this entry in the ©Options Menuª is checked, AnaGram will display the
6870 ©syntax fileª when it has analyzed your ©grammarª. If this entry is not checked
6871 or you have closed the syntax file window, you can select the window
6872 from the ©Browse Menuª.
6873 ##
6874
6875 Show Statistics
6876
6877 If this entry in the ©Options Menuª is checked, AnaGram will leave the
6878 ©Statistical Summaryª on the screen after it has analyzed your ©grammarª. If
6879 this entry is not checked or you have closed the Statistical Summary
6880 window, you can select the window from the ©Browse Menuª.
6881 ##
6882
6883 About AnaGram
6884
6885 Select this entry from the ©Help Menuª to find out the version and
6886 serial numbers of your copy of AnaGram, and how to contact Parsifal
6887 Software.
6888 ##
6889
6890 Help Topics
6891
6892 Select Help Topics from the ©Help Menuª to get a complete list of AnaGram
6893 Help Topics titles. You can bring up the window for a highlighted topic
6894 by double-clicking with the left mouse button, pressing F1, or using
6895 the ©Help Cursorª.
6896 ##
6897
6898 Cascade Windows
6899
6900 Select this entry from the ©Windows Menuª to cascade your open windows
6901 starting at the top left of the screen.
6902 ##
6903
6904 Close Windows
6905
6906 Select this entry from the ©Windows Menuª to close all open windows
6907 except the ©Control Panelª. You may also close the active window
6908 by pressing the Escape key.
6909 ##
6910
6911 Hide Windows
6912
6913 Select this entry from the ©Windows Menuª to hide all open windows
6914 except the ©Control Panelª. Restore them to the screen with ©Restore
6915 Windowsª.
6916 ##
6917
6918 Restore Windows
6919
6920 Use this command on the ©Windows Menuª to restore to the screen
6921 any windows you have previously hidden with ©Hide Windowsª.
6922 ##
6923
6924 Token Input, Preprocessor, Lexical Scanner
6925
6926 AnaGram makes it unnecessary, in most cases, to have a separate
6927 preprocessor to provide the ©tokensª which are fed to your parser.
6928
6929 However, in some cases you may want to use a preprocessor, or lexical
6930 scanner, to provide input to your parser. The preprocessor may
6931 or may not be written in AnaGram. If it sends the parser token
6932 numbers, as opposed to character codes, this is referred to as token
6933 input, as opposed to character input. Please refer to the AnaGram
6934 User's Guide for information on identifying the tokens to the parser
6935 and providing their semantic values, if any.
6936
6937 Since a ©File Traceª is based on character codes, it will be greyed out
6938 on the ©Action Menuª if you have token input. For a ©Grammar Traceª,
6939 entering characters in the ©text entryª field is not appropriate and
6940 will simply cause a syntax error.
6941 ##
6942
6943 Lookahead Line
6944
6945 The last line of the ©Parser Stack paneª, the "lookahead" line,
6946 will sometimes show a ©lookahead
6947 tokenª, and sometimes not. In a ©File Traceª, you will always see a
6948 lookahead token because it is available from the ©test fileª.
6949
6950 In a ©Grammar Traceª you will usually see a lookahead token only when
6951 you have used the ©Single Stepª button or if there is available
6952 input in the ©text entryª field. In the latter case the token
6953 corresponding to the first character of the input will appear on the
6954 lookahead line.
6955
6956 If you click Single Step after selecting a token from ©Allowable
6957 Inputª and it causes only a simple ©reduce actionª (as opposed to a
6958 shift or a compound action), then, upon completion of the reduction,
6959 the token you selected will appear on the lookahead line and also
6960 will be preselected in Allowable Input.
6961
6962 Usually you would select
6963 this token for the next parse step. However, if there are other
6964 possible inputs in this state, the parse theoretically could have
6965 arrived at this state by a different sequence of input tokens. Thus,
6966 if you are more interested in the behavior of the parser at this
6967 state than in the response of the parser to a particular sequence of
6968 inputs, it is perfectly valid to select a different input token, and
6969 AnaGram will let you do it.
6970
6971 Note that if you have enabled the ©traditional engineª switch or
6972 disabled the ©default reductionsª switch, the
6973 probability of finding a token which does a simple reduction is
6974 noticeably higher than otherwise.
6975 ##
6976
6977 Action Menu
6978
6979 The Action menu begins with the ©Analyze Grammarª and ©Build Parserª
6980 commands. If a grammar has already been analyzed, but not yet built,
6981 there will also be an extra Build command bearing the name of your
6982 syntax file.
6983
6984 There are also ©Reanalyzeª and ©Rebuildª commands which are
6985 initially greyed out. They become available if you change the
6986 current syntax file.
6987
6988 The next section has ©File Traceª and ©Grammar Traceª
6989 commands. If you have enabled the ©Error Traceª
6990 ©configuration switchª, this section also shows an
6991 Error Trace command.
6992
6993 The menu ends with an ©Exitª command
6994 and a list of recently used syntax files, if any. Just
6995 click on a syntax file name to have AnaGram analyze it, or
6996 build it if the ©Autobuildª option is on.
6997 ##
6998
6999 Browse Menu
7000
7001 Initially, the Browse Menu shows only a single entry:
7002 ©Configuration Parametersª which lets you see the
7003 current state of configuration parameters before any
7004 may have been set by your syntax file. Once you have
7005 analyzed a grammar, this menu fills up with many tables
7006 containing information about your grammar. You can also
7007 bring up a window showing your ©syntax fileª from this menu.
7008 If your grammar has generated ©syntax errorªs or warnings, or
7009 contains conflicts, there will be ©Warningªs or ©Conflictªs
7010 entries.
7011 ##
7012
7013 Options Menu
7014
7015 From this menu you can select a ©Fontsª or ©Colorsª dialog so you can
7016 set AnaGram's fonts and colors to suit your own tastes. You can set
7017 ©Autobuildª if you want AnaGram to automatically build your ©grammarª
7018 when you select a ©syntax fileª from the ©Action Menuª. You can also
7019 choose whether or not to automatically show the ©Statistical Summaryª
7020 window or your syntax file window when you open a grammar, or make
7021 the ©Control Panelª stay on top of other AnaGram windows.
7022 ##
7023
7024 Windows Menu
7025
7026 The Windows menu lets you cascade, close, or hide all AnaGram
7027 windows except the ©Control Panelª, or restore them if they
7028 have been hidden. It also has a list of open windows (even
7029 if hidden) so you can select the one you want. The Copy command will
7030 copy most windows to the clipboard.
7031 ##
7032
7033 Help Menu
7034
7035 The Help Menu has the following entries:
7036
7037 ©Getting Startedª provides a brief description of AnaGram and
7038 introductory suggestions.
7039
7040 ©Help Topicsª brings up a list of all help topics.
7041
7042 ©Using Helpª tells you how to use AnaGram's help facilities.
7043
7044 ©What's Newª has information on new features of this version of AnaGram.
7045
7046 ©About AnaGramª tells you what version of AnaGram you are using, and also
7047 provides contact information for Parsifal Software.
7048 ##
7049
7050 Autobuild
7051
7052 When Autobuild (©Options Menuª) is checked, selecting a file
7053 from the list of most recently used files on the ©Action Menuª
7054 invokes the ©Build Parserª command. Otherwise, the ©Analyze
7055 Grammarª command is invoked.
7056 ##
7057
7058 Reanalyze, Rebuild
7059
7060 Reanalyze and Rebuild commands on the ©Action Menuª are
7061 initially greyed out.
7062
7063 Reanalyze becomes available if
7064 you have a syntax file currently analyzed or built
7065 in AnaGram and change it while AnaGram is still running.
7066
7067 Rebuild becomes available if
7068 you have a syntax file currently built
7069 and change it while AnaGram is still running.
7070 ##
7071
7072 Percent Sign
7073
7074 The percent sign ( % ) is used to mark certain tokens in your grammar
7075 which AnaGram must redefine in order to implement the ©disregardª
7076 statement. If you have used this statement in your grammar, you will
7077 probably notice the percent sign appearing in some windows and traces.
7078
7079 The percent sign indicates the original token, without the optional
7080 white space attached. Early versions of AnaGram used the degree sign
7081 instead, but this character is not generally available in Windows.
7082 ##
7083
7084 Program Development
7085
7086 The first step in writing a program is to write a ©grammarª in
7087 AnaGram notation which describes the input the program expects.
7088
7089 The file containing the grammar, called the ©syntax fileª, should
7090 have the extension ".syn". You could also make up a few sample input
7091 files at this time, but it is not necessary to write ©reduction
7092 procedureªs at this stage.
7093
7094 Run AnaGram and use the ©Analyze Grammarª command to create parse
7095 tables. If there are ©syntax errorsª in the grammar at this point,
7096 you will have to correct them before proceeding, but you do not
7097 necessarily have to eliminate ©conflictsª, if there are any, at this
7098 time. There are, however, many aids available to help you with
7099 conflicts. These aids are described in the AnaGram User's Guide, and
7100 somewhat more briefly in the Online Help topics.
7101
7102 Once syntax errors are corrected, you can try out your grammar on the
7103 sample input files using the ©File Traceª facility.
7104 With File Trace, you can see interactively just how your grammar
7105 operates on your test files. You can also use ©Grammar Traceª to
7106 answer "what if" questions concerning input to the grammar. The
7107 Grammar Trace does not use a test file, but rather allows you to make
7108 input choices interactively.
7109
7110 At any time, you can write ©reduction procedureªs to process your
7111 input data as its components are identified in the input stream. Each
7112 procedure is associated with a ©grammar ruleª. The reduction
7113 procedures will be incorporated into your parser when you create it
7114 with the ©Build Parserª command.
7115
7116 By default, unless you specify an input procedure, ©parser inputª
7117 will be read from stdin, using the default ©GET_INPUTª macro.
7118 You will probably wish to redefine GET_INPUT, or configure your
7119 parser to use ©pointer inputª or ©event drivenª input.
7120 ##
7121
7122 License, Copyright, Copying, Open Source, Warranty, No Warranty
7123
7124 AnaGram, A System for Syntax Directed Programming
7125
7126 Copyright 1993-2002 Parsifal Software
7127
7128 Copyright 2006, 2007 David A. Holland
7129
7130 All Rights Reserved.
7131
7132 AnaGram itself is released to the public under the traditional 4-clause BSD
7133 license:
7134
7135 Redistribution and use in source and binary forms, with or without
7136 modification, are permitted provided that the following conditions are
7137 met:
7138
7139 1. Redistributions of source code must retain the above copyright notice,
7140 this list of conditions and the following disclaimer.
7141
7142 2. Redistributions in binary form must reproduce the above copyright
7143 notice, this list of conditions and the following disclaimer in the
7144 documentation and/or other materials provided with the distribution.
7145
7146 3. All advertising materials mentioning features or use of this software
7147 must display the following acknowledgement:
7148 This product includes software developed by Parsifal Software,
7149 Jerome T. Holland, and their contributors.
7150
7151 4. Neither the name of Parsifal Software nor the name of Jerome T.
7152 Holland nor the names of their contributors may be used to endorse or
7153 promote products derived from this software without specific prior written
7154 permission.
7155
7156 THIS SOFTWARE IS PROVIDED BY PARSIFAL SOFTWARE,
7157 JEROME T. HOLLAND, AND CONTRIBUTORS ``AS IS'' AND ANY
7158 EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
7159 LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
7160 AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
7161 IN NO EVENT SHALL PARSIFAL SOFTWARE, JEROME T.
7162 HOLLAND, OR THE CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
7163 INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
7164 CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
7165 PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
7166 USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
7167 HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
7168 WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
7169 (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
7170 OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
7171 POSSIBILITY OF SUCH DAMAGE.
7172
7173 The AnaGram ©parsing engineª, that is, the code that is emitted by
7174 AnaGram and incorporated into programs developed using AnaGram, uses
7175 this less restrictive zlib-style license:
7176
7177 This software is provided 'as-is', without any express or implied warranty.
7178 In no event will the authors be held liable for any damages arising from
7179 the use of this software.
7180
7181 Permission is granted to anyone to use this software for any purpose,
7182 including commercial applications, and to alter it and redistribute it
7183 freely, subject to the following restrictions:
7184
7185 1. The origin of this software must not be misrepresented; you must not
7186 claim that you wrote the original software. If you use this software in a
7187 product, an acknowledgment in the product documentation would be
7188 appreciated but is not required.
7189
7190 2. Altered source versions must be plainly marked as such, and must not
7191 be misrepresented as being the original software.
7192
7193 3. This notice may not be removed or altered from any source distribution.
7194
7195 ##