comparison doc/manual/cfp.tex @ 0:13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
author | David A. Holland |
---|---|
date | Sat, 22 Dec 2007 17:52:45 -0500 |
-1:000000000000 | 0:13d2b8934445 |
---|---|
1 \chapter{Configuration Parameters} | |
2 \index{Configuration parameters}\index{Parameters} | |
3 | |
4 \agterm{Configuration parameters} are named constants that control the | |
5 way AnaGram works. AnaGram ignores case\index{Case sensitivity} when | |
6 it looks up the names of configuration parameters, so that | |
7 \agcode{parser name} and \agcode{Parser Name} both refer to the same | |
8 parameter. Configuration parameters that have only true/false or | |
9 on/off values are often referred to as | |
10 \index{Configuration switches}\agterm{configuration switches}. | |
11 | |
12 Configuration parameters are used to control: | |
13 | |
14 \begin{itemize} | |
15 \item Comment nesting | |
16 \item Grammar analysis | |
17 \item Parser generation | |
18 \end{itemize} | |
19 | |
20 Every configuration parameter has a default value which has been | |
21 chosen to correspond to a standard if it exists, customary usage if | |
22 such can be determined, or otherwise to the most likely usage. | |
23 | |
24 Configuration parameters may be specified either in | |
25 \index{Configuration file}\index{File}\agterm{configuration files}, | |
26 always named \agfile{AnaGram.cfg}, or in a syntax file. A | |
27 configuration file is a normal ASCII file containing parameter | |
28 specifications. The syntax of a configuration file is the same as | |
29 that of a configuration section within a syntax file, except that a | |
30 configuration file does not have the brackets ( \agcode{[ ]} ) that | |
31 enclose a configuration section in a syntax file. You may comment the | |
32 configuration file freely, just as though it were a syntax file. | |
33 % XXX ``configuration section'' is a forward reference and we should | |
34 % rearrange all this so it isn't. | |
36 | |
37 % Parameters can be set in either a configuration file or in your syntax | |
38 % file. | |
39 Apart from the \agparam{nest comments} switch, if a parameter | |
40 is specified more than once, only the last value is used (see below). | |
41 The \agparam{nest comments} switch, which affects the way AnaGram | |
42 reads your configuration and syntax files, takes effect as soon as | |
43 AnaGram encounters it in a file and stays in effect unless it is later | |
44 turned off. | |
45 | |
46 % XXX this should be belabored less. Also, good practice dictates that | |
47 % if you ship a project or a grammar it should compile in someone | |
48 % else's environment, and we shouldn't encourage people to do things | |
49 % like put \agparam{pointer input} in a systemwide AnaGram.cfg. | |
50 % | |
51 % XXX also in the Unix world it ought to read | |
52 % /usr/local/etc/AnaGram.cfg and then also ~/.AnaGram.cfg - or | |
53 % something like that. And it ought to be possible to set params | |
54 % on the agcl command line. We need to think about this. (Well, | |
55 % there's not really any valid use for either, so perhaps it | |
56 % doesn't matter.) | |
57 % | |
58 % How about something like | |
59 % | |
60 % Support for a global configuration file dates from the DOS-based | |
61 % AnaGram 1.x, where the same configuration mechanism was used to | |
62 % establish user interface preferences. AnaGram 2.0 and above handle | |
63 % preferences separately, and the configuration system is only used | |
64 % for code-related options. Since good practice dictates that code | |
65 % should continue to work if exported outside of one's personal | |
66 % environment, there are few or no legitimate uses of the global | |
67 % configuration file and support for it will likely be removed in a | |
68 % future AnaGram release. | |
69 % | |
70 % (But there really should be support for params on the agcl command | |
71 % line; if nothing else it would make it a lot easier to test | |
72 % combinations of settings.) | |
73 % | |
74 On initialization, AnaGram checks the directory that contains the | |
75 AnaGram executable file. If it finds \agfile{AnaGram.cfg}, it reads it | |
76 and sets internal parameters accordingly. It then looks for | |
77 \agfile{AnaGram.cfg} in your working directory and, if it finds it, reads | |
78 it in turn. If any parameter is set in both files, the last setting | |
79 wins. This two-stage process lets you set | |
80 your standard preferences in the principal directory, with specific | |
81 overrides in your working directories. You may also put configuration | |
82 parameters in your syntax file, which override the settings in the | |
83 configuration files. Neither configuration file is | |
84 required. | |
85 | |
86 Before executing an Analyze Grammar or Build Parser command, AnaGram | |
87 resets configuration parameters to their initial values, as determined | |
88 by the built-in defaults and the configuration files read at program | |
89 initialization. | |
90 | |
91 There are, therefore, four levels at which parameters may be set. At | |
92 the first level, there are the settings built into AnaGram. If you | |
93 don't like some of these, you can override them with a configuration | |
94 file at the second level, the tools directory where you installed | |
95 AnaGram. If a particular project needs overrides, you can put them in | |
96 a configuration file at the third level, the working directory for | |
97 this project. And if you have specific configuration requirements for | |
98 a particular parser, the best place for them is the fourth level, the | |
99 syntax file for the parser. | |
100 | |
101 Despite all this flexibility, some people prefer to set every | |
102 configuration parameter explicitly in their syntax files so there is | |
103 no question as to which setting is being used. AnaGram lets | |
104 you do it whichever way you prefer. | |
105 | |
106 If you are uncertain as to the actual parameters that AnaGram is using | |
107 at any time, the | |
108 \index{Configuration Parameters}\index{Window} | |
109 \agwindow{Configuration Parameters} window listed in the | |
110 \agmenu{Windows} menu will show you the current state of all | |
111 parameters. | |
112 | |
113 The different varieties of configuration parameters are described | |
114 below. Each definition of a parameter must start on a new line. A | |
115 configuration file is just a sequence of parameter definitions, each | |
116 on a separate line. Blank lines can be used as separators where you | |
117 please, and comments may be used as described for syntax files. | |
118 Case\index{Case sensitivity} is ignored for parameter names (but not | |
119 for the whole definition). In a syntax file, each set of definitions | |
120 must be enclosed in brackets ( \agcode{[ ]} ), forming a | |
121 \index{Configuration section}\agterm{configuration section}, one of | |
122 the four kinds of AnaGram statements. Configuration sections can be | |
123 scattered throughout a syntax file, but each section should begin on a | |
124 new line, and following statements should also of course start on new | |
125 lines. There is no restriction on the number of sections, or on the | |
126 number of times a parameter appears. The last setting of a parameter | |
127 wins. | |
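| |
As a minimal sketch, the same three definitions might appear in a | |
configuration file and in a syntax file as follows. The values are | |
borrowed from the examples later in this chapter and are illustrative | |
only. In \agfile{AnaGram.cfg}, the definitions stand alone: | |
| |
\begin{indentingcode}{0.4in} | |
nest comments | |
parser stack size = 50 | |
header file name = "widget.h" | |
\end{indentingcode} | |
In a syntax file, the same definitions are enclosed in brackets to | |
form a configuration section: | |
| |
% Illustrative layout only. The opening bracket is written {[} so that | |
% LaTeX cannot read it as an optional argument. | |
\begin{indentingcode}{0.4in} | |
{[} | |
nest comments | |
parser stack size = 50 | |
header file name = "widget.h" | |
] | |
\end{indentingcode} | |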
128 | |
129 The first variety of configuration parameter is a simple | |
130 \index{Switches}\index{Configuration switches}switch that controls | |
131 one of the various features of AnaGram. Such parameters are also called | |
132 \agterm{configuration switches}. They need simply be stated to set the | |
133 condition (turn it on) or negated with the tilde (\agcode{\~{}}) to | |
134 reset the condition (turn it off). Thus | |
135 | |
136 \begin{indentingcode}{0.4in} | |
137 nest comments | |
138 \end{indentingcode} | |
139 causes AnaGram to allow nested comments, and | |
140 | |
141 \begin{indentingcode}{0.4in} | |
142 \~{}nest comments | |
143 \end{indentingcode} | |
144 causes AnaGram to disallow nested comments. | |
145 | |
146 You may also set or reset configuration switches with explicit on or | |
147 off values: | |
148 | |
149 \begin{indentingcode}{0.4in} | |
150 nest comments = on | |
151 nest comments = off | |
152 \end{indentingcode} | |
153 | |
154 A second variety of configuration parameter takes a value which is the | |
155 name of a token. Thus | |
156 | |
157 \begin{indentingcode}{0.4in} | |
158 grammar token = c grammar | |
159 \end{indentingcode} | |
160 specifies that the token \agcode{c grammar} is the grammar token, | |
161 the starting point AnaGram uses when it analyzes your grammar. | |
162 | |
163 A third variety of configuration parameter takes a value which is a C | |
164 or C++ data type. Thus | |
165 | |
166 \begin{indentingcode}{0.4in} | |
167 default token type = unsigned char * | |
168 \end{indentingcode} | |
169 signifies that the value of a token, unless otherwise specified, is a | |
170 pointer to an \agcode{unsigned char}. AnaGram does not accept the | |
171 full panoply of C and C++ \index{Data type}data types. The | |
172 restrictions are that AnaGram does not allow specification of array or | |
173 function types, nor explicit structure types. Types that are defined | |
174 with typedef statements, structure definitions, or class definitions, | |
175 including template classes, in your embedded C or C++ are acceptable. | |
176 If you have more complex data types, you should define a simple name | |
177 using a typedef statement. | |
178 | |
179 A fourth variety of configuration parameter takes a string value to | |
180 set an ASCII string used by AnaGram. Thus | |
181 | |
182 \begin{indentingcode}{0.4in} | |
183 header file name = "widget.h" | |
184 \end{indentingcode} | |
185 signifies that the header file created by AnaGram should be called | |
186 \agfile{widget.h}. In | |
187 those strings which are used to name the parser or files which AnaGram | |
188 builds, the character ``\agcode{\#}'' is used to indicate that AnaGram | |
189 should substitute the name of your syntax file. In strings used to | |
190 determine the names of program variables or functions, ``\agcode{\$}'' | |
191 is used to indicate that AnaGram should substitute the name of your | |
192 parser. When building enumeration constants for the names of the | |
193 tokens in your grammar, ``\agcode{\%}'' will be replaced by the name | |
194 of the token. | |
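| |
For illustration only, suppose the syntax file is named | |
\agfile{widget.syn}, the parser is named \agcode{widget}, and the | |
grammar has a token named \agcode{number}; all three names are | |
hypothetical. With the settings | |
| |
% Hypothetical names: widget.syn, widget, number. | |
\begin{indentingcode}{0.4in} | |
header file name = "\#.h" | |
enum constant name = "\${\us}\%{\us}token" | |
\end{indentingcode} | |
the substitution rules above would give a header file called | |
\agfile{widget.h} and an enumeration constant called | |
\agcode{widget{\us}number{\us}token} for that token. | |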
195 | |
196 The final variety of configuration parameter takes a numeric value. | |
197 The value may be decimal, octal or hexadecimal, following the C | |
198 conventions, and may have an optional sign. Thus | |
199 | |
200 \begin{indentingcode}{0.4in} | |
201 parser stack size = 50 | |
202 \end{indentingcode} | |
203 tells AnaGram to allocate space for at least fifty stack entries when | |
204 it creates your parser. | |
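| |
Since numeric values follow the C conventions, the three definitions | |
below should be equivalent ways of writing fifty, in decimal, octal, | |
and hexadecimal respectively. This is a sketch: the octal and | |
hexadecimal spellings are inferred from the description above rather | |
than taken from a tested grammar. | |
| |
\begin{indentingcode}{0.4in} | |
parser stack size = 50 | |
parser stack size = 062 | |
parser stack size = 0x32 | |
\end{indentingcode} | |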
205 | |
206 If AnaGram does not recognize a parameter, it will give you a warning | |
207 with line number, column number, and the message ``no such | |
208 parameter''. If the value for a parameter is inappropriate, such as a | |
209 string value for a parameter which should have a numeric value, the | |
210 message will be ``inappropriate value''. If the error occurs in the | |
211 configuration file found in the AnaGram directory, AnaGram will prefix | |
212 the warning with the complete path name for the file. If the error | |
213 occurs in the configuration file in your working directory, AnaGram | |
214 will prefix the warning with ``AnaGram.cfg:''. If AnaGram encounters a | |
215 syntax error while reading a configuration file, it will honor the | |
216 parameter settings it found before the syntax error, but will ignore | |
217 everything that follows the error. | |
218 | |
219 \section{Alphabetic Listing of Configuration Parameters} | |
220 | |
221 \index{Configuration switches}\index{Allow macros}\index{Macros} | |
222 \agparamheading{allow macros}{switch, default on} | |
223 | |
224 When this switch is set, i.e., on, reduction procedures will be | |
225 implemented as macros if they are sufficiently simple. This makes | |
226 your parser somewhat more compact and faster but makes it somewhat | |
227 more difficult to debug. It's a good idea to turn this switch off for | |
228 debugging. | |
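| |
As a sketch, either of the following definitions, placed in a | |
configuration file or a configuration section, turns the switch off | |
for a debugging build. Both use the switch notations described | |
earlier in this chapter. | |
| |
\begin{indentingcode}{0.4in} | |
\~{}allow macros | |
allow macros = off | |
\end{indentingcode} | |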
229 | |
230 \index{Configuration switches}\index{Auto init} | |
231 \agparamheading{auto init}{switch, default on} | |
232 | |
233 This switch controls the initialization of any parser that is not | |
234 \agparam{event driven}. When it is on, the | |
235 \index{Initializer}initializer for your parser is automatically called | |
236 every time the parser is called. | |
237 This is the normal situation. On occasion, however, it | |
238 is desirable to call a parser several times without reinitializing it. | |
239 In this case, you may set the \agparam{auto init} parameter to off. | |
240 Should you do this, you must call the initializer yourself whenever | |
241 appropriate. | |
242 % XXX characterize the occasion... | |
243 | |
244 When \agparam{event driven} is set, \agparam{auto init} has no effect. | |
245 | |
246 \index{Configuration switches}\index{Auto resynch} | |
247 \agparamheading{auto resynch}{switch, default off} | |
248 | |
249 Setting this switch causes AnaGram to include an automatic | |
250 resynchronization procedure in the parser. The resynchronization | |
251 procedure will be invoked upon encountering a syntax error and will | |
252 skip over input until it finds input characters or tokens consistent | |
253 with its state at the time of the error. The purpose of the | |
254 resynchronization procedure is to provide a simple way for your parser | |
255 to proceed in the event of syntax errors so that it can find more than | |
256 one syntax error on a given pass. The resynchronization procedure | |
257 uses a heuristic based on your own syntax. AnaGram itself uses this | |
258 technique to resynchronize after syntax errors in its input. | |
259 | |
260 A disadvantage to using this resynchronization technique is that the | |
261 resynchronization procedure turns off all reduction procedures. The | |
262 reason is that the resynchronization may cause a number of reduction | |
263 procedures to be skipped. This means that the parameters for any | |
264 reduction procedures that might be called later would be suspect and | |
265 could cause serious problems. It seems more prudent simply to shut | |
266 them down. Semantically determined productions will subsequently, of | |
267 course, always use the default reduction token. | |
268 | |
269 If you have a | |
270 \index{SYNTAX{\us}ERROR}\index{Macros}\agcode{SYNTAX{\us}ERROR} | |
271 macro, it will be called \emph{before} the resynchronization | |
272 process. It will also be called on subsequent syntax errors, so your | |
273 program will not lose control entirely. | |
274 | |
275 If you use the auto resynchronization procedure, you must also identify | |
276 an end of file token, either by naming a terminal token \agcode{eof} or | |
277 by setting the \agparam{eof token} parameter (see below), so that the | |
278 synchronizer doesn't inadvertently try to pass over the end of file. | |
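| |
A minimal sketch of the two settings used together, assuming your | |
grammar defines a terminal token named \agcode{end of file} (a | |
hypothetical name chosen for this example): | |
| |
% Hypothetical token name: end of file. | |
\begin{indentingcode}{0.4in} | |
auto resynch | |
eof token = end of file | |
\end{indentingcode} | |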
279 | |
280 For other methods of recovering from syntax errors, see Chapter 9. | |
281 | |
282 \index{Configuration switches}\index{Backtrack} | |
283 \agparamheading{backtrack}{switch, default on} | |
284 | |
285 If your parser does not continue after encountering a syntax error, | |
286 you can speed up your parser and make it a little smaller by turning | |
287 off the \agparam{backtrack} switch. If \agparam{backtrack} is on, | |
288 AnaGram configures your parser so that in case of syntax error it can | |
289 undo any default reductions it might have made as a consequence of the | |
290 erroneous input. The purpose of such an undo function is to identify | |
291 the proper error frame and to maximize the probability of being able | |
292 to recover gracefully. | |
293 | |
294 % XXX shouldn't these be indexed as ``obsolete parameters'' or | |
295 % something, with xrefs so if you look up ``Bottom margin'' in the | |
296 % index it says ``see ``obsolete parameters''''? | |
297 % | |
298 % Also, shouldn't the various obsolete parameters be described with | |
299 % the same text? | |
300 % | |
301 \index{Configuration parameters}\index{Bottom margin} | |
302 \agparamheading{bottom margin}{integer value, default = 3} | |
303 | |
304 This is an obsolete parameter which was used in the DOS version of | |
305 AnaGram. It is no longer used, but is still recognized for the sake | |
306 of compatibility. | |
307 | |
308 \index{Configuration switches}\index{Bright background} | |
309 \agparamheading{bright background}{switch, default on} | |
310 | |
311 This configuration switch is not used in AnaGram 2.0. It is retained | |
312 for compatibility with configuration files used with the DOS versions | |
313 of AnaGram. | |
314 | |
315 \index{Configuration switches}\index{Case sensitive} | |
316 \index{Case sensitivity} | |
317 \agparamheading{case sensitive}{switch, default on} | |
318 | |
319 Use this switch to control how your parser deals with distinctions | |
320 between upper and lower case. When \agparam{case sensitive} is on, | |
321 AnaGram builds a parser which distinguishes upper from lower case. | |
322 When this switch is off, AnaGram builds a parser which ignores case | |
323 for all input. Token values still preserve case, however: | |
324 although 'a' and 'A' map to the | |
325 same token, their values remain lower and upper case | |
326 respectively. | |
330 | |
331 % XXX this should discuss character sets, locales, and other such | |
332 % garbage. | |
333 | |
334 \index{Configuration parameters}\index{Compile command} | |
335 \agparamheading{compile command}{string, default = \agcode{NULL}} | |
336 | |
337 This parameter is retained only for compatibility with the DOS version | |
338 of AnaGram. It is ignored in the Windows version. | |
339 | |
340 \index{Configuration switches}\index{Const data} | |
341 \agparamheading{const data}{switch, default on} | |
342 | |
343 The \agparam{const data} switch controls the use of \agcode{const} | |
344 qualifiers in generated C code. If the switch is on, all fixed data | |
345 arrays in the parser file will be qualified as \agcode{const}. The | |
346 \agparam{const data} switch is ignored if the \agparam{old style} | |
347 switch is set. | |
348 | |
349 \index{Configuration parameters}\index{Context type} | |
350 %XXX: \index{context tracking} ? | |
351 \agparamheading{context type}{c data type, no default} | |
352 | |
353 By default, \agparam{context type} is undefined. If you assign the | |
354 name of a C data type, AnaGram will implement ``context tracking'' in | |
355 your parser. See Chapter 9. The data type name can be either a | |
356 standard, pre-defined data type or one which you create with a | |
357 \agcode{typedef} statement. | |
358 | |
359 \index{Configuration parameters}\index{Coverage file name} | |
360 \index{File extension}\index{nrc} | |
361 \agparamheading{coverage file name}{string, default = \agcode{"\#.nrc"}} | |
362 | |
363 If you set the \agparam{rule coverage} configuration switch, AnaGram | |
364 will provide functions in your parser to read and write rule counts to | |
365 a file. The name of the file will be determined by \agparam{coverage | |
366 file name}. The name of your syntax file will be substituted for the | |
367 ``\agcode{\#}'' character. | |
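| |
As a sketch, the two settings might be combined as follows. The | |
\agfile{.cov} extension is a hypothetical choice for this example; | |
with a syntax file named \agfile{widget.syn} (also hypothetical), the | |
rule counts would go to \agfile{widget.cov}. | |
| |
% Hypothetical names: the .cov extension and widget.syn. | |
\begin{indentingcode}{0.4in} | |
rule coverage | |
coverage file name = "\#.cov" | |
\end{indentingcode} | |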
368 | |
369 \index{Configuration switches}\index{Declare pcb} | |
370 % XXX \index{Parser control block} ? | |
371 \agparamheading{declare pcb}{switch, default on} | |
372 | |
373 When AnaGram builds a parser, it checks the status of the | |
374 \agparam{declare pcb} switch. If it is on, AnaGram declares a parser | |
375 control block for you. AnaGram creates the name of the control block | |
376 variable by appending \agcode{{\us}pcb} to the name of your parser. | |
377 AnaGram will also code an \agcode{\#include} statement to include your | |
378 parser header file, and will define the \agcode{PCB} macro for you. | |
379 If you wish to declare the parser control block yourself you should | |
380 turn this switch off. | |
381 | |
382 \index{Configuration parameters}\index{Default input type} | |
383 \index{Input type} | |
384 % XXX: \index{Types} ? | |
385 \agparamheading{default input type}{c data type, default = \agcode{int}} | |
386 | |
387 This parameter tells AnaGram what data type to assume for terminal | |
388 tokens if they are not explicitly declared. Normally, you would | |
389 explicitly declare terminal tokens only when you have set the | |
390 \agparam{input values} configuration switch. The default type for | |
391 nonterminal tokens is given by \agparam{default token type}. | |
392 | |
393 \index{Configuration switches}\index{Default reductions}\index{Reduction} | |
394 \agparamheading{default reductions}{switch, default on} | |
395 | |
396 If in a given parser state there is only one production that could | |
397 possibly be reduced, it is usually faster to reduce it on any input than | |
398 to check specifically for correct input before reducing it. The only | |
399 time this default reduction causes trouble is in the event of | |
400 erroneous input. In this situation you may get an erroneous | |
401 reduction. Normally when you are parsing a file, this is | |
402 inconsequential because you are not going to continue semantic action | |
403 in the presence of error. But, if you are using your parser to handle | |
404 real-time interactive input, you have to be able to continue semantic | |
405 processing after notifying your user of the erroneous | |
406 input. In this case you would want to turn the \agparam{default | |
407 reductions} switch off so that productions are reduced only when there | |
408 is correct input. | |
409 | |
410 \index{Configuration parameters}\index{Default token type}\index{Token} | |
411 % XXX \index{Types} ? | |
412 \agparamheading{default token type}{c data type, default = \agcode{void}} | |
413 | |
414 This parameter takes a C data type as its value. It is used to set | |
415 the data type for the semantic values of nonterminal tokens whose type | |
416 is not explicitly specified in the grammar. To set the default type | |
417 for terminal tokens use \agparam{default input type}. | |
418 | |
419 \index{Diagnose errors}\index{Configuration switches} | |
420 \agparamheading{diagnose errors}{switch, default on} | |
421 | |
422 If you set this switch, AnaGram will include a syntax error diagnostic | |
423 procedure in your parser. This procedure will be called before your | |
424 \index{SYNTAX{\us}ERROR}\index{Macros}\agcode{SYNTAX{\us}ERROR} macro is | |
425 called. It will store a pointer to a string in the | |
426 \agcode{error{\us}message} field of your parser control | |
427 block. The string will contain a diagnostic message. If there is | |
428 only one syntactically correct input, x, for example, the message will | |
429 be ``Missing x''. Otherwise it will be ``Unexpected x'' if the input | |
430 is recognizable but incorrect and ``Unexpected input'' otherwise. If | |
431 the \agparam{error frame} switch has been set, the | |
432 \agcode{error{\us}frame{\us}ssx} and | |
433 \agcode{error{\us}frame{\us}token} fields | |
434 in the parser control block will be set as described in Chapter 9. | |
435 | |
436 % XXX say: diagnose errors causes the token_names[] array to be | |
437 % included in the parser. and index token_names[]... | |
438 | |
439 \index{Distinguish lexemes}\index{Configuration switches} | |
440 % XXX \index{Disregard} ? | |
441 \agparamheading{distinguish lexemes}{switch, default off} | |
442 | |
443 The \agparam{distinguish lexemes} switch has no effect unless a | |
444 disregard token has been defined. Normally, the disregard token | |
445 (usually white space) is optional between lexemes. This may lead to | |
446 apparent shift-reduce conflicts if the characters that comprise the | |
447 second of two successive lexemes can be construed as part of the first | |
448 lexeme. In this situation, turning on the \agparam{distinguish | |
449 lexemes} switch effectively requires a disregard token to separate the | |
450 two lexemes. | |
451 | |
452 \index{Edit command}\index{Configuration parameters} | |
453 \index{File extension}\index{syn} | |
454 \agparamheading{edit command}{string, default = \agcode{"ed \#.syn"}} | |
455 | |
456 This parameter is no longer used and is retained only for file | |
457 compatibility with the DOS version of AnaGram. | |
458 | |
459 \index{Enable mouse}\index{Configuration switches} | |
460 \agparamheading{enable mouse}{switch, default on} | |
461 | |
462 This parameter is no longer used and is retained only for file | |
463 compatibility with the DOS version of AnaGram. | |
464 | |
465 \index{Enum constant name}\index{Configuration parameters} | |
466 \agparamheading{enum constant name}{string, | |
467 default = \agcode{"\${\us}\%{\us}token"}} | |
468 | |
469 Use the \agparam{enum constant name} parameter to control the names | |
470 AnaGram uses for the enumeration constants it defines in the | |
471 header file for your parser. The value of \agparam{enum constant | |
472 name} should be a string containing the ``\agcode{\%}'' character. | |
473 AnaGram will substitute each token name in turn for the | |
474 ``\agcode{\%}'' character in this template as it creates the list of | |
475 enumeration constants. If it finds a ``\agcode{\$}'' character it | |
476 will substitute the name of your parser. | |
477 | |
478 \index{Eof token}\index{Configuration parameters}\index{Token} | |
479 \agparamheading{eof token}{token name, no default} | |
480 | |
481 If you use the auto resynchronization capability of AnaGram, you must | |
482 specify an end of file token explicitly. You can do this either by | |
483 specifying a terminal token in your grammar called \agcode{eof} or by | |
484 using the \agparam{eof token} parameter to identify some other | |
485 terminal token to be used as the end of file marker. You would do | |
486 this only if you must use the name \agcode{eof} for some other | |
487 purpose. | |
488 | |
489 \index{Error frame}\index{Configuration switches} | |
490 \agparamheading{error frame}{switch, default off} | |
491 | |
492 AnaGram uses the \agparam{error frame} switch in conjunction with the | |
493 \index{Diagnose errors}\index{Configuration switches}\agparam{diagnose errors} | |
494 switch. If both are set, when your parser encounters a syntax error, | |
495 before invoking the | |
496 \index{SYNTAX{\us}ERROR}\index{Macros}\agcode{SYNTAX{\us}ERROR} macro, | |
497 your parser will determine the frame in which the error occurred, that | |
498 is, the production the parser was trying to match at the time of the | |
499 error. | |
500 | |
501 % XXX: See chapter (dd.tex) for a complete discussion. | |
502 | |
503 \index{Configuration parameters}\index{Error token}\index{Token} | |
504 \agparamheading{error token}{token name, no default} | |
505 | |
506 One of your options for error recovery after a syntax error is a | |
507 technique similar to that provided in \agfile{yacc}. You include a | |
508 terminal token called \agcode{error} in your grammar. When the parser | |
509 encounters an error in the input it backs up the state stack to the | |
510 most recent state in which \agcode{error} was an acceptable input. It | |
511 then shifts to the new state as though it had seen an actual | |
512 \agcode{error} token. At this point, it skips over any character in | |
513 the input which is not an acceptable input character for this state. | |
514 Once it does find an acceptable input character, it continues | |
515 processing as though nothing had happened. If you wish to use this | |
516 approach and for some reason you wish to use the name \agcode{error} | |
517 for some other token in your grammar, you may use the \agparam{error | |
518 token} parameter to identify some other terminal token in your grammar | |
519 as the ``error token''. | |
520 | |
521 \index{Configuration switches}\index{Error trace}\index{Trace} | |
522 \index{Window} | |
523 \agparamheading{error trace}{switch, default off} | |
524 | |
525 If you turn the \agparam{error trace} switch on, AnaGram will include | |
526 code in your parser so that when it encounters a syntax error it will | |
527 write the contents of the \index{Parser state stack}\index{State | |
528 stack}\index{Stack}parser state stack to a file. The name of the file | |
529 is the same as the name of your syntax file but with the extension | |
530 \index{File extension}\index{etr}\agfile{.etr}. You may override this | |
531 definition by defining | |
532 \index{AG{\us}TRACE{\us}FILE{\us}NAME}\index{Macros}\agcode{AG{\us}TRACE{\us}FILE{\us}NAME} | |
533 in your embedded C. | |
534 | |
535 The \agmenu{Error Trace} option in the \agmenu{Action} menu can then | |
536 read this information and prepare a pre-built \agwindow{Grammar Trace} | |
537 showing you the status of the parse at the time of the syntax error. | |
538 You would use this switch primarily when you are first checking out | |
539 your grammar to make sure it accurately represents the input you | |
540 desire to handle. You would also use it any time your parser | |
541 encounters a syntax error you don't understand. For more information, | |
542 see Chapter 5. | |
543 | |
544 \index{Escape backslashes}\index{Configuration switches} | |
545 \agparamheading{escape backslashes}{switch, default off} | |
546 | |
547 \agparam{Escape backslashes} is used only in conjunction with the | |
548 \agparam{line numbers} option. When turned on, it causes the | |
549 backslashes in the pathname generated by the \agparam{line numbers} | |
550 option to be doubled. This switch has been provided because C and C++ | |
551 compilers are not consistent in their handling of backslashes in path | |
552 names. | |
553 | |
554 \index{Event driven}\index{Configuration switches} | |
555 % XXX \index{AG{\us}RUNNING{\us}CODE} ? | |
556 % XXX \index{exit{\us}flag} ? | |
557 \agparamheading{event driven}{switch, default off} | |
558 | |
559 If you turn the \agparam{event driven} switch on, when you build a | |
560 parser, it will be configured as an ``event driven'' parser. This | |
561 means that after calling its initializer function, you call it once | |
562 with each discrete unit of input. The parser proceeds until it | |
563 needs more input, finishes the grammar, or encounters an error. It | |
564 then returns. The \agcode{exit{\us}flag} field in the parser control | |
565 block is equal to \agcode{AG{\us}RUNNING{\us}CODE} if more input is needed. | |
566 Other values indicate other reasons for termination. | |
567 % XXX crossreference the discussion of exit codes? | |
568 | |
569 When \agparam{event driven} is on, \agparam{auto init} has no effect; | |
570 you must always call the initializer function yourself. | |
571 | |
572 \index{Far tables}\index{Configuration switches} | |
573 \agparamheading{far tables}{switch, default off} | |
574 | |
575 If \agparam{far tables} is on when AnaGram builds a parser, it will | |
576 declare the larger tables it builds as \agcode{far}. This can be a | |
577 convenience when using some memory models of the 8086 architecture. | |
578 | |
579 \index{Grammar token}\index{Configuration parameters}\index{Token} | |
580 \agparamheading{grammar token}{token name, no default} | |
581 | |
582 The \agparam{grammar token} parameter may be used to specify the | |
583 grammar, or ``goal'', token for the syntax analyzer portion of | |
584 AnaGram. An alternative method is to append a ``\$'' to the goal | |
585 token when you define it. You may also simply use the name | |
586 \agcode{grammar} to identify the grammar token. | |
587 | |
588 \index{Header file name}\index{Configuration parameters}\index{File name} | |
589 \agparamheading{header file name}{string, default = \agcode{"\#.h"}} | |
590 | |
591 This parameter names the parser header file AnaGram generates. The | |
592 contents of the header file are described in Chapter 9. When AnaGram | |
593 creates the file, it copies the value of \agparam{header file name}, | |
594 substituting the name of your syntax file for the ``\agcode{\#}'' | |
595 character, in order to create the pathname and extension for the file. | |
596 You can therefore use this parameter to give the header file a | |
597 particular name, independent of the syntax file name, or to specify a | |
598 particular drive or directory where you want the header file to | |
599 reside. Note that if you include a full DOS/Windows pathname, | |
600 backslash characters must be quoted. | |
601 | |
602 \index{Input values}\index{Configuration switches} | |
603 \agparamheading{input values}{switch, default off} | |
604 | |
605 % XXX this shouldn't say ASCII because it's true even if the | |
606 % characters are some other character set... | |
607 If the input to your parser includes explicit token values which are | |
608 not simply the ASCII values of corresponding ASCII input characters, | |
609 you must set the \agparam{input values} switch to inform AnaGram. | |
610 Unless your parser is \agparam{event driven}, you must also provide | |
611 your own \agcode{GET{\us}INPUT} macro. | |
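
As a sketch of the combination just described, an event driven parser
that receives explicit token values would need both switches and no
\agcode{GET{\us}INPUT} macro. The fragment assumes that switches can
be set with the same \agcode{name = value} form used for other
parameters in this chapter:
\begin{indentingcode}{0.4in}
event driven = on
input values = on
\end{indentingcode}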
612 | |
613 \index{Line length}\index{Configuration parameters} | |
614 \agparamheading{line length}{integer value, default = 80} | |
615 | |
616 \agparam{Line length} is an obsolete configuration parameter, recognized | |
617 for the sake of compatibility with configuration files prepared for | |
618 the DOS version of AnaGram. It is ignored in AnaGram 2.0. | |
619 | |
620 \index{Line numbers}\index{Configuration switches} | |
621 \agparamheading{line numbers}{switch, default off} | |
622 | |
623 If \agparam{line numbers} is set, AnaGram will put syntax file line | |
624 numbers into the generated C code file using the | |
625 \index{\#line}\agcode{\#line} | |
626 directive so that your compiler diagnostics will refer to lines in the | |
627 syntax file rather than in the generated C code file. If | |
628 \agparam{line numbers} is off, AnaGram will put syntax file line | |
629 numbers in comments. The | |
630 \index{Line numbers path}\index{Configuration parameters} | |
631 \agparam{line numbers path} and | |
632 \index{Escape backslashes}\index{Configuration switches} | |
633 \agparam{escape backslashes} | |
634 parameters may be used to control the generation of the line number | |
635 directives. | |
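
For instance, a configuration that requests \agcode{\#line}
directives and doubles the backslashes in the generated pathname
might read as follows. This is only a sketch, assuming that switches
accept the same \agcode{name = value} form as other parameters:
\begin{indentingcode}{0.4in}
line numbers = on
escape backslashes = on
\end{indentingcode}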
636 | |
637 \index{Line numbers path}\index{Configuration parameters} | |
638 \agparamheading{line numbers path}{string, default = \agcode{NULL}} | |
639 | |
640 When you have set the \agparam{line numbers} switch and | |
641 \agparam{line numbers path} is not \agcode{NULL}, AnaGram uses it in the | |
642 \agcode{\#line} directive in place of the full path name of your | |
643 syntax file. | |
644 % XXX update for unix where we (maybe) don't generate full pathnames | |
645 | |
646 \index{Lines and columns}\index{Configuration switches} | |
647 \agparamheading{lines and columns}{switch, default on} | |
648 | |
649 If this switch is set, AnaGram will incorporate code into your parser | |
650 to track line numbers and column numbers in its input. At all times, | |
651 the \agcode{line} and \agcode{column} fields in your parser control | |
652 block will mark the location of the current lookahead character. The | |
653 treatment of tab characters is controlled by the | |
654 \index{TAB{\us}SPACING}\index{Macros}\agcode{TAB{\us}SPACING} macro. | |
655 | |
656 \index{Main program}\index{Configuration switches} | |
657 \agparamheading{main program}{switch, default on} | |
658 | |
659 The \agparam{main program} switch determines what AnaGram does if you | |
660 invoke the Build Parser command, but have no embedded C in your syntax | |
661 file. If the switch is on, AnaGram creates a main program which does | |
662 nothing but call your parser. The switch is ignored if your parser | |
663 uses \agparam{pointer input} or is \agparam{event driven}. | |
664 | |
665 \index{Max conflicts}\index{Configuration parameters}\index{Conflicts} | |
666 \agparamheading{max conflicts}{integer value, default = 50} | |
667 | |
668 \agparam{Max conflicts} limits the number of conflicts AnaGram will | |
669 record. Sometimes, a simple editing error in your syntax file can | |
670 cause hundreds of conflicts, which you don't need to see in gory | |
671 detail. If you have a grammar that is in serious trouble and you want | |
672 to see more conflicts, you may change \agparam{max conflicts} to suit | |
673 your needs. | |
674 | |
675 \index{Near functions}\index{Configuration switches} | |
676 \agparamheading{near functions}{switch, default off} | |
677 | |
678 \agparam{Near functions} controls the use of the \agcode{near} keyword | |
679 for static functions in your parser. If your parser is to run on a | |
680 16-bit 80x86 processor, you would want to turn it on. If you are | |
681 going to run your parser on some other processor or use a C compiler | |
682 that does not support the \agcode{near} keyword, you should leave | |
683 \agparam{near functions} off. | |
684 | |
685 \index{Configuration switches}\index{Nest comments}\index{Comments} | |
686 \agparamheading{nest comments}{switch, default off} | |
687 | |
688 Use this switch to allow nested comments in your syntax or | |
689 configuration files. It defaults to off, in accordance with the ANSI | |
690 standard for C. Note that AnaGram scans comments in any embedded C | |
691 code as well as in the grammar specification. You may turn this | |
692 switch on and off as many times as necessary in a single file. | |
693 | |
694 \index{Old style}\index{Configuration switches} | |
695 \agparamheading{old style}{switch, default off} | |
696 | |
697 \agparam{Old style} controls the function definitions in the code | |
698 AnaGram generates. When \agparam{old style} is off, AnaGram generates | |
699 ANSI style calling sequences with prototypes as necessary. When | |
700 \agparam{old style} is on, it generates old style function definitions, | |
701 and no prototypes. It also causes the | |
702 \index{Const data}\index{Configuration switches}\agparam{const data} | |
703 switch to be ignored. | |
704 | |
705 \index{Page length}\index{Configuration parameters} | |
706 \agparamheading{page length}{integer value, default = 66} | |
707 | |
708 \agparam{Page length} is an obsolete configuration parameter, | |
709 recognized for the sake of compatibility with configuration files | |
710 prepared for the DOS version of AnaGram. It is ignored in AnaGram | |
711 2.0. | |
712 | |
713 \index{Parser file name}\index{Configuration parameters}\index{File name} | |
714 \agparamheading{parser file name}{string, default = \agcode{"\#.c"}} | |
715 | |
716 AnaGram creates a parser which consists of all the embedded C code in | |
717 your syntax file, the syntax tables created by the syntax analyzer, | |
718 and a parsing engine configured to your requirements. This code is | |
719 written to a file whose name is given by this parameter. When AnaGram | |
720 creates your parser file, it copies the value of the \agparam{parser | |
721 file name} parameter, substituting the name of your syntax file for | |
722 the ``\agcode{\#}'' character, in order to create the pathname and | |
723 extension for the file. You can therefore use this parameter to give | |
724 the parser file a particular name, independent of the syntax file | |
725 name, or to specify a particular drive or directory where you want the | |
726 parser file to reside. Note that if you include a full DOS/Windows | |
727 pathname, you must quote the backslash characters. If you are writing a C++ | |
728 parser, use this parameter to set the output file name suffix (for example, \agcode{"\#.cpp"}). | |
729 | |
730 \index{Parser}\index{Parser name}\index{Configuration parameters} | |
731 \agparamheading{parser name}{string, default = \agcode{"\$"}} | |
732 | |
733 % XXX This should say something other than ``name your parser'' | |
734 AnaGram uses the value of \agparam{parser name} to name your parser, | |
735 substituting the name (not including the extension) of your syntax | |
736 file for a ``\agcode{\$}'' character. If you accept the default value of | |
737 \agparam{parser name} and have a syntax file called \agfile{ana.syn}, | |
738 AnaGram will name your parser \agcode{ana}. | |
739 | |
740 The \index{Initializer}initializer for your parser will have the same | |
741 name preceded by \agcode{init{\us}}. In the above example, the | |
742 initializer would be called \agcode{init{\us}ana}. | |
743 | |
744 \index{Configuration parameters}\index{Stack}\index{Parser stack alignment} | |
745 \agparamheading{parser stack alignment}{C data type, default = \agcode{int}} | |
746 | |
747 \agparam{Parser stack alignment} is used to control byte alignment of | |
748 the parser stack, \agcode{PCB.vs}. AnaGram normally adds a field of | |
749 the specified data type to the \agcode{union} declaration that defines | |
750 the data type for the parser stack. This parameter can be used to | |
751 deal with byte alignment problems when a parser is to be run on a | |
752 processor with byte alignment restrictions. For instance, if your | |
753 grammar has tokens of type \agcode{double} and your processor requires | |
754 double precision variables to be properly aligned, you can include the | |
755 following statement in a configuration section in your grammar or in | |
756 your configuration file: | |
757 \begin{indentingcode}{0.4in} | |
758 parser stack alignment = double | |
759 \end{indentingcode} | |
760 If the data type is \agcode{void}, no alignment declaration will be | |
761 made. | |
762 % You will not need to change this parameter if your parser is to | |
763 % run on a PC or compatible processor. | |
764 % | |
765 % XXX this really ought to be updated for the century of the fruitbat | |
766 | |
767 \index{Configuration parameters}\index{Parser stack size} | |
768 \agparamheading{parser stack size}{integer value, default = 32} | |
769 | |
770 \agparam{Parser stack size} is used to set the sizes of the parser | |
771 stacks in your parser control block. When AnaGram analyzes your | |
772 grammar, it determines the minimum amount of stack space required for | |
773 the deepest left recursion. To this depth it adds one half the value | |
774 of the \agparam{parser stack size} parameter. It then sets the actual | |
775 stack size to the larger of this value and the \agparam{parser stack | |
776 size} parameter. If you find the default of 32 wastefully large or | |
777 dangerously small, you can set the parameter to suit the needs of your particular parser. | |
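
As a worked illustration of this rule, let $d$ be the minimum depth
AnaGram computes for your grammar and $s$ the value of
\agparam{parser stack size}, so that the actual stack size is
$\max(d + s/2,\ s)$. The two depths below are hypothetical, chosen
only to show both branches with the default $s = 32$:
\[
d = 10:\ \max(10 + 16,\ 32) = 32, \qquad
d = 40:\ \max(40 + 16,\ 32) = 56.
\]
A shallow grammar thus gets exactly the configured size, while a
deeper one gets its minimum depth plus half the configured size.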
778 | |
779 \index{Pointer input}\index{Configuration switches} | |
780 \agparamheading{pointer input}{switch, default off} | |
781 | |
782 When you turn \agparam{pointer input} on you tell AnaGram that the | |
783 input to your parser is in memory and can be scanned simply by | |
784 incrementing a pointer. Before calling your parser you should make | |
785 sure that the \agcode{pointer} field in your parser control block is | |
786 properly initialized to point to the first character or token in your | |
787 input. | |
788 | |
789 Use the parameter | |
790 \index{Pointer type}\index{Configuration parameters}\agparam{pointer type} | |
791 to specify the type of the pointer. The default value of \agparam{pointer type} | |
792 is \agcode{unsigned char *}. | |
793 | |
794 \index{Pointer type}\index{Configuration parameters} | |
795 \agparamheading{pointer type}{C data type, default = \agcode{unsigned char *}} | |
796 | |
797 If you have set the \agparam{pointer input} switch, AnaGram will use | |
798 the value of the \agparam{pointer type} parameter to declare the | |
799 \agcode{pointer} field in your parser control block. | |
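
The two parameters are normally set together. The following sketch
assumes that switches accept the \agcode{name = value} form, and it
uses \agcode{const char *} purely as a hypothetical alternative to
the default pointer type:
\begin{indentingcode}{0.4in}
pointer input = on
pointer type = const char *
\end{indentingcode}
With such a configuration, the \agcode{pointer} field in the parser
control block would be declared with the given type, and you would
set it to the start of your input before calling the parser.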
800 | |
801 \index{Print file name}\index{Configuration parameters}\index{File name} | |
802 \agparamheading{print file name}{string, default = \agcode{"LPT1"}} | |
803 | |
804 \agparam{Print file name} is an obsolete configuration parameter, | |
805 recognized for the sake of compatibility with configuration files | |
806 prepared for the DOS version of AnaGram. It is ignored by AnaGram | |
807 2.0. | |
808 | |
809 \index{Quick reference}\index{Configuration switches} | |
810 \agparamheading{quick reference}{switch, default off} | |
811 | |
812 The \agparam{quick reference} switch is no longer used, but is still | |
813 recognized for compatibility's sake. In future versions of AnaGram it | |
814 may no longer be recognized. | |
815 | |
816 \index{Configuration switches}\index{Reduction choices} | |
817 \agparamheading{reduction choices}{switch, default off} | |
818 | |
819 If the \agparam{reduction choices} switch is set when AnaGram builds a | |
820 parser, it will include in your parser file a function which can | |
821 identify the acceptable choices for the reduction token in the current | |
822 state. You would use this switch only if you were using semantically | |
823 determined productions in your grammar and if there were states in | |
824 which not all the tokens on the left side of the production were valid | |
825 reduction tokens. | |
826 | |
827 \index{Rule coverage}\index{Configuration switches}\index{Coverage} | |
828 \agparamheading{rule coverage}{switch, default off} | |
829 | |
830 If you set the \agparam{rule coverage} switch, AnaGram will include | |
831 code in your parser to count the number of times your parser identifies | |
832 each rule in your grammar. To maintain the counts, AnaGram declares, | |
833 at the beginning of your parser, an integer array, whose name is | |
834 created by appending \agcode{{\us}nrc} to the name of your parser. The | |
835 array contains one counter for each rule you have defined in your | |
836 grammar. There are no entries for the auxiliary rules that AnaGram | |
837 creates to deal with set overlaps or disregard statements. In order | |
838 to identify every rule that the parser reduces in the course of | |
839 execution, AnaGram | |
840 has to turn off certain optimization features in your parser. | |
841 Therefore, a parser that has the \agparam{rule coverage} switch | |
842 enabled will run slightly slower than one with the switch off. An | |
843 entry on the \agmenu{Browse} menu allows you to view the coverage data. | |
844 % XXX See Chapter ???. | |
845 | |
846 \index{Tab spacing}\index{Configuration parameters} | |
847 \agparamheading{tab spacing}{integer value, default = 8} | |
848 | |
849 \agparam{Tab spacing} controls the expansion of tabs when AnaGram | |
850 displays your syntax file or the \agwindow{File Trace} test file. | |
851 | |
852 The value of \agparam{tab spacing} is also used to set the default | |
853 value of the \index{TAB{\us}SPACING}\index{Macros}\agcode{TAB{\us}SPACING} | |
854 macro in your parser. | |
855 | |
856 The default value of \agparam{tab spacing} is 8. If you prefer a | |
857 different value, you should probably include an appropriate statement | |
858 in your configuration file. For example: | |
859 | |
860 \begin{indentingcode}{0.4in} | |
861 tab spacing = 2 | |
862 \end{indentingcode} | |
863 | |
864 \index{Test file binary}\index{Configuration switches} | |
865 \agparamheading{test file binary}{switch, default off} | |
866 | |
867 \agparam{Test file binary} causes \agwindow{File Trace} to read test | |
868 files in binary mode. When \agwindow{File Trace} reads a test file, | |
869 it normally reads it in text mode, which in Windows causes carriage return | |
870 characters to be stripped out. If carriage return characters are | |
871 significant in the grammar you are testing, set | |
872 \agparam{test file binary} to on so that they are preserved rather | |
873 than discarded. | |
874 % XXX rewrite the second half of this paragraph? | |
875 | |
876 \index{Test file mask}\index{Configuration parameters} | |
877 \agparamheading{test file mask}{string, default = \agcode{"*.*"}} | |
878 | |
879 % XXX default should be ``*'' on unix | |
880 AnaGram uses \agparam{test file mask} to filter the pick list of test | |
881 files when you use the | |
882 \index{File Trace}\index{Trace}\index{Window}\agwindow{File Trace} | |
883 feature. | |
884 You may set it to any value you wish, including a pathname. | |
885 % XXX: test this | |
886 For instance, if you know that all your test files are in the directory | |
887 \agfile{C:{\bs}PROJECT{\bs}SOURCE} and have | |
888 extension \agfile{.FOO}, you could set \agparam{test file mask} to | |
889 \agcode{"C:{\bs\bs}PROJECT{\bs\bs}SOURCE{\bs\bs}*.FOO"}. | |
890 Note that, as in any string literal, backslash characters must be | |
891 escaped. | |
892 | |
893 \index{Test range}\index{Configuration switches}\index{Range} | |
894 \agparamheading{test range}{switch, default off} | |
895 % XXX should this really default to off? | |
896 | |
897 When \agparam{test range} is on, AnaGram will insert code in your | |
898 parser to make sure all input characters or token identifiers are | |
899 within the range specified in your grammar. If you do not turn this | |
900 switch on, your parser will run slightly faster, but its behavior will | |
901 be undefined if it gets input outside the range you have specified | |
902 in your grammar. | |
903 | |
904 \index{Token names}\index{Configuration switches} | |
905 \agparamheading{token names}{switch, default off} | |
906 | |
907 When \agparam{token names} is set, AnaGram includes a static array of | |
908 ASCII strings in your parser containing the names of your tokens. The | |
909 name of this array is \agcode{\#{\us}token{\us}names} where the | |
910 ``\agcode{\#}'' character is replaced with the name of your parser. | |
911 The entry for tokens which do not have names is an empty string: | |
912 \agcode{""}. | |
913 | |
914 \index{Top margin}\index{Configuration parameters} | |
915 \agparamheading{top margin}{integer value, default = 3} | |
916 | |
917 \agparam{Top margin} is an obsolete configuration parameter, | |
918 recognized for the sake of compatibility with configuration files | |
919 prepared for the DOS version of AnaGram. It is ignored by AnaGram | |
920 2.0. | |
921 | |
922 \index{Traditional engine}\index{Configuration switches} | |
923 \agparamheading{traditional engine}{switch, default off} | |
924 | |
925 Traditional LALR-1 parsers use a parsing engine which has only four | |
926 actions: shift, reduce, accept, and error. AnaGram, in the interests | |
927 of faster execution and more compact tables, uses a parsing engine | |
928 with a number of short-cut actions. The \agparam{traditional engine} | |
929 switch tells AnaGram not to use the short-cut actions. | |
930 | |
931 You would set this switch primarily in conjunction with use of the | |
932 \index{Grammar Trace}\index{Trace}\index{Window}\agwindow{Grammar Trace} | |
933 in order to have a clearer idea of what is happening. AnaGram will | |
934 then be using the same parsing actions as textbook parsers. Note that | |
935 if a lookahead token has already been selected, AnaGram will display | |
936 it on the last line of the \agwindow{Parser Stack} pane in the | |
937 \agwindow{Grammar Trace} window. | |
938 % XXX what is this note doing here? | |
939 | |
940 You should turn this switch back off when you have finished debugging | |
941 or your parser will be larger and slower than necessary. | |
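
A temporary debugging configuration might therefore look like the
sketch below, which again assumes that switches accept the
\agcode{name = value} form. It pairs \agparam{traditional engine}
with \agparam{rule coverage}; both make the parser slower, so both
belong only in a debugging build:
\begin{indentingcode}{0.4in}
traditional engine = on
rule coverage = on
\end{indentingcode}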
942 | |
943 % XXX: say that in production code traditional engine is not useful | |
944 % and only serves to slow things down. | |
945 | |
946 \index{Video mode}\index{Configuration parameters} | |
947 \agparamheading{video mode}{integer value, default = $-$1} | |
948 | |
949 \agparam{Video mode} is an obsolete configuration parameter, | |
950 recognized for the sake of compatibility with configuration files | |
951 prepared for the DOS version of AnaGram. It is ignored by AnaGram | |
952 2.0. |