comparison doc/misc/html/examples/mpp/ts.html @ 0:13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
author | David A. Holland |
---|---|
date | Sat, 22 Dec 2007 17:52:45 -0500 |
parents | |
children |
-1:000000000000 | 0:13d2b8934445 |
---|---|
1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN"> | |
2 <HTML> | |
3 <HEAD> | |
4 <TITLE> Token Scanner - Macro preprocessor and C Parser </TITLE> | |
5 </HEAD> | |
6 | |
7 <BODY BGCOLOR="#ffffff" BACKGROUND="tilbl6h.gif" | |
8 TEXT="#000000" LINK="#0033CC" | |
9 VLINK="#CC0033" ALINK="#CC0099"> | |
10 | |
11 <P> | |
12 <IMG ALIGN="right" SRC="../../images/agrsl6c.gif" ALT="AnaGram" | |
13 WIDTH=124 HEIGHT=30 > | |
14 <BR CLEAR="all"> | |
15 Back to : | |
16 <A HREF="../../index.html">Index</A> | | |
17 <A HREF="index.html">Macro preprocessor overview</A> | |
18 <P> | |
19 <IMG ALIGN="bottom" SRC="../../images/rbline6j.gif" ALT="----------------------" | |
20 WIDTH=1010 HEIGHT=2 > | |
21 <P> | |
22 | |
23 <H1> Token Scanner - Macro preprocessor and C Parser </H1> | |
24 <IMG ALIGN="bottom" SRC="../../images/rbline6j.gif" ALT="----------------------" | |
25 WIDTH=1010 HEIGHT=2 > | |
26 <P> | |
27 <BR> | |
28 | |
29 <H2>Introduction</H2> | |
30 | |
31 The token scanner module, <tt>ts.syn</tt>, accomplishes the following | |
32 tasks: | |
33 <OL> | |
34 <LI> It reads the raw input, gathers tokens and identifies | |
35 them. </LI> | |
36 <LI> It analyzes conditional compilation directives and | |
37 skips over text that is to be omitted. </LI> | |
38 <LI> It analyzes macro definitions and maintains the macro | |
39 tables. </LI> | |
40 <LI> It identifies macro calls in the input stream and calls | |
41 the <tt>macro_expand()</tt> function to expand them. </LI> | |
42 <LI> It recognizes <tt>#include</tt> statements and calls itself | |
43 recursively to parse the include file. </LI> | |
44 </OL> | |
45 | |
46 The token scanner parser, <tt>ts()</tt>, is called from a shell | |
47 function, <tt>scan_input(char *)</tt>, which takes the name | |
48 of a file | |
49 as an argument. <tt>scan_input()</tt> opens the file, calls | |
50 <tt>ts()</tt>, and | |
51 closes the file. <tt>scan_input()</tt> is called recursively by | |
52 <tt>include_file()</tt> when an <tt>#include</tt> statement | |
53 is found in the | |
54 input. | |
55 <P> | |
56 Output from the token scanner is directed to a token_sink | |
57 pointed to by the <tt>scanner_sink</tt> global variable. The main | |
58 program may set <tt>scanner_sink</tt> to point to either a | |
59 <tt>token_translator</tt> or a <tt>c_parser</tt>. During the | |
60 course of | |
61 processing, the token scanner redirects output to a token | |
62 accumulator or to the conditional expression evaluator, as | |
63 necessary, by temporarily changing the value of | |
64 <tt>scanner_sink</tt>. | |
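<P>
The save-and-restore pattern behind this diversion can be pictured with a minimal C sketch. The layout of <tt>token_sink</tt>, the <tt>emit</tt> callback, the fixed stack depth, and the helper names <tt>divert_sink()</tt>, <tt>restore_sink()</tt> and <tt>send_token()</tt> are assumptions made for illustration; they are not taken from <tt>ts.syn</tt>, which may organize this differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token: an id plus an index into the token dictionary. */
typedef struct { int id; unsigned value; } token;

/* Hypothetical sink: a callback plus a count of tokens received. */
typedef struct token_sink {
    void (*emit)(struct token_sink *self, token t);
    size_t count;
} token_sink;

/* Stand-in for a real consumer; it only counts what it is given. */
void count_token(token_sink *self, token t) {
    (void)t;
    self->count++;
}

/* Current destination for scanner output, as described in the text. */
token_sink *scanner_sink;

/* Small stack of saved sinks; the depth limit is an arbitrary choice. */
enum { SAVE_DEPTH = 8 };
token_sink *save_sink[SAVE_DEPTH];
size_t save_top;

/* Remember the current sink and send output somewhere else. */
void divert_sink(token_sink *temporary) {
    assert(save_top < SAVE_DEPTH);
    save_sink[save_top++] = scanner_sink;
    scanner_sink = temporary;
}

/* Return to the sink that was active before the last diversion. */
void restore_sink(void) {
    assert(save_top > 0);
    scanner_sink = save_sink[--save_top];
}

/* Everything the scanner produces goes through the current sink. */
void send_token(token t) {
    scanner_sink->emit(scanner_sink, t);
}
```

In this sketch a caller would set <tt>scanner_sink</tt> once, call <tt>divert_sink()</tt> before gathering a macro body or a conditional expression, and call <tt>restore_sink()</tt> afterwards, so that tokens sent in between never reach the main consumer.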
65 <P> | |
66 The token scanner module contains two syntax error | |
67 diagnostic procedures: <tt>syntax_error(char *)</tt> and | |
68 <tt>syntax_error_scanning(char *)</tt>. The former is set up to | |
69 provide correct line and column numbers for functions called | |
70 from reduction procedures in the token scanner. The latter | |
71 is set up to provide line and column numbers for errors | |
72 discovered in the scanner itself. Both functions accept a | |
73 pointer to an error message. | |
74 <P> | |
75 <BR> | |
76 | |
77 <H2> Theory of Operation </H2> | |
78 | |
79 The primary purpose of the token scanner is to identify the | |
80 C language tokens in the input file and pass them on to | |
81 another module for further processing. In order to package | |
82 them for transmission, the token scanner maintains a "token | |
83 dictionary", <tt>td</tt>, which enables it to characterize each | |
84 distinct input token with a single number. The token scanner | |
85 also classifies tokens according to the definitions of the C | |
86 language. The "token" that it passes on for further | |
87 processing is a pair consisting of an id field, and a value | |
88 field. The id field is defined by the <tt>token_id</tt> | |
89 enumeration | |
90 in <tt>token.h</tt>. The value field is the index of the | |
91 token in the | |
92 token dictionary, <tt>td</tt>. | |
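<P>
As a rough illustration of the id/value pair, the following C sketch interns token text in a small dictionary and builds a token from it. The names <tt>token_dictionary</tt>, <tt>dict_identify()</tt> and <tt>make_token()</tt>, the fixed capacity, and the use of a plain <tt>int</tt> in place of the <tt>token_id</tt> enumeration are all assumptions of the sketch, not the actual definitions in <tt>token.h</tt>.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { DICT_CAPACITY = 1024 };   /* arbitrary limit for the sketch */

/* Hypothetical dictionary: each distinct spelling is stored once. */
typedef struct {
    char *text[DICT_CAPACITY];
    unsigned count;
} token_dictionary;

/* Return the index of s, adding it first if it is not present yet.
   A linear search keeps the sketch short; entries are never freed. */
unsigned dict_identify(token_dictionary *td, const char *s) {
    unsigned i;
    for (i = 0; i < td->count; i++) {
        if (strcmp(td->text[i], s) == 0) return i;
    }
    assert(td->count < DICT_CAPACITY);
    td->text[td->count] = malloc(strlen(s) + 1);
    assert(td->text[td->count] != NULL);
    strcpy(td->text[td->count], s);
    return td->count++;
}

/* The pair passed on for further processing: a classification
   and the index of the spelling in the dictionary. */
typedef struct { int id; unsigned value; } token;

token make_token(token_dictionary *td, int id, const char *s) {
    token t;
    t.id = id;
    t.value = dict_identify(td, s);
    return t;
}
```

Two occurrences of the same identifier receive the same <tt>value</tt>, so later stages can compare tokens by number instead of by string.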
93 <P> | |
94 To support its primary purpose, the token scanner deals with | |
95 several other problems. First, it identifies preprocessor | |
96 control lines which control conditional compilation and | |
97 skips input appropriately. Second, it fields <tt>#include</tt> | |
98 statements, and recurses to process include files. Third, it | |
99 fields <tt>#define</tt> statements and manages the macro definition | |
100 tables. Finally, it checks the tokens it identifies and | |
101 calls the macro/argument expansion module to expand them if | |
102 they turn out to be macros. | |
103 <P> | |
104 The conditional compilation logic in the token scanner is | |
105 carried out in its entirety by syntactic means. The only C | |
106 code involved deals with evaluating conditional statements. | |
107 <tt>#ifdef</tt> and <tt>#ifndef</tt> are quite | |
108 straightforward. <tt>#if</tt> is another | |
109 matter. To deal with the generality of this statement, token | |
110 scanner output is diverted to the expression evaluator | |
111 module, <tt>ex.syn</tt>, where the expression is evaluated. The | |
112 outcome of the calculation is then used to control a | |
113 semantically determined production in the token scanner. | |
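<P>
The token scanner decides what to skip through its grammar, not through hand-written control flow. Purely to show which text survives, the following C sketch swaps in an explicit stack of flags. It is not how <tt>ts.syn</tt> works: <tt>filter_lines()</tt> and <tt>cond_level</tt> are hypothetical names, only <tt>#if</tt>, <tt>#elif</tt>, <tt>#else</tt> and <tt>#endif</tt> are handled, and every condition must be the literal <tt>0</tt> or <tt>1</tt>.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { MAX_NEST = 32 };   /* arbitrary nesting limit for the sketch */

/* State kept for each open conditional block. */
typedef struct {
    int parent_active;  /* was the enclosing text being passed on? */
    int taken;          /* has a branch of this block been true yet? */
} cond_level;

/* Copy into out the lines that survive conditional compilation and
   return how many were kept. out must have room for n entries. */
size_t filter_lines(const char *const *lines, size_t n, const char **out) {
    cond_level stack[MAX_NEST];
    size_t depth = 0, kept = 0, i;
    int active = 1;

    for (i = 0; i < n; i++) {
        const char *l = lines[i];
        if (strncmp(l, "#if ", 4) == 0) {
            assert(depth < MAX_NEST);
            stack[depth].parent_active = active;
            stack[depth].taken = active && l[4] == '1';
            active = stack[depth].taken;
            depth++;
        } else if (strncmp(l, "#elif ", 6) == 0 || strcmp(l, "#else") == 0) {
            int is_else = (l[3] == 's');
            int cond = is_else ? 1 : (l[6] == '1');
            cond_level *c;
            assert(depth > 0);
            c = &stack[depth - 1];
            /* A later branch is live only if no earlier one was. */
            active = c->parent_active && !c->taken && cond;
            if (active) c->taken = 1;
        } else if (strcmp(l, "#endif") == 0) {
            assert(depth > 0);
            active = stack[--depth].parent_active;
        } else if (active) {
            out[kept++] = l;
        }
    }
    return kept;
}
```

Once one branch of a block has been taken, every following <tt>#elif</tt> or <tt>#else</tt> branch of that block is skipped, and nothing inside a skipped region is passed on, however its own conditions read.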
114 <P> | |
115 Processing <tt>#include</tt> statements is reasonably | |
116 straightforward. Token scanner output is diverted to the | |
117 token accumulator, <tt>ta</tt>. The content of the token accumulator | |
118 is then translated back to ASCII string form. This takes | |
119 care of macro calls in the <tt>#include</tt> statement. Once the file | |
120 has been identified, <tt>scan_input()</tt> is called recursively to | |
121 deal with it. | |
122 <P> | |
123 The only complication with macro definitions is that the | |
124 tokens which comprise the body of a macro must not be | |
125 expanded until the macro is invoked. For that reason, there | |
126 are two different definitions of token in the token scanner: | |
127 "simple token" and "expanded token". The difference is that | |
128 simple tokens are not checked for macro calls. When a macro | |
129 definition is encountered, the token scanner output is | |
130 diverted to the token accumulator, so that the body of the | |
131 macro can be captured and stored. | |
132 <P> | |
133 When a macro call is recognized, the token scanner must pick | |
134 up the arguments for the macro. There are three | |
135 complications here: First, the tokens must not be scanned | |
136 for macros; second, the scan must distinguish the commas | |
137 that separate arguments from commas that may be contained | |
138 inside balanced parentheses within an argument; and finally, | |
139 leading white space tokens do not count as argument tokens. | |
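<P>
The comma and parenthesis rule can be illustrated with a short C sketch. It works on raw characters instead of tokens, so commas inside string literals or character constants are not protected, and <tt>count_macro_args()</tt> is a hypothetical name, not a function from <tt>ts.syn</tt>.

```c
#include <assert.h>
#include <ctype.h>

/* Count the arguments of a macro call. s points just past the opening
   parenthesis. Returns -1 if the closing parenthesis is missing. */
int count_macro_args(const char *s) {
    int depth = 0;        /* nesting level of inner parentheses */
    int args = 0;         /* arguments closed by a top-level comma */
    int seen_token = 0;   /* does the current argument have content? */

    for (; *s != '\0'; s++) {
        if (depth == 0 && *s == ')') {
            /* "()" has no arguments; otherwise close the last one. */
            return (args > 0 || seen_token) ? args + 1 : 0;
        }
        if (depth == 0 && *s == ',') {
            args++;           /* a top-level comma ends an argument */
            seen_token = 0;
            continue;
        }
        if (*s == '(') depth++;
        else if (*s == ')') depth--;
        /* White space alone does not make an argument. */
        if (!isspace((unsigned char)*s)) seen_token = 1;
    }
    return -1;
}
```

For the text <tt>a, (b, c), d)</tt> the sketch counts three arguments, because the comma between <tt>b</tt> and <tt>c</tt> sits inside nested parentheses.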
140 <P> | |
141 <BR> | |
142 | |
143 <H2> Elements of the Token Scanner </H2> | |
144 | |
145 The remainder of this document describes the macro | |
146 definitions, the structure definitions, the static data | |
147 definitions, all configuration parameter settings, and all | |
148 non-terminal parsing tokens used in the token scanner. It | |
149 also explains each configuration parameter setting in the | |
150 syntax file. In <tt>ts.syn</tt>, each function that is defined is | |
151 preceded by a short explanation of its purpose. | |
152 <P> | |
153 <BR> | |
154 | |
155 <H2> Macro definitions </H2> | |
156 <DL> | |
157 <DT> <tt>GET_CONTEXT</tt> | |
158 <DD> The <tt>GET_CONTEXT</tt> macro provides the parser with context | |
159 information for the input character. (Instead of writing a | |
160 <tt>GET_CONTEXT</tt> macro, the context information could be stored | |
161 as part of <tt>GET_INPUT</tt>.) | |
162 | |
163 <DT> <tt>GET_INPUT</tt> | |
164 <DD> The <tt>GET_INPUT</tt> macro provides the next input | |
165 character for | |
166 the parser. If the parser used <b>pointer input</b> or <b>event | |
167 driven</b> input, a <tt>GET_INPUT</tt> macro would not be | |
168 necessary. The | |
169 default for <tt>GET_INPUT</tt> would read <tt>stdin</tt> and | |
170 so is not | |
171 satisfactory for this parser. | |
172 | |
173 <DT> <tt>PCB</tt> | |
174 <DD> Since the <b>declare pcb</b> switch has been turned off, AnaGram | |
175 will not define <tt>PCB</tt>. Making the parser control block part of | |
176 the file descriptor structure simplifies saving and | |
177 restoring the pcb for nested #include files. | |
178 | |
179 <DT> <tt>SYNTAX_ERROR</tt> | |
180 <DD> <tt>ts.syn</tt> defines the <tt>SYNTAX_ERROR</tt> macro, | |
181 since otherwise the | |
182 generated parser would use the default definition of | |
183 <tt>SYNTAX_ERROR</tt>, which would not provide the name of the file | |
184 currently being read. | |
185 </DL> | |
186 <P> | |
187 <BR> | |
188 | |
189 <H2> Local Structure Definitions </H2> | |
190 <DL><DT> <tt>location</tt> | |
191 <DD> <tt>location</tt> is a structure which records a line | |
192 number and a | |
193 column number. It is handed to AnaGram with the context type | |
194 statement found in the configuration segment. AnaGram then | |
195 declares two member fields of type <tt>location</tt> in the parser | |
196 control block: <tt>input_context</tt> and a stack, <tt>cs</tt>. In | |
197 <tt>scan_input()</tt>, the <tt>input_context</tt> variable | |
198 is set explicitly | |
199 with the current line and column number. In <tt>syntax_error()</tt> | |
200 the <tt>CONTEXT</tt> macro is used to extract the line and column | |
201 number at which the rule currently being reduced started. | |
202 | |
203 <DT> <tt>file_descriptor</tt> | |
204 <DD> <tt>file_descriptor</tt> contains the information that | |
205 needs to be | |
206 saved and restored when nested include files are processed. | |
207 </DL> | |
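<P>
A line and column record of this kind might look like the following C sketch. The field names, the choice to number from 1, the treatment of a tab as a single column, and the helpers <tt>advance()</tt> and <tt>locate()</tt> are assumptions; the actual declaration in <tt>ts.syn</tt> may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: a line number and a column number. */
typedef struct { int line; int column; } location;

/* Advance a location past character c. */
location advance(location at, int c) {
    if (c == '\n') {
        at.line++;
        at.column = 1;
    } else {
        at.column++;
    }
    return at;
}

/* Location of the character at the given offset within text. */
location locate(const char *text, size_t offset) {
    location at = { 1, 1 };
    size_t i;
    assert(text != NULL);
    for (i = 0; i < offset && text[i] != '\0'; i++) {
        at = advance(at, text[i]);
    }
    return at;
}
```

A scanner that updates such a record for every character it reads can report the position at which the rule being reduced began, which is what the <tt>CONTEXT</tt> macro is described as providing.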
208 <P> | |
209 <BR> | |
210 | |
211 <H2> Static Variables </H2> | |
212 <DL><DT> <tt>error_modifier</tt> | |
213 <DD> Type: <tt>char *</tt><BR> | |
214 | |
215 The string identified by <tt>error_modifier</tt> is added to the | |
216 error diagnostic printed by <tt>syntax_error()</tt>. Normally it is | |
217 an empty string; however, when macros are being expanded it | |
218 is set so that the diagnostic will specify that the error | |
219 was found inside a macro expansion. | |
220 | |
221 <DT> <tt>input</tt> | |
222 <DD> Type: <tt>file_descriptor</tt><BR> | |
223 | |
224 <tt>input</tt> provides the name and stream pointer for the | |
225 currently active | |
226 input file. | |
227 | |
228 <DT> <tt>save_sink</tt> | |
229 <DD> Type: <tt>stack&lt;token_sink *&gt;</tt><BR> | |
230 | |
231 This stack provides for saving and restoring <tt>scanner_sink</tt> | |
232 when it is necessary to divert the scanner output for | |
233 dealing with conditional expressions, macro definitions and | |
234 macro arguments. Actually, a stack is not necessary, since | |
235 such diversions never nest more than one level deep, but it | |
236 seems clearer to use a stack. | |
237 </DL> | |
238 <P> | |
239 <BR> | |
240 | |
241 <H2> Configuration Parameters </H2> | |
242 <DL><DT> <tt>~allow macros</tt> | |
243 <DD> This statement turns off the <b>allow macros</b> switch so that | |
244 AnaGram implements all reduction procedures as explicit | |
245 function definitions. This simplifies debugging at the cost | |
246 of a slight performance degradation. | |
247 | |
248 <DT> <tt>auto resynch</tt> | |
249 <DD> This switch turns on automatic resynchronization in case a | |
250 syntax error is encountered by the token scanner. | |
251 | |
252 <DT> <tt>context type = location</tt> | |
253 <DD> This statement specifies that the generated parser is to | |
254 track context automatically. The context variables have type | |
255 <tt>location</tt>. <tt>location</tt> is defined elsewhere to | |
256 consist of two | |
257 fields: line number and column number. | |
258 | |
259 <DT> <tt>~declare pcb</tt> | |
260 <DD> This statement tells AnaGram not to declare a parser control | |
261 block for the parser. The parser control block is declared | |
262 later as part of the <tt>file_descriptor</tt> structure. | |
263 | |
264 <DT> <tt>~error frame</tt> | |
265 <DD> This turns off the error frame portion of the automatic | |
266 syntax error diagnostic generator, since the context of the | |
267 error in the scanner syntax is of little interest. If an | |
268 error frame were to be used in diagnostics, that of the C | |
269 parser would be more appropriate. | |
270 | |
271 <DT> <tt>error trace</tt> | |
272 <DD> This turns on the <b>error trace</b> functionality, so | |
273 that if the token | |
274 scanner encounters a syntax error it will write an <tt>.etr</tt> | |
275 file. | |
276 | |
277 <DT> <tt>line numbers</tt> | |
278 <DD> This statement causes AnaGram to include <tt>#line</tt> | |
279 statements in | |
280 the parser file so that your compiler can provide | |
281 diagnostics keyed to your syntax file. | |
282 | |
283 <DT> <tt>subgrammar</tt> | |
284 <DD> The basic token grammar for C is usually implemented using | |
285 some sort of regular expression parser, such as <tt>lex</tt>, which | |
286 always looks for the longest match to the regular | |
287 expression. In no case does the regular expression parser | |
288 use what follows a match to determine the nature of the | |
289 match. An LALR parser generator, on the other hand, normally | |
290 looks not only at the content of a token but also looks | |
291 ahead. The subgrammar declaration tells AnaGram not to look | |
292 ahead but to parse these tokens based only on their internal | |
293 structure. Thus the conflicts that would normally be | |
294 detected are not seen. To see what happens if lookahead is | |
295 allowed, simply comment out any one of these subgrammar | |
296 statements and look at the conflicts that result. | |
297 | |
298 <DT> <tt>~test range</tt> | |
299 <DD> This statement tells AnaGram not to check input characters | |
300 to see if they are within allowable limits. This checking is | |
301 not necessary since the token scanner is reading a text file | |
302 and cannot possibly get an out of range token. | |
303 </DL> | |
304 <P> | |
305 <BR> | |
306 | |
307 <H2> Scanner Tokens, in alphabetical order </H2> | |
308 <DL><DT> any text | |
309 <DD> These productions are used when skipping over text. "any | |
310 text" consists of all characters other than eof, newline and | |
311 backslash, as well as any character (including newline and | |
312 backslash) that is quoted with a preceding backslash | |
313 character. | |
314 | |
315 <DT> arg element | |
316 <DD> An "arg element" is a token in the argument list of a macro. | |
317 It is essentially the same as "simple token" except that | |
318 commas must be detected as separators and nested parentheses | |
319 must be recognized. An "arg element" is either a space or an | |
320 "initial arg element". | |
321 | |
322 <DT> character constant | |
323 <DD> A "character constant" is a quoted character or escape | |
324 sequence. The token scanner does not inquire closely into | |
325 the internal nature of the character constant. | |
326 | |
327 <DT> comment | |
328 <DD> A "comment" consists of a comment head followed by the | |
329 closing "*/". | |
330 | |
331 <DT> comment head | |
332 <DD> A "comment head" consists of the entire comment up to the | |
333 closing "*/". If a complete comment is found following a | |
334 comment head, its treatment depends on whether one believes, | |
335 with ANSI, that comments should not be nested, or whether | |
336 one prefers to allow nested comments. Followers of the ANSI | |
337 principle will want "comment head, comment" to reduce to | |
338 "comment". Believers in nested comments will want to finish | |
339 the comment that was in progress when the nested comment was | |
340 encountered, so they will want "comment head, comment" to | |
341 reduce to "comment head", which will allow the search for | |
342 "*/" to continue. | |
343 | |
344 <DT> conditional block | |
345 <DD> A "conditional block" is an #if, #ifdef, or #ifndef line and | |
346 all following lines through the terminating #endif. If the | |
347 initial condition turns out to be true, then everything has | |
348 to be skipped following an #elif or #else line. If the | |
349 initial condition is false, everything has to be skipped | |
350 until a true #elif condition or an #else line is found. | |
351 | |
352 <DT> confusion | |
353 <DD> This token is designed to deal with a curious anomaly of C. | |
354 Integers which begin with a zero are octal, but floating | |
355 point numbers may have leading zeroes without losing their | |
356 fundamental decimal nature. "confusion" is an octal integer | |
357 that is followed by an eight or a nine. This will become | |
358 legitimate if eventually a decimal point or an exponent | |
359 field is encountered. | |
360 | |
361 <DT> control line | |
362 <DD> "control line" consists of any preprocessor control line | |
363 other than those associated with conditional compilation. | |
364 | |
365 <DT> decimal constant | |
366 <DD> A "decimal constant" is a "decimal integer" and any | |
367 following qualifiers. | |
368 | |
369 <DT> decimal integer | |
370 <DD> The digits which comprise the integer are pushed onto the | |
371 string accumulator. When the integer is complete, the string | |
372 will be entered into the token dictionary and subsequently | |
373 it will be described by its index in the token dictionary. | |
374 | |
375 <DT> defined | |
376 <DD> See "expanded word". id_macro will recognize "defined" only | |
377 when the if_clause switch is set. | |
378 | |
379 <DT> eof | |
380 <DD> end of file: equal to the null character. | |
381 | |
382 <DT> eol | |
383 <DD> end of line: a newline and all immediately following white | |
384 space or newline characters. eol is declared to be a | |
385 subgrammar since it is used in circumstances where space can | |
386 legitimately follow, according to the syntax as written. | |
387 | |
388 <DT> else if header | |
389 <DD> This production is simply a portion of the rule for the | |
390 #elif statement. It is separated out in order to provide a | |
391 hook on which to hang the call to init_condition(), which | |
392 diverts scanner output to the expression_evaluator which | |
393 will calculate the value of the conditional expression. | |
394 | |
395 <DT> else section | |
396 <DD> An "else section" is an #else line and all immediately | |
397 following complete sections. An "else section" and a "skip | |
398 else section" are the same except that in an "else section" | |
399 tokens are sent to the scanner output and in a "skip else | |
400 section" they are discarded. | |
401 | |
402 <DT> endif line | |
403 <DD> An "endif line" is simply a line that begins with #endif. | |
404 | |
405 <DT> expanded token | |
406 <DD> The word "token" is used here in the sense of Kernighan and | |
407 Ritchie, 2nd Edition, Appendix A, p. 191. In this program a | |
408 "simple token" is one which is simply passed on without | |
409 regard to macro processing. An "expanded token" is one which | |
410 has been checked to see if it is a macro identifier and, if | |
411 so, expanded. "simple tokens" are recognized only in the | |
412 bodies of macro definitions. Therefore spaces and '#' | |
413 characters are passed on. For "expanded tokens" they are | |
414 discarded. | |
415 | |
416 <DT> expanded word | |
417 <DD> This is the treatment of a simple identifier as an "expanded | |
418 token". "variable", "simple macro", "macro", and "defined" | |
419 are the various outcomes of semantic analysis of "name | |
420 string" performed by id_macro(). In this case reserved words | |
421 and identifiers which are not the names of macros are | |
422 subsumed under the rubric "variable". These tokens are | |
423 simply passed on to the scanner output. | |
424 <P> | |
425 The distinction between "macro" and "simple macro" depends | |
426 on whether the macro was defined with or without following | |
427 parentheses. A "simple macro" is expanded by calling | |
428 expand(). expand() simply serves as a local interface to the | |
429 expand_text() function defined in <tt>mas.syn</tt>. | |
430 <P> | |
431 If a "macro" was defined with parentheses but appears bereft | |
432 of an argument list, it is treated as a simple identifier | |
433 and passed on to the output. Otherwise the argument tokens | |
434 for the macro are gathered and stacked on the token | |
435 accumulator, using "macro arg list". Finally, the macro is | |
436 expanded in the same way as a "simple macro". Note that | |
437 "macro arg list" provides a count of the number of arguments | |
438 found inside the balanced parentheses. | |
439 <P> | |
440 If "if_clause" is set, it means that the conditional | |
441 expression of an #if or #elif line is being evaluated. In | |
442 this case, the pseudo-function defined() must be recognized | |
443 to determine whether a macro has or has not been defined. | |
444 The defined() function returns a "1" or "0" token depending | |
445 on whether the macro has been defined. | |
446 | |
447 <DT> exponent | |
448 <DD> This is simply the exponent field on a floating point number | |
449 with optional sign. | |
450 | |
451 | |
452 <DT> false condition | |
453 <DD> The "true condition" and "false condition" tokens are | |
454 semantically determined. They consist of #if, #ifdef, or | |
455 #ifndef lines. If the result of the test is true the | |
456 reduction token is "true condition", otherwise it is "false | |
457 condition". | |
458 | |
459 <DT> false else condition | |
460 <DD> The "true else condition" and "false else condition" tokens | |
461 are semantically determined. They consist of an #elif line. | |
462 If the value of the conditional expression is true the | |
463 reduction token is "true else condition", otherwise it is | |
464 "false else condition". | |
465 | |
466 <DT> false if section | |
467 <DD> A "false if section" is a #if, #ifdef, or #ifndef condition | |
468 that turns out to be false followed by any number, including | |
469 zero, of complete sections or false #elif condition lines. | |
470 All of the text within a "false if section" is discarded. | |
471 <DT> floating qualifier | |
472 <DD> These productions are simply the optional qualifiers to | |
473 specify that a constant is to be treated as a float or as a | |
474 long double. | |
475 | |
476 <DT> hex constant | |
477 <DD> A "hex constant" is simply a "hex integer" plus any | |
478 following qualifiers. | |
479 | |
480 <DT> hex integer | |
481 <DD> The digits which comprise the integer are pushed onto the | |
482 string accumulator. When the integer is complete, the string | |
483 will be entered into the token dictionary and subsequently | |
484 it will be described by its index in the token dictionary. | |
485 | |
486 <DT> if header | |
487 <DD> This production is simply a portion of the rule for the #if | |
488 statement. It is separated out in order to provide a hook on | |
489 which to hang the call to init_condition(), which diverts | |
490 scanner output to the expression evaluator which will | |
491 calculate the value of the conditional expression. | |
492 | |
493 <DT> initial arg element | |
494 <DD> In gathering macro arguments, spaces must not be confused | |
495 with a true argument. Therefore, the arg element token is | |
496 broken down into two pieces so that each argument begins | |
497 with a nonblank token. | |
498 | |
499 <DT> include header | |
500 <DD> "include header" simply represents the initial portion of an | |
501 #include line and provides a hook for a reduction procedure | |
502 which diverts scanner output to the token accumulator. This | |
503 diversion allows the text which follows #include to be | |
504 scanned for macros and accumulated. The include_file() | |
505 function will be called to actually identify and scan the | |
506 specified file. | |
507 | |
508 <DT> input file | |
509 <DD> This is the grammar, or start token. It describes the entire | |
510 file as alternating sections and eols, terminated by an eof | |
511 | |
512 <DT> integer constant | |
513 <DD> These productions simply gather together the varieties of | |
514 integer constants under one umbrella. | |
515 | |
516 <DT> integer qualifier | |
517 <DD> These productions are simply the optional qualifiers to | |
518 specify that an "integer constant" is to be treated as | |
519 unsigned, long, or both. | |
520 | |
521 <DT> macro | |
522 <DD> See "expanded word". id_macro specifies "macro" or "simple | |
523 macro" depending on whether the named macro was defined with | |
524 or without following parentheses. | |
525 | |
526 <DT> macro arg list | |
527 <DD> A "macro arg list" can be either empty or can consist of any | |
528 number of token sequences separated by commas. Commas that | |
529 are protected by nested parentheses do not separate | |
530 arguments. Argument strings are accumulated on the token | |
531 accumulator and counted by "macro args". | |
532 | |
533 <DT> macro args | |
534 <DD> Each argument to a macro is gathered on a separate level of | |
535 the token accumulator, so the token accumulator level is | |
536 incremented before each argument, and the arguments are | |
537 counted. | |
538 | |
539 <DT> macro definition header | |
540 <DD> The "macro definition header" consists of the #define line | |
541 up to the beginning of the body text of the macro. It serves | |
542 as a hook to call init_macro_def() which begins the macro | |
543 definition and diverts scanner output to the token | |
544 accumulator. The macro definition will be completed by the | |
545 save_macro_body() function once the entire macro body has | |
546 been accumulated. Note that the tokens for the macro body | |
547 are not examined for macro calls. | |
548 | |
549 <DT> name string | |
550 <DD> "name string" is simply an accumulation on the string | |
551 accumulator of the characters which make up an identifier. | |
552 | |
553 <DT> nested elements | |
554 <DD> "nested elements" are "arg elements" that are found inside | |
555 nested parentheses. | |
556 | |
557 <DT> not control mark | |
558 <DD> This consists of any input character excepting eof, newline, | |
559 backslash and '#', but including any of these if preceded by | |
560 a backslash. It serves, at the beginning of a line, to | |
561 distinguish ordinary lines of text from preprocessor control | |
562 lines. | |
563 | |
564 <DT> octal integer | |
565 <DD> The digits which comprise the integer are pushed onto the | |
566 string accumulator. When the integer is complete, the string | |
567 will be entered into the token dictionary and subsequently | |
568 it will be described by its index in the token dictionary. | |
569 | |
570 <DT> operator | |
571 <DD> This is simply an inventory of all the multi-character | |
572 operators in C. | |
573 | |
574 <DT> parameter list | |
575 <DD> "parameter list" is simply a wrapper about "names" which | |
576 allows for empty parentheses. Note that both the "names" | |
577 token and the "parameter list" tokens provide the count of | |
578 the number of parameter names found inside the parentheses. | |
579 The names themselves have been stacked on the string | |
580 accumulator. | |
581 | |
582 <DT> qualified real | |
583 <DD> This production exists to allow the "floating qualifier" to | |
584 be appended to a "real constant". | |
585 <DT> real | |
586 <DD> These productions itemize the various ways of writing a | |
587 floating point number with and without decimal points and | |
588 with and without exponent fields. | |
589 | |
590 <DT> real constant | |
591 <DD> This production is simply an envelope to contain "real" and | |
592 write the output code once instead of four times. | |
593 | |
594 <DT> section | |
595 <DD> This is a logical block of input. It is either a single line | |
596 of ordinary code, a control line such as #define or #undef, | |
597 or an entire conditional compilation block, i.e., everything | |
598 from the #if to the closing #endif. Notice that the eol that | |
599 terminates a "section" is not part of the "section". The | |
600 only difference between a "section" and a "skip section" is | |
601 that in a "section", all tokens are sent to the scanner | |
602 output while in a "skip section", all input is discarded. | |
603 | |
604 <DT> separator | |
605 <DD> This is simply a gathering together of all the tokens that | |
606 are neither white space nor identifiers, since they are | |
607 treated uniformly throughout the grammar. | |
608 | |
609 <DT> simple macro | |
610 <DD> See "expanded word". | |
611 | |
612 <DT> simple real | |
613 <DD> A "simple real" is one which has a decimal point and has | |
614 digits on at least one side of the decimal point. | |
615 Unaccompanied decimal points will be turned away at the | |
616 door. | |
617 <DT> simple token | |
618 <DD> The word "token" is used here in the sense of Kernighan and | |
619 Ritchie, 2nd Edition, Appendix A, p. 191. In this program a | |
620 "simple token" is one which is simply passed on without | |
621 regard to macro processing. An "expanded token" is one which | |
622 has been checked to see if it is a | |
623 macro identifier and, if so, expanded. "simple tokens" are | |
624 recognized only in the bodies of macro definitions. | |
625 Therefore spaces and '#' characters are passed on. For | |
626 "expanded tokens" they are discarded. | |
627 | |
628 <DT> skip else line | |
629 <DD> For purposes of skipping over complete conditional sections | |
630 #elif and #else lines are equivalent. | |
631 | |
632 <DT> skip else section | |
633 <DD> A "skip else section" consists of the #else or #elif line | |
634 following a satisfied conditional and all subsequent | |
635 sections and #elif and #else lines. All input in the "skip | |
636 else section" is discarded. | |
637 | |
638 <DT> skip if section | |
639 <DD> A "skip if section" consists of an #if, #ifdef, or #ifndef | |
640 line, and all following complete "sections" (represented as | |
641 "skip sections", so their content will be ignored) and #else | |
642 and #elif lines. | |
643 | |
644 <DT> skip line | |
645 <DD> When skipping text, we have to distinguish between lines | |
646 which begin with the control mark ('#') and those which | |
647 don't so that we deal correctly with nested #endif | |
648 statements. We wouldn't want to terminate a block of | |
649 uncompiled code with the wrong #endif. | |
650 | |
651 <DT> skip section | |
652 <DD> A "skip section" is simply a "section" that follows an | |
653 unsatisfied conditional. In a "skip section", all input is | |
654 discarded. | |
655 | |
656 <DT> space | |
657 <DD> space consists of either a blank or a comment. If a comment | |
658 is found, it is replaced with a blank. | |
659 <DT> simple chars | |
660 <DD> "simple chars" consists of the body of a character constant | |
661 up to but not including the final quote. | |
662 | |
663 <DT> string chars | |
664 <DD> "string chars" consists of the body of a string literal up | |
665 to but not including the final double quote. | |
666 | |
667 <DT> string literal | |
668 <DD> A "string literal" is simply a quoted string. It is | |
669 accumulated on the string accumulator. | |
670 | |
671 <DT> true condition | |
672 <DD> The "true condition" and "false condition" tokens are | |
673 semantically determined. They consist of #if, #ifdef, or | |
674 #ifndef lines. If the result of the test is true the | |
675 reduction token is "true condition", otherwise it is "false | |
676 condition". | |
677 | |
685 <DT> true else condition | |
686 <DD> The "true else condition" and "false else condition" tokens | |
687 are semantically determined. They consist of an #elif line. | |
688 If the value of the conditional expression is true the | |
689 reduction token is "true else condition", otherwise it is | |
690 "false else condition". | |
691 | |
692 <DT> true if section | |
693 <DD> A "true if section" is a true #if, #ifdef, or #ifndef, | |
694 followed by any number of complete sections, including zero. | |
695 Alternatively, it could be a "false if section" that is | |
696 followed by a true #elif condition, followed by any number | |
697 of complete "sections". All input in a "true if section" | |
698 subsequent to the true condition is passed on to the scanner | |
699 output. | |
700 | |
701 <DT> word | |
702 <DD> This is the treatment of a simple identifier as a "simple | |
703 token". The name_token() procedure is called to pop the name | |
704 string from the string accumulator, identify it in the token | |
705 dictionary and assign a token_id to it by checking to see if | |
706 it is a reserved word. | |
707 | |
708 <DT> variable | |
709 <DD> See "expanded word". | |
710 | |
711 <DT> ws | |
712 <DD> The definition of ws as "space..." (one or more spaces or | |
713 comments) simply allows a briefer reference in those places in | |
714 the grammar where it is necessary to skip over white space. | |
715 </DL> | |
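<P>
The "confusion" entry in the list above can be made concrete with a small C classifier. It is a hand-written sketch, not the grammar in <tt>ts.syn</tt>: <tt>classify_number()</tt> and the <tt>num_kind</tt> values are hypothetical, hex constants and qualifier suffixes are left out, and a number must start with a digit, so a form such as <tt>.5</tt> is not accepted here.

```c
#include <assert.h>
#include <ctype.h>

typedef enum { NUM_OCTAL, NUM_DECIMAL, NUM_REAL, NUM_INVALID } num_kind;

/* Classify a run of digits with an optional fraction and exponent. */
num_kind classify_number(const char *s) {
    int leading_zero = (s[0] == '0');
    int confusion = 0;   /* saw an 8 or a 9 after a leading zero */
    int is_real = 0;

    if (!isdigit((unsigned char)*s)) return NUM_INVALID;
    for (; isdigit((unsigned char)*s); s++) {
        if (leading_zero && (*s == '8' || *s == '9')) confusion = 1;
    }
    if (*s == '.') {                      /* optional fraction */
        is_real = 1;
        for (s++; isdigit((unsigned char)*s); s++) { }
    }
    if (*s == 'e' || *s == 'E') {         /* optional exponent */
        s++;
        if (*s == '+' || *s == '-') s++;
        if (!isdigit((unsigned char)*s)) return NUM_INVALID;
        for (; isdigit((unsigned char)*s); s++) { }
        is_real = 1;
    }
    if (*s != '\0') return NUM_INVALID;   /* trailing junk */
    if (is_real) return NUM_REAL;         /* 089.5 and 089e2 are fine */
    if (confusion) return NUM_INVALID;    /* 089 never became a real */
    return leading_zero ? NUM_OCTAL : NUM_DECIMAL;
}
```

The <tt>confusion</tt> flag plays the role of the token of the same name: digits such as <tt>089</tt> are tolerated while scanning and become legitimate only if a decimal point or an exponent follows.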
716 <P> | |
717 <BR> | |
718 | |
719 | |
720 <IMG ALIGN="bottom" SRC="../../images/rbline6j.gif" ALT="----------------------" | |
721 WIDTH=1010 HEIGHT=2 > | |
722 <P> | |
723 <IMG ALIGN="right" SRC="../../images/pslrb6d.gif" ALT="Parsifal Software" | |
724 WIDTH=181 HEIGHT=25> | |
725 <BR CLEAR="right"> | |
726 | |
727 <P> | |
728 Back to : | |
729 <A HREF="../../index.html">Index</A> | | |
730 <A HREF="index.html">Macro preprocessor overview</A> | |
731 <P> | |
732 | |
733 <ADDRESS><FONT SIZE="-1"> | |
734 AnaGram parser generator - examples<BR> | |
735 Token Scanner - Macro preprocessor and C Parser <BR> | |
736 Copyright © 1993-1999, Parsifal Software. <BR> | |
737 All Rights Reserved.<BR> | |
738 </FONT></ADDRESS> | |
739 | |
740 </BODY> | |
741 </HTML> | |
742 |