comparison doc/misc/html/examples/mpp/ts.html @ 0:13d2b8934445

Import AnaGram (near-)release tree into Mercurial.
author David A. Holland
date Sat, 22 Dec 2007 17:52:45 -0500
1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
2 <HTML>
3 <HEAD>
4 <TITLE> Token Scanner - Macro preprocessor and C Parser </TITLE>
5 </HEAD>
6
7 <BODY BGCOLOR="#ffffff" BACKGROUND="tilbl6h.gif"
8 TEXT="#000000" LINK="#0033CC"
9 VLINK="#CC0033" ALINK="#CC0099">
10
11 <P>
12 <IMG ALIGN="right" SRC="../../images/agrsl6c.gif" ALT="AnaGram"
13 WIDTH=124 HEIGHT=30 >
14 <BR CLEAR="all">
15 Back to :
16 <A HREF="../../index.html">Index</A> |
17 <A HREF="index.html">Macro preprocessor overview</A>
18 <P>
19 <IMG ALIGN="bottom" SRC="../../images/rbline6j.gif" ALT="----------------------"
20 WIDTH=1010 HEIGHT=2 >
21 <P>
22
23 <H1> Token Scanner - Macro preprocessor and C Parser </H1>
24 <IMG ALIGN="bottom" SRC="../../images/rbline6j.gif" ALT="----------------------"
25 WIDTH=1010 HEIGHT=2 >
26 <P>
27 <BR>
28
29 <H2>Introduction</H2>
30
31 The token scanner module, <tt>ts.syn</tt>, accomplishes the following
32 tasks:
33 <OL>
34 <LI> It reads the raw input, gathers tokens and identifies
35 them. </LI>
36 <LI> It analyzes conditional compilation directives and
37 skips over text that is to be omitted. </LI>
38 <LI> It analyzes macro definitions and maintains the macro
39 tables. </LI>
40 <LI> It identifies macro calls in the input stream and calls
41 the <tt>macro_expand()</tt> function to expand them. </LI>
42 <LI> It recognizes <tt>#include</tt> statements and calls itself
43 recursively to parse the include file. </LI>
44 </OL>
45
46 The token scanner parser, <tt>ts()</tt>, is called from a shell
47 function, <tt>scan_input(char *)</tt>, which takes the name
48 of a file
49 as an argument. <tt>scan_input()</tt> opens the file, calls
50 <tt>ts()</tt>, and
51 closes the file. <tt>scan_input()</tt> is called recursively by
52 <tt>include_file()</tt> when an <tt>#include</tt> statement
53 is found in the
54 input.
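<P>
The recursion between <tt>scan_input()</tt> and <tt>include_file()</tt>
can be illustrated with a minimal sketch. It is not the code in
<tt>ts.syn</tt>: the <tt>FileTable</tt> type, the extra parameters, and the
line-by-line handling are assumptions made so that the example is
self-contained, and an in-memory table stands in for opening and closing
real files.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the file system: file name -> file contents.
using FileTable = std::map<std::string, std::string>;

// Forward declaration: scan_input() and include_file() call each other.
void scan_input(const std::string &name, const FileTable &files,
                std::vector<std::string> &out);

// Called when an #include line is found; recurses into the named file.
void include_file(const std::string &name, const FileTable &files,
                  std::vector<std::string> &out) {
  scan_input(name, files, out);
}

// "Open" the file, scan it line by line, and "close" it on return.
// The stream is local to each call, so the state of the outer file
// survives while a nested include file is being read.
void scan_input(const std::string &name, const FileTable &files,
                std::vector<std::string> &out) {
  auto it = files.find(name);
  if (it == files.end()) return;            // unknown file: nothing to scan
  std::istringstream in(it->second);
  const std::string key = "#include \"";
  std::string line;
  while (std::getline(in, line)) {
    if (line.compare(0, key.size(), key) == 0) {
      // Extract the name between the quotes and scan that file first.
      std::string::size_type end = line.find('"', key.size());
      if (end != std::string::npos)
        include_file(line.substr(key.size(), end - key.size()), files, out);
    } else {
      out.push_back(line);                  // ordinary line: pass it on
    }
  }
}
```

Scanning a file whose second line includes another file yields the outer
file's first line, then the included file's lines, then the rest of the
outer file.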
55 <P>
56 Output from the token scanner is directed to a token_sink
57 pointed to by the <tt>scanner_sink</tt> global variable. The main
58 program may set <tt>scanner_sink</tt> to point to either a
59 <tt>token_translator</tt> or a <tt>c_parser</tt>. During the
60 course of
61 processing, the token scanner redirects output to a token
62 accumulator or to the conditional expression evaluator, as
63 necessary, by temporarily changing the value of
64 <tt>scanner_sink</tt>.
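<P>
The diversion of scanner output can be sketched as follows. Only the names
<tt>scanner_sink</tt>, <tt>save_sink</tt> and <tt>token_sink</tt> come from
this document; the <tt>token</tt> layout, the <tt>put()</tt> member, the
<tt>collecting_sink</tt> class and the <tt>divert()</tt> and
<tt>restore()</tt> helpers are assumptions made for illustration.

```cpp
#include <cassert>
#include <stack>
#include <vector>

// Hypothetical token: an id and an index into the token dictionary.
struct token { int id; int value; };

// Hypothetical minimal sink interface; the real token_sink is not shown here.
struct token_sink {
  virtual ~token_sink() {}
  virtual void put(const token &t) = 0;
};

// A sink that simply collects tokens, standing in for the token accumulator.
struct collecting_sink : token_sink {
  std::vector<token> tokens;
  void put(const token &t) override { tokens.push_back(t); }
};

token_sink *scanner_sink = nullptr;   // current destination of scanner output
std::stack<token_sink *> save_sink;   // destinations saved by divert()

// Divert scanner output to another sink, remembering the current one.
void divert(token_sink *target) {
  save_sink.push(scanner_sink);
  scanner_sink = target;
}

// Restore the sink that was active before the matching divert().
void restore() {
  assert(!save_sink.empty());
  scanner_sink = save_sink.top();
  save_sink.pop();
}
```

Tokens written between <tt>divert()</tt> and <tt>restore()</tt> reach only
the temporary sink; tokens written before and after reach the original one.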
65 <P>
66 The token scanner module contains two syntax error
67 diagnostic procedures: <tt>syntax_error(char *)</tt> and
68 <tt>syntax_error_scanning(char *)</tt>. The former is set up to
69 provide correct line and column numbers for functions called
70 from reduction procedures in the token scanner. The latter
71 is set up to provide line and column numbers for errors
72 discovered in the scanner itself. Both functions accept a
73 pointer to an error message.
74 <P>
75 <BR>
76
77 <H2> Theory of Operation </H2>
78
79 The primary purpose of the token scanner is to identify the
80 C language tokens in the input file and pass them on to
81 another module for further processing. In order to package
82 them for transmission, the token scanner maintains a "token
83 dictionary", <tt>td</tt>, which enables it to characterize each
84 distinct input token with a single number. The token scanner
85 also classifies tokens according to the definitions of the C
86 language. The "token" that it passes on for further
87 processing is a pair consisting of an id field, and a value
88 field. The id field is defined by the <tt>token_id</tt>
89 enumeration
90 in <tt>token.h</tt>. The value field is the index of the
91 token in the
92 token dictionary, <tt>td</tt>.
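<P>
As a rough illustration of the (id, value) pair, the sketch below interns
token text in a small dictionary and returns its index. The class name
<tt>token_dictionary</tt>, its members, and the three enumerators are
hypothetical; the actual <tt>token_id</tt> enumeration lives in
<tt>token.h</tt> and is not reproduced here.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical subset of a token_id enumeration (not the one in token.h).
enum token_id { id_name, id_number, id_operator };

// The pair passed on by the scanner: a classification and a dictionary index.
struct token {
  token_id id;
  unsigned value;
};

// Maps each distinct token text to a single number and back.
class token_dictionary {
  std::map<std::string, unsigned> index_;
  std::vector<std::string> text_;

public:
  // Return the index of s, adding it to the dictionary on first sight.
  unsigned identify(const std::string &s) {
    auto it = index_.find(s);
    if (it != index_.end()) return it->second;
    unsigned n = static_cast<unsigned>(text_.size());
    text_.push_back(s);
    index_[s] = n;
    return n;
  }

  // Recover the text of a token from its index.
  const std::string &text(unsigned n) const { return text_.at(n); }
};
```

Identifying the same text twice returns the same index, so two tokens can
be compared by comparing their value fields.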
93 <P>
94 To support its primary purpose, the token scanner deals with
95 several other problems. First, it identifies preprocessor
96 control lines which control conditional compilation and
97 skips input appropriately. Second, it fields <tt>#include</tt>
98 statements, and recurses to process include files. Third, it
99 fields <tt>#define</tt> statements and manages the macro definition
100 tables. Finally, it checks the tokens it identifies and
101 calls the macro/argument expansion module to expand them if
102 they turn out to be macros.
103 <P>
104 The conditional compilation logic in the token scanner is
105 carried out in its entirety by syntactic means. The only C
106 code involved deals with evaluating conditional statements.
107 <tt>#ifdef</tt> and <tt>#ifndef</tt> are quite
108 straightforward. <tt>#if</tt> is another
109 matter. To deal with the generality of this statement, token
110 scanner output is diverted to the expression evaluator
111 module, <tt>ex.syn</tt>, where the expression is evaluated. The
112 outcome of the calculation is then used to control a
113 semantically determined production in the token scanner.
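<P>
The token scanner implements this logic in its grammar, not in C code. For
readers who find a procedural view easier to follow, here is a hand-written
equivalent of the skipping rules. The <tt>conditional_tracker</tt> class and
all of its members are hypothetical and do not appear in <tt>ts.syn</tt>.

```cpp
#include <cassert>
#include <vector>

// State of one open #if ... #endif block.
struct cond_state {
  bool parent_active;  // was output enabled when the block was opened?
  bool taken;          // has some branch of this block already been true?
  bool active;         // is the current branch being passed on?
};

class conditional_tracker {
  std::vector<cond_state> stack_;

public:
  // True when text at the current position should be passed on.
  bool active() const { return stack_.empty() || stack_.back().active; }

  // #if, #ifdef or #ifndef whose test evaluated to 'value'.
  void on_if(bool value) {
    bool p = active();
    stack_.push_back({p, p && value, p && value});
  }

  // #elif: passed on only if no earlier branch of this block was true.
  void on_elif(bool value) {
    assert(!stack_.empty());
    cond_state &s = stack_.back();
    s.active = s.parent_active && !s.taken && value;
    if (s.active) s.taken = true;
  }

  // #else behaves like an #elif whose condition is always true.
  void on_else() { on_elif(true); }

  // #endif closes the innermost open block.
  void on_endif() {
    assert(!stack_.empty());
    stack_.pop_back();
  }
};
```

The stack plays the role that nesting of productions plays in the grammar:
an <tt>#endif</tt> always closes the innermost open block, even inside
skipped text.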
114 <P>
115 Processing <tt>#include</tt> statements is reasonably
116 straightforward. Token scanner output is diverted to the
117 token accumulator, <tt>ta</tt>. The content of the token accumulator
118 is then translated back to ASCII string form. This takes
119 care of macro calls in the <tt>#include</tt> statement. Once the file
120 has been identified, <tt>scan_input()</tt> is called recursively to
121 deal with it.
122 <P>
123 The only complication with macro definitions is that the
124 tokens which comprise the body of a macro must not be
125 expanded until the macro is invoked. For that reason, there
126 are two different definitions of token in the token scanner:
127 "simple token" and "expanded token". The difference is that
128 simple tokens are not checked for macro calls. When a macro
129 definition is encountered, the token scanner output is
130 diverted to the token accumulator, so that the body of the
131 macro can be captured and stored.
132 <P>
133 When a macro call is recognized, the token scanner must pick
134 up the arguments for the macro. There are three
135 complications here: First, the tokens must not be scanned
136 for macros; second, the scan must distinguish the commas
137 that separate arguments from commas that may be contained
138 inside balanced parentheses within an argument; and finally,
139 leading white space tokens do not count as argument tokens.
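<P>
The last two complications can be sketched with a small helper that works
on characters instead of tokens. The function name
<tt>split_macro_args()</tt> is hypothetical, and the sketch assumes it is
handed only the text between the outer parentheses of the macro call.

```cpp
#include <string>
#include <vector>

// Split the text between a macro call's outer parentheses into arguments.
std::vector<std::string> split_macro_args(const std::string &text) {
  std::vector<std::string> args;
  std::string current;
  int depth = 0;     // nesting level of parentheses inside an argument
  bool any = false;  // true once something other than blanks has been seen
  for (char c : text) {
    if (c == ',' && depth == 0) {  // top-level comma: argument separator
      args.push_back(current);
      current.clear();
      any = true;
      continue;
    }
    // Leading white space does not count as part of an argument.
    if (current.empty() && (c == ' ' || c == '\t')) continue;
    if (c == '(') ++depth;
    else if (c == ')' && depth > 0) --depth;
    current += c;
    any = true;
  }
  if (any) args.push_back(current);  // an all-blank list has no arguments
  return args;
}
```

For the text <tt>a, f(b, c),  d</tt> the helper produces three arguments:
<tt>a</tt>, <tt>f(b, c)</tt> and <tt>d</tt>. The comma inside
<tt>f(b, c)</tt> is protected by the nested parentheses.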
140 <P>
141 <BR>
142
143 <H2> Elements of the Token Scanner </H2>
144
145 The remainder of this document describes the macro
146 definitions, the structure definitions, the static
147 variables, the configuration parameter settings, and
148 the non-terminal parsing tokens used in the token
149 scanner. In <tt>ts.syn</tt> itself, each function
150 that is defined is preceded by a short explanation of
151 its purpose.
152 <P>
153 <BR>
154
155 <H2> Macro definitions </H2>
156 <DL>
157 <DT> <tt>GET_CONTEXT</tt>
158 <DD> The <tt>GET_CONTEXT</tt> macro provides the parser with context
159 information for the input character. (Instead of writing a
160 <tt>GET_CONTEXT</tt> macro, the context information could be stored
161 as part of <tt>GET_INPUT</tt>.)
162
163 <DT> <tt>GET_INPUT</tt>
164 <DD> The <tt>GET_INPUT</tt> macro provides the next input
165 character for
166 the parser. If the parser used <b>pointer input</b> or <b>event
167 driven</b> input, a <tt>GET_INPUT</tt> macro would not be
168 necessary. The
169 default for <tt>GET_INPUT</tt> would read <tt>stdin</tt> and
170 so is not
171 satisfactory for this parser.
172
173 <DT> <tt>PCB</tt>
174 <DD> Since the <b>declare pcb</b> switch has been turned off, AnaGram
175 will not define <tt>PCB</tt>. Making the parser control block part of
176 the file descriptor structure simplifies saving and
177 restoring the pcb for nested #include files.
178
179 <DT> <tt>SYNTAX_ERROR</tt>
180 <DD> <tt>ts.syn</tt> defines the <tt>SYNTAX_ERROR</tt> macro,
181 since otherwise the
182 generated parser would use the default definition of
183 <tt>SYNTAX_ERROR</tt>, which would not provide the name of the file
184 currently being read.
185 </DL>
186 <P>
187 <BR>
188
189 <H2> Local Structure Definitions </H2>
190 <DL><DT> <tt>location</tt>
191 <DD> <tt>location</tt> is a structure which records a line
192 number and a
193 column number. It is handed to AnaGram with the context type
194 statement found in the configuration segment. AnaGram then
195 declares two member fields of type <tt>location</tt> in the parser
196 control block: <tt>input_context</tt> and a stack, <tt>cs</tt>. In
197 <tt>scan_input()</tt>, the <tt>input_context</tt> variable
198 is set explicitly
199 with the current line and column number. In <tt>syntax_error()</tt>
200 the <tt>CONTEXT</tt> macro is used to extract the line and column
201 number at which the rule currently being reduced started.
202
203 <DT> <tt>file_descriptor</tt>
204 <DD> <tt>file_descriptor</tt> contains the information that
205 needs to be
206 saved and restored when nested include files are processed.
207 </DL>
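<P>
A minimal sketch of a line and column record, and of how it could be kept
up to date as characters are read, is shown below. The field names, the
<tt>position_after()</tt> helper, and the choice of counting both lines and
columns from 1 are assumptions; the actual definition of
<tt>location</tt> is in <tt>ts.syn</tt>.

```cpp
#include <string>

// Hypothetical layout of a line/column record.
struct location {
  int line;
  int column;
};

// Return the position of the character that follows 'text', assuming the
// first character of the file is at line 1, column 1.
location position_after(const std::string &text) {
  location pos = {1, 1};
  for (char c : text) {
    if (c == '\n') {  // newline: start of the next line
      ++pos.line;
      pos.column = 1;
    } else {
      ++pos.column;
    }
  }
  return pos;
}
```

A structure this small is cheap to copy, which suits a parser that records
one context value per entry on its stack.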
208 <P>
209 <BR>
210
211 <H2> Static Variables </H2>
212 <DL><DT> <tt>error_modifier</tt>
213 <DD> Type: <tt>char *</tt><BR>
214
215 The string identified by <tt>error_modifier</tt> is added to the
216 error diagnostic printed by <tt>syntax_error()</tt>. Normally it is
217 an empty string; however, when macros are being expanded it
218 is set so that the diagnostic will specify that the error
219 was found inside a macro expansion.
220
221 <DT> <tt>input</tt>
222 <DD> Type: <tt>file_descriptor</tt><BR>
223
224 <tt>input</tt> provides the name and stream pointer for the
225 currently active
226 input file.
227
228 <DT> <tt>save_sink</tt>
229 <DD> Type: <tt>stack&lt;token_sink *&gt;</tt><BR>
230
231 This stack provides for saving and restoring <tt>scanner_sink</tt>
232 when it is necessary to divert the scanner output for
233 dealing with conditional expressions, macro definitions and
234 macro arguments. Actually, a stack is not necessary, since
235 such diversions never nest more than one level deep, but it
236 seems clearer to use a stack.
237 </DL>
238 <P>
239 <BR>
240
241 <H2> Configuration Parameters </H2>
242 <DL><DT> <tt>~allow macros</tt>
243 <DD> This statement turns off the <b>allow macros</b> switch so that
244 AnaGram implements all reduction procedures as explicit
245 function definitions. This simplifies debugging at the cost
246 of a slight performance degradation.
247
248 <DT> <tt>auto resynch</tt>
249 <DD> This switch turns on automatic resynchronization in case a
250 syntax error is encountered by the token scanner.
251
252 <DT> <tt>context type = location</tt>
253 <DD> This statement specifies that the generated parser is to
254 track context automatically. The context variables have type
255 <tt>location</tt>. <tt>location</tt> is defined elsewhere to
256 consist of two
257 fields: line number and column number.
258
259 <DT> <tt>~declare pcb</tt>
260 <DD> This statement tells AnaGram not to declare a parser control
261 block for the parser. The parser control block is declared
262 later as part of the <tt>file_descriptor</tt> structure.
263
264 <DT> <tt>~error frame</tt>
265 <DD> This turns off the error frame portion of the automatic
266 syntax error diagnostic generator, since the context of the
267 error in the scanner syntax is of little interest. If an
268 error frame were to be used in diagnostics, that of the C
269 parser would be more appropriate.
270
271 <DT> <tt>error trace</tt>
272 <DD> This turns on the <b>error trace</b> functionality, so
273 that if the token
274 scanner encounters a syntax error it will write an <tt>.etr</tt>
275 file.
276
277 <DT> <tt>line numbers</tt>
278 <DD> This statement causes AnaGram to include <tt>#line</tt>
279 statements in
280 the parser file so that your compiler can provide
281 diagnostics keyed to your syntax file.
282
283 <DT> <tt>subgrammar</tt>
284 <DD> The basic token grammar for C is usually implemented using
285 some sort of regular expression parser, such as <tt>lex</tt>, which
286 always looks for the longest match to the regular
287 expression. In no case does the regular expression parser
288 use what follows a match to determine the nature of the
289 match. An LALR parser generator, on the other hand, normally
290 looks not only at the content of a token but also looks
291 ahead. The subgrammar declaration tells AnaGram not to look
292 ahead but to parse these tokens based only on their internal
293 structure. Thus the conflicts that would normally be
294 detected are not seen. To see what happens if lookahead is
295 allowed, simply comment out any one of these subgrammar
296 statements and look at the conflicts that result.
297
298 <DT> <tt>~test range</tt>
299 <DD> This statement tells AnaGram not to check input characters
300 to see if they are within allowable limits. This checking is
301 not necessary since the token scanner is reading a text file
302 and cannot possibly get an out of range token.
303 </DL>
304 <P>
305 <BR>
306
307 <H2> Scanner Tokens, in alphabetical order </H2>
308 <DL><DT> any text
309 <DD> These productions are used when skipping over text. "any
310 text" consists of all characters other than eof, newline and
311 backslash, as well as any character (including newline and
312 backslash) that is quoted with a preceding backslash
313 character.
314
315 <DT> arg element
316 <DD> An "arg element" is a token in the argument list of a macro.
317 It is essentially the same as "simple token" except that
318 commas must be detected as separators and nested parentheses
319 must be recognized. An "arg element" is either a space or an
320 "initial arg element".
321
322 <DT> character constant
323 <DD> A "character constant" is a quoted character or escape
324 sequence. The token scanner does not inquire closely into
325 the internal nature of the character constant.
326
327 <DT> comment
328 <DD> A "comment" consists of a comment head followed by the
329 closing "*/".
330
331 <DT> comment head
332 <DD> A "comment head" consists of the entire comment up to the
333 closing "*/". If a complete comment is found following a
334 comment head, its treatment depends on whether one believes,
335 with ANSI, that comments should not be nested, or whether
336 one prefers to allow nested comments. Followers of the ANSI
337 principle will want "comment head, comment" to reduce to
338 "comment". Believers in nested comments will want to finish
339 the comment that was in progress when the nested comment was
340 encountered, so they will want "comment head, comment" to
341 reduce to "comment head", which will allow the search for
342 "*/" to continue.
343
344 <DT> conditional block
345 <DD> A "conditional block" is an #if, #ifdef, or #ifndef line and
346 all following lines through the terminating #endif. If the
347 initial condition turns out to be true, then everything has
348 to be skipped following an #elif or #else line. If the
349 initial condition is false, everything has to be skipped
350 until a true #elif condition or an #else line is found.
351
352 <DT> confusion
353 <DD> This token is designed to deal with a curious anomaly of C.
354 Integers which begin with a zero are octal, but floating
355 point numbers may have leading zeroes without losing their
356 fundamental decimal nature. "confusion" is an octal integer
357 that is followed by an eight or a nine. This will become
358 legitimate if eventually a decimal point or an exponent
359 field is encountered.
360
361 <DT> control line
362 <DD> "control line" consists of any preprocessor control line
363 other than those associated with conditional compilation.
364
365 <DT> decimal constant
366 <DD> A "decimal constant" is a "decimal integer" and any
367 following qualifiers.
368
369 <DT> decimal integer
370 <DD> The digits which comprise the integer are pushed onto the
371 string accumulator. When the integer is complete, the string
372 will be entered into the token dictionary and subsequently
373 it will be described by its index in the token dictionary.
374
375 <DT> defined
376 <DD> See "expanded word". id_macro will recognize "defined" only
377 when the if_clause switch is set.
378
379 <DT> eof
380 <DD> end of file: equal to the null character.
381
382 <DT> eol
383 <DD> end of line: a newline and all immediately following white
384 space or newline characters. eol is declared to be a
385 subgrammar since it is used in circumstances where space can
386 legitimately follow, according to the syntax as written.
387
388 <DT> else if header
389 <DD> This production is simply a portion of the rule for the
390 #elif statement. It is separated out in order to provide a
391 hook on which to hang the call to init_condition(), which
392 diverts scanner output to the expression_evaluator which
393 will calculate the value of the conditional expression.
394
395 <DT> else section
396 <DD> An "else section" is an #else line and all immediately
397 following complete sections. An "else section" and a "skip
398 else section" are the same except that in an "else section"
399 tokens are sent to the scanner output and in a "skip else
400 section" they are discarded.
401
402 <DT> endif line
403 <DD> An "endif line" is simply a line that begins with #endif.
404
405 <DT> expanded token
406 <DD> The word "token" is used here in the sense of Kernighan and
407 Ritchie, 2nd Edition, Appendix A, p. 191. In this program a
408 "simple token" is one which is simply passed on without
409 regard to macro processing. An "expanded token" is one which
410 has been checked to see if it is a macro identifier and, if
411 so, expanded. "simple tokens" are recognized only in the
412 bodies of macro definitions. Therefore spaces and '#'
413 characters are passed on. For "expanded tokens" they are
414 discarded.
415
416 <DT> expanded word
417 <DD> This is the treatment of a simple identifier as an "expanded
418 token". "variable", "simple macro", "macro", and "defined"
419 are the various outcomes of semantic analysis of "name
420 string" performed by id_macro(). In this case reserved words
421 and identifiers which are not the names of macros are
422 subsumed under the rubric "variable". These tokens are
423 simply passed on to the scanner output.
424 <P>
425 The distinction between "macro" and "simple macro" depends
426 on whether the macro was defined with or without following
427 parentheses. A "simple macro" is expanded by calling
428 expand(). expand() simply serves as a local interface to the
429 expand_text() function defined in <tt>mas.syn</tt>.
430 <P>
431 If a "macro" was defined with parentheses but appears bereft
432 of an argument list, it is treated as a simple identifier
433 and passed on to the output. Otherwise the argument tokens
434 for the macro are gathered and stacked on the token
435 accumulator, using "macro arg list". Finally, the macro is
436 expanded in the same way as a "simple macro". Note that
437 "macro arg list" provides a count of the number of arguments
438 found inside the balanced parentheses.
439 <P>
440 If "if_clause" is set, it means that the conditional
441 expression of an #if or #elif line is being evaluated. In
442 this case, the pseudo-function defined() must be recognized
443 to determine whether a macro has or has not been defined.
444 The defined() function returns a "1" or "0" token depending
445 on whether the macro has been defined.
446
447 <DT> exponent
448 <DD> This is simply the exponent field on a floating point number
449 with optional sign.
450
451
452 <DT> false condition
453 <DD> The "true condition" and "false condition" tokens are
454 semantically determined. They consist of #if, #ifdef, or
455 #ifndef lines. If the result of the test is true the
456 reduction token is "true condition", otherwise it is "false
457 condition".
458
459 <DT> false else condition
460 <DD> The "true else condition" and "false else condition" tokens
461 are semantically determined. They consist of an #elif line.
462 If the value of the conditional expression is true the
463 reduction token is "true else condition", otherwise it is
464 "false else condition".
465
466 <DT> false if section
467 <DD> A "false if section" is a #if, #ifdef, or #ifndef condition
468 that turns out to be false followed by any number, including
469 zero, of complete sections or false #elif condition lines.
470 All of the text within a "false if section" is discarded.
471 <DT> floating qualifier
472 <DD> These productions are simply the optional qualifiers to
473 specify that a constant is to be treated as a float or as a
474 long double.
475
476 <DT> hex constant
477 <DD> A "hex constant" is simply a "hex integer" plus any
478 following qualifiers.
479
480 <DT> hex integer
481 <DD> The digits which comprise the integer are pushed onto the
482 string accumulator. When the integer is complete, the string
483 will be entered into the token dictionary and subsequently
484 it will be described by its index in the token dictionary.
485
486 <DT> if header
487 <DD> This production is simply a portion of the rule for the #if
488 statement. It is separated out in order to provide a hook on
489 which to hang the call to init_condition(), which diverts
490 scanner output to the expression evaluator which will
491 calculate the value of the conditional expression.
492
493 <DT> initial arg element
494 <DD> In gathering macro arguments, spaces must not be confused
495 with a true argument. Therefore, the arg element token is
496 broken down into two pieces so that each argument begins
497 with a nonblank token.
498
499 <DT> include header
500 <DD> "include header" simply represents the initial portion of an
501 #include line and provides a hook for a reduction procedure
502 which diverts scanner output to the token accumulator. This
503 diversion allows the text which follows #include to be
504 scanned for macros and accumulated. The include_file()
505 function will be called to actually identify and scan the
506 specified file.
507
508 <DT> input file
509 <DD> This is the grammar, or start token. It describes the entire
510 file as alternating sections and eols, terminated by an eof.
511
512 <DT> integer constant
513 <DD> These productions simply gather together the varieties of
514 integer constants under one umbrella.
515
516 <DT> integer qualifier
517 <DD> These productions are simply the optional qualifiers to
518 specify that an "integer constant" is to be treated as
519 unsigned, long, or both.
520
521 <DT> macro
522 <DD> See "expanded word". id_macro specifies "macro" or "simple
523 macro" depending on whether the named macro was defined with
524 or without following parentheses.
525
526 <DT> macro arg list
527 <DD> A "macro arg list" can be either empty or can consist of any
528 number of token sequences separated by commas. Commas that
529 are protected by nested parentheses do not separate
530 arguments. Argument strings are accumulated on the token
531 accumulator and counted by "macro args".
532
533 <DT> macro args
534 <DD> Each argument to a macro is gathered on a separate level of
535 the token accumulator, so the token accumulator level is
536 incremented before each argument, and the arguments are
537 counted.
538
539 <DT> macro definition header
540 <DD> The "macro definition header" consists of the #define line
541 up to the beginning of the body text of the macro. It serves
542 as a hook to call init_macro_def() which begins the macro
543 definition and diverts scanner output to the token
544 accumulator. The macro definition will be completed by the
545 save_macro_body() function once the entire macro body has
546 been accumulated. Note that the tokens for the macro body
547 are not examined for macro calls.
548
549 <DT> name string
550 <DD> "name string" is simply an accumulation on the string
551 accumulator of the characters which make up an identifier.
552
553 <DT> nested elements
554 <DD> "nested elements" are "arg elements" that are found inside
555 nested parentheses.
556
557 <DT> not control mark
558 <DD> This consists of any input character excepting eof, newline,
559 backslash and '#', but including any of these if preceded by
560 a backslash. It serves, at the beginning of a line, to
561 distinguish ordinary lines of text from preprocessor control
562 lines.
563 <DT>
564 octal integer
565 <DD> The digits which comprise the integer are pushed onto the
566 string accumulator. When the integer is complete, the string
567 will be entered into the token dictionary and subsequently
568 it will be described by its index in the token dictionary.
569
570 <DT> operator
571 <DD> This is simply an inventory of all the multi-character
572 operators in C.
573
574 <DT> parameter list
575 <DD> "parameter list" is simply a wrapper about "names" which
576 allows for empty parentheses. Note that both the "names"
577 token and the "parameter list" tokens provide the count of
578 the number of parameter names found inside the parentheses.
579 The names themselves have been stacked on the string
580 accumulator.
581
582 <DT> qualified real
583 <DD> This production exists to allow the "floating qualifier" to
584 be appended to a "real constant".
585 <DT> real
586 <DD> These productions itemize the various ways of writing a
587 floating point number with and without decimal points and
588 with and without exponent fields.
589
590 <DT> real constant
591 <DD> This production is simply an envelope to contain "real" and
592 write the output code once instead of four times.
593
594 <DT> section
595 <DD> This is a logical block of input. It is either a single line
596 of ordinary code, a control line such as #define or #undef,
597 or an entire conditional compilation block, i.e., everything
598 from the #if to the closing #endif. Notice that the eol that
599 terminates a "section" is not part of the "section". The
600 only difference between a "section" and a "skip section" is
601 that in a "section", all tokens are sent to the scanner
602 output while in a "skip section", all input is discarded.
603
604 <DT> separator
605 <DD> This is simply a gathering together of all the tokens that
606 are neither white space nor identifiers, since they are
607 treated uniformly throughout the grammar.
608
609 <DT> simple macro
610 <DD> See "expanded word".
611
612 <DT> simple real
613 <DD> A "simple real" is one which has a decimal point and has
614 digits on at least one side of the decimal point.
615 Unaccompanied decimal points will be turned away at the
616 door.
617 <DT> simple token
618 <DD> The word "token" is used here in the sense of Kernighan and
619 Ritchie, 2nd Edition, Appendix A, p. 191. In this program a
620 "simple token" is one which is simply passed on without
621 regard to macro processing. An "expanded token" is one which
622 has been checked to see if it is a
623 macro identifier and, if so, expanded. "simple tokens" are
624 recognized only in the bodies of macro definitions.
625 Therefore spaces and '#' characters are passed on. For
626 "expanded tokens" they are discarded.
627
628 <DT> skip else line
629 <DD> For purposes of skipping over complete conditional sections
630 #elif and #else lines are equivalent.
631
632 <DT> skip else section
633 <DD> A "skip else section" consists of the #else or #elif line
634 following a satisfied conditional and all subsequent
635 sections and #elif and #else lines. All input in the "skip
636 else section" is discarded.
637
638 <DT> skip if section
639 <DD> A "skip if section" consists of an #if, #ifdef, or #ifndef
640 line, and all following complete "sections" (represented as
641 "skip sections", so their content will be ignored) and #else
642 and #elif lines.
643
644 <DT> skip line
645 <DD> When skipping text, we have to distinguish between lines
646 which begin with the control mark ('#') and those which
647 don't so that we deal correctly with nested #endif
648 statements. We wouldn't want to terminate a block of
649 uncompiled code with the wrong #endif.
650
651 <DT> skip section
652 <DD> A "skip section" is simply a "section" that follows an
653 unsatisfied conditional. In a "skip section", all input is
654 discarded.
655
656 <DT> space
657 <DD> space consists of either a blank or a comment. If a comment
658 is found, it is replaced with a blank.
659 <DT> simple chars
660 <DD> "simple chars" consists of the body of a character constant
661 up to but not including the final quote.
662
663 <DT> string chars
664 <DD> "string chars" consists of the body of a string literal up
665 to but not including the final double quote.
666
667 <DT> string literal
668 <DD> A "string literal" is simply a quoted string. It is
669 accumulated on the string accumulator.
670
671 <DT> true condition
672 <DD> The "true condition" and "false condition" tokens are
673 semantically determined. They consist of #if, #ifdef, or
674 #ifndef lines. If the result of the test is true the
675 reduction token is "true condition", otherwise it is "false
676 condition".
677
685 <DT> true else condition
686 <DD> The "true else condition" and "false else condition" tokens
687 are semantically determined. They consist of an #elif line.
688 If the value of the conditional expression is true the
689 reduction token is "true else condition", otherwise it is
690 "false else condition".
691
692 <DT> true if section
693 <DD> A "true if section" is a true #if, #ifdef, or #ifndef,
694 followed by any number of complete sections, including zero.
695 Alternatively, it could be a "false if section" that is
696 followed by a true #elif condition, followed by any number
697 of complete "sections". All input in a "true if section"
698 subsequent to the true condition is passed on to the scanner
699 output.
700
701 <DT> word
702 <DD> This is the treatment of a simple identifier as a "simple
703 token". The name_token() procedure is called to pop the name
704 string from the string accumulator, identify it in the token
705 dictionary and assign a token_id to it by checking to see if
706 it is a reserved word.
707
708 <DT> variable
709 <DD> See "expanded word".
710
711 <DT> ws
712 <DD> The definition for ws as space... simply allows a briefer
713 reference in those places in the grammar where it is
714 necessary to skip over white space.
715 </DL>
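<P>
The situation handled by the "confusion" token can be illustrated with a
hand-written classifier. This is a sketch under stated assumptions, not the
grammar in <tt>ts.syn</tt>: the <tt>number_kind</tt> enumeration and the
<tt>classify_number()</tt> function are hypothetical, and hex constants and
qualifiers are left out.

```cpp
#include <string>

enum number_kind { octal_integer, decimal_integer, real_number, not_a_number };

static bool is_digit(char c) { return c >= '0' && c <= '9'; }

// Classify an unqualified numeric literal: digits, at most one decimal
// point, and an optional exponent field.
number_kind classify_number(const std::string &s) {
  std::string::size_type i = 0, digits = 0;
  bool saw_8_or_9 = false;
  while (i < s.size() && is_digit(s[i])) {  // integer part
    if (s[i] >= '8') saw_8_or_9 = true;
    ++i;
    ++digits;
  }
  bool is_real = false;
  if (i < s.size() && s[i] == '.') {        // optional fraction
    is_real = true;
    ++i;
    while (i < s.size() && is_digit(s[i])) { ++i; ++digits; }
  }
  if (digits == 0) return not_a_number;     // an unaccompanied "." or ""
  if (i < s.size() && (s[i] == 'e' || s[i] == 'E')) {  // optional exponent
    ++i;
    if (i < s.size() && (s[i] == '+' || s[i] == '-')) ++i;
    std::string::size_type exponent_digits = 0;
    while (i < s.size() && is_digit(s[i])) { ++i; ++exponent_digits; }
    if (exponent_digits == 0) return not_a_number;
    is_real = true;
  }
  if (i != s.size()) return not_a_number;   // trailing junk
  if (is_real) return real_number;          // leading zeroes are harmless
  // A leading zero makes an integer octal, so an 8 or 9 is an error here:
  // this is "confusion" that was never rescued by a '.' or an exponent.
  if (s[0] == '0') return saw_8_or_9 ? not_a_number : octal_integer;
  return decimal_integer;
}
```

With this classifier <tt>017</tt> is an octal integer, <tt>089</tt> is
rejected, and <tt>089.5</tt> and <tt>089e2</tt> are both accepted as real
numbers.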
716 <P>
717 <BR>
718
719
720 <IMG ALIGN="bottom" SRC="../../images/rbline6j.gif" ALT="----------------------"
721 WIDTH=1010 HEIGHT=2 >
722 <P>
723 <IMG ALIGN="right" SRC="../../images/pslrb6d.gif" ALT="Parsifal Software"
724 WIDTH=181 HEIGHT=25>
725 <BR CLEAR="right">
726
727 <P>
728 Back to :
729 <A HREF="../../index.html">Index</A> |
730 <A HREF="index.html">Macro preprocessor overview</A>
731 <P>
732
733 <ADDRESS><FONT SIZE="-1">
734 AnaGram parser generator - examples<BR>
735 Token Scanner - Macro preprocessor and C Parser <BR>
736 Copyright &copy; 1993-1999, Parsifal Software. <BR>
737 All Rights Reserved.<BR>
738 </FONT></ADDRESS>
739
740 </BODY>
741 </HTML>
742