comparison doc/misc/html/examples/mpp/index.html @ 0:13d2b8934445

Import AnaGram (near-)release tree into Mercurial.
author David A. Holland
date Sat, 22 Dec 2007 17:52:45 -0500
-1:000000000000 0:13d2b8934445
1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
2 <HTML>
3 <HEAD>
4 <TITLE> Macro preprocessor and C Parser </TITLE>
5 </HEAD>
6
7
8 <BODY BGCOLOR="#ffffff" BACKGROUND="tilbl6h.gif"
9 TEXT="#000000" LINK="#0033CC"
10 VLINK="#CC0033" ALINK="#CC0099">
11
12 <P>
13 <IMG ALIGN="right" SRC="../../images/agrsl6c.gif" ALT="AnaGram"
14 WIDTH=124 HEIGHT=30 >
15 <BR CLEAR="all">
16 Back to <A HREF="../../index.html">Index</A>
17 <P>
18 <IMG ALIGN="bottom" SRC="../../images/rbline6j.gif" ALT="----------------------"
19 WIDTH=1010 HEIGHT=2 >
20 <P>
21
22 <H1>Macro preprocessor and C Parser</H1>
23
24 <IMG ALIGN="bottom" SRC="../../images/rbline6j.gif" ALT="----------------------"
25 WIDTH=1010 HEIGHT=2 >
26
27 <H2>Introduction</H2>
28
29 This document provides an overview of the entire macro preprocessor example.
30 Since the example consists of a number of modules, there is also a separate
31 document file for each module. These document files provide an overview
32 of the module and detailed descriptions of the variables, data structures
33 and syntactic elements associated with the module.
34
35 <P>This implementation of a C macro preprocessor demonstrates:
36 <UL>
37 <LI>
38 the use of AnaGram in a real-world problem of considerable complexity.</LI>
39
40 <LI>
41 the use of AnaGram in a C++ environment.</LI>
42 </UL>
43 It was felt that only a fairly complex problem would adequately demonstrate
44 the power of AnaGram. This example, therefore, may not be particularly
45 easy to grasp or to understand in its entirety.
46
47 <P>However, it is not necessary to understand all facets of this example
48 to make good use of it. If you skim over it, you will see examples of many
49 common syntactic constructs. You will find that in many cases you can copy
50 these constructs verbatim and incorporate them directly into your own programs.
51
52 <P>A number of AnaGram's features and options are well illustrated. This
53 example makes use of four separate syntaxes to deal with the preprocessing
54 so that the complete program, with one or another of the C parsers linked
55 in, contains five separate parsers. There are, therefore, numerous examples
56 of interfacing a parser to the rest of a program. In particular, several
57 of the parsers are configured as C++ classes.
58
59 <P>Other AnaGram features, such as semantically determined productions
60 and context tracking, are used to good effect, particularly in the token
61 scanner, which also illustrates the use of AnaGram to write lexical scanners.
62
63 <P>In addition to the macro preprocessor, this example provides a choice
64 of two C parsers which have been interfaced to the preprocessor. These
65 parsers are simply syntax checkers. They have essentially no reduction
66 procedures except for enough to give them rudimentary (and not fully
67 correct) capabilities for
68 coping with typedef types. You may, of course, add your own reduction procedures
69 to adapt them to your needs.
70 <P>
71 Note that this macro preprocessor is not particularly standards
72 compliant; if you feed it difficult or pedantic test cases, it will
73 probably give you wrong output.
74 <P>
75 <BR>
76
77 <H2>
78 Components of the Macro Preprocessor</H2>
79 <TABLE WIDTH="100%">
80
81 <TR>
82 <TD COLSPAN=4>
83 The macro preprocessor example comprises the following modules:
84 <BR><BR>
85 </TD>
86 </TR>
87
88 <TR>
89 <td rowspan=10 width="4%">&nbsp;</td>
90 <TD><tt><A HREF=mpp.html>mpp.cpp</A></tt></TD>
91 <td rowspan=10 width="4%">&nbsp;</td>
92
93 <TD>data declarations and main program</TD>
94 </TR>
95
96 <TR>
97 <TD><tt><A HREF=mpp.html>mpp.h</A></tt></TD>
98
99 <TD>Structure definitions, data and function declarations</TD>
100 </TR>
101
102 <TR>
103 <TD><tt><A HREF=token.html>token.cpp</A></tt></TD>
104
105 <TD>token class function definitions</TD>
106 </TR>
107
108 <TR>
109 <TD><tt><A HREF=token.html>token.h</A></tt></TD>
110
111 <TD>Token class definitions</TD>
112 </TR>
113
114 <TR>
115 <TD><tt><A HREF=ts.html>ts.syn</A></tt></TD>
116
117 <TD>token scanner</TD>
118 </TR>
119
120 <TR>
121 <TD><tt><A HREF=mas.html>mas.syn</A></tt></TD>
122
123 <TD>macro and argument substitution module</TD>
124 </TR>
125
126 <TR>
127 <TD><tt><A HREF=ex.html>ex.syn</A></tt></TD>
128
129 <TD>constant expression evaluator</TD>
130 </TR>
131
132 <TR>
133 <TD><tt><A HREF=ct.html>ct.syn</A></tt></TD>
134
135 <TD>token classifier</TD>
136 </TR>
137
138 <TR>
139 <TD><tt><A HREF=parsers.html>jrc.syn</A></tt></TD>
140
141 <TD>C parser, based on C grammar by James A. Roskind</TD>
142 </TR>
143
144 <TR>
145 <TD><tt><A HREF=parsers.html>krc.syn</A></tt></TD>
146
147 <TD>C parser, based on C grammar in K &amp; R, section A13</TD>
148 </TR>
149
150 <!--
151 <P>
152 Here are links to the corresponding
153 document files:
154 <CENTER><A HREF="mpp.html">MPP&nbsp;</A> | <A HREF="token.html">TOKEN</A>
155 | <A HREF="ts.html">TS</A> | <A HREF="mas.html">MAS</A> | <A HREF="ex.html">EX</A>
156 | <A HREF="ct.html">CT</A> | <A HREF="parsers.html">PARSERS</A></CENTER>
157 -->
158
159 <TR>
160 <TD COLSPAN=4>
161 <BR>
162 In addition, the following modules found in the <tt>oldclasslib</tt>
163 directory provide supporting functions:
164 <BR><BR>
165 </TD>
166 </TR>
167
168 <TR>
169 <td rowspan=6 width="4%">&nbsp;</td>
170 <TD><tt><A HREF=../../oldclasslib/charsink.html>charsink.cpp</A></tt></TD>
171 <td rowspan=6 width="4%">&nbsp;</td>
172
173 <TD>Character sink support</TD>
174 </TR>
175
176 <TR>
177 <TD><tt><A HREF=../../oldclasslib/charsink.html>charsink.h</A></tt></TD>
178
179 <TD>Character sink class definitions</TD>
180 </TR>
181
182 <TR>
183 <TD><tt><A HREF=../../oldclasslib/strdict.html>strdict.cpp</A></tt></TD>
184
185 <TD>String dictionary support</TD>
186 </TR>
187
188 <TR>
189 <TD><tt><A HREF=../../oldclasslib/strdict.html>strdict.h</A></tt></TD>
190
191 <TD>String dictionary class definition</TD>
192 </TR>
193
194 <TR>
195 <TD><tt><A HREF=../../oldclasslib/array.html>array.h</A></tt></TD>
196
197 <TD>Array class definition</TD>
198 </TR>
199
200 <TR>
201 <TD><tt><A HREF=../../oldclasslib/stack.html>stack.h</A></tt></TD>
202
203 <TD>Stack class definition</TD>
204 </TR>
205
206 </TABLE>
207
208 <!--
209 Here are links to the corresponding
210 document files:
211 <CENTER><A HREF="../../oldclasslib/charsink.html">CHARSINK</A> | <A HREF="../../oldclasslib/array.html">ARRAY</A>
212 | <A HREF="../../oldclasslib/stack.html">STACK</A> | <A HREF="../../oldclasslib/strdict.html">STRDICT</A></CENTER>
213 -->
214
215 <P>
216 <BR>
217 <H2>
218 Data Flow in the Macro Preprocessor</H2>
219 Of the four parsers that make up the macro preprocessor itself, three are
220 simply operators which transform their input:
221 <UL>
222 <LI>
223 MAS transforms a token string (e.g., the body of a macro) into another
224 token string (e.g., the expansion of the macro). MAS is called only from
225 TS, and, recursively, from itself.</LI>
226
227 <LI>
228 EX transforms a token string (e.g., the text of a conditional expression)
229 into a long integer (e.g., the value of the expression). EX is called only
230 from TS.</LI>
231
232 <LI>
233 CT transforms a character string (ostensibly a C token) into a type identification
234 code (e.g., STRINGliteral, identifier, etc.). CT is called only from MAS.</LI>
235 </UL>
236 The fourth is the token scanner, TS, which controls the entire process.
237 The relationships are illustrated in the diagrams below, which show the
238 type and direction of data flow among the modules.
239 </P>
240 <BR>
241 <H3>
242 Relationship between Token Scanner, Macro/Argument Scanner and Token Classifier
243 modules:</H3>
244
245 <CENTER><IMG SRC="reltmt24.gif" ALT="TS, MAS, and CT diagram" ></CENTER>
246 <P>
247 <BR>
248
249
250 <H3>
251 Relationship between Token Scanner and Expression Evaluator:</H3>
252
253 <CENTER><IMG SRC="relte24.gif" ALT="TS and EX diagram" ></CENTER>
254 <P>
255 <BR>
256
257 <H3>
258 Relationship between Token Scanner, token translator and output file:</H3>
259
260 <CENTER><IMG SRC="reltto24.gif" ALT="TS, translator, and output diagram" ></CENTER>
261 <P>
262 <BR>
263 <H3>
264 Relationship between Token Scanner and C Parser:</H3>
265
266 <CENTER><IMG SRC="reltc24.gif" ALT="TS and C parser diagram" ></CENTER>
267 <P>
268
269 <BR>
270 <H2>
271 Building and Running the Macro Preprocessor</H2>
272 To make a working version of the macro preprocessor you need to take the
273 following steps:
274 <OL>
275 <LI>
276 Run AnaGram and build parsers for TS, MAS, CT, and EX.</LI>
277
278 <LI>
279 Choose which C grammar you would like to use (JRC or KRC), run AnaGram, and
280 build a parser for your choice.</LI>
281
282 <LI>
283 If you are using JRC, edit the <tt>#include</tt> near the top of
284 <tt>mpp.h</tt> to load <tt>jrc.h</tt> instead of <tt>krc.h</tt>.</LI>
285
286 <LI>
287 Make sure your compiler can find include files from
288 <tt>oldclasslib/include</tt>.</LI>
289
290 <LI>
291 Then, compile and link the following modules:</LI>
292
293 <BR><tt>mpp.cpp</tt>
294 <BR><tt>token.cpp</tt>
295 <BR><tt>ts.cpp</tt>
296 <BR><tt>mas.cpp</tt>
297 <BR><tt>ct.cpp</tt>
298 <BR><tt>ex.cpp</tt>
299 <BR><tt>krc.cpp</tt> or <tt>jrc.cpp</tt>
300 <BR><tt>oldclasslib/source/charsink.cpp</tt>
301 <BR><tt>oldclasslib/source/strdict.cpp</tt>
302 </OL>
303 Now you can run the macro preprocessor.
304
305 <P>The command line syntax is as follows:
306 <PRE>
307 mpp [-c] [-n] &lt;input file name&gt; [&lt;output file name&gt;]
308 </PRE>
309 The -c switch causes output of the preprocessor to be directed to the C
310 parser you have included, rather than to an output file.
311
312 <P>The -n switch allows the recognition of nested comments.
313
314 <P>If you do not set the -c switch and do not specify an output file name,
315 output will be directed to stdout.
316
317 <P>
318 <BR>
319 <H2>
320 Theory of Operation</H2>
321 This implementation of a macro preprocessor is based on the description
322 of preprocessing given in Section A12, Appendix A, of "The C Programming
323 Language", Second Edition, by Kernighan and Ritchie, Prentice-Hall, 1988.
324
325 <P>The preprocessor itself comprises four modules: a token scanner,
326 <tt>ts.syn</tt>;
327 a macro/argument substitution module, <tt>mas.syn</tt>; a token
328 classifier, <tt>ct.syn</tt>;
329 and an expression evaluator, <tt>ex.syn</tt>. These modules, working
330 together, deal
331 with conditional compilation, include files, macro definition, and macro
332 expansion. The output of the preprocessor may be directed to stdout, to
333 a file, or to either of two C parsers, depending on which you choose to
334 link into your version of the program.
335
336 <P>Two of the modules, <tt>ts.syn</tt> and <tt>mas.syn</tt>, do most of
337 the work. <tt>ts.syn</tt> breaks
338 the input into a sequence of "tokens" as defined by section A2.1 in Kernighan
339 and Ritchie. It also determines the syntactic type of each such token.
340 Descriptors, consisting of a type identifier and a storage handle, are
341 then used as the units for further processing. <tt>ts.syn</tt> also handles the
342 conditional compilation logic and fields macro definitions. When it encounters
343 a macro call, it enlists <tt>mas.syn</tt> to expand the macro.
344
345 <P><tt>ex.syn</tt> exists only to evaluate the conditional expressions
346 in <TT>#if</TT> and <TT>#elif </TT>control statements. <tt>ct.syn</tt>
347 is used only when a
348 new token has been created during macro expansion. The "<TT>##</TT>" operator
349 requires that two tokens be pasted together to make a single token.
350 <tt>ct.syn</tt>
351 is then used to determine what manner of beast has been created.
352
353 <P>
354 <BR>
355 <H2>
356 Supporting Class Libraries</H2>
357 The macro preprocessor uses a number of simple data structures implemented
358 as C++ classes to record and analyze the data generated by the parsers.
359 Some of these structures are of general utility and are found in
360 the <A HREF="../../oldclasslib/index.html">oldclasslib</A> directory.
361 The others are specific to the preprocessor and are to be found in the
362 files <tt>token.h</tt> and <tt>token.cpp</tt> with the rest of the
363 preprocessor files.
364
365 <P>
366 <BR>
367 <H2>
368 General Purpose Data Structures</H2>
369 The general purpose data structures are the following:
370 <UL>
371 <LI><tt>character_sink</tt></LI>
372 <LI><tt>string_accumulator</tt></LI>
373 <LI><tt>output_file</tt></LI>
374 <LI><tt>array&lt;class T&gt;</tt></LI>
375 <LI><tt>stack&lt;class T&gt;</tt></LI>
376 <LI><tt>string_dictionary</tt></LI>
377 </UL>
378 A <tt>character_sink</tt> is an abstract class. It represents simply a
379 general purpose
380 character output device which can be plugged in to any character generator
381 to accept its output.
382
383 <P>A <tt>string_accumulator</tt> is a species of
384 <tt>character_sink</tt>, which can store
385 up characters as they arrive. It has multiple levels, so it can be used
386 in recursive contexts without any confusion.
387
388 <P>An <tt>output_file</tt> is another species of
389 <tt>character_sink</tt>. It is a
390 very simple implementation of stream output, set up so that it can be used
391 interchangeably with other kinds of <tt>character_sink</tt>.
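<P>The idea of interchangeable character sinks can be sketched as follows.
This is only an illustration under stated assumptions: the names
<tt>char_sink_sketch</tt>, <tt>collecting_sink</tt>, <tt>counting_sink</tt>
and <tt>emit_declaration</tt> are hypothetical and are not taken from
<tt>charsink.h</tt>, whose actual member names and signatures may differ.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the abstract character_sink class.
class char_sink_sketch {
public:
  virtual ~char_sink_sketch() {}
  // Accept one character; returning the sink allows << to be chained.
  virtual char_sink_sketch &operator<<(char c) = 0;
  // Convenience overload: feed a C string one character at a time.
  char_sink_sketch &operator<<(const char *s) {
    while (*s) *this << *s++;
    return *this;
  }
};

// A sink that stores what it receives, in the spirit of string_accumulator
// (without the multiple levels).
class collecting_sink : public char_sink_sketch {
public:
  char_sink_sketch &operator<<(char c) override { text_ += c; return *this; }
  const std::string &text() const { return text_; }
private:
  std::string text_;
};

// A sink that only counts characters, to show that sinks are interchangeable.
class counting_sink : public char_sink_sketch {
public:
  char_sink_sketch &operator<<(char c) override {
    (void)c;  // the character itself is not needed, only the fact of arrival
    ++count_;
    return *this;
  }
  unsigned count() const { return count_; }
private:
  unsigned count_ = 0;
};

// A character generator that knows nothing about where its output goes.
void emit_declaration(char_sink_sketch &out) {
  out << "int" << ' ' << "x;";
}
```

Because <tt>emit_declaration</tt> sees only the abstract interface, the same
call can fill a buffer or merely count characters, depending on which sink
is plugged in.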
392
393 <P><tt>array</tt> is a template class that simplifies the allocation
394 and freeing of local storage for arrays of arbitrary type.
395
396 <P>A <tt>stack</tt> is a template class that provides for
397 multi-leveled push-down stacks of arbitrary types of data.
398
399 <P>A <tt>string_dictionary</tt> is a device for associating a unique
400 integer handle
401 with a string so that the integer handle may be used as an alias for the
402 string.
403
404 <P>All of these classes use operator overloading in a consistent manner:
405
406 <P><TT>&lt;&lt; </TT>is used to add data to an entity, for example, to
407 push data onto a stack, to add a string to a string dictionary, to add
408 data to a string accumulator, to send data to an output file, or to transmit
409 data to a parser. In all cases, <TT>&lt;&lt; </TT>may be chained:
410 <PRE> ta &lt;&lt; s1 &lt;&lt; s2;</PRE>
411 <TT>&gt;&gt; </TT>is used to remove data from an entity, in particular, to pop
412 something from a stack, or to remove a character from a string accumulator.
413 Like <TT>&lt;&lt;</TT>, <TT>&gt;&gt;</TT> may be chained:
414 <PRE> ta &gt;&gt; s1 &gt;&gt; s2;</PRE>
415 <TT>++ </TT>is used with string accumulators and with stacks to increment
416 the level number. It is defined only as a pre-increment operator.
417
418 <P><TT>-- </TT>is used with string accumulators and with stacks to decrement
419 the level number. It is defined only as a pre-decrement operator.
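<P>One plausible reading of the level operators is sketched below. The
class <tt>leveled_accumulator</tt> and its <tt>current()</tt> member are
hypothetical; in particular, it is an assumption of this sketch that
<TT>++</TT> opens a fresh, empty level and <TT>--</TT> discards the current
level and returns to the one beneath it. The classes in
<tt>oldclasslib</tt> may behave differently in detail.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical multi-level accumulator; not the oldclasslib implementation.
class leveled_accumulator {
public:
  leveled_accumulator() : levels_(1) {}  // start with one empty level

  // Append a character to the current level; chainable.
  leveled_accumulator &operator<<(char c) {
    levels_.back() += c;
    return *this;
  }
  // Pre-increment: open a fresh, empty level on top of the current one.
  leveled_accumulator &operator++() {
    levels_.push_back(std::string());
    return *this;
  }
  // Pre-decrement: discard the current level (the bottom level is kept).
  leveled_accumulator &operator--() {
    if (levels_.size() > 1) levels_.pop_back();
    return *this;
  }
  const std::string &current() const { return levels_.back(); }

private:
  std::vector<std::string> levels_;
};

// Simulates a nested use: the inner work does not disturb the outer text.
std::string nested_demo() {
  leveled_accumulator sa;
  sa << 'a' << 'b';
  ++sa;                             // enter a nested context
  sa << 'x';
  std::string inner = sa.current();
  --sa;                             // leave it; the outer text is intact
  return inner + "/" + sa.current();
}
```

Here <tt>nested_demo()</tt> returns <tt>"x/ab"</tt>: the inner level saw
only its own character, and the outer level was untouched.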
420
421 <P><TT>[] </TT>is used to access a particular item. In the case of the
422 string dictionary, <TT>[] </TT>with a string argument returns the handle,
423 or zero, if the string is not in the dictionary. <TT>[] </TT>with a handle
424 returns a pointer to the string. In the case of the "array" class, <TT>[]
425 </TT>provides access to a single element and checks for out of bounds references.
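<P>The two-way use of <TT>[]</TT> on a string dictionary can be sketched
like this. The class <tt>dict_sketch</tt> is hypothetical and uses a simple
linear search; it is not the implementation in <tt>strdict.cpp</tt>. Two
details are assumptions of the sketch: handles are numbered from one, so
that zero can mean "not present", and an invalid handle yields a null
pointer.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical string dictionary; not the oldclasslib implementation.
class dict_sketch {
public:
  // Add a string unless it is already present; chainable.
  dict_sketch &operator<<(const char *s) {
    if ((*this)[s] == 0) strings_.push_back(s);
    return *this;
  }
  // Look up by string: returns the handle, or zero if absent.
  unsigned operator[](const char *s) const {
    for (unsigned i = 0; i < strings_.size(); ++i)
      if (strings_[i] == s) return i + 1;
    return 0;
  }
  // Look up by handle: returns a pointer to the stored string, or a null
  // pointer for an invalid handle. The pointer may be invalidated by a
  // later insertion, so it should be used promptly.
  const char *operator[](unsigned handle) const {
    if (handle == 0 || handle > strings_.size()) return nullptr;
    return strings_[handle - 1].c_str();
  }
  // Free function in the style of size(object).
  friend unsigned size(const dict_sketch &d) {
    return static_cast<unsigned>(d.strings_.size());
  }

private:
  std::vector<std::string> strings_;
};
```

With this sketch, <tt>d &lt;&lt; "while" &lt;&lt; "x" &lt;&lt; "while"</tt>
leaves two entries in <tt>d</tt>, and <tt>d[d["x"]]</tt> points to the
string <tt>"x"</tt>. Note that a literal <tt>0</tt> as the index would be
ambiguous between the two overloads, so handles should be passed as
<tt>unsigned</tt> variables.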
426
427 <P>Cast operators are also overloaded to provide simple access to the data
428 stored in an instance of a class.
429
430 <P>Several overloaded functions are defined consistently where they are
431 defined at all:
432 <TABLE WIDTH="100%">
433
434 <TR>
435 <TD ROWSPAN=3 WIDTH="4%">
436 <TD><tt>reset(</tt><i>object</i><tt>)</tt></TD>
437 <TD>restores initial state&nbsp;</TD>
438 </TR>
439
440 <TR>
441 <TD><tt>size(</tt><i>object</i><tt>)</tt></TD>
442 <TD>returns size&nbsp;</TD>
443 </TR>
444
445 <TR>
446 <TD><tt>error(</tt><i>object</i><tt>)</tt>&nbsp;</TD>
447 <TD>returns error flag&nbsp;</TD>
448 </TR>
449
450 </TABLE>
451 The macro preprocessor uses instances of the above classes for global data
452 storage and manipulation:
453 <PRE>
454 extern stack&lt;char *&gt; paths;
455 extern string_accumulator sa;
456 extern string_dictionary td;
457 </PRE>
458 <TT>paths </TT>is used to hold a list of search paths to look for include
459 files whose names are enclosed in angle brackets.
460
461 <P><TT>sa </TT>is used in the token scanner, to accumulate the strings
462 that constitute C tokens. Once complete, each string is added to the string_dictionary
463 <TT>td </TT>to get a handle which identifies the string uniquely. <TT>td
464 </TT>is generally referred to as the "token dictionary".
465
466 <P>In the main program, an output file is defined in terms of these classes:
467 <PRE> output_file file;</PRE>
468
469 <P>
470 <BR>
471 <H2>
472 Token Classes</H2>
473 A number of class and structure definitions specific to the macro preprocessor
474 are given in <tt>token.h</tt>. Member functions are defined in
475 <tt>token.cpp</tt>.
476
477 <P>The definitions in <tt>token.h</tt> are geared toward the
478 transmission and sharing
479 of data among the modules that make up the macro preprocessor. An enumeration
480 statement defines enumeration constants for all the different kinds of
481 terminal tokens a C parser can expect to see. These enumeration constants
482 are defined to be of type <tt>token_id</tt>.
483
484 <!-- this sentence needs to be shot. -->
485 <P>A structure definition defines a token as a pair consisting of a
486 <tt>token_id</tt>
487 and an unsigned integer. The integer is the handle, in the token dictionary,
488 of the string of characters that constitutes the actual token as defined
489 in K&amp;R.
490
491 <P>Then, to facilitate working with these tokens, a set of classes is
492 defined using the <tt>character_sink</tt> class and its derived
493 classes <!-- more or less --> as a model:
494
495 <UL>
496 <LI><tt>token_sink</tt></LI>
497 <LI><tt>token_accumulator</tt></LI>
498 <LI><tt>token_translator</tt></LI>
499 <LI><tt>expression_evaluator</tt></LI>
500 <LI><tt>c_parser</tt></LI>
501 </UL>
502
503 Like <tt>character_sink</tt>, <tt>token_sink</tt> is an abstract class
504 that serves
505 as a general purpose output device for processes which create a stream
506 of tokens.
507
508 <P>A <tt>token_accumulator</tt> is a species of <tt>token_sink</tt>.
509 It is a repository for
510 sequences of tokens. It has multiple levels, like a
511 <tt>string_accumulator</tt>,
512 so it can be used safely in recursive procedures.
513
514 <P>A <tt>token_translator</tt> is a species of <tt>token_sink</tt>
515 which converts a stream
516 of tokens to a stream of characters. The constructor for a
517 <tt>token_translator</tt>
518 takes a pointer to a <tt>character_sink</tt>, so that tokens handed to
519 a <tt>token_translator</tt>
520 are converted to strings and passed on to the specified character sink.
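<P>The conversion from tokens back to characters can be sketched as
follows. All names here (<tt>token_sketch</tt>, <tt>char_sink_sketch</tt>,
<tt>string_sink</tt>, <tt>translator_sketch</tt>) are hypothetical. As a
simplification, the sketch takes a plain table of spellings in place of the
token dictionary, does not derive the translator from a token sink base
class, and assumes that white space arrives as tokens of its own.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical token: a kind code plus a 1-based handle into a spelling table.
struct token_sketch {
  int id;
  unsigned handle;
};

// Minimal character sink, standing in for character_sink.
class char_sink_sketch {
public:
  virtual ~char_sink_sketch() {}
  virtual void put(char c) = 0;
};

// A sink that collects characters into a string.
class string_sink : public char_sink_sketch {
public:
  void put(char c) override { text += c; }
  std::string text;
};

// Converts each token to its spelling and forwards the characters.
class translator_sketch {
public:
  translator_sketch(char_sink_sketch *out,
                    const std::vector<std::string> *spellings)
      : out_(out), spellings_(spellings) {}

  translator_sketch &operator<<(token_sketch t) {
    // at() throws std::out_of_range if the handle is not valid.
    const std::string &s = spellings_->at(t.handle - 1);
    for (char c : s) out_->put(c);
    return *this;
  }

private:
  char_sink_sketch *out_;
  const std::vector<std::string> *spellings_;
};

// Feeds the tokens of "x + 1" through the translator.
std::string translate_demo() {
  std::vector<std::string> spellings = {"x", " ", "+", "1"};
  string_sink out;
  translator_sketch tt(&out, &spellings);
  token_sketch x = {0, 1}, sp = {1, 2}, plus = {2, 3}, one = {3, 4};
  tt << x << sp << plus << sp << one;
  return out.text;
}
```

In this sketch <tt>translate_demo()</tt> returns the text
<tt>"x + 1"</tt>, reassembled from the individual token spellings.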
521
522 <P>The <tt>expression_evaluator</tt> class is a wrapper class around
523 the expression evaluation module, <tt>ex.syn</tt>. It is a species of
524 <tt>token_sink</tt>,
525 so that tokens may be passed to the <tt>expression_evaluator</tt> just
526 as they are to a <tt>token_accumulator</tt> or a <tt>token_translator</tt>.
527
528 <P>The <tt>c_parser</tt> class is a wrapper class around a C
529 parser module.
530 Implementations of this class are found in both <tt>jrc.syn</tt> and
531 <tt>krc.syn</tt>. The
532 <tt>c_parser</tt> class is also a <tt>token_sink</tt>.
533
534 <P>The macro preprocessor uses several global variables based on the
535 token-based classes defined above:
536 <PRE>
537 extern token_sink *scanner_sink;
538 extern token_accumulator ta;
539 extern expression_evaluator condition;
540 </PRE>
541 <tt>scanner_sink</tt> is the generic output device for the token
542 scanner. As the
543 token scanner develops tokens it sends them to the <tt>token_sink</tt> pointed
544 to by <tt>scanner_sink</tt>.
545
546 <P><tt>condition</tt> is used to evaluate constant expressions in
547 <TT>#if</TT> and
548 <TT>#elif</TT> statements. The token scanner diverts its output
549 to the expression evaluator with the statement:
550 <PRE>
551 scanner_sink = &amp;condition;
552 </PRE>
553 Until the <tt>scanner_sink</tt> is restored to its previous value, all
554 output from
555 the token scanner flows to the expression_evaluator, <tt>condition</tt>.
556
557 <P><TT>ta</TT> is a token_accumulator, used in the token scanner and in
558 <tt>mas.syn</tt> to accumulate sequences of tokens. As with the
559 <tt>expression_evaluator</tt>,
560 output from the token scanner can be diverted to <TT>ta</TT> by means of
561 one simple statement:
562 <PRE>
563 scanner_sink = &amp;ta;
564 </PRE>
565 This diversion simplifies the gathering of the tokens that make up the
566 body of a macro or an argument to a macro call.
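<P>The diversion technique can be sketched as follows. The names
<tt>token_sketch</tt>, <tt>token_sink_sketch</tt>, <tt>token_store</tt>,
<tt>scanner_sink_sketch</tt> and <tt>scan_one</tt> are hypothetical
stand-ins and are not the declarations in <tt>token.h</tt>; the sketch
shows only how reassigning a single pointer redirects the scanner's output,
and how saving the old value allows it to be restored.

```cpp
#include <cassert>
#include <vector>

// Hypothetical token and token kinds.
enum token_id_sketch { IDENTIFIER_T, NUMBER_T, PLUS_T };
struct token_sketch {
  token_id_sketch id;
  unsigned handle;
};

// Abstract destination for a stream of tokens.
class token_sink_sketch {
public:
  virtual ~token_sink_sketch() {}
  virtual token_sink_sketch &operator<<(token_sketch t) = 0;
};

// A sink that simply stores tokens, in the spirit of token_accumulator.
class token_store : public token_sink_sketch {
public:
  token_sink_sketch &operator<<(token_sketch t) override {
    tokens.push_back(t);
    return *this;
  }
  std::vector<token_sketch> tokens;
};

// The scanner writes to whatever sink this pointer currently designates.
token_sink_sketch *scanner_sink_sketch = nullptr;

// Stand-in for the scanner recognizing one token.
void scan_one(token_id_sketch id, unsigned handle) {
  token_sketch t = {id, handle};
  *scanner_sink_sketch << t;
}

// Divert two tokens into a side store, then restore the normal output.
unsigned divert_demo() {
  token_store normal_output, macro_body;
  scanner_sink_sketch = &normal_output;
  scan_one(IDENTIFIER_T, 1);                       // goes to normal_output

  token_sink_sketch *saved = scanner_sink_sketch;  // remember the old sink
  scanner_sink_sketch = &macro_body;               // divert
  scan_one(NUMBER_T, 2);                           // goes to macro_body
  scan_one(PLUS_T, 3);                             // goes to macro_body
  scanner_sink_sketch = saved;                     // restore

  scan_one(IDENTIFIER_T, 4);                       // goes to normal_output
  unsigned result = static_cast<unsigned>(normal_output.tokens.size()) * 10 +
                    static_cast<unsigned>(macro_body.tokens.size());
  scanner_sink_sketch = nullptr;  // the local sinks are about to go away
  return result;
}
```

In this sketch <tt>divert_demo()</tt> returns 22: two tokens reached the
normal output and two were captured in the side store.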
567
568 <P>In the main program, two local variables are defined in terms of these
569 token-based structures:
570 <PRE>
571 c_parser cp;
572 token_translator tt(&amp;file);
573 </PRE>
574 Thus either <tt>cp</tt> or <tt>tt</tt> can serve as an output
575 destination for the token scanner.
576 The main program sets <tt>scanner_sink</tt> to point to one or the
577 other depending
578 on a command line switch.
579 </P>
580
581 <BR>
582
583 <IMG ALIGN="bottom" SRC="../../images/rbline6j.gif" ALT="----------------------"
584 WIDTH=1010 HEIGHT=2 >
585 <P>
586 <IMG ALIGN="right" SRC="../../images/pslrb6d.gif" ALT="Parsifal Software"
587 WIDTH=181 HEIGHT=25>
588 <BR CLEAR="right">
589 <P>
590 Back to <A HREF="../../index.html">Index</A>
591 <P>
592 <ADDRESS>
593 <FONT SIZE=-1>AnaGram parser generator - examples</FONT>
594 <BR><FONT SIZE=-1>Macro preprocessor and C Parser</FONT>
595 <BR><FONT SIZE=-1>Copyright &copy; 1993-1999, Parsifal Software.</FONT>
596 <BR><FONT SIZE=-1>All Rights Reserved.</FONT>
597 <BR>
598 </ADDRESS>
599 </BODY>
600 </HTML>
601