diff doc/misc/html/examples/mpp/index.html @ 0:13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
author | David A. Holland |
---|---|
date | Sat, 22 Dec 2007 17:52:45 -0500 |
parents | |
children |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/doc/misc/html/examples/mpp/index.html Sat Dec 22 17:52:45 2007 -0500 @@ -0,0 +1,601 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN"> +<HTML> +<HEAD> + <TITLE> Macro preprocessor and C Parser </TITLE> +</HEAD> + + +<BODY BGCOLOR="#ffffff" BACKGROUND="tilbl6h.gif" + TEXT="#000000" LINK="#0033CC" + VLINK="#CC0033" ALINK="#CC0099"> + +<P> +<IMG ALIGN="right" SRC="../../images/agrsl6c.gif" ALT="AnaGram" + WIDTH=124 HEIGHT=30 > +<BR CLEAR="all"> +Back to <A HREF="../../index.html">Index</A> +<P> +<IMG ALIGN="bottom" SRC="../../images/rbline6j.gif" ALT="----------------------" + WIDTH=1010 HEIGHT=2 > +<P> + +<H1>Macro preprocessor and C Parser</H1> + +<IMG ALIGN="bottom" SRC="../../images/rbline6j.gif" ALT="----------------------" + WIDTH=1010 HEIGHT=2 > + +<H2>Introduction</H2> + +This document provides an overview of the entire macro preprocessor example. +Since the example consists of a number of modules, there is also a separate +document file for each module. These document files provide an overview +of the module and detailed descriptions of the variables, data structures +and syntactic elements associated with the module. + +<P>This implementation of a C macro preprocessor demonstrates: +<UL> +<LI> +the use of AnaGram in a real-world problem of considerable complexity.</LI> + +<LI> +the use of AnaGram in a C++ environment.</LI> +</UL> +It was felt that only a fairly complex problem would adequately demonstrate +the power of AnaGram. This example, therefore, may not be particularly +easy to grasp or to understand in its entirety. + +<P>However, it is not necessary to understand all facets of this example +to make good use of it. If you skim over it, you will see examples of many +common syntactic constructs. You will find that in many cases you can copy +these constructs verbatim and incorporate them directly into your own programs. + +<P>A number of AnaGram's features and options are well illustrated. 
This +example makes use of four separate syntaxes to deal with the preprocessing, +so that the complete program, with one or another of the C parsers linked +in, contains five separate parsers. There are, therefore, numerous examples +of interfacing a parser to the rest of a program. In particular, several +of the parsers are configured as C++ classes. + +<P>Other AnaGram features, such as semantically determined productions +and context tracking, are used to good effect, particularly in the token +scanner, which also illustrates the use of AnaGram to write lexical scanners. + +<P>In addition to the macro preprocessor, this example provides a choice +of two C parsers which have been interfaced to the preprocessor. These +parsers are simply syntax checkers. They have essentially no reduction +procedures except for enough to give them rudimentary (and not fully +correct) capabilities for +coping with typedef types. You may, of course, add your own reduction procedures +to adapt them to your needs. +<P> +Note that this macro preprocessor is not particularly +standards-compliant; if you feed it difficult or pedantic test cases, it will +probably give you wrong output. 
+<P> +<BR> + +<H2> +Components of the Macro Preprocessor</H2> +<TABLE WIDTH="100%"> + +<TR> +<TD COLSPAN=4> +The macro preprocessor example comprises the following modules: +<BR><BR> +</TD> +</TR> + +<TR> +<td rowspan=10 width="4%"> </td> +<TD><tt><A HREF=mpp.html>mpp.cpp</A></tt></TD> +<td rowspan=10 width="4%"> </td> + +<TD>data declarations and main program</TD> +</TR> + +<TR> +<TD><tt><A HREF=mpp.html>mpp.h</A></tt></TD> + +<TD>Structure definitions, data and function declarations</TD> +</TR> + +<TR> +<TD><tt><A HREF=token.html>token.cpp</A></tt></TD> + +<TD>token class function definitions</TD> +</TR> + +<TR> +<TD><tt><A HREF=token.html>token.h</A></tt></TD> + +<TD>Token class definitions</TD> +</TR> + +<TR> +<TD><tt><A HREF=ts.html>ts.syn</A></tt></TD> + +<TD>token scanner</TD> +</TR> + +<TR> +<TD><tt><A HREF=mas.html>mas.syn</A></tt></TD> + +<TD>macro and argument substitution module</TD> +</TR> + +<TR> +<TD><tt><A HREF=ex.html>ex.syn</A></tt></TD> + +<TD>constant expression evaluator</TD> +</TR> + +<TR> +<TD><tt><A HREF=ct.html>ct.syn</A></tt></TD> + +<TD>token classifier</TD> +</TR> + +<TR> +<TD><tt><A HREF=parsers.html>jrc.syn</A></tt></TD> + +<TD>C parser, based on C grammar by James A. 
Roskind</TD> +</TR> + +<TR> +<TD><tt><A HREF=parsers.html>krc.syn</A></tt></TD> + +<TD>C parser, based on C grammar in K & R, section A13</TD> +</TR> + +<!-- +<P> +Here are links to the corresponding +document files: +<CENTER><A HREF="mpp.html">MPP </A> | <A HREF="token.html">TOKEN</A> +| <A HREF="ts.html">TS</A> | <A HREF="mas.html">MAS</A> | <A HREF="ex.html">EX</A> +| <A HREF="ct.html">CT</A> | <A HREF="parsers.html">PARSERS</A></CENTER> +--> + +<TR> +<TD COLSPAN=4> +<BR> +In addition, the following modules found in the <tt>oldclasslib</tt> +directory provide supporting functions: +<BR><BR> +</TD> +</TR> + +<TR> +<td rowspan=6 width="4%"> </td> +<TD><tt><A HREF=../../oldclasslib/charsink.html>charsink.cpp</A></tt></TD> +<td rowspan=6 width="4%"> </td> + +<TD>Character sink support</TD> +</TR> + +<TR> +<TD><tt><A HREF=../../oldclasslib/charsink.html>charsink.h</A></tt></TD> + +<TD>Character sink class definitions</TD> +</TR> + +<TR> +<TD><tt><A HREF=../../oldclasslib/strdict.html>strdict.cpp</A></tt></TD> + +<TD>String dictionary support</TD> +</TR> + +<TR> +<TD><tt><A HREF=../../oldclasslib/strdict.html>strdict.h</A></tt></TD> + +<TD>String dictionary class definition</TD> +</TR> + +<TR> +<TD><tt><A HREF=../../oldclasslib/array.html>array.h</A></tt></TD> + +<TD>Array class definition</TD> +</TR> + +<TR> +<TD><tt><A HREF=../../oldclasslib/stack.html>stack.h</A></tt></TD> + +<TD>Stack class definition</TD> +</TR> + +</TABLE> + +<!-- +Here are links to the corresponding +document files: +<CENTER><A HREF="../../oldclasslib/charsink.html">CHARSINK</A> | <A HREF="../../oldclasslib/array.html">ARRAY</A> +| <A HREF="../../oldclasslib/stack.html">STACK</A> | <A HREF="../../oldclasslib/strdict.html">STRDICT</A></CENTER> +--> + +<P> +<BR> +<H2> +Data Flow in the Macro Preprocessor</H2> +Of the four parsers that make up the macro preprocessor itself, three are +simply operators which transform their input: +<UL> +<LI> +MAS transforms a token string (e.g., the body of a 
macro) into another +token string (e.g., the expansion of the macro). MAS is called only from +TS, and, recursively, from itself.</LI> + +<LI> +EX transforms a token string (e.g., the text of a conditional expression) +into a long integer (e.g., the value of the expression). EX is called only +from TS.</LI> + +<LI> +CT transforms a character string (ostensibly a C token) into a type identification +code (e.g., STRINGliteral, identifier, etc.). CT is called only from MAS.</LI> +</UL> +The fourth is the token scanner, TS, which controls the entire process. +The relationships are illustrated in the diagrams below, which show the +type and direction of data flow among the modules. +</P> +<BR> +<H3> +Relationship between Token Scanner, Macro/Argument Scanner and Token Classifier +modules:</H3> + +<CENTER><IMG SRC="reltmt24.gif" ALT="TS, MAS, and CT diagram" ></CENTER> +<P> +<BR> + + +<H3> +Relationship between Token Scanner and Expression Evaluator:</H3> + +<CENTER><IMG SRC="relte24.gif" ALT="TS and EX diagram" ></CENTER> +<P> +<BR> + +<H3> +Relationship between Token Scanner, token translator and output file:</H3> + +<CENTER><IMG SRC="reltto24.gif" ALT="TS, translator, and output diagram" ></CENTER> +<P> +<BR> +<H3> +Relationship between Token Scanner and C Parser:</H3> + +<CENTER><IMG SRC="reltc24.gif" ALT="TS and C parser diagram" ></CENTER> +<P> + +<BR> +<H2> +Building and Running the Macro Preprocessor</H2> +To make a working version of the macro preprocessor you need to take the +following steps: +<OL> +<LI> +Run AnaGram and build parsers for TS, MAS, CT, and EX.</LI> + +<LI> +Choose which C grammar you would like to use (JRC or KRC), run AnaGram, and +build a parser for your choice.</LI> + +<LI> +If you are using JRC, edit the <tt>#include</tt> near the top of +<tt>mpp.h</tt> to load <tt>jrc.h</tt> instead of <tt>krc.h</tt>. 
+ +<LI> +Make sure your compiler can find include files from +<tt>oldclasslib/include</tt>.</LI> + +<LI> +Then, compile and link the following modules:</LI> + +<BR><tt>mpp.cpp</tt> +<BR><tt>token.cpp</tt> +<BR><tt>ts.cpp</tt> +<BR><tt>mas.cpp</tt> +<BR><tt>ct.cpp</tt> +<BR><tt>ex.cpp</tt> +<BR><tt>krc.cpp</tt> or <tt>jrc.cpp</tt> +<BR><tt>oldclasslib/source/charsink.cpp</tt> +<BR><tt>oldclasslib/source/strdict.cpp</tt> +</OL> +Now you can run the macro preprocessor. + +<P>The command line syntax is as follows: +<PRE> + mpp [-c] [-n] <input file name> [<output file name>] +</PRE> +The -c switch causes output of the preprocessor to be directed to the C +parser you have included, rather than to an output file. + +<P>The -n switch allows the recognition of nested comments. + +<P>If you do not set the -c switch and do not specify an output file name, +output will be directed to stdout. + +<P> +<BR> +<H2> +Theory of Operation</H2> +This implementation of a macro preprocessor is based on the description +of preprocessing given in Section A12, Appendix A, of "The C Programming +Language", Second Edition, by Kernighan and Ritchie, Prentice-Hall, 1988. + +<P>The preprocessor itself comprises four modules: A token scanner, +<tt>ts.syn</tt>; +a macro/argument substitution module, <tt>mas.syn</tt>; a token +classifier, <tt>ct.syn</tt>; +and an expression evaluator, <tt>ex.syn</tt>. These modules, working +together, deal +with conditional compilation, include files, macro definition, and macro +expansion. The output of the preprocessor may be directed to stdout, to +a file, or to either of two C parsers, depending on which you choose to +link into your version of the program. + +<P>Two of the modules, <tt>ts.syn</tt> and <tt>mas.syn</tt> do most of +the work. <tt>ts.syn</tt> breaks +the input into a sequence of "tokens" as defined by section A2.1 in Kernighan +and Ritchie. It also determines the syntactic type of each such token. 
+Descriptors, consisting of a type identifier and a storage handle, are +then used as the units for further processing. <tt>ts.syn</tt> also handles the +conditional compilation logic and fields macro definitions. When it encounters +a macro call, it enlists <tt>mas.syn</tt> to expand the macro. + +<P><tt>ex.syn</tt> exists only to evaluate the conditional expressions +in <TT>#if</TT> and <TT>#elif </TT>control statements. <tt>ct.syn</tt> +is used only when a +new token has been created during macro expansion. The "<TT>##</TT>" operator +requires that two tokens be pasted together to make a single token. +<tt>ct.syn</tt> +is then used to determine what manner of beast has been created. + +<P> +<BR> +<H2> +Supporting Class Libraries</H2> +The macro preprocessor uses a number of simple data structures implemented +as C++ classes to record and analyze the data generated by the parsers. +Some of these structures are of general utility and are found in +the <A HREF="../../oldclasslib/index.html">oldclasslib</A> directory. +The others are specific to the preprocessor and are to be found in the +files <tt>token.h</tt> and <tt>token.cpp</tt> with the rest of the +preprocessor files. + +<P> +<BR> +<H2> +General Purpose Data Structures</H2> +The general purpose data structures are the following: +<UL> +<LI><tt>character_sink</tt></LI> +<LI><tt>string_accumulator</tt></LI> +<LI><tt>output_file</tt></LI> +<LI><tt>array<class T></tt></LI> +<LI><tt>stack<class T></tt></LI> +<LI><tt>string_dictionary</tt></LI> +</UL> +A <tt>character_sink</tt> is an abstract class. It represents simply a +general purpose +character output device which can be plugged in to any character generator +to accept its output. + +<P>A <tt>string_accumulator</tt> is a species of +<tt>character_sink</tt>, which can store +up characters as they arrive. It has multiple levels, so it can be used +in recursive contexts without any confusion. 
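The character sink idea described above can be sketched in a few lines of C++. This is a minimal illustration written for this page, not the oldclasslib source: the member functions, the top() accessor, and the write_quoted helper are all assumptions, and only the class names and the use of <<, ++ and -- follow the description in the text.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical re-creation of the idea; NOT the oldclasslib implementation.
// An abstract character output device that any generator can write to.
class character_sink {
public:
    virtual ~character_sink() = default;
    virtual character_sink& operator<<(char c) = 0;
    // Convenience overload: feed a C string one character at a time.
    character_sink& operator<<(const char* s) {
        while (*s) *this << *s++;
        return *this;
    }
};

// A sink that stores characters, with one buffer per level (assumed design).
class string_accumulator : public character_sink {
public:
    string_accumulator() : levels_(1) {}
    using character_sink::operator<<;  // keep the const char* overload visible
    character_sink& operator<<(char c) override {
        levels_.back().push_back(c);
        return *this;
    }
    // Pre-increment opens a fresh level; pre-decrement discards it.
    string_accumulator& operator++() { levels_.emplace_back(); return *this; }
    string_accumulator& operator--() {
        assert(levels_.size() > 1);
        levels_.pop_back();
        return *this;
    }
    const std::string& top() const { return levels_.back(); }
private:
    std::vector<std::string> levels_;
};

// A character generator that knows only about the abstract interface.
void write_quoted(character_sink& out, const char* text) {
    out << '"' << text << '"';
}
```

Because write_quoted takes a character_sink reference, the same generator could feed an accumulator, a file wrapper, or any other sink. Opening a new level with ++ leaves the text gathered at the outer level untouched, which is what makes the accumulator usable from recursive code.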
 + +<P>An <tt>output_file</tt> is another species of +<tt>character_sink</tt>. It is a +very simple implementation of stream output, set up so that it can be used +interchangeably with other kinds of <tt>character_sink</tt>. + +<P><tt>array</tt> is a template class that simplifies the allocation +and freeing of local storage for arrays of arbitrary type. + +<P>A <tt>stack</tt> is a template class that provides for +multi-leveled push-down stacks of arbitrary types of data. + +<P>A <tt>string_dictionary</tt> is a device for associating a unique +integer handle +with a string so that the integer handle may be used as an alias for the +string. + +<P>All of these classes use operator overloading in a consistent manner: + +<P><TT><< </TT>is used to add data to an entity, for example, to +push data onto a stack, to add a string to a string dictionary, to add +data to a string accumulator, to send data to an output file, or to transmit +data to a parser. In all cases, <TT><< </TT>may be chained: +<PRE> ta << s1 << s2;</PRE> +<TT>>> </TT>is used to remove data from an entity, in particular, to pop +something from a stack, or to remove a character from a string accumulator. +Like "<<", ">>" may be chained: +<PRE> ta >> s1 >> s2;</PRE> +<TT>++ </TT>is used with string accumulators and with stacks to increment +the level number. It is defined only as a pre-increment operator. + +<P><TT>-- </TT>is used with string accumulators and with stacks to decrement +the level number. It is defined only as a pre-decrement operator. + +<P><TT>[] </TT>is used to access a particular item. In the case of the +string dictionary, <TT>[] </TT>with a string argument returns the handle, +or zero if the string is not in the dictionary. <TT>[] </TT>with a handle +returns a pointer to the string. In the case of the "array" class, <TT>[] +</TT>provides access to a single element and checks for out-of-bounds references. 
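The operator conventions just described can be illustrated with a small, self-contained sketch. The two classes below are hypothetical stand-ins written for this page (the names leveled_stack and string_dictionary, and all member details, are assumptions rather than the oldclasslib code). In particular, the choices that -- discards whatever remains on the current level and that size() counts only the current level are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical multi-level stack: << pushes, >> pops, ++/-- change level.
template <class T>
class leveled_stack {
public:
    leveled_stack() { bases_.push_back(0); }
    leveled_stack& operator<<(const T& v) { items_.push_back(v); return *this; }
    leveled_stack& operator>>(T& out) {
        assert(size() > 0);               // cannot pop below the current level
        out = items_.back();
        items_.pop_back();
        return *this;
    }
    leveled_stack& operator++() { bases_.push_back(items_.size()); return *this; }
    leveled_stack& operator--() {
        assert(bases_.size() > 1);
        // Assumption: leaving a level discards anything still on it.
        items_.erase(items_.begin() + static_cast<std::ptrdiff_t>(bases_.back()),
                     items_.end());
        bases_.pop_back();
        return *this;
    }
    std::size_t size() const { return items_.size() - bases_.back(); }
private:
    std::vector<T> items_;
    std::vector<std::size_t> bases_;      // index where each level starts
};

// Hypothetical string dictionary: handles start at 1, 0 means "not found".
class string_dictionary {
public:
    string_dictionary& operator<<(const std::string& s) {
        if (index_.find(s) == index_.end()) {
            names_.push_back(s);
            index_[s] = static_cast<unsigned>(names_.size());
        }
        return *this;
    }
    unsigned operator[](const std::string& s) const {
        auto it = index_.find(s);
        return it == index_.end() ? 0u : it->second;
    }
    const char* operator[](unsigned handle) const {
        return names_.at(handle - 1).c_str();   // throws on a bad handle
    }
private:
    std::unordered_map<std::string, unsigned> index_;
    std::deque<std::string> names_;       // deque keeps stored strings in place
};
```

With these definitions, a statement such as s << 1 << 2; pushes two values, s >> a >> b; pops them in reverse order, and d["while"] returns either a non-zero handle or zero. Returning a reference to the object from << and >> is what makes the chaining work.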
+ +<P>Cast operators are also overloaded to provide simple access to the data +stored in an instance of a class. + +<P>Several overloaded functions are defined consistently where they are +defined at all: +<TABLE WIDTH="100%"> + +<TR> +<TD ROWSPAN=3 WIDTH="4%"> +<TD><tt>reset(</tt><i>object</i><tt>)</tt></TD> +<TD>restores initial state </TD> +</TR> + +<TR> +<TD><tt>size(</tt><i>object</i><tt>)</tt></TD> +<TD>returns size </TD> +</TR> + +<TR> +<TD><tt>error(</tt><i>object</i><tt>)</tt> </TD> +<TD>returns error flag </TD> +</TR> + +</TABLE> +The macro preprocessor uses instances of the above classes for global data +storage and manipulation: +<PRE> + extern stack<char *> paths; + extern string_accumulator sa; + extern string_dictionary td; +</PRE> +<TT>paths </TT>is used to hold a list of search paths to look for include +files whose names are enclosed in angle brackets. + +<P><TT>sa </TT>is used in the token scanner, to accumulate the strings +that constitute C tokens. Once complete, each string is added to the string_dictionary +<TT>td </TT>to get a handle which identifies the string uniquely. <TT>td +</TT>is generally referred to as the "token dictionary". + +<P>In the main program, an output file is defined in terms of these classes: +<PRE> output_file file;</PRE> + +<P> +<BR> +<H2> +Token Classes</H2> +A number of class and structure definitions specific to the macro preprocessor +are given in <tt>token.h</tt>. Member functions are defined in +<tt>token.cpp</tt>. + +<P>The definitions in <tt>token.h</tt> are geared toward the +transmission and sharing +of data among the modules that make up the macro preprocessor. An enumeration +statement defines enumeration constants for all the different kinds of +terminal tokens a C parser can expect to see. These enumeration constants +are defined to be of type <tt>token_id</tt>. + +<!-- this sentence needs to be shot. 
--> +<P>A structure definition defines a token as a pair consisting of a +<tt>token_id</tt>, +and an unsigned integer which represents the handle in the token dictionary +of the string of characters that constitutes the actual token as defined +in K&R. + +<P>Then, to facilitate working with these tokens, a set of classes is +defined using the <tt>character_sink</tt> class and its derived +classes <!-- more or less --> as a model: + +<UL> +<LI><tt>token_sink</tt></LI> +<LI><tt>token_accumulator</tt></LI> +<LI><tt>token_translator</tt></LI> +<LI><tt>expression_evaluator</tt></LI> +<LI><tt>c_parser</tt></LI> +</UL> + +Like <tt>character_sink</tt>, <tt>token_sink</tt> is an abstract class +that serves +as a general purpose output device for processes which create a stream +of tokens. + +<P>A <tt>token_accumulator</tt> is a species of <tt>token_sink</tt>. +It is a repository for +sequences of tokens. It has multiple levels, like a +<tt>string_accumulator</tt>, +so it can be used safely in recursive procedures. + +<P>A <tt>token_translator</tt> is a species of <tt>token_sink</tt> +which converts a stream +of tokens to a stream of characters. The constructor for a +<tt>token_translator</tt> +takes a pointer to a <tt>character_sink</tt>, so that tokens handed to +a <tt>token_translator</tt> +are converted to strings and passed on to the specified character sink. + +<P>The <tt>expression_evaluator</tt> class is a class structure wrapped about +the expression evaluation module, <tt>ex.syn</tt>. It is a species of +<tt>token_sink</tt>, +so that tokens may be passed to the <tt>expression_evaluator</tt> just +as they are to a <tt>token_accumulator</tt> or a <tt>token_translator</tt>. + +<P>The <tt>c_parser</tt> class is a class structure wrapped about a C +parser module. +Implementations of this class are found in both <tt>jrc.syn</tt> and +<tt>krc.syn</tt>. The +<tt>c_parser</tt> class is also a <tt>token_sink</tt>. 
 + +<P>The macro preprocessor uses several global variables based on the +token-based classes defined above: +<PRE> + extern token_sink *scanner_sink; + extern token_accumulator ta; + extern expression_evaluator condition; +</PRE> +<tt>scanner_sink</tt> is the generic output device for the token +scanner. As the +token scanner produces tokens, it sends them to the <tt>token_sink</tt> pointed +to by <tt>scanner_sink</tt>. + +<P><tt>condition</tt> is used to evaluate constant expressions in +<TT>#if</TT> and +<TT>#elif</TT> statements. The token scanner diverts its output +to the expression evaluator with the statement: +<PRE> + scanner_sink = &condition; +</PRE> +Until the <tt>scanner_sink</tt> is restored to its previous value, all +output from +the token scanner flows to the expression_evaluator, <tt>condition</tt>. + +<P><TT>ta</TT> is a token_accumulator, used in the token scanner and in +<tt>mas.syn</tt> to accumulate sequences of tokens. As with the +<tt>expression_evaluator</tt>, +output from the token scanner can be diverted to <TT>ta</TT> by means of +one simple statement: +<PRE> + scanner_sink = &ta; +</PRE> +This diversion simplifies the gathering of the tokens that make up the +body of a macro or an argument to a macro call. + +<P>In the main program, two local variables are defined in terms of these +token-based structures: +<PRE> + c_parser cp; + token_translator tt(&file); +</PRE> +Thus either <tt>cp</tt> or <tt>tt</tt> can serve as an output +destination for the token scanner. +The main program sets <tt>scanner_sink</tt> to point to one or the +other depending +on a command line switch. 
+</P> + +<BR> + +<IMG ALIGN="bottom" SRC="../../images/rbline6j.gif" ALT="----------------------" + WIDTH=1010 HEIGHT=2 > +<P> +<IMG ALIGN="right" SRC="../../images/pslrb6d.gif" ALT="Parsifal Software" + WIDTH=181 HEIGHT=25> +<BR CLEAR="right"> +<P> +Back to <A HREF="../../index.html">Index</A> +<P> +<ADDRESS> +<FONT SIZE=-1>AnaGram parser generator - examples</FONT> +<BR><FONT SIZE=-1>Macro preprocessor and C Parser</FONT> +<BR><FONT SIZE=-1>Copyright © 1993-1999, Parsifal Software.</FONT> +<BR><FONT SIZE=-1>All Rights Reserved.</FONT> +<BR> +</ADDRESS> +</BODY> +</HTML> +
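The sink redirection described under Token Classes (pointing scanner_sink at a different token_sink and later restoring it) can be sketched as follows. This is an illustrative reconstruction, not the code in token.h: the token_id values, the emit function standing in for the token scanner, and the sink_redirect guard are all assumptions introduced for the example. The guard in particular is an addition of this sketch; the text only says that the pointer is reassigned and later restored.

```cpp
#include <cassert>
#include <vector>

// Hypothetical subset of token kinds; the real enumeration is in token.h.
enum token_id { tk_name, tk_number, tk_plus };

// A token pairs a kind with a handle into the token dictionary.
struct token {
    token_id id;
    unsigned handle;
};

// Abstract output device for anything that produces a stream of tokens.
class token_sink {
public:
    virtual ~token_sink() = default;
    virtual token_sink& operator<<(const token& t) = 0;
};

// A sink that simply stores the tokens it receives.
class token_accumulator : public token_sink {
public:
    token_sink& operator<<(const token& t) override {
        tokens.push_back(t);
        return *this;
    }
    std::vector<token> tokens;
};

// Current destination of scanner output, as in the text.
token_sink* scanner_sink = nullptr;

// Stand-in for the token scanner: sends one token to the current sink.
void emit(token_id id, unsigned handle) {
    assert(scanner_sink != nullptr);
    *scanner_sink << token{id, handle};
}

// Assumed helper: diverts scanner output for the lifetime of the guard,
// then restores the previous destination.
class sink_redirect {
public:
    explicit sink_redirect(token_sink& target) : saved_(scanner_sink) {
        scanner_sink = &target;
    }
    ~sink_redirect() { scanner_sink = saved_; }
    sink_redirect(const sink_redirect&) = delete;
    sink_redirect& operator=(const sink_redirect&) = delete;
private:
    token_sink* saved_;
};

// Gathers the tokens of a (pretend) macro argument into a private accumulator.
std::vector<token> collect_argument() {
    token_accumulator arg;
    sink_redirect guard(arg);   // scanner output now flows into arg
    emit(tk_number, 2);
    emit(tk_plus, 3);
    return arg.tokens;          // guard restores scanner_sink on exit
}
```

Because the scanner writes only through the scanner_sink pointer, the code that produces tokens does not need to know whether they end up in an accumulator, an expression evaluator, a C parser, or a translator that writes to a file. Tying the restore step to a destructor is one way to make sure the previous destination comes back even on an early return.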