diff doc/misc/html/examples/mpp/index.html @ 0:13d2b8934445

Import AnaGram (near-)release tree into Mercurial.
author David A. Holland
date Sat, 22 Dec 2007 17:52:45 -0500
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/misc/html/examples/mpp/index.html	Sat Dec 22 17:52:45 2007 -0500
@@ -0,0 +1,601 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
+<HTML>
+<HEAD>
+   <TITLE> Macro preprocessor and C Parser </TITLE>
+</HEAD>
+
+
+<BODY BGCOLOR="#ffffff" BACKGROUND="tilbl6h.gif"
+ TEXT="#000000" LINK="#0033CC"
+ VLINK="#CC0033" ALINK="#CC0099">
+
+<P>
+<IMG ALIGN="right" SRC="../../images/agrsl6c.gif" ALT="AnaGram"
+         WIDTH=124 HEIGHT=30 >
+<BR CLEAR="all">
+Back to <A HREF="../../index.html">Index</A>
+<P>
+<IMG ALIGN="bottom" SRC="../../images/rbline6j.gif" ALT="----------------------"
+        WIDTH=1010 HEIGHT=2  >
+<P>
+
+<H1>Macro preprocessor and C Parser</H1>
+
+<IMG ALIGN="bottom" SRC="../../images/rbline6j.gif" ALT="----------------------"
+        WIDTH=1010 HEIGHT=2  >
+
+<H2>Introduction</H2>
+
+This document provides an overview of the entire macro preprocessor example.
+Since the example consists of a number of modules, there is also a separate
+document for each module. Each of these documents provides an overview
+of its module and detailed descriptions of the variables, data structures
+and syntactic elements associated with it.
+
+<P>This implementation of a C macro preprocessor demonstrates:
+<UL>
+<LI>
+the use of AnaGram in a real-world problem of considerable complexity.</LI>
+
+<LI>
+the use of AnaGram in a C++ environment.</LI>
+</UL>
+Only a fairly complex problem can adequately demonstrate
+the power of AnaGram. This example, therefore, may not be particularly
+easy to grasp in its entirety.
+
+<P>However, it is not necessary to understand all facets of this example
+to make good use of it. If you skim over it, you will see examples of many
+common syntactic constructs. You will find that in many cases you can copy
+these constructs verbatim and incorporate them directly into your own programs.
+
+<P>A number of AnaGram's features and options are well illustrated. This
+example makes use of four separate syntaxes to deal with the preprocessing
+so that the complete program, with one or another of the C parsers linked
+in, contains five separate parsers. There are, therefore, numerous examples
+of interfacing a parser to the rest of a program. In particular, several
+of the parsers are configured as C++ classes.
+
+<P>Other AnaGram features, such as semantically determined productions
+and context tracking, are used to good effect, particularly in the token
+scanner, which also illustrates the use of AnaGram to write lexical scanners.
+
+<P>In addition to the macro preprocessor, this example provides a choice
+of two C parsers which have been interfaced to the preprocessor. These
+parsers are simply syntax checkers. They have essentially no reduction
+procedures except for enough to give them rudimentary (and not fully
+correct) capabilities for
+coping with typedef types. You may, of course, add your own reduction procedures
+to adapt them to your needs.
+<P>
+Note that this macro preprocessor is not particularly standards
+compliant; if you feed it difficult or pedantic test cases, it will
+probably give you wrong output.
+<P>
+<BR>
+
+<H2>
+Components of the Macro Preprocessor</H2>
+<TABLE WIDTH="100%">
+
+<TR>
+<TD COLSPAN=4>
+The macro preprocessor example comprises the following modules:
+<BR><BR>
+</TD>
+</TR>
+
+<TR>
+<td rowspan=10 width="4%">&nbsp;</td>
+<TD><tt><A HREF=mpp.html>mpp.cpp</A></tt></TD>
+<td rowspan=10 width="4%">&nbsp;</td>
+
+<TD>Data declarations and main program</TD>
+</TR>
+
+<TR>
+<TD><tt><A HREF=mpp.html>mpp.h</A></tt></TD>
+
+<TD>Structure definitions, data and function declarations</TD>
+</TR>
+
+<TR>
+<TD><tt><A HREF=token.html>token.cpp</A></tt></TD>
+
+<TD>Token class function definitions</TD>
+</TR>
+
+<TR>
+<TD><tt><A HREF=token.html>token.h</A></tt></TD>
+
+<TD>Token class definitions</TD>
+</TR>
+
+<TR>
+<TD><tt><A HREF=ts.html>ts.syn</A></tt></TD>
+
+<TD>Token scanner</TD>
+</TR>
+
+<TR>
+<TD><tt><A HREF=mas.html>mas.syn</A></tt></TD>
+
+<TD>Macro and argument substitution module</TD>
+</TR>
+
+<TR>
+<TD><tt><A HREF=ex.html>ex.syn</A></tt></TD>
+
+<TD>Constant expression evaluator</TD>
+</TR>
+
+<TR>
+<TD><tt><A HREF=ct.html>ct.syn</A></tt></TD>
+
+<TD>Token classifier</TD>
+</TR>
+
+<TR>
+<TD><tt><A HREF=parsers.html>jrc.syn</A></tt></TD>
+
+<TD>C parser, based on C grammar by James A. Roskind</TD>
+</TR>
+
+<TR>
+<TD><tt><A HREF=parsers.html>krc.syn</A></tt></TD>
+
+<TD>C parser, based on C grammar in K &amp; R, section A13</TD>
+</TR>
+
+<!--
+<P>
+Here are links to the corresponding
+document files:
+<CENTER><A HREF="mpp.html">MPP&nbsp;</A> | <A HREF="token.html">TOKEN</A>
+| <A HREF="ts.html">TS</A> | <A HREF="mas.html">MAS</A> | <A HREF="ex.html">EX</A>
+| <A HREF="ct.html">CT</A> | <A HREF="parsers.html">PARSERS</A></CENTER>
+-->
+
+<TR>
+<TD COLSPAN=4>
+<BR>
+In addition, the following modules found in the <tt>oldclasslib</tt>
+directory provide supporting functions:
+<BR><BR>
+</TD>
+</TR>
+
+<TR>
+<td rowspan=6 width="4%">&nbsp;</td>
+<TD><tt><A HREF=../../oldclasslib/charsink.html>charsink.cpp</A></tt></TD>
+<td rowspan=6 width="4%">&nbsp;</td>
+
+<TD>Character sink support</TD>
+</TR>
+
+<TR>
+<TD><tt><A HREF=../../oldclasslib/charsink.html>charsink.h</A></tt></TD>
+
+<TD>Character sink class definitions</TD>
+</TR>
+
+<TR>
+<TD><tt><A HREF=../../oldclasslib/strdict.html>strdict.cpp</A></tt></TD>
+
+<TD>String dictionary support</TD>
+</TR>
+
+<TR>
+<TD><tt><A HREF=../../oldclasslib/strdict.html>strdict.h</A></tt></TD>
+
+<TD>String dictionary class definition</TD>
+</TR>
+
+<TR>
+<TD><tt><A HREF=../../oldclasslib/array.html>array.h</A></tt></TD>
+
+<TD>Array class definition</TD>
+</TR>
+
+<TR>
+<TD><tt><A HREF=../../oldclasslib/stack.html>stack.h</A></tt></TD>
+
+<TD>Stack class definition</TD>
+</TR>
+
+</TABLE>
+
+<!--
+Here are links to the corresponding
+document files:
+<CENTER><A HREF="../../oldclasslib/charsink.html">CHARSINK</A> | <A HREF="../../oldclasslib/array.html">ARRAY</A>
+| <A HREF="../../oldclasslib/stack.html">STACK</A> | <A HREF="../../oldclasslib/strdict.html">STRDICT</A></CENTER>
+-->
+
+<P>
+<BR>
+<H2>
+Data Flow in the Macro Preprocessor</H2>
+Of the four parsers that make up the macro preprocessor itself, three are
+simply operators which transform their input:
+<UL>
+<LI>
+MAS transforms a token string (e.g., the body of a macro) into another
+token string (e.g., the expansion of the macro). MAS is called only from
+TS, and, recursively, from itself.</LI>
+
+<LI>
+EX transforms a token string (e.g., the text of a conditional expression)
+into a long integer (e.g., the value of the expression). EX is called only
+from TS.</LI>
+
+<LI>
+CT transforms a character string (ostensibly a C token) into a type identification
+code (e.g., STRINGliteral, identifier, etc.). CT is called only from MAS.</LI>
+</UL>
+The fourth is the token scanner, TS, which controls the entire process.
+The relationships are illustrated in the diagrams below, which show the
+type and direction of data flow among the modules.
+</P>
+<BR>
+<H3>
+Relationship between Token Scanner, Macro/Argument Scanner and Token Classifier
+modules:</H3>
+
+<CENTER><IMG SRC="reltmt24.gif" ALT="TS, MAS, and CT diagram" ></CENTER>
+<P>
+<BR>
+
+
+<H3>
+Relationship between Token Scanner and Expression Evaluator:</H3>
+
+<CENTER><IMG SRC="relte24.gif" ALT="TS and expression evaluator diagram" ></CENTER>
+<P>
+<BR>
+
+<H3>
+Relationship between Token Scanner, token translator and output file:</H3>
+
+<CENTER><IMG SRC="reltto24.gif" ALT="TS, translator, and output diagram" ></CENTER>
+<P>
+<BR>
+<H3>
+Relationship between Token Scanner and C Parser:</H3>
+
+<CENTER><IMG SRC="reltc24.gif" ALT="TS and C parser diagram" ></CENTER>
+<P>
+
+<BR>
+<H2>
+Building and Running the Macro Preprocessor</H2>
+To make a working version of the macro preprocessor you need to take the
+following steps:
+<OL>
+<LI>
+Run AnaGram and build parsers for TS, MAS, CT, and EX.</LI>
+
+<LI>
+Choose which C grammar you would like to use (JRC or KRC), run AnaGram, and
+build a parser for your choice.</LI>
+
+<LI>
+If you are using JRC, edit the <tt>#include</tt> near the top of
+<tt>mpp.h</tt> to load <tt>jrc.h</tt> instead of <tt>krc.h</tt>.</LI>
+
+<LI>
+Make sure your compiler can find include files from
+<tt>oldclasslib/include</tt>.</LI>
+
+<LI>
+Then, compile and link the following modules:
+
+<BR><tt>mpp.cpp</tt>
+<BR><tt>token.cpp</tt>
+<BR><tt>ts.cpp</tt>
+<BR><tt>mas.cpp</tt>
+<BR><tt>ct.cpp</tt>
+<BR><tt>ex.cpp</tt>
+<BR><tt>krc.cpp</tt> or <tt>jrc.cpp</tt>
+<BR><tt>oldclasslib/source/charsink.cpp</tt>
+<BR><tt>oldclasslib/source/strdict.cpp</tt></LI>
+</OL>
+Now you can run the macro preprocessor.
+
+<P>The command line syntax is as follows:
+<PRE>
+    mpp [-c] [-n] &lt;input file name&gt; [&lt;output file name&gt;]
+</PRE>
+The -c switch causes output of the preprocessor to be directed to the C
+parser you have included, rather than to an output file.
+
+<P>The -n switch allows the recognition of nested comments.
+
+<P>If you do not set the -c switch and do not specify an output file name,
+output will be directed to stdout.
+
+<P>
+<BR>
+<H2>
+Theory of Operation</H2>
+This implementation of a macro preprocessor is based on the description
+of preprocessing given in Section A12, Appendix A, of "The C Programming
+Language", Second Edition, by Kernighan and Ritchie, Prentice-Hall, 1988.
+
+<P>The preprocessor itself comprises four modules: a token scanner,
+<tt>ts.syn</tt>;
+a macro/argument substitution module, <tt>mas.syn</tt>; a token
+classifier, <tt>ct.syn</tt>;
+and an expression evaluator, <tt>ex.syn</tt>. These modules, working
+together, deal
+with conditional compilation, include files, macro definition, and macro
+expansion. The output of the preprocessor may be directed to stdout, to
+a file, or to either of two C parsers, depending on which you choose to
+link into your version of the program.
+
+<P>Two of the modules, <tt>ts.syn</tt> and <tt>mas.syn</tt>, do most of
+the work. <tt>ts.syn</tt> breaks
+the input into a sequence of "tokens" as defined by section A2.1 in Kernighan
+and Ritchie. It also determines the syntactic type of each such token.
+Descriptors, consisting of a type identifier and a storage handle, are
+then used as the units for further processing. <tt>ts.syn</tt> also handles the
+conditional compilation logic and fields macro definitions. When it encounters
+a macro call, it enlists <tt>mas.syn</tt> to expand the macro.
+
+<P><tt>ex.syn</tt> exists only to evaluate the conditional expressions
+in <TT>#if</TT> and <TT>#elif </TT>control statements. <tt>ct.syn</tt>
+is used only when a 
+new token has been created during macro expansion. The "<TT>##</TT>" operator
+requires that two tokens be pasted together to make a single token. 
+<tt>ct.syn</tt>
+is then used to determine what manner of beast has been created.
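+<P>To make the two jobs above concrete, here is a minimal sketch that
+exercises an ordinary C++ compiler's own preprocessor, not mpp itself.
+The names <tt>LEVEL</tt>, <tt>QUIET</tt>, <tt>PASTE</tt>, <tt>counter</tt>
+and the functions are invented for this illustration. The <TT>#if</TT> line
+holds the kind of constant expression an expression evaluator must reduce to
+an integer, and the two uses of <TT>##</TT> show why a pasted token has to be
+classified afresh: one result is an identifier, the other an integer constant.

```cpp
#include <cassert>
#include <string>

// A constant expression of the kind found in #if and #elif lines.
#define LEVEL 3
#if LEVEL * 2 > 5 && !defined(QUIET)
const char *mode() { return "verbose"; }
#else
const char *mode() { return "quiet"; }
#endif

// PASTE joins its two arguments into a single new token.
#define PASTE(a, b) a##b

int counter = 7;

// count ## er forms the identifier `counter`.
int read_counter() { return PASTE(count, er); }

// 12 ## 34 forms the integer constant 1234.
int pasted_number() { return PASTE(12, 34); }

// Checks that hold for a standard-conforming preprocessor.
void check_preprocessing() {
  assert(std::string(mode()) == "verbose");
  assert(read_counter() == 7);
  assert(pasted_number() == 1234);
}
```

+<P>Whether mpp produces the same results for these inputs is not claimed
+here; the sketch only shows the standard behavior the modules are modeled on.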
+
+<P>
+<BR>
+<H2>
+Supporting Class Libraries</H2>
+The macro preprocessor uses a number of simple data structures implemented
+as C++ classes to record and analyze the data generated by the parsers.
+Some of these structures are of general utility and are found in
+the <A HREF="../../oldclasslib/index.html">oldclasslib</A> directory.
+The others are specific to the preprocessor and are to be found in the
+files <tt>token.h</tt> and <tt>token.cpp</tt> with the rest of the
+preprocessor files.
+
+<P>
+<BR>
+<H2>
+General Purpose Data Structures</H2>
+The general purpose data structures are the following:
+<UL>
+<LI><tt>character_sink</tt></LI>
+<LI><tt>string_accumulator</tt></LI>
+<LI><tt>output_file</tt></LI>
+<LI><tt>array&lt;class T&gt;</tt></LI>
+<LI><tt>stack&lt;class T&gt;</tt></LI>
+<LI><tt>string_dictionary</tt></LI>
+</UL>
+<tt>character_sink</tt> is an abstract class. It simply represents a
+general-purpose
+character output device that can be plugged into any character generator
+to accept its output.
+
+<P>A <tt>string_accumulator</tt> is a species of
+<tt>character_sink</tt>, which can store
+up characters as they arrive. It has multiple levels, so it can be used
+in recursive contexts without any confusion.
+
+<P>An <tt>output_file</tt> is another species of
+<tt>character_sink</tt>. It is a
+very simple implementation of stream output, set up so that it can be used
+interchangeably with other kinds of <tt>character_sink</tt>.
+
+<P><tt>array</tt> is a template class that simplifies the allocation
+and freeing of local storage for arrays of arbitrary type.
+
+<P>A <tt>stack</tt> is a template class that provides for
+multi-leveled push-down stacks of arbitrary types of data.
+
+<P>A <tt>string_dictionary</tt> is a device for associating a unique
+integer handle
+with a string so that the integer handle may be used as an alias for the
+string.
+
+<P>All of these classes use operator overloading in a consistent manner:
+
+<P><TT>&lt;&lt; </TT>is used to add data to an entity, for example, to
+push data onto a stack, to add a string to a string dictionary, to add
+data to a string accumulator, to send data to an output file, or to transmit
+data to a parser. In all cases, <TT>&lt;&lt; </TT>may be chained:
+<PRE>        ta &lt;&lt; s1 &lt;&lt; s2;</PRE>
+<TT>&gt;&gt; </TT>is used to remove data from an entity, in particular, to pop
+something from a stack, or to remove a character from a string accumulator.
+Like <TT>&lt;&lt;</TT>, <TT>&gt;&gt;</TT> may be chained:
+<PRE>        ta &gt;&gt; s1 &gt;&gt; s2;</PRE>
+<TT>++ </TT>is used with string accumulators and with stacks to increment
+the level number. It is defined only as a pre-increment operator.
+
+<P><TT>-- </TT>is used with string accumulators and with stacks to decrement
+the level number. It is defined only as a pre-decrement operator.
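+<P>The following is a minimal sketch of these four operators on a leveled
+stack. It is not the <tt>oldclasslib</tt> code: the class name
+<tt>sketch_stack</tt>, its members, and in particular the assumption that
+<TT>++</TT> opens a fresh level and <TT>--</TT> discards whatever is left on
+the current level are all illustrative guesses about one reasonable design.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical leveled stack; a stand-in, not the oldclasslib stack class.
template <class T>
class sketch_stack {
  std::vector<T> items_;
  std::vector<std::size_t> level_base_;  // index where each open level starts
public:
  sketch_stack() : level_base_(1, 0) {}

  // << pushes an item and may be chained.
  sketch_stack &operator<<(const T &x) {
    items_.push_back(x);
    return *this;
  }

  // >> pops the top item of the current level and may be chained.
  sketch_stack &operator>>(T &x) {
    assert(items_.size() > level_base_.back());  // current level not empty
    x = items_.back();
    items_.pop_back();
    return *this;
  }

  // Pre-increment opens a new, empty level.
  sketch_stack &operator++() {
    level_base_.push_back(items_.size());
    return *this;
  }

  // Pre-decrement discards the current level and returns to the previous one.
  sketch_stack &operator--() {
    assert(level_base_.size() > 1);  // never close the outermost level
    while (items_.size() > level_base_.back()) items_.pop_back();
    level_base_.pop_back();
    return *this;
  }

  // Number of items on the current level.
  std::size_t count() const { return items_.size() - level_base_.back(); }
};

// Free function in the style of size(object).
template <class T>
std::size_t size(const sketch_stack<T> &s) { return s.count(); }
```

+<P>Under these assumptions a recursive routine can bracket its work with
+<TT>++</TT> and <TT>--</TT> and never disturb items pushed by its caller.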
+
+<P><TT>[] </TT>is used to access a particular item. In the case of the
+string dictionary, <TT>[] </TT>with a string argument returns the handle,
+or zero, if the string is not in the dictionary. <TT>[] </TT>with a handle
+returns a pointer to the string. In the case of the "array" class, <TT>[]
+</TT>provides access to a single element and checks for out of bounds references.
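+<P>The two forms of <TT>[]</TT> on a string dictionary can be sketched as
+follows. This is a hedged illustration only: <tt>sketch_dictionary</tt> and
+its members are invented names, and the use of standard containers is an
+assumption made for brevity, not a description of <tt>strdict.cpp</tt>.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <string>

// Hypothetical stand-in for string_dictionary.
class sketch_dictionary {
  std::deque<std::string> strings_;          // index = handle; slot 0 unused
  std::map<std::string, unsigned> handles_;  // string -> handle
public:
  sketch_dictionary() : strings_(1) {}

  // << adds a string if it is not already present, and may be chained.
  sketch_dictionary &operator<<(const char *s) {
    if (handles_.find(s) == handles_.end()) {
      handles_[s] = static_cast<unsigned>(strings_.size());
      strings_.push_back(s);
    }
    return *this;
  }

  // [] with a string returns its handle, or zero if it is not present.
  unsigned operator[](const char *s) const {
    std::map<std::string, unsigned>::const_iterator it = handles_.find(s);
    return it == handles_.end() ? 0 : it->second;
  }

  // [] with a handle returns a pointer to the stored string.
  const char *operator[](unsigned handle) const {
    assert(handle != 0 && handle < strings_.size());
    return strings_[handle].c_str();
  }
};
```

+<P>Because handle zero is reserved to mean "not found", a caller can test
+the result of a lookup directly before using it as an alias for the string.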
+
+<P>Cast operators are also overloaded to provide simple access to the data
+stored in an instance of a class.
+
+<P>Several overloaded functions are defined consistently where they are
+defined at all:
+<TABLE WIDTH="100%">
+
+<TR>
+<TD ROWSPAN=3 WIDTH="4%">&nbsp;</TD>
+<TD><tt>reset(</tt><i>object</i><tt>)</tt></TD>
+<TD>restores initial state&nbsp;</TD>
+</TR>
+
+<TR>
+<TD><tt>size(</tt><i>object</i><tt>)</tt></TD>
+<TD>returns size&nbsp;</TD>
+</TR>
+
+<TR>
+<TD><tt>error(</tt><i>object</i><tt>)</tt>&nbsp;</TD>
+<TD>returns error flag&nbsp;</TD>
+</TR>
+
+</TABLE>
+The macro preprocessor uses instances of the above classes for global data
+storage and manipulation:
+<PRE>
+    extern stack&lt;char *&gt;     paths;
+    extern string_accumulator   sa;
+    extern string_dictionary    td;
+</PRE>
+<TT>paths </TT>holds the list of search paths used to find include
+files whose names are enclosed in angle brackets.
+
+<P><TT>sa </TT>is used in the token scanner, to accumulate the strings
+that constitute C tokens. Once complete, each string is added to the string_dictionary
+<TT>td </TT>to get a handle which identifies the string uniquely. <TT>td
+</TT>is generally referred to as the "token dictionary".
+
+<P>In the main program, an output file is defined in terms of these classes:
+<PRE>   output_file file;</PRE>
+
+<P>
+<BR>
+<H2>
+Token Classes</H2>
+A number of class and structure definitions specific to the macro preprocessor
+are given in <tt>token.h</tt>. Member functions are defined in
+<tt>token.cpp</tt>.
+
+<P>The definitions in <tt>token.h</tt> are geared toward the
+transmission and sharing
+of data among the modules that make up the macro preprocessor. An enumeration
+statement defines enumeration constants for all the different kinds of
+terminal tokens a C parser can expect to see. These enumeration constants
+are defined to be of type <tt>token_id</tt>.
+
+<P>A structure definition defines a token as a pair: a
+<tt>token_id</tt>
+and an unsigned integer. The integer is the token dictionary handle
+for the string of characters that constitutes the actual token as defined
+in K&amp;R.
+
+<P>Then, to facilitate working with these tokens, a set of classes is
+defined using the <tt>character_sink</tt> class and its derived
+classes <!-- more or less --> as a model:
+
+<UL>
+<LI><tt>token_sink</tt></LI>
+<LI><tt>token_accumulator</tt></LI>
+<LI><tt>token_translator</tt></LI>
+<LI><tt>expression_evaluator</tt></LI>
+<LI><tt>c_parser</tt></LI>
+</UL>
+
+Like <tt>character_sink</tt>, <tt>token_sink</tt> is an abstract class
+that serves
+as a general purpose output device for processes which create a stream
+of tokens.
+
+<P>A <tt>token_accumulator</tt> is a species of <tt>token_sink</tt>.
+It is a repository for
+sequences of tokens. It has multiple levels, like a
+<tt>string_accumulator</tt>,
+so it can be used safely in recursive procedures.
+
+<P>A <tt>token_translator</tt> is a species of <tt>token_sink</tt>
+which converts a stream
+of tokens to a stream of characters. The constructor for a
+<tt>token_translator</tt>
+takes a pointer to a <tt>character_sink</tt>, so that tokens handed to
+a <tt>token_translator</tt>
+are converted to strings and passed on to the specified character sink.
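+<P>A minimal sketch of this arrangement follows. All names are hypothetical,
+tokens are reduced to bare handles, and the spelling table passed to the
+constructor is an assumption standing in for the token dictionary; the real
+classes in <tt>token.h</tt> may differ in every detail.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for character_sink: anything that accepts characters.
class sketch_char_sink {
public:
  virtual ~sketch_char_sink() {}
  virtual sketch_char_sink &operator<<(char c) = 0;
};

// A character sink that stores what it receives.
class sketch_string_sink : public sketch_char_sink {
public:
  std::string text;
  sketch_char_sink &operator<<(char c) {
    text += c;
    return *this;
  }
};

// Converts token handles to their spellings and forwards the characters.
class sketch_translator {
  sketch_char_sink *out_;
  const std::vector<std::string> *spellings_;  // index = handle
public:
  sketch_translator(sketch_char_sink *out,
                    const std::vector<std::string> *spellings)
      : out_(out), spellings_(spellings) {
    assert(out_ != 0 && spellings_ != 0);
  }

  // << looks up one token's spelling and sends it on, character by character.
  sketch_translator &operator<<(unsigned handle) {
    const std::string &s = spellings_->at(handle);
    for (std::string::size_type i = 0; i < s.size(); ++i) *out_ << s[i];
    return *this;
  }
};
```

+<P>Since the translator sees only the abstract sink, the same code could
+write to a file, to an in-memory string, or to any other character consumer.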
+
+<P>The <tt>expression_evaluator</tt> class is a wrapper around
+the expression evaluation module, <tt>ex.syn</tt>. It is a species of
+<tt>token_sink</tt>,
+so that tokens may be passed to the <tt>expression_evaluator</tt> just
+as they are to a <tt>token_accumulator</tt> or a <tt>token_translator</tt>.
+
+<P>The <tt>c_parser</tt> class is a wrapper around a C
+parser module.
+Implementations of this class are found in both <tt>jrc.syn</tt> and
+<tt>krc.syn</tt>. The
+<tt>c_parser</tt> class is also a <tt>token_sink</tt>.
+
+<P>The macro preprocessor uses several global variables whose types are the
+token classes defined above:
+<PRE>
+    extern token_sink            *scanner_sink;
+    extern token_accumulator     ta;
+    extern expression_evaluator  condition;
+</PRE>
+<tt>scanner_sink</tt> is the generic output device for the token
+scanner. As the
+token scanner produces tokens, it sends them to the <tt>token_sink</tt> pointed
+to by <tt>scanner_sink</tt>.
+
+<P><tt>condition</tt> is used to evaluate constant expressions in
+<TT>#if</TT> and
+<TT>#elif</TT> statements. The token scanner diverts its output
+to the expression evaluator with the statement:
+<PRE>
+      scanner_sink = &amp;condition;
+</PRE>
+Until the <tt>scanner_sink</tt> is restored to its previous value, all
+output from
+the token scanner flows to the expression_evaluator, <tt>condition</tt>.
+
+<P><TT>ta</TT> is a token_accumulator, used in the token scanner and in
+<tt>mas.syn</tt> to accumulate sequences of tokens. As with the
+<tt>expression_evaluator</tt>,
+output from the token scanner can be diverted to <TT>ta</TT> by means of
+one simple statement:
+<PRE>
+    scanner_sink = &amp;ta;
+</PRE>
+This diversion simplifies the gathering of the tokens which comprise the
+body of a macro or an argument to a macro call.
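+<P>The diversion technique can be sketched in a few lines. Everything named
+<tt>sketch_*</tt>, along with <tt>emit</tt>, <tt>divert</tt> and
+<tt>scan_macro_body</tt>, is hypothetical and heavily simplified; the point
+is only that a single pointer assignment reroutes the whole token stream and
+that restoring the saved pointer resumes normal output.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified token; the real definition lives in token.h.
struct sketch_token {
  int id;           // stands in for token_id
  unsigned handle;  // token dictionary handle
};

// Abstract destination for a stream of tokens.
class sketch_token_sink {
public:
  virtual ~sketch_token_sink() {}
  virtual sketch_token_sink &operator<<(sketch_token t) = 0;
};

// A sink that stores the tokens it receives.
class sketch_accumulator : public sketch_token_sink {
public:
  std::vector<sketch_token> tokens;
  sketch_token_sink &operator<<(sketch_token t) {
    tokens.push_back(t);
    return *this;
  }
};

// Where the scanner currently sends its output.
sketch_token_sink *sketch_scanner_sink = 0;

// Stands in for the scanner producing one token.
void emit(int id, unsigned handle) {
  assert(sketch_scanner_sink != 0);
  sketch_token t = {id, handle};
  *sketch_scanner_sink << t;
}

// Runs `produce` with scanner output diverted to `temporary`,
// then restores the previous destination.
void divert(sketch_token_sink &temporary, void (*produce)()) {
  sketch_token_sink *saved = sketch_scanner_sink;
  sketch_scanner_sink = &temporary;
  produce();
  sketch_scanner_sink = saved;
}

// Stands in for scanning the body of a macro definition.
void scan_macro_body() {
  emit(2, 20);
  emit(3, 30);
}
```

+<P>In this sketch the producer never learns where its tokens go, which is
+what lets one scanner feed an accumulator, an evaluator or a parser in turn.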
+
+<P>In the main program, two local variables are defined in terms of these
+token-based classes:
+<PRE>
+    c_parser            cp;
+    token_translator    tt(&amp;file);
+</PRE>
+Thus either <tt>cp</tt> or <tt>tt</tt> can serve as an output
+destination for the token scanner.
+The main program sets <tt>scanner_sink</tt> to point to one or the
+other depending
+on a command line switch.
+</P>
+
+<BR>
+
+<IMG ALIGN="bottom" SRC="../../images/rbline6j.gif" ALT="----------------------"
+      WIDTH=1010 HEIGHT=2 >
+<P>
+<IMG ALIGN="right" SRC="../../images/pslrb6d.gif" ALT="Parsifal Software"
+                WIDTH=181 HEIGHT=25>
+<BR CLEAR="right">
+<P>
+Back to <A HREF="../../index.html">Index</A>
+<P>
+<ADDRESS>
+<FONT SIZE=-1>AnaGram parser generator - examples</FONT>
+<BR><FONT SIZE=-1>Macro preprocessor and C Parser</FONT>
+<BR><FONT SIZE=-1>Copyright &copy; 1993-1999, Parsifal Software.</FONT>
+<BR><FONT SIZE=-1>All Rights Reserved.</FONT>
+<BR>
+</ADDRESS>
+</BODY>
+</HTML>
+