diff doc/misc/html/examples/mpp/mas.html @ 0:13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
author | David A. Holland |
---|---|
date | Sat, 22 Dec 2007 17:52:45 -0500 |
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<TITLE>Macro/Argument Substitution Module - Macro preprocessor and C Parser </TITLE>
</HEAD>


<BODY BGCOLOR="#ffffff" BACKGROUND="tilbl6h.gif"
      TEXT="000000" LINK="0033CC"
      VLINK="CC0033" ALINK="CC0099">

<P>
<IMG ALIGN="right" SRC="../../images/agrsl6c.gif" ALT="AnaGram"
      WIDTH=124 HEIGHT=30 >
<BR CLEAR="all">
Back to :
<A HREF="../../index.html">Index</A> |
<A HREF="index.html">Macro preprocessor overview</A>
<P>

<IMG ALIGN="bottom" SRC="../../images/rbline6j.gif" ALT="----------------------"
      WIDTH=1010 HEIGHT=2 >
<P>
<H1> Macro/Argument Substitution Module - Macro preprocessor and C Parser </H1>

<IMG ALIGN="bottom" SRC="../../images/rbline6j.gif" ALT="----------------------"
      WIDTH=1010 HEIGHT=2 >
<P>

<H2>Introduction</H2>
<P>

    The Macro/Argument Substitution module, MAS.SYN,
    accomplishes the following tasks:
<OL>
<LI> It can scan the body of a macro to identify parameters
     and macro calls and substitute for them. </LI>
<LI> It can scan the set of arguments for a macro call
     embedded within another macro, substituting arguments of
     the outer macro only for parameters found within the
     arguments of the inner call. </LI>
<LI> It can scan the argument to a macro for macro calls
     before substituting the argument for a parameter. </LI>
<LI> It can recognize the "##" operator and paste two tokens
     together. </LI>
<LI> It can recognize the "#" operator and turn a macro
     argument into a string. </LI>
</OL>
    The macro/argument substitution parser, mas(), is called
    from a shell (wrapper) function, expand_text(), which is in
    turn called by expand_macro() and expand_arg(). Output from
    mas() is accumulated on ta, the token_accumulator.
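<P>
    The effect of the "#" and "##" operators listed above can be
    illustrated outside the grammar. The following C++ fragment is
    a minimal sketch, not the code in MAS.SYN: the Token type, the
    Kind values, and the functions stringize() and applyPaste() are
    hypothetical. stringize() shows why white space inside an
    argument must survive until "#" is applied, and applyPaste()
    shows "##" discarding intervening white space and grouping
    A ## B ## C to the left. Unlike a real preprocessor, it does not
    re-scan a pasted spelling to confirm that it forms one valid token.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical token representation for this sketch only; the token type
// actually used by mas() (whose id field INPUT_CODE extracts) is not shown.
enum class Kind { Name, Number, Punct, StringLit, CharLit, Space };

struct Token {
    Kind kind;
    std::string text;
};

// Sketch of the "#" operator: turn the tokens of one macro argument into a
// string literal. Leading and trailing white space is dropped and each
// interior run of white space becomes a single blank, so the white space
// must still be present when "#" is applied.
std::string stringize(const std::vector<Token>& arg) {
    std::string out = "\"";
    bool pendingSpace = false;  // interior white space seen since last token
    bool emittedAny = false;    // true once a non-space token is written
    for (const Token& t : arg) {
        if (t.kind == Kind::Space) {
            pendingSpace = emittedAny;  // leading white space is ignored
            continue;
        }
        if (pendingSpace) out += ' ';
        pendingSpace = false;
        emittedAny = true;
        if (t.kind == Kind::StringLit || t.kind == Kind::CharLit) {
            // Escape quotes and backslashes that appear inside literals.
            for (char c : t.text) {
                if (c == '"' || c == '\\') out += '\\';
                out += c;
            }
        } else {
            out += t.text;
        }
    }
    out += '"';  // trailing white space was never written
    return out;
}

// Sketch of the "##" operator over a token sequence: white space on either
// side of "##" is dropped and the neighbouring tokens are joined, so chains
// such as A ## B ## C group to the left. The joined token keeps the left
// operand's kind; a "##" with no left operand is not diagnosed here.
std::vector<Token> applyPaste(const std::vector<Token>& in) {
    std::vector<Token> out;
    bool pasting = false;  // true after "##" until its right operand arrives
    for (const Token& t : in) {
        if (t.kind == Kind::Space) {
            if (!pasting) out.push_back(t);  // keep ordinary white space
            continue;
        }
        if (t.kind == Kind::Punct && t.text == "##") {
            // Discard white space between the left operand and "##".
            while (!out.empty() && out.back().kind == Kind::Space) {
                out.pop_back();
            }
            pasting = true;
            continue;
        }
        if (pasting && !out.empty()) {
            out.back().text += t.text;  // join with the left operand
        } else {
            out.push_back(t);
        }
        pasting = false;
    }
    return out;
}
```

With these definitions, the body tokens x ## 1 ## y z paste to the
three tokens x1y, a space, and z.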
<P>
<BR>

<H2> Theory of Operation </H2>

    The primary purpose of mas() is to scan sequences of tokens
    for macro calls and parameter names and to do the indicated
    substitutions. At the same time it must correctly handle the
    "##" and "#" operators, both of which inhibit macro expansion
    of their operands. Thus the entire grammar is structured
    around the requirements of these two operators.
<P>
    A further complication is the handling of white space.
    White space within a macro argument cannot be deleted, since
    otherwise the "#" operator would not produce a correct
    result. Thus, in numerous circumstances in this grammar, it
    is not clear what to do with white space at the time it is
    encountered. For this reason, each sequence of white space
    tokens is saved on a temporary stack, space_stack, from
    which it can later be output or simply discarded.
<P>
    Like TS.SYN, MAS.SYN must have special syntax for
    accumulating the arguments of macro calls. The differences
    between the two grammars arise from the fact that TS.SYN
    converts an ASCII representation to a token representation,
    while MAS.SYN already has token input.
<P>
<BR>

<H2> Elements of the Macro/Argument Substitution Module </H2>


    The remainder of this document describes the macro
    definitions, the static data definitions, the configuration
    parameter settings, and the nonterminal parsing tokens used
    in the macro/argument substitution module, and explains the
    purpose of each configuration parameter setting in the
    syntax file. In MAS.SYN, each function that is defined is
    preceded by a short explanation of its purpose.
<P>
<BR>

<H2> Macro Definitions </H2>

<DL>
<DT> INPUT_CODE
  <DD> Since this grammar uses "input values" and "pointer input",
      the parser needs to know how to extract the identification
      code for an input token from the item identified by the
      pointer.
      The parser expects this macro to be appropriately
      defined. Here it is defined to extract the id field
      of the token.

<DT> PCB
  <DD> Since the "declare pcb" switch has been turned off, PCB has
      to be defined manually.

<DT> SYNTAX_ERROR
  <DD> This definition of SYNTAX_ERROR overrides the default
      definition provided by AnaGram.
</DL>
<P>
<BR>

<H2> Static variables </H2>

<DL>
<DT> active_macros
  <DD> Type: stack&lt;unsigned&gt;
<P>
      This is a multilevel stack used to keep track of which
      macros have been invoked in any particular expansion. If,
      after an expansion pass, it is determined that the result of
      a concatenation is a macro name, all the macros that have
      been expanded so far are marked busy and the text is scanned
      again. Once no further scans are needed, the busy flags are
      turned off. The stack is multilevel so that it can nest
      easily for recursive usage.

<DT> args
  <DD> Type: token **
<P>
      "args" is an array of pointers to token strings. It is the
      set of argument strings for the macro currently being
      expanded.

<DT> args_only
  <DD> Type: int
<P>
      "args_only" is a switch that tells the macro/argument
      substitution logic to make only argument substitutions and
      not to expand macros. It is interrogated in id_macro; if it
      is set, NAME tokens are not checked to see whether they
      identify macros.

<DT> mas_pcb
  <DD> Type: mas_pcb_type *
<P>
      This variable contains a pointer to the currently active
      parser control block for mas(). It is saved, set and
      restored in expand_text().

<DT> n_concats
  <DD> Type: int
<P>
      This variable counts the concatenation operations that
      result in a macro name in the course of a single scan for
      macros. If it is non-zero, the text is rescanned.

<DT> n_args
  <DD> Type: int
<P>
      This variable specifies the number of arguments for the
      macro currently being expanded.
<DT> params
  <DD> Type: unsigned *
<P>
      This variable is a pointer to a list of n_args unsigned
      integers. The integers are the indices in the token
      dictionary of the parameter names for the macro currently
      being expanded.

<DT> space_stack
  <DD> Type: token_accumulator
<P>
      In a number of places in this grammar, it is necessary to
      pass over white space tokens without knowing whether they
      are to be output or disregarded. "space_stack" provides a
      place to store them temporarily until the decision can be
      made. Remember that within macro arguments spaces can be
      significant, and therefore must not be discarded
      prematurely.
</DL>
<P>
<BR>

<H2> Configuration Parameters </H2>
<DL>

<DT> ~allow macros
  <DD> This statement turns off the allow macros switch so that
      AnaGram implements all reduction procedures as explicit
      function definitions. This simplifies debugging at the cost
      of a slight performance degradation.

<DT> ~backtrack
  <DD> This statement turns off the backtrack switch. This means
      that if the token scanner encounters a syntax error, it will
      not undo default reductions that may have been caused by the
      bad input before it generates diagnostics.

<DT> context type = location
  <DD> This statement specifies that the generated parser is to
      track context automatically. The context variables have type
      "location". The type location is defined elsewhere to consist
      of two fields: line number and column number.

<DT> ~declare pcb
  <DD> This statement tells AnaGram not to declare a parser control
      block for the parser. Access to the parser control block is
      through a pointer. Actual allocation of storage and setting
      of the pointer take place in expand_text().

<DT> default input type = token
  <DD> This statement tells AnaGram how to code reduction procedure
      calls that involve input tokens.

<DT> enum
  <DD> This enumeration statement provides definitions for terminal
      tokens.
      The same enum statement is found in EX.SYN, in
      JRC.SYN, in KRC.SYN and in MPP.H.

<DT> ~error frame
  <DD> This turns off the error frame portion of the automatic
      syntax error diagnostic generator, since the context of the
      error in the macro substitution syntax is of little interest.
      If an error frame were to be used in diagnostics, that of the
      C parser would be more appropriate.

<DT> error trace
  <DD> This turns on the error trace function, so that if the token
      scanner encounters a syntax error it will write an .etr
      file.

<DT> input values
  <DD> This switch tells AnaGram that the input units carry some
      baggage: they have values apart from their identifying
      code. Since this grammar uses pointer input, an INPUT_CODE
      macro must also be defined.

<DT> ~lines and columns
  <DD> Turns off the lines and columns switch so your parser won't
      try to track them here, where they make no sense.

<DT> line numbers
  <DD> This statement causes AnaGram to include #line statements in
      the parser file so that your compiler can provide
      diagnostics keyed to your syntax file.

<DT> pointer input
  <DD> This statement tells AnaGram that the input to mas() is an
      array in memory that can be scanned simply by incrementing a
      pointer. Since the input tokens are not simply characters, a
      pointer type statement is required and the INPUT_CODE macro
      must be defined.

<DT> pointer type = token *
  <DD> This statement provides the C data type of the input units
      to the parser. If this statement were omitted, pointer type
      would default to unsigned char *, and your compiler would
      complain when it tried to compile the parser.

<DT> subgrammar parse unit
  <DD> This statement tells AnaGram that the specifications for
      parse unit are internally complete and that it should not
      determine reductions by inspecting following tokens.
<DT> ~test range
  <DD> This statement tells AnaGram not to check input characters
      to see if they are within allowable limits. This checking is
      not necessary since the input to mas() has been generated in
      such a way that it cannot possibly contain an out-of-range
      token.
</DL>
<P>
<BR>

<H2> Grammar Tokens </H2>
<DL>
<DT> arg element
  <DD> An "arg element" is either a discrete token in an argument
      to a macro or a sequence of "nested elements".

<DT> arg elements
  <DD> A sequence of "arg element" tokens.

<DT> concatenation
  <DD> These productions implement the "##" operator. "parameter
      name" is distinguished on the right side to avoid improper
      macro expansions.

<DT> defined
  <DD> See "variable". "defined" is the special operator available
      in #if and #elif statements to determine whether a macro has
      been defined. It is recognized only if "if_clause" has been
      set.

<DT> grammar
  <DD> "grammar" simply describes the complete input to MAS. Since
      "grammar" is a special name recognized by AnaGram, there is
      no need for any further specification of the "start token"
      for the grammar.

<DT> left side
  <DD> This refers to a "##" operator, its left operand, and any
      intervening white space. The cross recursion with
      "concatenation" allows for constructs of the form
      A ## B ## C, with grouping to the left.

<DT> macro
  <DD> See "variable". This grammar distinguishes between a "simple
      macro", which was defined without any parameter list, and a
      "macro", which had an explicit, although perhaps empty,
      parameter list. If a "macro" appears without following
      parentheses, it is simply passed on without being expanded.

<DT> macro arg list
  <DD> This token counts the number of arguments to a macro, and
      stacks them on that many levels of the token accumulator. The
      logic is essentially the same as for macro arg list in
      TS.SYN.
<DT> nested elements
  <DD> The "nested elements" token represents a sequence of macro
      argument tokens enclosed in matching parentheses.

<DT> not parameter
  <DD> This token exists simply to avoid multiple copies of the
      same reduction procedure.

<DT> parameter expansion
  <DD> "parameter expansion" is simply a device to defer the
      replacement of a macro parameter name until it has been
      determined whether it is followed by "##", with perhaps some
      intervening white space.

<DT> parameter name
  <DD> See <STRONG>variable</STRONG>.

<DT> parse unit
  <DD> The major problem in expanding a macro body is dealing with
      the "##" operator. Since the operands of ## are not to have
      their macros expanded, but only macro arguments replaced,
      they have to be dealt with specially. Thus "parse unit"
      distinguishes tokens that are not macro parameters
      ("simple parse units") from the parameters, and allows the
      concatenation operator to be recognized before a macro
      parameter is fully expanded.

<DT> right side
  <DD> This token consists of anything that can follow "##" with
      the exception of a parameter name, which needs special
      treatment. If the token is a macro name, the macro is not
      expanded.

<DT> simple macro
  <DD> See "variable". A "simple macro" is one that was defined
      without any parameter list.

<DT> simple parse unit
  <DD> "simple parse unit" consists of the input constructs that
      do not immediately involve macro parameters. It allows for
      complete macro expansion.

<DT> space
  <DD> The token scanner passes spaces along because they cannot be
      discarded until after macros have been expanded. This is
      because of the # operator, which turns macro arguments into
      strings.
<P>
      If the args_only flag is set, spaces have to be passed on to
      the output. Otherwise they can be discarded.
      These productions accumulate space tokens on a stack, so
      that the decision to output them or to discard them can be
      deferred. If they are not output, they will effectively be
      discarded by the reset() operation the next time a sequence
      of spaces is encountered.

<DT> variable
  <DD> Since a NAME token can, depending on circumstance, name a
      parameter, a macro, or the "defined" operator, a semantically
      determined production is used to make the distinction.
      "variable" is the outcome for NAME tokens that are simply to
      be passed on without any special treatment.
</DL>
<P>

<BR>

<IMG ALIGN="bottom" SRC="../../images/rbline6j.gif" ALT="----------------------"
      WIDTH=1010 HEIGHT=2 >
<P>
<IMG ALIGN="right" SRC="../../images/pslrb6d.gif" ALT="Parsifal Software"
      WIDTH=181 HEIGHT=25>
<BR CLEAR="right">

<P>
Back to :
<A HREF="../../index.html">Index</A> |
<A HREF="index.html">Macro preprocessor overview</A>
<P>

<P>
<ADDRESS><FONT SIZE="-1">
  AnaGram parser generator - examples<BR>
  Macro/Argument Substitution Module - Macro preprocessor and C Parser <BR>
  Copyright © 1993-1999, Parsifal Software. <BR>
  All Rights Reserved.<BR>
</FONT></ADDRESS>

</BODY>
</HTML>