view doc/misc/html/examples/mpp/mas.html @ 16:f9e4689b837d
Some minor updates for 15 years later.
author | David A. Holland |
---|---|
date | Tue, 31 May 2022 01:45:26 -0400 (2022-05-31) |
parents | 13d2b8934445 |
children |
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN"> <HTML> <HEAD> <TITLE>Macro/Argument Substitution Module - Macro preprocessor and C Parser </TITLE> </HEAD> <BODY BGCOLOR="#ffffff" BACKGROUND="tilbl6h.gif" TEXT="000000" LINK="0033CC" VLINK="CC0033" ALINK="CC0099"> <P> <IMG ALIGN="right" SRC="../../images/agrsl6c.gif" ALT="AnaGram" WIDTH=124 HEIGHT=30 > <BR CLEAR="all"> Back to : <A HREF="../../index.html">Index</A> | <A HREF="index.html">Macro preprocessor overview</A> <P> <IMG ALIGN="bottom" SRC="../../images/rbline6j.gif" ALT="----------------------" WIDTH=1010 HEIGHT=2 > <P> <H1> Macro/Argument Substitution Module - Macro preprocessor and C Parser </H1> <IMG ALIGN="bottom" SRC="../../images/rbline6j.gif" ALT="----------------------" WIDTH=1010 HEIGHT=2 > <P> <H2>Introduction</H2> <P> The Macro/Argument Substitution module, MAS.SYN, accomplishes the following tasks: <OL> <LI> It can scan the body of a macro to identify and substitute for parameters and macro calls. </LI> <LI> It can scan the set of arguments for a macro call embedded within another macro, only substituting arguments to the outer macro for parameters found within arguments to the inner call. </LI> <LI> It can scan the argument to a macro for macro calls prior to substituting the argument for a parameter. </LI> <LI> It can recognize the "##" operator and paste two tokens together. </LI> <LI> It can recognize the "#" operator and turn a macro argument into a string. </LI> </OL> The macro/argument substitution parser, mas(), is called from a shell function, expand_text(). expand_text() is, in turn, called by expand_macro() and expand_arg(). Output from mas() is accumulated on the token_accumulator, ta. <P> <BR> <H2> Theory of Operation </H2> The primary purpose of mas() is to scan sequences of tokens for macro calls and parameter names and to do the indicated substitutions. At the same time it must correctly handle the "##" and "#" operators, both of which inhibit macro expansion of their operands. 
Thus the entire grammar is structured around the requirements for these two operators. <P> A further complication is the handling of white space. White space within a macro argument cannot be deleted, since otherwise the "#" operator would not provide a correct result. Thus, in numerous circumstances in this grammar, it is not clear what to do with white space at the time it is encountered. For this reason, any particular sequence of white space tokens is saved up on a temporary stack, space_stack, which can be output later or simply disregarded. <P> Like TS.SYN, MAS.SYN must have special syntax for accumulating the arguments of macro calls. The differences between the two grammars arise from the fact that TS.SYN is converting an ASCII representation to a token representation, while MAS.SYN already has token input. <P> <BR> <H2> Elements of the Macro/Argument Substitution Module </H2> The remainder of this document describes the macro definitions, the structure definitions, the static data definitions, all configuration parameter settings, and all non-terminal parsing tokens used in the macro/argument substitution module. It also explains each configuration parameter setting in the syntax file. In MAS.SYN, each function that is defined is preceded by a short explanation of its purpose. <P> <BR> <H2> Macro Definitions </H2> <DL> <DT> INPUT_CODE <DD> Since this grammar uses "input values" and "pointer input", the parser needs to know how to extract the identification code for an input token from the item identified by the pointer. The parser expects this macro to be appropriately defined. Here it is defined so that it extracts the id field of the token. <DT> PCB <DD> Since the "declare pcb" switch has been turned off, PCB has to be defined manually. <DT> SYNTAX_ERROR <DD> This definition of SYNTAX_ERROR overrides the default definition provided by AnaGram. 
</DL> <P> <BR> <H2> Static variables </H2> <DL> <DT> active_macros <DD> Type: stack&lt;unsigned&gt; <P> This is a multilevel stack used to keep track of which macros have been invoked in any particular expansion. If, after an expansion pass, it is determined that the result of a concatenation is a macro name, all the macros which have been expanded so far are marked busy and the text is scanned again. Once there is no need for further scans, the busy flags are turned off. The stack is multilevel so that it can nest easily for recursive usage. <DT> args <DD> Type: token ** <P> "args" is an array of pointers to token strings. It is the set of argument strings for the macro currently being expanded. <DT> args_only <DD> Type: int <P> "args_only" is a switch to tell the macro/argument substitution logic only to make argument substitutions and not to expand macros. It is interrogated in id_macro and, if it is set, NAME tokens are not checked to see if they identify macros. <DT> mas_pcb <DD> Type: mas_pcb_type * <P> This variable contains a pointer to the currently active parser control block for mas(). It is saved, set and restored in expand_text(). <DT> n_concats <DD> Type: int <P> This variable is used to count the number of concatenation operations that result in a macro name in the course of a single scan for macros. If it is non-zero, the text is rescanned. <DT> n_args <DD> Type: int <P> This variable specifies the number of arguments for the current macro being expanded. <DT> params <DD> Type: unsigned * <P> This variable is a pointer to a list of n_args unsigned integers. The integers are the indices in the token dictionary of the parameter names for the macro currently being expanded. <DT> space_stack <DD> Type: token_accumulator <P> In a number of places in this grammar, it is necessary to pass over white space tokens without knowing whether they are to be output or disregarded. "space_stack" provides a place to store them temporarily until the decision can be made. 
Remember that within macro arguments, spaces can be significant, and therefore must not be discarded prematurely. </DL> <P> <BR> <H2> Configuration Parameters </H2> <DL> <DT> ~allow macros <DD> This statement turns off the allow macros switch so that AnaGram implements all reduction procedures as explicit function definitions. This simplifies debugging at the cost of a slight performance degradation. <DT> ~backtrack <DD> This statement turns off the backtrack switch. This means that if the token scanner encounters a syntax error, it will not undo default reductions that may have been caused by the bad input before it generates diagnostics. <DT> context type = location <DD> This statement specifies that the generated parser is to track context automatically. The context variables have type "location". location is defined elsewhere to consist of two fields: line number and column number. <DT> ~declare pcb <DD> This statement tells AnaGram not to declare a parser control block for the parser. Access to the parser control block is through a pointer. Actual allocation of storage and setting of the pointer takes place in expand_text(). <DT> default input type = token <DD> This statement tells AnaGram how to code reduction procedure calls that involve input tokens. <DT> enum <DD> This enumeration statement provides definitions for terminal tokens. The same enum statement is found in EX.SYN, in JRC.SYN, in KRC.SYN and in MPP.H. <DT> ~error frame <DD> This turns off the error frame portion of the automatic syntax error diagnostic generator, since the context of the error in the macro substitution syntax is of little interest. If an error frame were to be used in diagnostics, that of the C parser would be more appropriate. <DT> error trace <DD> This turns on the error trace function, so that if the token scanner encounters a syntax error it will write an .etr file. 
<DT> input values <DD> This switch tells AnaGram that the input units carry some baggage, that is, that they have values apart from their identifying code. Since this grammar uses pointer input, an INPUT_CODE macro must also be defined. <DT> ~lines and columns <DD> This statement turns off the lines and columns switch so your parser won't try to track them here, where they certainly make no sense. <DT> line numbers <DD> This statement causes AnaGram to include #line statements in the parser file so that your compiler can provide diagnostics keyed to your syntax file. <DT> pointer input <DD> This statement tells AnaGram that the input to mas() is an array in memory that can be scanned simply by incrementing a pointer. Since the input tokens are not simply characters, a pointer type statement is required and the INPUT_CODE macro must be defined. <DT> pointer type = token * <DD> This statement provides the C data type of the input units to the parser. If this statement were omitted, pointer type would default to unsigned char *, and your compiler would scold when it tried to compile the parser. <DT> subgrammar parse unit <DD> This statement tells AnaGram that the specifications for parse unit are internally complete and that it should not determine reductions by inspecting following tokens. <DT> ~test range <DD> This statement tells AnaGram not to check input characters to see if they are within allowable limits. This checking is not necessary since the input to mas() has been generated in such a way that it cannot possibly get an out of range token. </DL> <P> <BR> <H2> Grammar Tokens </H2> <DL> <DT> arg element <DD> An "arg element" is a discrete token in an argument to a macro or a sequence of "nested elements". <DT> arg elements <DD> A sequence of "arg element". <DT> concatenation <DD> These productions implement the "##" operator. "parameter name" is distinguished on the right side to avoid improper macro expansions. <DT> defined <DD> See "variable". 
"defined" is the special operator available in #if and #elif statements to determine whether a macro has been defined. It is recognized only if "if_clause" has been set. <DT> grammar <DD> "grammar" simply describes the complete input to MAS. Since "grammar" is a special name recognized by AnaGram, there is no need for any further specification of the "start token" for the grammar. <DT> left side <DD> This refers to a "##" operator, its left operand, and any intervening white space. The cross recursion with "concatenation" allows for constructs of the form A ## B ## C, with grouping to the left. <DT> macro <DD> See "variable". This grammar distinguishes between a "simple macro" which was defined without any parameter list, and a "macro" which had an explicit, although perhaps empty, parameter list. If a "macro" appears without following parentheses, it is simply passed on without being expanded. <DT> macro arg list <DD> This token counts the number of arguments to a macro, and stacks them on so many levels of the token accumulator. The logic is essentially the same as for macro arg list in TS.SYN. <DT> nested elements <DD> The "nested elements" token represents a sequence of macro argument tokens enclosed in matching parentheses. <DT> not parameter <DD> This token exists simply to avoid multiple copies of the same reduction procedure. <DT> parameter expansion <DD> "parameter expansion" is simply a device to defer the replacement of a macro parameter name until it has been determined whether it is followed by "##", with perhaps some intervening white space. <DT> parameter name <DD> See <STRONG>variable</STRONG>. <DT> parse unit <DD> The major problem in expanding a macro body is dealing with the "##" operator. Since the arguments of ## are not to have their macros expanded, but only macro arguments replaced, they have to be dealt with specially. 
Thus "parse unit" distinguishes those tokens which are not macro parameters, the "simple parse units", from the parameters, and allows for recognition of the concatenation operator before it goes ahead and allows complete expansion of a macro parameter. <DT> right side <DD> This token consists of anything that can follow "##" with the exception of a parameter name, which needs special treatment. If the token is a macro name, the macro is not expanded. <DT> simple macro <DD> See "variable". A "simple macro" is one which was defined without any following parameter list. <DT> simple parse unit <DD> "simple parse unit" consists of the input constructs which do not immediately involve macro parameters. It allows for complete macro expansion. <DT> space <DD> The token scanner passes spaces along because they cannot be discarded until after macros have been expanded. This is because of the "#" operator, which turns macro arguments into strings. <P> If the args_only flag is set, spaces have to be passed on to the output. Otherwise they can be discarded. These productions accumulate space tokens on a stack, so that the decision to output them or to discard them can be deferred. If they are not output, they will effectively be discarded by the reset() operation the next time a sequence of spaces is encountered. <DT> variable <DD> Since a NAME token can, depending on circumstance, name a parameter, a macro, or the "defined" operator, a semantically determined production is used to make the distinction. "variable" is the outcome for NAME tokens that are simply to be passed on without any special treatment. 
</DL> <P> <BR> <IMG ALIGN="bottom" SRC="../../images/rbline6j.gif" ALT="----------------------" WIDTH=1010 HEIGHT=2 > <P> <IMG ALIGN="right" SRC="../../images/pslrb6d.gif" ALT="Parsifal Software" WIDTH=181 HEIGHT=25> <BR CLEAR="right"> <P> Back to : <A HREF="../../index.html">Index</A> | <A HREF="index.html">Macro preprocessor overview</A> <P> <P> <ADDRESS><FONT SIZE="-1"> AnaGram parser generator - examples<BR> Macro/Argument Substitution Module - Macro preprocessor and C Parser <BR> Copyright © 1993-1999, Parsifal Software. <BR> All Rights Reserved.<BR> </FONT></ADDRESS> </BODY> </HTML>