AnaGram interim repo (temporary): doc/misc/html/examples/mpp/mas.html comparison

comparison doc/misc/html/examples/mpp/mas.html @ 0:13d2b8934445

Import AnaGram (near-)release tree into Mercurial.

author	David A. Holland
date	Sat, 22 Dec 2007 17:52:45 -0500
parents
children


--1:000000000000
+:13d2b8934445
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
+<HTML>
+<HEAD>
+<TITLE>Macro/Argument Substitution Module - Macro preprocessor and C Parser </TITLE>
+</HEAD>
+<BODY BGCOLOR="#ffffff" BACKGROUND="tilbl6h.gif"
+TEXT="000000" LINK="0033CC"
+VLINK="CC0033" ALINK="CC0099">
+<P>
+<IMG ALIGN="right" SRC="../../images/agrsl6c.gif" ALT="AnaGram"
+WIDTH=124 HEIGHT=30 >
+<BR CLEAR="all">
+Back to :
+<A HREF="../../index.html">Index</A> |
+<A HREF="index.html">Macro preprocessor overview</A>
+<P>
+<IMG ALIGN="bottom" SRC="../../images/rbline6j.gif" ALT="----------------------"
+WIDTH=1010 HEIGHT=2  >
+<P>
+<H1> Macro/Argument Substitution Module - Macro preprocessor and C Parser   </H1>
+<IMG ALIGN="bottom" SRC="../../images/rbline6j.gif" ALT="----------------------"
+WIDTH=1010 HEIGHT=2  >
+<P>
+<H2>Introduction</H2>
+<P>
+The Macro/Argument Substitution module, MAS.SYN,
+accomplishes the following tasks:
+<OL>
+<LI>            It can scan the body of a macro, to identify and
+substitute for parameters and macro calls. </LI>
+<LI>            It can scan the set of arguments for a macro call
+embedded within another macro, only substituting
+arguments to the outer macro for parameters found
+within arguments to the inner call. </LI>
+<LI>            It can scan the argument to a macro for macro calls
+prior to substituting the argument for a parameter. </LI>
+<LI>            It can recognize the "##" operator and paste two tokens
+together. </LI>
+<LI>            It can recognize the "#" operator and turn a macro
+argument into a string. </LI>
+</OL>
+The macro/argument substitution parser, mas(), is called
+from a shell function, expand_text(). expand_text() is, in
+turn, called by expand_macro() and expand_arg(). Output from
+mas() is accumulated on the token_accumulator, ta.
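+<P>
+As a reminder of what tasks 4 and 5 have to produce, the following is a
+minimal sketch in standard C of the two operators. The macro and function
+names (PASTE, STRINGIFY, paste_demo, stringify_demo) are illustrative
+assumptions and do not appear in MAS.SYN.

```c
/* PASTE joins two tokens into one token; STRINGIFY turns its
   argument into a string literal. Both names are hypothetical. */
#define PASTE(a, b) a##b
#define STRINGIFY(x) #x

/* PASTE(val, ue) forms the single identifier "value". */
int paste_demo(void)
{
    int value = 42;
    return PASTE(val, ue);
}

/* White space between the argument's tokens is kept, but each run
   is reduced to a single space in the resulting string. */
const char *stringify_demo(void)
{
    return STRINGIFY(a   +   b);
}
```

+Note that the string produced by STRINGIFY still contains white space
+between the tokens, which is why a substitution module cannot simply
+throw white space away.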
+<P>
+<BR>
+<H2>     Theory of Operation </H2>
+The primary purpose of mas() is to scan sequences of tokens
+for macro calls and parameter names and to do the indicated
+substitutions. At the same time it must correctly handle the
+"##" and '#' operators, both of which inhibit macro expansion
+of their operands. Thus the entire grammar is structured
+around the requirements for these two operators.
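+<P>
+The inhibition of macro expansion can be observed with any standard C
+preprocessor. The sketch below is an illustration only; LIMIT, STR, XSTR,
+CAT and the function names are hypothetical and are not part of MAS.SYN.

```c
#define LIMIT 100
#define STR(x) #x        /* '#' applied directly: operand is not expanded */
#define XSTR(x) STR(x)   /* extra level: the argument is expanded first   */
#define CAT(a, b) a##b   /* '##' applied directly: operands not expanded  */

/* A variable whose name is the result of pasting LIMIT and 1. */
int LIMIT1 = 7;

/* LIMIT is an operand of '#', so it is stringified unexpanded. */
const char *direct_string(void) { return STR(LIMIT); }

/* Here LIMIT is an ordinary argument of XSTR, so it is expanded
   before STR ever sees it. */
const char *expanded_string(void) { return XSTR(LIMIT); }

/* LIMIT is an operand of '##', so the result is the identifier
   LIMIT1 rather than the number 1001. */
int cat_demo(void) { return CAT(LIMIT, 1); }
```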
+<P>
+A further complication is the handling of white space.
+White space within a macro argument cannot be deleted, since
+otherwise the '#' operator would not provide a correct
+result.  Thus, in numerous circumstances in this grammar, it
+is not clear what to do with white space at the time it is
+encountered.  For this reason, any particular sequence of
+white space tokens is saved up on a temporary stack,
+space_stack, which can be output later or simply
+disregarded.
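+<P>
+The deferral idea can be sketched in C as follows. This is not the code
+in MAS.SYN: it buffers characters rather than tokens to stay short, and
+the type space_demo and all demo_ functions are assumptions made for
+illustration.

```c
#include <stddef.h>

enum { SPACE_MAX = 16, OUT_MAX = 64 };

/* Hypothetical stand-in for the output accumulator plus space_stack. */
typedef struct {
    char pending[SPACE_MAX];  /* white space seen but not yet committed */
    size_t n_pending;
    char out[OUT_MAX];        /* committed output, NUL terminated */
    size_t n_out;
} space_demo;

void demo_init(space_demo *d)
{
    d->n_pending = 0;
    d->n_out = 0;
    d->out[0] = '\0';
}

/* Append one character to the committed output. */
void demo_put(space_demo *d, char c)
{
    if (d->n_out + 1 < OUT_MAX) {
        d->out[d->n_out++] = c;
        d->out[d->n_out] = '\0';
    }
}

/* Save a white space character; the decision about it is deferred. */
void demo_save_space(space_demo *d, char c)
{
    if (d->n_pending < SPACE_MAX)
        d->pending[d->n_pending++] = c;
}

/* Decision "output": copy pending white space, then clear it. */
void demo_flush_space(space_demo *d)
{
    for (size_t i = 0; i < d->n_pending; i++)
        demo_put(d, d->pending[i]);
    d->n_pending = 0;
}

/* Decision "disregard": drop pending white space without emitting it. */
void demo_discard_space(space_demo *d)
{
    d->n_pending = 0;
}
```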
+<P>
+Like TS.SYN, MAS.SYN must have special syntax for
+accumulating the arguments of macro calls. The differences
+between the two grammars arise from the fact that TS.SYN is
+converting an ASCII representation to a token
+representation, while MAS.SYN already has token input.
+<P>
+<BR>
+<H2>     Elements of the Macro/Argument Substitution Module </H2>
+The remainder of this document describes the macro
+definitions, the structure definitions, the static data
+definitions, the configuration parameter settings, and the
+non-terminal parsing tokens used in the macro/argument
+substitution module. In MAS.SYN itself, each function
+definition is preceded by a short explanation
+of its purpose.
+<P>
+<BR>
+<H2>     Macro Definitions </H2>
+<DL>
+<DT>     INPUT_CODE
+<DD>          Since this grammar uses "input values" and "pointer input",
+the parser needs to know how to extract the identification
+code for an input token from the item identified by the
+pointer. The parser expects this macro to be appropriately
+defined. Here it is defined so that it extracts the id field
+of the token.
+<DT>     PCB
+<DD>          Since the "declare pcb" switch has been turned off, PCB has
+to be defined manually.
+<DT>     SYNTAX_ERROR
+<DD>          This definition of SYNTAX_ERROR overrides the default
+definition provided by AnaGram.
+</DL>
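+<P>
+A minimal sketch of how the INPUT_CODE and PCB definitions might look is
+given below. Only the names INPUT_CODE, PCB, mas_pcb, mas_pcb_type and
+the id field come from this document; the layout of token, the handle
+field, the contents of the control block and the function first_code are
+assumptions made so that the example is self-contained.

```c
/* Assumed token layout: an identifying code plus an associated value. */
typedef struct {
    int id;           /* token identification code */
    unsigned handle;  /* hypothetical value field */
} token;

/* Extract the identification code from the item a pointer refers to. */
#define INPUT_CODE(t) ((t).id)

/* Stand-in for the generated parser control block type. */
typedef struct {
    token *pointer;   /* current position in the input array */
} mas_pcb_type;

static mas_pcb_type *mas_pcb;  /* currently active control block */

/* With "declare pcb" off, PCB has to be defined by hand. */
#define PCB (*mas_pcb)

/* Save, set and restore the control block pointer around a scan, in
   the manner described for expand_text(). Returns the code of the
   first input token. */
int first_code(token *input)
{
    mas_pcb_type *saved = mas_pcb;
    mas_pcb_type local;
    int code;

    mas_pcb = &local;
    PCB.pointer = input;
    code = INPUT_CODE(*PCB.pointer);
    mas_pcb = saved;
    return code;
}
```

+Saving and restoring the pointer is what allows a nested call to use a
+control block of its own without disturbing the caller's.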
+<P>
+<BR>
+<H2>     Static variables </H2>
+<DL>
+<DT>     active_macros
+<DD>          Type: stack&lt;unsigned&gt;
+<P>
+This is a multilevel stack used to keep track of which
+macros have been invoked in any particular expansion. If,
+after an expansion pass, it is determined that the result of
+a concatenation is a macro name, all the macros which have
+been expanded so far are marked busy and the text is scanned
+again. Once there is no need for further scans, the busy
+flags are turned off. The stack is multi-level so that it
+can nest easily for recursive usage.
+<DT>     args
+<DD>          Type: token **
+<P>
+"args" is an array of pointers to token strings. It is the
+set of argument strings for the macro currently being
+expanded.
+<DT>     args_only
+<DD>         Type: int
+<P>
+"args_only" is a switch that tells the macro/argument
+substitution logic to make argument substitutions only and
+not to expand macros. It is interrogated in id_macro; if it is
+set, NAME tokens are not checked to see if they identify
+macros.
+<DT>     mas_pcb
+<DD>          Type: mas_pcb_type *
+<P>
+This variable contains a pointer to the currently active
+parser control block for mas(). It is saved, set and
+restored in expand_text().
+<DT>     n_concats
+<DD>         Type: int
+<P>
+This variable is used to count the number of concatenation
+operations that result in a macro name in the course of a
+single scan for macros. If it is non-zero, the text is
+rescanned.
+<DT>     n_args
+<DD>          Type: int
+<P>
+This variable specifies the number of arguments for the
+current macro being expanded.
+<DT>     params
+<DD>         Type: unsigned *
+<P>
+This variable is a pointer to a list of n_args unsigned
+integers. The integers are the indices in the token
+dictionary of the parameter names for the macro currently
+being expanded.
+<DT>     space_stack
+<DD>          Type: token_accumulator
+<P>
+In a number of places in this grammar, it is necessary to
+pass over white space tokens without knowing whether they
+are to be output or disregarded. "space_stack" provides a
+place to store them temporarily until the decision can be
+made. Remember that within macro arguments spaces can be
+significant, and therefore must not be discarded
+prematurely.
+</DL>
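+<P>
+The relationship between n_args, params and args can be sketched as a
+simple lookup. The variable names and types follow the descriptions
+above; the token layout and the functions set_macro_context and
+find_argument are hypothetical and are not taken from MAS.SYN.

```c
#include <stddef.h>

/* Assumed token layout, for illustration only. */
typedef struct {
    int id;
    unsigned handle;
} token;

static int n_args;        /* number of arguments of the current macro */
static unsigned *params;  /* dictionary indices of the parameter names */
static token **args;      /* argument token strings, one per parameter */

/* Hypothetical helper: install the context for one macro expansion. */
void set_macro_context(int count, unsigned *names, token **values)
{
    n_args = count;
    params = names;
    args = values;
}

/* Return the argument string bound to the parameter whose dictionary
   index is name, or NULL if name is not a parameter of this macro. */
token *find_argument(unsigned name)
{
    for (int i = 0; i < n_args; i++) {
        if (params[i] == name)
            return args[i];
    }
    return NULL;
}
```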
+<P>
+<BR>
+<H2>     Configuration Parameters  </H2>
+<DL>
+<DT>     ~allow macros
+<DD>         This statement turns off the allow macros switch so that
+AnaGram implements all reduction procedures as explicit
+function definitions. This simplifies debugging at the cost
+of a slight performance degradation.
+<DT>     ~backtrack
+<DD>          This statement turns off the backtrack switch. This means
+that if the token scanner encounters a syntax error, it will
+not undo default reductions that may have been caused by the
+bad input before it generates diagnostics.
+<DT>     context type = location
+<DD>          This statement specifies that the generated parser is to
+track context automatically. The context variables have type
+"location". location is defined elsewhere to consist of two
+fields: line number and column number.
+<DT>     ~declare pcb
+<DD>         This statement tells AnaGram not to declare a parser control
+block for the parser. Access to the parser control block is
+through a pointer. Actual allocation of storage and setting
+of the pointer takes place in expand_text().
+<DT>     default input type = token
+<DD>          This statement tells AnaGram how to code reduction procedure
+calls that involve input tokens.
+<DT>     enum
+<DD>          This enumeration statement provides definitions for terminal
+tokens. The same enum statement is found in EX.SYN, in
+JRC.SYN, in KRC.SYN and in MPP.H.
+<DT>     ~error frame
+<DD>          This turns off the error frame portion of the automatic
+syntax error diagnostic generator, since the context of the
+error in the macro substitution syntax is of little interest.
+If an error frame were to be used in diagnostics, that of the
+C parser would be more appropriate.
+<DT>     error trace
+<DD>          This turns on the error trace function, so that if the token
+scanner encounters a syntax error it will write an .etr
+file.
+<DT>     input values
+<DD>          This switch tells AnaGram that the input units carry some
+baggage, that they have values apart from their identifying
+code. Since this grammar uses pointer input, an INPUT_CODE
+macro must also be defined.
+<DT>     ~lines and columns
+<DD>          Turns off the lines and columns switch so your parser won't
+try to track them here where they certainly make no sense.
+<DT>     line numbers
+<DD>          This statement causes AnaGram to include #line statements in
+the parser file so that your compiler can provide
+diagnostics keyed to your syntax file.
+<DT>     pointer input
+<DD>          This statement tells AnaGram that the input to mas() is an
+array in memory that can be scanned simply by incrementing a
+pointer. Since the input tokens are not simply characters, a
+pointer type statement is required and the INPUT_CODE macro
+must be defined.
+<DT>     pointer type = token *
+<DD>          This statement provides the C data type of the input units
+to the parser. If this statement were omitted, pointer type
+would default to unsigned char *, and your compiler would
+scold when it tried to compile the parser.
+<DT>     subgrammar parse unit
+<DD>          This statement tells AnaGram that the specifications for
+parse unit are internally complete and that it should not
+determine reductions by inspecting following tokens.
+<DT>     ~test range
+<DD>          This statement tells AnaGram not to check input characters
+to see if they are within allowable limits. This checking is
+not necessary, since the input to mas() is generated in
+such a way that an out-of-range token cannot occur.
+</DL>
+<P>
+<BR>
+<H2>     Grammar Tokens </H2>
+<DL>
+<DT>     arg element
+<DD>         An "arg element" is a discrete token in an argument to a
+macro or a sequence of "nested elements".
+<DT>     arg elements
+<DD>          A sequence of "arg element" tokens.
+<DT>     concatenation
+<DD>         These productions implement the "##" operator. "parameter
+name" is distinguished on the right side to avoid improper
+macro expansions.
+<DT>     defined
+<DD>          See "variable". "defined" is the special operator available
+in #if and #elif statements to determine whether a macro has
+been defined. It is recognized only if "if_clause" has been
+set.
+<DT>     grammar
+<DD>          "grammar" simply describes the complete input to MAS. Since
+"grammar" is a special name recognized by AnaGram, there is
+no need for any further specification of the "start token"
+for the grammar.
+<DT>     left side
+<DD>          This refers to a "##" operator, its left operand, and any
+intervening white space. The cross recursion with
+"concatenation" allows for constructs of the form A ## B ##
+C, with grouping to the left.
+<DT>     macro
+<DD>          See "variable". This grammar distinguishes between a "simple
+macro" which was defined without any parameter list, and a
+"macro" which had an explicit, although perhaps empty,
+parameter list. If a "macro" appears without following
+parentheses, it is simply passed on without being expanded.
+<DT>     macro arg list
+<DD>         This token counts the number of arguments to a macro, and
+stacks them on so many levels of the token accumulator. The
+logic is essentially the same as for macro arg list in
+TS.SYN.
+<DT>     nested elements
+<DD>          The "nested elements" token represents a sequence of macro
+argument tokens enclosed in matching parentheses.
+<DT>     not parameter
+<DD>          This token exists simply to avoid multiple copies of the
+same reduction procedure.
+<DT>     parameter expansion
+<DD>          "parameter expansion" is simply a device to defer the
+replacement of a macro parameter name until it has been
+determined whether it is followed by "##", with perhaps some
+intervening white space.
+<DT>     parameter name
+<DD>          See "variable".
+<DT>     parse unit
+<DD>          The major problem in expanding a macro body is dealing with
+the "##" operator. Since the arguments of ## are not to have
+their macros expanded, but only macro arguments replaced,
+they have to be dealt with specially. Thus "parse unit"
+distinguishes those tokens which are not macro parameters
+("simple parse units") from the parameters, and allows for
+recognition of the concatenation operator before it goes
+ahead and allows complete expansion of a macro parameter.
+<DT>     right side
+<DD>          This token consists of anything that can follow "##" with
+the exception of a parameter name, which needs special
+treatment. If the token is a macro name, the macro is not
+expanded.
+<DT>     simple macro
+<DD>          See "variable". A "simple macro" is one which was defined
+without any following parameter list.
+<DT>     simple parse unit
+<DD>          "simple parse unit" consists of the input constructs which
+do not immediately involve macro parameters. It allows for
+complete macro expansion.
+<DT>     space
+<DD>          The token scanner passes spaces along because they cannot be
+discarded until after macros have been expanded. This is
+because of the # operator, which turns macro arguments into
+strings.
+<P>
+If the args_only flag is set, spaces have to be passed on to
+the output. Otherwise they can be discarded. These
+productions accumulate space tokens on a stack, so that the
+decision to output them or to discard them can be deferred.
+If they are not output, they will effectively be discarded
+by the reset() operation the next time a sequence of spaces
+is encountered.
+<DT>     variable
+<DD>       Since a NAME token can, depending on circumstance, name a
+parameter, a macro, or the "defined" operator, a semantically
+determined production is used to make the distinction.
+"variable" is the outcome for NAME tokens that are simply to
+be passed on without any special treatment.
+</DL>
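+<P>
+The role of "nested elements" in "macro arg list" can be illustrated by a
+hand-written loop that counts arguments while ignoring commas inside
+matching parentheses. This is only a sketch of the idea, not the grammar
+itself: in MAS.SYN the work is done by productions, and the token codes,
+the token layout and the function count_macro_args are assumptions.

```c
/* Hypothetical token codes; END terminates the input array. */
enum { END = 0, LPAREN, RPAREN, COMMA, OTHER };

typedef struct {
    int id;
} token;

/* p points just past the opening parenthesis of a macro call.
   Returns the number of arguments up to the matching closing
   parenthesis, or -1 if the input ends before it is found. */
int count_macro_args(const token *p)
{
    int depth = 0;   /* nesting level of inner parentheses */
    int commas = 0;  /* top-level commas seen so far */
    int seen = 0;    /* nonzero once any argument token was seen */

    for (; p->id != END; p++) {
        if (p->id == RPAREN && depth == 0)
            return (commas == 0 && !seen) ? 0 : commas + 1;
        if (p->id == COMMA && depth == 0) {
            commas++;          /* separator between arguments */
            continue;
        }
        if (p->id == LPAREN)
            depth++;           /* start of nested elements */
        else if (p->id == RPAREN)
            depth--;           /* end of nested elements */
        seen = 1;
    }
    return -1;
}
```

+A comma inside inner parentheses, as in f((a,b),c), is part of the first
+argument and is therefore not counted as a separator.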
+<P>
+<BR>
+<IMG ALIGN="bottom" SRC="../../images/rbline6j.gif" ALT="----------------------"
+WIDTH=1010 HEIGHT=2 >
+<P>
+<IMG ALIGN="right" SRC="../../images/pslrb6d.gif" ALT="Parsifal Software"
+WIDTH=181 HEIGHT=25>
+<BR CLEAR="right">
+<P>
+Back to :
+<A HREF="../../index.html">Index</A> |
+<A HREF="index.html">Macro preprocessor overview</A>
+<P>
+<P>
+<ADDRESS><FONT SIZE="-1">
+AnaGram parser generator - examples<BR>
+Macro/Argument Substitution Module - Macro preprocessor and C Parser <BR>
+Copyright &copy; 1993-1999, Parsifal Software. <BR>
+All Rights Reserved.<BR>
+</FONT></ADDRESS>
+</BODY>
+</HTML>
