diff doc/misc/html/examples/mpp/ts.html @ 0:13d2b8934445

Import AnaGram (near-)release tree into Mercurial.
author David A. Holland
date Sat, 22 Dec 2007 17:52:45 -0500
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/misc/html/examples/mpp/ts.html	Sat Dec 22 17:52:45 2007 -0500
@@ -0,0 +1,742 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
+<HTML>
+<HEAD>
+<TITLE> Token Scanner - Macro preprocessor and C Parser </TITLE>
+</HEAD>
+
+<BODY BGCOLOR="#ffffff" BACKGROUND="tilbl6h.gif"
+ TEXT="#000000" LINK="#0033CC"
+ VLINK="#CC0033" ALINK="#CC0099">
+
+<P>
+<IMG ALIGN="right" SRC="../../images/agrsl6c.gif" ALT="AnaGram"
+         WIDTH=124 HEIGHT=30 >
+<BR CLEAR="all">
+Back to :
+<A HREF="../../index.html">Index</A> |
+<A HREF="index.html">Macro preprocessor overview</A>
+<P>
+<IMG ALIGN="bottom" SRC="../../images/rbline6j.gif" ALT="----------------------"
+        WIDTH=1010 HEIGHT=2  >
+<P>
+
+<H1> Token Scanner - Macro preprocessor and C Parser   </H1>
+<IMG ALIGN="bottom" SRC="../../images/rbline6j.gif" ALT="----------------------"
+        WIDTH=1010 HEIGHT=2  >
+<P>
+<BR>
+
+<H2>Introduction</H2>
+
+          The token scanner module, <tt>ts.syn</tt>, accomplishes the following
+          tasks:
+<OL>
+   <LI>          It reads the raw input, gathers tokens and identifies
+               them. </LI>
+   <LI>          It analyzes conditional compilation directives and
+               skips over text that is to be omitted. </LI>
+   <LI>         It analyzes macro definitions and maintains the macro
+               tables. </LI>
+    <LI>         It identifies macro calls in the input stream and calls
+               the <tt>macro_expand()</tt> function to expand them. </LI>
+    <LI>         It recognizes <tt>#include</tt> statements and calls itself
+               recursively to parse the include file. </LI>
+</OL>
+
+          The token scanner parser, <tt>ts()</tt>, is called from a shell
+          function, <tt>scan_input(char *)</tt>, which takes the name
+          of a file
+          as an argument. <tt>scan_input()</tt> opens the file, calls
+          <tt>ts()</tt>, and
+          closes the file. <tt>scan_input()</tt> is called recursively by
+          <tt>include_file()</tt> when an <tt>#include</tt> statement
+          is found in the
+          input.
+<P>
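+          For concreteness, a sketch of this shell function follows.
+          It is illustrative only: the <tt>file_descriptor</tt> member names
+          used here and the error handling are assumptions; only
+          <tt>scan_input()</tt>, <tt>ts()</tt> and the save/restore of the file state
+          reflect the behavior described above.
+<PRE>
+/* Hypothetical sketch; the real definition is in ts.syn. */
+void scan_input(char *name)
+{
+    file_descriptor outer = input;   /* save the enclosing file, pcb and all */
+    input.name = name;
+    input.file = fopen(name, "r");   /* needs &lt;cstdio&gt; */
+    if (input.file == NULL) {
+        /* report the problem through the scanner's diagnostics */
+        syntax_error_scanning("Cannot open input file");
+        input = outer;
+        return;
+    }
+    /* (the real function also sets the pcb's input_context here) */
+    ts();                            /* run the token scanner parser */
+    fclose(input.file);
+    input = outer;                   /* restore the enclosing file */
+}
+</PRE>
+<P>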
+          Output from the token scanner is directed to a <tt>token_sink</tt>
+          pointed to by the <tt>scanner_sink</tt> global variable. The main
+          program may set <tt>scanner_sink</tt> to point to either a
+          <tt>token_translator</tt> or a <tt>c_parser</tt>. During the
+          course of
+          processing, the token scanner redirects output to a token
+          accumulator or to the conditional expression evaluator, as
+          necessary, by temporarily changing the value of
+          <tt>scanner_sink</tt>.
+<P>
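+          The plumbing implied here is a small polymorphic interface.
+          A sketch, assuming <tt>token_sink</tt> is an abstract base class and
+          inventing the member function name <tt>put()</tt> for illustration:
+<PRE>
+/* Sketch only; the token type is the id/value pair described under
+   "Theory of Operation" below. */
+struct token;
+
+class token_sink {
+public:
+    virtual ~token_sink() {}
+    virtual void put(const token &amp;t) = 0;   /* accept one scanned token */
+};
+
+extern token_sink *scanner_sink;    /* current destination of scanner output */
+
+/* token_translator and c_parser would both derive from token_sink, so
+   the main program can aim scanner_sink at either one.  Inside the
+   scanner, handing a token on is then just  scanner_sink->put(t);  */
+</PRE>
+<P>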
+          The token scanner module contains two syntax error
+          diagnostic procedures: <tt>syntax_error(char *)</tt> and
+          <tt>syntax_error_scanning(char *)</tt>. The former is set up to
+          provide correct line and column numbers for functions called
+          from reduction procedures in the token scanner. The latter
+          is set up to provide line and column numbers for errors
+          discovered in the scanner itself. Both functions accept a
+          pointer to an error message.
+<P>
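+          As a rough illustration of the difference between the two
+          (the message format, the field names and the exact use of the
+          <tt>CONTEXT</tt> macro are assumptions; <tt>error_modifier</tt> and
+          <tt>input_context</tt> are described later in this document):
+<PRE>
+#include &lt;cstdio&gt;
+
+void syntax_error(char *msg)
+{
+    location where = CONTEXT;           /* start of the rule being reduced */
+    fprintf(stderr, "%s, line %d, column %d: %s%s\n",
+            input.name, where.line, where.column, msg, error_modifier);
+}
+
+void syntax_error_scanning(char *msg)
+{
+    location where = PCB.input_context; /* where the scanner is right now */
+    fprintf(stderr, "%s, line %d, column %d: %s%s\n",
+            input.name, where.line, where.column, msg, error_modifier);
+}
+</PRE>
+<P>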
+<BR>
+
+<H2>     Theory of Operation  </H2>
+
+          The primary purpose of the token scanner is to identify the
+          C language tokens in the input file and pass them on to
+          another module for further processing. In order to package
+          them for transmission, the token scanner maintains a "token
+          dictionary", <tt>td</tt>, which enables it to characterize each
+          distinct input token with a single number. The token scanner
+          also classifies tokens according to the definitions of the C
+          language. The "token" that it passes on for further
+          processing is a pair consisting of an id field and a value
+          field. The id field is defined by the <tt>token_id</tt>
+          enumeration 
+          in <tt>token.h</tt>. The value field is the index of the
+          token in the
+          token dictionary, <tt>td</tt>.
+<P>
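+          A sketch of that packaging (<tt>token_id</tt> comes from <tt>token.h</tt>
+          as stated above; the member names and the dictionary call are
+          assumptions):
+<PRE>
+/* Illustrative only. */
+struct token {
+    token_id id;      /* classification of the token                    */
+    unsigned value;   /* index of its spelling in the token dictionary  */
+};
+
+/* Turning an accumulated spelling into a token might then read:
+
+       token t;
+       t.value = td.insert(spelling);   // hypothetical dictionary call
+       t.id    = classify(spelling);    // hypothetical classification
+       scanner_sink->put(t);            // pass it on (see above)
+*/
+</PRE>
+<P>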
+          To support its primary purpose, the token scanner deals with
+          several other problems. First, it identifies preprocessor
+          control lines which control conditional compilation and
+          skips input appropriately. Second, it fields <tt>#include</tt>
+          statements, and recurses to process include files. Third, it
+          fields <tt>#define</tt> statements and manages the macro definition
+          tables. Finally, it checks the tokens it identifies and
+          calls the macro/argument expansion module to expand them if
+          they turn out to be macros.
+<P>
+          The conditional compilation logic in the token scanner is
+          carried out in its entirety by syntactic means. The only C
+          code involved deals with evaluating conditional statements.
+          <tt>#ifdef</tt> and <tt>#ifndef</tt> are quite
+          straightforward. <tt>#if</tt> is another 
+          matter. To deal with the generality of this statement, token
+          scanner output is diverted to the expression evaluator
+          module, <tt>ex.syn</tt>, where the expression is evaluated. The
+          outcome of the calculation is then used to control a
+          semantically determined production in the token scanner.
+<P>
+          Processing <tt>#include</tt> statements is reasonably
+          straightforward. Token scanner output is diverted to the
+          token accumulator, <tt>ta</tt>. The content of the token accumulator
+          is then translated back to ASCII string form. This takes
+          care of macro calls in the <tt>#include</tt> statement. Once the file
+          has been identified, <tt>scan_input()</tt> is called recursively to
+          deal with it.
+<P>
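+          A sketch of that sequence, with the translation step reduced
+          to a hypothetical helper (only <tt>include_file()</tt> and the
+          recursive call to <tt>scan_input()</tt> are taken from the text):
+<PRE>
+/* Hypothetical sketch of the include flow. */
+static void include_file(void)
+{
+    char name[FILENAME_MAX];        /* FILENAME_MAX is from &lt;cstdio&gt; */
+    /* render the tokens accumulated from the #include line back into
+       an ASCII file name (hypothetical helper): */
+    tokens_to_string(ta, name);
+    scan_input(name);               /* recurse to scan the named file */
+}
+</PRE>
+<P>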
+          The only complication with macro definitions is that the
+          tokens which comprise the body of a macro must not be
+          expanded until the macro is invoked. For that reason, there
+          are two different definitions of token in the token scanner:
+          "simple token" and "expanded token". The difference is that
+          simple tokens are not checked for macro calls. When a macro
+          definition is encountered, the token scanner output is
+          diverted to the token accumulator, so that the body of the
+          macro can be captured and stored.
+<P>
+          When a macro call is recognized, the token scanner must pick
+          up the arguments for the macro. There are three
+          complications here: First, the tokens must not be scanned
+          for macros; second, the scan must distinguish the commas
+          that separate arguments from commas that may be contained
+          inside balanced parentheses within an argument; and finally,
+          leading white space tokens do not count as argument tokens.
+<P>
+<BR>
+
+<H2>     Elements of the Token Scanner  </H2>
+
+          The remainder of this document describes the macro
+          definitions, the structure definitions, the static data
+          definitions, the configuration parameter settings made in
+          the syntax file, and the non-terminal parsing tokens used
+          in the token scanner. In <tt>ts.syn</tt> itself, each function
+          that is defined is preceded by a short explanation of its
+          purpose.
+<P>
+<BR>
+
+<H2>     Macro definitions  </H2>
+<DL>
+<DT>     <tt>GET_CONTEXT</tt>
+   <DD>       The <tt>GET_CONTEXT</tt> macro provides the parser with context
+          information for the input character. (Instead of writing a
+          <tt>GET_CONTEXT</tt> macro, the context information could be stored
+          as part of <tt>GET_INPUT</tt>.)
+
+<DT>     <tt>GET_INPUT</tt>
+    <DD>      The <tt>GET_INPUT</tt> macro provides the next input
+          character for
+          the parser. If the parser used <b>pointer input</b> or <b>event
+          driven</b> input, a <tt>GET_INPUT</tt> macro would not be
+          necessary. The
+          default for <tt>GET_INPUT</tt> would read <tt>stdin</tt> and
+          so is not
+          satisfactory for this parser.
+
+<DT>     <tt>PCB</tt>
+   <DD>       Since the <b>declare pcb</b> switch has been turned off, AnaGram
+          will not define <tt>PCB</tt>. Making the parser control block part of
+          the file descriptor structure simplifies saving and
+          restoring the pcb for nested <tt>#include</tt> files.
+
+<DT>     <tt>SYNTAX_ERROR</tt>
+   <DD>       <tt>ts.syn</tt> defines the <tt>SYNTAX_ERROR</tt> macro,
+          since otherwise the
+          generated parser would use the default definition of
+          <tt>SYNTAX_ERROR</tt>, which would not provide the name of the file
+          currently being read.
+</DL>
+<P>
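+          Taken together, the definitions in <tt>ts.syn</tt> might look
+          roughly like the following. These are illustrative guesses,
+          not the actual definitions; in particular the parser control
+          block field names are assumptions.
+<PRE>
+/* Illustrative guesses only -- see ts.syn for the real definitions. */
+#define PCB          (input.pcb)    /* the pcb lives in the file descriptor */
+#define GET_INPUT    (PCB.input_code = getc(input.file))
+#define GET_CONTEXT  (PCB.input_context = current_location())  /* hypothetical helper */
+#define SYNTAX_ERROR syntax_error_scanning(PCB.error_message)
+</PRE>
+<P>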
+<BR>
+
+<H2>     Local Structure Definitions </H2>
+<DL><DT>     <tt>location</tt>
+   <DD>       <tt>location</tt> is a structure which records a line
+          number and a
+          column number. It is handed to AnaGram with the context type
+          statement found in the configuration segment. AnaGram then
+          declares two member fields of type <tt>location</tt> in the parser
+          control block: <tt>input_context</tt> and a stack, <tt>cs</tt>. In
+          <tt>scan_input()</tt>, the <tt>input_context</tt> variable
+	  is set explicitly
+          with the current line and column number. In <tt>syntax_error()</tt>
+          the <tt>CONTEXT</tt> macro is used to extract the line and column
+          number at which the rule currently being reduced started.
+
+<DT>     <tt>file_descriptor</tt>
+   <DD>       <tt>file_descriptor</tt> contains the information that
+          needs to be
+          saved and restored when nested include files are processed.
+</DL>
+<P>
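+          A sketch of what these two structures might look like (beyond
+          what the text above states, the member names and the name of
+          the generated pcb type are assumptions):
+<PRE>
+/* Illustrative only; needs &lt;cstdio&gt; for FILE. */
+struct location {
+    int line;
+    int column;
+};
+
+struct file_descriptor {
+    char *name;            /* file name, for diagnostics                */
+    FILE *file;            /* stream currently being read               */
+    ts_pcb_type pcb;       /* the parser control block; since declare   */
+                           /* pcb is off, it is embedded here           */
+};
+</PRE>
+<P>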
+<BR>
+
+<H2>     Static Variables  </H2>
+<DL><DT>     <tt>error_modifier</tt>
+   <DD>       Type: <tt>char *</tt><BR>
+
+          The string identified by <tt>error_modifier</tt> is added to the
+          error diagnostic printed by <tt>syntax_error()</tt>. Normally it is
+          an empty string; however, when macros are being expanded it
+          is set so that the diagnostic will specify that the error
+          was found inside a macro expansion.
+
+<DT>     <tt>input</tt>
+    <DD>      Type: <tt>file_descriptor</tt><BR>
+
+          <tt>input</tt> provides the name and stream pointer for the
+          currently active
+          input file.
+
+<DT>     <tt>save_sink</tt>
+    <DD>      Type: <tt>stack&lt;token_sink *&gt;</tt><BR>
+
+          This stack provides for saving and restoring <tt>scanner_sink</tt>
+          when it is necessary to divert the scanner output for
+          dealing with conditional expressions, macro definitions and
+          macro arguments. Actually, a stack is not necessary, since
+          such diversions never nest more than one level deep, but it
+          seems clearer to use a stack.
+</DL>
+<P>
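+          The save-and-restore pattern that uses this stack could be as
+          simple as the sketch below. The helper names are invented, and
+          the stack is assumed to have a <tt>pop()</tt> that returns the saved
+          value.
+<PRE>
+/* Hypothetical helpers illustrating the diversion pattern. */
+static void divert_output(token_sink *temporary)
+{
+    save_sink.push(scanner_sink);   /* remember where output was going */
+    scanner_sink = temporary;       /* e.g. the token accumulator or the
+                                       conditional expression evaluator */
+}
+
+static void restore_output(void)
+{
+    scanner_sink = save_sink.pop(); /* resume normal output */
+}
+</PRE>
+<P>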
+<BR>
+
+<H2>     Configuration Parameters </H2>
+<DL><DT>     <tt>~allow macros</tt>
+   <DD>       This statement turns off the <b>allow macros</b> switch so that
+          AnaGram implements all reduction procedures as explicit
+          function definitions. This simplifies debugging at the cost
+          of a slight performance degradation.
+
+<DT>     <tt>auto resynch</tt>
+    <DD>      This switch turns on automatic resynchronization in case a
+          syntax error is encountered by the token scanner.
+
+<DT>     <tt>context type = location</tt>
+   <DD>       This statement specifies that the generated parser is to
+          track context automatically. The context variables have type
+          <tt>location</tt>. <tt>location</tt> is defined elsewhere to
+          consist of two
+          fields: line number and column number.
+
+<DT>     <tt>~declare pcb</tt>
+   <DD>       This statement tells AnaGram not to declare a parser control
+          block for the parser. The parser control block is declared
+          later as part of the <tt>file_descriptor</tt> structure.
+
+<DT>     <tt>~error frame</tt>
+   <DD>       This turns off the error frame portion of the automatic
+          syntax error diagnostic generator, since the context of the
+          error in the scanner syntax is of little interest. If an
+          error frame were to be used in diagnostics, that of the C
+          parser would be more appropriate.
+
+<DT>     <tt>error trace</tt>
+  <DD>        This turns on the <b>error trace</b> functionality, so
+          that if the token
+          scanner encounters a syntax error it will write an <tt>.etr</tt>
+          file.
+
+<DT>     <tt>line numbers</tt>
+   <DD>       This statement causes AnaGram to include <tt>#line</tt>
+          statements in
+          the parser file so that your compiler can provide
+          diagnostics keyed to your syntax file.
+
+<DT>     <tt>subgrammar</tt>
+   <DD>       The basic token grammar for C is usually implemented using
+          some sort of regular expression parser, such as <tt>lex</tt>, which
+          always looks for the longest match to the regular
+          expression. In no case does the regular expression parser
+          use what follows a match to determine the nature of the
+          match. An LALR parser generator, on the other hand, normally
+          looks not only at the content of a token but also at what
+          follows it. The subgrammar declaration tells AnaGram not to look
+          ahead but to parse these tokens based only on their internal
+          structure. Thus the conflicts that would normally be
+          detected are not seen. To see what happens if lookahead is
+          allowed, simply comment out any one of these subgrammar
+          statements and look at the conflicts that result.
+
+<DT>     <tt>~test range</tt>
+   <DD>       This statement tells AnaGram not to check input characters
+          to see if they are within allowable limits. This checking is
+          not necessary since the token scanner is reading a text file
+          and cannot possibly get an out of range token.
+</DL>
+<P>
+<BR>
+
+<H2>     Scanner Tokens, in alphabetical order  </H2>
+<DL><DT>     any text
+    <DD>      These productions are used when skipping over text. "any
+          text" consists of all characters other than eof, newline and
+          backslash, as well as any character (including newline and
+          backslash) that is quoted with a preceding backslash
+          character.
+
+<DT>     arg element
+    <DD>      An "arg element" is a token in the argument list of a macro.
+          It is essentially the same as "simple token" except that
+          commas must be detected as separators and nested parentheses
+          must be recognized. An "arg element" is either a space or an
+          "initial arg element".
+
+<DT>    character constant
+   <DD>       A "character constant" is a quoted character or escape
+          sequence. The token scanner does not inquire closely into
+          the internal nature of the character constant.
+
+<DT>     comment
+   <DD>       A "comment" consists of a comment head followed by the
+          closing "*/".
+
+<DT>     comment head
+   <DD>       A "comment head" consists of the entire comment up to the
+          closing "*/". If a complete comment is found following a
+          comment head, its treatment depends on whether one believes,
+          with ANSI, that comments should not be nested, or whether
+          one prefers to allow nested comments. Followers of the ANSI
+          principle will want "comment head, comment" to reduce to
+          "comment". Believers in nested comments will want to finish
+          the comment that was in progress when the nested comment was
+          encountered, so they will want "comment head, comment" to
+          reduce to "comment head", which will allow the search for
+          "*/" to continue.
+
+<DT>     conditional block
+   <DD>       A "conditional block" is an #if, #ifdef, or #ifndef line and
+          all following lines through the terminating #endif. If the
+          initial condition turns out to be true, then everything has
+          to be skipped following an #elif or #else line. If the
+          initial condition is false, everything has to be skipped
+          until a true #elif condition or an #else line is found.
+
+ <DT>    confusion
+   <DD>       This token is designed to deal with a curious anomaly of C.
+          Integers which begin with a zero are octal, but floating
+          point numbers may have leading zeroes without losing their
+          fundamental decimal nature. "confusion" is an octal integer
+          that is followed by an eight or a nine. This will become
+          legitimate if eventually a decimal point or an exponent
+          field is encountered.
+
+<DT>     control line
+    <DD>      "control line" consists of any preprocessor control line
+          other than those associated with conditional compilation.
+
+<DT>     decimal constant
+    <DD>      A "decimal constant" is a "decimal integer" and any
+          following qualifiers.
+
+<DT>     decimal integer
+   <DD>       The digits which comprise the integer are pushed onto the
+          string accumulator. When the integer is complete, the string
+          will be entered into the token dictionary and subsequently
+          it will be described by its index in the token dictionary.
+
+<DT>    defined
+   <DD>       See "expanded word". id_macro will recognize "defined" only
+          when the if_clause switch is set.
+
+<DT>     eof
+  <DD>        end of file: equal to the null character.
+
+<DT>     eol
+   <DD>       end of line: a newline and all immediately following white
+          space or newline characters. eol is declared to be a
+          subgrammar since it is used in circumstances where space can
+          legitimately follow, according to the syntax as written.
+
+ <DT>    else if header
+  <DD>       This production is simply a portion of the rule for the
+          #elif statement. It is separated out in order to provide a
+          hook on which to hang the call to init_condition(), which
+          diverts scanner output to the expression evaluator which
+          will calculate the value of the conditional expression.
+
+<DT>     else section
+  <DD>        An "else section" is an #else line and all immediately
+          following complete sections. An "else section" and a "skip
+          else section" are the same except that in an "else section"
+          tokens are sent to the scanner output and in a "skip else
+          section" they are discarded.
+
+ <DT>    endif line
+  <DD>        An "endif line" is simply a line that begins with #endif.
+
+<DT>    expanded token
+  <DD>        The word "token" is used here in the sense of Kernighan and
+          Ritchie, 2nd Edition, Appendix A, p. 191. In this program a
+          "simple token" is one which is simply passed on without
+          regard to macro processing. An "expanded token" is one which
+          has been checked to see if it is a macro identifier and, if
+          so, expanded. "simple tokens" are recognized only in the
+          bodies of macro definitions. Therefore spaces and '#'
+          characters are passed on. For "expanded tokens" they are
+          discarded.
+
+<DT>     expanded word
+   <DD>       This is the treatment of a simple identifier as an "expanded
+          token". "variable", "simple macro", "macro", and "defined"
+          are the various outcomes of semantic analysis of "name
+          string" performed by id_macro(). In this case reserved words
+          and identifiers which are not the names of macros are
+          subsumed under the rubric "variable". These tokens are
+          simply passed on to the scanner output.
+<P>
+          The distinction between "macro" and "simple macro" depends
+          on whether the macro was defined with or without following
+          parentheses. A "simple macro" is expanded by calling
+          expand(). expand() simply serves as a local interface to the
+          expand_text() function defined in <tt>mas.syn</tt>.
+<P>
+          If a "macro" was defined with parentheses but appears bereft
+          of an argument list, it is treated as a simple identifier
+          and passed on to the output.  Otherwise the argument tokens
+          for the macro are gathered and stacked on the token
+          accumulator, using "macro arg list". Finally, the macro is
+          expanded in the same way as a "simple macro". Note that
+          "macro arg list" provides a count of the number of arguments
+          found inside the balanced parentheses.
+<P>
+          If "if_clause" is set, it means that the conditional
+          expression of an #if or #elif line is being evaluated. In
+          this case, the pseudo-function defined() must be recognized
+          to determine whether a macro has or has not been defined.
+          The defined() function returns a "1" or "0" token depending
+          on whether the macro has been defined.
+
+<DT>     exponent
+ <DD>         This is simply the exponent field on a floating point number
+          with optional sign.
+
+
+<DT>     false condition
+  <DD>        The "true condition" and "false condition" tokens are
+          semantically determined. They consist of #if, #ifdef, or
+          #ifndef lines. If the result of the test is true the
+          reduction token is "true condition", otherwise it is "false
+          condition".
+
+<DT>     false else condition
+  <DD>        The "true else condition" and "false else condition" tokens
+          are semantically determined. They consist of an #elif line.
+          If the value of the conditional expression is true the
+          reduction token is "true else condition", otherwise it is
+          "false else condition".
+
+<DT>     false if section
+   <DD>       A "false if section" is a #if, #ifdef, or #ifndef condition
+          that turns out to be false followed by any number, including
+          zero, of complete sections or false #elif condition lines.
+          All of the text within a "false if section" is discarded.
+<DT>     floating qualifier
+   <DD>       These productions are simply the optional qualifiers to
+          specify that a constant is to be treated as a float or as a
+          long double.
+
+<DT>     hex constant
+  <DD>        A "hex constant" is simply a "hex integer" plus any
+          following qualifiers.
+
+<DT>     hex integer
+   <DD>       The digits which comprise the integer are pushed onto the
+          string accumulator. When the integer is complete, the string
+          will be entered into the token dictionary and subsequently
+          it will be described by its index in the token dictionary.
+
+<DT>    if header
+  <DD>        This production is simply a portion of the rule for the #if
+          statement. It is separated out in order to provide a hook on
+          which to hang the call to init_condition(), which diverts
+          scanner output to the expression evaluator which will
+          calculate the value of the conditional expression.
+
+<DT>     initial arg element
+  <DD>        In gathering macro arguments, spaces must not be confused
+          with a true argument. Therefore, the arg element token is
+          broken down into two pieces so that each argument begins
+          with a nonblank token.
+
+<DT>     include header
+  <DD>        "include header" simply represents the initial portion of an
+          #include line and provides a hook for a reduction procedure
+          which diverts scanner output to the token accumulator. This
+          diversion allows the text which follows #include to be
+          scanned for macros and accumulated. The include_file()
+          function will be called to actually identify and scan the
+          specified file.
+
+ <DT>    input file
+   <DD>       This is the grammar, or start token. It describes the entire
+          file as alternating sections and eols, terminated by an eof.
+
+<DT>     integer constant
+   <DD>       These productions simply gather together the varieties of
+          integer constants under one umbrella.
+
+<DT>     integer qualifier
+   <DD>       These productions are simply the optional qualifiers to
+          specify that an "integer constant" is to be treated as
+          unsigned, long, or both.
+
+<DT>     macro
+  <DD>        See "expanded word". id_macro specifies "macro" or "simple
+          macro" depending on whether the named macro was defined with
+          or without following parentheses.
+
+<DT>     macro arg list
+  <DD>        A "macro arg list" can be either empty or can consist of any
+          number of token sequences separated by commas. Commas that
+          are protected by nested parentheses do not separate
+          arguments. Argument strings are accumulated on the token
+          accumulator and counted by "macro args".
+
+ <DT>    macro args
+ <DD>         Each argument to a macro is gathered on a separate level of
+          the token accumulator, so the token accumulator level is
+          incremented before each argument, and the arguments are
+          counted.
+
+<DT>     macro definition header
+  <DD>        The "macro definition header" consists of the #define line
+          up to the beginning of the body text of the macro. It serves
+          as a hook to call init_macro_def() which begins the macro
+          definition and diverts scanner output to the token
+          accumulator. The macro definition will be completed by the
+          save_macro_body() function once the entire macro body has
+          been accumulated. Note that the tokens for the macro body
+          are not examined for macro calls.
+
+<DT>     name string
+   <DD>       "name string" is simply an accumulation on the string
+          accumulator of the characters which make up an identifier.
+
+<DT>     nested elements
+  <DD>       "nested elements" are "arg elements" that are found inside
+          nested parentheses.
+
+<DT>     not control mark
+  <DD>        This consists of any input character excepting eof, newline,
+          backslash and '#', but including any of these if preceded by
+          a backslash. It serves, at the beginning of a line, to
+          distinguish ordinary lines of text from preprocessor control
+          lines.
+
+<DT>     octal integer
+  <DD>        The digits which comprise the integer are pushed onto the
+          string accumulator. When the integer is complete, the string
+          will be entered into the token dictionary and subsequently
+          it will be described by its index in the token dictionary.
+
+<DT>     operator
+ <DD>        This is simply an inventory of all the multi-character
+          operators in C.
+
+<DT>     parameter list
+  <DD>        "parameter list" is simply a wrapper about "names" which
+          allows for empty parentheses. Note that both the "names"
+          token and the "parameter list" tokens provide the count of
+          the number of parameter names found inside the parentheses.
+          The names themselves have been stacked on the string
+          accumulator.
+
+<DT>     qualified real
+  <DD>        This production exists to allow the "floating qualifier" to
+          be appended to a "real constant".
+<DT>     real
+  <DD>        These productions itemize the various ways of writing a
+          floating point number with and without decimal points and
+          with and without exponent fields.
+
+ <DT>    real constant
+  <DD>        This production is simply an envelope to contain "real" and
+          write the output code once instead of four times.
+
+<DT>     section
+ <DD>         This is a logical block of input. It is either a single line
+          of ordinary code, a control line such as #define or #undef,
+          or an entire conditional compilation block, i.e., everything
+          from the #if to the closing #endif. Notice that the eol that
+          terminates a "section" is not part of the "section". The
+          only difference between a "section" and a "skip section" is
+          that in a "section", all tokens are sent to the scanner
+          output while in a "skip section", all input is discarded.
+
+<DT>     separator
+  <DD>        This is simply a gathering together of all the tokens that
+          are neither white space nor identifiers, since they are
+          treated uniformly throughout the grammar.
+
+ <DT>    simple macro
+  <DD>        See "expanded word".
+
+<DT>     simple real
+  <DD>       A "simple real" is one which has a decimal point and has
+          digits on at least one side of the decimal point.
+          Unaccompanied decimal points will be turned away at the
+          door.
+<DT>     simple token
+ <DD>        The word "token" is used here in the sense of Kernighan and
+          Ritchie, 2nd Edition, Appendix A, p. 191. In this program a
+          "simple token" is one which is simply passed on without
+          regard to macro processing. An "expanded token" is one which
+          has been checked to see if it is a macro identifier and, if
+          so, expanded. "simple tokens" are
+          recognized only in the bodies of macro definitions.
+          Therefore spaces and '#' characters are passed on. For
+          "expanded tokens" they are discarded.
+
+<DT>     skip else line
+  <DD>        For purposes of skipping over complete conditional sections,
+          #elif and #else lines are equivalent.
+
+<DT>    skip else section
+  <DD>        A "skip else section" consists of the #else or #elif line
+          following a satisfied conditional and all subsequent
+          sections and #elif and #else lines. All input in the "skip
+          else section" is discarded.
+
+<DT>     skip if section
+ <DD>         A "skip if section" consists of an #if, #ifdef, or #ifndef
+          line, and all following complete "sections" (represented as
+          "skip sections", so their content will be ignored) and #else
+          and #elif lines.
+
+ <DT>    skip line
+   <DD>       When skipping text, we have to distinguish between lines
+          which begin with the control mark ('#') and those which
+          don't so that we deal correctly with nested #endif
+          statements. We wouldn't want to terminate a block of
+          uncompiled code with the wrong #endif.
+
+<DT>     skip section
+  <DD>        A "skip section" is simply a "section" that follows an
+          unsatisfied conditional. In a "skip section", all input is
+          discarded.
+
+<DT>     space
+  <DD>        space consists of either a blank or a comment. If a comment
+          is found, it is replaced with a blank.
+<DT>     simple chars
+   <DD>       "simple chars" consists of the body of a character constant
+          up to but not including the final quote.
+
+<DT>     string chars
+  <DD>        "string chars" consists of the body of a string literal up
+          to but not including the final double quote.
+
+<DT>     string literal
+   <DD>       A "string literal" is simply a quoted string. It is
+          accumulated on the string accumulator.
+
+<DT>     true condition
+  <DD>        The "true condition" and "false condition" tokens are
+          semantically determined. They consist of #if, #ifdef, or
+          #ifndef lines. If the result of the test is true the
+          reduction token is "true condition", otherwise it is "false
+          condition".
+
+<DT>     true else condition
+   <DD>       The "true else condition" and "false else condition" tokens
+          are semantically determined. They consist of an #elif line.
+          If the value of the conditional expression is true the
+          reduction token is "true else condition", otherwise it is
+          "false else condition".
+
+ <DT>    true if section
+   <DD>       A "true if section" is a true #if, #ifdef, or #ifndef,
+          followed by any number of complete sections, including zero.
+          Alternatively, it could be a "false if section" that is
+          followed by a true #elif condition, followed by any number
+          of complete "sections". All input in a "true if section"
+          subsequent to the true condition is passed on to the scanner
+          output.
+
+<DT>     word
+  <DD>        This is the treatment of a simple identifier as a "simple
+          token". The name_token() procedure is called to pop the name
+          string from the string accumulator, identify it in the token
+          dictionary and assign a token_id to it by checking to see if
+          it is a reserved word.
+
+<DT>     variable
+  <DD>        See "expanded word".
+
+<DT>     ws
+   <DD>       The definition for ws as space... simply allows a briefer
+          reference in those places in the grammar where it is
+          necessary to skip over white space.
+</DL>
+<P>
+<BR>
+
+
+<IMG ALIGN="bottom" SRC="../../images/rbline6j.gif" ALT="----------------------"
+      WIDTH=1010 HEIGHT=2 >
+<P>
+<IMG ALIGN="right" SRC="../../images/pslrb6d.gif" ALT="Parsifal Software"
+                WIDTH=181 HEIGHT=25>
+<BR CLEAR="right">
+
+<P>
+Back to :
+<A HREF="../../index.html">Index</A> |
+<A HREF="index.html">Macro preprocessor overview</A>
+<P>
+
+<ADDRESS><FONT SIZE="-1">
+                  AnaGram parser generator - examples<BR>
+                  Token Scanner - Macro preprocessor and C Parser <BR>
+                  Copyright &copy; 1993-1999, Parsifal Software. <BR>
+                  All Rights Reserved.<BR>
+</FONT></ADDRESS>
+
+</BODY>
+</HTML>
+