% doc/manual/pcb.tex @ 15:f5acaf0c8a29
% author: David A. Holland; date: Tue, 31 May 2022 01:00:55 -0400; parent: 13d2b8934445
\chapter{Parser Control Block}
\index{PCB}\index{Parser control block}
A \agterm{parser control block} is a structure which contains all of
the data necessary to describe the instantaneous state of a parser.
The \agcode{typedef} statement which defines the structure is included
in the \index{Header file}\index{File}header file for your parser.
AnaGram creates the name of the \index{Name}data type for the structure
by appending \agcode{{\us}pcb{\us}type} to the parser name.
% XXX: does \index{Name} belong?

If the \index{Declare pcb}\index{Configuration switches}\agparam{declare pcb}
configuration switch is on (its default state), AnaGram declares a
parser control block for you at the beginning of your parser file.
AnaGram determines the \index{Name}name of the parser control block by
appending \agcode{{\us}pcb} to the parser name.
AnaGram also defines the macro \index{PCB}\index{Macros}\agcode{PCB} as
a shorthand for the control block, for use within the parser.

If you wish to declare your own parser control block, you must include
the header file for your parser before your declaration.
Then declare a parser control block and define \agcode{PCB} to refer to
the control block you have declared.
For example, if your grammar is called \agcode{widget}, you would write
the following statements in your embedded C to declare a parser control
block named \agcode{widget{\us}control}:
\begin{indentingcode}{0.4in}
\#include "widget.h"
widget{\us}pcb{\us}type widget{\us}control;
\#define PCB widget{\us}control
\end{indentingcode}

The remainder of this appendix describes the fields of the parser
control block that are most likely to interest you.

\index{column}\index{PCB}
\paragraph{\agcode{int column;}}
\agcode{PCB.column} contains the column number of the current character
in your input.
Line and column numbers are tracked only if the
\index{Lines and columns}\index{Configuration switches}
\agparam{lines and columns} configuration switch has been set.
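
As an illustration of the declaration above and of what \agcode{PCB.column} records, the following sketch declares its own control block and advances line and column numbers by hand. It is a minimal sketch under stated assumptions: the \agcode{widget{\us}pcb{\us}type} structure is a hypothetical stand-in for the one AnaGram writes into \agcode{widget.h}, and the helpers \agcode{reset{\us}position} and \agcode{take{\us}char}, as well as the 1-based numbering, are assumptions for illustration. When \agparam{lines and columns} is set, the generated parser does this bookkeeping itself.

```c
#include <assert.h>

/* Hypothetical stand-in for the typedef AnaGram writes into widget.h.
   In a real program you would #include "widget.h" instead; the real
   structure has many more fields. */
typedef struct {
    int input_code;  /* current input character */
    int line;        /* line number of the current character */
    int column;      /* column number of the current character */
} widget_pcb_type;

/* Declare the control block yourself and point PCB at it. */
widget_pcb_type widget_control;
#define PCB widget_control

/* Previous character; -1 means "no character read yet". */
static int prev_char = -1;

/* Hypothetical helper: start counting at line 1, column 1. */
void reset_position(void)
{
    PCB.input_code = -1;
    PCB.line = 1;
    PCB.column = 1;
    prev_char = -1;
}

/* Hypothetical helper: make c the current character and update
   PCB.line and PCB.column so that they give c's position.  A newline
   belongs to the line it ends; the character after it starts the
   next line. */
void take_char(int c)
{
    assert(PCB.line >= 1 && PCB.column >= 1);
    if (prev_char == '\n') {
        PCB.line++;
        PCB.column = 1;
    } else if (prev_char != -1) {
        PCB.column++;
    }
    PCB.input_code = c;
    prev_char = c;
}
```

Because \agcode{PCB} is only a macro, every reference to \agcode{PCB.column} above is a reference to \agcode{widget{\us}control.column}.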

\index{cs}\index{PCB}
\paragraph{\agcode{\textit{context-type} cs[];}}
\agcode{PCB.cs} is your \index{Context stack}\index{Stack}context stack.
\agcode{PCB.cs} is defined only if you have assigned a value to the
\index{Context type}\index{Configuration parameters}\agparam{context type}
configuration parameter.

\index{error{\us}frame{\us}ssx}\index{PCB}
\paragraph{\agcode{int error{\us}frame{\us}ssx;}}
\agcode{PCB.error{\us}frame{\us}ssx} is a field to which your error
handling routines may refer.
When your \index{SYNTAX{\us}ERROR}\index{Macros}\agcode{SYNTAX{\us}ERROR}
macro is called, if you have set both the
\index{Diagnose errors}\index{Configuration switches}\agparam{diagnose errors}
and \index{Error frame}\index{Configuration switches}\agparam{error frame}
configuration switches, \agcode{PCB.error{\us}frame{\us}ssx} contains the
value of the parser stack index at the beginning of the frame token
identified by \agcode{PCB.error{\us}frame{\us}token}.
For example, if you fail to close a comment in a syntax file, AnaGram
encounters an illegal end of file inside the comment.
In this situation, \agcode{PCB.error{\us}frame{\us}token} is the comment
token, and \agcode{PCB.error{\us}frame{\us}ssx} gives the parser stack
depth at the beginning of the comment.

\index{error{\us}frame{\us}token}\index{PCB}
\paragraph{\agcode{int error{\us}frame{\us}token;}}
\agcode{PCB.error{\us}frame{\us}token} is a field to which your error
handling routines may refer.
When your \index{SYNTAX{\us}ERROR}\index{Macros}\agcode{SYNTAX{\us}ERROR}
macro is called, if you have set both
\index{Diagnose errors}\index{Configuration switches}\agparam{diagnose errors}
and \index{Error frame}\index{Configuration switches}\agparam{error frame},
it contains the token number of the frame token, that is, a token which
identifies the context in which the error occurred.
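
To show how these fields fit together, here is a sketch of the formatting step a \agcode{SYNTAX{\us}ERROR} handler might perform. It assumes a hypothetical context type holding a line and a column, a trimmed-down control block named \agcode{demo{\us}pcb{\us}type}, and a helper \agcode{format{\us}syntax{\us}error}; none of these come from AnaGram. Since the parser stacks share the index \agcode{PCB.ssx}, \agcode{cs[error{\us}frame{\us}ssx]} is taken here to be the context saved where the frame token began, for example the position of an unterminated comment.

```c
#include <assert.h>
#include <stdio.h>

#define DEMO_STACK_SIZE 64

/* Hypothetical context type: where a token started. */
typedef struct { int line, column; } location;

/* Hypothetical, trimmed-down control block. */
typedef struct {
    location cs[DEMO_STACK_SIZE]; /* context stack, indexed like ss and vs */
    int ssx;                      /* current parser stack index */
    int error_frame_ssx;          /* stack index where the frame token began */
    int error_frame_token;        /* token number of the frame token */
    const char *error_message;    /* diagnostic built by the parser */
    int line, column;             /* position of the offending character */
} demo_pcb_type;

/* Format a diagnostic reporting both where the error was detected and
   where the enclosing frame token began.  Returns snprintf's result. */
int format_syntax_error(const demo_pcb_type *pcb, char *buf, size_t size)
{
    assert(pcb->error_frame_ssx >= 0 &&
           pcb->error_frame_ssx < DEMO_STACK_SIZE);
    const location *start = &pcb->cs[pcb->error_frame_ssx];
    return snprintf(buf, size, "%d:%d: %s (token %d began at %d:%d)",
                    pcb->line, pcb->column,
                    pcb->error_message ? pcb->error_message : "syntax error",
                    pcb->error_frame_token, start->line, start->column);
}
```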

\index{error{\us}message}\index{PCB}
\paragraph{\agcode{char *error{\us}message;}}
\agcode{PCB.error{\us}message} is a field to which your error handling
routines may refer.
If you have set the
\index{Diagnose errors}\index{Configuration switches}\agparam{diagnose errors}
configuration switch, then on encountering a syntax error your parser
creates a string containing an appropriate diagnostic message and
stores a pointer to it in \agcode{PCB.error{\us}message}.

\index{PCB}\index{exit{\us}flag}
\paragraph{\agcode{int exit{\us}flag;}}
\agcode{PCB.exit{\us}flag} contains a code value which indicates whether
the parser is still running or has terminated.
If the parser has terminated, \agcode{PCB.exit{\us}flag} indicates why.
Symbolic names for these \index{Exit codes}exit codes are defined in the
header file for your parser.
The values are as follows:

\begin{tabular}{ll}
\agcode{AG{\us}RUNNING{\us}CODE} & 0\\
\agcode{AG{\us}SUCCESS{\us}CODE} & 1\\
\agcode{AG{\us}SYNTAX{\us}ERROR{\us}CODE} & 2\\
\agcode{AG{\us}REDUCTION{\us}ERROR{\us}CODE} & 3\\
\agcode{AG{\us}STACK{\us}ERROR{\us}CODE} & 4\\
\agcode{AG{\us}SEMANTIC{\us}ERROR{\us}CODE} & 5
\end{tabular}

\index{PCB}\index{input{\us}code}
\paragraph{\agcode{int input{\us}code;}}
\agcode{PCB.input{\us}code} contains the current input character or, if
your \index{GET{\us}INPUT}\index{Macros}\agcode{GET{\us}INPUT} macro
supplies token numbers directly, the current token number.
If you write your own \agcode{GET{\us}INPUT} macro, you must store the
input character or token number you read into \agcode{PCB.input{\us}code}.
If you have configured your parser to be
\index{Event driven}\index{Configuration switches}\agparam{event driven},
you must store the input character or token number of each token in
turn into \agcode{PCB.input{\us}code} before you call your parser to
process it.
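
The following sketch shows the event-driven pattern described above: store each input character in \agcode{input{\us}code}, call the parser, and stop once \agcode{exit{\us}flag} is no longer \agcode{AG{\us}RUNNING{\us}CODE}. The exit-code constants repeat the values in the table. Everything else is hypothetical and stands in for a real AnaGram-generated parser and its header: the mock parser \agcode{demo{\us}parse}, the driver \agcode{feed{\us}string}, and the use of 0 as the end-of-input code.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Stand-ins for the exit codes defined in the parser's header;
   the values follow the table above. */
enum {
    AG_RUNNING_CODE = 0,
    AG_SUCCESS_CODE = 1,
    AG_SYNTAX_ERROR_CODE = 2,
    AG_REDUCTION_ERROR_CODE = 3,
    AG_STACK_ERROR_CODE = 4,
    AG_SEMANTIC_ERROR_CODE = 5
};

/* Hypothetical control block with only the fields used here. */
typedef struct {
    int input_code;
    int exit_flag;
    const char *error_message;
} demo_pcb_type;

/* Mock event-driven "parser" (NOT AnaGram output): accepts digits,
   succeeds on input code 0, and reports a syntax error otherwise. */
static void demo_parse(demo_pcb_type *pcb)
{
    if (pcb->input_code == 0) {
        pcb->exit_flag = AG_SUCCESS_CODE;
    } else if (!isdigit(pcb->input_code)) {
        pcb->exit_flag = AG_SYNTAX_ERROR_CODE;
        pcb->error_message = "Unexpected input";
    }
}

/* Driver: store each character (including the terminating NUL) in
   input_code, call the parser, and stop as soon as it terminates. */
int feed_string(demo_pcb_type *pcb, const char *text)
{
    assert(text != NULL);
    pcb->exit_flag = AG_RUNNING_CODE;
    pcb->error_message = NULL;
    for (size_t i = 0; pcb->exit_flag == AG_RUNNING_CODE; i++) {
        pcb->input_code = (unsigned char)text[i];
        demo_parse(pcb);
    }
    return pcb->exit_flag;
}
```

After a syntax error the driver stops with \agcode{input{\us}code} still holding the offending character, which is convenient for reporting alongside \agcode{error{\us}message}.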

\index{PCB}\index{input{\us}context}
\paragraph{\agcode{\textit{context-type} input{\us}context;}}
\agcode{PCB.input{\us}context} is a field which AnaGram adds to the
parser control block structure when you assign a value to the
\index{Context type}\index{Configuration parameters}\agparam{context type}
configuration parameter.
If you choose, you can write your
\index{GET{\us}INPUT}\index{Macros}\agcode{GET{\us}INPUT} macro so that
it stores the context value in \agcode{PCB.input{\us}context}.
The default definition of
\index{GET{\us}CONTEXT}\index{Macros}\agcode{GET{\us}CONTEXT} will then
stack the context value at the appropriate time.
You can think of \agcode{PCB.input{\us}context} as a temporary
``parking place'' for the context value.

\index{PCB}\index{input{\us}value}
\paragraph{\agcode{\textit{input-value-type} input{\us}value;}}
\agcode{PCB.input{\us}value} is a field in the parser control block
which is used to store the value of the input token.
If you write your own \index{Macros}\index{GET{\us}INPUT}\agcode{GET{\us}INPUT}
macro or use \index{Event driven}\index{Configuration switches}\agparam{event driven}
input, and you have set the
\index{Input values}\index{Configuration switches}\agparam{input values}
configuration switch, you should store the value of the input character
or token in \agcode{PCB.input{\us}value}.

\index{PCB}\index{line}
\paragraph{\agcode{int line;}}
\agcode{PCB.line} contains the line number of the current character in
your input.
Line and column numbers are tracked only if the
\index{Lines and columns}\index{Configuration switches}
\agparam{lines and columns} configuration switch has been set.

\index{PCB}\index{pointer}
\paragraph{\agcode{\textit{pointer-type} pointer;}}
\agcode{PCB.pointer} is included in the parser control block for your
parser if you have set the
\index{Pointer input}\index{Configuration switches}\agparam{pointer input}
configuration switch.
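
A hand-written \agcode{GET{\us}INPUT} that supplies token numbers directly might look like the following sketch. For each token it stores the token number in \agcode{input{\us}code}, the token's value in \agcode{input{\us}value}, and the position where the token starts in \agcode{input{\us}context}. All of these are hypothetical and chosen for illustration: the token numbers, the \agcode{location} context type, the control block \agcode{demo{\us}pcb{\us}type}, and the functions \agcode{demo{\us}get{\us}input} and \agcode{demo{\us}set{\us}source}. A real parser would use the token numbers and control block that AnaGram generates.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical token numbers. */
enum { NUMBER_TOKEN = 1, PLUS_TOKEN = 2, EOF_TOKEN = 3 };

/* Hypothetical context type. */
typedef struct { int line, column; } location;

/* Hypothetical control block with only the fields used here. */
typedef struct {
    int input_code;          /* token number, supplied directly */
    long input_value;        /* semantic value of the token */
    location input_context;  /* "parking place" for the context value */
} demo_pcb_type;

static demo_pcb_type demo_pcb;
#define PCB demo_pcb

static const char *src = "";      /* hypothetical input cursor */
static location here = {1, 1};    /* position of *src */

void demo_set_source(const char *text)
{
    src = text;
    here.line = 1;
    here.column = 1;
}

/* Hypothetical lexer used as GET_INPUT.  Handles only digits, '+',
   and spaces on a single line. */
void demo_get_input(void)
{
    while (*src == ' ') { src++; here.column++; }
    PCB.input_context = here;          /* where this token starts */
    if (isdigit((unsigned char)*src)) {
        long v = 0;
        while (isdigit((unsigned char)*src)) {
            v = v * 10 + (*src++ - '0');
            here.column++;
        }
        PCB.input_code = NUMBER_TOKEN;
        PCB.input_value = v;
    } else if (*src == '+') {
        src++;
        here.column++;
        PCB.input_code = PLUS_TOKEN;
        PCB.input_value = 0;
    } else {
        assert(*src == '\0');
        PCB.input_code = EOF_TOKEN;
        PCB.input_value = 0;
    }
}
#define GET_INPUT demo_get_input()
```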
The type of \agcode{PCB.pointer} is determined by the
\index{Pointer type}\index{Configuration parameters}\agparam{pointer type}
configuration parameter, which defaults to \agcode{unsigned char *}.
Your main program should set \agcode{PCB.pointer} before it calls your
parser.
Thereafter, your parser advances it as it reads input.
When you are executing a reduction procedure or a
\index{SYNTAX{\us}ERROR}\index{Macros}\agcode{SYNTAX{\us}ERROR} macro,
\agcode{PCB.pointer} always points to the next input character to be
read.

\index{PCB}\index{reduction{\us}token}
\paragraph{\agcode{int reduction{\us}token;}}
Whenever your parser executes a reduction procedure,
\agcode{PCB.reduction{\us}token} contains the number of the token to
which the rule being reduced will reduce.
If your grammar uses semantically determined productions, your
reduction procedure may change this value.
Before calling your reduction procedure, your parser sets this field to
the token number of the default reduction token, i.e., the first token
in the reduction token list for the production being reduced.
If the reduction procedure establishes that a different reduction token
is appropriate, it should store that token's number in
\agcode{PCB.reduction{\us}token}.
The easiest way to do this is to use the
\index{CHANGE{\us}REDUCTION}\index{Macros}\agcode{CHANGE{\us}REDUCTION}
macro.

\index{PCB}\index{sn}
\paragraph{\agcode{int sn;}}
\agcode{PCB.sn} always contains the current \index{State}\index{Number}state
number of your parser.

\index{PCB}\index{ss}
\paragraph{\agcode{int ss[];}}
\agcode{PCB.ss} is the
\index{Parser state stack}\index{State stack}\index{Stack}state stack
for your parser.
Before every shift action, the current state number, \agcode{PCB.sn}, is
stored in \agcode{PCB.ss[PCB.ssx]}, and \agcode{PCB.ssx} is then
incremented.
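
Semantically determined productions are the usual reason to change \agcode{PCB.reduction{\us}token}. The sketch below assumes a hypothetical production whose reduction token list is \agcode{variable}, \agcode{type name}. The parser presets the default (first) token, and the reduction procedure switches to the type-name token when the identifier is a known type. The token numbers, the type-name table, and the function \agcode{reduce{\us}name} are invented for illustration. The field is assigned directly here rather than through \agcode{CHANGE{\us}REDUCTION}, whose exact argument form is not reproduced.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token numbers for the two possible reduction tokens. */
enum { VARIABLE_TOKEN = 10, TYPE_NAME_TOKEN = 11 };

/* Hypothetical control block holding just the field used here. */
typedef struct {
    int reduction_token;
} demo_pcb_type;

static demo_pcb_type demo_pcb;
#define PCB demo_pcb

/* Hypothetical symbol table of names declared as types. */
static const char *const type_names[] = { "size_t", "FILE", NULL };

static int is_type_name(const char *name)
{
    for (size_t i = 0; type_names[i] != NULL; i++)
        if (strcmp(type_names[i], name) == 0)
            return 1;
    return 0;
}

/* Reduction procedure for a hypothetical production whose reduction
   token list is (variable, type name).  The parser has already set
   PCB.reduction_token to the default, VARIABLE_TOKEN. */
const char *reduce_name(const char *name)
{
    assert(PCB.reduction_token == VARIABLE_TOKEN);
    if (is_type_name(name))
        PCB.reduction_token = TYPE_NAME_TOKEN;  /* override the default */
    return name;  /* semantic value of the reduced token */
}
```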

\index{PCB}\index{ssx}
\paragraph{\agcode{int ssx;}}
\agcode{PCB.ssx} contains the parser \index{Stack}stack index for your
parser.
It is incremented on every shift action.
On every reduction action, the length of the grammar rule being
reduced, that is, the number of tokens in the rule, is subtracted from
\agcode{PCB.ssx}.

\index{PCB}\index{token{\us}number}
\paragraph{\agcode{int token{\us}number;}}
\agcode{PCB.token{\us}number} contains the internal
\index{Token number}\index{Number}token number of the current input
token.
If you are not supplying token numbers directly, it is obtained by
using the actual input character to index the token conversion array,
\agcode{ag{\us}tcv}.
Your parser maintains the value of \agcode{PCB.token{\us}number}
automatically.
If you supply token numbers yourself, always store them in
\agcode{PCB.input{\us}code}, not in \agcode{PCB.token{\us}number}.

\index{vs}\index{PCB}
\paragraph{\agcode{\textit{value-stack-type} vs[];}}
\agcode{PCB.vs} is the
\index{Parser value stack}\index{Stack}\index{Value stack}value stack
for your parser.
The semantic values of the tokens identified by the parser are stored
in the value \index{Stack}\index{Value stack}stack.
Like the other parser stacks, the value stack is indexed by
\agcode{PCB.ssx}.
When your parser is executing a reduction procedure,
\agcode{PCB.vs[PCB.ssx]} contains the semantic value of the first token
in the grammar rule you are reducing, \agcode{PCB.vs[PCB.ssx+1]}
contains the second, and so forth.
The return value from your reduction procedure is then stored in
\agcode{PCB.vs[PCB.ssx]}.
\index{{\us}dol{\us}vt}
The elements of \agcode{PCB.vs} are of type \agcode{\${\us}vt}, where
``\agcode{\$}'' represents the name of your syntax file.
AnaGram defines \agcode{\${\us}vt} so that it is large enough to hold
the semantic value of any of the tokens declared in your grammar.
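
The stack discipline described for \agcode{PCB.ss}, \agcode{PCB.ssx}, and \agcode{PCB.vs} can be modeled in a few lines. This is a simplified model, not AnaGram's generated code: values are plain \agcode{int}s, the helpers \agcode{demo{\us}shift} and \agcode{demo{\us}reduce} are hypothetical, and the last step of \agcode{demo{\us}reduce}, which pushes the reduced token and moves to a new state, is an assumption about how an LR parser continues after a reduction. It shows why, inside a reduction procedure, \agcode{vs[ssx]} holds the first token of the rule and \agcode{vs[ssx+1]} the second.

```c
#include <assert.h>

#define DEMO_STACK 32

/* Hypothetical control block; the value stack type is simplified to int. */
typedef struct {
    int sn;              /* current state number */
    int ss[DEMO_STACK];  /* state stack */
    int vs[DEMO_STACK];  /* value stack */
    int ssx;             /* stack index shared by ss and vs */
} demo_pcb_type;

/* Shift: save the current state, push the token's value, advance ssx. */
void demo_shift(demo_pcb_type *p, int value, int next_state)
{
    assert(p->ssx < DEMO_STACK);
    p->ss[p->ssx] = p->sn;
    p->vs[p->ssx] = value;
    p->ssx++;
    p->sn = next_state;
}

/* Reduce a rule of `length` tokens (length >= 1 in this model).  After
   ssx -= length, vs[ssx] is the first token's value, vs[ssx + 1] the
   second, and so on; the procedure's result replaces vs[ssx].  The
   reduced token is then pushed and the parser moves to goto_state. */
void demo_reduce(demo_pcb_type *p, int length,
                 int (*proc)(const int *v), int goto_state)
{
    assert(length >= 1 && length <= p->ssx);
    p->ssx -= length;
    p->sn = p->ss[p->ssx];               /* state before the rule began */
    p->vs[p->ssx] = proc(&p->vs[p->ssx]);
    p->ss[p->ssx] = p->sn;               /* push the reduced token */
    p->ssx++;
    p->sn = goto_state;
}

/* Reduction procedure for a rule like  expr + term:  v[1] is the '+'. */
int add_proc(const int *v)
{
    return v[0] + v[2];
}
```

For example, shifting the values 3, \agcode{'+'}, and 4 and then reducing the three-token rule with \agcode{add{\us}proc} leaves 7 in \agcode{vs[0]} and \agcode{ssx} equal to 1.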