diff doc/manual/pcb.tex @ 0:13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
author | David A. Holland |
---|---|
date | Sat, 22 Dec 2007 17:52:45 -0500 |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/doc/manual/pcb.tex Sat Dec 22 17:52:45 2007 -0500 @@ -0,0 +1,249 @@ +\chapter{Parser Control Block} + +\index{PCB}\index{Parser control block} +A \agterm{parser control block} is a structure which contains all of +the data necessary to describe the instantaneous state of a parser. +The \agcode{typedef} statement which defines the structure is included +in the \index{Header file}\index{File}header file for your parser. +AnaGram creates the name of the \index{Name}data type for the +structure by appending \agcode{{\us}pcb{\us}type} to the parser name. + +% XXX: does \index{Name} belong? +If the +\index{Declare pcb}\index{Configuration switches}\agparam{declare pcb} +configuration switch is on, its default state, AnaGram will declare a +parser control block for you at the beginning of your parser file. +AnaGram will determine the \index{Name}name of the parser control +block by appending \agcode{{\us}pcb} to the parser name. AnaGram will +also define the macro \index{PCB}\index{Macros}\agcode{PCB} as a +shorthand notation for use within the parser. + +If you wish to declare your own parser control block, you must include +the header file for your parser before your declaration. Then you +declare a parser control block and define \agcode{PCB} to refer to the +control block you have declared. + +Suppose your grammar is called \agcode{widget}. You would then write +the following statements in your embedded C in order to declare a +parser control block named \agcode{widget{\us}control}: + +\begin{indentingcode}{0.4in} +\#include "widget.h" +widget{\us}pcb{\us}type widget{\us}control; +\#define PCB widget{\us}control +\end{indentingcode} + +The remainder of this appendix describes fields in the parser control +block that may interest the user: + +\index{column}\index{PCB} +\paragraph{\agcode{int column;}} +\agcode{PCB.column} keeps track of the column number of the current +character in your input. 
Line and column numbers are tracked only if +the \index{Lines and columns}\index{Configuration switches} +\agparam{lines and columns} configuration switch has been set. + +\index{cs}\index{PCB} +\paragraph{\agcode{\textit{context-type} cs[];}} +\agcode{PCB.cs} is your \index{Context stack}\index{Stack}context +stack. \agcode{cs} will be defined only if you have assigned a value +to the configuration parameter +\index{Context type}\index{Configuration parameters}\agparam{context type}. + +\index{error{\us}frame{\us}ssx}\index{PCB} +\paragraph{\agcode{int error{\us}frame{\us}ssx;}} +\agcode{PCB.error{\us}frame{\us}ssx} is a field to which your error handling +routines may refer. When your +\index{SYNTAX{\us}ERROR}\index{Macros}\agcode{SYNTAX{\us}ERROR} macro is +called, if you have set both the +\index{Diagnose errors}\index{Configuration switches}\agparam{diagnose errors} +and +\index{Error frame}\index{Configuration switches}\agparam{error frame} +configuration switches, +\agcode{PCB.error{\us}frame{\us}ssx} will contain the value of the parser +stack index at the beginning of the frame token identified by +\agcode{PCB.error{\us}frame{\us}token}. For example, if in a syntax file, +you fail to close a comment, AnaGram will encounter an illegal end of +file in the comment. In this situation, \agcode{error{\us}frame{\us}token} is +the comment token, and \agcode{PCB.error{\us}frame{\us}ssx} gives the parser +stack depth at the beginning of the comment. + +\index{error{\us}frame{\us}token}\index{PCB} +\paragraph{\agcode{int error{\us}frame{\us}token;}} +\agcode{PCB.error{\us}frame{\us}token} is a field to which +your error handling routines may refer. 
When your +\index{SYNTAX{\us}ERROR}\index{Macros}\agcode{SYNTAX{\us}ERROR} macro is +called, if you have set both +\index{Diagnose errors}\index{Configuration switches}\agparam{diagnose errors} +and +\index{Error frame}\index{Configuration switches}\agparam{error frame}, +it will contain the token number of the frame token, a token which +identifies the context of the error. + +\index{error{\us}message}\index{PCB} +\paragraph{\agcode{char *error{\us}message;}} +\agcode{PCB.error{\us}message} is a field to which your error handling +procedures may refer. If you have set the +\index{Diagnose errors}\index{Configuration switches}\agparam{diagnose errors} +configuration switch, on encountering a syntax error your parser will +create a string containing an appropriate diagnostic message and store +a pointer to it into \agcode{PCB.error{\us}message}. + +\index{PCB}\index{exit{\us}flag} +\paragraph{\agcode{int exit{\us}flag;}} +\agcode{PCB.exit{\us}flag} contains a code value which indicates +whether the parser is still running or whether it has terminated. If +the parser has terminated, \agcode{PCB.exit{\us}flag} indicates the +reason the parse has terminated. Mnemonic values for these \index{Exit +codes}exit codes are defined in the header file for your parser. The +values are as follows: +% XXX s/mnemonic/symbolic/ + +\begin{tabular}{ll} +\agcode{AG{\us}RUNNING{\us}CODE} & 0\\ +\agcode{AG{\us}SUCCESS{\us}CODE} & 1\\ +\agcode{AG{\us}SYNTAX{\us}ERROR{\us}CODE} & 2\\ +\agcode{AG{\us}REDUCTION{\us}ERROR{\us}CODE} & 3\\ +\agcode{AG{\us}STACK{\us}ERROR{\us}CODE} & 4\\ +\agcode{AG{\us}SEMANTIC{\us}ERROR{\us}CODE} & 5 +\end{tabular} + +\index{PCB}\index{input{\us}code} +\paragraph{\agcode{int input{\us}code;}} +\agcode{PCB.input{\us}code} contains the current input character, or +the token number, if your +\index{GET{\us}INPUT}\index{Macros}\agcode{GET{\us}INPUT} macro supplies +token numbers directly. 
+ +If you write your own \agcode{GET{\us}INPUT} macro, you must make sure +that you store the input character or token number you get into +\agcode{PCB.input{\us}code}. + +If you have configured your parser to be +\index{Event driven}\index{Configuration switches}\agparam{event driven}, +you must store the input character or token number for each token in +turn into \agcode{PCB.input{\us}code} before you call your parser to +process it. + +\index{PCB}\index{input{\us}context} +\paragraph{\agcode{\textit{context-type} input{\us}context;}} +\agcode{PCB.input{\us}context} is a field which AnaGram adds to the +definition of the parser control block structure when you assign a +value to the +\index{Context type}\index{Configuration parameters}\agparam{context type} +configuration parameter. If you choose, you can +write your \index{GET{\us}INPUT}\index{Macros}\agcode{GET{\us}INPUT} macro +so that it stores the context value in \agcode{PCB.input{\us}context}. +The default definition for +\index{GET{\us}CONTEXT}\index{Macros}\agcode{GET{\us}CONTEXT} will +then stack the context value at the appropriate time. You can think +of \agcode{PCB.input{\us}context} as a sort of temporary ``parking +place'' for the context value. + +\index{PCB}\index{input{\us}value} +\paragraph{\agcode{\textit{input-value-type} input{\us}value;}} +\agcode{PCB.input{\us}value} is a field in the parser control block which +is used to store the value of the input token. + +If you write your own +\index{Macros}\index{GET{\us}INPUT}\agcode{GET{\us}INPUT} macro or use +\index{Event driven}\index{Configuration switches}\agparam{event driven} +input, and you have set the +\index{Input values}\index{Configuration switches}\agparam{input values} +configuration switch, you should make sure that you store the value of +the input character or token into \agcode{PCB.input{\us}value}. 
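The event-driven protocol described above (store each input in `PCB.input_code`, and its value in `PCB.input_value`, before calling the parser) can be sketched as follows. This is only an illustration: `demo_pcb_type`, `demo_init`, `demo_parse_event`, `demo_feed` and the `sum` field are hypothetical stand-ins, not what AnaGram generates, and the toy "parser" merely accepts digits. Only the field names and the exit code values are taken from the text.

```c
#include <stddef.h>

/* Exit codes, using the values listed in the exit_flag table. */
enum { AG_RUNNING_CODE = 0, AG_SUCCESS_CODE = 1, AG_SYNTAX_ERROR_CODE = 2 };

/* Hypothetical stand-in for a generated parser control block; only the
   fields discussed in this section are modeled. */
typedef struct {
    int input_code;   /* character or token number for the next event */
    int input_value;  /* semantic value that accompanies input_code */
    int exit_flag;    /* AG_RUNNING_CODE until the parse ends */
    int sum;          /* toy parser state (assumption): sum of digit values */
} demo_pcb_type;

static demo_pcb_type demo_pcb;
#define PCB demo_pcb

/* Stand-in for parser initialization. */
static void demo_init(void) {
    PCB.exit_flag = AG_RUNNING_CODE;
    PCB.sum = 0;
}

/* Stand-in for one call to an event-driven parser. It consumes exactly the
   input the caller stored in the PCB: digits are accepted, 0 ends the input,
   anything else is a syntax error. */
static void demo_parse_event(void) {
    if (PCB.exit_flag != AG_RUNNING_CODE) return;
    if (PCB.input_code == 0) {
        PCB.exit_flag = AG_SUCCESS_CODE;
    } else if (PCB.input_code >= '0' && PCB.input_code <= '9') {
        PCB.sum += PCB.input_value;
    } else {
        PCB.exit_flag = AG_SYNTAX_ERROR_CODE;
    }
}

/* Driver: for each input in turn, fill in the PCB, then call the parser.
   The loop stops as soon as exit_flag leaves AG_RUNNING_CODE, so it never
   reads past the terminating 0. */
static int demo_feed(const char *text) {
    size_t i = 0;
    demo_init();
    while (PCB.exit_flag == AG_RUNNING_CODE) {
        unsigned char c = (unsigned char)text[i++];
        PCB.input_code = c;        /* must be stored before the call */
        PCB.input_value = c - '0'; /* only meaningful for digits here */
        demo_parse_event();
    }
    return PCB.exit_flag;
}
```

Calling `demo_feed("123")` drives the stand-in parser to completion one input at a time; the caller, not the parser, owns the input loop, which is the point of the event-driven arrangement.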
+ +\index{PCB}\index{line} +\paragraph{\agcode{int line;}} +\agcode{PCB.line} contains the line number of the current character in +your input. Line and column numbers are tracked only if the +\index{Lines and columns}\index{Configuration switches} +\agparam{lines and columns} configuration switch has been set. + +\index{PCB}\index{pointer} +\paragraph{\agcode{\textit{pointer-type} pointer;}} +\agcode{PCB.pointer} will be included in the parser control block for +your parser if you have set the +\index{Pointer input}\index{Configuration switches}\agparam{pointer input} +configuration switch. The type of \agcode{PCB.pointer} is determined +by the +\index{Pointer type}\index{Configuration parameters}\agparam{pointer type} +configuration parameter, which defaults to \agcode{unsigned char *}. +Your main program should set \agcode{PCB.pointer} before it calls your +parser. Thereafter, your parser will increment it appropriately. +When you are executing a reduction procedure or a +\index{SYNTAX{\us}ERROR}\index{Macros}\agcode{SYNTAX{\us}ERROR} macro, +\agcode{PCB.pointer} will always point to the next input character to +be read. + +\index{PCB}\index{reduction{\us}token} +\paragraph{\agcode{int reduction{\us}token;}} +Whenever your parser executes a reduction procedure, +\agcode{PCB.reduction{\us}token} contains the number of the token to +which the rule being reduced will reduce. If your grammar uses +semantically determined productions, your reduction procedure may +change the value of \agcode{PCB.reduction{\us}token} to the desired +value. + +Prior to calling your reduction procedure, your parser will set this +field to the token number of the default reduction token, i.e., the +first token in the reduction token list for the production being +reduced. If the reduction procedure establishes that a different +reduction token is appropriate, it should store the appropriate token +number in \agcode{PCB.reduction{\us}token}. 
The easiest way to do this +is to use the +\index{CHANGE{\us}REDUCTION}\index{Macros}\agcode{CHANGE{\us}REDUCTION} +macro. + +\index{PCB}\index{sn} +\paragraph{\agcode{int sn;}} +\agcode{PCB.sn} always contains the current +\index{State}\index{Number}state number of your parser. + +\index{PCB}\index{ss} +\paragraph{\agcode{int ss[];}} +\agcode{PCB.ss} is the \index{Parser state stack}\index{State +stack}\index{Stack}state stack for your parser. Before every shift action, +the current state number, \agcode{PCB.sn}, is stored in +\agcode{PCB.ss[PCB.ssx]}. \agcode{PCB.ssx} is then incremented. + +\index{PCB}\index{ssx} +\paragraph{\agcode{int ssx;}} +\agcode{PCB.ssx} contains the parser \index{Stack}stack index for your +parser. On every shift action it is incremented. On every reduction +action the length of the grammar rule being reduced is subtracted from +\agcode{PCB.ssx}. + +\index{PCB}\index{token{\us}number} +\paragraph{\agcode{int token{\us}number;}} +\agcode{PCB.token{\us}number} contains the internal \index{Token +number}\index{Number}token number of the current input token. If you +are not supplying token numbers directly, it is the result of using +the actual input character to index the token conversion array, +\agcode{ag{\us}tcv}. + +Your parser automatically maintains the proper value in +\agcode{PCB.token{\us}number}. Input token numbers should always be +stored in \agcode{PCB.input{\us}code}. + +\index{vs}\index{PCB} +\paragraph{\agcode{\textit{value-stack-type} vs[];}} +\agcode{PCB.vs} is the +\index{Parser value stack}\index{Stack}\index{Value stack}value stack +for your parser. The semantic values of the tokens identified by the +parser are stored in the value \index{Stack}\index{Value stack}stack. +The value stack, like the other parser stacks, is indexed by +\agcode{PCB.ssx}. 
+When your parser is executing a reduction procedure, +\agcode{PCB.vs[PCB.ssx]} contains the semantic value of the first +token in the grammar rule you are reducing, \agcode{PCB.vs[PCB.ssx+1]} +contains the second, and so forth. The return value from your +reduction procedure will be stored in turn in +\agcode{PCB.vs[PCB.ssx]}. + +\index{{\us}dol{\us}vt} +\agcode{PCB.vs} is defined to be of type \agcode{\${\us}vt}, where +``\agcode{\$}'' represents the name of your syntax file. AnaGram +defines \agcode{\${\us}vt} so that it is large enough to store the +semantic value of any of the tokens declared in your grammar.
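The interplay of `PCB.sn`, `PCB.ss`, `PCB.ssx` and `PCB.vs` described in this chapter can be sketched in a few lines of C. This is a simplified model under stated assumptions, not AnaGram's generated code: `stack_pcb_type`, `stack_init`, `demo_shift`, `demo_reduce` and `add_terms` are hypothetical names, the value stack is given type `int` rather than the generated value type, the sketch assumes a shifted token's value goes into `PCB.vs[PCB.ssx]` just before the index is incremented, and the state transition that follows a reduction is not modeled.

```c
#define DEMO_STACK_SIZE 32

/* Hypothetical stand-in for the stack-related PCB fields. */
typedef struct {
    int sn;                   /* current state number */
    int ssx;                  /* stack index shared by ss and vs */
    int ss[DEMO_STACK_SIZE];  /* state stack */
    int vs[DEMO_STACK_SIZE];  /* value stack (int here for simplicity) */
} stack_pcb_type;

static stack_pcb_type stack_pcb;
#define PCB stack_pcb

static void stack_init(void) {
    PCB.sn = 0;
    PCB.ssx = 0;
}

/* Shift: save the current state at ss[ssx], then increment ssx.
   Assumption: the token's value is stored in the same slot of vs.
   Returns 0 if the stack is full, 1 otherwise. */
static int demo_shift(int next_state, int value) {
    if (PCB.ssx >= DEMO_STACK_SIZE) return 0;
    PCB.ss[PCB.ssx] = PCB.sn;
    PCB.vs[PCB.ssx] = value;
    PCB.ssx++;
    PCB.sn = next_state;
    return 1;
}

/* Reduction procedure for a hypothetical rule  sum -> term '+' term.
   v[0] is the value of the first token in the rule, v[2] of the third. */
static int add_terms(const int *v) {
    return v[0] + v[2];
}

/* Reduce: subtract the rule length from ssx, so that vs[ssx] is the first
   token of the rule, call the reduction procedure, and store its result
   back in vs[ssx]. Restoring sn from ss[ssx] is a simplification. */
static void demo_reduce(int rule_length, int (*proc)(const int *)) {
    PCB.ssx -= rule_length;
    PCB.vs[PCB.ssx] = proc(&PCB.vs[PCB.ssx]);
    PCB.sn = PCB.ss[PCB.ssx];
}
```

For example, after `stack_init()`, shifting the values `2`, `'+'` and `3` leaves `PCB.ssx` at 3; `demo_reduce(3, add_terms)` then brings `PCB.ssx` back to 0 and leaves the sum in `PCB.vs[0]`.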