\chapter{Parser Control Block}

\index{PCB}\index{Parser control block}
A \agterm{parser control block} is a structure which contains all of
the data necessary to describe the instantaneous state of a parser.
The \agcode{typedef} statement which defines the structure is included
in the \index{Header file}\index{File}header file for your parser.
AnaGram creates the name of the \index{Name}data type for the
structure by appending \agcode{{\us}pcb{\us}type} to the parser name.

% XXX: does \index{Name} belong?
If the
\index{Declare pcb}\index{Configuration switches}\agparam{declare pcb}
configuration switch is on (its default state), AnaGram will declare a
parser control block for you at the beginning of your parser file.
AnaGram will determine the \index{Name}name of the parser control
block by appending \agcode{{\us}pcb} to the parser name. AnaGram will
also define the macro \index{PCB}\index{Macros}\agcode{PCB} as a
shorthand notation for use within the parser.

If you wish to declare your own parser control block, turn the
\agparam{declare pcb} switch off and include the header file for your
parser before your declaration. Then declare a parser control block
and define \agcode{PCB} to refer to the control block you have declared.

Suppose your grammar is called \agcode{widget}. You would then write
the following statements in your embedded C in order to declare a
parser control block named \agcode{widget{\us}control}:

\begin{indentingcode}{0.4in}
\#include "widget.h"
widget{\us}pcb{\us}type widget{\us}control;
\#define PCB widget{\us}control
\end{indentingcode}
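The sketch below shows the effect of the \agcode{PCB} macro. It is not generated code. The structure is a hypothetical, much-reduced stand-in for the \agcode{widget{\us}pcb{\us}type} typedef that AnaGram would place in \agcode{widget.h}, and the helper \agcode{widget{\us}finished} is invented for illustration.

```c
/* HYPOTHETICAL stand-in for the typedef AnaGram writes to widget.h;
   the generated structure has many more fields. */
typedef struct {
    int input_code;   /* current input character or token number */
    int exit_flag;    /* 0 (AG_RUNNING_CODE) while the parse is running */
    int sn;           /* current state number */
} widget_pcb_type;

/* Your own control block, declared as in the example above. */
widget_pcb_type widget_control;
#define PCB widget_control

/* Because PCB expands to widget_control, any code written in terms
   of PCB reads and writes the block you declared. */
int widget_finished(void)
{
    return PCB.exit_flag != 0;   /* nonzero once the parser has terminated */
}
```

In a real program you would include \agcode{widget.h} instead of declaring the structure yourself.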

The remainder of this appendix describes fields in the parser control
block that may interest the user:

\index{column}\index{PCB}
\paragraph{\agcode{int column;}}
\agcode{PCB.column} keeps track of the column number of the current
character in your input.  Line and column numbers are tracked only if
the \index{Lines and columns}\index{Configuration switches}
\agparam{lines and columns} configuration switch has been set.

\index{cs}\index{PCB}
\paragraph{\agcode{\textit{context-type} cs[];}}
\agcode{PCB.cs} is your \index{Context stack}\index{Stack}context
stack.  \agcode{cs} will be defined only if you have assigned a value
to the configuration parameter
\index{Context type}\index{Configuration parameters}\agparam{context type}.

\index{error{\us}frame{\us}ssx}\index{PCB}
\paragraph{\agcode{int error{\us}frame{\us}ssx;}}
\agcode{PCB.error{\us}frame{\us}ssx} is a field to which your error handling
routines may refer.  If you have set both the
\index{Diagnose errors}\index{Configuration switches}\agparam{diagnose errors}
and
\index{Error frame}\index{Configuration switches}\agparam{error frame}
configuration switches, then when your
\index{SYNTAX{\us}ERROR}\index{Macros}\agcode{SYNTAX{\us}ERROR} macro is
called, \agcode{PCB.error{\us}frame{\us}ssx} will contain the value of the
parser stack index at the beginning of the frame token identified by
\agcode{PCB.error{\us}frame{\us}token}.  For example, if you fail to close
a comment in a syntax file, AnaGram will encounter an illegal end of
file inside the comment.  In this situation,
\agcode{PCB.error{\us}frame{\us}token} is the comment token, and
\agcode{PCB.error{\us}frame{\us}ssx} gives the parser stack depth at the
beginning of the comment.

\index{error{\us}frame{\us}token}\index{PCB}
\paragraph{\agcode{int error{\us}frame{\us}token;}}
\agcode{PCB.error{\us}frame{\us}token} is a field to which
your error handling routines may refer.  If you have set both the
\index{Diagnose errors}\index{Configuration switches}\agparam{diagnose errors}
and
\index{Error frame}\index{Configuration switches}\agparam{error frame}
configuration switches, then when your
\index{SYNTAX{\us}ERROR}\index{Macros}\agcode{SYNTAX{\us}ERROR} macro is
called, \agcode{PCB.error{\us}frame{\us}token} will contain the token
number of the frame token, a token which identifies the context of the
error.

\index{error{\us}message}\index{PCB}
\paragraph{\agcode{char *error{\us}message;}}
\agcode{PCB.error{\us}message} is a field to which your error handling
procedures may refer.  If you have set the
\index{Diagnose errors}\index{Configuration switches}\agparam{diagnose errors}
configuration switch, then when your parser encounters a syntax error it
will create a string containing an appropriate diagnostic message and
store a pointer to it in \agcode{PCB.error{\us}message}.
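As an illustration, a \agcode{SYNTAX{\us}ERROR} macro might combine this diagnostic with the \agcode{PCB.line} and \agcode{PCB.column} fields described later in this appendix. In the sketch below, the helper \agcode{format{\us}syntax{\us}error}, the buffer size, and the output format are hypothetical, and the control block is a reduced mock. The position fields are meaningful only if \agparam{lines and columns} is set.

```c
#include <stdio.h>

/* HYPOTHETICAL reduced control block with only the fields used here. */
typedef struct {
    char *error_message;   /* set by the parser when diagnose errors is on */
    int line, column;      /* tracked only with lines and columns */
} demo_pcb_type;

demo_pcb_type demo_pcb;
#define PCB demo_pcb

/* Hypothetical helper: format the diagnostic and position into buf.
   Returns the snprintf result (length the full message needs). */
int format_syntax_error(char *buf, size_t size)
{
    return snprintf(buf, size, "%s (line %d, column %d)",
                    PCB.error_message ? PCB.error_message : "syntax error",
                    PCB.line, PCB.column);
}

/* One possible SYNTAX_ERROR definition built on the helper. */
#define SYNTAX_ERROR                                   \
    do {                                               \
        char msg_[200];                                \
        format_syntax_error(msg_, sizeof msg_);        \
        fprintf(stderr, "%s\n", msg_);                 \
    } while (0)
```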

\index{PCB}\index{exit{\us}flag}
\paragraph{\agcode{int exit{\us}flag;}}
\agcode{PCB.exit{\us}flag} contains a code value which indicates
whether the parser is still running or whether it has terminated.  If
the parser has terminated, \agcode{PCB.exit{\us}flag} indicates the
reason the parse terminated.  Symbolic names for these \index{Exit
codes}exit codes are defined in the header file for your parser. The
names and their values are as follows:

\begin{tabular}{ll}
\agcode{AG{\us}RUNNING{\us}CODE} & 0\\
\agcode{AG{\us}SUCCESS{\us}CODE} & 1\\
\agcode{AG{\us}SYNTAX{\us}ERROR{\us}CODE} & 2\\
\agcode{AG{\us}REDUCTION{\us}ERROR{\us}CODE} & 3\\
\agcode{AG{\us}STACK{\us}ERROR{\us}CODE} & 4\\
\agcode{AG{\us}SEMANTIC{\us}ERROR{\us}CODE} & 5
\end{tabular}
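A main program can inspect \agcode{PCB.exit{\us}flag} after the parser returns. The sketch below, built around the hypothetical helpers \agcode{exit{\us}flag{\us}text} and \agcode{parse{\us}failed}, shows one way to act on the codes. The macros are repeated with the values from the table only so that the fragment compiles on its own; in a real program they come from your parser's header file.

```c
/* Values from the exit-code table; normally supplied by the
   generated header, repeated here so the sketch stands alone. */
#ifndef AG_RUNNING_CODE
#define AG_RUNNING_CODE         0
#define AG_SUCCESS_CODE         1
#define AG_SYNTAX_ERROR_CODE    2
#define AG_REDUCTION_ERROR_CODE 3
#define AG_STACK_ERROR_CODE     4
#define AG_SEMANTIC_ERROR_CODE  5
#endif

/* Hypothetical helper: a short description of an exit_flag value. */
const char *exit_flag_text(int exit_flag)
{
    switch (exit_flag) {
    case AG_RUNNING_CODE:         return "running";
    case AG_SUCCESS_CODE:         return "success";
    case AG_SYNTAX_ERROR_CODE:    return "syntax error";
    case AG_REDUCTION_ERROR_CODE: return "reduction error";
    case AG_STACK_ERROR_CODE:     return "stack error";
    case AG_SEMANTIC_ERROR_CODE:  return "semantic error";
    default:                      return "unknown exit code";
    }
}

/* Nonzero if the parser has terminated for any reason other than success. */
int parse_failed(int exit_flag)
{
    return exit_flag != AG_RUNNING_CODE && exit_flag != AG_SUCCESS_CODE;
}
```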

\index{PCB}\index{input{\us}code}
\paragraph{\agcode{int input{\us}code;}}
\agcode{PCB.input{\us}code} contains the current input character or, if
your \index{GET{\us}INPUT}\index{Macros}\agcode{GET{\us}INPUT} macro
supplies token numbers directly, the current token number.

If you write your own \agcode{GET{\us}INPUT} macro, you must make sure
that you store the input character or token number you get into
\agcode{PCB.input{\us}code}.

If you have configured your parser to be
\index{Event driven}\index{Configuration switches}\agparam{event driven},
you must store the input character or token number for each token in
turn into \agcode{PCB.input{\us}code} before you call your parser to
process it.
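The calling pattern for an event-driven parser is sketched below. The parser function \agcode{demo}, its control block, and the driver \agcode{feed{\us}string} are hypothetical stand-ins written for this illustration; the mock parser simply counts characters and stops at a zero byte. Initialization of a real parser is not shown.

```c
#define AG_RUNNING_CODE 0   /* values from the exit-code table */
#define AG_SUCCESS_CODE 1

/* HYPOTHETICAL, trimmed-down control block; a real one is generated. */
typedef struct {
    int input_code;   /* next character, stored by the caller */
    int exit_flag;
    int count;        /* mock-only: characters accepted so far */
} demo_pcb_type;

demo_pcb_type demo_pcb;
#define PCB demo_pcb

/* Stand-in for an event-driven parser: each call consumes the one
   character in PCB.input_code; a zero byte ends the parse. */
void demo(void)
{
    if (PCB.input_code == 0)
        PCB.exit_flag = AG_SUCCESS_CODE;
    else
        PCB.count++;
}

/* Caller side: store each character into PCB.input_code, then call
   the parser, until it no longer reports AG_RUNNING_CODE. */
int feed_string(const char *text)
{
    PCB.exit_flag = AG_RUNNING_CODE;   /* mock reset only */
    PCB.count = 0;
    while (PCB.exit_flag == AG_RUNNING_CODE) {
        PCB.input_code = (unsigned char)*text;   /* store first... */
        demo();                                  /* ...then call */
        if (*text != '\0')
            text++;
    }
    return PCB.exit_flag;
}
```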

\index{PCB}\index{input{\us}context}
\paragraph{\agcode{\textit{context-type} input{\us}context;}}
\agcode{PCB.input{\us}context} is a field which AnaGram adds to the
definition of the parser control block structure when you assign a
value to the
\index{Context type}\index{Configuration parameters}\agparam{context type}
configuration parameter.  If you choose, you can
write your \index{GET{\us}INPUT}\index{Macros}\agcode{GET{\us}INPUT} macro
so that it stores the context value in \agcode{PCB.input{\us}context}.
The default definition for
\index{GET{\us}CONTEXT}\index{Macros}\agcode{GET{\us}CONTEXT} will
then stack the context value at the appropriate time.  You can think
of \agcode{PCB.input{\us}context} as a sort of temporary ``parking
place'' for the context value.
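For illustration, the sketch below assumes a context type that records a line and column, and a hypothetical \agcode{read{\us}char} function behind \agcode{GET{\us}INPUT}. Each call parks the position of the character just read in \agcode{PCB.input{\us}context}. The \agcode{text} and \agcode{here} fields exist only in this mock.

```c
/* ASSUMED context type: a line/column pair. */
typedef struct { int line, column; } location;

/* HYPOTHETICAL control block with only the fields this sketch needs. */
typedef struct {
    int input_code;
    location input_context;   /* the "parking place" */
    const char *text;         /* mock-only: remaining input */
    location here;            /* mock-only: position of *text */
} demo_pcb_type;

demo_pcb_type demo_pcb;
#define PCB demo_pcb

/* Read one character and park its position in PCB.input_context,
   where GET_CONTEXT can pick it up later. */
void read_char(void)
{
    PCB.input_context = PCB.here;
    PCB.input_code = (unsigned char)*PCB.text;
    if (*PCB.text == '\0')
        return;                 /* stay at end of input */
    if (*PCB.text == '\n') {
        PCB.here.line++;
        PCB.here.column = 1;
    } else {
        PCB.here.column++;
    }
    PCB.text++;
}

#define GET_INPUT read_char()
```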

\index{PCB}\index{input{\us}value}
\paragraph{\agcode{\textit{input-value-type} input{\us}value;}}
\agcode{PCB.input{\us}value} is a field in the parser control block which
is used to store the value of the input token.

If you write your own
\index{Macros}\index{GET{\us}INPUT}\agcode{GET{\us}INPUT} macro or use
\index{Event driven}\index{Configuration switches}\agparam{event driven}
input, and you have set the
\index{Input values}\index{Configuration switches}\agparam{input values}
configuration switch, you should make sure that you store the value of
the input character or token into \agcode{PCB.input{\us}value}.
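The sketch below shows a \agcode{GET{\us}INPUT} macro that supplies token numbers directly and stores each token's value in \agcode{PCB.input{\us}value}. The token numbers, the \agcode{long} value type, the scanner \agcode{scan{\us}token}, and the mock-only \agcode{text} field are assumptions made for this example; in a real parser, AnaGram assigns the token numbers.

```c
#include <ctype.h>

/* HYPOTHETICAL token numbers; real ones come from the generated header. */
enum { END_TOKEN = 0, NUMBER_TOKEN = 1, PLUS_TOKEN = 2 };

/* HYPOTHETICAL control block; input-value-type ASSUMED to be long. */
typedef struct {
    int input_code;     /* token number of the current input */
    long input_value;   /* semantic value of that token */
    const char *text;   /* mock-only: where the scanner is reading */
} calc_pcb_type;

calc_pcb_type calc_pcb;
#define PCB calc_pcb

/* Store both the token number and its value before the parser uses them. */
void scan_token(void)
{
    while (*PCB.text == ' ')
        PCB.text++;
    if (isdigit((unsigned char)*PCB.text)) {
        long v = 0;
        while (isdigit((unsigned char)*PCB.text))
            v = v * 10 + (*PCB.text++ - '0');
        PCB.input_code = NUMBER_TOKEN;
        PCB.input_value = v;
    } else if (*PCB.text == '+') {
        PCB.text++;
        PCB.input_code = PLUS_TOKEN;
        PCB.input_value = 0;
    } else {
        PCB.input_code = END_TOKEN;   /* anything else ends input here */
        PCB.input_value = 0;
    }
}

#define GET_INPUT scan_token()
```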

\index{PCB}\index{line}
\paragraph{\agcode{int line;}}
\agcode{PCB.line} contains the line number of the current character in
your input.  Line and column numbers are tracked only if the
\index{Lines and columns}\index{Configuration switches}
\agparam{lines and columns} configuration switch has been set.

\index{PCB}\index{pointer}
\paragraph{\agcode{\textit{pointer-type} pointer;}}
\agcode{PCB.pointer} will be included in the parser control block for
your parser if you have set the
\index{Pointer input}\index{Configuration switches}\agparam{pointer input}
configuration switch.  The type of \agcode{PCB.pointer} is determined
by the
\index{Pointer type}\index{Configuration parameters}\agparam{pointer type}
configuration parameter, which defaults to \agcode{unsigned char *}.
Your main program should set \agcode{PCB.pointer} before it calls your
parser.  Thereafter, your parser will increment it appropriately.
When you are executing a reduction procedure or a
\index{SYNTAX{\us}ERROR}\index{Macros}\agcode{SYNTAX{\us}ERROR} macro,
\agcode{PCB.pointer} will always point to the next input character to
be read.
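A minimal sketch of the pointer-input protocol follows, using a hypothetical parser \agcode{word} that accepts lowercase letters. The main program sets \agcode{PCB.pointer} and the parser advances it. The mock stops at the first character it does not accept; it does not model what a real parser does at a syntax error.

```c
#define AG_RUNNING_CODE      0   /* values from the exit-code table */
#define AG_SUCCESS_CODE      1
#define AG_SYNTAX_ERROR_CODE 2

/* HYPOTHETICAL control block using the default pointer type. */
typedef struct {
    unsigned char *pointer;
    int exit_flag;
} word_pcb_type;

word_pcb_type word_pcb;
#define PCB word_pcb

/* Stand-in for a parser built with pointer input: reads through
   PCB.pointer and accepts lowercase letters up to a zero byte. */
void word(void)
{
    PCB.exit_flag = AG_RUNNING_CODE;
    while (PCB.exit_flag == AG_RUNNING_CODE) {
        unsigned char c = *PCB.pointer;
        if (c == '\0')
            PCB.exit_flag = AG_SUCCESS_CODE;
        else if (c < 'a' || c > 'z')
            PCB.exit_flag = AG_SYNTAX_ERROR_CODE;
        else
            PCB.pointer++;   /* the parser advances the pointer */
    }
}

/* Main-program side: set PCB.pointer first, then call the parser. */
long parse_word(const char *text)
{
    PCB.pointer = (unsigned char *)text;   /* the mock never writes through it */
    word();
    return (long)(PCB.pointer - (unsigned char *)text);   /* characters consumed */
}
```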

\index{PCB}\index{reduction{\us}token}
\paragraph{\agcode{int reduction{\us}token;}}
Whenever your parser executes a reduction procedure,
\agcode{PCB.reduction{\us}token} contains the number of the token to
which the rule being reduced will be reduced.  If your grammar uses
semantically determined productions, your reduction procedure may
change the value of \agcode{PCB.reduction{\us}token} to the desired
value.

Prior to calling your reduction procedure, your parser will set this
field to the token number of the default reduction token, i.e., the
first token in the reduction token list for the production being
reduced. If the reduction procedure establishes that a different
reduction token is appropriate, it should store the appropriate token
number in \agcode{PCB.reduction{\us}token}.  The easiest way to do this
is to use the
\index{CHANGE{\us}REDUCTION}\index{Macros}\agcode{CHANGE{\us}REDUCTION}
macro.
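The sketch below illustrates a semantically determined production of the kind found in C-like grammars, where an identifier reduces either to a variable name or to a type name depending on a symbol table. The token numbers, the symbol table, and \agcode{reduce{\us}identifier} are hypothetical. The sketch assigns \agcode{PCB.reduction{\us}token} directly; in a generated parser, \agcode{CHANGE{\us}REDUCTION} is the more convenient route.

```c
#include <string.h>

/* HYPOTHETICAL token numbers for the two candidate reduction tokens. */
enum { VARIABLE_NAME_TOKEN = 10, TYPE_NAME_TOKEN = 11 };

typedef struct {
    int reduction_token;
} demo_pcb_type;

demo_pcb_type demo_pcb;
#define PCB demo_pcb

/* Mock symbol table of known type names. */
static const char *type_names[] = { "size_t", "FILE" };

/* The parser has already set PCB.reduction_token to the default
   (variable name); switch it if the identifier names a type. */
void reduce_identifier(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof type_names / sizeof type_names[0]; i++) {
        if (strcmp(name, type_names[i]) == 0) {
            PCB.reduction_token = TYPE_NAME_TOKEN;
            return;
        }
    }
}
```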

\index{PCB}\index{sn}
\paragraph{\agcode{int sn;}}
\agcode{PCB.sn} always contains the current
\index{State}\index{Number}state number of your parser.

\index{PCB}\index{ss}
\paragraph{\agcode{int ss[];}}
\agcode{PCB.ss} is the \index{Parser state stack}\index{State
stack}\index{Stack}state stack for your parser.  Before every shift action,
the current state number, \agcode{PCB.sn}, is stored in
\agcode{PCB.ss[PCB.ssx]}.  \agcode{PCB.ssx} is then incremented.

\index{PCB}\index{ssx}
\paragraph{\agcode{int ssx;}}
\agcode{PCB.ssx} contains the parser \index{Stack}stack index for your
parser.  On every shift action it is incremented.  On every reduction
action the length of the grammar rule being reduced is subtracted from
\agcode{PCB.ssx}.
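The bookkeeping described for \agcode{PCB.ss} and \agcode{PCB.ssx} can be modeled in a few lines. This is a simplified sketch, not AnaGram's generated code: the stack size is arbitrary and the parse tables that choose the next state are not modeled. Reading the uncovered state from the stack after a reduction is the usual LR step, inferred here rather than stated above.

```c
#define STACK_SIZE 32   /* arbitrary depth for the sketch */

typedef struct {
    int sn;               /* current state number */
    int ss[STACK_SIZE];   /* state stack */
    int ssx;              /* stack index */
} demo_pcb_type;

demo_pcb_type demo_pcb;
#define PCB demo_pcb

/* Shift: save the current state at ss[ssx], increment ssx, then enter
   the next state. Returns -1 rather than overflowing the stack. */
int shift(int next_state)
{
    if (PCB.ssx >= STACK_SIZE)
        return -1;
    PCB.ss[PCB.ssx] = PCB.sn;
    PCB.ssx++;
    PCB.sn = next_state;
    return 0;
}

/* Reduce: subtract the rule length from ssx and return the state that
   was current before the rule's first token was shifted. For an empty
   rule nothing is popped, so that state is still sn. The goto step
   that picks the next state is not modeled. */
int reduce(int rule_length)
{
    if (rule_length == 0)
        return PCB.sn;
    PCB.ssx -= rule_length;
    return PCB.ss[PCB.ssx];
}
```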

\index{PCB}\index{token{\us}number}
\paragraph{\agcode{int token{\us}number;}}
\agcode{PCB.token{\us}number} contains the internal \index{Token
number}\index{Number}token number of the current input token.  If you
are not supplying token numbers directly, it is the result of using
the actual input character to index the token conversion array,
\agcode{ag{\us}tcv}.

Your parser automatically maintains the proper value in
\agcode{PCB.token{\us}number}.  Input token numbers should always be
stored in \agcode{PCB.input{\us}code}.
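The conversion from input character to token number amounts to a table lookup. In the sketch below, \agcode{demo{\us}tcv} is a hypothetical stand-in for \agcode{ag{\us}tcv} with made-up contents; the real table, including its element type, is generated from your grammar.

```c
/* HYPOTHETICAL conversion table: digits map to token 3, '+' to
   token 4, every other character to 0. */
static int demo_tcv[256];

void init_demo_tcv(void)
{
    int c;
    for (c = '0'; c <= '9'; c++)
        demo_tcv[c] = 3;
    demo_tcv['+'] = 4;
}

/* The lookup the parser performs for you: token number = tcv[input code]. */
int token_number_for(int input_code)
{
    if (input_code < 0 || input_code > 255)
        return 0;   /* outside this sketch's table */
    return demo_tcv[input_code];
}
```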

\index{vs}\index{PCB}
\paragraph{\agcode{\textit{value-stack-type} vs[];}}
\agcode{PCB.vs} is the
\index{Parser value stack}\index{Stack}\index{Value stack}value stack
for your parser.  The semantic values of the tokens identified by the
parser are stored in the value \index{Stack}\index{Value stack}stack.
The value stack, like the other parser stacks, is indexed by
\agcode{PCB.ssx}.
When your parser is executing a reduction procedure,
\agcode{PCB.vs[PCB.ssx]} contains the semantic value of the first
token in the grammar rule you are reducing, \agcode{PCB.vs[PCB.ssx+1]}
contains the second, and so forth. The return value from your
reduction procedure is then stored in \agcode{PCB.vs[PCB.ssx]},
replacing the value of the first token.
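The indexing convention can be seen in the sketch below, which mimics the reduction of a hypothetical three-token rule that adds a \agcode{term} to a \agcode{sum}. The \agcode{long} value type and the names \agcode{add{\us}terms} and \agcode{reduce{\us}sum} are assumptions; \agcode{reduce{\us}sum} plays the part of the parser code surrounding the reduction procedure.

```c
#define STACK_SIZE 32

typedef struct {
    long vs[STACK_SIZE];  /* value stack; value type ASSUMED to be long */
    int ssx;              /* index of the rule's first token during a reduction */
} demo_pcb_type;

demo_pcb_type demo_pcb;
#define PCB demo_pcb

/* HYPOTHETICAL reduction procedure: adds a term to a sum. */
long add_terms(long left, long right)
{
    return left + right;
}

/* Rule (sum, '+', term): operands are read at ssx and ssx+2, and the
   result replaces the value at ssx. */
void reduce_sum(void)
{
    long left  = PCB.vs[PCB.ssx];       /* first token: the sum so far */
    long right = PCB.vs[PCB.ssx + 2];   /* third token: the term; ssx+1 is '+' */
    PCB.vs[PCB.ssx] = add_terms(left, right);
}
```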

\index{{\us}dol{\us}vt}
\agcode{PCB.vs} is defined to be of type \agcode{\${\us}vt}, where
``\agcode{\$}'' represents the name of your syntax file.  AnaGram
defines \agcode{\${\us}vt} so that it is large enough to store the
semantic value of any of the tokens declared in your grammar.
% Last changed by David A. Holland, Mon, 30 May 2022 23:46:22 -0400