diff doc/manual/pcb.tex @ 0:13d2b8934445

Import AnaGram (near-)release tree into Mercurial.
author David A. Holland
date Sat, 22 Dec 2007 17:52:45 -0500
parents
children
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/manual/pcb.tex	Sat Dec 22 17:52:45 2007 -0500
@@ -0,0 +1,249 @@
+\chapter{Parser Control Block}
+
+\index{PCB}\index{Parser control block}
+A \agterm{parser control block} is a structure which contains all of
+the data necessary to describe the instantaneous state of a parser.
+The \agcode{typedef} statement which defines the structure is included
+in the \index{Header file}\index{File}header file for your parser.
+AnaGram creates the name of the \index{Name}data type for the
+structure by appending \agcode{{\us}pcb{\us}type} to the parser name.
+
+% XXX: does \index{Name} belong?
+If the
+\index{Declare pcb}\index{Configuration switches}\agparam{declare pcb}
+configuration switch is on, its default state, AnaGram will declare a
+parser control block for you at the beginning of your parser file.
+AnaGram will determine the \index{Name}name of the parser control
+block by appending \agcode{{\us}pcb} to the parser name. AnaGram will
+also define the macro \index{PCB}\index{Macros}\agcode{PCB} as a
+shorthand notation for use within the parser.
+
+If you wish to declare your own parser control block, you must include
+the header file for your parser before your declaration.  Then you
+declare a parser control block and define \agcode{PCB} to refer to the
+control block you have declared.
+
+Suppose your grammar is called \agcode{widget}. You would then write
+the following statements in your embedded C in order to declare a
+parser control block named \agcode{widget{\us}control}:
+
+\begin{indentingcode}{0.4in}
+\#include "widget.h"
+widget{\us}pcb{\us}type widget{\us}control;
+\#define PCB widget{\us}control
+\end{indentingcode}
+
+The remainder of this appendix describes fields in the parser control
+block that may interest the user:
+
+\index{column}\index{PCB}
+\paragraph{\agcode{int column;}}
+\agcode{PCB.column} keeps track of the column number of the current
+character in your input.  Line and column numbers are tracked only if
+the \index{Lines and columns}\index{Configuration switches}
+\agparam{lines and columns} configuration switch has been set.
+
+\index{cs}\index{PCB}
+\paragraph{\agcode{\textit{context-type} cs[];}}
+\agcode{PCB.cs} is your \index{Context stack}\index{Stack}context
+stack.  \agcode{cs} will be defined only if you have assigned a value
+to the configuration parameter
+\index{Context type}\index{Configuration parameters}\agparam{context type}.
+
+\index{error{\us}frame{\us}ssx}\index{PCB}
+\paragraph{\agcode{int error{\us}frame{\us}ssx;}}
+\agcode{PCB.error{\us}frame{\us}ssx} is a field to which your error handling
+routines may refer.  When your 
+\index{SYNTAX{\us}ERROR}\index{Macros}\agcode{SYNTAX{\us}ERROR} macro is
+called, if you have set both the
+\index{Diagnose errors}\index{Configuration switches}\agparam{diagnose errors}
+and
+\index{Error frame}\index{Configuration switches}\agparam{error frame}
+configuration switches,
+\agcode{PCB.error{\us}frame{\us}ssx} will contain the value of the parser
+stack index at the beginning of the frame token identified by
+\agcode{PCB.error{\us}frame{\us}token}.  For example, if you fail to
+close a comment in a syntax file, AnaGram will encounter an illegal
+end of file inside the comment.  In this situation,
+\agcode{PCB.error{\us}frame{\us}token} is the comment token, and
+\agcode{PCB.error{\us}frame{\us}ssx} gives the parser stack depth at the
+beginning of the comment.
+
+\index{error{\us}frame{\us}token}\index{PCB}
+\paragraph{\agcode{int error{\us}frame{\us}token;}}
+\agcode{PCB.error{\us}frame{\us}token} is a field to which
+your error handling routines may refer.  When your
+\index{SYNTAX{\us}ERROR}\index{Macros}\agcode{SYNTAX{\us}ERROR} macro is
+called, if you have set both
+\index{Diagnose errors}\index{Configuration switches}\agparam{diagnose errors}
+and
+\index{Error frame}\index{Configuration switches}\agparam{error frame},
+it will contain the token number of the frame token, a token which
+identifies the context of the error.
+
+\index{error{\us}message}\index{PCB}
+\paragraph{\agcode{char *error{\us}message;}}
+\agcode{PCB.error{\us}message} is a field to which your error handling
+procedures may refer.  If you have set the
+\index{Diagnose errors}\index{Configuration switches}\agparam{diagnose errors}
+configuration switch, then on encountering a syntax error your parser
+will create a string containing an appropriate diagnostic message and
+store a pointer to it in \agcode{PCB.error{\us}message}.
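+
+As a rough illustration, a \agcode{SYNTAX{\us}ERROR} macro might
+report the diagnostic together with the current input position.  This
+is only a sketch: it assumes that both the \agparam{diagnose errors}
+and \agparam{lines and columns} switches are set and that
+\agcode{PCB} is already defined at this point in your embedded C.
+The helper name \agcode{report{\us}error} and the output format are
+invented for the example.
+
+\begin{indentingcode}{0.4in}
+\#include <stdio.h>
+/* Hypothetical helper: print the position, then the diagnostic. */
+static void report{\us}error(void) \{
+  printf("line \%d, column \%d: ", PCB.line, PCB.column);
+  puts(PCB.error{\us}message);
+\}
+\#define SYNTAX{\us}ERROR report{\us}error()
+\end{indentingcode}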
+
+\index{PCB}\index{exit{\us}flag}
+\paragraph{\agcode{int exit{\us}flag;}}
+\agcode{PCB.exit{\us}flag} contains a code value which indicates
+whether the parser is still running or whether it has terminated.  If
+the parser has terminated, \agcode{PCB.exit{\us}flag} indicates the
+reason the parse has terminated.  Symbolic names for these
+\index{Exit codes}exit codes are defined in the header file for your
+parser.  The names and their values are as follows:
+
+\begin{tabular}{ll}
+\agcode{AG{\us}RUNNING{\us}CODE} & 0\\
+\agcode{AG{\us}SUCCESS{\us}CODE} & 1\\
+\agcode{AG{\us}SYNTAX{\us}ERROR{\us}CODE} & 2\\
+\agcode{AG{\us}REDUCTION{\us}ERROR{\us}CODE} & 3\\
+\agcode{AG{\us}STACK{\us}ERROR{\us}CODE} & 4\\
+\agcode{AG{\us}SEMANTIC{\us}ERROR{\us}CODE} & 5 
+\end{tabular}
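+
+For instance, a caller could test \agcode{PCB.exit{\us}flag} once the
+parser returns.  The sketch below assumes a parser named
+\agcode{widget} whose parser function is called \agcode{widget()} and
+takes no arguments, and that \agcode{PCB} is defined where the call
+is made; adapt the call to your own configuration.
+
+\begin{indentingcode}{0.4in}
+widget();
+if (PCB.exit{\us}flag != AG{\us}SUCCESS{\us}CODE)
+  puts("widget: parse did not complete successfully");
+\end{indentingcode}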
+
+\index{PCB}\index{input{\us}code}
+\paragraph{\agcode{int input{\us}code;}}
+\agcode{PCB.input{\us}code} contains the current input character, or
+the token number, if your
+\index{GET{\us}INPUT}\index{Macros}\agcode{GET{\us}INPUT} macro supplies
+token numbers directly.
+
+If you write your own \agcode{GET{\us}INPUT} macro, you must make sure
+that you store the input character or token number you get into
+\agcode{PCB.input{\us}code}.
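+
+A minimal sketch of such a macro, assuming the input is read from a
+\agcode{FILE *} variable with the hypothetical name
+\agcode{input{\us}file}, might be:
+
+\begin{indentingcode}{0.4in}
+\#define GET{\us}INPUT (PCB.input{\us}code = fgetc(input{\us}file))
+\end{indentingcode}
+
+How the end of the input is represented to the parser depends on
+your grammar.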
+
+If you have configured your parser to be
+\index{Event driven}\index{Configuration switches}\agparam{event driven},
+you must store the input character or token number for each token in
+turn into \agcode{PCB.input{\us}code} before you call your parser to
+process it.
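+
+An event-driven caller might therefore look roughly like the
+following.  The names \agcode{init{\us}widget()} and
+\agcode{widget()} for the initializer and parser functions are
+assumptions made for this sketch, as is the hypothetical
+\agcode{next{\us}input()} function that supplies each character or
+token number.  The loop also assumes that the initializer leaves
+\agcode{PCB.exit{\us}flag} set to \agcode{AG{\us}RUNNING{\us}CODE}.
+Check the header file generated for your parser for the actual names.
+
+\begin{indentingcode}{0.4in}
+init{\us}widget();
+while (PCB.exit{\us}flag == AG{\us}RUNNING{\us}CODE) \{
+  PCB.input{\us}code = next{\us}input();  /* store the input first */
+  widget();                              /* then process it */
+\}
+\end{indentingcode}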
+
+\index{PCB}\index{input{\us}context}
+\paragraph{\agcode{\textit{context-type} input{\us}context;}}
+\agcode{PCB.input{\us}context} is a field which AnaGram adds to the
+definition of the parser control block structure when you assign a
+value to the
+\index{Context type}\index{Configuration parameters}\agparam{context type}
+configuration parameter.  If you choose, you can 
+write your \index{GET{\us}INPUT}\index{Macros}\agcode{GET{\us}INPUT} macro
+so that it stores the context value in \agcode{PCB.input{\us}context}.
+The default definition for
+\index{GET{\us}CONTEXT}\index{Macros}\agcode{GET{\us}CONTEXT} will
+then stack the context value at the appropriate time.  You can think
+of \agcode{PCB.input{\us}context} as a sort of temporary ``parking
+place'' for the context value.
+
+\index{PCB}\index{input{\us}value}
+\paragraph{\agcode{\textit{input-value-type} input{\us}value;}}
+\agcode{PCB.input{\us}value} is a field in the parser control block which
+is used to store the value of the input token.
+
+If you write your own
+\index{Macros}\index{GET{\us}INPUT}\agcode{GET{\us}INPUT} macro or use
+\index{Event driven}\index{Configuration switches}\agparam{event driven}
+input, and you have set the
+\index{Input values}\index{Configuration switches}\agparam{input values}
+configuration switch, you should make sure that you store the value of
+the input character or token into \agcode{PCB.input{\us}value}.
+
+\index{PCB}\index{line}
+\paragraph{\agcode{int line;}}
+\agcode{PCB.line} contains the line number of the current character in
+your input.  Line and column numbers are tracked only if the
+\index{Lines and columns}\index{Configuration switches}
+\agparam{lines and columns} configuration switch has been set. 
+
+\index{PCB}\index{pointer}
+\paragraph{\agcode{\textit{pointer-type} pointer;}}
+\agcode{PCB.pointer} will be included in the parser control block for
+your parser if you have set the
+\index{Pointer input}\index{Configuration switches}\agparam{pointer input}
+configuration switch.  The type of \agcode{PCB.pointer} is determined
+by the
+\index{Pointer type}\index{Configuration parameters}\agparam{pointer type}
+configuration parameter, which defaults to \agcode{unsigned char *}.
+Your main program should set \agcode{PCB.pointer} before it calls your
+parser.  Thereafter, your parser will increment it appropriately.
+When you are executing a reduction procedure or a
+\index{SYNTAX{\us}ERROR}\index{Macros}\agcode{SYNTAX{\us}ERROR} macro,
+\agcode{PCB.pointer} will always point to the next input character to
+be read.
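+
+For example, to parse text held in memory, the main program could do
+something like the following.  This sketch assumes the default
+\agparam{pointer type}, a parser function named \agcode{widget()},
+and a hypothetical character buffer \agcode{text}; how the end of the
+input is recognized depends on your grammar.
+
+\begin{indentingcode}{0.4in}
+PCB.pointer = (unsigned char *) text;
+widget();
+\end{indentingcode}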
+
+\index{PCB}\index{reduction{\us}token}
+\paragraph{\agcode{int reduction{\us}token;}}
+Whenever your parser executes a reduction procedure,
+\agcode{PCB.reduction{\us}token} contains the number of the token to
+which the rule being reduced will reduce.  If your grammar uses
+semantically determined productions, your reduction procedure may
+change the value of \agcode{PCB.reduction{\us}token} to the desired
+value.
+
+Prior to calling your reduction procedure, your parser will set this
+field to the token number of the default reduction token, i.e., the
+first token in the reduction token list for the production being
+reduced. If the reduction procedure establishes that a different
+reduction token is appropriate, it should store the appropriate token
+number in \agcode{PCB.reduction{\us}token}.  The easiest way to do this
+is to use the
+\index{CHANGE{\us}REDUCTION}\index{Macros}\agcode{CHANGE{\us}REDUCTION}
+macro.
+
+\index{PCB}\index{sn}
+\paragraph{\agcode{int sn;}}
+\agcode{PCB.sn} always contains the current
+\index{State}\index{Number}state number of your parser.
+
+\index{PCB}\index{ss}
+\paragraph{\agcode{int ss[];}}
+\agcode{PCB.ss} is the \index{Parser state stack}\index{State
+stack}\index{Stack}state stack for your parser.  Before every shift action,
+the current state number, \agcode{PCB.sn}, is stored in
+\agcode{PCB.ss[PCB.ssx]}.  \agcode{PCB.ssx} is then incremented.
+
+\index{PCB}\index{ssx}
+\paragraph{\agcode{int ssx;}}
+\agcode{PCB.ssx} contains the parser \index{Stack}stack index for your
+parser.  On every shift action it is incremented.  On every reduction
+action the length of the grammar rule being reduced is subtracted from
+\agcode{PCB.ssx}.
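+
+Expressed as C, the bookkeeping described for \agcode{PCB.ss} and
+\agcode{PCB.ssx} amounts to the following.  This merely restates the
+descriptions of the two fields and is not the code AnaGram
+generates; \agcode{rule{\us}length} is a hypothetical variable
+holding the length of the grammar rule being reduced.
+
+\begin{indentingcode}{0.4in}
+/* shift action */
+PCB.ss[PCB.ssx] = PCB.sn;
+PCB.ssx++;
+/* reduction action */
+PCB.ssx -= rule{\us}length;
+\end{indentingcode}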
+
+\index{PCB}\index{token{\us}number}
+\paragraph{\agcode{int token{\us}number;}}
+\agcode{PCB.token{\us}number} contains the internal \index{Token
+number}\index{Number}token number of the current input token.  If you
+are not supplying token numbers directly, it is the result of using
+the actual input character to index the token conversion array,
+\agcode{ag{\us}tcv}.
+
+Your parser automatically maintains the proper value in
+\agcode{PCB.token{\us}number}.  Input token numbers should always be
+stored in \agcode{PCB.input{\us}code}.
+
+\index{vs}\index{PCB}
+\paragraph{\agcode{\textit{value-stack-type} vs[];}}
+\agcode{PCB.vs} is the
+\index{Parser value stack}\index{Stack}\index{Value stack}value stack
+for your parser.  The semantic values of the tokens identified by the
+parser are stored in the value \index{Stack}\index{Value stack}stack.
+The value stack, like the other parser stacks, is indexed by
+\agcode{PCB.ssx}.
+When your parser is executing a reduction procedure,
+\agcode{PCB.vs[PCB.ssx]} contains the semantic value of the first
+token in the grammar rule you are reducing, \agcode{PCB.vs[PCB.ssx+1]}
+contains the second, and so forth. The return value from your
+reduction procedure will be stored in turn in
+\agcode{PCB.vs[PCB.ssx]}.
+
+\index{{\us}dol{\us}vt}
+\agcode{PCB.vs} is defined to be of type \agcode{\${\us}vt}, where
+``\agcode{\$}'' represents the name of your syntax file.  AnaGram
+defines \agcode{\${\us}vt} so that it is large enough to store the
+semantic value of any of the tokens declared in your grammar.