comparison doc/manual/pcb.tex @ 0:13d2b8934445

Import AnaGram (near-)release tree into Mercurial.
author David A. Holland
date Sat, 22 Dec 2007 17:52:45 -0500
-1:000000000000 0:13d2b8934445
1 \chapter{Parser Control Block}
2
3 \index{PCB}\index{Parser control block}
4 A \agterm{parser control block} is a structure which contains all of
5 the data necessary to describe the instantaneous state of a parser.
6 The \agcode{typedef} statement which defines the structure is included
7 in the \index{Header file}\index{File}header file for your parser.
8 AnaGram creates the name of the \index{Name}data type for the
9 structure by appending \agcode{{\us}pcb{\us}type} to the parser name.
10
11 % XXX: does \index{Name} belong?
12 If the
13 \index{Declare pcb}\index{Configuration switches}\agparam{declare pcb}
14 configuration switch is on, its default state, AnaGram will declare a
15 parser control block for you at the beginning of your parser file.
16 AnaGram will determine the \index{Name}name of the parser control
17 block by appending \agcode{{\us}pcb} to the parser name. AnaGram will
18 also define the macro \index{PCB}\index{Macros}\agcode{PCB} as a
19 shorthand notation for use within the parser.
20
21 If you wish to declare your own parser control block, you must include
22 the header file for your parser before your declaration. Then you
23 declare a parser control block and define \agcode{PCB} to refer to the
24 control block you have declared.
25
26 Suppose your grammar is called \agcode{widget}. You would then write
27 the following statements in your embedded C in order to declare a
28 parser control block named \agcode{widget{\us}control}:
29
30 \begin{indentingcode}{0.4in}
31 \#include "widget.h"
32 widget{\us}pcb{\us}type widget{\us}control;
33 \#define PCB widget{\us}control
34 \end{indentingcode}
35
36 The remainder of this appendix describes fields in the parser control
37 block that may interest the user:
38
39 \index{column}\index{PCB}
40 \paragraph{\agcode{int column;}}
41 \agcode{PCB.column} keeps track of the column number of the current
42 character in your input. Line and column numbers are tracked only if
43 the \index{Lines and columns}\index{Configuration switches}
44 \agparam{lines and columns} configuration switch has been set.
45
46 \index{cs}\index{PCB}
47 \paragraph{\agcode{\textit{context-type} cs[];}}
48 \agcode{PCB.cs} is your \index{Context stack}\index{Stack}context
49 stack. \agcode{cs} will be defined only if you have assigned a value
50 to the configuration parameter
51 \index{Context type}\index{Configuration parameters}\agparam{context type}.
52
53 \index{error{\us}frame{\us}ssx}\index{PCB}
54 \paragraph{\agcode{int error{\us}frame{\us}ssx;}}
55 \agcode{PCB.error{\us}frame{\us}ssx} is a field to which your error handling
56 routines may refer. When your
57 \index{SYNTAX{\us}ERROR}\index{Macros}\agcode{SYNTAX{\us}ERROR} macro is
58 called, if you have set both the
59 \index{Diagnose errors}\index{Configuration switches}\agparam{diagnose errors}
60 and
61 \index{Error frame}\index{Configuration switches}\agparam{error frame}
62 configuration switches,
63 \agcode{PCB.error{\us}frame{\us}ssx} will contain the value of the parser
64 stack index at the beginning of the frame token identified by
65 \agcode{PCB.error{\us}frame{\us}token}. For example, if in a syntax file,
66 you fail to close a comment, AnaGram will encounter an illegal end of
67 file in the comment. In this situation, \agcode{error{\us}frame{\us}token} is
68 the comment token, and \agcode{PCB.error{\us}frame{\us}ssx} gives the parser
69 stack depth at the beginning of the comment.
70
71 \index{error{\us}frame{\us}token}\index{PCB}
72 \paragraph{\agcode{int error{\us}frame{\us}token;}}
73 \agcode{PCB.error{\us}frame{\us}token} is a field to which
74 your error handling routines may refer. When your
75 \index{SYNTAX{\us}ERROR}\index{Macros}\agcode{SYNTAX{\us}ERROR} macro is
76 called, if you have set both
77 \index{Diagnose errors}\index{Configuration switches}\agparam{diagnose errors}
78 and
79 \index{Error frame}\index{Configuration switches}\agparam{error frame},
80 it will contain the token number of the frame token, a token which
81 identifies the context of the error.
82
83 \index{error{\us}message}\index{PCB}
84 \paragraph{\agcode{char *error{\us}message;}}
85 \agcode{PCB.error{\us}message} is a field to which your error handling
86 procedures may refer. If you have set the
87 \index{Diagnose errors}\index{Configuration switches}\agparam{diagnose errors}
88 configuration switch, on encountering a syntax error your parser will
89 create a string containing an appropriate diagnostic message and store
90 a pointer to it in \agcode{PCB.error{\us}message}.
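
As a rough illustration, an error handling routine might combine
\agcode{PCB.error{\us}message} with \agcode{PCB.line} and
\agcode{PCB.column} to build a report. The sketch below assumes that
both \agparam{diagnose errors} and \agparam{lines and columns} are set.
The structure \agcode{error{\us}model} is a stand-in for the generated
parser control block, and \agcode{format{\us}syntax{\us}error} is a
hypothetical helper, not part of AnaGram.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for the generated parser control block; only the fields
   used in this sketch are modeled. */
typedef struct {
  char *error_message;  /* set by the parser on a syntax error */
  int line;             /* tracked only with "lines and columns" set */
  int column;
} error_model;

/* Hypothetical helper: format "line L, column C: message" into buf.
   Returns the value returned by snprintf. */
int format_syntax_error(const error_model *pcb, char *buf, size_t size) {
  assert(pcb != NULL && pcb->error_message != NULL);
  return snprintf(buf, size, "line %d, column %d: %s",
                  pcb->line, pcb->column, pcb->error_message);
}
```

A real \agcode{SYNTAX{\us}ERROR} macro could call such a helper with
the address of \agcode{PCB} and print the resulting buffer.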
91
92 \index{PCB}\index{exit{\us}flag}
93 \paragraph{\agcode{int exit{\us}flag;}}
94 \agcode{PCB.exit{\us}flag} contains a code value which indicates
95 whether the parser is still running or whether it has terminated. If
96 the parser has terminated, \agcode{PCB.exit{\us}flag} indicates the
97 reason the parse has terminated. Symbolic names for these \index{Exit
98 codes}exit codes are defined in the header file for your parser. The
99 values are as follows:
101
102 \begin{tabular}{ll}
103 \agcode{AG{\us}RUNNING{\us}CODE} & 0\\
104 \agcode{AG{\us}SUCCESS{\us}CODE} & 1\\
105 \agcode{AG{\us}SYNTAX{\us}ERROR{\us}CODE} & 2\\
106 \agcode{AG{\us}REDUCTION{\us}ERROR{\us}CODE} & 3\\
107 \agcode{AG{\us}STACK{\us}ERROR{\us}CODE} & 4\\
108 \agcode{AG{\us}SEMANTIC{\us}ERROR{\us}CODE} & 5
109 \end{tabular}
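
A caller might inspect \agcode{PCB.exit{\us}flag} after the parser
returns, along the following lines. This is only a sketch: the
enumeration stands in for the definitions in the generated header file,
using the values tabulated above, and
\agcode{describe{\us}exit{\us}code} and \agcode{is{\us}error{\us}code}
are hypothetical helpers whose descriptive strings are derived from the
code names alone.

```c
/* Stand-ins for the exit codes defined in the generated header file;
   the numeric values are those listed in the table above. */
enum {
  AG_RUNNING_CODE = 0,
  AG_SUCCESS_CODE = 1,
  AG_SYNTAX_ERROR_CODE = 2,
  AG_REDUCTION_ERROR_CODE = 3,
  AG_STACK_ERROR_CODE = 4,
  AG_SEMANTIC_ERROR_CODE = 5
};

/* Hypothetical helper: nonzero if the code reports an error. */
int is_error_code(int exit_flag) {
  return exit_flag != AG_RUNNING_CODE && exit_flag != AG_SUCCESS_CODE;
}

/* Hypothetical helper: map an exit code to a short description. */
const char *describe_exit_code(int exit_flag) {
  switch (exit_flag) {
    case AG_RUNNING_CODE:         return "running";
    case AG_SUCCESS_CODE:         return "success";
    case AG_SYNTAX_ERROR_CODE:    return "syntax error";
    case AG_REDUCTION_ERROR_CODE: return "reduction error";
    case AG_STACK_ERROR_CODE:     return "stack error";
    case AG_SEMANTIC_ERROR_CODE:  return "semantic error";
    default:                      return "unknown exit code";
  }
}
```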
110
111 \index{PCB}\index{input{\us}code}
112 \paragraph{\agcode{int input{\us}code;}}
113 \agcode{PCB.input{\us}code} contains the current input character, or
114 the token number, if your
115 \index{GET{\us}INPUT}\index{Macros}\agcode{GET{\us}INPUT} macro supplies
116 token numbers directly.
117
118 If you write your own \agcode{GET{\us}INPUT} macro, you must make sure
119 that you store the input character or token number you get into
120 \agcode{PCB.input{\us}code}.
121
122 If you have configured your parser to be
123 \index{Event driven}\index{Configuration switches}\agparam{event driven},
124 you must store the input character or token number for each token in
125 turn into \agcode{PCB.input{\us}code} before you call your parser to
126 process it.
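
The event driven calling pattern described above can be sketched as
follows. Everything here is a model rather than generated code: the
structure holds only two fields, the function \agcode{widget} is a
stand-in for the parser (it merely counts characters and finishes on a
zero character), and any initialization a real parser requires is
omitted. The point is only the order of operations: store the input in
\agcode{PCB.input{\us}code}, then call the parser.

```c
#include <assert.h>
#include <stddef.h>

enum { AG_RUNNING_CODE = 0, AG_SUCCESS_CODE = 1 };

/* Stand-in for the generated widget_pcb_type (two fields only). */
typedef struct {
  int input_code;
  int exit_flag;
} widget_pcb_type;

widget_pcb_type widget_control;
#define PCB widget_control

static int chars_seen;

/* Stand-in for the parser: consumes one input per call and
   terminates when it sees a zero character (an assumption). */
void widget(void) {
  if (PCB.input_code == 0) PCB.exit_flag = AG_SUCCESS_CODE;
  else chars_seen++;
}

/* Feed a string one character at a time; returns the number of
   characters the stand-in parser consumed before terminating. */
int feed_text(const char *text) {
  assert(text != NULL);
  chars_seen = 0;
  PCB.exit_flag = AG_RUNNING_CODE;
  for (const unsigned char *p = (const unsigned char *)text;
       PCB.exit_flag == AG_RUNNING_CODE; p++) {
    PCB.input_code = *p;  /* store the input first ... */
    widget();             /* ... then call the parser  */
  }
  return chars_seen;
}
```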
127
128 \index{PCB}\index{input{\us}context}
129 \paragraph{\agcode{\textit{context-type} input{\us}context;}}
130 \agcode{PCB.input{\us}context} is a field which AnaGram adds to the
131 definition of the parser control block structure when you assign a
132 value to the
133 \index{Context type}\index{Configuration parameters}\agparam{context type}
134 configuration parameter. If you choose, you can
135 write your \index{GET{\us}INPUT}\index{Macros}\agcode{GET{\us}INPUT} macro
136 so that it stores the context value in \agcode{PCB.input{\us}context}.
137 The default definition for
138 \index{GET{\us}CONTEXT}\index{Macros}\agcode{GET{\us}CONTEXT} will
139 then stack the context value at the appropriate time. You can think
140 of \agcode{PCB.input{\us}context} as a sort of temporary ``parking
141 place'' for the context value.
142
143 \index{PCB}\index{input{\us}value}
144 \paragraph{\agcode{\textit{input-value-type} input{\us}value;}}
145 \agcode{PCB.input{\us}value} is a field in the parser control block which
146 is used to store the value of the input token.
147
148 If you write your own
149 \index{Macros}\index{GET{\us}INPUT}\agcode{GET{\us}INPUT} macro or use
150 \index{Event driven}\index{Configuration switches}\agparam{event driven}
151 input, and you have set the
152 \index{Input values}\index{Configuration switches}\agparam{input values}
153 configuration switch, you should make sure that you store the value of
154 the input character or token into \agcode{PCB.input{\us}value}.
155
156 \index{PCB}\index{line}
157 \paragraph{\agcode{int line;}}
158 \agcode{PCB.line} contains the line number of the current character in
159 your input. Line and column numbers are tracked only if the
160 \index{Lines and columns}\index{Configuration switches}
161 \agparam{lines and columns} configuration switch has been set.
162
163 \index{PCB}\index{pointer}
164 \paragraph{\agcode{\textit{pointer-type} pointer;}}
165 \agcode{PCB.pointer} will be included in the parser control block for
166 your parser if you have set the
167 \index{Pointer input}\index{Configuration switches}\agparam{pointer input}
168 configuration switch. The type of \agcode{PCB.pointer} is determined
169 by the
170 \index{Pointer type}\index{Configuration parameters}\agparam{pointer type}
171 configuration parameter, which defaults to \agcode{unsigned char *}.
172 Your main program should set \agcode{PCB.pointer} before it calls your
173 parser. Thereafter, your parser will increment it appropriately.
174 When you are executing a reduction procedure or a
175 \index{SYNTAX{\us}ERROR}\index{Macros}\agcode{SYNTAX{\us}ERROR} macro,
176 \agcode{PCB.pointer} will always point to the next input character to
177 be read.
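
A minimal model of pointer input, under stated assumptions, might look
like this. The structure \agcode{pointer{\us}model} and the function
\agcode{model{\us}parser} are stand-ins, not generated code; the
stand-in parser simply advances the pointer to a terminating zero
character. The sketch shows only that the caller sets
\agcode{PCB.pointer} before calling the parser and that the parser
advances it.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a parser control block built with "pointer input" set
   and the default pointer type, unsigned char *. */
typedef struct {
  unsigned char *pointer;
  int exit_flag;
} pointer_model;

pointer_model model_control;
#define PCB model_control

/* Stand-in parser: advances PCB.pointer over the input until it
   reaches a zero character (an assumption), then reports success. */
void model_parser(void) {
  while (*PCB.pointer != 0) PCB.pointer++;
  PCB.exit_flag = 1;  /* value of AG_SUCCESS_CODE */
}

/* Set PCB.pointer, call the parser, and report how far it advanced. */
int parse_string(unsigned char *text) {
  assert(text != NULL);
  PCB.pointer = text;
  model_parser();
  return (int)(PCB.pointer - text);
}
```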
178
179 \index{PCB}\index{reduction{\us}token}
180 \paragraph{\agcode{int reduction{\us}token;}}
181 Whenever your parser executes a reduction procedure,
182 \agcode{PCB.reduction{\us}token} contains the number of the token to
183 which the rule being reduced is to reduce to. If your grammar uses
184 semantically determined productions, your reduction procedure may
185 change the value of \agcode{PCB.reduction{\us}token} to the desired
186 value.
187
188 Prior to calling your reduction procedure, your parser will set this
189 field to the token number of the default reduction token, i.e., the
190 first token in the reduction token list for the production being
191 reduced. If the reduction procedure establishes that a different
192 reduction token is appropriate, it should store the appropriate token
193 number in \agcode{PCB.reduction{\us}token}. The easiest way to do this
194 is to use the
195 \index{CHANGE{\us}REDUCTION}\index{Macros}\agcode{CHANGE{\us}REDUCTION}
196 macro.
197
198 \index{PCB}\index{sn}
199 \paragraph{\agcode{int sn;}}
200 \agcode{PCB.sn} always contains the current
201 \index{State}\index{Number}state number of your parser.
202
203 \index{PCB}\index{ss}
204 \paragraph{\agcode{int ss[];}}
205 \agcode{PCB.ss} is the \index{Parser state stack}\index{State
206 stack}\index{Stack}state stack for your parser. Before every shift action,
207 the current state number, \agcode{PCB.sn}, is stored in
208 \agcode{PCB.ss[PCB.ssx]}. \agcode{PCB.ssx} is then incremented.
209
210 \index{PCB}\index{ssx}
211 \paragraph{\agcode{int ssx;}}
212 \agcode{PCB.ssx} contains the parser \index{Stack}stack index for your
213 parser. On every shift action it is incremented. On every reduction
214 action the length of the grammar rule being reduced is subtracted from
215 \agcode{PCB.ssx}.
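
The stack index arithmetic described for \agcode{PCB.ss} and
\agcode{PCB.ssx} can be modeled in a few lines. The names
\agcode{stack{\us}model}, \agcode{model{\us}shift}, and
\agcode{model{\us}reduce} are hypothetical, the stack size is
arbitrary, and the assignment of a new state number after a shift is an
assumption made to keep the model complete. What the parser does to
\agcode{PCB.sn} after a reduction is deliberately left out.

```c
#include <assert.h>

#define MODEL_STACK_SIZE 32  /* arbitrary size for this sketch */

/* Model of the three fields involved in stack bookkeeping. */
typedef struct {
  int sn;                    /* current state number */
  int ssx;                   /* stack index */
  int ss[MODEL_STACK_SIZE];  /* state stack */
} stack_model;

/* Shift: store the current state in ss[ssx], then increment ssx.
   Moving to next_state afterwards is an assumption of the model. */
void model_shift(stack_model *pcb, int next_state) {
  assert(pcb->ssx < MODEL_STACK_SIZE);
  pcb->ss[pcb->ssx] = pcb->sn;
  pcb->ssx++;
  pcb->sn = next_state;
}

/* Reduce: subtract the length of the rule being reduced from ssx. */
void model_reduce(stack_model *pcb, int rule_length) {
  assert(rule_length >= 0 && rule_length <= pcb->ssx);
  pcb->ssx -= rule_length;
}
```

Two shifts followed by the reduction of a rule of length two leave the
stack index where it started.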
216
217 \index{PCB}\index{token{\us}number}
218 \paragraph{\agcode{int token{\us}number;}}
219 \agcode{PCB.token{\us}number} contains the internal \index{Token
220 number}\index{Number}token number of the current input token. If you
221 are not supplying token numbers directly, it is the result of using
222 the actual input character to index the token conversion array,
223 \agcode{ag{\us}tcv}.
224
225 Your parser automatically maintains the proper value in
226 \agcode{PCB.token{\us}number}. Input token numbers should always be
227 stored in \agcode{PCB.input{\us}code}.
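
The table lookup that turns an input character into a token number can
be pictured as below. This is a model only: the array
\agcode{model{\us}tcv} stands in for \agcode{ag{\us}tcv}, and the three
token numbers and the classification into digits, letters, and
everything else are invented for illustration.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical token numbers for this sketch. */
enum { TOKEN_OTHER = 0, TOKEN_DIGIT = 1, TOKEN_LETTER = 2 };

/* Stand-in for the token conversion array, ag_tcv. */
static unsigned char model_tcv[256];

/* Fill the model table: one token number per possible character. */
void build_model_tcv(void) {
  for (int c = 0; c < 256; c++) {
    if (isdigit(c)) model_tcv[c] = TOKEN_DIGIT;
    else if (isalpha(c)) model_tcv[c] = TOKEN_LETTER;
    else model_tcv[c] = TOKEN_OTHER;
  }
}

/* Use the input character to index the table, yielding the token
   number that the parser would keep in PCB.token_number. */
int convert_input(int input_code) {
  assert(input_code >= 0 && input_code < 256);
  return model_tcv[input_code];
}
```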
228
230 \index{vs}\index{PCB}
231 \paragraph{\agcode{\textit{value-stack-type} vs[];}}
232 \agcode{PCB.vs} is the
233 \index{Parser value stack}\index{Stack}\index{Value stack}value stack
234 for your parser. The semantic values of the tokens identified by the
235 parser are stored in the value \index{Stack}\index{Value stack}stack.
236 The value stack, like the other parser stacks, is indexed by
237 \agcode{PCB.ssx}.
238 When your parser is executing a reduction procedure,
239 \agcode{PCB.vs[PCB.ssx]} contains the semantic value of the first
240 token in the grammar rule you are reducing, \agcode{PCB.vs[PCB.ssx+1]}
241 contains the second, and so forth. The return value from your
242 reduction procedure will be stored in turn in
243 \agcode{PCB.vs[PCB.ssx]}.
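
The indexing just described can be modeled for a hypothetical rule
with three tokens, such as a sum written as term, plus sign, term. The
structure \agcode{value{\us}model} and both functions are stand-ins
invented for this sketch, and the value stack holds plain
\agcode{int} values rather than a generated value stack type.

```c
#include <assert.h>

#define MODEL_VS_SIZE 16  /* arbitrary size for this sketch */

/* Model of the value stack and its index. */
typedef struct {
  int ssx;
  int vs[MODEL_VS_SIZE];
} value_model;

/* Hypothetical reduction procedure for a three-token rule:
   vs[ssx] is the first token, vs[ssx+2] the third. */
int reduce_sum(const value_model *pcb) {
  assert(pcb->ssx >= 0 && pcb->ssx + 2 < MODEL_VS_SIZE);
  int left = pcb->vs[pcb->ssx];
  int right = pcb->vs[pcb->ssx + 2];
  return left + right;
}

/* The return value of the reduction procedure is stored in vs[ssx]. */
void apply_reduction(value_model *pcb) {
  pcb->vs[pcb->ssx] = reduce_sum(pcb);
}
```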
244
245 \index{{\us}dol{\us}vt}
246 \agcode{PCB.vs} is defined to be of type \agcode{\${\us}vt}, where
247 ``\agcode{\$}'' represents the name of your syntax file. AnaGram
248 defines \agcode{\${\us}vt} so that it is large enough to store the
249 semantic value of any of the tokens declared in your grammar.