comparison doc/manual/pcb.tex @ 0:13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
author | David A. Holland |
---|---|
date | Sat, 22 Dec 2007 17:52:45 -0500 |
parents | |
children |
-1:000000000000 | 0:13d2b8934445 |
---|---|
1 \chapter{Parser Control Block} | |
2 | |
3 \index{PCB}\index{Parser control block} | |
4 A \agterm{parser control block} is a structure which contains all of | |
5 the data necessary to describe the instantaneous state of a parser. | |
6 The \agcode{typedef} statement which defines the structure is included | |
7 in the \index{Header file}\index{File}header file for your parser. | |
8 AnaGram creates the name of the \index{Name}data type for the | |
9 structure by appending \agcode{{\us}pcb{\us}type} to the parser name. | |
10 | |
11 % XXX: does \index{Name} belong? | |
12 If the | |
13 \index{Declare pcb}\index{Configuration switches}\agparam{declare pcb} | |
14 configuration switch is on, its default state, AnaGram will declare a | |
15 parser control block for you at the beginning of your parser file. | |
16 AnaGram will determine the \index{Name}name of the parser control | |
17 block by appending \agcode{{\us}pcb} to the parser name. AnaGram will | |
18 also define the macro \index{PCB}\index{Macros}\agcode{PCB} as a | |
19 shorthand notation for use within the parser. | |
20 | |
21 If you wish to declare your own parser control block, you must include | |
22 the header file for your parser before your declaration. Then you | |
23 declare a parser control block and define \agcode{PCB} to refer to the | |
24 control block you have declared. | |
25 | |
26 Suppose your grammar is called \agcode{widget}. You would then write | |
27 the following statements in your embedded C in order to declare a | |
28 parser control block named \agcode{widget{\us}control}: | |
29 | |
30 \begin{indentingcode}{0.4in} | |
31 \#include "widget.h" | |
32 widget{\us}pcb{\us}type widget{\us}control; | |
33 \#define PCB widget{\us}control | |
34 \end{indentingcode} | |
35 | |
36 The remainder of this appendix describes fields in the parser control | |
37 block that may interest the user: | |
38 | |
39 \index{column}\index{PCB} | |
40 \paragraph{\agcode{int column;}} | |
41 \agcode{PCB.column} keeps track of the column number of the current | |
42 character in your input. Line and column numbers are tracked only if | |
43 the \index{Lines and columns}\index{Configuration switches} | |
44 \agparam{lines and columns} configuration switch has been set. | |
45 | |
46 \index{cs}\index{PCB} | |
47 \paragraph{\agcode{\textit{context-type} cs[];}} | |
48 \agcode{PCB.cs} is your \index{Context stack}\index{Stack}context | |
49 stack. \agcode{PCB.cs} will be defined only if you have assigned a value | |
50 to the configuration parameter | |
51 \index{Context type}\index{Configuration parameters}\agparam{context type}. | |
52 | |
53 \index{error{\us}frame{\us}ssx}\index{PCB} | |
54 \paragraph{\agcode{int error{\us}frame{\us}ssx;}} | |
55 \agcode{PCB.error{\us}frame{\us}ssx} is a field to which your error handling | |
56 routines may refer. When your | |
57 \index{SYNTAX{\us}ERROR}\index{Macros}\agcode{SYNTAX{\us}ERROR} macro is | |
58 called, if you have set both the | |
59 \index{Diagnose errors}\index{Configuration switches}\agparam{diagnose errors} | |
60 and | |
61 \index{Error frame}\index{Configuration switches}\agparam{error frame} | |
62 configuration switches, | |
63 \agcode{PCB.error{\us}frame{\us}ssx} will contain the value of the parser | |
64 stack index at the beginning of the frame token identified by | |
65 \agcode{PCB.error{\us}frame{\us}token}. For example, if you fail to close a | |
66 comment in a syntax file, AnaGram will encounter an unexpected end of file | |
67 inside the comment. In this situation, \agcode{PCB.error{\us}frame{\us}token} | |
68 is the comment token, and \agcode{PCB.error{\us}frame{\us}ssx} gives the parser | |
69 stack index at the beginning of the comment. | |
70 | |
71 \index{error{\us}frame{\us}token}\index{PCB} | |
72 \paragraph{\agcode{int error{\us}frame{\us}token;}} | |
73 \agcode{PCB.error{\us}frame{\us}token} is a field to which | |
74 your error handling routines may refer. When your | |
75 \index{SYNTAX{\us}ERROR}\index{Macros}\agcode{SYNTAX{\us}ERROR} macro is | |
76 called, if you have set both | |
77 \index{Diagnose errors}\index{Configuration switches}\agparam{diagnose errors} | |
78 and | |
79 \index{Error frame}\index{Configuration switches}\agparam{error frame}, | |
80 it will contain the token number of the frame token, a token which | |
81 identifies the context of the error. | |
82 | |
83 \index{error{\us}message}\index{PCB} | |
84 \paragraph{\agcode{char *error{\us}message;}} | |
85 \agcode{PCB.error{\us}message} is a field to which your error handling | |
86 procedures may refer. If you have set the | |
87 \index{Diagnose errors}\index{Configuration switches}\agparam{diagnose errors} | |
88 configuration switch, on encountering a syntax error your parser will | |
89 create a string containing an appropriate diagnostic message and store | |
90 a pointer to it into \agcode{PCB.error{\us}message}. | |
91 | |
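As an illustration of how an error handling routine might use \agcode{PCB.error{\us}message}, here is a minimal sketch in C. The structure \agcode{model{\us}err{\us}pcb} and the function \agcode{model{\us}format{\us}error} are hypothetical stand-ins, not part of AnaGram; a real parser would use the generated parser control block. The sketch also assumes that the \agparam{lines and columns} switch is set, so that \agcode{line} and \agcode{column} hold meaningful values.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the generated parser control block type;
   only the fields used in this sketch are modelled. */
typedef struct {
    char *error_message;   /* set by the parser when diagnose errors is on */
    int line;              /* tracked only if lines and columns is set */
    int column;
} model_err_pcb;

/* Format "message, line L, column C" into buf.
   Returns the length that snprintf reports for the full message. */
int model_format_error(const model_err_pcb *pcb, char *buf, size_t size)
{
    assert(pcb->error_message != NULL);
    return snprintf(buf, size, "%s, line %d, column %d",
                    pcb->error_message, pcb->line, pcb->column);
}
```

A routine of this kind could be called from a \agcode{SYNTAX{\us}ERROR} macro, passing the address of the parser control block.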
92 \index{PCB}\index{exit{\us}flag} | |
93 \paragraph{\agcode{int exit{\us}flag;}} | |
94 \agcode{PCB.exit{\us}flag} contains a code value which indicates | |
95 whether the parser is still running or whether it has terminated. If | |
96 the parser has terminated, \agcode{PCB.exit{\us}flag} indicates the | |
97 reason the parse has terminated. Mnemonic values for these \index{Exit | |
98 codes}exit codes are defined in the header file for your parser. The | |
99 values are as follows: | |
100 % XXX s/mnemonic/symbolic/ | |
101 | |
102 \begin{tabular}{ll} | |
103 \agcode{AG{\us}RUNNING{\us}CODE} & 0\\ | |
104 \agcode{AG{\us}SUCCESS{\us}CODE} & 1\\ | |
105 \agcode{AG{\us}SYNTAX{\us}ERROR{\us}CODE} & 2\\ | |
106 \agcode{AG{\us}REDUCTION{\us}ERROR{\us}CODE} & 3\\ | |
107 \agcode{AG{\us}STACK{\us}ERROR{\us}CODE} & 4\\ | |
108 \agcode{AG{\us}SEMANTIC{\us}ERROR{\us}CODE} & 5 | |
109 \end{tabular} | |
110 | |
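The following sketch shows one way a caller might interpret \agcode{PCB.exit{\us}flag} once the parser returns. The enumeration repeats the values from the table only so that the example is self-contained; a real program would take these names from the generated header file. The helper functions \agcode{model{\us}exit{\us}text} and \agcode{model{\us}parse{\us}failed} are hypothetical.

```c
/* Values copied from the table of exit codes. A real parser gets these
   names from its generated header file instead of defining them here. */
enum {
    AG_RUNNING_CODE = 0,
    AG_SUCCESS_CODE = 1,
    AG_SYNTAX_ERROR_CODE = 2,
    AG_REDUCTION_ERROR_CODE = 3,
    AG_STACK_ERROR_CODE = 4,
    AG_SEMANTIC_ERROR_CODE = 5
};

/* Hypothetical helper: a short description of an exit_flag value. */
const char *model_exit_text(int exit_flag)
{
    switch (exit_flag) {
    case AG_RUNNING_CODE:         return "running";
    case AG_SUCCESS_CODE:         return "success";
    case AG_SYNTAX_ERROR_CODE:    return "syntax error";
    case AG_REDUCTION_ERROR_CODE: return "reduction error";
    case AG_STACK_ERROR_CODE:     return "stack error";
    case AG_SEMANTIC_ERROR_CODE:  return "semantic error";
    default:                      return "unknown";
    }
}

/* Hypothetical helper: 1 if the parser has stopped because of an error,
   0 if it is still running or finished successfully. */
int model_parse_failed(int exit_flag)
{
    return exit_flag != AG_RUNNING_CODE && exit_flag != AG_SUCCESS_CODE;
}
```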
111 \index{PCB}\index{input{\us}code} | |
112 \paragraph{\agcode{int input{\us}code;}} | |
113 \agcode{PCB.input{\us}code} contains the current input character, or | |
114 the token number, if your | |
115 \index{GET{\us}INPUT}\index{Macros}\agcode{GET{\us}INPUT} macro supplies | |
116 token numbers directly. | |
117 | |
118 If you write your own \agcode{GET{\us}INPUT} macro, you must make sure | |
119 that you store the input character or token number you get into | |
120 \agcode{PCB.input{\us}code}. | |
121 | |
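A custom \agcode{GET{\us}INPUT} macro could look roughly like the sketch below, which reads characters from a string in memory. Everything other than the names \agcode{PCB}, \agcode{GET{\us}INPUT} and \agcode{input{\us}code} is hypothetical: the structure is a cut-down model of a parser control block, \agcode{model{\us}src} is an invented input source, and the use of \agcode{EOF} to signal the end of input is an assumption of this example.

```c
#include <stdio.h>

/* Cut-down model of a parser control block: just the field GET_INPUT fills. */
typedef struct {
    int input_code;
} model_in_pcb;

model_in_pcb model_in_control;
#define PCB model_in_control

/* Hypothetical input source: a pointer into a NUL-terminated string. */
const char *model_src = "";

/* Whatever the macro fetches must end up in PCB.input_code.
   Assumption of this sketch: EOF marks the end of the input. */
#define GET_INPUT \
    (PCB.input_code = (*model_src != '\0' ? (unsigned char)*model_src++ : EOF))

/* Drive the macro over a whole string and count the characters delivered. */
int model_count_inputs(const char *text)
{
    int count = 0;

    model_src = text;
    for (;;) {
        GET_INPUT;
        if (PCB.input_code == EOF)
            break;
        count++;
    }
    return count;
}
```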
122 If you have configured your parser to be | |
123 \index{Event driven}\index{Configuration switches}\agparam{event driven}, | |
124 you must store the input character or token number for each token in | |
125 turn into \agcode{PCB.input{\us}code} before you call your parser to | |
126 process it. | |
127 | |
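The calling pattern for an event driven parser can be sketched as follows: store the input in \agcode{input{\us}code}, then call the parser, once per input. The function \agcode{model{\us}ev{\us}parse} is only a stand-in for a generated parser, and treating a zero character as the end of input is an assumption of this model, not a statement about AnaGram.

```c
/* Exit code values as listed in the generated header file. */
enum { AG_RUNNING_CODE = 0, AG_SUCCESS_CODE = 1 };

/* Cut-down model of a parser control block. */
typedef struct {
    int input_code;
    int exit_flag;
    int seen;          /* model only: number of inputs processed */
} model_ev_pcb;

/* Stand-in for the generated parser: it handles exactly one input,
   the one the caller has stored in input_code. */
void model_ev_parse(model_ev_pcb *pcb)
{
    if (pcb->input_code == 0)          /* model: 0 ends the input */
        pcb->exit_flag = AG_SUCCESS_CODE;
    else
        pcb->seen++;
}

/* Feed a string to the model parser one character at a time. */
int model_ev_feed(model_ev_pcb *pcb, const char *text)
{
    const char *p = text;

    pcb->exit_flag = AG_RUNNING_CODE;
    pcb->seen = 0;
    for (;;) {
        pcb->input_code = (unsigned char)*p;   /* store the input first */
        model_ev_parse(pcb);                   /* then call the parser */
        if (*p == '\0' || pcb->exit_flag != AG_RUNNING_CODE)
            break;
        p++;
    }
    return pcb->seen;
}
```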
128 \index{PCB}\index{input{\us}context} | |
129 \paragraph{\agcode{\textit{context-type} input{\us}context;}} | |
130 \agcode{PCB.input{\us}context} is a field which AnaGram adds to the | |
131 definition of the parser control block structure when you assign a | |
132 value to the | |
133 \index{Context type}\index{Configuration parameters}\agparam{context type} | |
134 configuration parameter. If you choose, you can | |
135 write your \index{GET{\us}INPUT}\index{Macros}\agcode{GET{\us}INPUT} macro | |
136 so that it stores the context value in \agcode{PCB.input{\us}context}. | |
137 The default definition for | |
138 \index{GET{\us}CONTEXT}\index{Macros}\agcode{GET{\us}CONTEXT} will | |
139 then stack the context value at the appropriate time. You can think | |
140 of \agcode{PCB.input{\us}context} as a sort of temporary ``parking | |
141 place'' for the context value. | |
142 | |
143 \index{PCB}\index{input{\us}value} | |
144 \paragraph{\agcode{\textit{input-value-type} input{\us}value;}} | |
145 \agcode{PCB.input{\us}value} is a field in the parser control block which | |
146 is used to store the value of the input token. | |
147 | |
148 If you write your own | |
149 \index{Macros}\index{GET{\us}INPUT}\agcode{GET{\us}INPUT} macro or use | |
150 \index{Event driven}\index{Configuration switches}\agparam{event driven} | |
151 input, and you have set the | |
152 \index{Input values}\index{Configuration switches}\agparam{input values} | |
153 configuration switch, you should make sure that you store the value of | |
154 the input character or token into \agcode{PCB.input{\us}value}. | |
155 | |
156 \index{PCB}\index{line} | |
157 \paragraph{\agcode{int line;}} | |
158 \agcode{PCB.line} contains the line number of the current character in | |
159 your input. Line and column numbers are tracked only if the | |
160 \index{Lines and columns}\index{Configuration switches} | |
161 \agparam{lines and columns} configuration switch has been set. | |
162 | |
163 \index{PCB}\index{pointer} | |
164 \paragraph{\agcode{\textit{pointer-type} pointer;}} | |
165 \agcode{PCB.pointer} will be included in the parser control block for | |
166 your parser if you have set the | |
167 \index{Pointer input}\index{Configuration switches}\agparam{pointer input} | |
168 configuration switch. The type of \agcode{PCB.pointer} is determined | |
169 by the | |
170 \index{Pointer type}\index{Configuration parameters}\agparam{pointer type} | |
171 configuration parameter, which defaults to \agcode{unsigned char *}. | |
172 Your main program should set \agcode{PCB.pointer} before it calls your | |
173 parser. Thereafter, your parser will increment it appropriately. | |
174 When you are executing a reduction procedure or a | |
175 \index{SYNTAX{\us}ERROR}\index{Macros}\agcode{SYNTAX{\us}ERROR} macro, | |
176 \agcode{PCB.pointer} will always point to the next input character to | |
177 be read. | |
178 | |
179 \index{PCB}\index{reduction{\us}token} | |
180 \paragraph{\agcode{int reduction{\us}token;}} | |
181 Whenever your parser executes a reduction procedure, | |
182 \agcode{PCB.reduction{\us}token} contains the number of the token to | |
183 which the rule being reduced is to reduce. If your grammar uses | |
184 semantically determined productions, your reduction procedure may | |
185 change the value of \agcode{PCB.reduction{\us}token} to the desired | |
186 value. | |
187 | |
188 Prior to calling your reduction procedure, your parser will set this | |
189 field to the token number of the default reduction token, i.e., the | |
190 first token in the reduction token list for the production being | |
191 reduced. If the reduction procedure establishes that a different | |
192 reduction token is appropriate, it should store the appropriate token | |
193 number in \agcode{PCB.reduction{\us}token}. The easiest way to do this | |
194 is to use the | |
195 \index{CHANGE{\us}REDUCTION}\index{Macros}\agcode{CHANGE{\us}REDUCTION} | |
196 macro. | |
197 | |
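To make the idea of a semantically determined production concrete, here is a small model of a reduction procedure that overrides the default reduction token. The token numbers, the structure and the function name are all hypothetical. The sketch assigns to \agcode{reduction{\us}token} directly rather than using \agcode{CHANGE{\us}REDUCTION}, since the definition of that macro is not shown here.

```c
#include <string.h>

/* Cut-down model of a parser control block. */
typedef struct {
    int reduction_token;
} model_rt_pcb;

/* Hypothetical token numbers for a rule whose reduction token list is
   "variable name, type name". The first one is the default. */
enum { MODEL_VARIABLE_NAME = 10, MODEL_TYPE_NAME = 11 };

/* Model reduction procedure. The parser is assumed to have stored the
   default token number already; the procedure changes it only when the
   name turns out to be a type name. */
void model_classify_name(model_rt_pcb *pcb, const char *name)
{
    if (strcmp(name, "size_t") == 0)
        pcb->reduction_token = MODEL_TYPE_NAME;
}
```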
198 \index{PCB}\index{sn} | |
199 \paragraph{\agcode{int sn;}} | |
200 \agcode{PCB.sn} always contains the current | |
201 \index{State}\index{Number}state number of your parser. | |
202 | |
203 \index{PCB}\index{ss} | |
204 \paragraph{\agcode{int ss[];}} | |
205 \agcode{PCB.ss} is the \index{Parser state stack}\index{State | |
206 stack}\index{Stack}state stack for your parser. Before every shift action, | |
207 the current state number, \agcode{PCB.sn}, is stored in | |
208 \agcode{PCB.ss[PCB.ssx]}. \agcode{PCB.ssx} is then incremented. | |
209 | |
210 \index{PCB}\index{ssx} | |
211 \paragraph{\agcode{int ssx;}} | |
212 \agcode{PCB.ssx} contains the parser \index{Stack}stack index for your | |
213 parser. On every shift action it is incremented. On every reduction | |
214 action the length of the grammar rule being reduced is subtracted from | |
215 \agcode{PCB.ssx}. | |
216 | |
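The bookkeeping described for \agcode{PCB.sn}, \agcode{PCB.ss} and \agcode{PCB.ssx} can be modelled in a few lines of C. This is a sketch, not the generated parser: the structure and function names are hypothetical, and restoring \agcode{sn} from the stack after a reduction is an assumption about ordinary LR parsing rather than something stated above.

```c
#include <assert.h>

#define MODEL_STACK_SIZE 32

/* Cut-down model of the state-related fields of a parser control block. */
typedef struct {
    int sn;                      /* current state number */
    int ssx;                     /* stack index */
    int ss[MODEL_STACK_SIZE];    /* state stack */
} model_pcb;

void model_init(model_pcb *p)
{
    p->sn = 0;
    p->ssx = 0;
}

/* Shift: save the current state at ss[ssx], then increment ssx. */
void model_shift(model_pcb *p, int next_state)
{
    assert(p->ssx < MODEL_STACK_SIZE);
    p->ss[p->ssx] = p->sn;
    p->ssx++;
    p->sn = next_state;
}

/* Reduce: subtract the length of the rule from ssx. */
void model_reduce(model_pcb *p, int rule_length)
{
    assert(rule_length >= 0 && rule_length <= p->ssx);
    p->ssx -= rule_length;
    if (rule_length > 0)
        p->sn = p->ss[p->ssx];   /* assumption: return to the saved state */
}
```

After three shifts followed by a reduction of a rule of length two, \agcode{ssx} in this model is back to one, and the entry at \agcode{ss[1]} is the state that was current before the second shift.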
217 \index{PCB}\index{token{\us}number} | |
218 \paragraph{\agcode{int token{\us}number;}} | |
219 \agcode{PCB.token{\us}number} contains the internal \index{Token | |
220 number}\index{Number}token number of the current input token. If you | |
221 are not supplying token numbers directly, it is the result of using | |
222 the actual input character to index the token conversion array, | |
223 \agcode{ag{\us}tcv}. | |
224 | |
225 Your parser automatically maintains the proper value in | |
226 \agcode{PCB.token{\us}number}. Input token numbers should always be | |
227 stored in \agcode{PCB.input{\us}code}. | |
228 | |
230 \index{vs}\index{PCB} | |
231 \paragraph{\agcode{\textit{value-stack-type} vs[];}} | |
232 \agcode{PCB.vs} is the | |
233 \index{Parser value stack}\index{Stack}\index{Value stack}value stack | |
234 for your parser. The semantic values of the tokens identified by the | |
235 parser are stored in the value \index{Stack}\index{Value stack}stack. | |
236 The value stack, like the other parser stacks, is indexed by | |
237 \agcode{PCB.ssx}. | |
238 When your parser is executing a reduction procedure, | |
239 \agcode{PCB.vs[PCB.ssx]} contains the semantic value of the first | |
240 token in the grammar rule you are reducing, \agcode{PCB.vs[PCB.ssx+1]} | |
241 contains the second, and so forth. The return value from your | |
242 reduction procedure will be stored in turn in | |
243 \agcode{PCB.vs[PCB.ssx]}. | |
244 | |
245 \index{{\us}dol{\us}vt} | |
246 \agcode{PCB.vs} is defined to be of type \agcode{\${\us}vt}, where | |
247 ``\agcode{\$}'' represents the name of your syntax file. AnaGram | |
248 defines \agcode{\${\us}vt} so that it is large enough to store the | |
249 semantic value of any of the tokens declared in your grammar. |
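The indexing of the value stack during a reduction can be illustrated with a small model. The sketch below handles a three-token rule of the form "a + b": it reads the first and third values relative to \agcode{ssx} and stores the result where the first value was. The names are hypothetical, and \agcode{long} stands in for the real value stack type, which AnaGram defines for each grammar.

```c
#include <assert.h>

#define MODEL_VS_SIZE 16

/* Cut-down model of a parser control block with a value stack.
   long is a simplification of the real value stack type. */
typedef struct {
    int ssx;
    long vs[MODEL_VS_SIZE];
} model_vs_pcb;

/* Model of a reduction for a three-token rule such as  sum -> a '+' b.
   On entry vs[ssx] holds the value of the first token, vs[ssx + 1] the
   second and vs[ssx + 2] the third. The result replaces vs[ssx]. */
void model_reduce_sum(model_vs_pcb *pcb)
{
    long first, third;

    assert(pcb->ssx >= 0 && pcb->ssx + 2 < MODEL_VS_SIZE);
    first = pcb->vs[pcb->ssx];
    third = pcb->vs[pcb->ssx + 2];
    pcb->vs[pcb->ssx] = first + third;
}
```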