comparison doc/misc/html/examples/sbb-doc.html @ 0:13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
author | David A. Holland |
---|---|
date | Sat, 22 Dec 2007 17:52:45 -0500 |
-1:000000000000 | 0:13d2b8934445 |
---|---|
1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN"> | |
2 <HTML> | |
3 <HEAD> | |
4 <TITLE>Syntactic Building Blocks</TITLE> | |
5 </HEAD> | |
6 | |
7 | |
8 <BODY BGCOLOR="#ffffff" BACKGROUND="tilbl6h.gif" | |
9 TEXT="#000000" LINK="#0033CC" | |
10 VLINK="#CC0033" ALINK="#CC0099"> | |
11 | |
12 <P> | |
13 | |
14 <IMG ALIGN="right" SRC="../images/agrsl6c.gif" ALT="AnaGram" | |
15 WIDTH=124 HEIGHT=30 > | |
16 <BR CLEAR="all"> | |
17 Back to <A HREF="../index.html">Index</A> | |
18 <P> | |
19 <IMG ALIGN="bottom" SRC="../images/rbline6j.gif" ALT="----------------------" | |
20 WIDTH=1010 HEIGHT=2 > | |
21 | |
22 | |
23 <BR CLEAR="all"> | |
24 | |
25 <H1>Syntactic Building Blocks</H1> | |
26 | |
27 <IMG ALIGN="bottom" SRC="../images/rbline6j.gif" ALT="----------------------" | |
28 WIDTH=1010 HEIGHT=2 > | |
29 <P> | |
30 <BR> | |
31 | |
32 <H2>Introduction</H2> | |
33 | |
34 <tt>examples/sbb-doc.txt</tt> contains examples of the | |
35 productions necessary to | |
36 parse the more common syntactic elements in your grammar. It | |
37 should be of interest once you have begun to write your own | |
38 grammars. Each example is provided with sample reduction | |
39 procedures, which are adequate in the vast majority of cases. | |
40 You can use these productions like building blocks to quickly | |
41 assemble quite powerful grammars. | |
42 <P> | |
43 You are reading an HTML version of <tt>sbb-doc.txt</tt>. The | |
44 actual text file | |
45 should be easier to copy sections from. The text of this file is | |
46 also to be found in Appendix D of the AnaGram User's Guide. | |
47 <P> | |
48 <BR> | |
49 | |
50 <H2>End of File</H2> | |
51 | |
52 If your parser is going to use | |
53 stream I/O to read its input, you can define end of file as | |
54 minus one: | |
55 <PRE> eof = -1 </PRE> | |
56 | |
57 If your parser is going to run under Windows/DOS and will use low | |
58 level I/O instead of stream I/O you might | |
59 define <STRONG>eof</STRONG> as | |
60 Control Z: | |
61 <PRE> eof = ^Z </PRE> | |
62 <P> | |
63 If your parser is going to run under UNIX and will use low | |
64 level I/O instead of stream I/O you might define | |
65 <STRONG>eof</STRONG> as | |
66 Control D: | |
67 <PRE> eof = ^D </PRE> | |
68 <P> | |
69 If your parser is simply going to parse a string in memory, | |
70 the definition of end of file should be a null character: | |
71 <PRE> eof = 0 </PRE> | |
72 <P> | |
73 It is often convenient, however, to simply define end of | |
74 file so it will work in all of these contexts: | |
75 <PRE> eof = -1 + 0 + ^D + ^Z </PRE> | |
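As a rough illustration of what the combined definition accepts, the following C sketch tests a character code against the same four values. The name is_eof is hypothetical and not part of AnaGram, and the sketch assumes ASCII, where Control D is code 4 and Control Z is code 26.

```c
/* Hypothetical helper, not part of AnaGram: returns nonzero if c is
   one of the codes in  eof = -1 + 0 + ^D + ^Z  (ASCII assumed). */
int is_eof(int c) {
  return c == -1      /* the value used for end of file with stream I/O */
      || c == 0       /* null terminator of a string in memory */
      || c == 4       /* Control D */
      || c == 26;     /* Control Z */
}
```

Any other character code, such as a letter or a digit, is treated as ordinary input.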
76 <P> | |
77 <BR> | |
78 | |
79 <H2>White Space</H2> | |
80 | |
81 It is convenient to have a representation for white space in | |
82 your grammar. Usually you do not wish to distinguish between | |
83 space characters and tab characters, so you can write: | |
84 <PRE> blank = ' ' + '\t' </PRE> | |
85 <P> | |
86 Using this definition, you can represent required white | |
87 space of indeterminate length with | |
88 <PRE> blank... </PRE> and optional | |
89 white space with | |
90 <PRE> blank?... </PRE> | |
91 <P> | |
92 It is common, however, to include comments (see below) as | |
93 white space. In this case, you can define the following | |
94 productions: | |
95 <PRE> ws | |
96 -> blank | |
97 -> comment | |
98 </PRE> | |
99 <P> | |
100 <BR> | |
101 | |
102 <H2>End of Line</H2> | |
103 | |
104 Because different systems use different representations for | |
105 end of line, it is wise to use an abstract end of line token | |
106 in your grammar, and define the token separately. If your | |
107 parser is going to use files that are known to use carriage | |
108 return alone or line feed alone as the end of line delimiter | |
109 you can use one of the following definitions: | |
110 <PRE> eol = '\r' //carriage return only | |
111 eol = '\n' //line feed only | |
112 </PRE> | |
113 If your input files use the newline character as a line | |
114 terminator, but you wish to allow for optional carriage | |
115 returns, you might write: | |
116 <PRE> | |
117 eol | |
118 -> '\r'?, '\n' | |
119 </PRE> | |
120 or even | |
121 <PRE> | |
122 eol | |
123 -> '\r'?..., '\n' | |
124 </PRE> | |
125 <P> | |
126 If you wish to make a grammar that can deal with a file of | |
127 more or less arbitrary provenance, some caution is in order. | |
128 If you allow carriage return and line feed both to function | |
129 as end of line characters and you allow blank lines, you may | |
130 wind up with a very ambiguous grammar. For example, suppose | |
131 you use <STRONG>eol...</STRONG> somewhere in your grammar and you have | |
132 defined <STRONG>eol</STRONG> thus: | |
133 <PRE> | |
134 eol | |
135 -> '\r' | |
136 -> '\n' | |
137 -> '\r', '\n' // trouble! | |
138 </PRE> | |
139 Your grammar is ambiguous since a carriage return, line feed | |
140 pair can be considered two line endings, according to the | |
141 first two productions, or just one, according to the third. | |
142 <P> | |
143 If you really need to allow isolated carriage returns and | |
144 linefeeds to mark ends of line, but need to deal with the | |
145 pair as well, you can make good use of the idiosyncrasies of | |
146 AnaGram's keyword logic by writing the following: | |
147 <PRE> | |
148 eol | |
149 -> '\r' | |
150 -> '\n' | |
151 -> "\r\n" | |
152 </PRE> | |
153 This will treat a carriage return followed by a line feed as | |
154 a single end of line. | |
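The effect of the keyword-based definition can be mimicked in ordinary C. The sketch below counts line endings in a string and prefers the two-character pair over a lone carriage return, which is the longest-match behavior the grammar above relies on. The name count_eols is hypothetical and used only for illustration.

```c
/* Hypothetical illustration: count line endings, treating "\r\n"
   as a single end of line and a lone '\r' or '\n' as one each. */
int count_eols(const char *s) {
  int count = 0;
  while (*s != '\0') {
    if (s[0] == '\r' && s[1] == '\n') {      /* pair: one line ending */
      count++;
      s += 2;
    } else if (*s == '\r' || *s == '\n') {   /* isolated CR or LF */
      count++;
      s++;
    } else {                                 /* ordinary character */
      s++;
    }
  }
  return count;
}
```

Note that a line feed followed by a carriage return is still counted as two line endings, just as in the grammar.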
155 <P> | |
156 <BR> | |
157 | |
158 <H2>Comments</H2> | |
159 | |
160 There are two different categories of comments in common | |
161 use: those that run to the end of line, and those that run | |
162 to a specific terminator. The first can be conveniently | |
163 incorporated into your end of line token as a virtual | |
164 production: | |
165 <PRE> | |
166 eol | |
167 -> ["//", ~(eof + '\n')?...], '\n' | |
168 </PRE> | |
169 Thus, <STRONG>eol</STRONG> allows for an optional comment at the end of every | |
170 line. Note that you never want to allow end of file | |
171 characters in your comments. | |
172 <P> | |
173 Conventional C comments are slightly more complicated, | |
174 depending on your treatment of nesting. If you are not | |
175 interested in nesting comments, you can use the | |
176 following simple syntax: | |
177 <PRE> | |
178 comment | |
179 -> "/*", ~eof?..., "*/" | |
180 </PRE> | |
181 Note that this production works because keywords take | |
182 precedence over ordinary character input. | |
183 <P> | |
184 A somewhat more complex formulation is also useful, since it is easily adapted to handle nesting: | |
185 <PRE> | |
186 comment | |
187 -> comment head, "*/" | |
188 -> comment head, comment | |
189 comment head | |
190 -> "/*" | |
191 -> comment head, ~eof | |
192 </PRE> | |
193 Note that where the previous grammar simply ignored the | |
194 beginning of any nested comment, this grammar recognizes a | |
195 nested comment, but then deliberately chooses to ignore | |
196 nesting. | |
197 <P> | |
198 If you want your comments to nest, you can easily rewrite | |
199 this grammar to allow nesting: | |
200 <PRE> | |
201 comment | |
202 -> comment head, "*/" | |
203 comment head | |
204 -> "/*" | |
205 -> comment head, ~eof | |
206 -> comment head, comment | |
207 </PRE> | |
208 Note that the only difference between nested and non-nested | |
209 comments is whether the grammar rule | |
210 <STRONG>comment head, comment</STRONG> | |
211 reduces to <STRONG>comment head</STRONG> or | |
212 to <STRONG>comment</STRONG>. | |
213 <P> | |
214 If you wish to defer the question of nesting to run-time, | |
215 you can use a semantically determined production to make the | |
216 decision: | |
217 <PRE> | |
218 comment | |
219 -> comment head, "*/" | |
220 comment head | |
221 -> "/*" | |
222 -> comment head, ~eof | |
223 comment, comment head | |
224 -> comment head, comment =check_nesting(); | |
225 </PRE> | |
226 <P> | |
227 Although the final production in this grammar has a somewhat | |
228 inscrutable air about it, a little study will show that if | |
229 <STRONG>check_nesting</STRONG> sets the reduction | |
230 token to <STRONG>comment</STRONG>, the | |
231 effective grammar is the same as the grammar above for | |
232 non-nested comments. On the other hand, | |
233 if <STRONG>check_nesting</STRONG> | |
234 sets the reduction token | |
235 to <STRONG>comment head</STRONG>, the effective | |
236 grammar is the same as the grammar for nested comments. | |
237 <P> | |
238 Assuming you have a switch called | |
239 <STRONG>nest_comments</STRONG>, you could | |
240 write <STRONG>check_nesting</STRONG> as follows: | |
241 <PRE> | |
242 void check_nesting(void) { | |
243 if (nest_comments) | |
244 CHANGE_REDUCTION(comment_head); | |
245 else | |
246 CHANGE_REDUCTION(comment); | |
247 } | |
248 </PRE> | |
249 The else clause in this procedure is technically unnecessary | |
250 since comment is the default value of the reduction token. | |
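To see what the run-time switch changes, here is a hand-written C sketch that scans a comment in a string, with a flag playing the role of nest_comments. It is not generated by AnaGram; the function name skip_comment and its interface are assumptions made for this illustration.

```c
#include <stddef.h>

/* Hypothetical scanner: s must point at "/*". Returns a pointer just
   past the end of the comment, or NULL if s does not start a comment
   or the comment is never closed. If nest is nonzero, every inner
   "/*" must be matched by its own terminator. */
const char *skip_comment(const char *s, int nest) {
  int depth = 1;
  if (s[0] != '/' || s[1] != '*') return NULL;
  s += 2;
  while (*s != '\0') {
    if (s[0] == '*' && s[1] == '/') {          /* comment terminator */
      s += 2;
      if (--depth == 0) return s;
    } else if (nest && s[0] == '/' && s[1] == '*') {
      s += 2;                                  /* nested comment opens */
      depth++;
    } else {
      s++;                                     /* any other character */
    }
  }
  return NULL;                                 /* ran into end of input */
}
```

With nest set to zero the first terminator ends the comment, as in the non-nested grammars; with nest nonzero the scan continues until the outermost comment is closed.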
251 <P> | |
252 If you wish to skip white space in your input, and wish to | |
253 consider comments as simple white space, you might want to | |
254 add the following production to your grammar: | |
255 <PRE> | |
256 ws | |
257 -> blank | |
258 -> comment | |
259 </PRE> | |
260 You can then use the <STRONG>disregard</STRONG> statement | |
261 to ignore <STRONG>ws</STRONG> | |
262 appropriately. | |
263 <P> | |
264 <BR> | |
265 | |
266 <H2>Integers</H2> | |
267 | |
268 It is quite easy to convert ordinary integers to binary | |
269 form. For instance: | |
270 <PRE> | |
271 (int) decimal integer | |
272 -> '0-9':d =d-'0'; | |
273 -> decimal integer:n, '0-9':d =10*n + d - '0'; | |
274 </PRE> | |
275 If necessary, you can specify that the value of <STRONG>decimal | |
276 integer</STRONG> be maintained as a long: | |
277 <PRE> | |
278 (long) decimal integer | |
279 -> '0-9':d =d-'0'; | |
280 -> decimal integer:n, '0-9':d =10*n + d - '0'; | |
281 </PRE> | |
282 Other representations are equally simple: | |
283 <PRE> | |
284 (long) octal integer | |
285 -> '0' =0; | |
286 -> octal integer:n, '0-7':d =8*n + d - '0'; | |
287 | |
288 (long) hex digit | |
289 -> '0-9':d =d-'0'; | |
290 -> 'A-F' + 'a-f':d =9 + (d & 7); | |
291 | |
292 (long) hex integer | |
293 -> {"0x" | "0X"}, hex digit:d =d; | |
294 -> hex integer:n, hex digit:d =16*n + d; | |
295 </PRE> | |
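The expression 9 + (d & 7) in hex digit relies on the ASCII encoding: the codes for 'A' through 'F' and for 'a' through 'f' have the values 1 through 6 in their low three bits. A small C function, assuming ASCII and using the hypothetical name hex_digit_value, makes this concrete:

```c
/* Hypothetical helper mirroring the two hex digit productions.
   Assumes ASCII and a valid hexadecimal digit as input. */
int hex_digit_value(int d) {
  if (d >= '0' && d <= '9') return d - '0';
  return 9 + (d & 7);   /* 'A' is 0x41 and 'a' is 0x61: low bits are 1 */
}
```

The grammar guarantees that only valid digits reach the reduction procedure, so no range check is needed there; a stand-alone function like this one would normally validate its argument.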
296 Note that if you use the C convention for octal integers, | |
297 you have to redefine <STRONG>decimal integer</STRONG> | |
298 to avoid ambiguity: | |
299 <PRE> | |
300 (long) decimal integer | |
301 -> '1-9':d =d-'0'; | |
302 -> decimal integer:n, '0-9':d =10*n + d - '0'; | |
303 </PRE> | |
304 You can then represent the general case as follows: | |
305 <PRE> | |
306 (long) integer | |
307 -> decimal integer | octal integer | hex integer | |
308 </PRE> | |
309 You can allow for a signed integer with the following | |
310 productions: | |
311 <PRE> | |
312 (long) signed integer | |
313 -> '+'?, integer:n =n; | |
314 -> '-', integer:n =-n; | |
315 </PRE> | |
316 If you have included a <STRONG>disregard</STRONG> | |
317 statement in your grammar | |
318 to skip over irrelevant white space in your input, you might | |
319 add the following to avoid skipping white space inside a | |
320 number: | |
321 <PRE> | |
322 [ lexeme {integer}] | |
323 </PRE> | |
324 Note that if you were to declare <STRONG>signed | |
325 integer</STRONG> as a lexeme, | |
326 your parser would not allow space between a leading sign and | |
327 the integer. | |
328 <P> | |
329 <BR> | |
330 | |
331 <H2>Floating Point Numbers</H2> | |
332 | |
333 Floating point numbers are somewhat more complex than | |
334 integers, but not significantly so: | |
335 <PRE> | |
336 (double) real | |
337 -> simple real | |
338 -> integer part:r, exponent field:x =r*pow10(x); | |
339 -> simple real:r, exponent field:x =r*pow10(x); | |
340 | |
341 | |
342 (double) simple real | |
343 -> integer part:i, '.', fraction part:f =i+f; | |
344 -> integer part, '.' | |
345 -> '.', fraction part:f =f; | |
346 | |
347 (double) integer part | |
348 -> '0-9':d =d-'0'; | |
349 -> integer part:n, '0-9':d =10*n + d-'0'; | |
350 | |
351 (double) fraction part | |
352 -> '0-9':d =(d-'0')/10.; | |
353 -> '0-9':d, fraction part:f =(d-'0'+f)/10.; | |
354 | |
355 exponent field | |
356 -> 'e' + 'E', signed exponent:x =x; | |
357 | |
358 signed exponent | |
359 -> '+'?, exponent:n =n; | |
360 -> '-', exponent:n =-n; | |
361 | |
362 exponent | |
363 -> '0-9':d =d-'0'; | |
364 -> exponent:n, '0-9':d =10*n + d - '0'; | |
365 </PRE> | |
366 Note that <STRONG>fraction part</STRONG> uses | |
367 right recursion rather than | |
368 left recursion. <STRONG>exponent</STRONG> is | |
369 effectively the same as <STRONG>decimal | |
370 integer</STRONG>, above, but allows for initial zeroes. | |
371 <P> | |
372 The situation becomes somewhat more complex if you wish to | |
373 allow both integer and floating point forms, particularly if | |
374 you wish to follow the C convention for octal integers. | |
375 First, you cannot have distinct productions | |
376 for <STRONG>integer part</STRONG> | |
377 and <STRONG>decimal integer</STRONG>, since | |
378 there is no way to distinguish | |
379 them until a decimal point or an exponent field is | |
380 encountered. Second, since 0129.3 looks for all the world | |
381 like an octal number until the '9' is encountered, one | |
382 either has to postpone all conversion calculations until the | |
383 issue is resolved or resort to trickery. Here is a way to | |
384 resolve the problem by redefining <STRONG>integer part</STRONG>: | |
385 <PRE> | |
386 (double) integer part | |
387 -> confusion | |
388 -> octal integer:n =octal2decimal(n); | |
389 -> decimal integer:n =n; | |
390 | |
391 (double) confusion | |
392 -> octal integer:n, '8-9':d =10*octal2decimal(n)+d-'0'; | |
393 -> confusion:x, '0-9':d =10*x + d-'0'; | |
394 </PRE> | |
395 where <STRONG>octal2decimal</STRONG> is defined thus: | |
396 <PRE> | |
397 double octal2decimal(int n) { | |
398 if (n) return 10*octal2decimal(n/8) + n%8; | |
399 else return 0; | |
400 } | |
401 </PRE> | |
402 Here <STRONG>confusion</STRONG> represents | |
403 a number that starts off looking | |
404 like an octal integer, but then turns into a decimal number, | |
405 because an eight, a nine, a decimal point or an exponent | |
406 field is encountered. When this occurs, the function | |
407 <STRONG>octal2decimal</STRONG> undoes the octal | |
408 conversion that had already been | |
409 done, and redoes the conversion as decimal conversion. | |
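A worked example may help. For the input 0129, the digits 012 are first accumulated as octal, giving 10; when the '9' arrives, octal2decimal(10) recovers 12, and the result becomes 10*12 + 9 = 129. The C sketch below reproduces that calculation by hand. octal2decimal is copied from the text above, while integer_part_value is a hypothetical stand-in for the integer part and confusion productions, restricted to input that begins with '0'.

```c
#include <assert.h>

double octal2decimal(int n) {
  if (n) return 10*octal2decimal(n/8) + n%8;
  else return 0;
}

/* Hypothetical stand-in for the productions: s must begin with '0'. */
double integer_part_value(const char *s) {
  int n = 0;          /* value accumulated under the octal assumption */
  double x;
  assert(s[0] == '0');
  s++;
  while (*s >= '0' && *s <= '7') n = 8*n + (*s++ - '0');
  x = octal2decimal(n);                /* undo the octal conversion */
  while (*s >= '0' && *s <= '9')       /* continue in decimal */
    x = 10*x + (*s++ - '0');
  return x;
}
```

The second loop is entered only if an eight or a nine was seen, since the first loop has already consumed every octal digit.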
410 <P> | |
411 If you have included a <STRONG>disregard</STRONG> | |
412 statement in your grammar | |
413 to skip over irrelevant white space in your input, you might | |
414 add the following to avoid skipping white space inside a | |
415 real: | |
416 <PRE> | |
417 [ lexeme {real}] | |
418 </PRE> | |
419 <BR> | |
420 | |
421 <H2>Names</H2> | |
422 | |
423 In almost all grammars, it is necessary to identify names. | |
424 To accumulate the characters that make up the name it is | |
425 convenient to use a stack. The reduction procedures in this | |
426 example use the functions <STRONG>ipn</STRONG> and | |
427 <STRONG>pcn</STRONG> to accumulate the | |
428 characters. <STRONG>ipn</STRONG> initializes storage | |
429 for the name and stores | |
430 the first character. <STRONG>pcn</STRONG> adds a single | |
431 character to the | |
432 name. This grammar presumes the existence of a function | |
433 called <STRONG>identify_name</STRONG> which would | |
434 look up the name in a | |
435 symbol table and return an identifying index. | |
436 <PRE> | |
437 letter = 'a-z' + 'A-Z' + '_' | |
438 digit = '0-9' | |
439 (int) name | |
440 -> name string =identify_name(); | |
441 | |
442 name string | |
443 -> letter:c =ipn(c); | |
444 -> name string, letter+digit:c =pcn(c); | |
445 | |
446 {/* embedded C to accumulate name */ | |
447 char name[82]; | |
448 int name_length; | |
449 void ipn(int c) { /* Init and Put char to Name */ | |
450 name[0] = c; | |
451 name_length = 1; | |
452 } | |
453 void pcn(int c) { /* Put Char to Name */ | |
454 assert(name_length < 82); | |
455 name[name_length++] = c; | |
456 } | |
457 } // End of embedded C | |
458 </PRE> | |
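The text leaves identify_name unspecified. One possible shape for it, shown purely as a sketch, is a linear search through a small table that adds the name when it is not found. The table, its size, and the constant names are assumptions; ipn and pcn are repeated so the fragment stands alone, and pcn here reserves one byte so the name can be null terminated.

```c
#include <assert.h>
#include <string.h>

#define NAME_SIZE 82   /* same buffer size as in the text */
#define MAX_NAMES 64   /* arbitrary table size for this sketch */

char name[NAME_SIZE];
int name_length;

void ipn(int c) {      /* Init and Put char to Name */
  name[0] = c;
  name_length = 1;
}

void pcn(int c) {      /* Put Char to Name */
  assert(name_length < NAME_SIZE - 1);   /* leave room for '\0' */
  name[name_length++] = c;
}

static char symbol_table[MAX_NAMES][NAME_SIZE];
static int symbol_count;

/* Hypothetical lookup: return the index of the accumulated name,
   adding it to the table if it has not been seen before. */
int identify_name(void) {
  int i;
  name[name_length] = '\0';
  for (i = 0; i < symbol_count; i++) {
    if (strcmp(symbol_table[i], name) == 0) return i;
  }
  assert(symbol_count < MAX_NAMES);
  strcpy(symbol_table[symbol_count], name);
  return symbol_count++;
}
```

A real symbol table would more likely use hashing, but the interface seen by the grammar would be the same.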
459 <P> | |
460 If you have included a | |
461 <STRONG>disregard</STRONG> statement in your grammar | |
462 to skip over irrelevant white space in your input, you might | |
463 add the following to avoid skipping white space inside a | |
464 name: | |
465 <PRE> | |
466 [ lexeme {name}] | |
467 </PRE> | |
468 | |
469 <BR> | |
470 | |
471 <H2>Names with Embedded White Space</H2> | |
472 | |
473 It is sometimes convenient to allow symbol names to contain | |
474 embedded white space. This is easily done, although it | |
475 requires a bit of a trick: | |
476 <PRE> | |
477 name | |
478 -> name string, ws?... =identify_name(); | |
479 | |
480 name string | |
481 -> letter:c =ipn(c); | |
482 -> name string, letter+digit:c =pcn(c); | |
483 -> name string, ws..., letter+digit:c =pcn(' '), pcn(c); | |
484 </PRE> | |
485 Note that the last production reduces all contiguous white | |
486 space within a name to a single blank. | |
487 <P> | |
488 Allowing optional blanks following name string is important. | |
489 If you don't allow them there, any <STRONG>ws</STRONG> | |
490 following a name will | |
491 be presumed to be embedded <STRONG>ws</STRONG>, | |
492 requiring another letter or | |
493 digit, which is not what you intend. There are two ways to | |
494 accomplish this. The first, shown above, explicitly allows | |
495 for optional white space following name string. The second | |
496 method is to use the <STRONG>disregard</STRONG> | |
497 and <STRONG>lexeme</STRONG> statements: | |
498 <PRE> | |
499 [ | |
500 disregard ws | |
501 lexeme {name} | |
502 ] | |
503 </PRE> | |
504 If you use the <STRONG>disregard</STRONG> statement you should not include a | |
505 provision for white space in the production for name. Just | |
506 leave it as it was in the previous example. | |
507 <P> | |
508 <BR> | |
509 | |
510 <H2>Character Strings</H2> | |
511 | |
512 Character strings are often required. The simplest way to | |
513 implement character strings is as follows: | |
514 <PRE> | |
515 character string | |
516 -> '"', ~(eof + '"')?..., '"' | |
517 </PRE> | |
518 This approach has the disadvantage that it makes no | |
519 provision for nonprinting characters. | |
520 <P> | |
521 There are numerous ways to provide for nonprinting | |
522 characters in your character strings. However, you can avoid | |
523 tedious documentation by using the same rules for | |
524 nonprinting characters that C uses. Unfortunately, the C | |
525 rules for octal and hexadecimal escape sequences complicate | |
526 the writing of the grammar quite substantially. For example, | |
527 if you wish to write a string that consists of ASCII code 11 | |
528 followed by the ASCII digit '2', you must pad with a leading | |
529 zero, writing "\0132", since "\132" according to the rules | |
530 is a single three digit octal escape sequence designating | |
531 ASCII code 90. The problem is that the rules allow for one, | |
532 two or three digit octal escape sequences, but sequences | |
533 shorter than three digits have to be followed by the end of | |
534 the string or a character that is not an octal digit. There | |
535 is a similar, if not worse, problem with hex escape | |
536 sequences. There is no limit on the length of a hex escape | |
537 sequence, so there is no possible way to make the character | |
538 '2' follow a hex escape sequence without using another | |
539 escape sequence. | |
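The octal rule can be checked with a few lines of C. The sketch below decodes one octal escape by taking as many octal digits as are available, up to three, which is the greedy behavior described above. The name octal_escape and its interface are hypothetical.

```c
#include <assert.h>

/* Hypothetical decoder: s points at a backslash followed by at least
   one octal digit. Returns the character code and stores the number
   of characters consumed, including the backslash, in *len. */
int octal_escape(const char *s, int *len) {
  int n = 0;
  int i = 1;
  assert(s[0] == '\\' && s[1] >= '0' && s[1] <= '7');
  while (i <= 3 && s[i] >= '0' && s[i] <= '7') {
    n = 8*n + (s[i] - '0');
    i++;
  }
  *len = i;
  return n;
}
```

Applied to the text \0132 it yields code 11 and stops before the '2'; applied to \132 it consumes all three digits and yields code 90.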
540 <P> | |
541 A straightforward approach to writing a grammar for strings | |
542 consistent with the C conventions yields a number of | |
543 conflicts which correspond exactly to the problems discussed | |
544 above. While it is certainly possible to write a grammar for | |
545 strings that has no conflicts, it is easier to proceed in a | |
546 straightforward manner and use a <STRONG>sticky</STRONG> | |
547 declaration to | |
548 resolve the ambiguities. | |
549 <P> | |
550 Here is the complete grammar for a character string in | |
551 accordance with the C rules. In order to accumulate the | |
552 contents of the string, this grammar uses the functions | |
553 <STRONG>ics</STRONG> and <STRONG>acs</STRONG>, | |
554 to initialize storage for a character string and to | |
555 append a character to that string respectively. | |
556 <PRE> | |
557 character string | |
558 -> string chars, '"' | |
559 | |
560 string chars | |
561 -> '"' =ics(); | |
562 -> string chars, string char:c =acs(c); | |
563 | |
564 string char | |
565 -> simple string char | |
566 -> escape sequence | |
567 | |
568 simple string char = ~eof - ('"' + '\\' + '\n') | |
569 | |
570 (int) escape sequence | |
571 -> "\\a" ='\a'; | |
572 -> "\\b" ='\b'; | |
573 -> "\\f" ='\f'; | |
574 -> "\\n" ='\n'; | |
575 -> "\\r" ='\r'; | |
576 -> "\\t" ='\t'; | |
577 -> "\\v" ='\v'; | |
578 -> "\\\\" ='\\'; | |
579 -> "\\?" = '\?'; | |
580 -> "\\'" ='\''; | |
581 -> "\\\"" ='"'; | |
582 -> octal escape | |
583 -> hex escape | |
584 | |
585 (int) octal escape | |
586 -> one octal | two octal | three octal | |
587 | |
588 (int) one octal | |
589 -> '\\', '0-7':d =d-'0'; | |
590 | |
591 (int) two octal | |
592 -> one octal:n, '0-7':d =8*n + d-'0'; | |
593 | |
594 (int) three octal | |
595 -> two octal:n, '0-7':d =8*n + d-'0'; | |
596 | |
597 (int) hex escape | |
598 -> "\\x", hex number | |
599 | |
600 (int) hex number | |
601 -> hex digit | |
602 -> hex number:n, hex digit:d =16*n + d; | |
603 | |
604 [ | |
605 sticky one octal, two octal, hex number | |
606 ] | |
607 {/* embedded C to define ics and acs */ | |
608 char string_store[82]; | |
609 int length; | |
610 void ics(void) { | |
611 length = 0; | |
612 } | |
613 void acs(int c) { | |
614 assert(length < 82); | |
615 string_store[length++] = c; | |
616 } | |
617 } // End of embedded C </PRE> | |
618 <P> | |
619 If you have included a <STRONG>disregard</STRONG> | |
620 statement in your grammar | |
621 to skip over irrelevant white space in your input, you might | |
622 add the following to avoid skipping white space inside a | |
623 character string: | |
624 <PRE> | |
625 [ lexeme {character string}] | |
626 </PRE> | |
627 | |
628 <BR> | |
629 | |
630 <H2>Character Constants</H2> | |
631 | |
632 It is almost trivial to extend the above syntax for | |
633 character strings to allow simple character constants: | |
634 <PRE> | |
635 (int) character constant | |
636 -> '\'', simple char:c, '\'' =c; | |
637 | |
638 (int) simple char | |
639 -> ~eof - ('\'' + '\\' + '\n') | |
640 -> escape sequence | |
641 </PRE> | |
642 The value of the character constant token is the character | |
643 code found in the input stream. | |
644 <P> | |
645 If you have included a <STRONG>disregard</STRONG> | |
646 statement in your grammar | |
647 to skip over irrelevant white space in your input, you might | |
648 add the following to avoid skipping white space inside a | |
649 character constant: | |
650 <PRE> | |
651 [ lexeme {character constant}] | |
652 </PRE> | |
653 | |
654 <BR> | |
655 | |
656 <H2>Simple Expressions</H2> | |
657 | |
658 It is often convenient to allow for simple expressions in | |
659 your input. The syntax for expression logic is written in a | |
660 conventional way, using a separate token for each precedence | |
661 level desired. The following grammar will accept simple | |
662 addition, subtraction, multiplication and division of | |
663 floating point numbers: | |
664 <PRE> | |
665 (double) expression | |
666 -> term | |
667 -> expression:x, '+', term:t =x+t; | |
668 -> expression:x, '-', term:t =x-t; | |
669 | |
670 (double) term | |
671 -> factor | |
672 -> term:t, '*', factor:f =t*f; | |
673 -> term:t, '/', factor:f =t/f; | |
674 | |
675 (double) factor | |
676 -> real | |
677 -> '(', expression:x, ')' =x; | |
678 -> '-', factor:f =-f; | |
679 </PRE> | |
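For comparison, the same three precedence levels can be written by hand as a recursive descent evaluator in C. This is only a sketch of how the productions map onto functions, not what AnaGram generates: the left-recursive productions become loops, the standard library function strtod stands in for real, white space is not handled, and the name evaluate is hypothetical.

```c
#include <stdlib.h>

static const char *p;              /* current input position */

static double expression(void);

static double factor(void) {
  char *end;
  double r;
  if (*p == '-') {                 /* '-', factor */
    p++;
    return -factor();
  }
  if (*p == '(') {                 /* '(', expression, ')' */
    p++;
    r = expression();
    if (*p == ')') p++;
    return r;
  }
  r = strtod(p, &end);             /* stands in for real */
  p = end;
  return r;
}

static double term(void) {
  double t = factor();
  for (;;) {
    if (*p == '*') { p++; t *= factor(); }
    else if (*p == '/') { p++; t /= factor(); }
    else return t;
  }
}

static double expression(void) {
  double x = term();
  for (;;) {
    if (*p == '+') { p++; x += term(); }
    else if (*p == '-') { p++; x -= term(); }
    else return x;
  }
}

/* Hypothetical entry point: evaluate the expression in s. */
double evaluate(const char *s) {
  p = s;
  return expression();
}
```

Because each loop folds its operands from left to right, subtraction and division associate to the left, matching the left recursion in the grammar.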
680 <P> | |
681 An alternative way to write expression syntax is to write an | |
682 ambiguous grammar and use precedence declarations to resolve | |
683 the conflicts. This results in a slightly more compact and | |
684 faster parser: | |
685 <PRE> | |
686 (double) expression | |
687 -> expression:x, '+', expression:y =x+y; | |
688 -> expression:x, '-', expression:y =x-y; | |
689 -> expression:x, '*', expression:y =x*y; | |
690 -> expression:x, '/', expression:y =x/y; | |
691 -> unary minus, expression:x =-x; | |
692 -> '(', expression:x, ')' =x; | |
693 -> real | |
694 | |
695 unary minus = '-' | |
696 | |
697 [ | |
698 left '+', '-' | |
699 left '*', '/' | |
700 right unary minus | |
701 ] | |
702 </PRE> | |
703 Note that we deal with the different precedence levels of | |
704 unary and binary minus by defining unary minus, which | |
705 AnaGram treats as distinct from the simple '-', and giving | |
706 it different precedence. | |
707 | |
708 <P> | |
709 | |
710 <IMG ALIGN="bottom" SRC="../images/rbline6j.gif" ALT="----------------------" | |
711 WIDTH=1010 HEIGHT=2 > | |
712 <P> | |
713 <IMG ALIGN="right" SRC="../images/pslrb6d.gif" ALT="Parsifal Software" | |
714 WIDTH=181 HEIGHT=25> | |
715 <BR CLEAR="right"> | |
716 <P> | |
717 Back to <A HREF="../index.html">Index</A> | |
718 <P> | |
719 <ADDRESS><FONT SIZE="-1"> | |
720 AnaGram parser generator - examples<BR> | |
721 Syntactic Building Blocks<BR> | |
722 Copyright © 1993-1999, Parsifal Software. <BR> | |
723 All Rights Reserved.<BR> | |
724 </FONT></ADDRESS> | |
725 | |
726 </BODY> | |
727 </HTML> |