comparison doc/admin/todo-large.txt @ 0:13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
author | David A. Holland |
---|---|
date | Sat, 22 Dec 2007 17:52:45 -0500 |
parents | |
children | f9e4689b837d |
Large todo items

This file describes the bigger and more involved projects or
undertakings that can or should be done on AnaGram moving forward.

------------------------------------------------------------
USER INTERFACE

- Build a new user interface that's based on a non-legacy toolkit.
  This is the most pressing issue and almost nothing else in this file
  should be undertaken until the new user interface is under control.

- Conflict diagnostics. AG's conflict diagnostics are already far
  better than yacc's; however, they are still opaque to most users and
  could be improved a good deal. There are a number of areas for
  improvement: (1) presentation; we have a GUI, we can and should use it
  to draw diagrams and arrows and whatnot, and not limit ourselves to
  rows of text as AG 1.x had to. We should show a sample input that
  leads to the conflict, and bracket it on the bottom showing the rules
  and reductions that go one way and on the top showing the rules and
  reductions that go the other. (2) common forms; a lot of conflicts
  arise from common mistakes and common issues. We should have a set of
  pattern-matching rules to identify and report these common forms, and
  link to explanations and fix suggestions in the help.

- Better crossreferencing of tables. The right-button popup menu for
  auxiliary windows is fine, but you ought to get something useful by
  default if you click (or double-click?) on things directly. Also,
  there are more useful crossreferences than currently exist.

- Better presentation of tables. To really take advantage of the
  information in the various tables and windows AG provides, you have to
  understand quite a bit about how AG works. This should not be
  necessary. Also, the AG 2.x user interface is a direct conceptual port
  of the text-based AG 1.x user interface. It's still fundamentally
  based on lines of text. This is not necessary and could be improved a
  great deal without undue difficulty.

- Command-line version. agcl should print basic conflict diagnoses as
  well as the warning and error messages, so one doesn't have to fire up
  the GUI every time one makes a silly editing mistake.

- It would be nice if when you opened the Configuration Parameters
  window you could change the settings for the current run. (Has to be
  for the current run, because saving changes is not remotely
  practical.)

- Multiple grammars. In the long run it would be nice to be able to
  have multiple grammars loaded at the same time and be able to shuffle
  tokens between them when using File Trace or Grammar Trace. This would
  allow easier testing and debugging of projects that use multiple
  interacting grammars, or that use AG to implement communication
  protocols. Unfortunately, this is *not* trivial.


------------------------------------------------------------
PROGRAMMING INTERFACE

- Language support. There's been talk of Java support for AG for
  quite a while. It would be nice to actually *do* it. Beyond that,
  other languages which would be useful or interesting to support
  include Perl, Python, Ruby, Cyclone, OCaml... you name it. Right now
  adding any of these would be nontrivial; however, after the first
  couple adding another should be relatively straightforward. Note also
  that as many of these languages aren't syntactically compatible with
  either C or AnaGram, we will need to design a way to put the syntax
  and the accompanying code in separate files.

- Configuration support for multiple languages. There also needs to
  be a better and more systematic configuration mechanism for choosing
  output language and output language dialect.

- Output name. It's nice for makefiles to be able to know what the
  output files will be named. Unfortunately, there's no good way to do
  this that I can think of, because it depends on the output language.
  If anyone has a brilliant idea, please share it.

- Include files and/or a module system for grammars. Common bits of
  grammar have to be cut and pasted all the time. Some mechanism to
  avoid this would be nice. (Some users have used cpp or m4 to
  preprocess AG input; this works but it's awfully ugly.)
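
  One possible shape for such a mechanism, sketched in Python as a
  standalone preprocessing pass in the spirit of the cpp/m4 workaround.
  The `@include "file"` directive is an assumption invented for this
  sketch (AG has no such syntax today), and `expand_includes` is a
  hypothetical helper, not part of any existing tool.

```python
import re
from pathlib import Path

# Hypothetical directive: a line consisting solely of  @include "name"
INCLUDE_RE = re.compile(r'^\s*@include\s+"([^"]+)"\s*$')


def expand_includes(path, _stack=()):
    """Return the lines of `path` with every @include line replaced by the
    (recursively expanded) lines of the named file.

    Included names are resolved relative to the including file.
    Raises ValueError on a circular include.
    """
    path = Path(path).resolve()
    if path in _stack:
        chain = " -> ".join(p.name for p in _stack + (path,))
        raise ValueError(f"circular include: {chain}")
    out = []
    for line in path.read_text().splitlines():
        match = INCLUDE_RE.match(line)
        if match:
            included = path.parent / match.group(1)
            out.extend(expand_includes(included, _stack + (path,)))
        else:
            out.append(line)
    return out
```

  A textual pass like this loses the original file and line positions,
  so a built-in version would also need to map diagnostics back to the
  included file.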

- Reading yacc files. There's been a yacc-to-AnaGram converter around
  for a long time, but it's not fully functional and hasn't ever been
  officially released or supported. (It originally got hung up because
  yacc's input language is defined, at least in some references, in a
  way that makes the grammar LR(2). This is not important now, if it
  ever was; bison, for example, does not accept the offending
  constructs.) It might be better at this point to merge the
  functionality directly into AnaGram so it can read (and build) .y
  files directly. Finding a clean version of y2ag is the first order of
  business. Note: the y2ag.syn in the test suite is mangled in some
  fashion and not a good place to start.


------------------------------------------------------------
BUILD

- Dumping UI. To improve the regression testing I'd like to have a
  fake user interface that takes *all* the data normally available in
  the various windows and tables and just dumps it to stdout.

- Parser equivalence. One of the problems with the test suite is that
  sometimes an internal change will, perfectly legitimately, permute
  elements in the various parse tables, causing huge test diffs that
  need to be inspected by hand. It would be nice to figure out how to
  either (1) canonicalize the tables so this doesn't happen, or (2)
  write some kind of munging script that automatically validates and
  suppresses these changes. (Obviously, (1) is better, but I'm not sure
  it's feasible.)
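
  To illustrate what option (1) could look like in the simplest case,
  here is a Python sketch that renumbers parser states in breadth-first
  order from the start state, following transitions in sorted symbol
  order. The table layout (a dict mapping state to {symbol: target}) and
  the name `canonicalize` are assumptions made for the example; AG's
  real tables are richer, and this only addresses permuted state
  numbers, not permuted rules or tokens.

```python
from collections import deque


def canonicalize(transitions, start):
    """Renumber states in breadth-first order from `start`.

    `transitions` maps state -> {symbol: target state}. Transitions are
    followed in sorted symbol order, so two tables that differ only in
    how their states are numbered produce the same result. States that
    are unreachable from `start` are dropped.
    """
    order = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        row = transitions.get(state, {})
        for symbol in sorted(row):
            target = row[symbol]
            if target not in order:
                order[target] = len(order)
                queue.append(target)
    return {
        order[state]: {
            symbol: order[target]
            for symbol, target in transitions.get(state, {}).items()
        }
        for state in order
    }
```

  Comparing `canonicalize(old, 0) == canonicalize(new, 0)` then ignores
  a pure renumbering of states while still catching a real change in
  the transition structure.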

- Figure out how to do coverage checking in the test suite. (This
  includes coverage of AG code, coverage of the code sections from the
  engine definition, and also coverage of config parameters that affect
  code generation.)


------------------------------------------------------------
INTERNALS

- Character sets. Right now, AG is limited to 16-bit-wide character
  sets, which is a problem as Unicode code points now need up to 21
  bits (the code space runs to U+10FFFF). Also, the
  internal handling of case folding and so on is totally inadequate and
  needs a general revamp. And keywords only work with 8-bit character
  sets.
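
  To make the width problem concrete, here is a small Python check.
  The helper names are made up for illustration. It relies only on the
  fact that the Unicode code space ends at U+10FFFF, which takes 21
  bits, so any code point above U+FFFF cannot live in a single 16-bit
  cell and becomes two units (a surrogate pair) in UTF-16.

```python
MAX_CODE_POINT = 0x10FFFF  # highest code point in the Unicode code space


def bits_needed(code_point):
    """Number of bits needed to hold code_point as an unsigned integer."""
    return max(1, code_point.bit_length())


def fits_in_16_bits(code_point):
    """True if code_point can be stored in one 16-bit character cell."""
    return code_point <= 0xFFFF


def utf16_units(ch):
    """Number of 16-bit code units UTF-16 uses for the single character ch."""
    return len(ch.encode("utf-16-le")) // 2


print(bits_needed(MAX_CODE_POINT))   # 21
print(fits_in_16_bits(0x1F600))      # False
print(utf16_units("\U0001F600"))     # 2
```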

- Lexemes and disregard. I'd like to clean up the implementation of
  disregard whitespace and lexemes. It has a sound theoretical
  foundation, but you wouldn't know it from the existing code. (To be
  fair, the existing code was severely constrained by DOS memory
  issues.)

- LR(k) grammars. Jerry was talking about this before he died;
  unfortunately, I don't think anyone knows exactly what he had in
  mind...

- Regexp keywords. Keyword recognition is already a string search
  process; there's no real reason it couldn't be extended to include
  regular expression matching. The catch of course is how you preserve
  any useful notion of soundness in the grammar.

- Merge duplicate conflicts. A lot of times the same conflict will
  happen with a whole pile of tokens. Right now these are all generated
  and displayed independently; they should really be folded together.
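
  The folding itself is a plain group-by. A minimal Python sketch,
  assuming each conflict can be described by a state, a lookahead
  token, and the set of rules involved (the `Conflict` record and
  `merge_conflicts` are hypothetical and do not reflect AG's internal
  data structures):

```python
from collections import defaultdict
from typing import NamedTuple, Tuple


class Conflict(NamedTuple):
    state: int              # parser state where the conflict occurs
    token: str              # lookahead token that triggers it
    rules: Tuple[int, ...]  # numbers of the rules involved


def merge_conflicts(conflicts):
    """Fold conflicts that differ only in their lookahead token.

    Returns a list of (state, rules, tokens) tuples, one per distinct
    (state, rules) pair, with the tokens collected into a sorted list.
    """
    merged = defaultdict(set)
    for c in conflicts:
        merged[(c.state, tuple(sorted(c.rules)))].add(c.token)
    return [
        (state, rules, sorted(merged[(state, rules)]))
        for state, rules in sorted(merged)
    ]
```

  For example, three conflicts in state 7 between rules 3 and 5 on
  `+`, `-`, and `*` collapse into a single entry listing all three
  tokens.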

- Fix the code generator. The code generator is an ugly mess and
  needs rationalization.

- Interactive completions. Many command-driven programs nowadays
  allow you to push a key (often tab) to get a list of legal inputs
  based on what's been typed so far. If input is being handled by an AG
  parser, you can in theory run the parser on the input so far, inspect
  the state, and get a list of legal tokens, which you can then use to
  give the user a context-sensitive list of possible things to
  type. This is, however, not something that you can possibly code up
  yourself by interfacing to AG, at least not in any pleasant way, so AG
  ought to provide support.
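
  The idea can be sketched with a toy table-driven parser in Python.
  Everything here is an assumption for illustration: the tiny command
  grammar, the hand-built SLR tables, and the names `feed` and
  `completions`; none of it reflects how AG's generated parsers expose
  their state. The sketch finds legal tokens by trying each terminal on
  a copy of the stack, because simply listing the tokens that have an
  entry in the current state's action row can overstate what is legal.

```python
# Toy grammar:  cmd -> "show" obj | "set" obj "num"
#               obj -> "speed" | "color"
TERMINALS = ["show", "set", "speed", "color", "num", "$end"]
RULES = {1: ("cmd", 2), 2: ("cmd", 3), 3: ("obj", 1), 4: ("obj", 1)}
ACTION = {
    0: {"show": ("shift", 2), "set": ("shift", 3)},
    1: {"$end": ("accept",)},
    2: {"speed": ("shift", 4), "color": ("shift", 5)},
    3: {"speed": ("shift", 4), "color": ("shift", 5)},
    4: {"$end": ("reduce", 3), "num": ("reduce", 3)},
    5: {"$end": ("reduce", 4), "num": ("reduce", 4)},
    6: {"$end": ("reduce", 1)},
    7: {"num": ("shift", 8)},
    8: {"$end": ("reduce", 2)},
}
GOTO = {0: {"cmd": 1}, 2: {"obj": 6}, 3: {"obj": 7}}


def feed(stack, token):
    """Advance a copy of the state stack over one token.

    Returns the new stack, or None if the token is a syntax error here.
    """
    stack = list(stack)
    while True:
        action = ACTION[stack[-1]].get(token)
        if action is None:
            return None
        if action[0] == "shift":
            stack.append(action[1])
            return stack
        if action[0] == "accept":
            return stack
        lhs, length = RULES[action[1]]
        if length:
            del stack[-length:]  # pop one state per right-hand-side symbol
        stack.append(GOTO[stack[-1]][lhs])


def completions(tokens):
    """Parse the tokens typed so far and list every legal next token."""
    stack = [0]
    for token in tokens:
        stack = feed(stack, token)
        if stack is None:
            raise ValueError(f"syntax error at {token!r}")
    return sorted(t for t in TERMINALS if feed(stack, t) is not None)
```

  After `show speed`, state 4 has a reduce entry for `num` as well as
  for end of input, but `completions(["show", "speed"])` reports only
  `$end`, since `num` fails once the reduction has been carried out.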


------------------------------------------------------------
MANUAL AND DOCS

- The manual index is in a parlous state. It was badly mauled by
  conversion from WP to LaTeX and needs to be rebuilt with reference to
  the old WP version.

- Merge the various articles of shared text that are found in the
  help, the manual, the miscellaneous documentation, the web page,
  and elsewhere. These should all be maintained from one master
  source. The most urgent instances are "sbb" (the syntactic building
  blocks, found in the examples directory, the misc documentation, and
  the manual) and the Glossary. Note that the text in the help system is
  often slightly different from the corresponding text in the manual on
  purpose, because it plays a different role and has a somewhat
  different narrative context. (However, the various copies of the
  Glossary should all be the same. These include: the manual appendix,
  the glossary web page, the copy in the misc docs, and the one in the
  help.)

- The manual needs a discussion of Unicode, wchar_t, wide characters,
  multibyte characters, and all that. Of course, first we have to
  actually *work* with such constructs... (All four listed terms need to
  appear in the index, too.)

- Somewhere in the manual there should be a section to tell you what
  to do for interactive use. The most important issue, if each input
  line isn't a complete parse, is how to make sure it won't try to look
  ahead past the end of a line when the user expects the line to be
  complete and the program to do something.

- Somewhere in the manual there should be a section to tell you what
  to do for a parser that runs indefinitely and only exits at program
  shutdown, like you might use to handle a network protocol in a daemon.

- It would be nice to have a section or appendix that describes
  common sources of conflicts and what to do about them. Such constructs
  include:
    - if-then-else (text already exists)
    - repeated repetitions (a -> b..., b -> c...)
    - poisoning lookahead by bad nesting of productions
      (I had an example involving trying to distinguish whitespace
      and comments at too high a level)
    - sequences of nondelimited tokens (a -> identifier...)
    - generalize the previous to the full lexer model
    - ...


------------------------------------------------------------
EXAMPLES

- Update the examples to use agclib1 (as seen in XIDEK) instead of
  oldclasslib.