comparison doc/admin/todo-large.txt @ 0:13d2b8934445

Import AnaGram (near-)release tree into Mercurial.
author David A. Holland
date Sat, 22 Dec 2007 17:52:45 -0500
parents
children f9e4689b837d
-1:000000000000 0:13d2b8934445
1 Large todo items
2
3 This file describes the bigger and more involved projects or
4 undertakings that can or should be done on AnaGram moving forward.
5
6 ------------------------------------------------------------
7 USER INTERFACE
8
9 - Build a new user interface that's based on a non-legacy toolkit.
10 This is the most pressing issue and almost nothing else in this file
11 should be undertaken until the new user interface is under control.
12
13 - Conflict diagnostics. AG's conflict diagnostics are already far
14 better than yacc's; however, they are still opaque to most users and
15 could be improved a good deal. There are a number of areas for
16 improvement: (1) presentation: we have a GUI, so we can and should use
17 it to draw diagrams and arrows and whatnot, rather than limiting
18 ourselves to rows of text as AG 1.x had to. We should show a sample
19 input that leads to the conflict, and bracket it on the bottom with the
20 rules and reductions that go one way and on the top with the rules and
21 reductions that go the other. (2) common forms: a lot of conflicts
22 arise from common mistakes and common issues. We should have a set of
23 pattern-matching rules to identify and report these common forms, and
24 link to explanations and fix suggestions in the help.
25
26 - Better crossreferencing of tables. The right-button popup menu for
27 auxiliary windows is fine, but you ought to get something useful by
28 default if you click (or double-click?) on things directly. Also,
29 there are more useful crossreferences than currently exist.
30
31 - Better presentation of tables. To really take advantage of the
32 information in the various tables and windows AG provides, you have to
33 understand quite a bit about how AG works. This should not be
34 necessary. Also, the AG 2.x user interface is a direct conceptual port
35 of the text-based AG 1.x user interface. It's still fundamentally
36 based on lines of text. This is not necessary and could be improved a
37 great deal without undue difficulty.
38
39 - Command-line version. agcl should print basic conflict diagnoses as
40 well as the warning and error messages, so one doesn't have to fire up
41 the GUI every time one makes a silly editing mistake.
42
43 - It would be nice if when you opened the Configuration Parameters
44 window you could change the settings for the current run. (Has to be
45 for the current run, because saving changes is not remotely
46 practical.)
47
48 - Multiple grammars. In the long run it would be nice to be able to
49 have multiple grammars loaded at the same time and be able to shuffle
50 tokens between them when using File Trace or Grammar Trace. This would
51 allow easier testing and debugging of projects that use multiple
52 interacting grammars, or that use AG to implement communication
53 protocols. Unfortunately, this is *not* trivial.
54
55
56 ------------------------------------------------------------
57 PROGRAMMING INTERFACE
58
59 - Language support. There's been talk of Java support for AG for
60 quite a while. It would be nice to actually *do* it. Beyond that,
61 other languages which would be useful or interesting to support
62 include Perl, Python, Ruby, Cyclone, OCaml... you name it. Right now
63 adding any of these would be nontrivial; however, after the first
64 couple adding another should be relatively straightforward. Note also
65 that as many of these languages aren't syntactically compatible with
66 either C or AnaGram, we will need to design a way to put the syntax
67 and the accompanying code in separate files.
68
69 - Configuration support for multiple languages. There also needs to
70 be a better and more systematic configuration mechanism for choosing
71 output language and output language dialect.
72
73 - Output name. It's nice for makefiles to be able to know what the
74 output files will be named. Unfortunately, there's no good way to do
75 this that I can think of, because it depends on the output language.
76 If anyone has a brilliant idea, please share it.
77
78 - Include files and/or a module system for grammars. Common bits of
79 grammar have to be cut and pasted all the time. Some mechanism to
80 avoid this would be nice. (Some users have used cpp or m4 to
81 preprocess AG input; this works but it's awfully ugly.)
82
83 - Reading yacc files. There's been a yacc-to-AnaGram converter around
84 for a long time, but it's not fully functional and hasn't ever been
85 officially released or supported. (It originally got hung up because
86 yacc's input language is defined, at least in some references, in a
87 way that makes the grammar LR(2). This is not important now, if it
88 ever was; bison, for example, does not accept the offending
89 constructs.) It might be better at this point to merge the
90 functionality directly into AnaGram so it can read (and build) .y
91 files directly. Finding a clean version of y2ag is the first order of
92 business. Note: the y2ag.syn in the test suite is mangled in some
93 fashion and not a good place to start.
94
95
96 ------------------------------------------------------------
97 BUILD
98
99 - Dumping UI. To improve the regression testing I'd like to have a
100 fake user interface that takes *all* the data normally available in
101 the various windows and tables and just dumps it to stdout.
102
103 - Parser equivalence. One of the problems with the test suite is that
104 sometimes an internal change will, perfectly legitimately, permute
105 elements in the various parse tables, causing huge test diffs that
106 need to be inspected by hand. It would be nice to figure out how to
107 either (1) canonicalize the tables so this doesn't happen, or (2)
108 write some kind of munging script that automatically validates and
109 suppresses these changes. (Obviously, (1) is better, but I'm not sure
110 it's feasible.)
111
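A minimal sketch of option (1) from the parser equivalence item, assuming
(purely for illustration) that a table can be viewed as a map from state
number to {token: next state}. This is not AG's real table layout, and the
function name is made up. States are renumbered in breadth-first order from
the start state, visiting tokens in sorted order, so two tables that differ
only by a permutation of state numbers come out identical.

```python
from collections import deque


def canonicalize(table, start=0):
    """Renumber states in BFS order from `start`, tokens visited in sorted order.

    `table` maps state -> {token: next_state}. This shape is an assumption
    made for illustration, not AnaGram's actual table format.
    """
    order = {start: 0}          # old state number -> canonical number
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for token in sorted(table.get(state, {})):
            target = table[state][token]
            if target not in order:
                order[target] = len(order)
                queue.append(target)
    # States unreachable from `start` are dropped from the result.
    return {
        new: {tok: order[dst] for tok, dst in table.get(old, {}).items()}
        for old, new in order.items()
    }


# Two tables that differ only in how states 1 and 2 are numbered.
a = {0: {"x": 1, "y": 2}, 1: {"z": 2}, 2: {}}
b = {0: {"x": 2, "y": 1}, 2: {"z": 1}, 1: {}}
print(canonicalize(a) == canonicalize(b))
```

This only covers renumbered states. Real tables also carry reductions and
other data that may be permuted independently, so whether the idea scales
to AG's tables is still an open question.
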
112 - Figure out how to do coverage checking in the test suite. (This
113 includes coverage of AG code, coverage of the code sections from the
114 engine definition, and also coverage of config parameters that affect
115 code generation.)
116
117
118 ------------------------------------------------------------
119 INTERNALS
120
121 - Character sets. Right now, AG is limited to 16-bit-wide character
122 sets, which is a problem as Unicode code points now run up to
123 U+10FFFF and need 21 bits. Also, the internal handling of case folding
124 and so on is totally inadequate and needs a general revamp. And
125 keywords only work with 8-bit character sets.
126
127 - Lexemes and disregard. I'd like to clean up the implementation of
128 disregard whitespace and lexemes. It has a sound theoretical
129 foundation, but you wouldn't know it from the existing code. (To be
130 fair, the existing code was severely constrained by DOS memory
131 issues.)
132
133 - LR(k) grammars. Jerry was talking about this before he died;
134 unfortunately, I don't think anyone knows exactly what he had in
135 mind...
136
137 - Regexp keywords. Keyword recognition is already a string search
138 process; there's no real reason it couldn't be extended to include
139 regular expression matching. The catch of course is how you preserve
140 any useful notion of soundness in the grammar.
141
142 - Merge duplicate conflicts. A lot of times the same conflict will
143 happen with a whole pile of tokens. Right now these are all generated
144 and displayed independently; they should really be folded together.
145
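The folding described in the merge-duplicate-conflicts item could look
roughly like this. The record shape (state, token, competing rules) and the
function name are hypothetical, chosen only to show the grouping. Conflicts
that involve the same state and the same set of rules are merged, and their
tokens are collected into one list.

```python
from collections import defaultdict


def merge_conflicts(conflicts):
    """Fold conflicts that differ only in the lookahead token.

    Each conflict is (state, token, rules), where `rules` is an iterable of
    the competing rule names. This record shape is hypothetical.
    """
    merged = defaultdict(set)
    for state, token, rules in conflicts:
        # The order in which the competing rules are listed does not matter.
        merged[(state, frozenset(rules))].add(token)
    return [
        (state, sorted(tokens), sorted(rules))
        for (state, rules), tokens in sorted(
            merged.items(), key=lambda item: (item[0][0], sorted(item[0][1]))
        )
    ]


conflicts = [
    (7, "+", ("r3", "r5")),
    (7, "-", ("r5", "r3")),
    (7, "*", ("r3", "r5")),
    (9, "+", ("r1", "r2")),
]
for entry in merge_conflicts(conflicts):
    print(entry)
```

With the sample data, the three conflicts in state 7 collapse into a single
entry listing all three tokens, and the state 9 conflict stays on its own.
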
146 - Fix the code generator. The code generator is an ugly mess and
147 needs rationalization.
148
149 - Interactive completions. Many command-driven programs nowadays
150 allow you to push a key (often tab) to get a list of legal inputs
151 based on what's been typed so far. If input is being handled by an AG
152 parser, you can in theory run the parser on the input so far, inspect
153 the state, and get a list of legal tokens, which you can then use to
154 give the user a context-sensitive list of possible things to
155 type. This is, however, not something that you can possibly code up
156 yourself by interfacing to AG, at least not in any pleasant way, so AG
157 ought to provide support.
158
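To make the interactive completions item concrete, here is a toy sketch. It
uses hand-built SLR tables for a made-up command grammar; the table layout,
the grammar, and the function names are all assumptions for illustration,
not anything AG generates today. Just reading the actions of the top state
can overestimate the legal tokens when reductions are pending, so the sketch
tries each terminal against a copy of the parse stack instead.

```python
# Hypothetical hand-built SLR tables for a toy command grammar:
#   1: cmd -> show noun    2: cmd -> set noun on    3: cmd -> set noun off
#   4: noun -> echo        5: noun -> color
END = "$"
RULES = {1: ("cmd", 2), 2: ("cmd", 3), 3: ("cmd", 3), 4: ("noun", 1), 5: ("noun", 1)}
ACTION = {
    0: {"show": ("s", 1), "set": ("s", 2)},
    1: {"echo": ("s", 4), "color": ("s", 5)},
    2: {"echo": ("s", 4), "color": ("s", 5)},
    3: {END: ("acc", None)},
    4: {END: ("r", 4), "on": ("r", 4), "off": ("r", 4)},
    5: {END: ("r", 5), "on": ("r", 5), "off": ("r", 5)},
    6: {END: ("r", 1)},
    7: {"on": ("s", 8), "off": ("s", 9)},
    8: {END: ("r", 2)},
    9: {END: ("r", 3)},
}
GOTO = {(0, "cmd"): 3, (1, "noun"): 6, (2, "noun"): 7}
TERMINALS = ["show", "set", "echo", "color", "on", "off", END]


def feed(stack, token):
    """Advance a copy of `stack` by one token; return None on a syntax error."""
    stack = list(stack)
    while True:
        action = ACTION[stack[-1]].get(token)
        if action is None:
            return None
        kind, arg = action
        if kind == "s":                 # shift: push the new state
            return stack + [arg]
        if kind == "acc":               # accept: input is complete
            return stack
        lhs, length = RULES[arg]        # reduce: pop the rule body, then goto
        del stack[-length:]
        stack.append(GOTO[(stack[-1], lhs)])


def completions(words):
    """Return the terminals that may legally follow the tokens typed so far."""
    stack = [0]
    for word in words:
        stack = feed(stack, word)
        if stack is None:
            return []
    return [t for t in TERMINALS if feed(stack, t) is not None]


print(completions(["set", "echo"]))
```

After "set echo" this offers "on" and "off" but not end of input, even
though the top state has a reduce action on "$". Doing the same from
outside against real generated tables is the unpleasant part, which is why
AG itself ought to provide the support.
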
159
160 ------------------------------------------------------------
161 MANUAL AND DOCS
162
163 - The manual index is in a parlous state. It was badly mauled by
164 conversion from WP to LaTeX and needs to be rebuilt with reference to
165 the old WP version.
166
167 - Merge the various pieces of shared text that are found in the
168 help, the manual, the miscellaneous documentation, the web page,
169 and elsewhere. These should all be maintained from one master
170 source. The most urgent instances are "sbb" (the syntactic building
171 blocks, found in the examples directory, the misc documentation, and
172 the manual) and the Glossary. Note that the text in the help system is
173 often slightly different from the corresponding text in the manual on
174 purpose, because it plays a different role and has a somewhat
175 different narrative context. (However, the various copies of the
176 Glossary should all be the same. These include: the manual appendix,
177 the glossary web page, the copy in the misc docs, and the one in the
178 help.)
179
180 - The manual needs a discussion of Unicode, wchar_t, wide characters,
181 multibyte characters, and all that. Of course, first we have to
182 actually *work* with such constructs... (All four listed terms need to
183 appear in the index, too.)
184
185 - Somewhere in the manual there should be a section to tell you what
186 to do for interactive use. The most important issue, if each input
187 line isn't a complete parse, is how to make sure the parser won't try
188 to look ahead past the end of a line when the user expects the line to
189 be complete and the program to do something.
190
191 - Somewhere in the manual there should be a section to tell you what
192 to do for a parser that runs indefinitely and only exits at program
193 shutdown, like you might use to handle a network protocol in a daemon.
194
195 - It would be nice to have a section or appendix that describes
196 common sources of conflicts and what to do about them. Such constructs
197 include:
198 - if-then-else (text already exists)
199 - repeated repetitions (a -> b..., b -> c...)
200 - poisoning lookahead by bad nesting of productions
201 (I had an example involving trying to distinguish whitespace
202 and comments at too high a level)
203 - sequences of nondelimited tokens (a -> identifier...)
204 - generalize the previous to the full lexer model
205 - ...
206
207
208 ------------------------------------------------------------
209 EXAMPLES
210
211 - Update the examples to use agclib1 (as seen in XIDEK) instead of
212 oldclasslib.
213