view doc/admin/todo-large.txt @ 12:aab9ff6af791
Strengthen the build hack for non-DOS targets.
author | David A. Holland |
---|---|
date | Tue, 31 May 2022 00:58:42 -0400 |
parents | 13d2b8934445 |
children | f9e4689b837d |
Large todo items

This file describes the bigger and more involved projects or undertakings that can or should be done on AnaGram moving forward.

------------------------------------------------------------
USER INTERFACE

- Build a new user interface that's based on a non-legacy toolkit. This is the most pressing issue and almost nothing else in this file should be undertaken until the new user interface is under control.

- Conflict diagnostics. AG's conflict diagnostics are already far better than yacc's; however, they are still opaque to most users and could be improved a good deal. There are a number of areas for improvement:

  (1) presentation; we have a GUI, we can and should use it to draw diagrams and arrows and whatnot, and not limit ourselves to rows of text like AG 1.x had to. We should show a sample input that leads to the conflict, and bracket it on the bottom showing the rules and reductions that go one way and on the top showing the rules and reductions that go the other.

  (2) common forms; a lot of conflicts arise from common mistakes and common issues. We should have a set of pattern-matching rules to identify and report these common forms, and link to explanations and fix suggestions in the help.

- Better crossreferencing of tables. The right-button popup menu for auxiliary windows is fine, but you ought to get something useful by default if you click (or double-click?) on things directly. Also, there are more useful crossreferences than currently exist.

- Better presentation of tables. To really take advantage of the information in the various tables and windows AG provides, you have to understand quite a bit about how AG works. This should not be necessary. Also, the AG 2.x user interface is a direct conceptual port of the text-based AG 1.x user interface. It's still fundamentally based on lines of text. This is not necessary and could be improved a great deal without undue difficulty.

- Command-line version.
  agcl should print basic conflict diagnoses as well as the warning and error messages, so one doesn't have to fire up the GUI every time one makes a silly editing mistake.

- It would be nice if when you opened the Configuration Parameters window you could change the settings for the current run. (Has to be for the current run, because saving changes is not remotely practical.)

- Multiple grammars. In the long run it would be nice to be able to have multiple grammars loaded at the same time and be able to shuffle tokens between them when using File Trace or Grammar Trace. This would allow easier testing and debugging of projects that use multiple interacting grammars, or that use AG to implement communication protocols. Unfortunately, this is *not* trivial.

------------------------------------------------------------
PROGRAMMING INTERFACE

- Language support. There's been talk of Java support for AG for quite a while. It would be nice to actually *do* it. Beyond that, other languages which would be useful or interesting to support include Perl, Python, Ruby, Cyclone, OCaml... you name it. Right now adding any of these would be nontrivial; however, after the first couple adding another should be relatively straightforward. Note also that as many of these languages aren't syntactically compatible with either C or AnaGram, we will need to design a way to put the syntax and the accompanying code in separate files.

- Configuration support for multiple languages. There also needs to be a better and more systematic configuration mechanism for choosing output language and output language dialect.

- Output name. It's nice for makefiles to be able to know what the output files will be named. Unfortunately, there's no good way to do this that I can think of, because it depends on the output language. If anyone has a brilliant idea, please share it.

- Include files and/or a module system for grammars. Common bits of grammar have to be cut and pasted all the time.
  Some mechanism to avoid this would be nice. (Some users have used cpp or m4 to preprocess AG input; this works but it's awfully ugly.)

- Reading yacc files. There's been a yacc-to-AnaGram converter around for a long time, but it's not fully functional and hasn't ever been officially released or supported. (It originally got hung up because yacc's input language is defined, at least in some references, in a way that makes the grammar LR(2). This is not important now, if it ever was; bison, for example, does not accept the offending constructs.) It might be better at this point to merge the functionality directly into AnaGram so it can read (and build) .y files directly. Finding a clean version of y2ag is the first order of business. Note: the y2ag.syn in the test suite is mangled in some fashion and not a good place to start.

------------------------------------------------------------
BUILD

- Dumping UI. To improve the regression testing I'd like to have a fake user interface that takes *all* the data normally available in the various windows and tables and just dumps it to stdout.

- Parser equivalence. One of the problems with the test suite is that sometimes an internal change will, perfectly legitimately, permute elements in the various parse tables, causing huge test diffs that need to be inspected by hand. It would be nice to figure out how to either (1) canonicalize the tables so this doesn't happen, or (2) write some kind of munging script that automatically validates and suppresses these changes. (Obviously, (1) is better, but I'm not sure it's feasible.)

- Figure out how to do coverage checking in the test suite. (This includes coverage of AG code, coverage of the code sections from the engine definition, and also coverage of config parameters that affect code generation.)

------------------------------------------------------------
INTERNALS

- Character sets.
  Right now, AG is limited to 16-bit-wide character sets, which is a problem as Unicode code points now run up to U+10FFFF and need 21 bits. Also, the internal handling of case folding and so on is totally inadequate and needs a general revamp. And keywords only work with 8-bit character sets.

- Lexemes and disregard. I'd like to clean up the implementation of disregard whitespace and lexemes. It has a sound theoretical foundation, but you wouldn't know it from the existing code. (To be fair, the existing code was severely constrained by DOS memory issues.)

- LR(k) grammars. Jerry was talking about this before he died; unfortunately, I don't think anyone knows exactly what he had in mind...

- Regexp keywords. Keyword recognition is already a string search process; there's no real reason it couldn't be extended to include regular expression matching. The catch of course is how you preserve any useful notion of soundness in the grammar.

- Merge duplicate conflicts. A lot of times the same conflict will happen with a whole pile of tokens. Right now these are all generated and displayed independently; they should really be folded together.

- Fix the code generator. The code generator is an ugly mess and needs rationalization.

- Interactive completions. Many command-driven programs nowadays allow you to push a key (often tab) to get a list of legal inputs based on what's been typed so far. If input is being handled by an AG parser, you can in theory run the parser on the input so far, inspect the state, and get a list of legal tokens, which you can then use to give the user a context-sensitive list of possible things to type. This is, however, not something that you can possibly code up yourself by interfacing to AG, at least not in any pleasant way, so AG ought to provide support.

------------------------------------------------------------
MANUAL AND DOCS

- The manual index is in a parlous state.
  It was badly mauled by conversion from WP to LaTeX and needs to be rebuilt with reference to the old WP version.

- Merge the various articles of shared text that are found in the help, the manual, the miscellaneous documentation, the web page, and/or elsewhere too. These should all be maintained from one master source. The most urgent instances are "sbb" (the syntactic building blocks, found in the examples directory, the misc documentation, and the manual) and the Glossary. Note that the text in the help system is often slightly different from the corresponding text in the manual on purpose, because it plays a different role and has a somewhat different narrative context. (However, the various copies of the Glossary should all be the same. These include: the manual appendix, the glossary web page, the copy in the misc docs, and the one in the help.)

- The manual needs a discussion of Unicode, wchar_t, wide characters, multibyte characters, and all that. Of course, first we have to actually *work* with such constructs... (All four listed terms need to appear in the index, too.)

- Somewhere in the manual there should be a section to tell you what to do for interactive use. The most important issue, if each input line isn't a complete parse, is how to make sure it won't try to look ahead past the end of a line when the user expects the line to be complete and the program to do something.

- Somewhere in the manual there should be a section to tell you what to do for a parser that runs indefinitely and only exits at program shutdown, like you might use to handle a network protocol in a daemon.

- It would be nice to have a section or appendix that describes common sources of conflicts and what to do about them. Such constructs include:

    - if-then-else (text already exists)
    - repeated repetitions (a -> b..., b -> c...)
    - poisoning lookahead by bad nesting of productions (I had an example involving trying to distinguish whitespace and comments at too high a level)
    - sequences of nondelimited tokens (a -> identifier...)
    - generalize the previous to the full lexer model
    - ...

------------------------------------------------------------
EXAMPLES

- Update the examples to use agclib1 (as seen in XIDEK) instead of oldclasslib.
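
------------------------------------------------------------
NOTES

A note on the "Parser equivalence" item under BUILD: one possible approach to option (1) is to renumber parser states in a fixed traversal order, so that two tables differing only by a permutation of state numbers come out identical. The sketch below is only an illustration of that idea. The table layout (a dict mapping each state to a {token: next_state} dict) and the name canonicalize are hypothetical and do not reflect AnaGram's actual data structures; whether this covers all the permutations that show up in the test suite is an open question.

```python
from collections import deque


def canonicalize(start, transitions):
    """Renumber states breadth-first from `start`, following outgoing
    transitions in sorted token order.

    `transitions` maps state -> {token: next_state}. This layout is a
    hypothetical stand-in, not AnaGram's real table format.
    States unreachable from `start` are dropped.
    """
    order = {start: 0}          # old state number -> new state number
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for token in sorted(transitions.get(state, {})):
            target = transitions[state][token]
            if target not in order:
                order[target] = len(order)
                queue.append(target)
    return {
        order[state]: {tok: order[dst] for tok, dst in edges.items()}
        for state, edges in transitions.items()
        if state in order
    }


# Two tables that differ only in how states 1 and 2 are numbered.
table_a = {0: {"a": 1, "b": 2}, 1: {"c": 2}, 2: {}}
table_b = {0: {"a": 2, "b": 1}, 2: {"c": 1}, 1: {}}

same = canonicalize(0, table_a) == canonicalize(0, table_b)
```

Here `same` is True, so a diff of the canonical forms would be empty even though the raw tables differ. Real parse tables carry more than transitions (reductions, rule numbers, token numbers), and each of those would need its own fixed ordering before this could replace inspecting diffs by hand.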