view doc/admin/todo-large.txt @ 6:607e3be6bad8
author David A. Holland
date Mon, 30 May 2022 23:46:22 -0400

Large todo items

This file describes the larger, more involved projects that can or
should be undertaken on AnaGram going forward.

------------------------------------------------------------
USER INTERFACE

 - Build a new user interface that's based on a non-legacy toolkit.
This is the most pressing issue and almost nothing else in this file
should be undertaken until the new user interface is under control.

 - Conflict diagnostics. AG's conflict diagnostics are already far
better than yacc's; however, they are still opaque to most users and
could be improved a good deal. There are a number of areas for
improvement: (1) presentation; we have a GUI, we can and should use it
to draw diagrams and arrows and whatnot, and not limit ourselves to
rows of text like AG 1.x had to. We should show a sample input that
leads to the conflict, and bracket it on the bottom showing the rules
and reductions that go one way and on the top showing the rules and
reductions that go the other. (2) common forms; a lot of conflicts
arise from common mistakes and common issues. We should have a set of
pattern-matching rules to identify and report these common forms, and
link to explanations and fix suggestions in the help.

 - Better crossreferencing of tables. The right-button popup menu for
auxiliary windows is fine, but you ought to get something useful by
default if you click (or double-click?) on things directly. Also,
there are more useful crossreferences than currently exist.

 - Better presentation of tables. To really take advantage of the
information in the various tables and windows AG provides, you have to
understand quite a bit about how AG works. This should not be
necessary. Also, the AG 2.x user interface is a direct conceptual port
of the text-based AG 1.x user interface. It's still fundamentally
based on lines of text. This is not necessary and could be improved a
great deal without undue difficulty.

 - Command-line version. agcl should print basic conflict diagnoses as
well as the warning and error messages, so one doesn't have to fire up
the GUI every time one makes a silly editing mistake.

 - It would be nice if when you opened the Configuration Parameters
window you could change the settings for the current run. (Has to be
for the current run, because saving changes is not remotely
practical.)

 - Multiple grammars. In the long run it would be nice to be able to
have multiple grammars loaded at the same time and be able to shuffle
tokens between them when using File Trace or Grammar Trace. This would
allow easier testing and debugging of projects that use multiple
interacting grammars, or that use AG to implement communication
protocols. Unfortunately, this is *not* trivial.


------------------------------------------------------------
PROGRAMMING INTERFACE

 - Language support. There's been talk of Java support for AG for
quite a while. It would be nice to actually *do* it. Beyond that,
other languages which would be useful or interesting to support
include Perl, Python, Ruby, Cyclone, OCaml... you name it. Right now
adding any of these would be nontrivial; however, after the first
couple adding another should be relatively straightforward. Note also
that as many of these languages aren't syntactically compatible with
either C or AnaGram, we will need to design a way to put the syntax
and the accompanying code in separate files.

 - Configuration support for multiple languages. There also needs to
be a better and more systematic configuration mechanism for choosing
output language and output language dialect.

 - Output name. It's nice for makefiles to be able to know what the
output files will be named. Unfortunately, there's no good way to do
this that I can think of, because it depends on the output language.
If anyone has a brilliant idea, please share it.

 - Include files and/or a module system for grammars. Common bits of
grammar have to be cut and pasted all the time. Some mechanism to
avoid this would be nice. (Some users have used cpp or m4 to
preprocess AG input; this works but it's awfully ugly.)

 - Reading yacc files. There's been a yacc-to-AnaGram converter around
for a long time, but it's not fully functional and hasn't ever been
officially released or supported. (It originally got hung up because
yacc's input language is defined, at least in some references, in a
way that makes the grammar LR(2). This is not important now, if it
ever was; bison, for example, does not accept the offending
constructs.) It might be better at this point to merge the
functionality directly into AnaGram so it can read (and build) .y
files directly. Finding a clean version of y2ag is the first order of
business. Note: the y2ag.syn in the test suite is mangled in some
fashion and not a good place to start.


------------------------------------------------------------
BUILD

 - Dumping UI. To improve the regression testing I'd like to have a
fake user interface that takes *all* the data normally available in
the various windows and tables and just dumps it to stdout.

 - Parser equivalence. One of the problems with the test suite is that
sometimes an internal change will, perfectly legitimately, permute
elements in the various parse tables, causing huge test diffs that
need to be inspected by hand. It would be nice to figure out how to
either (1) canonicalize the tables so this doesn't happen, or (2)
write some kind of munging script that automatically validates and
suppresses these changes.  (Obviously, (1) is better, but I'm not sure
it's feasible.)
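
As a rough illustration of option (1), one way to canonicalize is to
renumber the states in the order a breadth-first walk from the start
state reaches them, taking outgoing transitions in sorted token order.
The sketch below is in Python and assumes a made-up table layout (a
dict mapping each state number to a dict of token -> target state).
AG's real tables are not laid out like this, and reductions and other
per-state data are ignored, so treat it as the shape of the idea only.

```python
from collections import deque

def canonicalize(table, start):
    """Renumber states in breadth-first order from `start`.

    `table` maps state -> {token: target state}. This layout is a
    hypothetical stand-in, not AG's actual table format.
    """
    order = {start: 0}            # old state number -> new state number
    queue = deque([start])
    while queue:
        state = queue.popleft()
        # Visit transitions in sorted token order so the numbering
        # depends only on the automaton's shape, not on the old numbers.
        for token in sorted(table[state]):
            target = table[state][token]
            if target not in order:
                order[target] = len(order)
                queue.append(target)
    # Rebuild the table using the new numbers (unreachable states drop out).
    return {
        order[old]: {tok: order[tgt] for tok, tgt in sorted(row.items())}
        for old, row in table.items() if old in order
    }

# Two tables describing the same automaton with states 1 and 2 swapped.
a = {0: {"x": 1, "y": 2}, 1: {"z": 2}, 2: {}}
b = {0: {"x": 2, "y": 1}, 2: {"z": 1}, 1: {}}
```

With this numbering, canonicalize(a, 0) and canonicalize(b, 0) compare
equal, so a pure permutation of states would no longer show up as a
test diff.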

 - Figure out how to do coverage checking in the test suite. (This
includes coverage of AG code, coverage of the code sections from the
engine definition, and also coverage of config parameters that affect
code generation.)


------------------------------------------------------------
INTERNALS

 - Character sets. Right now, AG is limited to 16-bit-wide character
sets, which is a problem as Unicode code points now run up to
U+10FFFF and need 21 bits. Also, the internal handling of case folding
and so on is totally inadequate and needs a general revamp. And
keywords only work with 8-bit character sets.
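
To make the character-set issues concrete, here is a small check in
Python (used only for illustration; nothing below is AG code). It
shows a code point that does not fit in 16 bits, and a case fold that
changes the length of a string, which a simple one-character-to-one-
character folding table cannot express.

```python
# The highest Unicode code point is U+10FFFF.
MAX_CODE_POINT = 0x10FFFF
bits_needed = MAX_CODE_POINT.bit_length()

# A character outside the 16-bit range (U+1F600).
ch = "\U0001F600"
fits_in_16_bits = ord(ch) <= 0xFFFF

# Full case folding is not always one character to one character:
# the German sharp s (U+00DF) folds to "ss".
word = "Stra\u00dfe"
folded = word.casefold()
length_changed = len(folded) != len(word)
```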

 - Lexemes and disregard. I'd like to clean up the implementation of
disregard whitespace and lexemes. It has a sound theoretical
foundation, but you wouldn't know it from the existing code. (To be
fair, the existing code was severely constrained by DOS memory
issues.)

 - LR(k) grammars. Jerry was talking about this before he died;
unfortunately, I don't think anyone knows exactly what he had in
mind...

 - Regexp keywords. Keyword recognition is already a string search
process; there's no real reason it couldn't be extended to include
regular expression matching. The catch of course is how you preserve
any useful notion of soundness in the grammar.

 - Merge duplicate conflicts. A lot of times the same conflict will
happen with a whole pile of tokens. Right now these are all generated
and displayed independently; they should really be folded together.
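
A minimal sketch of the folding step, in Python: group the conflicts
by everything except the token, and collect the tokens into one list
per group. The (state, token, rules) record shape is invented for the
example and is not how AG stores conflicts.

```python
from collections import defaultdict

def merge_conflicts(conflicts):
    """Fold conflicts that differ only in the lookahead token.

    Each conflict is a (state, token, rules) tuple, where `rules` is a
    tuple of the competing rule numbers (a hypothetical layout).
    """
    merged = defaultdict(list)
    for state, token, rules in conflicts:
        # Sort the rules so the same conflict always produces the same key.
        merged[(state, tuple(sorted(rules)))].append(token)
    return {key: sorted(tokens) for key, tokens in merged.items()}

conflicts = [
    (12, "+", (3, 7)),
    (12, "-", (7, 3)),
    (12, "*", (3, 7)),
    (15, "+", (4, 9)),
]
merged = merge_conflicts(conflicts)
```

Here the three conflicts in state 12 collapse into a single entry
carrying three tokens, while the one in state 15 stays on its own.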

 - Fix the code generator. The code generator is an ugly mess and
needs rationalization.

 - Interactive completions. Many command-driven programs nowadays
allow you to push a key (often tab) to get a list of legal inputs
based on what's been typed so far. If input is being handled by an AG
parser, you can in theory run the parser on the input so far, inspect
the state, and get a list of legal tokens, which you can then use to
give the user a context-sensitive list of possible things to
type. This is, however, not something that you can possibly code up
yourself by interfacing to AG, at least not in any pleasant way, so AG
ought to provide support.
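
The basic idea can be sketched with a toy table-driven recognizer in
Python. Everything here is hypothetical: the table, the state names
and the completions() function are made up for the example. A real AG
parser is LR rather than a plain finite-state machine, so it would
also have to account for reductions before the set of acceptable
tokens is known.

```python
# Toy transition table: state -> {token: next state}. This finite-state
# stand-in only shows the shape of the idea; it is not an AG table.
TABLE = {
    "start": {"show": "show", "set": "set"},
    "show": {"status": "done", "version": "done"},
    "set": {"name": "done", "size": "done"},
    "done": {},
}

def completions(tokens, table=TABLE, start="start"):
    """Return the legal next tokens after `tokens`, or None on bad input."""
    state = start
    for token in tokens:
        if token not in table[state]:
            return None           # the input so far is already an error
        state = table[state][token]
    # The legal continuations are the tokens with an entry in this state.
    return sorted(table[state])

after_show = completions(["show"])
```

After the user has typed "show", the recognizer is in a state whose
only entries are "status" and "version", and those become the
completion list.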


------------------------------------------------------------
MANUAL AND DOCS

 - The manual index is in a parlous state. It was badly mauled by
conversion from WP to LaTeX and needs to be rebuilt with reference to
the old WP version.

 - Merge the various pieces of shared text that are found in the
help, the manual, the miscellaneous documentation, the web page, and
elsewhere. These should all be maintained from one master
source. The most urgent instances are "sbb" (the syntactic building
blocks, found in the examples directory, the misc documentation, and
the manual) and the Glossary. Note that the text in the help system is
often slightly different from the corresponding text in the manual on
purpose, because it plays a different role and has a somewhat
different narrative context. (However, the various copies of the
Glossary should all be the same. These include: the manual appendix,
the glossary web page, the copy in the misc docs, and the one in the
help.)

 - The manual needs a discussion of Unicode, wchar_t, wide characters,
multibyte characters, and all that. Of course, first we have to
actually *work* with such constructs... (All four listed terms need to
appear in the index, too.)

 - Somewhere in the manual there should be a section to tell you what
to do for interactive use. The most important issue, if each input
line isn't a complete parse, is how to make sure it won't try to look
ahead past the end of a line when the user expects the line to be
complete and the program to do something.

 - Somewhere in the manual there should be a section to tell you what
to do for a parser that runs indefinitely and only exits at program
shutdown, like you might use to handle a network protocol in a daemon.

 - It would be nice to have a section or appendix that describes
common sources of conflicts and what to do about them. Such constructs
include:
	- if-then-else (text already exists)
	- repeated repetitions (a -> b..., b -> c...)
	- poisoning lookahead by bad nesting of productions
	  (I had an example that tried to distinguish whitespace and
	  comments at too high a level)
	- sequences of nondelimited tokens (a -> identifier...)
	- generalize the previous to the full lexer model
	- ...
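
The repeated-repetitions case is easy to demonstrate by counting.
Reading "..." as "one or more", if a is one or more b's and b is one
or more c's, then a run of n c's can be split into b's in 2^(n-1)
ways, and every split is a distinct parse. A short Python sketch that
enumerates the splits (an illustration only; it does not involve AG):

```python
def groupings(n):
    """List every way to split n items into consecutive non-empty groups.

    Each result is a tuple of group sizes; for the grammar
    a -> b..., b -> c..., each one is a distinct parse of n c's.
    """
    if n == 0:
        return [()]
    results = []
    for first in range(1, n + 1):
        # Take `first` items as the first group, then split the rest.
        for rest in groupings(n - first):
            results.append((first,) + rest)
    return results

parses = groupings(3)
```

For three c's this yields the four groupings (1,1,1), (1,2), (2,1)
and (3,), which is why the parser cannot tell where one b ends and
the next begins.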


------------------------------------------------------------
EXAMPLES

 - Update the examples to use agclib1 (as seen in XIDEK) instead of
oldclasslib.