# HG changeset patch # User David A. Holland # Date 1331799680 14400 # Node ID d42c0db81e28d519ae18c50f3036f78c59b5e249 # Parent 1720f45dd4953617bb698594b174fe57b4b05fb2 Update the classification stuff. There are still some rough edges but it should be mostly workable now. diff -r 1720f45dd495 -r d42c0db81e28 database/schema/classify.sql --- a/database/schema/classify.sql Sun Feb 19 20:21:51 2012 -0500 +++ b/database/schema/classify.sql Thu Mar 15 04:21:20 2012 -0400 @@ -9,174 +9,325 @@ -- be more equal than others, but that does not concern the -- database.) -- --- The classification schemes/taxonomies that already exist or are --- projected to be wanted, as of this writing, are: --- --- 1. The old GNATS system of categories. A PR has one category, which --- is taken from a fixed list of keywords, currently these: --- bin install kern lib misc pkg port-* security standards --- toolchain xsrc y2k --- Also the old "class" field. +-- There are several classification schemes/taxonomies that already +-- exist or have been invented but not yet deployed. These fall into +-- the following categories: -- --- 2. The old GNATS severity and priority fields, which are still --- useful despite being broad. --- --- 3. The nearest man page to where the problem appears to be. This is --- FreeBSD's idea and seems to work fairly well for many things. --- --- 4. The taxonomy dholland uses in the out-of-gnats PR lists, which --- is related to the previous (it uses man pages for drivers and --- programs) but is three-layer hierarchical. +-- 1. Hierarchical taxonomy. The only existing hierarchical taxonomy +-- is the one dholland uses in the out-of-GNATS PR lists, which +-- classifies PRs by location in the system. A second projected one is +-- classification by consequences (crashes, hangs, etc.) The possible +-- classifications are defined in advance and stored in the database. -- --- 5. Assorted flat tags (e.g. PRs releng considers critical for the --- 6.0 release), or equivalently, arbitrary hand-maintained lists. --- It is anticipated that every developer will be able to make their --- own lists. +-- 2. Flat taxonomy, that is, a selection from an enumeration of +-- values. Existing schemes of this type include: -- --- 6. Properties derivable from other data that behave like flat tags --- (e.g. "PRs in feedback more than 6 months") - these should not be --- stored as tags in the database but it may be desirable to be able --- to present them the same way as tags. --- --- 7. Other hierarchical schemes that have yet to be invented. +-- * the old GNATS scheme of categories (bin, kern, lib, etc.) +-- * the old GNATS scheme of classes (sw-bug, doc-bug, etc.) +-- * the old GNATS severity field (critical, serious, non-critical) +-- * the old GNATS priority field (high, medium, low) -- --- Some of these classification schemes have additional special --- semantics. --- --- It is furthermore assumed that new classification schemes will be --- added on the fly, not frequently, but frequently enough that the --- list of schemes should not be hard-coded into programs that access +-- For these the possible values are defined in advance and stored in -- the database. -- --- This creates something of a problem; encoding hierarchical data in --- SQL is fun enough when the hierarchy is fixed. We need to be able --- to handle hierarchical data whose layout hasn't been invented yet, --- and that's really pushing it. --- --- Arbitrary flat tags we can do. For the rest, I think we need to set --- up a system that allows, essentially, extensions to the schema, --- which can be arbitrary SQL, and which provide views that allow --- nonspecialized client software to read (and maybe update, but maybe --- not) the available material in a standardized way. +-- 3. String value. Most of the time this is really a flat taxonomy +-- where the number of possible values is large or changes dynamically +-- with external circumstances, or both, such that storing a copy of +-- the legal values in the database would create a maintenanc hassle. +-- There are none of these at present; the only projected one at the +-- moment is FreeBSD's idea of "the nearest man page to where the +-- problem appears to be". +-- +-- 4. Systems of tags, that is, zero or more selections from a list +-- of possible values. Current schemes of this type include the +-- [456]-ONLY tags, the 6-CRITICAL/6-IMPORTANT/etc. release branch +-- tags, the STUCK tag, EASY, PATCH, PULLUPS-NEEDED, etc. Some of +-- these should be separate flat taxonomy schemes rather than tags, +-- and some of them should maybe be states rather than tags too. +-- (And, perhaps some of the current states should be tags...) +-- In addition we want to allow every developer to have their own +-- private tags that aren't exposed. And, we might want to have +-- project-only tags that other developers can see but users can't. +-- (It remains to be seen if that's really a good idea.) +-- +-- Note also that a list of PRs is equivalent to a tag on those PRs. +-- +-- It is also possible that there may be additional classifications +-- (possibly of any of the above types, but most likely tags) that are +-- views rather than data, that is, derived from other information. +-- One example of this is "PRs in feedback more than 6 months". +-- -- --- We will then have to have an additional set of views that take the --- union of all the extensions; these views will need to be recreated --- when a new classification scheme is added. +-- We assume that new classification schemes will be added on the fly, +-- not frequently, but frequently enough that the list of schemes +-- should not be hard-coded into programs that access the database, +-- and we don't want to have to create a new table for each new scheme +-- either. +-- +-- Therefore, the way I'll do this is to create one table (or rather, +-- one family of tables) for each type of classification scheme, and +-- make them able to handle arbitrary instances. +-- +-- For hierarchical schemes there is no point trying to encode the +-- hierarchical structure in SQL; that is a waste of time. Instead +-- we'll expand the tree of allowed values and rely on the access +-- software to present the schemes sensibly. +-- +-- It remains unclear what views ought to be defined. +-- +-- +-- Notes on the representations: +-- * a scheme is "total" if it should be defined on all PRs +-- * if the data contains a null value for a particular scheme, +-- that means "not specified yet". +-- * the hierarchical and flat schemes are no different schema- +-- wise but are separate to allow different handling in software. +-- * the ordering field in the values tables should be used so that +-- ORDER BY ordering ASCENDING produces the desired output order +-- of the legal values. +-- * the obsolete field for values is for entries that are allowed +-- to still exist in the database but that should not be used +-- with new PRs or for new classifications of old PRs. +-- * the ordering field for the classifications themselves should +-- be used so that ORDER BY ordering ASCENDING produces the +-- desired output order of the classification schemes. The +-- numbers are global across classification scheme types so +-- schemes of the same type do not need to be sorted together. +-- -- --- Tags. -CREATE TABLE tags ( +------------------------------------------------------------ +-- Hierarchical schemes. + +-- names and properties of the hierarchical schemes +CREATE TABLE hierclass_names ( name text primary key, - confidential boolean not null - -- XXX should have an owner + ordering int not null, + total boolean not null, + description text not null ) WITHOUT OIDS; -CREATE TABLE has_tags ( +-- allowed values of the hierarchical schemes +CREATE TABLE hierclass_values ( + scheme text not null references hierclass_names (name), + value text not null, + ordering int not null, + obsolete boolean not null, + description text not null, + primary key (scheme, value) +) +WITHOUT OIDS; + +-- classification of PRs according to the hierarchical schemes +CREATE TABLE hierclass_data ( pr bigint not null references PRs (id), - tag text not null references tags (name), + scheme text not null references hierclass_names (name), + value text , + primary key (pr, scheme), + -- + -- what I want here is for it to reference hierclass_values unless + -- the value is null. or should the values table include a row + -- with null value where that's allowed? I forget how null values + -- interoperate with reference constraints except that it's messy. + -- XXX. (if changing this change the similar logic below as well) + -- + constraint (scheme, value) references hierclass_values (scheme, value) +) +WITHOUT OIDS; + +------------------------------------------------------------ +-- Enumerated flat schemes. + +-- names and properties of the flat schemes +CREATE TABLE flatclass_names ( + name text primary key, + ordering int not null, + total boolean not null, + description text not null +) +WITHOUT OIDS; + +-- allowed values of the flat schemes +CREATE TABLE flatclass_values ( + scheme text not null references flatclass_names (name), + value text not null, + ordering int not null, + obsolete boolean not null, + description text not null, + primary key (scheme, value) +) +WITHOUT OIDS; + +-- classification of PRs according to the flat schemes +CREATE TABLE flatclass_data ( + pr bigint not null references PRs (id), + scheme text not null references flatclass_names (name), + value text , + primary key (pr, scheme), + -- as above + constraint (scheme, value) references flatclass_values (scheme, value) ) WITHOUT OIDS; --- Classifications. --- --- Each classification scheme should define whatever tables and --- goop it needs. It should also define a text view, as follows --- assuming the scheme is called "foo": --- --- . The name of the view should be "foo_classification_text". --- . The view should produce tuples (scheme, order, pr, value). --- . The types should be (text, int, bigint, text). --- . The "scheme" column should be the scheme name, suitable --- for display, e.g. "Foo". --- . The "order" column should contain the canonical display --- order number for this scheme. Used with ORDER BY ASCENDING --- to get things to appear in a consistent order. --- . The "pr" column should have pr numbers. --- . The "value" column should have a text representation of the --- value. --- --- Choose the canonical display order number for a new scheme based on --- the existing values in use. See below for the numbers used by the --- built-in schemes. --- --- Ideally it should be possible to update the value through the text --- view. Ideally it would also be possible to insert through it, but --- that will never work... have to think about how to handle that --- best. (XXX?) --- +------------------------------------------------------------ +-- Text schemes. + +-- names and properties of the text schemes +CREATE TABLE textclass_names ( + name text primary key, + ordering int not null, + total boolean not null, + description text not null +) +WITHOUT OIDS; + +-- classification of PRs according to the text schemes +CREATE TABLE textclass_data ( + pr bigint not null references PRs (id), + scheme text not null references textclass_names (name), + value text , + primary key (pr, scheme), +) +WITHOUT OIDS; + +------------------------------------------------------------ +-- Tag schemes. --------------------------------------------------------------- +-- names and properties of the tag schemes +-- (total does not make sense here) +CREATE TABLE tagclass_names ( + name text primary key, + ordering int not null, + description text not null +) +WITHOUT OIDS; --- The "category" classification. --- Each PR has one category that must be taken from the list in the --- categories table. The category is used to select the default --- responsible entity for incoming PRs. -CREATE TABLE has_categories ( - pr bigint unique not null references PRs (id), - category text not null references categories (name) +-- allowed values of the tag schemes +-- (each PR can reference zero or more of these) +CREATE TABLE tagclass_values ( + scheme text not null references tagclass_names (name), + value text not null, + ordering int not null, + obsolete boolean not null, + description text not null, + primary key (scheme, value) +) +WITHOUT OIDS; + +-- classification of PRs according to the tag schemes +-- (each PR/scheme pair is listed once for each tag attached to it) +CREATE TABLE tagclass_data ( + pr bigint not null references PRs (id), + scheme text not null references tagclass_names (name), + value text , + -- as above + constraint (scheme, value) references tagclass_values (scheme, value) ) WITHOUT OIDS; -CREATE VIEW category_classification_text AS - SELECT "Category" as scheme, 100 as order, pr, category as value - FROM has_categories +-- view where tags are collected into a string +-- XXX does concat(value) work, and if not, is there any way to do this? +CREATE VIEW tagclass_stringdata AS + SELECT pr, scheme, concat(value) as value + FROM tagclass_data + GROUP BY pr, scheme ; --- The "class" classification. --- Each PR has one class that must be taken from the list in the --- classes table. -CREATE TABLE has_classes ( - pr bigint unique not null references PRs (id), - class text not null references classes (name) -) -WITHOUT OIDS; +------------------------------------------------------------ +-- some views + +-- views that pull in the ordering fields. +-- +-- (If you use only one of the ordering fields, as is likely to be +-- the case in practice, I think we can rely on the query optimizer +-- to drop the join that collects in the other ordering field. If +-- not, maybe this should be restructured.) +-- +-- XXX what do these do when _data.value is null? -CREATE VIEW class_classification_text AS - SELECT "Class" as scheme, 200 as order, pr, class as value - FROM has_classes +CREATE VIEW hierclass_data_ordered AS + SELECT hierclass_data.pr as pr, + hierclass_data.scheme as scheme, + hierclass_names.ordering as schemeordering, + hierclass_data.value as value, + hierclass_values.ordering as valueordering, + FROM hierclass_data, hierclass_names, hierclass_values + WHERE + hierclass_data.scheme = hierclass_names.name + AND hierclass_data.scheme = hierclass_values.scheme + AND hierclass_data.value = hierclass_values.value +; + +CREATE VIEW flatclass_data_ordered AS + SELECT flatclass_data.pr as pr, + flatclass_data.scheme as scheme, + flatclass_names.ordering as schemeordering, + flatclass_data.value as value, + flatclass_values.ordering as valueordering, + FROM flatclass_data, flatclass_names, flatclass_values + WHERE + flatclass_data.scheme = flatclass_names.name + AND flatclass_data.scheme = flatclass_values.scheme + AND flatclass_data.value = flatclass_values.value ; --- The "severity" classification. --- Each PR has one severity that must be taken from the list in the --- severities table. -CREATE TABLE has_severities ( - pr bigint unique not null references PRs (id), - severity text not null references severities (name) -) -WITHOUT OIDS; - -CREATE VIEW severity_classification_text AS - SELECT "Severity" as scheme, 300 as order, pr, severity as value - FROM has_severities +-- +-- XXX what I'd like to do here is produce textclass_data.value as +-- valueordering; then sorting by valueordering would produce the +-- desired results (the entries sorted by text order of the values). +-- That produces a type conflict though if we combine this view +-- with the other ordered views where the ordering is a number. +-- +-- Next best would be to order by textclass_data.value in this view +-- and produce the row number of the result (or some other fresh +-- sequence) as valueordering. However, I don't think that's possible. +-- +-- What's here (using a fixed value of 1) will run but it's not +-- particularly desirable. +-- +CREATE VIEW textclass_data_ordered AS + SELECT textclass_data.pr as pr, + textclass_data.scheme as scheme, + textclass_names.ordering as schemeordering, + textclass_data.value as value, + 1 as valueordering, + FROM textclass_data, textclass_names + WHERE + textclass_data.scheme = textclass_names.name ; --- The "priority" classification. --- Each PR has one priority that must be taken from the list in the --- priorities table. -CREATE TABLE has_priorities ( - pr bigint unique not null references PRs (id), - priority text not null references priorities (name) -) -WITHOUT OIDS; - -CREATE VIEW priority_classification_text AS - SELECT "Priority" as scheme, 400 as order, pr, priority as value - FROM has_priorities +CREATE VIEW tagclass_data_ordered AS + SELECT tagclass_data.pr as pr, + tagclass_data.scheme as scheme, + tagclass_names.ordering as schemeordering, + tagclass_data.value as value, + tagclass_values.ordering as valueordering, + FROM tagclass_data, tagclass_names, tagclass_values + WHERE + tagclass_data.scheme = tagclass_names.name + AND tagclass_data.scheme = tagclass_values.scheme + AND tagclass_data.value = tagclass_values.value ; --------------------------------------------------------------- - --- --- The collected view of all classifications. --- --- (This does not include tags, as tags should be presented --- differently.) --- -CREATE VIEW classification_text AS - SELECT * from category_classification_text - UNION SELECT * from class_classification_text - UNION SELECT * from severity_classification_text - UNION SELECT * from priority_classification_text +-- ordered version of tagclass_stringdata +-- XXX does concat(value) work, and if not, is there any way to do this? +CREATE VIEW tagclass_stringdata_ordered AS + SELECT pr, scheme, schemeordering, + concat(value) as value, first(valueordering) as valueordering + FROM tagclass_data + ORDER BY valueordering + GROUP BY pr, scheme ; + +-- a view that combines all the classification data. + +CREATE VIEW classifications AS + SELECT * FROM hierclass_data_ordered + UNION SELECT * FROM flatclass_data_ordered + UNION SELECT * FROM textclass_data_ordered + UNION SELECT * FROM tagclass_stringdata_ordered + ORDER BY schemeordering +; + diff -r 1720f45dd495 -r d42c0db81e28 database/schema/config.sql --- a/database/schema/config.sql Sun Feb 19 20:21:51 2012 -0500 +++ b/database/schema/config.sql Thu Mar 15 04:21:20 2012 -0400 @@ -7,35 +7,38 @@ -- allowed to exist in the database, but new uses are not permitted. -- --- Severity ratings of bugs -CREATE TABLE severities ( - name text primary key, - obsolete boolean not null, - description text -) -WITHOUT OIDS; +---- These are now handled in classify.sql +-- +-- -- Severity ratings of bugs +-- CREATE TABLE severities ( +-- name text primary key, +-- obsolete boolean not null, +-- description text +-- ) +-- WITHOUT OIDS; +-- +-- -- Priority ratings of bugs +-- CREATE TABLE priorities ( +-- name text primary key, +-- obsolete boolean not null, +-- description text +-- ) +-- WITHOUT OIDS; +-- +-- -- Types of bugs (sw-bug, doc-bug, etc.) +-- CREATE TABLE classes ( +-- name text primary key, +-- obsolete boolean not null, +-- description text +-- ) WITHOUT OIDS; +-- +-- -- Categories of bugs (bin, kern, pkg, etc.) +-- CREATE TABLE categories ( +-- name text primary key, +-- obsolete boolean not null, +-- description text +-- ) WITHOUT OIDS; --- Priority ratings of bugs -CREATE TABLE priorities ( - name text primary key, - obsolete boolean not null, - description text -) -WITHOUT OIDS; - --- Types of bugs (sw-bug, doc-bug, etc.) -CREATE TABLE classes ( - name text primary key, - obsolete boolean not null, - description text -) WITHOUT OIDS; - --- Categories of bugs (bin, kern, pkg, etc.) -CREATE TABLE categories ( - name text primary key, - obsolete boolean not null, - description text -) WITHOUT OIDS; -- States of bugs --