changeset 11:d42c0db81e28

Update the classification stuff. There are still some rough edges but it should be mostly workable now.
author David A. Holland
date Thu, 15 Mar 2012 04:21:20 -0400
parents 1720f45dd495
children d4c3bd255653
files database/schema/classify.sql database/schema/config.sql
diffstat 2 files changed, 320 insertions(+), 166 deletions(-) [+]
line wrap: on
line diff
--- a/database/schema/classify.sql	Sun Feb 19 20:21:51 2012 -0500
+++ b/database/schema/classify.sql	Thu Mar 15 04:21:20 2012 -0400
@@ -9,174 +9,325 @@
 -- be more equal than others, but that does not concern the
 -- database.)
 --
--- The classification schemes/taxonomies that already exist or are
--- projected to be wanted, as of this writing, are:
---
--- 1. The old GNATS system of categories. A PR has one category, which
--- is taken from a fixed list of keywords, currently these:
---    bin install kern lib misc pkg port-* security standards
---    toolchain xsrc y2k
--- Also the old "class" field.
+-- There are several classification schemes/taxonomies that already
+-- exist or have been invented but not yet deployed. These fall into
+-- the following categories:
 --
--- 2. The old GNATS severity and priority fields, which are still
--- useful despite being broad.
---
--- 3. The nearest man page to where the problem appears to be. This is
--- FreeBSD's idea and seems to work fairly well for many things.
---
--- 4. The taxonomy dholland uses in the out-of-gnats PR lists, which
--- is related to the previous (it uses man pages for drivers and
--- programs) but is three-layer hierarchical.
+-- 1. Hierarchical taxonomy. The only existing hierarchical taxonomy
+-- is the one dholland uses in the out-of-GNATS PR lists, which
+-- classifies PRs by location in the system. A second projected one is
+-- classification by consequences (crashes, hangs, etc.) The possible
+-- classifications are defined in advance and stored in the database.
 --
--- 5. Assorted flat tags (e.g. PRs releng considers critical for the
--- 6.0 release), or equivalently, arbitrary hand-maintained lists.
--- It is anticipated that every developer will be able to make their
--- own lists.
+-- 2. Flat taxonomy, that is, a selection from an enumeration of
+-- values. Existing schemes of this type include:
 --
--- 6. Properties derivable from other data that behave like flat tags
--- (e.g. "PRs in feedback more than 6 months") - these should not be
--- stored as tags in the database but it may be desirable to be able
--- to present them the same way as tags.
---
--- 7. Other hierarchical schemes that have yet to be invented.
+--   * the old GNATS scheme of categories (bin, kern, lib, etc.)
+--   * the old GNATS scheme of classes (sw-bug, doc-bug, etc.)
+--   * the old GNATS severity field (critical, serious, non-critical)
+--   * the old GNATS priority field (high, medium, low)
 --
--- Some of these classification schemes have additional special
--- semantics.
---
--- It is furthermore assumed that new classification schemes will be
--- added on the fly, not frequently, but frequently enough that the
--- list of schemes should not be hard-coded into programs that access
+-- For these the possible values are defined in advance and stored in
 -- the database.
 --
--- This creates something of a problem; encoding hierarchical data in
--- SQL is fun enough when the hierarchy is fixed. We need to be able
--- to handle hierarchical data whose layout hasn't been invented yet,
--- and that's really pushing it.
--- 
--- Arbitrary flat tags we can do. For the rest, I think we need to set
--- up a system that allows, essentially, extensions to the schema,
--- which can be arbitrary SQL, and which provide views that allow
--- nonspecialized client software to read (and maybe update, but maybe
--- not) the available material in a standardized way.
+-- 3. String value. Most of the time this is really a flat taxonomy
+-- where the number of possible values is large or changes dynamically
+-- with external circumstances, or both, such that storing a copy of
+-- the legal values in the database would create a maintenanc hassle.
+-- There are none of these at present; the only projected one at the
+-- moment is FreeBSD's idea of "the nearest man page to where the
+-- problem appears to be".
+--
+-- 4. Systems of tags, that is, zero or more selections from a list
+-- of possible values. Current schemes of this type include the
+-- [456]-ONLY tags, the 6-CRITICAL/6-IMPORTANT/etc. release branch
+-- tags, the STUCK tag, EASY, PATCH, PULLUPS-NEEDED, etc. Some of
+-- these should be separate flat taxonomy schemes rather than tags,
+-- and some of them should maybe be states rather than tags too. 
+-- (And, perhaps some of the current states should be tags...)
+-- In addition we want to allow every developer to have their own
+-- private tags that aren't exposed. And, we might want to have
+-- project-only tags that other developers can see but users can't.
+-- (It remains to be seen if that's really a good idea.)
+--
+-- Note also that a list of PRs is equivalent to a tag on those PRs.
+--
+-- It is also possible that there may be additional classifications
+-- (possibly of any of the above types, but most likely tags) that are
+-- views rather than data, that is, derived from other information.
+-- One example of this is "PRs in feedback more than 6 months".
+--
 --
--- We will then have to have an additional set of views that take the
--- union of all the extensions; these views will need to be recreated
--- when a new classification scheme is added.
+-- We assume that new classification schemes will be added on the fly,
+-- not frequently, but frequently enough that the list of schemes
+-- should not be hard-coded into programs that access the database,
+-- and we don't want to have to create a new table for each new scheme
+-- either.
+--
+-- Therefore, the way I'll do this is to create one table (or rather,
+-- one family of tables) for each type of classification scheme, and
+-- make them able to handle arbitrary instances.
+--
+-- For hierarchical schemes there is no point trying to encode the
+-- hierarchical structure in SQL; that is a waste of time. Instead
+-- we'll expand the tree of allowed values and rely on the access
+-- software to present the schemes sensibly.
+--
+-- It remains unclear what views ought to be defined.
+--
+--
+-- Notes on the representations:
+--    * a scheme is "total" if it should be defined on all PRs
+--    * if the data contains a null value for a particular scheme,
+--      that means "not specified yet".
+--    * the hierarchical and flat schemes are no different schema-
+--      wise but are separate to allow different handling in software.
+--    * the ordering field in the values tables should be used so that
+--      ORDER BY ordering ASCENDING produces the desired output order
+--      of the legal values.
+--    * the obsolete field for values is for entries that are allowed
+--      to still exist in the database but that should not be used
+--      with new PRs or for new classifications of old PRs.
+--    * the ordering field for the classifications themselves should
+--      be used so that ORDER BY ordering ASCENDING produces the
+--      desired output order of the classification schemes. The
+--      numbers are global across classification scheme types so
+--      schemes of the same type do not need to be sorted together.
+--
 --
 
--- Tags.
-CREATE TABLE tags (
+------------------------------------------------------------
+-- Hierarchical schemes.
+
+-- names and properties of the hierarchical schemes
+CREATE TABLE hierclass_names (
 	name text		primary key,
-	confidential boolean	not null
-	-- XXX should have an owner
+	ordering int		not null,
+	total boolean		not null,
+	description text	not null
 )
 WITHOUT OIDS;
 
-CREATE TABLE has_tags (
+-- allowed values of the hierarchical schemes
+CREATE TABLE hierclass_values (
+	scheme text		not null references hierclass_names (name),
+	value text		not null,
+	ordering int		not null,
+	obsolete boolean	not null,
+	description text	not null,
+	primary key (scheme, value)
+)
+WITHOUT OIDS;
+
+-- classification of PRs according to the hierarchical schemes
+CREATE TABLE hierclass_data (
 	pr bigint		not null references PRs (id),
-	tag text		not null references tags (name),
+	scheme text		not null references hierclass_names (name),
+	value text		,
+	primary key (pr, scheme),
+	--
+	-- what I want here is for it to reference hierclass_values unless
+	-- the value is null. or should the values table include a row
+	-- with null value where that's allowed? I forget how null values
+	-- interoperate with reference constraints except that it's messy.
+	-- XXX. (if changing this change the similar logic below as well)
+	--
+	constraint (scheme, value) references hierclass_values (scheme, value)
+)
+WITHOUT OIDS;
+
+------------------------------------------------------------
+-- Enumerated flat schemes.
+
+-- names and properties of the flat schemes
+CREATE TABLE flatclass_names (
+	name text		primary key,
+	ordering int		not null,
+	total boolean		not null,
+	description text	not null
+)
+WITHOUT OIDS;
+
+-- allowed values of the flat schemes
+CREATE TABLE flatclass_values (
+	scheme text		not null references flatclass_names (name),
+	value text		not null,
+	ordering int		not null,
+	obsolete boolean	not null,
+	description text	not null,
+	primary key (scheme, value)
+)
+WITHOUT OIDS;
+
+-- classification of PRs according to the flat schemes
+CREATE TABLE flatclass_data (
+	pr bigint		not null references PRs (id),
+	scheme text		not null references flatclass_names (name),
+	value text		,
+	primary key (pr, scheme),
+	-- as above
+	constraint (scheme, value) references flatclass_values (scheme, value)
 )
 WITHOUT OIDS;
 
--- Classifications.
---
--- Each classification scheme should define whatever tables and
--- goop it needs. It should also define a text view, as follows
--- assuming the scheme is called "foo":
---
---   . The name of the view should be "foo_classification_text".
---   . The view should produce tuples (scheme, order, pr, value).
---   . The types should be (text, int, bigint, text).
---   . The "scheme" column should be the scheme name, suitable
---     for display, e.g. "Foo".
---   . The "order" column should contain the canonical display
---     order number for this scheme. Used with ORDER BY ASCENDING
---     to get things to appear in a consistent order.
---   . The "pr" column should have pr numbers.
---   . The "value" column should have a text representation of the
---     value.
---
--- Choose the canonical display order number for a new scheme based on
--- the existing values in use. See below for the numbers used by the
--- built-in schemes.
---
--- Ideally it should be possible to update the value through the text
--- view. Ideally it would also be possible to insert through it, but
--- that will never work... have to think about how to handle that
--- best. (XXX?)
---
+------------------------------------------------------------
+-- Text schemes.
+
+-- names and properties of the text schemes
+CREATE TABLE textclass_names (
+	name text		primary key,
+	ordering int		not null,
+	total boolean		not null,
+	description text	not null
+)
+WITHOUT OIDS;
+
+-- classification of PRs according to the text schemes
+CREATE TABLE textclass_data (
+	pr bigint		not null references PRs (id),
+	scheme text		not null references textclass_names (name),
+	value text		,
+	primary key (pr, scheme),
+)
+WITHOUT OIDS;
+
+------------------------------------------------------------
+-- Tag schemes.
 
---------------------------------------------------------------
+-- names and properties of the tag schemes
+-- (total does not make sense here)
+CREATE TABLE tagclass_names (
+	name text		primary key,
+	ordering int		not null,
+	description text	not null
+)
+WITHOUT OIDS;
 
--- The "category" classification.
--- Each PR has one category that must be taken from the list in the
--- categories table. The category is used to select the default
--- responsible entity for incoming PRs.
-CREATE TABLE has_categories (
-	pr bigint		unique not null references PRs (id),
-	category text		not null references categories (name)
+-- allowed values of the tag schemes
+-- (each PR can reference zero or more of these)
+CREATE TABLE tagclass_values (
+	scheme text		not null references tagclass_names (name),
+	value text		not null,
+	ordering int		not null,
+	obsolete boolean	not null,
+	description text	not null,
+	primary key (scheme, value)
+)
+WITHOUT OIDS;
+
+-- classification of PRs according to the tag schemes
+-- (each PR/scheme pair is listed once for each tag attached to it)
+CREATE TABLE tagclass_data (
+	pr bigint		not null references PRs (id),
+	scheme text		not null references tagclass_names (name),
+	value text		,
+	-- as above
+	constraint (scheme, value) references tagclass_values (scheme, value)
 )
 WITHOUT OIDS;
 
-CREATE VIEW category_classification_text AS
-   SELECT "Category" as scheme, 100 as order, pr, category as value
-   FROM has_categories
+-- view where tags are collected into a string
+-- XXX does concat(value) work, and if not, is there any way to do this?
+CREATE VIEW tagclass_stringdata AS
+   SELECT pr, scheme, concat(value) as value
+   FROM tagclass_data
+   GROUP BY pr, scheme
 ;
 
--- The "class" classification.
--- Each PR has one class that must be taken from the list in the
--- classes table.
-CREATE TABLE has_classes (
-	pr bigint		unique not null references PRs (id),
-	class text		not null references classes (name)
-)
-WITHOUT OIDS;
+------------------------------------------------------------
+-- some views
+
+-- views that pull in the ordering fields.
+--
+-- (If you use only one of the ordering fields, as is likely to be
+-- the case in practice, I think we can rely on the query optimizer
+-- to drop the join that collects in the other ordering field. If
+-- not, maybe this should be restructured.)
+--
+-- XXX what do these do when _data.value is null?
 
-CREATE VIEW class_classification_text AS
-   SELECT "Class" as scheme, 200 as order, pr, class as value
-   FROM has_classes
+CREATE VIEW hierclass_data_ordered AS
+   SELECT hierclass_data.pr as pr,
+          hierclass_data.scheme as scheme,
+          hierclass_names.ordering as schemeordering,
+          hierclass_data.value as value,
+          hierclass_values.ordering as valueordering,
+   FROM hierclass_data, hierclass_names, hierclass_values
+   WHERE
+        hierclass_data.scheme = hierclass_names.name
+    AND hierclass_data.scheme = hierclass_values.scheme
+    AND hierclass_data.value = hierclass_values.value
+;
+
+CREATE VIEW flatclass_data_ordered AS
+   SELECT flatclass_data.pr as pr,
+          flatclass_data.scheme as scheme,
+          flatclass_names.ordering as schemeordering,
+          flatclass_data.value as value,
+          flatclass_values.ordering as valueordering,
+   FROM flatclass_data, flatclass_names, flatclass_values
+   WHERE
+        flatclass_data.scheme = flatclass_names.name
+    AND flatclass_data.scheme = flatclass_values.scheme
+    AND flatclass_data.value = flatclass_values.value
 ;
 
--- The "severity" classification.
--- Each PR has one severity that must be taken from the list in the
--- severities table.
-CREATE TABLE has_severities (
-	pr bigint		unique not null references PRs (id),
-	severity text		not null references severities (name)
-)
-WITHOUT OIDS;
-
-CREATE VIEW severity_classification_text AS
-   SELECT "Severity" as scheme, 300 as order, pr, severity as value
-   FROM has_severities
+--
+-- XXX what I'd like to do here is produce textclass_data.value as
+-- valueordering; then sorting by valueordering would produce the
+-- desired results (the entries sorted by text order of the values).
+-- That produces a type conflict though if we combine this view
+-- with the other ordered views where the ordering is a number.
+--
+-- Next best would be to order by textclass_data.value in this view
+-- and produce the row number of the result (or some other fresh
+-- sequence) as valueordering. However, I don't think that's possible.
+--
+-- What's here (using a fixed value of 1) will run but it's not
+-- particularly desirable.
+--
+CREATE VIEW textclass_data_ordered AS
+   SELECT textclass_data.pr as pr,
+          textclass_data.scheme as scheme,
+          textclass_names.ordering as schemeordering,
+          textclass_data.value as value,
+          1 as valueordering,
+   FROM textclass_data, textclass_names
+   WHERE
+        textclass_data.scheme = textclass_names.name
 ;
 
--- The "priority" classification.
--- Each PR has one priority that must be taken from the list in the
--- priorities table.
-CREATE TABLE has_priorities (
-	pr bigint		unique not null references PRs (id),
-	priority text		not null references priorities (name)
-)
-WITHOUT OIDS;
-
-CREATE VIEW priority_classification_text AS
-   SELECT "Priority" as scheme, 400 as order, pr, priority as value
-   FROM has_priorities
+CREATE VIEW tagclass_data_ordered AS
+   SELECT tagclass_data.pr as pr,
+          tagclass_data.scheme as scheme,
+          tagclass_names.ordering as schemeordering,
+          tagclass_data.value as value,
+          tagclass_values.ordering as valueordering,
+   FROM tagclass_data, tagclass_names, tagclass_values
+   WHERE
+        tagclass_data.scheme = tagclass_names.name
+    AND tagclass_data.scheme = tagclass_values.scheme
+    AND tagclass_data.value = tagclass_values.value
 ;
 
---------------------------------------------------------------
-
---
--- The collected view of all classifications.
---
--- (This does not include tags, as tags should be presented
--- differently.)
---
-CREATE VIEW classification_text AS
-         SELECT * from category_classification_text
-   UNION SELECT * from class_classification_text
-   UNION SELECT * from severity_classification_text
-   UNION SELECT * from priority_classification_text
+-- ordered version of tagclass_stringdata
+-- XXX does concat(value) work, and if not, is there any way to do this?
+CREATE VIEW tagclass_stringdata_ordered AS
+   SELECT pr, scheme, schemeordering,
+          concat(value) as value, first(valueordering) as valueordering
+   FROM tagclass_data
+   ORDER BY valueordering
+   GROUP BY pr, scheme
 ;
 
+
+-- a view that combines all the classification data.
+
+CREATE VIEW classifications AS
+         SELECT * FROM hierclass_data_ordered
+   UNION SELECT * FROM flatclass_data_ordered
+   UNION SELECT * FROM textclass_data_ordered
+   UNION SELECT * FROM tagclass_stringdata_ordered
+   ORDER BY schemeordering
+;
+
--- a/database/schema/config.sql	Sun Feb 19 20:21:51 2012 -0500
+++ b/database/schema/config.sql	Thu Mar 15 04:21:20 2012 -0400
@@ -7,35 +7,38 @@
 -- allowed to exist in the database, but new uses are not permitted.
 --
 
--- Severity ratings of bugs
-CREATE TABLE severities (
-	name text		primary key,
-	obsolete boolean	not null,
-	description text
-)
-WITHOUT OIDS;
+---- These are now handled in classify.sql
+-- 
+-- -- Severity ratings of bugs
+-- CREATE TABLE severities (
+-- 	name text		primary key,
+-- 	obsolete boolean	not null,
+-- 	description text
+-- )
+-- WITHOUT OIDS;
+-- 
+-- -- Priority ratings of bugs
+-- CREATE TABLE priorities (
+-- 	name text		primary key,
+-- 	obsolete boolean	not null,
+-- 	description text
+-- )
+-- WITHOUT OIDS;
+-- 
+-- -- Types of bugs (sw-bug, doc-bug, etc.)
+-- CREATE TABLE classes (
+-- 	name text		primary key,
+-- 	obsolete boolean	not null,
+-- 	description text
+-- ) WITHOUT OIDS;
+-- 
+-- -- Categories of bugs (bin, kern, pkg, etc.)
+-- CREATE TABLE categories (
+-- 	name text		primary key,
+-- 	obsolete boolean	not null,
+-- 	description text	
+-- ) WITHOUT OIDS;
 
--- Priority ratings of bugs
-CREATE TABLE priorities (
-	name text		primary key,
-	obsolete boolean	not null,
-	description text
-)
-WITHOUT OIDS;
-
--- Types of bugs (sw-bug, doc-bug, etc.)
-CREATE TABLE classes (
-	name text		primary key,
-	obsolete boolean	not null,
-	description text
-) WITHOUT OIDS;
-
--- Categories of bugs (bin, kern, pkg, etc.)
-CREATE TABLE categories (
-	name text		primary key,
-	obsolete boolean	not null,
-	description text	
-) WITHOUT OIDS;
 
 -- States of bugs
 --