Mercurial > ~dholland > hg > swallowtail > index.cgi
comparison database/schema/classify.sql @ 11:d42c0db81e28
Update the classification stuff.
There are still some rough edges but it should be mostly workable now.
author | David A. Holland |
---|---|
date | Thu, 15 Mar 2012 04:21:20 -0400 |
parents | 68cc276ac118 |
children | ca4679ac4e02 |
comparison
equal
deleted
inserted
replaced
10:1720f45dd495 | 11:d42c0db81e28 |
---|---|
7 -- taxonomy, we allow multiple arbitrary classification schemes, | 7 -- taxonomy, we allow multiple arbitrary classification schemes, |
8 -- all of which are created equal. (Administratively, some will | 8 -- all of which are created equal. (Administratively, some will |
9 -- be more equal than others, but that does not concern the | 9 -- be more equal than others, but that does not concern the |
10 -- database.) | 10 -- database.) |
11 -- | 11 -- |
12 -- The classification schemes/taxonomies that already exist or are | 12 -- There are several classification schemes/taxonomies that already |
13 -- projected to be wanted, as of this writing, are: | 13 -- exist or have been invented but not yet deployed. These fall into |
14 -- | 14 -- the following categories: |
15 -- 1. The old GNATS system of categories. A PR has one category, which | 15 -- |
16 -- is taken from a fixed list of keywords, currently these: | 16 -- 1. Hierarchical taxonomy. The only existing hierarchical taxonomy |
17 -- bin install kern lib misc pkg port-* security standards | 17 -- is the one dholland uses in the out-of-GNATS PR lists, which |
18 -- toolchain xsrc y2k | 18 -- classifies PRs by location in the system. A second projected one is |
19 -- Also the old "class" field. | 19 -- classification by consequences (crashes, hangs, etc.) The possible |
20 -- | 20 -- classifications are defined in advance and stored in the database. |
21 -- 2. The old GNATS severity and priority fields, which are still | 21 -- |
22 -- useful despite being broad. | 22 -- 2. Flat taxonomy, that is, a selection from an enumeration of |
23 -- | 23 -- values. Existing schemes of this type include: |
24 -- 3. The nearest man page to where the problem appears to be. This is | 24 -- |
25 -- FreeBSD's idea and seems to work fairly well for many things. | 25 -- * the old GNATS scheme of categories (bin, kern, lib, etc.) |
26 -- | 26 -- * the old GNATS scheme of classes (sw-bug, doc-bug, etc.) |
27 -- 4. The taxonomy dholland uses in the out-of-gnats PR lists, which | 27 -- * the old GNATS severity field (critical, serious, non-critical) |
28 -- is related to the previous (it uses man pages for drivers and | 28 -- * the old GNATS priority field (high, medium, low) |
29 -- programs) but is three-layer hierarchical. | 29 -- |
30 -- | 30 -- For these the possible values are defined in advance and stored in |
31 -- 5. Assorted flat tags (e.g. PRs releng considers critical for the | |
32 -- 6.0 release), or equivalently, arbitrary hand-maintained lists. | |
33 -- It is anticipated that every developer will be able to make their | |
34 -- own lists. | |
35 -- | |
36 -- 6. Properties derivable from other data that behave like flat tags | |
37 -- (e.g. "PRs in feedback more than 6 months") - these should not be | |
38 -- stored as tags in the database but it may be desirable to be able | |
39 -- to present them the same way as tags. | |
40 -- | |
41 -- 7. Other hierarchical schemes that have yet to be invented. | |
42 -- | |
43 -- Some of these classification schemes have additional special | |
44 -- semantics. | |
45 -- | |
46 -- It is furthermore assumed that new classification schemes will be | |
47 -- added on the fly, not frequently, but frequently enough that the | |
48 -- list of schemes should not be hard-coded into programs that access | |
49 -- the database. | 31 -- the database. |
50 -- | 32 -- |
51 -- This creates something of a problem; encoding hierarchical data in | 33 -- 3. String value. Most of the time this is really a flat taxonomy |
52 -- SQL is fun enough when the hierarchy is fixed. We need to be able | 34 -- where the number of possible values is large or changes dynamically |
53 -- to handle hierarchical data whose layout hasn't been invented yet, | 35 -- with external circumstances, or both, such that storing a copy of |
54 -- and that's really pushing it. | 36 -- the legal values in the database would create a maintenanc hassle. |
55 -- | 37 -- There are none of these at present; the only projected one at the |
56 -- Arbitrary flat tags we can do. For the rest, I think we need to set | 38 -- moment is FreeBSD's idea of "the nearest man page to where the |
57 -- up a system that allows, essentially, extensions to the schema, | 39 -- problem appears to be". |
58 -- which can be arbitrary SQL, and which provide views that allow | 40 -- |
59 -- nonspecialized client software to read (and maybe update, but maybe | 41 -- 4. Systems of tags, that is, zero or more selections from a list |
60 -- not) the available material in a standardized way. | 42 -- of possible values. Current schemes of this type include the |
61 -- | 43 -- [456]-ONLY tags, the 6-CRITICAL/6-IMPORTANT/etc. release branch |
62 -- We will then have to have an additional set of views that take the | 44 -- tags, the STUCK tag, EASY, PATCH, PULLUPS-NEEDED, etc. Some of |
63 -- union of all the extensions; these views will need to be recreated | 45 -- these should be separate flat taxonomy schemes rather than tags, |
64 -- when a new classification scheme is added. | 46 -- and some of them should maybe be states rather than tags too. |
65 -- | 47 -- (And, perhaps some of the current states should be tags...) |
66 | 48 -- In addition we want to allow every developer to have their own |
67 -- Tags. | 49 -- private tags that aren't exposed. And, we might want to have |
68 CREATE TABLE tags ( | 50 -- project-only tags that other developers can see but users can't. |
51 -- (It remains to be seen if that's really a good idea.) | |
52 -- | |
53 -- Note also that a list of PRs is equivalent to a tag on those PRs. | |
54 -- | |
55 -- It is also possible that there may be additional classifications | |
56 -- (possibly of any of the above types, but most likely tags) that are | |
57 -- views rather than data, that is, derived from other information. | |
58 -- One example of this is "PRs in feedback more than 6 months". | |
59 -- | |
60 -- | |
61 -- We assume that new classification schemes will be added on the fly, | |
62 -- not frequently, but frequently enough that the list of schemes | |
63 -- should not be hard-coded into programs that access the database, | |
64 -- and we don't want to have to create a new table for each new scheme | |
65 -- either. | |
66 -- | |
67 -- Therefore, the way I'll do this is to create one table (or rather, | |
68 -- one family of tables) for each type of classification scheme, and | |
69 -- make them able to handle arbitrary instances. | |
70 -- | |
71 -- For hierarchical schemes there is no point trying to encode the | |
72 -- hierarchical structure in SQL; that is a waste of time. Instead | |
73 -- we'll expand the tree of allowed values and rely on the access | |
74 -- software to present the schemes sensibly. | |
75 -- | |
76 -- It remains unclear what views ought to be defined. | |
77 -- | |
78 -- | |
79 -- Notes on the representations: | |
80 -- * a scheme is "total" if it should be defined on all PRs | |
81 -- * if the data contains a null value for a particular scheme, | |
82 -- that means "not specified yet". | |
83 -- * the hierarchical and flat schemes are no different schema- | |
84 -- wise but are separate to allow different handling in software. | |
85 -- * the ordering field in the values tables should be used so that | |
86 -- ORDER BY ordering ASCENDING produces the desired output order | |
87 -- of the legal values. | |
88 -- * the obsolete field for values is for entries that are allowed | |
89 -- to still exist in the database but that should not be used | |
90 -- with new PRs or for new classifications of old PRs. | |
91 -- * the ordering field for the classifications themselves should | |
92 -- be used so that ORDER BY ordering ASCENDING produces the | |
93 -- desired output order of the classification schemes. The | |
94 -- numbers are global across classification scheme types so | |
95 -- schemes of the same type do not need to be sorted together. | |
96 -- | |
97 -- | |
98 | |
99 ------------------------------------------------------------ | |
100 -- Hierarchical schemes. | |
101 | |
102 -- names and properties of the hierarchical schemes | |
103 CREATE TABLE hierclass_names ( | |
69 name text primary key, | 104 name text primary key, |
70 confidential boolean not null | 105 ordering int not null, |
71 -- XXX should have an owner | 106 total boolean not null, |
72 ) | 107 description text not null |
73 WITHOUT OIDS; | 108 ) |
74 | 109 WITHOUT OIDS; |
75 CREATE TABLE has_tags ( | 110 |
111 -- allowed values of the hierarchical schemes | |
112 CREATE TABLE hierclass_values ( | |
113 scheme text not null references hierclass_names (name), | |
114 value text not null, | |
115 ordering int not null, | |
116 obsolete boolean not null, | |
117 description text not null, | |
118 primary key (scheme, value) | |
119 ) | |
120 WITHOUT OIDS; | |
121 | |
122 -- classification of PRs according to the hierarchical schemes | |
123 CREATE TABLE hierclass_data ( | |
76 pr bigint not null references PRs (id), | 124 pr bigint not null references PRs (id), |
77 tag text not null references tags (name), | 125 scheme text not null references hierclass_names (name), |
78 ) | 126 value text , |
79 WITHOUT OIDS; | 127 primary key (pr, scheme), |
80 | 128 -- |
81 -- Classifications. | 129 -- what I want here is for it to reference hierclass_values unless |
82 -- | 130 -- the value is null. or should the values table include a row |
83 -- Each classification scheme should define whatever tables and | 131 -- with null value where that's allowed? I forget how null values |
84 -- goop it needs. It should also define a text view, as follows | 132 -- interoperate with reference constraints except that it's messy. |
85 -- assuming the scheme is called "foo": | 133 -- XXX. (if changing this change the similar logic below as well) |
86 -- | 134 -- |
87 -- . The name of the view should be "foo_classification_text". | 135 constraint (scheme, value) references hierclass_values (scheme, value) |
88 -- . The view should produce tuples (scheme, order, pr, value). | 136 ) |
89 -- . The types should be (text, int, bigint, text). | 137 WITHOUT OIDS; |
90 -- . The "scheme" column should be the scheme name, suitable | 138 |
91 -- for display, e.g. "Foo". | 139 ------------------------------------------------------------ |
92 -- . The "order" column should contain the canonical display | 140 -- Enumerated flat schemes. |
93 -- order number for this scheme. Used with ORDER BY ASCENDING | 141 |
94 -- to get things to appear in a consistent order. | 142 -- names and properties of the flat schemes |
95 -- . The "pr" column should have pr numbers. | 143 CREATE TABLE flatclass_names ( |
96 -- . The "value" column should have a text representation of the | 144 name text primary key, |
97 -- value. | 145 ordering int not null, |
98 -- | 146 total boolean not null, |
99 -- Choose the canonical display order number for a new scheme based on | 147 description text not null |
100 -- the existing values in use. See below for the numbers used by the | 148 ) |
101 -- built-in schemes. | 149 WITHOUT OIDS; |
102 -- | 150 |
103 -- Ideally it should be possible to update the value through the text | 151 -- allowed values of the flat schemes |
104 -- view. Ideally it would also be possible to insert through it, but | 152 CREATE TABLE flatclass_values ( |
105 -- that will never work... have to think about how to handle that | 153 scheme text not null references flatclass_names (name), |
106 -- best. (XXX?) | 154 value text not null, |
107 -- | 155 ordering int not null, |
108 | 156 obsolete boolean not null, |
109 -------------------------------------------------------------- | 157 description text not null, |
110 | 158 primary key (scheme, value) |
111 -- The "category" classification. | 159 ) |
112 -- Each PR has one category that must be taken from the list in the | 160 WITHOUT OIDS; |
113 -- categories table. The category is used to select the default | 161 |
114 -- responsible entity for incoming PRs. | 162 -- classification of PRs according to the flat schemes |
115 CREATE TABLE has_categories ( | 163 CREATE TABLE flatclass_data ( |
116 pr bigint unique not null references PRs (id), | 164 pr bigint not null references PRs (id), |
117 category text not null references categories (name) | 165 scheme text not null references flatclass_names (name), |
118 ) | 166 value text , |
119 WITHOUT OIDS; | 167 primary key (pr, scheme), |
120 | 168 -- as above |
121 CREATE VIEW category_classification_text AS | 169 constraint (scheme, value) references flatclass_values (scheme, value) |
122 SELECT "Category" as scheme, 100 as order, pr, category as value | 170 ) |
123 FROM has_categories | 171 WITHOUT OIDS; |
124 ; | 172 |
125 | 173 ------------------------------------------------------------ |
126 -- The "class" classification. | 174 -- Text schemes. |
127 -- Each PR has one class that must be taken from the list in the | 175 |
128 -- classes table. | 176 -- names and properties of the text schemes |
129 CREATE TABLE has_classes ( | 177 CREATE TABLE textclass_names ( |
130 pr bigint unique not null references PRs (id), | 178 name text primary key, |
131 class text not null references classes (name) | 179 ordering int not null, |
132 ) | 180 total boolean not null, |
133 WITHOUT OIDS; | 181 description text not null |
134 | 182 ) |
135 CREATE VIEW class_classification_text AS | 183 WITHOUT OIDS; |
136 SELECT "Class" as scheme, 200 as order, pr, class as value | 184 |
137 FROM has_classes | 185 -- classification of PRs according to the text schemes |
138 ; | 186 CREATE TABLE textclass_data ( |
139 | 187 pr bigint not null references PRs (id), |
140 -- The "severity" classification. | 188 scheme text not null references textclass_names (name), |
141 -- Each PR has one severity that must be taken from the list in the | 189 value text , |
142 -- severities table. | 190 primary key (pr, scheme), |
143 CREATE TABLE has_severities ( | 191 ) |
144 pr bigint unique not null references PRs (id), | 192 WITHOUT OIDS; |
145 severity text not null references severities (name) | 193 |
146 ) | 194 ------------------------------------------------------------ |
147 WITHOUT OIDS; | 195 -- Tag schemes. |
148 | 196 |
149 CREATE VIEW severity_classification_text AS | 197 -- names and properties of the tag schemes |
150 SELECT "Severity" as scheme, 300 as order, pr, severity as value | 198 -- (total does not make sense here) |
151 FROM has_severities | 199 CREATE TABLE tagclass_names ( |
152 ; | 200 name text primary key, |
153 | 201 ordering int not null, |
154 -- The "priority" classification. | 202 description text not null |
155 -- Each PR has one priority that must be taken from the list in the | 203 ) |
156 -- priorities table. | 204 WITHOUT OIDS; |
157 CREATE TABLE has_priorities ( | 205 |
158 pr bigint unique not null references PRs (id), | 206 -- allowed values of the tag schemes |
159 priority text not null references priorities (name) | 207 -- (each PR can reference zero or more of these) |
160 ) | 208 CREATE TABLE tagclass_values ( |
161 WITHOUT OIDS; | 209 scheme text not null references tagclass_names (name), |
162 | 210 value text not null, |
163 CREATE VIEW priority_classification_text AS | 211 ordering int not null, |
164 SELECT "Priority" as scheme, 400 as order, pr, priority as value | 212 obsolete boolean not null, |
165 FROM has_priorities | 213 description text not null, |
166 ; | 214 primary key (scheme, value) |
167 | 215 ) |
168 -------------------------------------------------------------- | 216 WITHOUT OIDS; |
169 | 217 |
170 -- | 218 -- classification of PRs according to the tag schemes |
171 -- The collected view of all classifications. | 219 -- (each PR/scheme pair is listed once for each tag attached to it) |
172 -- | 220 CREATE TABLE tagclass_data ( |
173 -- (This does not include tags, as tags should be presented | 221 pr bigint not null references PRs (id), |
174 -- differently.) | 222 scheme text not null references tagclass_names (name), |
175 -- | 223 value text , |
176 CREATE VIEW classification_text AS | 224 -- as above |
177 SELECT * from category_classification_text | 225 constraint (scheme, value) references tagclass_values (scheme, value) |
178 UNION SELECT * from class_classification_text | 226 ) |
179 UNION SELECT * from severity_classification_text | 227 WITHOUT OIDS; |
180 UNION SELECT * from priority_classification_text | 228 |
181 ; | 229 -- view where tags are collected into a string |
182 | 230 -- XXX does concat(value) work, and if not, is there any way to do this? |
231 CREATE VIEW tagclass_stringdata AS | |
232 SELECT pr, scheme, concat(value) as value | |
233 FROM tagclass_data | |
234 GROUP BY pr, scheme | |
235 ; | |
236 | |
237 ------------------------------------------------------------ | |
238 -- some views | |
239 | |
240 -- views that pull in the ordering fields. | |
241 -- | |
242 -- (If you use only one of the ordering fields, as is likely to be | |
243 -- the case in practice, I think we can rely on the query optimizer | |
244 -- to drop the join that collects in the other ordering field. If | |
245 -- not, maybe this should be restructured.) | |
246 -- | |
247 -- XXX what do these do when _data.value is null? | |
248 | |
249 CREATE VIEW hierclass_data_ordered AS | |
250 SELECT hierclass_data.pr as pr, | |
251 hierclass_data.scheme as scheme, | |
252 hierclass_names.ordering as schemeordering, | |
253 hierclass_data.value as value, | |
254 hierclass_values.ordering as valueordering, | |
255 FROM hierclass_data, hierclass_names, hierclass_values | |
256 WHERE | |
257 hierclass_data.scheme = hierclass_names.name | |
258 AND hierclass_data.scheme = hierclass_values.scheme | |
259 AND hierclass_data.value = hierclass_values.value | |
260 ; | |
261 | |
262 CREATE VIEW flatclass_data_ordered AS | |
263 SELECT flatclass_data.pr as pr, | |
264 flatclass_data.scheme as scheme, | |
265 flatclass_names.ordering as schemeordering, | |
266 flatclass_data.value as value, | |
267 flatclass_values.ordering as valueordering, | |
268 FROM flatclass_data, flatclass_names, flatclass_values | |
269 WHERE | |
270 flatclass_data.scheme = flatclass_names.name | |
271 AND flatclass_data.scheme = flatclass_values.scheme | |
272 AND flatclass_data.value = flatclass_values.value | |
273 ; | |
274 | |
275 -- | |
276 -- XXX what I'd like to do here is produce textclass_data.value as | |
277 -- valueordering; then sorting by valueordering would produce the | |
278 -- desired results (the entries sorted by text order of the values). | |
279 -- That produces a type conflict though if we combine this view | |
280 -- with the other ordered views where the ordering is a number. | |
281 -- | |
282 -- Next best would be to order by textclass_data.value in this view | |
283 -- and produce the row number of the result (or some other fresh | |
284 -- sequence) as valueordering. However, I don't think that's possible. | |
285 -- | |
286 -- What's here (using a fixed value of 1) will run but it's not | |
287 -- particularly desirable. | |
288 -- | |
289 CREATE VIEW textclass_data_ordered AS | |
290 SELECT textclass_data.pr as pr, | |
291 textclass_data.scheme as scheme, | |
292 textclass_names.ordering as schemeordering, | |
293 textclass_data.value as value, | |
294 1 as valueordering, | |
295 FROM textclass_data, textclass_names | |
296 WHERE | |
297 textclass_data.scheme = textclass_names.name | |
298 ; | |
299 | |
300 CREATE VIEW tagclass_data_ordered AS | |
301 SELECT tagclass_data.pr as pr, | |
302 tagclass_data.scheme as scheme, | |
303 tagclass_names.ordering as schemeordering, | |
304 tagclass_data.value as value, | |
305 tagclass_values.ordering as valueordering, | |
306 FROM tagclass_data, tagclass_names, tagclass_values | |
307 WHERE | |
308 tagclass_data.scheme = tagclass_names.name | |
309 AND tagclass_data.scheme = tagclass_values.scheme | |
310 AND tagclass_data.value = tagclass_values.value | |
311 ; | |
312 | |
313 -- ordered version of tagclass_stringdata | |
314 -- XXX does concat(value) work, and if not, is there any way to do this? | |
315 CREATE VIEW tagclass_stringdata_ordered AS | |
316 SELECT pr, scheme, schemeordering, | |
317 concat(value) as value, first(valueordering) as valueordering | |
318 FROM tagclass_data | |
319 ORDER BY valueordering | |
320 GROUP BY pr, scheme | |
321 ; | |
322 | |
323 | |
324 -- a view that combines all the classification data. | |
325 | |
326 CREATE VIEW classifications AS | |
327 SELECT * FROM hierclass_data_ordered | |
328 UNION SELECT * FROM flatclass_data_ordered | |
329 UNION SELECT * FROM textclass_data_ordered | |
330 UNION SELECT * FROM tagclass_stringdata_ordered | |
331 ORDER BY schemeordering | |
332 ; | |
333 |