comparison database/schema/classify.sql @ 11:d42c0db81e28

Update the classification stuff. There are still some rough edges but it should be mostly workable now.
author David A. Holland
date Thu, 15 Mar 2012 04:21:20 -0400
parents 68cc276ac118
children ca4679ac4e02
comparison
equal deleted inserted replaced
10:1720f45dd495 11:d42c0db81e28
7 -- taxonomy, we allow multiple arbitrary classification schemes, 7 -- taxonomy, we allow multiple arbitrary classification schemes,
8 -- all of which are created equal. (Administratively, some will 8 -- all of which are created equal. (Administratively, some will
9 -- be more equal than others, but that does not concern the 9 -- be more equal than others, but that does not concern the
10 -- database.) 10 -- database.)
11 -- 11 --
12 -- The classification schemes/taxonomies that already exist or are 12 -- There are several classification schemes/taxonomies that already
13 -- projected to be wanted, as of this writing, are: 13 -- exist or have been invented but not yet deployed. These fall into
14 -- 14 -- the following categories:
15 -- 1. The old GNATS system of categories. A PR has one category, which 15 --
16 -- is taken from a fixed list of keywords, currently these: 16 -- 1. Hierarchical taxonomy. The only existing hierarchical taxonomy
17 -- bin install kern lib misc pkg port-* security standards 17 -- is the one dholland uses in the out-of-GNATS PR lists, which
18 -- toolchain xsrc y2k 18 -- classifies PRs by location in the system. A second projected one is
19 -- Also the old "class" field. 19 -- classification by consequences (crashes, hangs, etc.) The possible
20 -- 20 -- classifications are defined in advance and stored in the database.
21 -- 2. The old GNATS severity and priority fields, which are still 21 --
22 -- useful despite being broad. 22 -- 2. Flat taxonomy, that is, a selection from an enumeration of
23 -- 23 -- values. Existing schemes of this type include:
24 -- 3. The nearest man page to where the problem appears to be. This is 24 --
25 -- FreeBSD's idea and seems to work fairly well for many things. 25 -- * the old GNATS scheme of categories (bin, kern, lib, etc.)
26 -- 26 -- * the old GNATS scheme of classes (sw-bug, doc-bug, etc.)
27 -- 4. The taxonomy dholland uses in the out-of-gnats PR lists, which 27 -- * the old GNATS severity field (critical, serious, non-critical)
28 -- is related to the previous (it uses man pages for drivers and 28 -- * the old GNATS priority field (high, medium, low)
29 -- programs) but is three-layer hierarchical. 29 --
30 -- 30 -- For these the possible values are defined in advance and stored in
31 -- 5. Assorted flat tags (e.g. PRs releng considers critical for the
32 -- 6.0 release), or equivalently, arbitrary hand-maintained lists.
33 -- It is anticipated that every developer will be able to make their
34 -- own lists.
35 --
36 -- 6. Properties derivable from other data that behave like flat tags
37 -- (e.g. "PRs in feedback more than 6 months") - these should not be
38 -- stored as tags in the database but it may be desirable to be able
39 -- to present them the same way as tags.
40 --
41 -- 7. Other hierarchical schemes that have yet to be invented.
42 --
43 -- Some of these classification schemes have additional special
44 -- semantics.
45 --
46 -- It is furthermore assumed that new classification schemes will be
47 -- added on the fly, not frequently, but frequently enough that the
48 -- list of schemes should not be hard-coded into programs that access
49 -- the database. 31 -- the database.
50 -- 32 --
51 -- This creates something of a problem; encoding hierarchical data in 33 -- 3. String value. Most of the time this is really a flat taxonomy
52 -- SQL is fun enough when the hierarchy is fixed. We need to be able 34 -- where the number of possible values is large or changes dynamically
53 -- to handle hierarchical data whose layout hasn't been invented yet, 35 -- with external circumstances, or both, such that storing a copy of
54 -- and that's really pushing it. 36 -- the legal values in the database would create a maintenanc hassle.
55 -- 37 -- There are none of these at present; the only projected one at the
56 -- Arbitrary flat tags we can do. For the rest, I think we need to set 38 -- moment is FreeBSD's idea of "the nearest man page to where the
57 -- up a system that allows, essentially, extensions to the schema, 39 -- problem appears to be".
58 -- which can be arbitrary SQL, and which provide views that allow 40 --
59 -- nonspecialized client software to read (and maybe update, but maybe 41 -- 4. Systems of tags, that is, zero or more selections from a list
60 -- not) the available material in a standardized way. 42 -- of possible values. Current schemes of this type include the
61 -- 43 -- [456]-ONLY tags, the 6-CRITICAL/6-IMPORTANT/etc. release branch
62 -- We will then have to have an additional set of views that take the 44 -- tags, the STUCK tag, EASY, PATCH, PULLUPS-NEEDED, etc. Some of
63 -- union of all the extensions; these views will need to be recreated 45 -- these should be separate flat taxonomy schemes rather than tags,
64 -- when a new classification scheme is added. 46 -- and some of them should maybe be states rather than tags too.
65 -- 47 -- (And, perhaps some of the current states should be tags...)
66 48 -- In addition we want to allow every developer to have their own
67 -- Tags. 49 -- private tags that aren't exposed. And, we might want to have
68 CREATE TABLE tags ( 50 -- project-only tags that other developers can see but users can't.
51 -- (It remains to be seen if that's really a good idea.)
52 --
53 -- Note also that a list of PRs is equivalent to a tag on those PRs.
54 --
55 -- It is also possible that there may be additional classifications
56 -- (possibly of any of the above types, but most likely tags) that are
57 -- views rather than data, that is, derived from other information.
58 -- One example of this is "PRs in feedback more than 6 months".
59 --
60 --
61 -- We assume that new classification schemes will be added on the fly,
62 -- not frequently, but frequently enough that the list of schemes
63 -- should not be hard-coded into programs that access the database,
64 -- and we don't want to have to create a new table for each new scheme
65 -- either.
66 --
67 -- Therefore, the way I'll do this is to create one table (or rather,
68 -- one family of tables) for each type of classification scheme, and
69 -- make them able to handle arbitrary instances.
70 --
71 -- For hierarchical schemes there is no point trying to encode the
72 -- hierarchical structure in SQL; that is a waste of time. Instead
73 -- we'll expand the tree of allowed values and rely on the access
74 -- software to present the schemes sensibly.
75 --
76 -- It remains unclear what views ought to be defined.
77 --
78 --
79 -- Notes on the representations:
80 -- * a scheme is "total" if it should be defined on all PRs
81 -- * if the data contains a null value for a particular scheme,
82 -- that means "not specified yet".
83 -- * the hierarchical and flat schemes are no different schema-
84 -- wise but are separate to allow different handling in software.
85 -- * the ordering field in the values tables should be used so that
86 -- ORDER BY ordering ASCENDING produces the desired output order
87 -- of the legal values.
88 -- * the obsolete field for values is for entries that are allowed
89 -- to still exist in the database but that should not be used
90 -- with new PRs or for new classifications of old PRs.
91 -- * the ordering field for the classifications themselves should
92 -- be used so that ORDER BY ordering ASCENDING produces the
93 -- desired output order of the classification schemes. The
94 -- numbers are global across classification scheme types so
95 -- schemes of the same type do not need to be sorted together.
96 --
97 --
98
99 ------------------------------------------------------------
100 -- Hierarchical schemes.
101
102 -- names and properties of the hierarchical schemes
103 CREATE TABLE hierclass_names (
69 name text primary key, 104 name text primary key,
70 confidential boolean not null 105 ordering int not null,
71 -- XXX should have an owner 106 total boolean not null,
72 ) 107 description text not null
73 WITHOUT OIDS; 108 )
74 109 WITHOUT OIDS;
75 CREATE TABLE has_tags ( 110
111 -- allowed values of the hierarchical schemes
112 CREATE TABLE hierclass_values (
113 scheme text not null references hierclass_names (name),
114 value text not null,
115 ordering int not null,
116 obsolete boolean not null,
117 description text not null,
118 primary key (scheme, value)
119 )
120 WITHOUT OIDS;
121
122 -- classification of PRs according to the hierarchical schemes
123 CREATE TABLE hierclass_data (
76 pr bigint not null references PRs (id), 124 pr bigint not null references PRs (id),
77 tag text not null references tags (name), 125 scheme text not null references hierclass_names (name),
78 ) 126 value text ,
79 WITHOUT OIDS; 127 primary key (pr, scheme),
80 128 --
81 -- Classifications. 129 -- what I want here is for it to reference hierclass_values unless
82 -- 130 -- the value is null. or should the values table include a row
83 -- Each classification scheme should define whatever tables and 131 -- with null value where that's allowed? I forget how null values
84 -- goop it needs. It should also define a text view, as follows 132 -- interoperate with reference constraints except that it's messy.
85 -- assuming the scheme is called "foo": 133 -- XXX. (if changing this change the similar logic below as well)
86 -- 134 --
87 -- . The name of the view should be "foo_classification_text". 135 constraint (scheme, value) references hierclass_values (scheme, value)
88 -- . The view should produce tuples (scheme, order, pr, value). 136 )
89 -- . The types should be (text, int, bigint, text). 137 WITHOUT OIDS;
90 -- . The "scheme" column should be the scheme name, suitable 138
91 -- for display, e.g. "Foo". 139 ------------------------------------------------------------
92 -- . The "order" column should contain the canonical display 140 -- Enumerated flat schemes.
93 -- order number for this scheme. Used with ORDER BY ASCENDING 141
94 -- to get things to appear in a consistent order. 142 -- names and properties of the flat schemes
95 -- . The "pr" column should have pr numbers. 143 CREATE TABLE flatclass_names (
96 -- . The "value" column should have a text representation of the 144 name text primary key,
97 -- value. 145 ordering int not null,
98 -- 146 total boolean not null,
99 -- Choose the canonical display order number for a new scheme based on 147 description text not null
100 -- the existing values in use. See below for the numbers used by the 148 )
101 -- built-in schemes. 149 WITHOUT OIDS;
102 -- 150
103 -- Ideally it should be possible to update the value through the text 151 -- allowed values of the flat schemes
104 -- view. Ideally it would also be possible to insert through it, but 152 CREATE TABLE flatclass_values (
105 -- that will never work... have to think about how to handle that 153 scheme text not null references flatclass_names (name),
106 -- best. (XXX?) 154 value text not null,
107 -- 155 ordering int not null,
108 156 obsolete boolean not null,
109 -------------------------------------------------------------- 157 description text not null,
110 158 primary key (scheme, value)
111 -- The "category" classification. 159 )
112 -- Each PR has one category that must be taken from the list in the 160 WITHOUT OIDS;
113 -- categories table. The category is used to select the default 161
114 -- responsible entity for incoming PRs. 162 -- classification of PRs according to the flat schemes
115 CREATE TABLE has_categories ( 163 CREATE TABLE flatclass_data (
116 pr bigint unique not null references PRs (id), 164 pr bigint not null references PRs (id),
117 category text not null references categories (name) 165 scheme text not null references flatclass_names (name),
118 ) 166 value text ,
119 WITHOUT OIDS; 167 primary key (pr, scheme),
120 168 -- as above
121 CREATE VIEW category_classification_text AS 169 constraint (scheme, value) references flatclass_values (scheme, value)
122 SELECT "Category" as scheme, 100 as order, pr, category as value 170 )
123 FROM has_categories 171 WITHOUT OIDS;
124 ; 172
125 173 ------------------------------------------------------------
126 -- The "class" classification. 174 -- Text schemes.
127 -- Each PR has one class that must be taken from the list in the 175
128 -- classes table. 176 -- names and properties of the text schemes
129 CREATE TABLE has_classes ( 177 CREATE TABLE textclass_names (
130 pr bigint unique not null references PRs (id), 178 name text primary key,
131 class text not null references classes (name) 179 ordering int not null,
132 ) 180 total boolean not null,
133 WITHOUT OIDS; 181 description text not null
134 182 )
135 CREATE VIEW class_classification_text AS 183 WITHOUT OIDS;
136 SELECT "Class" as scheme, 200 as order, pr, class as value 184
137 FROM has_classes 185 -- classification of PRs according to the text schemes
138 ; 186 CREATE TABLE textclass_data (
139 187 pr bigint not null references PRs (id),
140 -- The "severity" classification. 188 scheme text not null references textclass_names (name),
141 -- Each PR has one severity that must be taken from the list in the 189 value text ,
142 -- severities table. 190 primary key (pr, scheme),
143 CREATE TABLE has_severities ( 191 )
144 pr bigint unique not null references PRs (id), 192 WITHOUT OIDS;
145 severity text not null references severities (name) 193
146 ) 194 ------------------------------------------------------------
147 WITHOUT OIDS; 195 -- Tag schemes.
148 196
149 CREATE VIEW severity_classification_text AS 197 -- names and properties of the tag schemes
150 SELECT "Severity" as scheme, 300 as order, pr, severity as value 198 -- (total does not make sense here)
151 FROM has_severities 199 CREATE TABLE tagclass_names (
152 ; 200 name text primary key,
153 201 ordering int not null,
154 -- The "priority" classification. 202 description text not null
155 -- Each PR has one priority that must be taken from the list in the 203 )
156 -- priorities table. 204 WITHOUT OIDS;
157 CREATE TABLE has_priorities ( 205
158 pr bigint unique not null references PRs (id), 206 -- allowed values of the tag schemes
159 priority text not null references priorities (name) 207 -- (each PR can reference zero or more of these)
160 ) 208 CREATE TABLE tagclass_values (
161 WITHOUT OIDS; 209 scheme text not null references tagclass_names (name),
162 210 value text not null,
163 CREATE VIEW priority_classification_text AS 211 ordering int not null,
164 SELECT "Priority" as scheme, 400 as order, pr, priority as value 212 obsolete boolean not null,
165 FROM has_priorities 213 description text not null,
166 ; 214 primary key (scheme, value)
167 215 )
168 -------------------------------------------------------------- 216 WITHOUT OIDS;
169 217
170 -- 218 -- classification of PRs according to the tag schemes
171 -- The collected view of all classifications. 219 -- (each PR/scheme pair is listed once for each tag attached to it)
172 -- 220 CREATE TABLE tagclass_data (
173 -- (This does not include tags, as tags should be presented 221 pr bigint not null references PRs (id),
174 -- differently.) 222 scheme text not null references tagclass_names (name),
175 -- 223 value text ,
176 CREATE VIEW classification_text AS 224 -- as above
177 SELECT * from category_classification_text 225 constraint (scheme, value) references tagclass_values (scheme, value)
178 UNION SELECT * from class_classification_text 226 )
179 UNION SELECT * from severity_classification_text 227 WITHOUT OIDS;
180 UNION SELECT * from priority_classification_text 228
181 ; 229 -- view where tags are collected into a string
182 230 -- XXX does concat(value) work, and if not, is there any way to do this?
231 CREATE VIEW tagclass_stringdata AS
232 SELECT pr, scheme, concat(value) as value
233 FROM tagclass_data
234 GROUP BY pr, scheme
235 ;
236
237 ------------------------------------------------------------
238 -- some views
239
240 -- views that pull in the ordering fields.
241 --
242 -- (If you use only one of the ordering fields, as is likely to be
243 -- the case in practice, I think we can rely on the query optimizer
244 -- to drop the join that collects in the other ordering field. If
245 -- not, maybe this should be restructured.)
246 --
247 -- XXX what do these do when _data.value is null?
248
249 CREATE VIEW hierclass_data_ordered AS
250 SELECT hierclass_data.pr as pr,
251 hierclass_data.scheme as scheme,
252 hierclass_names.ordering as schemeordering,
253 hierclass_data.value as value,
254 hierclass_values.ordering as valueordering,
255 FROM hierclass_data, hierclass_names, hierclass_values
256 WHERE
257 hierclass_data.scheme = hierclass_names.name
258 AND hierclass_data.scheme = hierclass_values.scheme
259 AND hierclass_data.value = hierclass_values.value
260 ;
261
262 CREATE VIEW flatclass_data_ordered AS
263 SELECT flatclass_data.pr as pr,
264 flatclass_data.scheme as scheme,
265 flatclass_names.ordering as schemeordering,
266 flatclass_data.value as value,
267 flatclass_values.ordering as valueordering,
268 FROM flatclass_data, flatclass_names, flatclass_values
269 WHERE
270 flatclass_data.scheme = flatclass_names.name
271 AND flatclass_data.scheme = flatclass_values.scheme
272 AND flatclass_data.value = flatclass_values.value
273 ;
274
275 --
276 -- XXX what I'd like to do here is produce textclass_data.value as
277 -- valueordering; then sorting by valueordering would produce the
278 -- desired results (the entries sorted by text order of the values).
279 -- That produces a type conflict though if we combine this view
280 -- with the other ordered views where the ordering is a number.
281 --
282 -- Next best would be to order by textclass_data.value in this view
283 -- and produce the row number of the result (or some other fresh
284 -- sequence) as valueordering. However, I don't think that's possible.
285 --
286 -- What's here (using a fixed value of 1) will run but it's not
287 -- particularly desirable.
288 --
289 CREATE VIEW textclass_data_ordered AS
290 SELECT textclass_data.pr as pr,
291 textclass_data.scheme as scheme,
292 textclass_names.ordering as schemeordering,
293 textclass_data.value as value,
294 1 as valueordering,
295 FROM textclass_data, textclass_names
296 WHERE
297 textclass_data.scheme = textclass_names.name
298 ;
299
300 CREATE VIEW tagclass_data_ordered AS
301 SELECT tagclass_data.pr as pr,
302 tagclass_data.scheme as scheme,
303 tagclass_names.ordering as schemeordering,
304 tagclass_data.value as value,
305 tagclass_values.ordering as valueordering,
306 FROM tagclass_data, tagclass_names, tagclass_values
307 WHERE
308 tagclass_data.scheme = tagclass_names.name
309 AND tagclass_data.scheme = tagclass_values.scheme
310 AND tagclass_data.value = tagclass_values.value
311 ;
312
313 -- ordered version of tagclass_stringdata
314 -- XXX does concat(value) work, and if not, is there any way to do this?
315 CREATE VIEW tagclass_stringdata_ordered AS
316 SELECT pr, scheme, schemeordering,
317 concat(value) as value, first(valueordering) as valueordering
318 FROM tagclass_data
319 ORDER BY valueordering
320 GROUP BY pr, scheme
321 ;
322
323
324 -- a view that combines all the classification data.
325
326 CREATE VIEW classifications AS
327 SELECT * FROM hierclass_data_ordered
328 UNION SELECT * FROM flatclass_data_ordered
329 UNION SELECT * FROM textclass_data_ordered
330 UNION SELECT * FROM tagclass_stringdata_ordered
331 ORDER BY schemeordering
332 ;
333