Mercurial > ~dholland > hg > swallowtail > index.cgi
annotate database/schema/classify.sql @ 14:dfd62aad74f4
note that the test environment hasn't been merged into this tree yet.
(from last Feb.)
author | David A. Holland |
---|---|
date | Mon, 03 Sep 2012 13:41:57 -0400 |
parents | d42c0db81e28 |
children | ca4679ac4e02 |
rev | line source |
---|---|
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
1 -- |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
2 -- Classification. |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
3 -- |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
4 -- |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
5 -- Sorting and classifying bug reports usefully is a nontrivial |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
6 -- problem. For this reason, rather than bake in any particular |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
7 -- taxonomy, we allow multiple arbitrary classification schemes, |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
8 -- all of which are created equal. (Administratively, some will |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
9 -- be more equal than others, but that does not concern the |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
10 -- database.) |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
11 -- |
11 | 12 -- There are several classification schemes/taxonomies that already |
13 -- exist or have been invented but not yet deployed. These fall into | |
14 -- the following categories: | |
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
15 -- |
11 | 16 -- 1. Hierarchical taxonomy. The only existing hierarchical taxonomy |
17 -- is the one dholland uses in the out-of-GNATS PR lists, which | |
18 -- classifies PRs by location in the system. A second projected one is | |
19 -- classification by consequences (crashes, hangs, etc.) The possible | |
20 -- classifications are defined in advance and stored in the database. | |
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
21 -- |
11 | 22 -- 2. Flat taxonomy, that is, a selection from an enumeration of |
23 -- values. Existing schemes of this type include: | |
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
24 -- |
11 | 25 -- * the old GNATS scheme of categories (bin, kern, lib, etc.) |
26 -- * the old GNATS scheme of classes (sw-bug, doc-bug, etc.) | |
27 -- * the old GNATS severity field (critical, serious, non-critical) | |
28 -- * the old GNATS priority field (high, medium, low) | |
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
29 -- |
11 | 30 -- For these the possible values are defined in advance and stored in |
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
31 -- the database. |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
32 -- |
11 | 33 -- 3. String value. Most of the time this is really a flat taxonomy |
34 -- where the number of possible values is large or changes dynamically | |
35 -- with external circumstances, or both, such that storing a copy of | |
36 -- the legal values in the database would create a maintenanc hassle. | |
37 -- There are none of these at present; the only projected one at the | |
38 -- moment is FreeBSD's idea of "the nearest man page to where the | |
39 -- problem appears to be". | |
40 -- | |
41 -- 4. Systems of tags, that is, zero or more selections from a list | |
42 -- of possible values. Current schemes of this type include the | |
43 -- [456]-ONLY tags, the 6-CRITICAL/6-IMPORTANT/etc. release branch | |
44 -- tags, the STUCK tag, EASY, PATCH, PULLUPS-NEEDED, etc. Some of | |
45 -- these should be separate flat taxonomy schemes rather than tags, | |
46 -- and some of them should maybe be states rather than tags too. | |
47 -- (And, perhaps some of the current states should be tags...) | |
48 -- In addition we want to allow every developer to have their own | |
49 -- private tags that aren't exposed. And, we might want to have | |
50 -- project-only tags that other developers can see but users can't. | |
51 -- (It remains to be seen if that's really a good idea.) | |
52 -- | |
53 -- Note also that a list of PRs is equivalent to a tag on those PRs. | |
54 -- | |
55 -- It is also possible that there may be additional classifications | |
56 -- (possibly of any of the above types, but most likely tags) that are | |
57 -- views rather than data, that is, derived from other information. | |
58 -- One example of this is "PRs in feedback more than 6 months". | |
59 -- | |
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
60 -- |
11 | 61 -- We assume that new classification schemes will be added on the fly, |
62 -- not frequently, but frequently enough that the list of schemes | |
63 -- should not be hard-coded into programs that access the database, | |
64 -- and we don't want to have to create a new table for each new scheme | |
65 -- either. | |
66 -- | |
67 -- Therefore, the way I'll do this is to create one table (or rather, | |
68 -- one family of tables) for each type of classification scheme, and | |
69 -- make them able to handle arbitrary instances. | |
70 -- | |
71 -- For hierarchical schemes there is no point trying to encode the | |
72 -- hierarchical structure in SQL; that is a waste of time. Instead | |
73 -- we'll expand the tree of allowed values and rely on the access | |
74 -- software to present the schemes sensibly. | |
75 -- | |
76 -- It remains unclear what views ought to be defined. | |
77 -- | |
78 -- | |
79 -- Notes on the representations: | |
80 -- * a scheme is "total" if it should be defined on all PRs | |
81 -- * if the data contains a null value for a particular scheme, | |
82 -- that means "not specified yet". | |
83 -- * the hierarchical and flat schemes are no different schema- | |
84 -- wise but are separate to allow different handling in software. | |
85 -- * the ordering field in the values tables should be used so that | |
86 -- ORDER BY ordering ASCENDING produces the desired output order | |
87 -- of the legal values. | |
88 -- * the obsolete field for values is for entries that are allowed | |
89 -- to still exist in the database but that should not be used | |
90 -- with new PRs or for new classifications of old PRs. | |
91 -- * the ordering field for the classifications themselves should | |
92 -- be used so that ORDER BY ordering ASCENDING produces the | |
93 -- desired output order of the classification schemes. The | |
94 -- numbers are global across classification scheme types so | |
95 -- schemes of the same type do not need to be sorted together. | |
96 -- | |
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
97 -- |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
98 |
11 | 99 ------------------------------------------------------------ |
100 -- Hierarchical schemes. | |
101 | |
102 -- names and properties of the hierarchical schemes | |
103 CREATE TABLE hierclass_names ( | |
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
104 name text primary key, |
11 | 105 ordering int not null, |
106 total boolean not null, | |
107 description text not null | |
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
108 ) |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
109 WITHOUT OIDS; |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
110 |
11 | 111 -- allowed values of the hierarchical schemes |
112 CREATE TABLE hierclass_values ( | |
113 scheme text not null references hierclass_names (name), | |
114 value text not null, | |
115 ordering int not null, | |
116 obsolete boolean not null, | |
117 description text not null, | |
118 primary key (scheme, value) | |
119 ) | |
120 WITHOUT OIDS; | |
121 | |
122 -- classification of PRs according to the hierarchical schemes | |
123 CREATE TABLE hierclass_data ( | |
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
124 pr bigint not null references PRs (id), |
11 | 125 scheme text not null references hierclass_names (name), |
126 value text , | |
127 primary key (pr, scheme), | |
128 -- | |
129 -- what I want here is for it to reference hierclass_values unless | |
130 -- the value is null. or should the values table include a row | |
131 -- with null value where that's allowed? I forget how null values | |
132 -- interoperate with reference constraints except that it's messy. | |
133 -- XXX. (if changing this change the similar logic below as well) | |
134 -- | |
135 constraint (scheme, value) references hierclass_values (scheme, value) | |
136 ) | |
137 WITHOUT OIDS; | |
138 | |
139 ------------------------------------------------------------ | |
140 -- Enumerated flat schemes. | |
141 | |
142 -- names and properties of the flat schemes | |
143 CREATE TABLE flatclass_names ( | |
144 name text primary key, | |
145 ordering int not null, | |
146 total boolean not null, | |
147 description text not null | |
148 ) | |
149 WITHOUT OIDS; | |
150 | |
151 -- allowed values of the flat schemes | |
152 CREATE TABLE flatclass_values ( | |
153 scheme text not null references flatclass_names (name), | |
154 value text not null, | |
155 ordering int not null, | |
156 obsolete boolean not null, | |
157 description text not null, | |
158 primary key (scheme, value) | |
159 ) | |
160 WITHOUT OIDS; | |
161 | |
162 -- classification of PRs according to the flat schemes | |
163 CREATE TABLE flatclass_data ( | |
164 pr bigint not null references PRs (id), | |
165 scheme text not null references flatclass_names (name), | |
166 value text , | |
167 primary key (pr, scheme), | |
168 -- as above | |
169 constraint (scheme, value) references flatclass_values (scheme, value) | |
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
170 ) |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
171 WITHOUT OIDS; |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
172 |
11 | 173 ------------------------------------------------------------ |
174 -- Text schemes. | |
175 | |
176 -- names and properties of the text schemes | |
177 CREATE TABLE textclass_names ( | |
178 name text primary key, | |
179 ordering int not null, | |
180 total boolean not null, | |
181 description text not null | |
182 ) | |
183 WITHOUT OIDS; | |
184 | |
185 -- classification of PRs according to the text schemes | |
186 CREATE TABLE textclass_data ( | |
187 pr bigint not null references PRs (id), | |
188 scheme text not null references textclass_names (name), | |
189 value text , | |
190 primary key (pr, scheme), | |
191 ) | |
192 WITHOUT OIDS; | |
193 | |
194 ------------------------------------------------------------ | |
195 -- Tag schemes. | |
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
196 |
11 | 197 -- names and properties of the tag schemes |
198 -- (total does not make sense here) | |
199 CREATE TABLE tagclass_names ( | |
200 name text primary key, | |
201 ordering int not null, | |
202 description text not null | |
203 ) | |
204 WITHOUT OIDS; | |
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
205 |
11 | 206 -- allowed values of the tag schemes |
207 -- (each PR can reference zero or more of these) | |
208 CREATE TABLE tagclass_values ( | |
209 scheme text not null references tagclass_names (name), | |
210 value text not null, | |
211 ordering int not null, | |
212 obsolete boolean not null, | |
213 description text not null, | |
214 primary key (scheme, value) | |
215 ) | |
216 WITHOUT OIDS; | |
217 | |
218 -- classification of PRs according to the tag schemes | |
219 -- (each PR/scheme pair is listed once for each tag attached to it) | |
220 CREATE TABLE tagclass_data ( | |
221 pr bigint not null references PRs (id), | |
222 scheme text not null references tagclass_names (name), | |
223 value text , | |
224 -- as above | |
225 constraint (scheme, value) references tagclass_values (scheme, value) | |
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
226 ) |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
227 WITHOUT OIDS; |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
228 |
11 | 229 -- view where tags are collected into a string |
230 -- XXX does concat(value) work, and if not, is there any way to do this? | |
231 CREATE VIEW tagclass_stringdata AS | |
232 SELECT pr, scheme, concat(value) as value | |
233 FROM tagclass_data | |
234 GROUP BY pr, scheme | |
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
235 ; |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
236 |
11 | 237 ------------------------------------------------------------ |
238 -- some views | |
239 | |
240 -- views that pull in the ordering fields. | |
241 -- | |
242 -- (If you use only one of the ordering fields, as is likely to be | |
243 -- the case in practice, I think we can rely on the query optimizer | |
244 -- to drop the join that collects in the other ordering field. If | |
245 -- not, maybe this should be restructured.) | |
246 -- | |
247 -- XXX what do these do when _data.value is null? | |
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
248 |
11 | 249 CREATE VIEW hierclass_data_ordered AS |
250 SELECT hierclass_data.pr as pr, | |
251 hierclass_data.scheme as scheme, | |
252 hierclass_names.ordering as schemeordering, | |
253 hierclass_data.value as value, | |
254 hierclass_values.ordering as valueordering, | |
255 FROM hierclass_data, hierclass_names, hierclass_values | |
256 WHERE | |
257 hierclass_data.scheme = hierclass_names.name | |
258 AND hierclass_data.scheme = hierclass_values.scheme | |
259 AND hierclass_data.value = hierclass_values.value | |
260 ; | |
261 | |
262 CREATE VIEW flatclass_data_ordered AS | |
263 SELECT flatclass_data.pr as pr, | |
264 flatclass_data.scheme as scheme, | |
265 flatclass_names.ordering as schemeordering, | |
266 flatclass_data.value as value, | |
267 flatclass_values.ordering as valueordering, | |
268 FROM flatclass_data, flatclass_names, flatclass_values | |
269 WHERE | |
270 flatclass_data.scheme = flatclass_names.name | |
271 AND flatclass_data.scheme = flatclass_values.scheme | |
272 AND flatclass_data.value = flatclass_values.value | |
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
273 ; |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
274 |
11 | 275 -- |
276 -- XXX what I'd like to do here is produce textclass_data.value as | |
277 -- valueordering; then sorting by valueordering would produce the | |
278 -- desired results (the entries sorted by text order of the values). | |
279 -- That produces a type conflict though if we combine this view | |
280 -- with the other ordered views where the ordering is a number. | |
281 -- | |
282 -- Next best would be to order by textclass_data.value in this view | |
283 -- and produce the row number of the result (or some other fresh | |
284 -- sequence) as valueordering. However, I don't think that's possible. | |
285 -- | |
286 -- What's here (using a fixed value of 1) will run but it's not | |
287 -- particularly desirable. | |
288 -- | |
289 CREATE VIEW textclass_data_ordered AS | |
290 SELECT textclass_data.pr as pr, | |
291 textclass_data.scheme as scheme, | |
292 textclass_names.ordering as schemeordering, | |
293 textclass_data.value as value, | |
294 1 as valueordering, | |
295 FROM textclass_data, textclass_names | |
296 WHERE | |
297 textclass_data.scheme = textclass_names.name | |
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
298 ; |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
299 |
11 | 300 CREATE VIEW tagclass_data_ordered AS |
301 SELECT tagclass_data.pr as pr, | |
302 tagclass_data.scheme as scheme, | |
303 tagclass_names.ordering as schemeordering, | |
304 tagclass_data.value as value, | |
305 tagclass_values.ordering as valueordering, | |
306 FROM tagclass_data, tagclass_names, tagclass_values | |
307 WHERE | |
308 tagclass_data.scheme = tagclass_names.name | |
309 AND tagclass_data.scheme = tagclass_values.scheme | |
310 AND tagclass_data.value = tagclass_values.value | |
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
311 ; |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
312 |
11 | 313 -- ordered version of tagclass_stringdata |
314 -- XXX does concat(value) work, and if not, is there any way to do this? | |
315 CREATE VIEW tagclass_stringdata_ordered AS | |
316 SELECT pr, scheme, schemeordering, | |
317 concat(value) as value, first(valueordering) as valueordering | |
318 FROM tagclass_data | |
319 ORDER BY valueordering | |
320 GROUP BY pr, scheme | |
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
321 ; |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
322 |
11 | 323 |
324 -- a view that combines all the classification data. | |
325 | |
326 CREATE VIEW classifications AS | |
327 SELECT * FROM hierclass_data_ordered | |
328 UNION SELECT * FROM flatclass_data_ordered | |
329 UNION SELECT * FROM textclass_data_ordered | |
330 UNION SELECT * FROM tagclass_stringdata_ordered | |
331 ORDER BY schemeordering | |
332 ; | |
333 |