Mercurial > ~dholland > hg > swallowtail > index.cgi
annotate database/schema/classify.sql @ 46:73e6dac29391
new stuff (checkpoint when moved between machines)
author | David A. Holland |
---|---|
date | Tue, 12 Aug 2014 21:55:08 -0400 |
parents | cd36b49f4437 |
children | 40f64a96481f |
rev | line source |
---|---|
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
1 -- |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
2 -- Classification. |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
3 -- |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
4 -- |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
5 -- Sorting and classifying bug reports usefully is a nontrivial |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
6 -- problem. For this reason, rather than bake in any particular |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
7 -- taxonomy, we allow multiple arbitrary classification schemes, |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
8 -- all of which are created equal. (Administratively, some will |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
9 -- be more equal than others, but that does not concern the |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
10 -- database.) |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
11 -- |
11 | 12 -- There are several classification schemes/taxonomies that already |
13 -- exist or have been invented but not yet deployed. These fall into | |
14 -- the following categories: | |
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
15 -- |
11 | 16 -- 1. Hierarchical taxonomy. The only existing hierarchical taxonomy |
17 -- is the one dholland uses in the out-of-GNATS PR lists, which | |
18 -- classifies PRs by location in the system. A second projected one is | |
19 -- classification by consequences (crashes, hangs, etc.) The possible | |
20 -- classifications are defined in advance and stored in the database. | |
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
21 -- |
11 | 22 -- 2. Flat taxonomy, that is, a selection from an enumeration of |
23 -- values. Existing schemes of this type include: | |
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
24 -- |
11 | 25 -- * the old GNATS scheme of categories (bin, kern, lib, etc.) |
26 -- * the old GNATS scheme of classes (sw-bug, doc-bug, etc.) | |
27 -- * the old GNATS severity field (critical, serious, non-critical) | |
28 -- * the old GNATS priority field (high, medium, low) | |
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
29 -- |
11 | 30 -- For these the possible values are defined in advance and stored in |
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
31 -- the database. |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
32 -- |
11 | 33 -- 3. String value. Most of the time this is really a flat taxonomy |
34 -- where the number of possible values is large or changes dynamically | |
35 -- with external circumstances, or both, such that storing a copy of | |
23 | 36 -- the legal values in the database would create a maintenance hassle. |
11 | 37 -- There are none of these at present; the only projected one at the |
38 -- moment is FreeBSD's idea of "the nearest man page to where the | |
39 -- problem appears to be". | |
40 -- | |
41 -- 4. Systems of tags, that is, zero or more selections from a list | |
42 -- of possible values. Current schemes of this type include the | |
43 -- [456]-ONLY tags, the 6-CRITICAL/6-IMPORTANT/etc. release branch | |
44 -- tags, the STUCK tag, EASY, PATCH, PULLUPS-NEEDED, etc. Some of | |
45 -- these should be separate flat taxonomy schemes rather than tags, | |
46 -- and some of them should maybe be states rather than tags too. | |
47 -- (And, perhaps some of the current states should be tags...) | |
48 -- In addition we want to allow every developer to have their own | |
49 -- private tags that aren't exposed. And, we might want to have | |
50 -- project-only tags that other developers can see but users can't. | |
51 -- (It remains to be seen if that's really a good idea.) | |
52 -- | |
53 -- Note also that a list of PRs is equivalent to a tag on those PRs. | |
54 -- | |
55 -- It is also possible that there may be additional classifications | |
56 -- (possibly of any of the above types, but most likely tags) that are | |
57 -- views rather than data, that is, derived from other information. | |
58 -- One example of this is "PRs in feedback more than 6 months". | |
59 -- | |
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
60 -- |
11 | 61 -- We assume that new classification schemes will be added on the fly, |
62 -- not frequently, but frequently enough that the list of schemes | |
63 -- should not be hard-coded into programs that access the database, | |
64 -- and we don't want to have to create a new table for each new scheme | |
65 -- either. | |
66 -- | |
67 -- Therefore, the way I'll do this is to create one table (or rather, | |
68 -- one family of tables) for each type of classification scheme, and | |
69 -- make them able to handle arbitrary instances. | |
70 -- | |
71 -- For hierarchical schemes there is no point trying to encode the | |
72 -- hierarchical structure in SQL; that is a waste of time. Instead | |
73 -- we'll expand the tree of allowed values and rely on the access | |
74 -- software to present the schemes sensibly. | |
75 -- | |
76 -- It remains unclear what views ought to be defined. | |
77 -- | |
78 -- | |
79 -- Notes on the representations: | |
80 -- * a scheme is "total" if it should be defined on all PRs | |
81 -- * if the data contains a null value for a particular scheme, | |
82 -- that means "not specified yet". | |
83 -- * the hierarchical and flat schemes are no different schema- | |
84 -- wise but are separate to allow different handling in software. | |
85 -- * the ordering field in the values tables should be used so that | |
86 -- ORDER BY ordering ASCENDING produces the desired output order | |
87 -- of the legal values. | |
88 -- * the obsolete field for values is for entries that are allowed | |
89 -- to still exist in the database but that should not be used | |
90 -- with new PRs or for new classifications of old PRs. | |
91 -- * the ordering field for the classifications themselves should | |
92 -- be used so that ORDER BY ordering ASCENDING produces the | |
93 -- desired output order of the classification schemes. The | |
94 -- numbers are global across classification scheme types so | |
95 -- schemes of the same type do not need to be sorted together. | |
96 -- | |
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
97 -- |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
98 |
11 | 99 ------------------------------------------------------------ |
100 -- Hierarchical schemes. | |
101 | |
102 -- names and properties of the hierarchical schemes | |
103 CREATE TABLE hierclass_names ( | |
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
104 name text primary key, |
11 | 105 ordering int not null, |
106 total boolean not null, | |
107 description text not null | |
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
108 ) |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
109 WITHOUT OIDS; |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
110 |
11 | 111 -- allowed values of the hierarchical schemes |
112 CREATE TABLE hierclass_values ( | |
113 scheme text not null references hierclass_names (name), | |
114 value text not null, | |
115 ordering int not null, | |
116 obsolete boolean not null, | |
117 description text not null, | |
118 primary key (scheme, value) | |
119 ) | |
120 WITHOUT OIDS; | |
121 | |
122 -- classification of PRs according to the hierarchical schemes | |
123 CREATE TABLE hierclass_data ( | |
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
124 pr bigint not null references PRs (id), |
11 | 125 scheme text not null references hierclass_names (name), |
126 value text , | |
127 primary key (pr, scheme), | |
128 -- | |
29 | 129 -- Ok, the intended semantics here are: |
130 -- * For PRs that the classification doesn't apply to, there | |
131 -- should be no row at all in the *class_data table. | |
132 -- * For PRs where the classification is undetermined or | |
133 -- unassigned, the value column should be null, and if | |
134 -- that's allowed there should be a null entry in | |
135 -- the *class_values table. | |
136 -- I believe "MATCH SIMPLE" to be capable of enforcing this, | |
137 -- but that's from reading postgres docs and not from actually | |
138 -- checking it. From the description, "MATCH FULL" will do the | |
139 -- wrong thing, and "MATCH PARTIAL" isn't supported. | |
11 | 140 -- |
29 | 141 foreign key (scheme, value) references hierclass_values (scheme, value) |
142 MATCH FULL | |
143 ON DELETE RESTRICT | |
144 ON UPDATE RESTRICT | |
11 | 145 ) |
146 WITHOUT OIDS; | |
147 | |
148 ------------------------------------------------------------ | |
149 -- Enumerated flat schemes. | |
150 | |
151 -- names and properties of the flat schemes | |
152 CREATE TABLE flatclass_names ( | |
153 name text primary key, | |
154 ordering int not null, | |
155 total boolean not null, | |
156 description text not null | |
157 ) | |
158 WITHOUT OIDS; | |
159 | |
160 -- allowed values of the flat schemes | |
161 CREATE TABLE flatclass_values ( | |
162 scheme text not null references flatclass_names (name), | |
163 value text not null, | |
164 ordering int not null, | |
165 obsolete boolean not null, | |
166 description text not null, | |
167 primary key (scheme, value) | |
168 ) | |
169 WITHOUT OIDS; | |
170 | |
171 -- classification of PRs according to the flat schemes | |
172 CREATE TABLE flatclass_data ( | |
173 pr bigint not null references PRs (id), | |
174 scheme text not null references flatclass_names (name), | |
175 value text , | |
176 primary key (pr, scheme), | |
177 -- as above | |
29 | 178 foreign key (scheme, value) references flatclass_values (scheme, value) |
179 MATCH FULL | |
180 ON DELETE RESTRICT | |
181 ON UPDATE RESTRICT | |
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
182 ) |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
183 WITHOUT OIDS; |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
184 |
11 | 185 ------------------------------------------------------------ |
186 -- Text schemes. | |
187 | |
188 -- names and properties of the text schemes | |
189 CREATE TABLE textclass_names ( | |
190 name text primary key, | |
191 ordering int not null, | |
192 total boolean not null, | |
193 description text not null | |
194 ) | |
195 WITHOUT OIDS; | |
196 | |
197 -- classification of PRs according to the text schemes | |
198 CREATE TABLE textclass_data ( | |
199 pr bigint not null references PRs (id), | |
200 scheme text not null references textclass_names (name), | |
201 value text , | |
29 | 202 primary key (pr, scheme) |
11 | 203 ) |
204 WITHOUT OIDS; | |
205 | |
206 ------------------------------------------------------------ | |
207 -- Tag schemes. | |
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
208 |
11 | 209 -- names and properties of the tag schemes |
210 -- (total does not make sense here) | |
211 CREATE TABLE tagclass_names ( | |
212 name text primary key, | |
213 ordering int not null, | |
214 description text not null | |
215 ) | |
216 WITHOUT OIDS; | |
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
217 |
11 | 218 -- allowed values of the tag schemes |
219 -- (each PR can reference zero or more of these) | |
220 CREATE TABLE tagclass_values ( | |
221 scheme text not null references tagclass_names (name), | |
222 value text not null, | |
223 ordering int not null, | |
224 obsolete boolean not null, | |
225 description text not null, | |
226 primary key (scheme, value) | |
227 ) | |
228 WITHOUT OIDS; | |
229 | |
230 -- classification of PRs according to the tag schemes | |
231 -- (each PR/scheme pair is listed once for each tag attached to it) | |
232 CREATE TABLE tagclass_data ( | |
233 pr bigint not null references PRs (id), | |
234 scheme text not null references tagclass_names (name), | |
235 value text , | |
236 -- as above | |
29 | 237 foreign key (scheme, value) references tagclass_values (scheme, value) |
238 MATCH FULL | |
239 ON DELETE RESTRICT | |
240 ON UPDATE RESTRICT | |
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
241 ) |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
242 WITHOUT OIDS; |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
243 |
11 | 244 -- view where tags are collected into a string |
29 | 245 -- string_agg() concatenates the value strings |
11 | 246 CREATE VIEW tagclass_stringdata AS |
29 | 247 SELECT pr, |
248 tagclass_data.scheme as scheme, | |
249 string_agg(tagclass_data.value, ' ' | |
250 ORDER BY tagclass_values.ordering) as value | |
251 FROM tagclass_data, tagclass_values | |
252 WHERE | |
253 tagclass_data.scheme = tagclass_values.scheme | |
254 AND tagclass_data.value = tagclass_values.value | |
255 GROUP BY pr, tagclass_data.scheme | |
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
256 ; |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
257 |
11 | 258 ------------------------------------------------------------ |
259 -- some views | |
260 | |
261 -- views that pull in the ordering fields. | |
262 -- | |
263 -- (If you use only one of the ordering fields, as is likely to be | |
264 -- the case in practice, I think we can rely on the query optimizer | |
265 -- to drop the join that collects in the other ordering field. If | |
266 -- not, maybe this should be restructured.) | |
267 -- | |
268 -- XXX what do these do when _data.value is null? | |
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
269 |
11 | 270 CREATE VIEW hierclass_data_ordered AS |
271 SELECT hierclass_data.pr as pr, | |
272 hierclass_data.scheme as scheme, | |
273 hierclass_names.ordering as schemeordering, | |
274 hierclass_data.value as value, | |
29 | 275 hierclass_values.ordering as valueordering |
11 | 276 FROM hierclass_data, hierclass_names, hierclass_values |
277 WHERE | |
278 hierclass_data.scheme = hierclass_names.name | |
279 AND hierclass_data.scheme = hierclass_values.scheme | |
280 AND hierclass_data.value = hierclass_values.value | |
281 ; | |
282 | |
283 CREATE VIEW flatclass_data_ordered AS | |
284 SELECT flatclass_data.pr as pr, | |
285 flatclass_data.scheme as scheme, | |
286 flatclass_names.ordering as schemeordering, | |
287 flatclass_data.value as value, | |
29 | 288 flatclass_values.ordering as valueordering |
11 | 289 FROM flatclass_data, flatclass_names, flatclass_values |
290 WHERE | |
291 flatclass_data.scheme = flatclass_names.name | |
292 AND flatclass_data.scheme = flatclass_values.scheme | |
293 AND flatclass_data.value = flatclass_values.value | |
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
294 ; |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
295 |
11 | 296 -- |
297 -- XXX what I'd like to do here is produce textclass_data.value as | |
298 -- valueordering; then sorting by valueordering would produce the | |
299 -- desired results (the entries sorted by text order of the values). | |
300 -- That produces a type conflict though if we combine this view | |
301 -- with the other ordered views where the ordering is a number. | |
302 -- | |
303 -- Next best would be to order by textclass_data.value in this view | |
304 -- and produce the row number of the result (or some other fresh | |
305 -- sequence) as valueordering. However, I don't think that's possible. | |
306 -- | |
307 -- What's here (using a fixed value of 1) will run but it's not | |
308 -- particularly desirable. | |
309 -- | |
310 CREATE VIEW textclass_data_ordered AS | |
311 SELECT textclass_data.pr as pr, | |
312 textclass_data.scheme as scheme, | |
313 textclass_names.ordering as schemeordering, | |
314 textclass_data.value as value, | |
29 | 315 1 as valueordering |
11 | 316 FROM textclass_data, textclass_names |
317 WHERE | |
318 textclass_data.scheme = textclass_names.name | |
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
319 ; |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
320 |
11 | 321 CREATE VIEW tagclass_data_ordered AS |
322 SELECT tagclass_data.pr as pr, | |
323 tagclass_data.scheme as scheme, | |
324 tagclass_names.ordering as schemeordering, | |
325 tagclass_data.value as value, | |
29 | 326 tagclass_values.ordering as valueordering |
11 | 327 FROM tagclass_data, tagclass_names, tagclass_values |
328 WHERE | |
329 tagclass_data.scheme = tagclass_names.name | |
330 AND tagclass_data.scheme = tagclass_values.scheme | |
331 AND tagclass_data.value = tagclass_values.value | |
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
332 ; |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
333 |
11 | 334 -- ordered version of tagclass_stringdata |
29 | 335 -- |
336 -- Note that schemeordering is uniquely defined by the group-by | |
337 -- and the min() on it is there to satisfy the cookiemonster. | |
338 -- | |
11 | 339 CREATE VIEW tagclass_stringdata_ordered AS |
29 | 340 SELECT pr, scheme, min(schemeordering), |
341 string_agg(value, ' ' ORDER BY valueordering) as value, | |
342 min(valueordering) as valueordering | |
343 FROM tagclass_data_ordered | |
11 | 344 GROUP BY pr, scheme |
8
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
345 ; |
68cc276ac118
SQL material from old tree, split up for accessibility.
David A. Holland
parents:
diff
changeset
|
346 |
11 | 347 |
348 -- a view that combines all the classification data. | |
349 | |
350 CREATE VIEW classifications AS | |
351 SELECT * FROM hierclass_data_ordered | |
352 UNION SELECT * FROM flatclass_data_ordered | |
353 UNION SELECT * FROM textclass_data_ordered | |
354 UNION SELECT * FROM tagclass_stringdata_ordered | |
355 ORDER BY schemeordering | |
356 ; | |
357 |