8. Conclusions

Various aspects of the IDIOMS machine have been compared with other designs. We have justified both the use of a DSDL and the rejection of an operating system within the IDIOMS implementation. We have described extensions to the SQL2 information schema that store partitioning and control information, and we have proposed statistics tables that could be used to gather information on MIS queries in order to refine the partitioning strategy. An example of multi-column partitioning has been given, along with an MIS query.

We have shown how multi-column partitioning based on attribute values can reduce the amount of work needed to retrieve data for MIS queries on OLTP data, and how the IDIOMS architecture matches this partitioning. Much work has been done by others in the field of data partitioning and placement, and many strategies have been devised to aid data retrieval. Our work is limited to a mixed Transaction Processing/MIS environment, and we make no claim for a universal solution. Furthermore, our partitioning strategy is limited by the semantics of the database itself. For example, a column that is frequently used in MIS queries but has a high update rate may not be suitable as a partitioning column if the updates would cause many rows to migrate between partitions. From the work we have carried out so far, we believe that a practical multi-column partitioning strategy is viable for some applications.
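The two effects described above, pruning for MIS queries and migration caused by updates, can be illustrated with a small sketch. It is not the IDIOMS implementation: the column names `branch_id` and `balance`, their range boundaries, and the helper functions are all hypothetical, chosen only to show how range predicates on the partitioning columns select a subset of partitions, and how an update to a partitioning column can move a row to a different partition.

```python
from bisect import bisect_right
from itertools import product

# Hypothetical partitioning columns and the lower bound of each value range.
# Assumes every stored or queried value is >= the first bound of its column.
BOUNDS = {
    "branch_id": [0, 100, 200, 300],  # 4 ranges
    "balance": [0, 1_000, 10_000],    # 3 ranges
}
COLUMNS = ("branch_id", "balance")    # 4 x 3 = 12 partitions in total


def range_index(column, value):
    """Index of the range of `column` that contains `value`."""
    return bisect_right(BOUNDS[column], value) - 1


def partition_of(row):
    """A row's partition is identified by one range index per partitioning column."""
    return tuple(range_index(c, row[c]) for c in COLUMNS)


def partitions_for_query(predicates):
    """Partitions that may hold rows satisfying the query's range predicates.

    `predicates` maps a partitioning column to an inclusive (low, high)
    interval; a column with no predicate matches all of its ranges.
    """
    per_column = []
    for c in COLUMNS:
        if c in predicates:
            low, high = predicates[c]
            per_column.append(range(range_index(c, low), range_index(c, high) + 1))
        else:
            per_column.append(range(len(BOUNDS[c])))
    # Only the cross product of the matching ranges needs to be scanned.
    return set(product(*per_column))


def migrates(row, column, new_value):
    """True if setting `column` to `new_value` moves the row to another partition."""
    updated = dict(row, **{column: new_value})
    return partition_of(updated) != partition_of(row)
```

For instance, a query restricted to `branch_id` between 100 and 199 and `balance` of at least 10,000 touches one partition out of twelve, whereas a `balance` update from 500 to 5,000 forces the row to migrate. A column with many such updates would therefore be a poor choice of partitioning column, whatever its value to MIS queries.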

We do not yet have experience of using our SQL catalogue extension tables in a working environment, and until we do, we cannot know how practical they are for storing control data.

The next stage of our work is to complete the cost model for our partitioning strategy, in order to obtain precise information on its costs and benefits. Defining suitable partition ranges for each of the partitioning columns is complex because of the relationships between the variables, and the ranges may have to be determined using heuristics. The cost model alone will not suffice, however, so we intend to implement our partitioning strategy using real data from our commercial collaborators in order to evaluate our ideas.

We would like to thank the referees of this paper for their comments. Part of this research has been carried out under funding from SERC.