Abstract

Transaction Processing databases may contain data useful for Management Information Service (MIS) purposes, but there is often a problem of retrieval due to the large data volumes involved.

This thesis investigates the metrics required to partition such a database in order to reduce the scan time of MIS range queries run against that database without causing undue hindrance to Online Transaction Processing (OLTP).

Many multi-dimensional data space partitioning methods have been presented; fundamental limitations of these are discussed. Two such methods, the Grid File and the MDD file structure, are evaluated in detail. An improvement to the method of determination of the number of partitions in the MDD structure is presented.

Assumptions underlying data placement for combined OLTP/MIS in multi-processor database systems are discussed, the conclusion being that balancing of access frequency is not an issue within the application environment.

A well accepted formula relating query predicate size and number of pages accessed (i.e. small bucket size) is shown not to be accurate for the case of very large queries and a small number of partitions (i.e. very large bucket size). Further, it implies that scan time reductions due to partitioning are independent of query size, which is not true under the extreme conditions investigated here. An improved formula is derived, and the implications for scan time reduction are discussed.

A model relating partitioning cost to the resultant benefit of scan time reduction for MIS queries is presented. Heuristic methods are derived to deal with the inherent complexity of computation. A worked example of the model is given.

The major conclusion is that multi-column partitioning is feasible and practical, but a large number of partitions in any dimension is not helpful in aiding MIS range queries.