Chapter 4. Program of research

4.1 FRAMEWORK

The IDIOMS (Intelligent Decision Making In Online Management Systems) machine, which is being developed at The National Transputer Support Centre (NTSC), based in Sheffield, uses some of the concepts of the DRAT machine. It is proposed that IDIOMS will serve as a test-bed for part of this research on data dictionary control of a database machine.

Data placement and allocation research is based around the Data Storage Description Language designed by Kerridge [KERR91b]. This DSDL has its roots in the SQL DSDL [ZORN87], but expands on the concepts found therein. The major addition is the specification of data placement, so that a DBA can explicitly state the location of data partitions on storage media.
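As an illustration of what explicit placement means in practice, the following is a minimal sketch in Python. It is not the DSDL syntax of [KERR91b]; the names Partition, PlacementMap, and locate, and the choice of range partitioning on an integer key, are assumptions made purely for the example. It shows a DBA-supplied mapping from partitions of a relation to named storage devices, and a lookup that finds where a given key is stored.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """One range partition of a relation (hypothetical structure)."""
    name: str
    upper_bound: int  # exclusive upper bound on the partitioning key
    device: str       # storage device chosen explicitly by the DBA


class PlacementMap:
    """Maps key values of one relation to the partition and device holding them."""

    def __init__(self, relation, partitions):
        self.relation = relation
        # Keep partitions ordered by upper bound so lookups can use binary search.
        self.partitions = sorted(partitions, key=lambda p: p.upper_bound)
        self._bounds = [p.upper_bound for p in self.partitions]

    def locate(self, key):
        """Return the partition whose key range contains `key`."""
        index = bisect_right(self._bounds, key)
        if index == len(self.partitions):
            raise KeyError(f"{key} is outside every partition of {self.relation}")
        return self.partitions[index]


# Example: an 'account' relation split into three ranges over two disks.
account_map = PlacementMap(
    "account",
    [
        Partition("p0", 1000, "disk0"),
        Partition("p1", 2000, "disk1"),
        Partition("p2", 3000, "disk0"),
    ],
)
```

With this map, account_map.locate(1500).device names the disk holding account 1500. A real DSDL would state the same information declaratively, but the lookup shows why the system needs it: a query can be routed only to the devices that hold relevant partitions.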

4.2 DETAILED RESEARCH PROGRAM

The detailed program of research outlined below is based on the problems that have been uncovered in the initial period of the research, and which are described in detail in chapter 3. As the work proceeds, and in the light of new information, it may be necessary to change the details in order that the overall objectives may be met.

a. Design of SQL tables that interface with SQL2 [ISO90], and which enable the following to be described:
  1. system and user data partitioning and placement
  2. the DRAT machine architecture
  3. sequencing and allocation of the resources available
  4. statistics of queries, the database state, and query response
b. Design of SQL tables that allow any database machine architecture to be described (we would then be able to generalise the work of a.2 and a.3). Following on from this, an investigation into the possibility of DBA-defined tables, appended to the SQL Information Schema, which allow a particular database machine to be described.
c. In order to facilitate (b), either a taxonomy of database machines (or a subset thereof) will be required, or some method of analysing and describing the functionality of sub-systems commonly found in multi-processor database machines, so that a DBA could map a given physical implementation to SQL tables which reflect fundamental concepts. In other words, we want a device-independent description of computational resource and data storage in a database machine. It may be that existing taxonomies are suitable, in which case one of these would be used.
d. An investigation into data allocation (partitioning and placement) strategies for OLTP, and the relationship between these, the processing resource available, and the response time of OLTP queries. This involves the relationship between indices, cardinalities of relations, hit rate, and query type. A similar investigation will be made for MIS.
e. As a result of (d), the development of algorithms (i.e. a cost model) which use the statistical information obtained by the system, in conjunction with data allocation information and knowledge of resource availability, to:
  1. determine the most suitable resource allocation for a given MIS query, in particular with respect to the access path optimisation (in terms of reducing inter-process communication) of queries involving joins.
  2. suggest to the DBA, where appropriate (by means of suitable messages), that in order to improve
    1. average query response time (OLTP and/or MIS)
    2. throughput of MIS for a given OLTP rate
    one or more of the following should be considered:
    1. data should be re-partitioned
    2. further data storage resource is needed
    3. further processing resource is required
f. In order to facilitate e.1, the determination of the optimal data flows, and development of a query access path generator which produces a (heuristically determined) set of possible access paths for any given logically optimised query.
g. Investigation and application of benchmarking standards in order to evaluate (f).
h. Derivation of rules/guidelines for mapping the DSDL partitioning/placement options to any specific multi-processor database architecture.
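
The idea of describing data placement in SQL tables, and of using those tables to generate advisory messages for the DBA, can be sketched as follows. This is only an illustrative sketch using Python's standard sqlite3 module. The table names (storage_device, partition_placement), their columns, the sample data, and the 90% fill threshold are all assumptions made for the example. They are not the tables proposed in this research, and the rule shown is far simpler than the cost model envisaged.

```python
import sqlite3

# Hypothetical catalogue tables: one row per storage device, and one row per
# partition recording the device on which the DBA has placed it.
SCHEMA = """
CREATE TABLE storage_device (
    device_id   TEXT PRIMARY KEY,
    capacity_mb INTEGER NOT NULL
);
CREATE TABLE partition_placement (
    relation_name  TEXT NOT NULL,
    partition_name TEXT NOT NULL,
    device_id      TEXT NOT NULL REFERENCES storage_device(device_id),
    size_mb        INTEGER NOT NULL,
    PRIMARY KEY (relation_name, partition_name)
);
"""


def build_catalogue():
    """Create an in-memory catalogue populated with made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO storage_device VALUES (?, ?)",
        [("disk0", 100), ("disk1", 100)],
    )
    conn.executemany(
        "INSERT INTO partition_placement VALUES (?, ?, ?, ?)",
        [
            ("account", "p0", "disk0", 60),
            ("account", "p1", "disk0", 35),
            ("customer", "p0", "disk1", 20),
        ],
    )
    return conn


def storage_advice(conn, threshold=0.9):
    """Return one message per device whose fill ratio reaches the threshold."""
    rows = conn.execute(
        """
        SELECT d.device_id,
               COALESCE(SUM(p.size_mb), 0) * 1.0 / d.capacity_mb AS fill
        FROM storage_device AS d
        LEFT JOIN partition_placement AS p ON p.device_id = d.device_id
        GROUP BY d.device_id, d.capacity_mb
        ORDER BY d.device_id
        """
    ).fetchall()
    return [
        f"{device}: {fill:.0%} full; consider re-partitioning or adding storage"
        for device, fill in rows
        if fill >= threshold
    ]
```

Calling storage_advice(build_catalogue()) reports only disk0, which holds 95 MB of partitions against a capacity of 100 MB. Because the placement information lives in ordinary tables, the advice is derived with a plain SQL query, which is the kind of device-independent description the program aims for.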