A genetic programming approach to query optimization in distributed database systems
Student thesis: Master's Thesis
Related Research Unit(s)
|Award date||17 Nov 1997|
With the emergence of relatively inexpensive and advanced communication technology, Distributed Database Management Systems (DDBMS) have become an integral part of many computer applications. Efficient query processing is one of the most important issues in distributed database systems since accessing data in a decentralized database system differs very much from accessing data in a centralized system. In a distributed environment, it is common that queries extract data from different sites, whereas it is not necessary in the centralized case. It is important to limit the amount of data transfer across different sites in a distributed system. Semijoin is a way to reduce the cost of expensive joins between various sites. A key issue in query processing based on semijoin reduction is to find a good sequence of semijoins. Many algorithms for this purpose have been developed, of which the A* algorithm is one of the better known heuristic search techniques. This thesis proposes a new approach, based on Genetic Programming (GP), to improve the process of database query in Distributed Database Systems. Genetic Programming, which contains a probabilistic element, mimics Darwinian evolution to tackle search problems. We give a comprehensive description of how GP can be applied to perform the search of an optimal and/or suboptimal sequence of semijoins. Also, we describe the design of a simulation experiment to confirm the success of using GP in the search problem. The results are then compared with that of four other established methods, namely, exhaustive search, random search, greedy algorithms, and A* algorithm.
- Distributed databases, Genetic programming (Computer science)