Query processing in data intensive applications

數據密集型應用中的詢問處理

Student thesis: Master's Thesis

Author(s)

  • Hin Cheung POON

Detail(s)

Award date: 3 Oct 2006

Abstract

This thesis presents two research topics related to query processing in data intensive applications. The first concerns database concurrency control in mobile computing environments; the second concerns operator scheduling in data stream systems. In the concurrency control work, experiments are carried out to evaluate a previously proposed concurrency control algorithm for broadcast transactions called Read-Write Set Test [30][31]. In the operator scheduling work, two new operator scheduling algorithms for data stream systems are introduced.

In mobile computing environments, it is implicitly assumed that the server can broadcast consistent data to mobile clients. Unless a special algorithm handles data broadcasting in a consistent and timely manner, this assumption may not hold. In [30][31], data broadcasting at the server is formulated as a broadcast transaction, which reads the entire database consistently before broadcasting the data to mobile clients in a broadcast cycle. This problem is not trivial, because the broadcast transaction interferes heavily with normal update transactions at the server. Algorithms proposed in previous studies [19][20][25] for reading an entire database are candidate solutions. However, those algorithms, including Shade Test [25], have inadequacies in handling update transactions correctly. The Read-Write Set Test (RWST) algorithm [30][31] was recently proposed to fix this loophole. In this study, a series of simulation experiments is performed to evaluate the performance of RWST for broadcast transactions over a wide range of system workloads.

In data stream systems, the arrival pattern of data streams is bursty and unpredictable. If a system cannot process tuples efficiently, it may become overloaded and accumulate a large backlog of tuples waiting to be processed. Many studies of data stream systems address query or operator scheduling, focusing mainly on minimizing memory utilization, reducing tuple delay, and increasing output rate. These studies examine the importance of system parameters in operator scheduling and aim to improve the efficiency and effectiveness of query and operator processing by arranging query operators in an appropriate sequence. Although much of this research uses static factors in operator scheduling, dynamic factors may also be important, so this research considers them. In addition, instead of scheduling the operators of each query individually, as conventional approaches assume, the new algorithms schedule all operators of all queries in the system as a whole.

Two new scheduling algorithms are introduced in this study: Memory Monitoring (MM) and Data Stream Grouping (DSG). MM uses one system parameter, memory usage, to calculate and assign priority values to operators, and serves to demonstrate the performance of using dynamic factors in operator scheduling. DSG uses popular static factors (operator selectivity, s, and processing time, t) to perform operator scheduling in a multiple-query, multiple-data-stream environment.
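The idea of prioritizing operators by a dynamic factor such as memory usage can be illustrated with a minimal sketch. This is not the MM algorithm from the thesis, whose priority formula is not given in the abstract. It assumes a hypothetical model in which each operator has an input queue of fixed-size tuples, and a greedy scheduler always runs the operator whose queue currently holds the most memory. The names `Operator`, `tuple_size`, `pick_next`, and `run` are illustrative assumptions.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Operator:
    """A query operator with an input queue (hypothetical model, not from the thesis)."""
    name: str
    tuple_size: int                      # bytes per queued tuple (assumed fixed)
    queue: deque = field(default_factory=deque)

    def memory_usage(self) -> int:
        # Dynamic factor: bytes currently held in this operator's input queue.
        return len(self.queue) * self.tuple_size


def pick_next(operators):
    """Return the operator with the largest queued memory, or None if all are idle."""
    busy = [op for op in operators if op.queue]
    if not busy:
        return None
    return max(busy, key=lambda op: op.memory_usage())


def run(operators, steps):
    """Process one tuple per step from the highest-priority operator.

    Operators from all queries are scheduled together; returns the order
    in which operators were run.
    """
    schedule = []
    for _ in range(steps):
        op = pick_next(operators)
        if op is None:
            break
        op.queue.popleft()               # consume one tuple, freeing its memory
        schedule.append(op.name)
    return schedule


ops = [
    Operator("filter", tuple_size=100, queue=deque(range(3))),   # 300 bytes queued
    Operator("join", tuple_size=400, queue=deque(range(2))),     # 800 bytes queued
]
schedule = run(ops, steps=10)
print(schedule)
```

Because priorities are recomputed at every step, the schedule adapts as the queues drain, which is what distinguishes a dynamic factor from static ones such as selectivity or processing time.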

    Research areas

  • Querying (Computer science), Database management