Efficient and Effective Dynamic Hybrid Data Race Detection in Large-scale Multithreaded Programs


Student thesis: Doctoral Thesis

View graph of relations


Related Research Unit(s)


Awarding Institution
Award date12 Jul 2018


A data race occurs when two threads access the same memory location without proper synchronization and at least one of these accesses is a write. Harmful data races may lead to memory corruptions, incorrect outputs, or crashes. This kind of concurrency error is difficult to be exposed by traditional software testing techniques. Dynamic hybrid data race detection is a method in testing multithreaded programs to ensure their correctness. This thesis presents two novel techniques, each addressing a technical challenge in dynamic hybrid data race detection for large-scale multithreaded programs.

For detecting data races in real-world multithreaded applications, keeping all memory accesses and checking every one of them is impractical. To maintain a small subset of memory accesses, state-of-the-art hybrid data race detectors either only keep the last memory access by sacrificing detection effectiveness, or successively remove those memory accesses no longer needed in its analysis state through operations with heavy overhead. Thus, finding a strategy to maintain a subset of memory accesses with high detection effectiveness, low runtime slowdown, and low memory consumption still remains challenging.

The first contribution of this thesis is to address the memory access maintenance challenge of hybrid data race detection. The thesis presents a sound hybrid data race detector called HistLock. HistLock employs a function-based strategy to clear the memory access history whenever the current memory access event associates with a function call different from the function call associating with the last memory access. In the experiment, HistLock outperformed the state-of-the-art sound hybrid data race detector by achieving significant lower runtime slowdown and memory consumption with small loss of data race recall rate.

Complete data race detection guarantees to report at least one data race, if any, on each memory location. To keep the set of non-redundant memory accesses as small as possible, the state-of-the-art complete hybrid data race detector performs exhaustive lockset comparisons between the current memory access and every memory access keeping in its analysis state to determine their lock subset relations, which suffers from severe overhead when analyzing large-scale multithreaded programs.

The second contribution of this thesis is to address the efficiency challenge in determining lock subset relation in complete hybrid data race detection. The thesis presents a sound and complete hybrid data race detector called HistLock+, which is built atop thread segments separated by lock releases to infer lock subset relations without performing lockset comparison. HistLock+ guarantees exactly one racy memory access on every such thread segment to be reported for each memory location, and never reports false positive. In the experiment, HistLock+ was found significantly faster and more memory-efficient than the state-of-the-art complete hybrid date race detector and reported the greatest number of data races among all evaluated detectors.

In summary, this thesis makes two major contributions. First, it is the first work to formulate function-based elimination strategy for memory access maintenance in sound hybrid data race detection. Second, it is the first work to eliminate lockset comparisons for lock subset determination in complete hybrid data race detection.