Effective and Efficient Program Path Encoding and Tracing


Student thesis: Doctoral Thesis

View graph of relations


  • Chunbai YANG

Related Research Unit(s)


Awarding Institution
Award date16 Jan 2017


A program path is a sequence of program statements following the control flow of a program. Execution profiles of program paths have a wide range of applications. Path encoding techniques statically analyze the program, encoding each program path with a unique identifier. In the course of executions, path tracing techniques collect path events holding the identifiers of executed paths and generate execution traces, which model the complete dynamic control flow of these executions. In this thesis, we propose novel techniques to effectively and efficiently encode program paths and perform path tracing to collect and represent the complete dynamic control flow of program executions.

A state-of-the-art path encoding technique supports efficient encoding of an arbitrary subset of all program paths of a program. However, it cannot guarantee to use the smallest range of numbers for encoding, and paths outside this subset may alias with a path in this subset. We propose a novel path encoding technique, named Optimal Path Encoding (OPE). Given a subset of paths of a program, OPE transforms the program in a functionality-equivalent manner such that a subgraph of the control flow graph of the transformed program contains all and only the paths in the specified subset. Path encoding is then performed exclusively on this subgraph, and thus guarantees that program paths in the subset are encoded using the smallest range of numbers and are not aliased with any paths outside this subset. Our experiments show the effectiveness and efficiency of OPE on encoding given sets of interesting paths.

Existing path tracing techniques model the control flow of individual program executions independently, and the control flow semantics of these control flow traces become obscure if further compressed for space-efficiency. It makes control flow querying on these traces highly inefficient. We propose a two-phase path tracing framework, Hierarchical Program Path (HPP), to model the complete dynamic control flow of an arbitrary number of executions of a program. In Phase 1, HPP efficiently generates a stream of events representing a tree-based representation of control flow for each thread in the execution. In Phase 2, given a set of event streams, HPP identifies all the equivalent instances of the same exercised inter-procedural path and represents each such equivalent set of paths with a single subgraph, resulting in a compositional graph-based trace representation, namely HPPDAG. As such, an HPPDAG instance is small in size, and completely preserves the control flow semantics of the traced executions, which supports efficient control flow querying. We have validated the efficiency and effectiveness of HPP framework through an experiment. We have also built an efficient path profiling technique to demonstrate a novel application of HPPDAG.

In conclusion, this thesis makes two major contributions. First, it proposes the first work that optimally encodes an arbitrary subset of program paths of a program. Second, it proposes the first path tracing framework that models the complete dynamic control flow of a set of executions of a program with a compositional graph-based trace representation that supports efficient control flow querying.