Code Size Reduction for Relieving Software Distribution and Storing Cost


Student thesis: Master's Thesis


Supervisors
  • Chun Jason XUE (Supervisor)
  • Tei-Wei KUO (External Co-Supervisor)
Award date: 28 Dec 2023


Software is a pervasive part of everyday life, providing information, services, and entertainment. It is increasingly distributed to a wide range of devices, from smartphones to other resource-constrained devices, so reducing the cost of distribution is critical. Traditional compression tools such as gzip and zstd can compress general files, including software packages. While these tools reduce redundancy within a file, this thesis demonstrates that considerable redundancy also exists across different files, at both the IR and binary levels, and that existing methods do not exploit it. Building on this observation, the thesis presents two methods that reduce the cost of software distribution in different scenarios.
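
A ZIP archive compresses each member independently, so redundancy that spans members is invisible to it. The toy Python sketch below illustrates that gap with synthetic data; it is not the thesis's benchmark, and the helpers `make_files`, `per_file_size`, and `joint_size` are hypothetical. Each "file" is a shared random block plus a small unique tail, compressed either one file at a time or as a single DEFLATE stream.

```python
import random
import zlib


def make_files(count: int, shared_len: int = 2000, unique_len: int = 64,
               seed: int = 0) -> list[bytes]:
    """Synthetic 'files': one common random block plus a file-specific random tail."""
    rng = random.Random(seed)
    shared = rng.randbytes(shared_len)  # random.Random.randbytes needs Python 3.9+
    return [shared + rng.randbytes(unique_len) for _ in range(count)]


def per_file_size(files: list[bytes]) -> int:
    """ZIP-like: every file is compressed on its own, so no cross-file matches."""
    return sum(len(zlib.compress(f, 9)) for f in files)


def joint_size(files: list[bytes]) -> int:
    """One DEFLATE stream over all files: later files can refer back to earlier ones."""
    return len(zlib.compress(b"".join(files), 9))


files = make_files(10)
print("per-file:", per_file_size(files), "bytes; joint:", joint_size(files), "bytes")
```

Random bytes cannot be compressed within a single file, so only cross-file matching removes the shared block. The joint stream helps here only because all ten files fit inside DEFLATE's 32 KiB back-reference window; for packages larger than that, a separately shipped dictionary is one way to expose cross-file redundancy.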

The first work in this thesis, Shared Dictionary Compression, compresses IR (Intermediate Representation) files during the distribution of iOS applications, since Apple supports IR as a distribution format. The technique identifies common substrings among instructions across all IR files and extracts them into a shared dictionary. Combined with conventional compression tools for packaging, it reduces mobile software size by 24.49% on average compared with zip compression alone, thereby lowering distribution costs for small devices.
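
The shared-dictionary idea can be approximated with zlib's preset-dictionary support (the `zdict` argument of Python's `zlib.compressobj` and `zlib.decompressobj`). The sketch below is a simplification under stated assumptions, not the thesis's algorithm: IR files are treated as text, "common substrings" are approximated by 3-token instruction fragments that occur in at least two files, and fragments are ranked by (files containing it × length). The helper names and the sample IR lines are hypothetical.

```python
import zlib
from collections import Counter


def shared_ngrams(files: list[str], n: int = 3, min_files: int = 2) -> Counter:
    """For each n-token instruction fragment, count how many files contain it."""
    doc_freq: Counter = Counter()
    for text in files:
        grams = set()
        for line in text.splitlines():
            toks = line.split()
            for i in range(len(toks) - n + 1):
                grams.add(" ".join(toks[i:i + n]))
        doc_freq.update(grams)
    return Counter({g: c for g, c in doc_freq.items() if c >= min_files})


def build_dictionary(files: list[str], n: int = 3, max_bytes: int = 32 * 1024) -> bytes:
    """Pack the highest-scoring shared fragments into one dictionary blob."""
    # Score = number of files containing the fragment x fragment length.
    ranked = sorted(shared_ngrams(files, n).items(),
                    key=lambda kv: (kv[1] * len(kv[0]), kv[0]), reverse=True)
    chosen, size = [], 0
    for gram, _ in ranked:
        piece = gram.encode() + b"\n"
        if size + len(piece) > max_bytes:  # DEFLATE only looks back 32 KiB
            break
        chosen.append(piece)
        size += len(piece)
    # Strings near the end of the dictionary are reached with shorter
    # distances, so the best fragments go last.
    return b"".join(reversed(chosen))


def compress(data: bytes, zdict: bytes) -> bytes:
    comp = zlib.compressobj(level=9, zdict=zdict)
    return comp.compress(data) + comp.flush()


def decompress(blob: bytes, zdict: bytes) -> bytes:
    decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()


# Hypothetical LLVM-IR-like files that differ only in one constant.
ir_files = [
    f"%x = load i32, i32* %ptr, align 4\n%y = add nsw i32 %x, {i}\n"
    f"store i32 %y, i32* %ptr, align 4\n"
    for i in range(8)
]
zdict = build_dictionary(ir_files)
plain = sum(len(zlib.compress(f.encode(), 9)) for f in ir_files)
shared = sum(len(compress(f.encode(), zdict)) for f in ir_files)
print(f"per-file zlib: {plain} bytes; with shared dictionary: {shared} bytes")
```

The dictionary is shipped once and reused for every file, so each small file can back-reference fragments it never contained itself. The zlib manual likewise recommends placing the most commonly used strings toward the end of a preset dictionary.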

The second work in this thesis introduces Binary Folding Compression, which targets binary files. We observe that, once decompiled, binary files contain many repeated instruction patterns. Exploiting this redundancy allows the size of binary files to be reduced directly: Binary Folding Compression locates repeated patterns and extracts the most frequent ones to reduce distribution cost. On the MiBench benchmark suite, it achieves an average size reduction of about 6.35%.
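
Extracting frequent instruction patterns resembles classic procedural abstraction (function outlining), which is used here only as an illustration and not as the thesis's exact method. The sketch assumes a disassembled instruction stream given as a list of strings and a hypothetical cost model in which every occurrence of a pattern shrinks to one `call` while a single outlined copy (body plus `ret`) is added; it greedily outlines the most profitable pattern until no pattern saves space. All function names, labels, and the sample instructions are hypothetical, and real binaries would also require handling branch targets, register liveness, and PC-relative addressing, which this toy version ignores.

```python
from collections import Counter


def candidate_patterns(instrs: list[str], min_len: int = 2,
                       max_len: int = 6) -> Counter:
    """Count non-overlapping (greedy, left-to-right) occurrences of each
    contiguous instruction sequence of length min_len..max_len."""
    counts: Counter = Counter()
    for k in range(min_len, max_len + 1):
        last_end: dict[tuple, int] = {}
        for i in range(len(instrs) - k + 1):
            pattern = tuple(instrs[i:i + k])
            if i >= last_end.get(pattern, 0):  # skip overlapping matches
                counts[pattern] += 1
                last_end[pattern] = i + k
    return counts


def saving(pattern: tuple, occurrences: int) -> int:
    """Hypothetical cost model, in instructions: each occurrence becomes one
    call; one outlined copy of the body plus a return is added."""
    k = len(pattern)
    return occurrences * (k - 1) - (k + 1)


def fold_once(instrs: list[str], label: str):
    """Outline the most profitable pattern; return (folded, body) or None."""
    counts = candidate_patterns(instrs)
    if not counts:
        return None
    pattern, occ = max(counts.items(), key=lambda pc: (saving(*pc), len(pc[0])))
    if saving(pattern, occ) <= 0:
        return None
    k, folded, i = len(pattern), [], 0
    while i < len(instrs):
        if tuple(instrs[i:i + k]) == pattern:  # same greedy scan as the counter
            folded.append(f"call {label}")
            i += k
        else:
            folded.append(instrs[i])
            i += 1
    return folded, list(pattern) + ["ret"]


def binary_fold(instrs: list[str], max_rounds: int = 10):
    """Repeatedly outline patterns until none saves space."""
    outlined: dict[str, list[str]] = {}
    for r in range(max_rounds):
        label = f"folded_{r}"
        result = fold_once(instrs, label)
        if result is None:
            break
        instrs, outlined[label] = result
    return instrs, outlined


code = [
    "ldr r0, [sp, #4]", "add r0, r0, #1", "str r0, [sp, #4]", "mov r1, r0", "bl foo",
    "ldr r0, [sp, #4]", "add r0, r0, #1", "str r0, [sp, #4]", "mov r1, r0", "bl bar",
    "ldr r0, [sp, #4]", "add r0, r0, #1", "str r0, [sp, #4]", "mov r1, r0",
]
folded, outlined = binary_fold(code)
# folded -> ['call folded_0', 'bl foo', 'call folded_0', 'bl bar', 'call folded_0']
```

Here the four-instruction sequence occurs three times, so outlining it turns 14 instructions into 5 call-site instructions plus a 5-instruction outlined body. Counting and replacement use the same left-to-right scan, so the estimated saving matches what is actually removed.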

Overall, the works introduced in this thesis can reduce the cost of software distribution in several scenarios. They are not intended to replace general-purpose compression tools; instead, they fill a gap by exploiting characteristics of code files that general-purpose tools cannot use efficiently.