Code Size Reduction for Relieving Software Distribution and Storing Cost
壓縮代碼體積以降低軟件分發及存儲開銷
Student thesis: Master's Thesis
Author(s)
Related Research Unit(s)
Detail(s)
Awarding Institution | |
---|---|
Supervisors/Advisors |
|
Award date | 28 Dec 2023 |
Link(s)
Permanent Link | https://scholars.cityu.edu.hk/en/theses/theses(1a32224f-1a2a-4a84-9ec2-2685a5567c77).html |
---|---|
Abstract
Software is a pervasive part of everyday life, providing information, services, and entertainment. It is increasingly distributed to a wide range of devices, from smartphones to other resource-constrained devices, so reducing the cost of distribution is critical. General-purpose compression tools such as gzip and zstd can compress arbitrary files, including software packages, and they remove redundancy within individual files. This thesis demonstrates that considerable redundancy also exists across different files, at both the IR and binary levels, and that existing methods do not address it. Building on this observation, the thesis presents two methods that reduce the cost of software distribution in different scenarios.
The first work is Shared Dictionary Compression, which compresses IR (Intermediate Representation) files during the distribution of applications on the iOS platform, where Apple supports IR as a distribution format. It identifies common substrings among instructions across all IR files and extracts them into a shared dictionary. Combined with conventional compression tools for packaging, this technique reduces mobile software size by 24.49% on average compared with zip compression alone, lowering distribution costs for small devices.
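The shared-dictionary idea can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes whole lines (rather than arbitrary substrings) as dictionary entries, and it uses the preset-dictionary support of Python's `zlib` as a stand-in for the packaging step. The helper names `build_shared_dictionary`, `compress`, and `decompress` are hypothetical.

```python
import zlib
from collections import Counter


def build_shared_dictionary(files, max_size=32768):
    """Build a dictionary from lines that occur in more than one file.

    Assumption: whole lines are used as dictionary entries for simplicity.
    """
    counts = Counter()
    for text in files:
        # Count each line once per file, so n is the number of files containing it.
        counts.update(set(text.splitlines()))
    shared = [line for line, n in counts.most_common() if n > 1 and line]
    # zlib recommends placing the most common strings near the end of the dictionary.
    shared.reverse()
    blob = "\n".join(shared).encode()
    # Only the last 32 KiB can be referenced by deflate's sliding window.
    return blob[-max_size:]


def compress(text, zdict):
    """Compress one file against the shared dictionary."""
    c = zlib.compressobj(level=9, zdict=zdict)
    return c.compress(text.encode()) + c.flush()


def decompress(data, zdict):
    """Restore a file; the same dictionary must be available to the receiver."""
    d = zlib.decompressobj(zdict=zdict)
    return (d.decompress(data) + d.flush()).decode()
```

In this sketch the dictionary is shipped once and reused for every file, so redundancy across files is paid for only one time; the receiver needs the same dictionary to decompress.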
The second work is Binary Folding Compression, which targets binary files. We first observed that decompiled binary files contain many repeated instruction patterns; exploiting this redundancy allows the size of a binary file to be reduced directly. Binary Folding Compression locates repeated patterns and extracts the most frequent ones to reduce the cost of distribution. Applied to MiBench, the method achieves a size reduction of about 6.35% on average.
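The general idea of extracting a frequent repeated pattern can be sketched as follows. This is an illustration under simplifying assumptions, not the thesis's algorithm: instructions are plain strings, patterns have a fixed length, every instruction costs one unit, and control flow inside a pattern is ignored. The names `fold`, `unfold`, and the label `folded_0` are hypothetical.

```python
from collections import Counter


def most_frequent_pattern(instrs, length):
    """Return the most frequent window of `length` instructions, or None.

    Overlapping windows are counted; replacement below is non-overlapping.
    """
    counts = Counter(
        tuple(instrs[i:i + length]) for i in range(len(instrs) - length + 1)
    )
    if not counts:
        return None
    pattern, n = counts.most_common(1)[0]
    return pattern if n > 1 else None


def fold(instrs, length=3, label="folded_0"):
    """Replace occurrences of the most frequent pattern with a call.

    Returns (rewritten instructions, body of the extracted routine).
    The input is returned unchanged if folding would not save space.
    """
    pattern = most_frequent_pattern(instrs, length)
    if pattern is None:
        return list(instrs), []
    out, i = [], 0
    while i < len(instrs):
        if tuple(instrs[i:i + length]) == pattern:
            out.append(f"call {label}")
            i += length
        else:
            out.append(instrs[i])
            i += 1
    body = list(pattern) + ["ret"]
    # Assumed cost model: one unit per instruction, including the added call and ret.
    if len(out) + len(body) >= len(instrs):
        return list(instrs), []
    return out, body


def unfold(out, body, label="folded_0"):
    """Inline the extracted routine again to recover the original sequence."""
    restored = []
    for ins in out:
        if ins == f"call {label}":
            restored.extend(body[:-1])  # drop the trailing "ret"
        else:
            restored.append(ins)
    return restored
```

The size check matters because each fold adds a call per occurrence plus a return, so a pattern that repeats only twice may not pay for itself.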
Overall, the works introduced in this thesis make it possible to reduce the cost of software distribution in several different scenarios. They are not replacements for current general-purpose compression tools; rather, they fill a gap left by those tools, which cannot efficiently exploit the characteristics of code files.