Cross-layer I/O Stack Optimization for Mobile Devices

Student thesis: Doctoral Thesis

Award date: 10 Mar 2021

Abstract

For mobile devices, I/O performance is crucial to overall system performance. The I/O stack of mobile devices consists of several layers, such as the application layer, the virtual file system (VFS) layer, the page cache in main memory, and the file system layer. We find that each layer contains schemes that affect the I/O performance of mobile devices. The root cause is that the Android system inherits the Linux kernel, so schemes designed for servers are transplanted directly to mobile devices. However, servers and mobile devices differ in so many ways that these schemes do not work well on mobile devices. To improve I/O performance, this thesis optimizes the I/O stack of mobile devices across several layers.

First, high performance and low latency are of paramount importance when running applications on modern mobile devices. Caching data on the mobile side has become a common practice to achieve these objectives, especially for data re-accessing. However, the caching problem deteriorates quickly because of the exponential growth of cached data. To solve this problem, we propose to treat cached data differently based on their usage behaviors. In particular, we design a framework, called CacheSifter, that dynamically sifts cache files into three categories based on their usage behaviors: Burn-after-reading (BAR), Transient, and Long-living. Each category of cache files is managed accordingly in DRAM or flash storage, which boosts both overall system performance and the lifetime of the flash storage. The proposed designs are implemented on Android devices and evaluated with a collection of representative applications. Evaluation results show that CacheSifter reduces the writebacks of cached files by 88.8% on average and improves the write performance of I/O-intensive applications by up to 44.6%.
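
To make the classification concrete, below is a minimal user-space sketch of CacheSifter-style sifting in C. The thresholds, field names, and placement policy are illustrative assumptions rather than the thesis's actual parameters; the real framework classifies cache files inside the Android I/O stack from observed usage behaviors.

```c
/* Minimal sketch of CacheSifter-style classification (illustrative only;
 * thresholds and fields are assumptions, not the thesis's parameters). */
#include <stdio.h>

enum cache_class { BAR, TRANSIENT, LONG_LIVING };

struct cache_file_stats {
    const char *name;
    unsigned reads_after_create;  /* re-accesses observed after creation */
    unsigned lifetime_sec;        /* time from creation to deletion/now */
};

/* Classify by usage behavior: files read once and deleted quickly are
 * burn-after-reading; short-lived but re-read files are transient;
 * everything else is long-living. */
static enum cache_class classify(const struct cache_file_stats *s)
{
    if (s->reads_after_create <= 1 && s->lifetime_sec < 60)
        return BAR;
    if (s->lifetime_sec < 3600)
        return TRANSIENT;
    return LONG_LIVING;
}

int main(void)
{
    struct cache_file_stats files[] = {
        { "thumb_001.jpg",  1, 20 },    /* read once, deleted fast */
        { "feed_page.json", 5, 600 },   /* re-read, short-lived */
        { "app_icon.png",  40, 86400 }, /* long-lived */
    };
    const char *policy[] = { "BAR (keep in DRAM, never write back)",
                             "Transient (write back lazily)",
                             "Long-living (write back to flash)" };
    for (unsigned i = 0; i < 3; i++)
        printf("%-15s -> %s\n", files[i].name, policy[classify(&files[i])]);
    return 0;
}
```

The design choice this illustrates is that BAR files never need to reach flash at all, which is where the writeback savings come from.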

Second, we further analyze the access patterns of application launches and identify the critical data that can be used to reduce app launch latency. Application launch time is an important user-experience metric for mobile devices, and the latency of page cache misses accounts for a large fraction of it. Previous works reduce page cache misses by prefetching soon-to-be-used data into the page cache. However, these proposals have to prefetch massive files into main memory, which can thrash applications' own data and degrade performance. To solve this problem, we investigate the root cause of these page cache misses and systematically study the data access behavior of applications in Android. Our experiments show that an extremely small set of data (0.3% of the total data for launching an app, on average) is responsible for most of the page cache misses. We define these sets of data as critical data. Based on these observations, we propose FLCD, consisting of FindCD and LockCD, to accelerate application launching by exploiting critical data. FindCD systematically identifies the critical data of each application, and LockCD ensures that all critical data detected by FindCD are kept in the page cache. We evaluate FLCD on two different smartphones with twenty commonly used applications. The evaluation results show that FLCD reduces the launch time of the targeted application by 11% on average with negligible slowdown for other concurrently running applications. We also show that FLCD is effective across multiple mobile devices and kernel versions.
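
As a rough illustration of LockCD's pinning idea, the sketch below uses mmap() and mlock() in user space to keep a file's critical range resident. The thesis implements the pinning inside the page cache; the file path and the 0.3% range here are hypothetical stand-ins, and mlock() may require a sufficient RLIMIT_MEMLOCK.

```c
/* User-space sketch of LockCD's idea: pin an app's critical file range
 * in memory so launch-time reads never miss the page cache. mlock() is
 * a stand-in for the in-kernel pinning the thesis implements. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/data/app/example/base.apk"; /* hypothetical */
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    size_t len = (size_t)st.st_size;
    void *addr = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) { perror("mmap"); return 1; }

    /* Lock only the critical range (here, ~0.3% of the file, mirroring
     * the tiny fraction FindCD reports; a real FindCD would supply the
     * exact offsets instead). */
    size_t critical = len / 334;
    if (critical == 0) critical = 4096;   /* at least one page */
    if (mlock(addr, critical) < 0) { perror("mlock"); return 1; }

    printf("pinned %zu bytes of %s in memory\n", critical, path);
    /* ... pages stay resident until munlock()/munmap()/exit ... */
    munlock(addr, critical);
    munmap(addr, len);
    close(fd);
    return 0;
}
```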

Third, we find that the read-ahead scheme of the page cache is not suitable for mobile devices. Read-ahead has been widely used in the page cache to improve the read performance of Linux systems, and since the Android system inherits the Linux kernel, the traditional read-ahead scheme is transplanted directly to mobile devices. However, request sizes and page cache sizes on mobile devices are much smaller, which can degrade read-ahead efficiency and therefore hurt user experience. We first observe that many pages prefetched by read-ahead are never used, which causes frequent page cache evictions; these evictions can induce extra access latency, especially while write-back is in progress. We then propose a new analysis model to characterize the factors that closely relate to access latency, and find that there exists a trade-off between read-ahead size and access latency. Finally, we propose two optimized read-ahead schemes that exploit this trade-off under different situations: a size-tuning scheme that finds the proper maximum read-ahead size according to the characteristics of the mobile device, and MobiRA, which improves read-ahead efficiency by dynamically tuning the read-ahead size and stop settings. Experimental results on real mobile devices show that the proposed schemes increase the efficiency of read-ahead and improve the overall performance of mobile devices.
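
A minimal sketch of the dynamic tuning idea behind MobiRA follows. The grow/shrink rules, bounds, and feedback signal are assumptions for illustration, standing in for the thesis's model-driven tuning: the window grows while prefetched pages are actually consumed and shrinks when they are evicted unused, trading prefetch benefit against page cache pressure.

```c
/* Sketch of an adaptive read-ahead window (illustrative assumptions). */
#include <stdio.h>

#define RA_MIN_PAGES  4   /* assumed lower bound */
#define RA_MAX_PAGES 32   /* assumed cap, smaller than Linux's default,
                             reflecting small mobile requests and caches */

static unsigned ra_pages = 8;

/* Called after each read-ahead round with how many of the prefetched
 * pages were hit before being evicted. */
static void ra_feedback(unsigned prefetched, unsigned used)
{
    if (used == prefetched && ra_pages < RA_MAX_PAGES)
        ra_pages *= 2;                 /* sequential stream: grow */
    else if (used < prefetched / 2 && ra_pages > RA_MIN_PAGES)
        ra_pages /= 2;                 /* mostly wasted: shrink */
}

int main(void)
{
    /* Simulated rounds: (prefetched, used) observations. */
    unsigned obs[][2] = { {8, 8}, {16, 16}, {32, 10}, {16, 4}, {8, 8} };
    for (unsigned i = 0; i < 5; i++) {
        ra_feedback(obs[i][0], obs[i][1]);
        printf("round %u: window = %u pages\n", i, ra_pages);
    }
    return 0;
}
```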

Fourth, we further find that the memory reclaim scheme is not suitable for mobile devices. While the Linux memory reclaim scheme is designed to deliver high throughput on server workloads, it becomes inefficient on mobile workloads. Through carefully designed experiments, we show that the current memory reclaim scheme cannot deliver its desired performance for two key reasons: page re-faults, which occur when an evicted page is demanded again soon after eviction, and direct reclaims, which occur when the system must free pages at request time. Unlike server workloads, where direct reclaim happens infrequently, multiple direct reclaims can happen in many common Android use cases. We further analyze and identify the major sources of the high numbers of page re-faults and direct reclaims, and propose Acclaim, a foreground-aware and size-sensitive reclaim scheme. Acclaim consists of two parts: foreground-aware eviction (FAE), which relocates free pages from background applications to foreground applications, and a lightweight prediction-based reclaim scheme (LWP), which dynamically tunes the size and amount of background reclaim according to the predicted allocation workload. Experimental results show that Acclaim significantly reduces the number of page re-faults and direct reclaims with low overhead and delivers a better user experience on mobile devices.
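
The sketch below illustrates the two ideas in user-space C. The victim-selection rule, the moving-average predictor, and all constants are assumptions made for illustration, not Acclaim's actual kernel logic.

```c
/* Illustrative sketch of Acclaim's two parts:
 * FAE - prefer reclaiming pages owned by background apps so the
 *       foreground app keeps its working set and avoids re-faults;
 * LWP - reclaim in the background, sized to a lightweight prediction
 *       of upcoming allocations, so direct reclaim is rarely needed. */
#include <stdbool.h>
#include <stdio.h>

struct page_owner { int uid; bool foreground; };

/* FAE: background-owned pages are eligible victims first. */
static bool fae_should_evict(const struct page_owner *o)
{
    return !o->foreground;
}

/* LWP: an exponential moving average of recent allocation demand
 * (an assumed predictor) decides how many pages to reclaim ahead. */
static unsigned lwp_target(unsigned *ema, unsigned last_alloc_pages)
{
    *ema = (*ema * 3 + last_alloc_pages) / 4;  /* alpha = 1/4 */
    return *ema + *ema / 8;                    /* small safety margin */
}

int main(void)
{
    struct page_owner fg = { 1000, true }, bg = { 1001, false };
    printf("evict fg page? %d, evict bg page? %d\n",
           fae_should_evict(&fg), fae_should_evict(&bg));

    unsigned ema = 64, demand[] = { 64, 256, 512, 128 };
    for (unsigned i = 0; i < 4; i++)
        printf("demand %u -> background reclaim target %u pages\n",
               demand[i], lwp_target(&ema, demand[i]));
    return 0;
}
```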

Fifth, we find that the TRIM scheme can degrade I/O performance on mobile devices. TRIM is a recommended command for delivering the file system's data invalidation information to flash storage, and it is issued at both the system level and the device level. Since it reduces the number of data copies during device-level garbage collection (DGC), TRIM has been widely used to improve the endurance and performance of mobile devices. Contrary to common belief, we identify that the default TRIM scheme has both merits and drawbacks for the performance of mobile devices, especially under the Flash-Friendly File System (F2FS), a file system commonly used on mobile devices. On one hand, TRIM reduces GC migration, which prolongs flash lifetime and improves I/O throughput; on the other hand, TRIM may induce I/O contention. Based on this observation, we propose a new TRIM scheme, iTRIM, which distributes the timing overheads to system idle time. To further reduce I/O contention and improve I/O performance, the design of iTRIM considers the TRIM size and the logical-address pattern of the invalidated victim data. Experimental results show that iTRIM minimizes I/O contention while retaining the endurance and performance benefits of the default TRIM scheme.
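
The following sketch illustrates iTRIM's deferral idea: invalidated extents are queued instead of trimmed inline, contiguous logical addresses are merged, and TRIMs are issued only at idle time, keeping them off the foreground I/O path. The queue structure, idle test, and merging policy are illustrative assumptions.

```c
/* Sketch of deferred, merged TRIM issuance (illustrative assumptions). */
#include <stdbool.h>
#include <stdio.h>

struct extent { unsigned long start_blk, nr_blks; };

#define QCAP 64
static struct extent q[QCAP];
static unsigned qlen;

/* Queue an invalidated extent, merging with the previous one when the
 * logical addresses are contiguous (larger TRIMs cost less per block). */
static void itrim_queue(unsigned long start, unsigned long nr)
{
    if (qlen && q[qlen - 1].start_blk + q[qlen - 1].nr_blks == start) {
        q[qlen - 1].nr_blks += nr;
        return;
    }
    if (qlen < QCAP)
        q[qlen++] = (struct extent){ start, nr };
}

/* Drain the queue during idle time; a real implementation would issue
 * discard commands here instead of printing. */
static void itrim_drain_if_idle(bool io_idle)
{
    if (!io_idle)
        return;
    for (unsigned i = 0; i < qlen; i++)
        printf("TRIM blocks [%lu, %lu)\n",
               q[i].start_blk, q[i].start_blk + q[i].nr_blks);
    qlen = 0;
}

int main(void)
{
    itrim_queue(100, 8);
    itrim_queue(108, 8);         /* contiguous: merged with previous */
    itrim_queue(500, 4);
    itrim_drain_if_idle(false);  /* busy: deferred */
    itrim_drain_if_idle(true);   /* idle: two merged TRIMs issued */
    return 0;
}
```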

Based on these proposed schemes, which span multiple layers of the I/O stack, the overall performance of mobile devices and the lifetime of flash storage can be significantly improved.