Rate Control and Transmission Optimization for Dynamic Adaptive Streaming

動態自適應流的碼率控制和傳輸優化

Student thesis: Doctoral Thesis

Award date: 10 Nov 2020

Abstract

Given the significant advances in multimedia and communication technologies, numerous video applications, such as video streaming and video conferencing, have entered the industry and now account for most Internet traffic. Delivering stable, high-quality streaming services in constrained scenarios is challenging because such services are sensitive to delay and bandwidth fluctuation. Owing to the increasing demand for high online visual quality, several dynamic adaptive streaming techniques have been proposed to provide low-latency and high-quality video services. Because the ultimate consumer of a video stream is the end user, perceptual characteristics should be fully considered in video transmission. However, most existing algorithms do not take subjective factors into account in video rate and transmission control, which results in quality fluctuation and unnecessary bandwidth waste. This gap has motivated research on rate control and transmission optimization for dynamic adaptive streaming.

The contribution of this thesis consists of two parts: perceptual-based rate control optimization in High Efficiency Video Coding (HEVC) and the design of perceptual-based dynamic adaptive video transmission optimization. On the one hand, rate control (RC) plays a critical role in the transmission of high-quality video in HEVC; it aims to balance visual quality, quality smoothness, and buffer smoothness under the given constraints. On the other hand, dynamic adaptive transmission control technologies have been widely adopted for online video transmission; they aim to provide efficient video transmission services despite the large fluctuations inherent in both encoded video sequences and network traces.

For the perceptual-based rate control optimization part, we identify the shortcomings of existing work and explore promising directions for improving rate control performance in HEVC. First, for normal video sequences, we propose a coding tree unit (CTU)-level rate control scheme built on structural similarity index (SSIM)-based rate-distortion optimization to improve coding efficiency. We begin by establishing an SSIM-based rate-distortion model using the divisive normalization scheme, which characterizes the relationship between local visual quality and coding bits. The model is then applied to CTU-level rate control, which is cast as a global optimization problem and solved by convex optimization. Finally, we present a new model parameter updating strategy for CTU-level rate control that is robust to scene variations. Given the bitrate budget, our algorithm can achieve optimal CTU-level bit allocation. The experimental results show that our algorithm substantially enhances coding performance and consistently outperforms both the rate control scheme in the HEVC reference software and existing algorithms in terms of rate-perceptual distortion performance under different test configurations. Second, most current HEVC RC algorithms, which rely on spatio-temporal information to derive rate-distortion (R-D) model parameters, cannot effectively handle dynamic video sequences that contain fast-moving objects, significant object occlusion, or scene changes. For such sequences, we propose an RC method based on deep reinforcement learning (DRL) in HEVC to improve coding efficiency. The rate control problem is first formulated as a Markov decision process (MDP). With the MDP model, we then develop a DRL-based algorithm that finds the optimal quantization parameters (QPs) by training a deep neural network. The resulting intelligent agent observes the current state of the encoder and selects the optimal RC strategy to reduce distortion, buffer fluctuation, and quality fluctuation. The asynchronous advantage actor-critic (A3C) method is used to solve the MDP. Finally, the proposed DRL-based RC method is implemented in the newest video coding standard. Experimental results show that the proposed method offers substantially enhanced RC accuracy and consistently outperforms the HEVC reference software and other state-of-the-art algorithms.
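
As a rough illustration of budget-constrained CTU-level bit allocation solved as a convex problem, the sketch below assumes a generic hyperbolic model D_i(R_i) = c_i * R_i^(-k_i) for each CTU. This model is a stand-in chosen for simplicity, not the SSIM-based divisive normalization model developed in the thesis, and the function name allocate_bits and the parameters c and k are hypothetical. Under this assumption, the optimality conditions give every CTU the same R-D slope (the Lagrange multiplier), which can be found by bisection so that the allocated bits sum to the budget.

```python
import math


def allocate_bits(c, k, budget, iters=200):
    """Minimise sum_i c[i] * r_i ** (-k[i]) subject to sum_i r_i == budget.

    c, k   : per-CTU model parameters (assumed positive, hypothetical names)
    budget : total number of bits available for the frame
    """
    def rates(lam):
        # Stationarity condition: c_i * k_i * r_i ** (-k_i - 1) = lam
        return [(ci * ki / lam) ** (1.0 / (ki + 1.0)) for ci, ki in zip(c, k)]

    lo, hi = 1e-12, 1e12  # search bracket for the multiplier (assumed wide enough)
    for _ in range(iters):
        lam = math.sqrt(lo * hi)  # bisection in the log domain
        if sum(rates(lam)) > budget:
            lo = lam  # too many bits spent: a steeper slope is needed
        else:
            hi = lam
    return rates(math.sqrt(lo * hi))


if __name__ == "__main__":
    # Two CTUs with the same exponent; the second is four times harder to code.
    print(allocate_bits([1.0, 4.0], [1.0, 1.0], 300.0))
```

In this toy case the harder CTU receives about twice as many bits as the easier one (roughly 200 versus 100), because with k = 1 the optimal rate scales with the square root of c.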

For the perceptual-based dynamic adaptive video transmission optimization part, we design different types of video transmission control methods suited to different application environments. First, we construct a reinforcement learning (RL)-based dynamic adaptive streaming over HTTP (DASH) technique that addresses user quality of experience (QoE). The DASH adaptive bitrate (ABR) selection problem is formulated as an MDP. An RL-based ABR algorithm is then used to solve the MDP, in which the DASH client acts as the RL agent and the network variation constitutes the environment. In this algorithm, the video quality/bitrate levels and the buffer status of the client are used as the input, and the proposed user QoE, which jointly considers video quality and buffer status, is used as the reward. The goal of the RL algorithm is to select a suitable video quality level for each video segment so as to maximize the total reward. The proposed RL-based ABR algorithm is then embedded in the QoE-oriented DASH framework. The experimental results show that the proposed RL-based ABR algorithm outperforms state-of-the-art schemes in terms of both temporal and visual QoE factors by a noticeable margin while guaranteeing application-level fairness when multiple clients share a bottlenecked network. Second, a 360-degree streaming system can provide immersive, interactive, and autonomous experiences by letting the user change viewpoint to see different angles of the 360-degree video. Because cellular network conditions are limited and highly dynamic, high-resolution 360-degree video playback on mobile devices often suffers from playback freezing, and bandwidth is inevitably wasted in delivering out-of-viewpoint parts. In this thesis, a hybrid control scheme is presented for segment-level continuous bitrate selection and tile-level bitrate allocation for 360-degree streaming over mobile devices to improve users' QoE. First, a DRL method is proposed to predict the segment bitrate and avoid playback freezing events. Second, a cooperative bargaining game based on a viewpoint prediction map is proposed for bitrate allocation optimization, choosing a suitable bitrate for each tile to reduce unnecessary bandwidth waste. The proposed scheme is compared with state-of-the-art approaches over a wide variety of mobile network conditions with multiple viewpoint traces and 360-degree video contents. Experimental results indicate that the proposed method outperforms the state-of-the-art approaches in terms of different experimental objectives over mobile devices.
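
Two ideas from this part can be sketched in a few lines; the abstract does not give the exact formulations, so both functions are assumptions. The first, qoe_reward, is a commonly used linear per-segment reward that trades video quality against rebuffering and quality switches; the penalty weights are illustrative values. The second, allocate_tiles, is the closed-form weighted Nash bargaining solution for linear utilities, where each tile's viewing probability (hypothetically taken from a viewpoint prediction map) acts as its bargaining power and min_rate is its disagreement point. All names and parameters are hypothetical.

```python
def qoe_reward(quality, prev_quality, rebuffer_s,
               rebuf_penalty=4.0, switch_penalty=1.0):
    """Per-segment reward: quality minus rebuffering and switching penalties.

    The linear form and the penalty weights are assumptions for illustration.
    """
    return (quality
            - rebuf_penalty * rebuffer_s
            - switch_penalty * abs(quality - prev_quality))


def allocate_tiles(view_prob, min_rate, segment_rate):
    """Split a segment bitrate across tiles by weighted Nash bargaining.

    Maximises prod_i (r_i - min_rate[i]) ** view_prob[i]
    subject to sum_i r_i == segment_rate, whose solution gives each tile
    its minimum rate plus a share of the surplus proportional to view_prob.
    """
    surplus = segment_rate - sum(min_rate)
    total_prob = sum(view_prob)
    if surplus < 0 or total_prob <= 0:
        raise ValueError("segment rate too low or viewing probabilities invalid")
    return [m + surplus * p / total_prob for p, m in zip(view_prob, min_rate)]


if __name__ == "__main__":
    print(qoe_reward(quality=3.0, prev_quality=2.0, rebuffer_s=0.5))
    print(allocate_tiles([0.6, 0.3, 0.1], [1.0, 1.0, 1.0], 13.0))
```

In this sketch, the segment bitrate passed to allocate_tiles would come from the segment-level controller, so tiles the user is unlikely to watch stay close to their minimum rate and less bandwidth is spent on them.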