Fast Coding Algorithm and Efficient Background Modeling for Video Processing
視頻處理中快速編碼算法與高效背景建模技術
Student thesis: Doctoral Thesis
Detail(s)

Award date | 6 May 2020

Link(s)

Permanent Link | https://scholars.cityu.edu.hk/en/theses/theses(70a1b0fd-bfa7-4aba-8008-0eb3ad96d59c).html
Abstract
Recent years have witnessed an explosion of video applications, and the rapid growth of video data poses challenges for video processing techniques. On the one hand, demand is growing for ultra-high-definition (UHD) video at 4K and 8K resolutions, which motivated the release of the High Efficiency Video Coding (HEVC) standard. Compared with H.264/AVC, HEVC cuts the bit rate by about 50% at similar video quality, but it also increases encoding complexity several-fold. On the other hand, the sheer volume of video data makes automatic video analysis indispensable. Efficient and practical approaches therefore need to be explored for real-world applications, including background modeling, object tracking, and content understanding.
This thesis focuses on new approaches for real-world video processing applications. First, a fast video coding algorithm for inter prediction is proposed to address the high encoding complexity of the HEVC standard. For video analysis, a robust video background modeling algorithm that combines spatial and temporal characteristics is presented to handle complex and dynamic scenarios. Finally, motivated by the great success of convolutional neural networks (CNNs) and their wide application in video processing, the thesis also presents a universal compact CNN acceleration framework to promote the deployment of CNNs in lightweight video applications.
Strong correlations between neighboring frames, and between neighboring regions within a frame, are prevalent in video data. The fast video coding algorithm therefore exploits the motion correlations between neighboring coding units (CUs). First, if the collocated CUs exhibit high motion diversity and were themselves split, the current CU is split early. Second, since SKIP mode dominates the final CU decision modes and indicates a motion-sharing relationship, early SKIP mode detection and SKIP-mode-based CU termination are explored. A Bayesian decision rule with a discriminant function that minimizes the expected risk is adopted. Experimental results validate the proposed algorithm with 48.2% and 45.9% encoding time reductions for the random access and low-delay B configurations, respectively. The algorithm achieves a satisfactory balance between complexity reduction and coding efficiency, with only about a 0.5% BDBR increase.
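The risk-minimizing decision step can be sketched in miniature. This is an illustrative sketch rather than the thesis's exact discriminant: the prior, likelihood values, and loss matrix below are hypothetical placeholders, whereas the real algorithm would derive such statistics from encoder training data.

```python
# Hypothetical sketch of a Bayesian early-SKIP decision that minimizes
# expected risk. All numeric values are illustrative, not from the thesis.

def expected_risk(decision, posteriors, loss):
    """Expected risk of taking `decision`, averaged over class posteriors."""
    return sum(loss[decision][c] * p for c, p in posteriors.items())

def early_skip_decision(p_feature_given_skip, p_feature_given_other,
                        prior_skip=0.6, loss=None):
    """Return True to terminate CU processing early with SKIP mode
    when the expected risk of that decision is lower."""
    if loss is None:
        # loss[decision][true_class]: a wrong early termination is
        # costlier than a wasted full search (asymmetric penalties).
        loss = {"skip": {"skip": 0.0, "other": 5.0},
                "other": {"skip": 1.0, "other": 0.0}}
    evidence = (p_feature_given_skip * prior_skip
                + p_feature_given_other * (1.0 - prior_skip))
    posteriors = {
        "skip": p_feature_given_skip * prior_skip / evidence,
        "other": p_feature_given_other * (1.0 - prior_skip) / evidence,
    }
    return (expected_risk("skip", posteriors, loss)
            < expected_risk("other", posteriors, loss))
```

With a feature strongly indicating SKIP (likelihood 0.9 vs. 0.05), the rule terminates early; with the likelihoods reversed, it falls through to the full mode search. The asymmetric loss matrix is what keeps early termination conservative.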
Video background modeling still faces challenges such as illumination changes, foreground clutter, and intermittent object motion. The thesis presents a low-complexity background initialization and subtraction algorithm based on superpixel motion detection. The approach first performs illumination change detection to divide the input sequence; only subsequences with stable illumination are processed further. Each frame is then segmented into superpixels that capture spatial characteristics, and foreground objects are identified at the superpixel level through temporal motion detection. A clean background image is generated by density-based clustering, and Otsu's method is used for background subtraction. The approach is evaluated on two major datasets, SBMnet and CDnet 2014, demonstrating robustness in complex and dynamic scenarios.
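Of the pipeline's stages, the final Otsu thresholding step is a standard technique and can be sketched as follows. The superpixel segmentation and density-based clustering stages are omitted here, and `subtract_background` is a simplified single-frame illustration, not the thesis's full algorithm.

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: choose the threshold that maximizes the
    between-class variance of the two resulting intensity classes."""
    hist, _ = np.histogram(gray.ravel(), bins=256, range=(0, 256))
    prob = hist / hist.sum()
    cum_p = np.cumsum(prob)                      # class-0 weight up to t
    cum_mean = np.cumsum(prob * np.arange(256))  # cumulative mean
    mean_total = cum_mean[-1]
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = cum_p[t - 1], 1.0 - cum_p[t - 1]
        if w0 == 0 or w1 == 0:
            continue
        mu0 = cum_mean[t - 1] / w0
        mu1 = (mean_total - cum_mean[t - 1]) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t

def subtract_background(frame, background):
    """Foreground mask: absolute difference against the background
    image, thresholded automatically by Otsu's method."""
    diff = np.abs(frame.astype(np.int16)
                  - background.astype(np.int16)).astype(np.uint8)
    return diff >= otsu_threshold(diff)
```

Because the threshold adapts to each difference image's histogram, no per-sequence tuning is needed, which fits the low-complexity goal of the pipeline.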
CNNs have achieved great success in a wide range of applications, including video coding and video analysis, but their power comes with large model sizes and high computational complexity. To alleviate this problem, both compact network optimization and an efficient hardware acceleration system are explored. This thesis introduces a rapid, compact neural-network acceleration design framework to promote the deployment of CNNs in edge-computing video applications. First, an improved binary network training approach with trainable scaling factors is proposed for higher inference accuracy; the scaling factors are trained jointly with the other parameters via backpropagation. A long-tailed, higher-order derivative estimation is proposed for network training, balancing tight approximation against smooth backpropagation. The hardware/software co-design framework supports various compact network optimizations: flexible binarization and quantization at the data level, depthwise separable convolution at the operation level, and compact squeeze layers and residual blocks at the structure level. With accelerator optimization strategies such as a balanced inter-layer pipeline and parallel parameter selection, the acceleration system achieves up to 2 TOPS throughput and 370 GOPS/W power efficiency.
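The trainable-scaling-factor binarization can be sketched in NumPy. The constant `tail` slope below is an illustrative stand-in for the "long-tailed" surrogate gradient: the thesis proposes a higher-order estimator, whereas this sketch only shows the idea that weights outside the clipping range still receive a damped gradient instead of none.

```python
import numpy as np

def binarize_forward(w, alpha):
    """Forward pass: binary weights sign(w) scaled by a trainable
    per-layer factor alpha, i.e. w_b = alpha * sign(w)."""
    return alpha * np.sign(w)

def binarize_backward(w, alpha, grad_out, tail=0.1):
    """Backward sketch: a straight-through-style estimator with a
    'long tail' -- unit slope inside [-1, 1], a small slope outside,
    so large-magnitude weights are not frozen out of training.
    `tail` is an illustrative constant, not the thesis's estimator."""
    d_sign = np.where(np.abs(w) <= 1.0, 1.0, tail)  # long-tailed surrogate
    grad_w = grad_out * alpha * d_sign              # gradient w.r.t. weights
    grad_alpha = np.sum(grad_out * np.sign(w))      # scale trained jointly
    return grad_w, grad_alpha
```

Training alpha jointly with the weights lets the binary layer recover some dynamic range lost to sign quantization, which is the motivation for the trainable scaling factors.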
In summary, this thesis proposes several new algorithms and implementations for efficient video processing, spanning low-level video coding, mid-level background modeling, and high-level CNN-based video analysis. The proposed approaches help to promote the development and deployment of real-world video applications.