LEC-Codec: Learning-Based Genome Data Compression
Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review
Detail(s)
Original language | English |
---|---|
Journal / Publication | IEEE/ACM Transactions on Computational Biology and Bioinformatics |
Publication status | Online published - 3 Oct 2024 |
Abstract
In this paper, we propose a Learning-based gEnome Codec (LEC), which is designed for high efficiency and enhanced flexibility. The LEC integrates several advanced technologies, including Group of Bases (GoB) compression, multi-stride coding and bidirectional prediction, all of which are aimed at optimizing the balance between coding complexity and performance in lossless compression. The model applied in our proposed codec is data-driven, based on deep neural networks to infer probabilities for each symbol, enabling fully parallel encoding and decoding with configured complexity for diverse applications. Based upon a set of configurations on compression ratios and inference speed, experimental results show that the proposed method is very efficient in terms of compression performance and provides improved flexibility in real-world applications. © 2024 IEEE.
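The abstract's link between per-symbol probabilities and lossless compression can be illustrated with a minimal sketch. It is not the authors' codec: the neural network is replaced by a simple adaptive count-based context model (`CountModel`, a hypothetical stand-in), and `group_bases` is only one possible reading of grouping bases, not the paper's GoB definition. The one fact it relies on is that an ideal entropy coder spends about -log2(p) bits on a symbol predicted with probability p, so better predictions mean fewer bits than the 2 bits per base of a fixed-length code.

```python
import math
from collections import defaultdict

BASES = "ACGT"


def group_bases(seq, k):
    """Split seq into non-overlapping groups of k bases.

    Hypothetical illustration of grouping; a trailing partial group is dropped.
    """
    usable = len(seq) - len(seq) % k
    return [seq[i:i + k] for i in range(0, usable, k)]


class CountModel:
    """Adaptive context model with add-one smoothing.

    Stand-in for a learned predictor: it maps the previous `order` bases
    to a probability for each possible next base.
    """

    def __init__(self, order=2):
        self.order = order
        # Every context starts with a count of 1 per base (uniform prior).
        self.counts = defaultdict(lambda: {b: 1 for b in BASES})

    def predict(self, context):
        table = self.counts[context]
        total = sum(table.values())
        return {b: n / total for b, n in table.items()}

    def update(self, context, symbol):
        self.counts[context][symbol] += 1


def ideal_code_length_bits(seq, model):
    """Sum of -log2 p(symbol | context): the cost under an ideal entropy coder."""
    bits = 0.0
    for i, symbol in enumerate(seq):
        context = seq[max(0, i - model.order):i]
        p = model.predict(context)[symbol]
        bits += -math.log2(p)
        # The decoder can apply the same update after decoding the symbol.
        model.update(context, symbol)
    return bits


if __name__ == "__main__":
    seq = "ACGT" * 50
    bits = ideal_code_length_bits(seq, CountModel(order=2))
    print(f"{bits / len(seq):.3f} bits per base (fixed-length code: 2.000)")
    print(group_bases("ACGTAC", 2))
```

Unlike the codec described in the abstract, this stand-in predicts sequentially and adapts as it goes; it only shows why sharper probability estimates translate into a smaller compressed size.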
Research Area(s)
- Data compression, learning-based method, lossless genome compression, non-reference method
Citation Format(s)
LEC-Codec: Learning-Based Genome Data Compression. / Sun, Zhenhao; Wang, Meng; Wang, Shiqi et al.
In: IEEE/ACM Transactions on Computational Biology and Bioinformatics, 03.10.2024.