TY - JOUR
T1 - PHDFS: Optimizing I/O performance of HDFS in deep learning cloud computing platform
T2 - Journal of Systems Architecture
AU - Zhu, Zongwei
AU - Tan, Luchao
AU - Li, Yinzhen
AU - Ji, Cheng
PY - 2020/10
Y1 - 2020/10
N2 - For deep learning cloud computing platforms, the file system is a fundamental and critical component. The Hadoop Distributed File System (HDFS) is widely used in large-scale clusters due to its high performance and high availability. However, deep learning datasets typically contain a huge number of small files, causing HDFS to suffer a severe performance penalty. Although many optimization methods have been proposed to address the small-file problem, none of them takes the file correlation within deep learning datasets into consideration. To address this problem, this paper proposes Pile-HDFS (PHDFS), which is based on a new file aggregation approach. A pile is designed as the I/O unit that merges a group of small files according to their correlation. To access small files effectively, we design a two-layer manager and add inner organization information to the data blocks. Experimental results demonstrate that, compared with the original HDFS, PHDFS dramatically decreases the latency of accessing small files and improves the FPS (frames per second) of typical deep learning models by 40%.
KW - Cloud computing
KW - Deep learning
KW - Distributed file system
KW - Small files
UR - http://www.scopus.com/inward/record.url?scp=85086499548&partnerID=8YFLogxK
DO - 10.1016/j.sysarc.2020.101810
M3 - RGC 21 - Publication in refereed journal
SN - 1383-7621
VL - 109
JO - Journal of Systems Architecture
JF - Journal of Systems Architecture
M1 - 101810
ER -