TY - JOUR
T1 - Moving Big Data to the Cloud
T2 - An Online Cost-Minimizing Approach
AU - Zhang, Linquan
AU - Wu, Chuan
AU - Li, Zongpeng
AU - Guo, Chuanxiong
AU - Chen, Minghua
AU - Lau, Francis C.M.
PY - 2013/12
Y1 - 2013/12
N2 - Cloud computing, rapidly emerging as a new computation paradigm, provides agile and scalable resource access in a utility-like fashion, especially for the processing of big data. An important open issue here is how to efficiently move the data, from different geographical locations over time, into a cloud for effective processing. The de facto approach of hard drive shipping is not flexible or secure. This work studies timely, cost-minimizing upload of massive, dynamically generated, geo-dispersed data into the cloud, for processing using a MapReduce-like framework. Targeting a cloud encompassing disparate data centers, we model a cost-minimizing data migration problem, and propose two online algorithms: an online lazy migration (OLM) algorithm and a randomized fixed horizon control (RFHC) algorithm, for optimizing at any given time the choice of the data center for data aggregation and processing, as well as the routes for transmitting data there. Careful comparisons among these online and offline algorithms in realistic settings are conducted through extensive experiments, which demonstrate close-to-offline-optimum performance of the online algorithms. © 2012 IEEE.
AB - Cloud computing, rapidly emerging as a new computation paradigm, provides agile and scalable resource access in a utility-like fashion, especially for the processing of big data. An important open issue here is how to efficiently move the data, from different geographical locations over time, into a cloud for effective processing. The de facto approach of hard drive shipping is not flexible or secure. This work studies timely, cost-minimizing upload of massive, dynamically generated, geo-dispersed data into the cloud, for processing using a MapReduce-like framework. Targeting a cloud encompassing disparate data centers, we model a cost-minimizing data migration problem, and propose two online algorithms: an online lazy migration (OLM) algorithm and a randomized fixed horizon control (RFHC) algorithm, for optimizing at any given time the choice of the data center for data aggregation and processing, as well as the routes for transmitting data there. Careful comparisons among these online and offline algorithms in realistic settings are conducted through extensive experiments, which demonstrate close-to-offline-optimum performance of the online algorithms. © 2012 IEEE.
KW - Big Data
KW - Cloud Computing
KW - Online Algorithms
UR - http://www.scopus.com/inward/record.url?scp=84890546248&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-84890546248&origin=recordpage
U2 - 10.1109/JSAC.2013.131211
DO - 10.1109/JSAC.2013.131211
M3 - RGC 21 - Publication in refereed journal
SN - 0733-8716
VL - 31
SP - 2710
EP - 2721
JO - IEEE Journal on Selected Areas in Communications
JF - IEEE Journal on Selected Areas in Communications
IS - 12
M1 - 6678116
ER -