Abstract
In warehouse-scale cloud datacenters, co-locating online services and offline batch jobs is an efficient approach to improving datacenter utilization. In this paper, we perform a deep analysis of the released Alibaba workload dataset from the perspective of anomaly analysis and diagnosis. We first preprocess the raw data, including data supplementing, filtering, correlation, and aggregation, and finally generate container-level, batch-level, and server-level resource usage data. Then, based on the summary data, we illustrate the overall cluster usage distribution of online container services and batch jobs. The distribution reveals several abnormal nodes in the co-located cluster, and we explore the causes of these anomalies from three aspects: (1) unbalanced distribution of co-located workloads; (2) skewed resource utilization of co-located workloads; (3) system failures or job instance failures. In addition, we present several cases of abnormal nodes, which show that frequent system failures and unbalanced workload distribution have a great impact on abnormal nodes, and that skewed resource utilization of co-located workloads and frequent instance failures also contribute to the abnormalities.
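The preprocessing and anomaly-spotting flow summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the field names `machine_id` and `cpu_util`, the median-based rule, and the 30-point threshold are all assumptions chosen for illustration, and the sample records are made up rather than taken from the Alibaba trace.

```python
from collections import defaultdict
from statistics import mean, median


def aggregate_by_server(records):
    """Filter incomplete samples, then average CPU utilization per server.

    `machine_id` and `cpu_util` are hypothetical field names, not the
    trace's actual schema.
    """
    per_server = defaultdict(list)
    for rec in records:
        # Filtering step: skip samples with a missing machine id or reading.
        if rec.get("machine_id") is None or rec.get("cpu_util") is None:
            continue
        per_server[rec["machine_id"]].append(rec["cpu_util"])
    # Aggregation step: one mean utilization value per server.
    return {machine: mean(values) for machine, values in per_server.items()}


def flag_abnormal(server_means, threshold=30.0):
    """Return servers whose mean deviates from the cluster median by more
    than `threshold` percentage points (an assumed, illustrative rule)."""
    if not server_means:
        return []
    center = median(server_means.values())
    return sorted(m for m, v in server_means.items() if abs(v - center) > threshold)


# Made-up sample data, for illustration only.
records = [
    {"machine_id": "m1", "cpu_util": 40.0},
    {"machine_id": "m1", "cpu_util": 50.0},
    {"machine_id": "m2", "cpu_util": 42.0},
    {"machine_id": "m2", "cpu_util": None},  # dropped by the filter
    {"machine_id": "m3", "cpu_util": 95.0},
    {"machine_id": "m3", "cpu_util": 90.0},
]
server_means = aggregate_by_server(records)
abnormal = flag_abnormal(server_means)
```

A median is used as the cluster reference here because a few heavily loaded servers shift it less than they would shift a mean. The same grouping could be keyed on a container or batch-job identifier to obtain the other aggregation levels the abstract mentions.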
| Original language | English |
|---|---|
| Title of host publication | Benchmarking, Measuring, and Optimizing |
| Editors | Wanling Gao, Jianfeng Zhan, Geoffrey Fox, Xiaoyi Lu, Dan Stanzione |
| Publisher | Springer |
| Pages | 278-291 |
| ISBN (Electronic) | 9783030495565 |
| ISBN (Print) | 9783030495558 |
| Publication status | Published - 2020 |
| Event | 2019 BenchCouncil International Symposium on Benchmarking, Measuring, and Optimization (Bench'19), Denver, United States. Duration: 14 Nov 2019 → 16 Nov 2019. Conference number: 2nd. https://www.benchcouncil.org/bench19/ |
Publication series
| Name | Lecture Notes in Computer Science |
|---|---|
| Volume | 12093 |
| ISSN (Print) | 0302-9743 |
| ISSN (Electronic) | 1611-3349 |
Conference
| Conference | 2019 BenchCouncil International Symposium on Benchmarking, Measuring, and Optimization (Bench'19) |
|---|---|
| Place | United States |
| City | Denver |
| Period | 14/11/19 → 16/11/19 |
| Internet address | https://www.benchcouncil.org/bench19/ |
Research Keywords
- Alibaba trace
- Anomaly analysis
- Causes diagnosis
- Co-located workloads
Publication title
Anomaly analysis and diagnosis for co-located datacenter workloads in the Alibaba cluster