Anomaly analysis and diagnosis for co-located datacenter workloads in the Alibaba cluster

Rui Ren*, Jinheng Li, Lei Wang, Yan Yin, Zheng Cao

*Corresponding author for this work

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

Abstract

In warehouse-scale cloud datacenters, co-locating online services and offline batch jobs is an efficient way to improve datacenter utilization. In this paper, we perform an in-depth analysis of the released Alibaba workload dataset from the perspective of anomaly analysis and diagnosis. We first preprocess the raw data, including data supplementing, filtering, correlation, and aggregation, and finally generate container-level, batch-level, and server-level resource usage data. Then, based on the summary data, we illustrate the overall cluster usage distribution of online container services and batch jobs. This distribution reveals several abnormal nodes in the co-located cluster, and we explore the causes of these anomalies from three aspects: (1) unbalanced co-located workload distribution; (2) skewed co-located workload resource utilization; and (3) system failures or job instance failures. In addition, we present case studies of abnormal nodes, which show that frequent system failures and unbalanced workload distribution have a great impact on abnormal nodes, and that skewed co-located workload resource utilization and frequent instance failures also cause abnormalities.
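The preprocessing described above (filtering container-level samples and aggregating them into server-level usage) and the search for abnormal nodes can be illustrated with a minimal Python sketch. This is not the authors' implementation. The record layout (machine_id, container_id, timestamp, cpu_cores), the function names, and the z-score rule (flag a server whose mean usage lies more than k population standard deviations from the cluster mean) are all hypothetical assumptions chosen for illustration; the paper's actual schema and anomaly criteria may differ.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical record layout: (machine_id, container_id, timestamp, cpu_cores).
# These fields are illustrative assumptions, not the trace's actual schema.
Record = tuple[str, str, int, float]


def filter_records(records: list[Record]) -> list[Record]:
    """Drop samples with invalid CPU usage (negative values or NaN).

    NaN fails every comparison, so `cpu >= 0.0` also removes NaN samples.
    """
    return [r for r in records if r[3] >= 0.0]


def server_level_usage(records: list[Record]) -> dict[str, float]:
    """Aggregate container-level samples into per-server mean CPU usage.

    For each (machine, timestamp) slot, container usage is summed; the
    per-server value is then the mean of those slot totals.
    """
    per_slot: dict[tuple[str, int], float] = defaultdict(float)
    for machine, _container, ts, cpu in records:
        per_slot[(machine, ts)] += cpu
    per_machine: dict[str, list[float]] = defaultdict(list)
    for (machine, _ts), total in per_slot.items():
        per_machine[machine].append(total)
    return {m: mean(totals) for m, totals in per_machine.items()}


def flag_abnormal(usage: dict[str, float], k: float = 2.0) -> list[str]:
    """Return servers whose usage is more than k population std devs from the mean.

    This z-score rule is an assumed stand-in for the paper's anomaly criteria.
    """
    values = list(usage.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return sorted(m for m, u in usage.items() if abs(u - mu) > k * sigma)
```

For example, with five servers each running about 10 cores of co-located work and a sixth running 40, `flag_abnormal(server_level_usage(filter_records(records)))` returns `["m6"]`. A simple z-score keeps the sketch short; on small clusters a strict threshold such as k = 2 can never be exceeded (with n servers the largest possible z-score is sqrt(n - 1)), so a robust statistic such as the median absolute deviation may be a better fit in practice.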
Original language: English
Title of host publication: Benchmarking, Measuring, and Optimizing
Editors: Wanling Gao, Jianfeng Zhan, Geoffrey Fox, Xiaoyi Lu, Dan Stanzione
Publisher: Springer
Pages: 278-291
ISBN (Electronic): 9783030495565
ISBN (Print): 9783030495558
Publication status: Published - 2020
Event: 2019 BenchCouncil International Symposium on Benchmarking, Measuring, and Optimization (Bench'19) - Denver, United States
Duration: 14 Nov 2019 - 16 Nov 2019
Conference number: 2nd
https://www.benchcouncil.org/bench19/

Publication series

Name: Lecture Notes in Computer Science
Volume: 12093
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 2019 BenchCouncil International Symposium on Benchmarking, Measuring, and Optimization (Bench'19)
Place: United States
City: Denver
Period: 14/11/19 - 16/11/19

Research Keywords

  • Alibaba trace
  • Anomaly analysis
  • Causes diagnosis
  • Co-located workloads
