trace-v2017 was the first trace to introduce the collocation of online services (aka long-running applications) and batch workloads. To learn more about this trace, see the related documents (trace_2017). The download link is available after a short survey (survey link).

cluster-trace-v2018 includes about 4000 machines over a period of 8 days. Besides being larger in scale than trace-v2017, this trace also contains the DAG information of our production batch workloads. See the related documents for more details (trace_2018). The download link is available after a survey (less than a minute, survey link).

cluster-trace-gpu-v2020 includes over 6500 GPUs (on ~1800 machines) over a period of 2 months. It describes the AI/ML workloads in the MLaaS (Machine-Learning-as-a-Service) provided by the Alibaba PAI (Platform for Artificial Intelligence) on GPU clusters. See the subdirectory (pai_gpu_trace_2020) for the released data, schema, and scripts for processing and visualization. Our analysis paper, published in NSDI '22, is also available here.

cluster-trace-microservices-v2021 contains 20000+ microservices over a period of 12 hours. It is the first released trace to introduce the runtime metrics of microservices in a production cluster, including call dependencies, response time, call rates, and so on. See the subdirectory (trace_2021) for more details. Our analysis paper, accepted by SoCC '21, is available here.

cluster-trace-microarchitecture-v2022 is the first to provide AMTrace (Alibaba Microarchitecture Trace). AMTrace is the first fine-granularity, large-scale trace of microarchitectural metrics from the Alibaba Colocation Datacenter. Based on AMTrace, researchers can analyze CPU performance, microarchitecture contention, memory bandwidth contention, and so on. See the subdirectory (trace_2022) for more details.

We encourage anyone to use the traces for study or research purposes. If you have any questions while using the traces, please contact us via email: alibaba-clusterdata, or file an issue on GitHub. Filing an issue is recommended, as the discussion helps the whole community. Note that the more clearly you ask a question, the more likely you are to get a clear answer. We would much appreciate it if you let us know once any publication using our traces is available, as we maintain a list of related publications to help researchers communicate with each other. In the future, we will try to release new traces at a regular pace; please stay tuned.

Our motivation

As stated at the beginning, our motivation for publishing the data is to help people in related fields get a better understanding of modern data centers and to provide production data for researchers to verify their ideas. You may use the traces however you want, as long as it is for research or study purposes.

From our perspective, the data is provided to address the challenges Alibaba faces in IDCs where online services and batch jobs are collocated. We distill the challenges into the following topics:

How to characterize Alibaba workloads so that we can simulate various production workloads in a representative way for studies of scheduling and resource management strategies.