Disk Failure Prediction

According to a study by Microsoft Research (Vishwanath & Nagappan, 2010), disk failures accounted for the majority (78%) of hardware replacements in datacenters. In this project, we built an infrastructure that collects long-term S.M.A.R.T. data from the HDDs of over 10,000 servers in Tencent data centers and monitors disk health in real time. Historical disk failures served as positive samples for disk failure prediction, and we applied several machine learning algorithms to train the prediction model. Our experiments indicated that the model achieved both high precision and high recall.
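The labeling and evaluation steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual pipeline: the `SmartSample` record, the 7-day prediction window, and the single-attribute threshold on S.M.A.R.T. attribute 5 (Reallocated Sectors Count), which stands in for a trained machine learning model, are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SmartSample:
    """Hypothetical daily S.M.A.R.T. snapshot of one disk."""
    disk_id: str
    day: int                                       # days since collection started
    features: dict = field(default_factory=dict)   # attribute id -> raw value


def label_samples(samples, failure_day, window=7):
    """Label a sample positive (1) if its disk failed within `window` days after it.

    `failure_day` maps disk_id -> day of the recorded failure; disks absent
    from the map never failed, so all of their samples are negative (0).
    The window length is an assumption for illustration.
    """
    labels = []
    for s in samples:
        f = failure_day.get(s.disk_id)
        labels.append(1 if f is not None and 0 <= f - s.day <= window else 0)
    return labels


def threshold_predict(samples, attr=5, limit=0):
    """Hypothetical stand-in for a learned model: flag a disk when one
    S.M.A.R.T. attribute exceeds a fixed limit."""
    return [1 if s.features.get(attr, 0) > limit else 0 for s in samples]


def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary labels (1 = failing)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    samples = [
        SmartSample("a", 1, {5: 0}),
        SmartSample("a", 8, {5: 40}),   # reallocated sectors appear before failure
        SmartSample("b", 8, {5: 0}),
    ]
    y_true = label_samples(samples, {"a": 10})
    y_pred = threshold_predict(samples)
    print(y_true, y_pred, precision_recall(y_true, y_pred))
```

Precision is the fraction of disks flagged as failing that actually failed within the window, while recall is the fraction of actual failures that were flagged ahead of time; an operationally useful predictor needs both, since false alarms waste replacements and misses leave failures unhandled.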

Yifei Liu
Ph.D. Candidate in Computer Science

My research interests include file and storage systems and operating systems.