Zhaolong Jian

Recently, Kubernetes, the most popular container orchestration framework, has been widely used to manage and schedule the resources of microservices in cloud-native distributed applications. However, the native scheduler of Kubernetes, called Kube-scheduler, preferentially places microservices on nodes whose CPU and memory resources are abundant and balanced within a single node, which may cause resource fragmentation and decrease resource utilization. In this paper, we propose a deep reinforcement learning enhanced Kubernetes scheduler named DRS. To improve resource utilization and reduce load imbalance, we first formulate Kubernetes scheduling as a Markov decision process and carefully design its state, action, and reward. Then, we design and implement the DRS monitor, which perceives six resource-utilization metrics to construct a comprehensive global resource view. Finally, DRS automatically learns the scheduling policy through interaction with the Kubernetes cluster, without relying on expert knowledge of the workload or cluster status. We implement a prototype of DRS in a Kubernetes cluster with five nodes and evaluate its performance. Experimental results highlight that DRS overcomes the shortcomings of Kube-scheduler and achieves the expected scheduling targets under three workloads. Compared with Kube-scheduler, DRS improves resource utilization by 27.29% and reduces load imbalance by 2.90× on average, with only 3.27% CPU overhead and 0.648% communication latency.
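
The sketch below illustrates, in minimal form, how scheduling can be cast as a Markov decision process of the kind the abstract describes: the state is a per-node resource-utilization view, the action is the node chosen for the next pod, and the reward trades off utilization against load imbalance. This is a hypothetical illustration only; the class name `ClusterSchedulingEnv`, the six-metric layout, and the exact reward shape are assumptions, not the paper's actual DRS formulation.

```python
import numpy as np


class ClusterSchedulingEnv:
    """Toy MDP view of pod scheduling (illustrative, not the DRS implementation).

    State  : flattened per-node utilization matrix (the global resource view).
    Action : index of the node that receives the next pod.
    Reward : mean utilization minus a load-imbalance penalty (assumed form).
    """

    def __init__(self, num_nodes=5, num_metrics=6):
        self.num_nodes = num_nodes
        self.num_metrics = num_metrics  # e.g. CPU, memory, and four other metrics
        self.utilization = np.zeros((num_nodes, num_metrics))

    def state(self):
        # Flatten the monitored utilization matrix into a single state vector.
        return self.utilization.flatten()

    def step(self, action, pod_demand):
        # Place the pod on the chosen node and update that node's utilization.
        self.utilization[action] += pod_demand
        mean_util = self.utilization.mean()
        # Spread of each metric across nodes approximates load imbalance.
        imbalance = self.utilization.std(axis=0).mean()
        reward = mean_util - imbalance  # reward utilization, penalize skew
        return self.state(), reward


if __name__ == "__main__":
    env = ClusterSchedulingEnv()
    demand = np.full(env.num_metrics, 0.1)  # a pod requesting 10% of each resource
    _, reward = env.step(action=2, pod_demand=demand)
    print(f"reward after one placement: {reward:.3f}")
```

A DRL agent would then be trained against such an environment, observing the state, selecting a node, and receiving the reward, so that the learned policy favors placements that keep utilization high and balanced across the cluster.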