Opportunities for container environments on Cray XC30 with GPU devices
Lucas Benedicic, Miguel Gila, Sadaf Alam, Thomas C. Schulthess
Prompted by the recent popularity of Docker, the HPC community has started exploring container technology and the potential benefits it would bring to the users of supercomputing systems like the Cray XC series. In this paper, we explore the feasibility of diverse, nontraditional data- and computing-oriented use cases that run with practically no overhead, thus achieving native execution performance. In close collaboration with NERSC and an engineering team at Nvidia, CSCS is extending the Shifter framework in order to enable GPU access from containers at scale. We also briefly discuss the security implications of using containers within a shared HPC system, with the aim of providing a service that compromises neither the stability of the system nor the privacy of its users. Furthermore, we describe several valuable lessons learned through our analysis and share the challenges we encountered.
Keywords: Linux containers; Docker; GPU; GPGPU; HPC systems
In contrast with the long-established hypervisor-based virtualization technologies, containers provide a level of virtualization that allows running multiple isolated user-space instances on top of a common host kernel. Since a hypervisor emulates the hardware, the “guest” operating system and the “host” operating system each run their own kernel. The communication of the “guest” system with the actual hardware is implemented through an abstraction layer provided by the hypervisor. This software layer creates a performance overhead due to the mapping between the emulated and bare-metal hardware. Containers, on the other hand, are light, flexible and easy to deploy. Their size is measured in megabytes, which is much less than hypervisor-based virtual machines, which require a much larger software stack and gigabytes of memory. This characteristic makes containers easily transferable across nodes within an HPC system (horizontal scaling), and allows several of them to be deployed within one compute node, thereby increasing its density (vertical scaling).
The role of graphics processing units (GPUs) is becoming increasingly important in providing power-efficient and massively parallel computational power to the scientific community in general and HPC in particular. It is well known that even a single GPU-CPU framework provides advantages that multiple CPUs on their own do not offer, due to the distinctive design of discrete GPUs.
Despite previous studies on GPU virtualization, the possibilities provided by different virtualization approaches in a strict HPC context remain unclear. The lack of standardized designs and tools that would enable container access to GPU devices means this is still an active area of research. For this reason, it is important to understand the tradeoffs and the technical requirements that container-based technology imposes on GPU devices when deployed in a hybrid supercomputing system. One example of such a system is Piz Daint, a Cray XC30 in production at the Swiss National Supercomputing Center (CSCS) in Lugano, Switzerland. The system features 28 cabinets with a total of 5,272 compute nodes, each of which is equipped with an 8-core 64-bit Intel Sandy Bridge CPU (Intel® Xeon® E5-2670), an Nvidia® Tesla® K20X with 6 gigabytes of GDDR5 memory, and 32 gigabytes of host memory.
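To give a sense of scale, the per-node figures above can be multiplied out into system-wide totals. The short sketch below does only that arithmetic; it assumes that all 5,272 compute nodes have exactly the configuration listed, and the variable and function names are illustrative rather than taken from any CSCS tooling.

```python
# Per-node figures for Piz Daint as quoted in the text above.
NODES = 5272
CORES_PER_NODE = 8          # one 8-core Intel Xeon E5-2670 per node
GPU_MEM_GB_PER_NODE = 6     # one Nvidia Tesla K20X per node
HOST_MEM_GB_PER_NODE = 32


def aggregate(nodes: int, per_node: int) -> int:
    """Return the system-wide total, assuming identical nodes."""
    return nodes * per_node


totals = {
    "cpu_cores": aggregate(NODES, CORES_PER_NODE),
    "gpu_devices": aggregate(NODES, 1),
    "gpu_memory_gb": aggregate(NODES, GPU_MEM_GB_PER_NODE),
    "host_memory_gb": aggregate(NODES, HOST_MEM_GB_PER_NODE),
}

for name, value in totals.items():
    print(f"{name}: {value:,}")
```

Under these assumptions the system provides 42,176 CPU cores, 5,272 GPU devices, 31,632 gigabytes of GPU memory and 168,704 gigabytes of host memory in aggregate.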
In close collaboration with the National Energy Research Scientific Computing Center (NERSC) and an engineering team of the Nvidia CUDA division, CSCS is extending the Shifter framework (Jacobsen 2015) in order to enable scalable GPU access from containers. Container environments open up opportunities to enable workloads that have typically been constrained by a specialized lightweight operating system. They allow CSCS to consolidate workloads and workflows that currently require dedicated clusters and specialized systems. As an example, by closely collaborating with the Large Hadron Collider (LHC) experiments ATLAS, CMS and LHCb and their Swiss representatives in the Worldwide LHC Computing Grid (WLCG), CSCS is able to use the Shifter framework to enable complex, specific High Energy Physics (HEP) workflows on our Cray supercomputers.
The preliminary results of this work show an improvement in vertical scaling of the system and consolidation of complex workflows with minimal impact on users and performance. This is possible thanks to the deployment of multiple independent containers (processes) sharing the same GPU device. The increased density can significantly improve the overall performance of distributed, GPU-enabled applications by increasing GPU utilization and, at the same time, reducing their communication footprint. Additionally, it is possible to tailor specific versions of the CUDA toolkit and scientific libraries to different applications without having to perform a complex configuration at the system level. This use case is feasible even for different applications sharing the same compute node. Using examples and results from a subset of LHC experiment workflows, we demonstrate that there is minimal impact on the user interface (job submission script) and on the utilization of resources as compared to a dedicated environment.
The layout of the paper is as follows: we begin with the motivation for this work, specifically the extension of containers to include GPU resources and the design challenges associated with incorporating one or more GPU devices. This is followed by implementation details for GPU and LHC workflows in the Cray environment. In section 4, we describe vertical scaling of the solution to accommodate GPU and node sharing for multiple containers. We conclude with future plans and opportunities to build on our efforts.