File:LUG2019-Lustre Public Clouds-Purushothaman.pdf

The advent of cloud computing is enticing enterprises, national labs, and other scientific institutions to move applications out of on-premises infrastructure. But not every workload and application workflow is suited for the cloud. Many traditional HPC workloads, such as scientific computing, modelling, and simulation, need large and dynamically changing compute resources, and the cloud offers a cheaper alternative to on-premises infrastructure. Application vendors in HPC (Rescale) and EDA (Synopsys, Cadence) are making many efforts to bring these workloads to the cloud. In this session, we will go over the challenges and opportunities of running these workloads, and especially the lessons learned from running Lustre on the public cloud to meet HPC application requirements. The following are examples of the challenges that we will discuss in the session.

Challenge #1: Lustre complexity: In every high-performance file system evaluation, Lustre gets lower ratings for its complex setup, configuration, and maintenance. We will go over these challenges in the context of the public cloud and discuss some cloud-specific Lustre architectures that can simplify the installation and maintenance of Lustre. We will also discuss how features like Data-on-MDT (DoM) and project quotas can be leveraged to achieve performance and operational goals on the public cloud.

Challenge #2: Cloud provisioning: Consuming resources in the cloud is fairly simple compared to on-premises resources. However, bringing traditional applications, such as HPC and EDA, to the cloud has its own challenges. We will discuss some opportunities that the cloud provides to simplify and extend the overall HPC application architecture. We will discuss efforts to move broader HPC applications to the cloud, including the complexities of choosing cloud infrastructure based on price/performance. We will also discuss some lessons learned in optimizing cloud resource consumption while achieving the best performance.

Challenge #3: Data synchronization: Application data is generated in numerous locations. The challenge is to bring data close to applications. This is even more important for applications running on the cloud. We will discuss the challenges not only of bringing data to HPC applications on the cloud but also of sharing data and results with other applications and consumers who may not be in the same cloud.

Throughout the presentation, we will show performance results and live or recorded demos to illustrate the points discussed.