File:LUG2019-Lustre Compute Canada Deployment Beluga-Guilbault.pdf

The first part of the presentation will describe the current landscape of Lustre in Compute Canada. It will cover the services available to researchers, with a quick overview of the Canada-wide scientific software stack and the general user environment, which are common across the clusters. A new near-line storage service will be available to researchers in 2019. This service is based on the Lustre HSM feature, with tape libraries managed by TSM. An HSM connector to TSM tapes was developed and is now used in production. Experience with this new service will be presented.

The second part of the presentation will focus on Beluga, the newest general-purpose cluster deployed in Canada, which is managed by Calcul Quebec. This cluster is planned to enter production for researchers in April 2019.

The presentation will list and explain the choices that led to the adoption of some of the new features of Lustre. These new features are used in production for the first time on a Compute Canada system: ZFS on OST and MDT, disk encryption, SAS multipath and DNE.

The provisioning system and the modifications needed to the OS will be presented. Some issues encountered with the new system hardware, and their workarounds, will be discussed. Examples include problems with the scalability of the mpt3sas drivers and the development of custom scripts to manage and monitor the JBODs.

Finally, a few benchmark results will be shown using VDBench, obdfilter-survey, IOR and mdtest. A performance limitation was observed during these benchmarks. Some measurements point to a memory bandwidth bottleneck on the Skylake OSS with ZFS and/or LUKS.