OST Pool Quota Test Plan

Introduction
This document describes the test plan for the Quota Pools (QP) Lustre feature.

QP builds on the existing OST pools feature and allows a quota limit to be set per pool of OSTs instead of for the whole file system.

More details can be found in OST Pool Quotas HLD and in OST Pool Quotas.

Authors and Inspectors
The test plan was written by Sergey Cheremencev (sergey.cheremencev@hpe.com) and reviewed by Nathan Rutman (nathan.rutman@hpe.com) and Elena Gryaznova.

Feature Installation and Setup
The Quota Pools feature is new in Lustre 2.14 and does not require any commands to enable it. Quota limits can be set on an existing pool for a user, group, or project with the lfs setquota command, which gains only one new option, "--pool", naming the target pool. Example: lfs setquota --pool "qpool1" -u "quota_user1" -B 10M.
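As an illustration of the example command above, the following sketch only prints the commands it would run, so it can be tried without a Lustre mount. The mount point /mnt/lustre and the helper name qp_set_hard_limit are assumptions made for illustration; the pool, user and limit are taken from the example above.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the commands used to set and then
# query a per-pool hard block limit for one user.
# Assumption: the file system is mounted at /mnt/lustre.
MOUNT=/mnt/lustre

# qp_set_hard_limit POOL USER LIMIT  (hypothetical helper name)
qp_set_hard_limit() (
    pool=$1
    user=$2
    limit=$3
    # Set the hard block limit for USER in POOL.
    echo "lfs setquota --pool $pool -u $user -B $limit $MOUNT"
    # Read the limit back for the same user and pool.
    echo "lfs quota --pool $pool -u $user $MOUNT"
)

qp_set_hard_limit qpool1 quota_user1 10M
```

Printing the commands instead of running them keeps the sketch harmless on a machine without Lustre; piping the output to sh on a test client would execute them.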

Test Items

 * 1) Regression testing - acceptance testing must pass without regressions
 * 2) New feature testing - new tests should be implemented to verify all functionality provided by Quota Pools
 * 3) Performance testing - estimate the overhead added by Quota Pools
 * 4) Stress testing - check that each user's granted space per Quota Pool is correct under "stress" load
 * 5) Failover testing
 * 6) Interoperability testing
 * 7) Compatibility testing - check QP together with other features (DoM, PFL, DNE, etc.)

Regression testing
No new failures should appear during t-f (test-framework) testing started automatically by the Jenkins build.

Pass criteria: No new failed tests / regressions.

Fail criteria: Any new failure is found.

New feature testing
Extra tests should be added to check that the new feature works as expected. Tests should be able to create files with different striping.

Quota pools tests are introduced in the sanity-quota.sh test suite.

Performance testing
This testing should reveal possible performance regressions. Because the QP feature was designed to limit space consumption in different groups (pools) of OSTs, it is desirable to run this testing on a cluster with a high number of OSTs (at least 8). Because QP cannot be disabled, I suggest measuring three cases: the previous Lustre version without QP code, Lustre with QP code but without pools, and Lustre with several pools and a limit for each pool.

The table below describes performance testing for QP on a cluster with 8 OSTs. I propose generating the load with IOR. Because QP currently cannot limit inode consumption, there is no reason to perform any metadata testing.

To simplify test configuration, I suggest using default striping: it covers all pools regardless of which OSTs are assigned to each pool.

max_dirty_mb should be set to 10 MB for each OST.
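The client-side part of one performance run could be sketched as follows. This is a dry run that only prints commands. The mount point, the case labels, the helper name perf_run and the IOR transfer and block sizes are assumptions chosen for illustration, not values prescribed by this plan.

```shell
#!/bin/sh
# Dry-run sketch: print the client-side commands for one performance run.
# Assumptions: mount point, case labels and IOR sizes are illustrative.
MOUNT=/mnt/lustre

# perf_run CASE  -- CASE is one of: no-qp-code, qp-no-pools, qp-with-pools
perf_run() (
    case $1 in
        no-qp-code|qp-no-pools|qp-with-pools) ;;
        *) echo "unknown case: $1" >&2; exit 1 ;;
    esac
    # Cap the dirty client cache per OSC at 10 MB.
    echo "lctl set_param osc.*.max_dirty_mb=10"
    # File-per-process write test: data only, no metadata benchmark.
    echo "ior -a POSIX -w -F -t 1m -b 1g -o $MOUNT/ior-$1/testfile"
)

for c in no-qp-code qp-no-pools qp-with-pools; do
    perf_run "$c"
done
```

Using the same IOR command line for all three cases keeps the results comparable; only the installed Lustre version and the pool configuration differ between runs.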

Because soft and hard limits use the same mechanisms, performance testing of the soft limit with quota pools can probably be skipped. Furthermore, when a client reaches the soft limit, writes become synchronous, which could significantly affect performance.

Stress testing
Test that QP works correctly on a heavily loaded system. I propose mixing active writes with file removal or truncation. The goal is to hit QP limits at random.

Configuration and settings: a cluster with a high number of OSTs; several pools: 0-2, 3-5, 6-7; different limits for each user per pool; max_dirty_mb set depending on the pool limit (for example, max_dirty_mb=100 when the pool hard limit is 1G).
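The pool layout 0-2, 3-5, 6-7 could be created with commands like the ones printed by this dry-run sketch. The file system name "lustre", the pool names qpool1 to qpool3 and the helper name make_pool are assumptions for illustration.

```shell
#!/bin/sh
# Dry-run sketch: print the commands that create the three pools.
# Assumptions: fsname "lustre" and pool names qpool1..qpool3.
FSNAME=lustre

# make_pool NAME FIRST LAST  -- pool over OST indices FIRST..LAST
make_pool() (
    echo "lctl pool_new $FSNAME.$1"
    echo "lctl pool_add $FSNAME.$1 OST[$2-$3]"
)

make_pool qpool1 0 2
make_pool qpool2 3 5
make_pool qpool3 6 7
```

When running the printed pool_add commands from a shell, the OST[...] argument should be quoted so that the shell does not treat the brackets as a glob pattern.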

Pass criteria: no panics; granted space for each user is correct at the end of testing; no user is granted more than its minimum QP limit.
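The last criterion can be expressed as a small check. The sketch below assumes that the granted space and the user's pool hard limits have already been collected as integers in KiB and that every limit is non-zero; how those numbers are gathered is out of scope, and the helper names min_limit and granted_ok are hypothetical.

```shell
#!/bin/sh
# Sketch of the end-of-run check: a user's granted space must not exceed
# the smallest of that user's pool hard limits. All values are in KiB.
# Assumption: every limit passed in is non-zero.

# min_limit LIMIT...  -- print the smallest argument
min_limit() (
    min=$1
    for l in "$@"; do
        if [ "$l" -lt "$min" ]; then
            min=$l
        fi
    done
    echo "$min"
)

# granted_ok GRANTED LIMIT...  -- succeed if GRANTED <= min(LIMIT...)
granted_ok() (
    granted=$1
    shift
    [ "$granted" -le "$(min_limit "$@")" ]
)

# Example: pool limits of 1 GiB and 512 MiB, with 400 MiB granted.
if granted_ok 409600 1048576 524288; then
    echo "PASS"
else
    echo "FAIL"
fi
```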

This testing should be done with a high number of users, with at least one user per client, to generate enough load on the MDS. Each user with a quota limit in one or more pools causes OSTs to acquire space from, or release it to, the MDT during IOR. That is why a high number of users is needed: to hit possible race conditions in the new feature code. The number of clients should also be high enough to avoid a bottleneck effect.

Failover testing
The same as stress testing, plus random OST and MDT failovers.

Pass criteria: no panics; no evictions; granted space for each user is correct at the end of testing; no user is granted more than its minimum QP limit.

New Lustre client with old server
The test below should be added to sanity-quota.

Old Lustre client with new server
An old Lustre client is used with a new server that has QP. Pass criteria:
 * no panics and errors
 * lfs commands "quota" and "setquota" work properly

I have already done this kind of testing at Whamcloud (WC) using the following Test-Parameters keys:

Test-Parameters: clientversion=2.12.3 testlist=sanity-quota
Test-Parameters: serverversion=2.12.3 testlist=sanity-quota
Test-Parameters: clientversion=2.10.8 testlist=sanity-quota clientdistro=el7.6
Test-Parameters: serverversion=2.10.8 testlist=sanity-quota serverdistro=el7.6

The "old" client does not fail with the "new" server, i.e. the old functionality works as expected.

New tests that check the new lfs options, run from the "new" client, fail with ENOTSUPP because the older server does not know these options.

Results can be found at https://review.whamcloud.com/#/c/35615/.
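Because tests of the new lfs options fail with ENOTSUPP against an older server, one possible way to keep interoperability runs clean is to gate those tests on the server version. The sketch below is self-contained and does not use the test framework's own helpers; the function names ver_num and qp_supported are hypothetical, and versions are assumed to be plain "major.minor.patch" strings without leading zeros. The 2.14.0 threshold follows from the feature being new in Lustre 2.14.

```shell
#!/bin/sh
# Sketch: decide whether pool quota tests should run against a server.
# Assumption: versions are plain "major.minor.patch" strings.

# ver_num VERSION  -- encode the version as one comparable integer
ver_num() (
    IFS=.
    # Split VERSION on dots into $1 $2 $3 (IFS change stays in this subshell).
    set -- $1
    echo $(( ${1:-0} * 10000 + ${2:-0} * 100 + ${3:-0} ))
)

# qp_supported SERVER_VERSION  -- succeed if the server is 2.14.0 or newer
qp_supported() (
    [ "$(ver_num "$1")" -ge "$(ver_num 2.14.0)" ]
)

for v in 2.10.8 2.12.3 2.14.0; do
    if qp_supported "$v"; then
        echo "$v: run pool quota tests"
    else
        echo "$v: skip pool quota tests"
    fi
done
```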

Compatibility testing - check QP together with other features (DoM, PFL, DNE, etc.)
Part of this work is already done in sanity-quota. For example, it has a test for DoM (sanity-quota_63); if this test passes, QP does not break DoM functionality.

New tests should be added to sanity-quota to test QP with new features.

Quota Pools with DNE
The goal of the DNE feature is to distribute metadata between MDTs. Because Quota Pools currently works only for OSTs and cannot control metadata, DNE test cases are not needed.

From the point of view of OST quota pools, it does not matter where metadata is stored: only quota acquire requests from OSTs are taken into account.

Docs
OST Pool Quotas

OST Pool Quotas HLD

OST Pool Quotas Test Report

Tickets
[LU-11023]

[LU-13359]