OST Pool Quota Test Plan


Introduction

This document describes the test plan for the Quota Pools (QP) Lustre feature.

QP is built on top of the existing OST pools feature and makes it possible to set quota limits per pool of OSTs instead of for the whole filesystem.

More details can be found in OST Pool Quotas HLD and OST Pool Quotas.

Authors and Reviewers

The test plan was written by Sergey Cheremencev ([email protected]) and reviewed by Nathan Rutman ([email protected]) and Elena Gryaznova ([email protected]).

Feature Installation and Set-Up

The Quota Pools feature is new in Lustre 2.14 and does not require any command to enable it. Quota limits can be set on an existing pool for a user, group or project with the lfs setquota command using only one new option, "--pool", which names the pool, for example: lfs setquota --pool "qpool1" -u "quota_user1" -B 10M.
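
A minimal hedged usage sketch (it assumes the filesystem is mounted at /mnt/lustre and that the pool qpool1 already exists):

  # set a 10M block hard limit for quota_user1 inside pool qpool1
  lfs setquota --pool qpool1 -u quota_user1 -B 10M /mnt/lustre
  # report usage and limits for that user in that pool
  lfs quota -u quota_user1 --pool qpool1 /mnt/lustre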

Test Items

  1. Regression testing - acceptance testing needs to pass without regressions
  2. New feature testing - new tests should be implemented to verify all functionality provided by Quota Pools
  3. Performance testing - estimate the overhead added by Quota Pools
  4. Stress testing - check that the space granted to each user per Quota Pool is correct despite a "stress" load
  5. Failover testing
  6. Interoperability testing
  7. Compatibility testing - check QP together with other features (DoM, PFL, DNE, etc.)

Testing Tasks with Pass/Fail Criteria

Regression testing

No new failures should appear during the test-framework (t-f) testing started automatically by the Jenkins build.

Pass criteria: No new failed tests / regressions.

Fail criteria: Any new failure found.

New feature testing

Extra tests should be added to check that the new feature works as expected. The tests should be able to create files with different striping.

Quota pool tests are introduced in the sanity-quota.sh test suite. A minimal command sketch for test 1b is given after the table below.

number | name | description | exists
1b | Quota pools: Block hard limit (normal use and out of quota) | Create 1 pool, add all OSTs to it. Set 2 hard limits for a user: 20M on the global pool and 10M on the quota pool. Check that the user gets EDQUOT when reaching the limit set for the quota pool. Check the same for group and project. | yes
1c | Quota pools: check 3 pools with hard limit only for global | Create 2 pools, add all OSTs to them. Set a hard limit only for the global pool. Set a 0M limit for each pool. Check that the user gets EDQUOT when reaching the global limit (2 pools with a zero limit don't affect the global pool). | yes
1d | Quota pools: check block hard limit on different pools | Create 2 pools, add all OSTs to them. Set 3 hard limits for a user: 10M for pool1, 12M for pool2 and 20M for the global pool. Check that the user gets EDQUOT when reaching the smallest limit (10M). | yes
1e | Quota pools: global pool with high block limit vs quota pool with small | Create a pool that includes only OST1. Set hard limits for a user: 10M for the pool, 200M for the global pool. Create a file striped only over OST1 and write to it to hit the pool limit. Then create a file striped only over OST0 and write more than the qpool1 limit there - the write shouldn't fail. | yes
- | - | The same as 1b, but check a file with wide striping. | no
3b | Quota pools: block soft limit (start timer, timer goes off, stop timer) | Create 1 pool, add OSTs 0,1. Set grace times: 20 s for the pool and 20*3 s for the global pool. Set soft limits for the user: (soft_least_qunit/1024)M (4M) for the pool and (soft_least_qunit*2/1024)M for the global pool. Check that the user gets EDQUOT after the pool grace time when reaching (soft_least_qunit/1024)M. Check the same for group and project. | yes
3c | Quota pools: check block soft limit on different pools | Create 2 pools, add OSTs 0,1. Limits per pool for the user: limit1 = (soft_least_qunit/1024)M, limit2 = (limit1 + 4)M, global_limit = (limit1 + 8)M. Set grace per pool: pool2_grace = 20 s, pool1_grace = pool2_grace + 10 s, global_grace = pool2_grace + 20 s. Write up to limit2. After pool2_grace seconds the write should fail with EDQUOT. | yes
67 | quota pools recalculation | Check that granted space is correct after each pool change (create/destroy pool, add/remove OST). | yes
68 | slave number in quota pool changed after each add/remove OST | Add and remove OSTs to/from the pool and check that the slave number for this pool remains correct. | yes
69 | EDQUOT at one of the pools shouldn't affect DOM | Check that DOM files are not affected when a user hits a hard limit in a quota pool: write up to the limit in a pool to get EDQUOT; writes to DOM files shouldn't fail. | yes
70 | check lfs setquota/quota with a pool option | Create a pool with OST0. Set a 100M global limit and a 20M pool limit for the user. Write 20M to a file with striping -c 1 -i 0. Do sync and check the space used for this pool through lfs quota -u quota_usr --pool qpool1 ... Used should be 20M. Currently "used" is incorrect: it is a sum of the used space gathered from all OSTs (it should include only the OSTs that belong to the appropriate pool). There is a separate ticket for this problem, LU-13359. | yes
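
As a reference for the table above, here is a hedged sketch of the test 1b scenario. It assumes the filesystem name "lustre", the mount point /mnt/lustre, eight OSTs and a quota_usr account with a login shell; the actual helpers used in sanity-quota.sh may differ:

  # create one pool containing all OSTs (run on the MGS/MDS)
  lctl pool_new lustre.qpool1
  lctl pool_add lustre.qpool1 lustre-OST[0000-0007]
  # 20M global hard limit and 10M hard limit inside the pool
  lfs setquota -u quota_usr -B 20M /mnt/lustre
  lfs setquota -u quota_usr --pool qpool1 -B 10M /mnt/lustre
  # the write is expected to stop with EDQUOT once the 10M pool limit is reached
  su quota_usr -c "dd if=/dev/zero of=/mnt/lustre/f1b bs=1M count=15 oflag=sync"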

Performance testing

This kind of testing should reveal probable performance regressions. As the QP feature was designed to limit space consumption in different groups (pools) of OSTs, it is desirable to run this testing on a cluster with a large number of OSTs (at least 8). Since QP cannot be disabled, I suggest evaluating 3 cases: the previous Lustre version without QP code, Lustre with QP code but without pools, and Lustre with several pools and limits for each pool.

The table below describes performance testing for QP on a cluster with 8 OSTs. I propose generating the load with IOR. As QP at the current phase cannot limit inode consumption, there is no reason to perform any metadata testing.
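
A hedged example of the IOR invocation (the transfer and block sizes are placeholders; each quota-limited user would run its own copy):

  # file-per-process write/read load, 1 MB transfers, 100 MB per task
  mkdir -p /mnt/lustre/ior
  mpirun -np 8 ior -w -r -t 1m -b 100m -F -o /mnt/lustre/ior/testfile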

To make the test configuration easier I suggest using the default striping: files then land on OSTs from every pool, regardless of which OSTs are assigned to each pool.

max_dirty_mb should be set to 10M for each OST.
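
A minimal sketch of that setting, assuming it is applied on every client (osc.*.max_dirty_mb is the per-OST-connection client parameter):

  # limit the dirty client cache to 10 MB per OSC/OST on this client
  lctl set_param osc.*.max_dirty_mb=10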

version | pools configuration | quota settings
without QP (2.12) | no pools | block hard limit (200M) for each user
without QP (2.12) | 3 pools: 0-2, 3-5, 6-7 | block hard limit (200M) for each user
with QP (2.14??) | no pools | block hard limit (200M) for each user
with QP (2.14??) | 3 pools: 0-2, 3-5, 6-7 | block hard limit (200M) for each user
with QP (2.14??) | 100 pools: 0-2, 3-5, 6-7 but with different names | block hard limit (200M) for each user
with QP (2.14??) | 3 pools: 0-2, 3-5, 6-7 | global block hard limit for each user plus a limit for each user per pool, for example: global block hard limit of 1T for each user, block hard limit of 200M for each user per pool
with QP (2.14??) | 100 pools: 0-2, 3-5, 6-7 but with different names | global block hard limit for each user plus a limit for each user per pool, for example: global block hard limit of 1T for each user, block hard limit of 200M for each user per pool
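
A hedged sketch of the "3 pools: 0-2, 3-5, 6-7" rows above, assuming the filesystem name "lustre", the mount point /mnt/lustre and a user quota_usr:

  # create the three pools (run on the MGS/MDS)
  lctl pool_new lustre.qpool1 && lctl pool_add lustre.qpool1 lustre-OST[0000-0002]
  lctl pool_new lustre.qpool2 && lctl pool_add lustre.qpool2 lustre-OST[0003-0005]
  lctl pool_new lustre.qpool3 && lctl pool_add lustre.qpool3 lustre-OST[0006-0007]
  # 1T global block hard limit plus a 200M hard limit in every pool
  lfs setquota -u quota_usr -B 1T /mnt/lustre
  for p in qpool1 qpool2 qpool3; do
      lfs setquota -u quota_usr --pool $p -B 200M /mnt/lustre
  done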

As soft and hard limits use the same mechanisms, I think we can skip performance testing of soft limits with quota pools. Furthermore, when a client reaches a soft limit, writes become synchronous, which could significantly affect performance on its own.

Stress testing

Test that QP works fine on a heavily loaded system. I propose mixing active writes with file removal and truncation. The goal is to randomly hit QP limits.

Configuration and settings: a cluster with a large number of OSTs; several pools: 0-2, 3-5, 6-7; different limits for each user per pool; max_dirty_mb should be set depending on the pool limit (for example max_dirty_mb=100 when the pool hard limit is 1G).

Pass criteria: no panics; the space granted to each user should be correct at the end of testing; no user is granted more than its minimum QP limit.

This kind of testing should be done with a large number of users, with at least one user per client. This is needed to create enough load on the MDS. Each user with a quota limit in a pool or pools causes OSTs to acquire or release space from the MDT during IOR. That is why a large number of users is needed: to hit possible race conditions in the new feature code. The number of clients should also be large enough to avoid a bottleneck effect. A hedged load-generation sketch follows.
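
A hedged per-user load sketch (file names and sizes are placeholders; EDQUOT failures are expected and therefore ignored):

  # mix writes, truncates and removes over a small set of files to randomly hit pool limits
  mkdir -p /mnt/lustre/stress
  while true; do
      f=/mnt/lustre/stress/${USER}_$((RANDOM % 32))
      case $((RANDOM % 3)) in
          0) dd if=/dev/zero of=$f bs=1M count=$((RANDOM % 200)) conv=notrunc oflag=append 2>/dev/null ;;
          1) truncate -s $((RANDOM % 100))M $f 2>/dev/null ;;
          2) rm -f $f ;;
      esac
  done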

Failover testing

The same as stress testing plus random OST and MDT failovers.

Pass criteria: no panics, no evictions; the space granted to each user should be correct at the end of testing; no user is granted more than its minimum QP limit.
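
A hedged sketch of a failover driver, assuming the standard test-framework.sh helpers (facet_failover) and the OSTCOUNT variable are available; a real run could equally reuse the existing recovery test suites:

  # randomly fail over an OST or the MDS while the stress load is running
  while true; do
      sleep $((RANDOM % 600 + 300))
      if (( RANDOM % 2 )); then
          facet_failover ost$((RANDOM % OSTCOUNT + 1))
      else
          facet_failover mds1
      fi
  done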

Interoperability testing

New lustre client with old server

The test below should be added to sanity-quota.

number | name | description | exists
70 | check lfs setquota/quota with a pool option | A new Lustre client that has the QP feature (i.e. a new lfs with the --pool option) against an old server without QP. Pass criteria: no panics; error EFAULT when doing lfs setquota -u quota_user --pool pool1 -B /mnt/lustre; error EFAULT when doing lfs quota -u quota --pool pool1 /mnt/lustre. | yes

Old lustre client with new server

An old Lustre client with a new server that has QP. Pass criteria:

  • no panics and errors
  • lfs commands "quota" and "setquota" work properly

I have already done this kind of testing in WC using the following Test-Parameters keys:

Test-Parameters: clientversion=2.12.3 testlist=sanity-quota
Test-Parameters: serverversion=2.12.3 testlist=sanity-quota
Test-Parameters: clientversion=2.10.8 testlist=sanity-quota clientdistro=el7.6
Test-Parameters: serverversion=2.10.8 testlist=sanity-quota serverdistro=el7.6

"Old" client doesn't fail with "new" server, i.e. old staff works as expected.

New tests that check the new lfs options, run from the "new" client, fail with ENOTSUPP because an older server does not know these options.

Results can be found at https://review.whamcloud.com/#/c/35615/.

Compatibility testing - check QP together with other features (DoM, PFL, DNE, etc.)

Part of this work is already done in sanity-quota. For example, it has a test for DOM (sanity-quota_63): passing this test means QP does not break DOM functionality.

New tests should be added to sanity-quota to test QP with new features. Hedged layout sketches for the SEL, DOM and PFL cases are given after the corresponding tables below.

Quota Pools with SEL

number | name | description | exists
71b | Check SEL with quota pools | Create a 2-component SEL file, where the 1st component belongs to QPOOL1 (OST0, OST1) and the 2nd to QPOOL2 (OST2, OST3). The limit for quota_usr1 in QPOOL1 should be larger than the 1st component size, for example: the QPOOL1 limit is 100M while the 1st component size is 64M. For QPOOL2 the limit is 200M and the 2nd component size is 100M. Pass criteria: writing 100M to the SEL file shouldn't fail. | yes
71b | Check SEL with quota pools | Create a 2-component SEL file, where the 1st component belongs to QPOOL1 (OST0, OST1) and the 2nd to QPOOL2 (OST2, OST3). The limit for quota_usr1 in QPOOL1 should be larger than the 1st component size, for example: the QPOOL1 limit is 100M while the 1st component size is 64M. For QPOOL2 the limit is 64M and the 2nd component size is 100M. Pass criteria: writing 100M to the SEL file shouldn't fail; writing 30M to the SEL file at offset 100M should fail with EDQUOT. Doing the math: 36M is already in QPOOL2, +30M = 66M > the 64M limit, so EDQUOT. This also checks that SEL is working properly. | yes
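
A hedged sketch of the 71b layout, assuming pools qpool1/qpool2 already exist and that the SEL extension size is given with -z (--extension-size); the component sizes come from the table above:

  # 64M first component in qpool1, self-extending tail component in qpool2
  lfs setstripe -E 64M -p qpool1 -E -1 -z 64M -p qpool2 /mnt/lustre/sel_file
  chown quota_usr /mnt/lustre/sel_file
  # a 100M write should succeed in the 200M QPOOL2 limit case
  su quota_usr -c "dd if=/dev/zero of=/mnt/lustre/sel_file bs=1M count=100 conv=notrunc"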

Quota Pools with DOM

number | name | description | exists
69 | EDQUOT at one of the pools shouldn't affect DOM | Check that DOM files are not affected when a user hits a hard limit in a quota pool: write up to the limit in a pool to get EDQUOT; writes to DOM files shouldn't fail. | yes
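
A hedged sketch of the test 69 idea, assuming a 1M DoM component, an existing pool qpool1 with a 20M limit for quota_usr, and the mount point /mnt/lustre:

  # a DoM-only file (data on the MDT) and a normal file striped over qpool1
  lfs setstripe -E 1M -L mdt /mnt/lustre/dom_file
  lfs setstripe -p qpool1 /mnt/lustre/ost_file
  chown quota_usr /mnt/lustre/dom_file /mnt/lustre/ost_file
  # exhaust the qpool1 limit on the OST file (EDQUOT expected) ...
  su quota_usr -c "dd if=/dev/zero of=/mnt/lustre/ost_file bs=1M count=100 oflag=sync"
  # ... small writes to the DoM file must still succeed
  su quota_usr -c "dd if=/dev/zero of=/mnt/lustre/dom_file bs=4k count=10"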

Quota Pools with DNE

The goal of the DNE feature is distributing metadata between MDTs. As Quota Pools currently works only for OSTs and cannot control metadata, DNE test cases are not needed.

From the OST quota pools point of view it does not matter where metadata is stored: QP takes into account only quota acquiring requests from OSTs.

Quota Pools with PFL

number | name | description | exists
71a | Check PFL with quota pools | Create a file with a composite layout: offset 0-9M on OST0, offset 9M to end of file on OST1. 2 pools: qpool1: OST0, qpool2: OST1. Limits for user "quota_usr": qpool1 limit 10M, qpool2 limit 10M. quota_usr does several writes: write 10M, expect success; write 10M at offset 9M, expect success; sync, then write 10M at offset 19M, expect EDQUOT. | yes
71a | Check PFL with quota pools | Create a file with a composite layout: offset 0-9M on OST0, offset 9M to end of file on OST1. 2 pools: qpool1: OST0, qpool2: OST1. Limits for user "quota_usr": qpool1 limit 10M, qpool2 limit 10M. quota_usr does several writes: write 10M at offset 9M, expect success; write 10M at offset 19M, expect EDQUOT; write 10M at offset 0, expect success. | yes
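
A hedged sketch of the 71a layout and write sequence, assuming pools qpool1 (OST0) and qpool2 (OST1) with 10M limits already set for quota_usr; the exact setstripe options in sanity-quota may differ:

  # first 9M on OST0, the rest of the file on OST1
  lfs setstripe -E 9M -c 1 -i 0 -E -1 -c 1 -i 1 /mnt/lustre/pfl_file
  chown quota_usr /mnt/lustre/pfl_file
  su quota_usr -c "dd if=/dev/zero of=/mnt/lustre/pfl_file bs=1M count=10 conv=notrunc"          # expect success
  su quota_usr -c "dd if=/dev/zero of=/mnt/lustre/pfl_file bs=1M count=10 seek=9 conv=notrunc"   # expect success
  sync
  su quota_usr -c "dd if=/dev/zero of=/mnt/lustre/pfl_file bs=1M count=10 seek=19 conv=notrunc"  # expect EDQUOT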

References

Docs

OST Pool Quotas

OST Pool Quotas HLD

OST Pool Quotas Test Report

Tickets

LU-11023

LU-13359