OST Pool Quotas Test Report
Regression testing
All issues found during work on "LU-11023 quota: quota pools for OSTs" were fixed before landing.
Below are links to the test results from the latest patch set before landing (https://review.whamcloud.com/#/c/35615/51):
- Passed enforced test review-ldiskfs on CentOS 7.0/x86_64, uploaded by Trevis Autotest2 from trevis-47vm1, ran 5 tests: https://testing.whamcloud.com/test_sessions/bcfa089d-338e-4d33-9a1d-c73d053f072a
- Passed enforced test review-zfs on CentOS 7.0/x86_64, uploaded by Trevis Autotest from trevis-9vm1, ran 8 tests: https://testing.whamcloud.com/test_sessions/4d96f1b7-c651-4a50-b4b7-a8d51cb7ffcb
- Passed enforced test review-dne-part-1 on CentOS 7.0/x86_64, uploaded by Trevis Autotest from trevis-10vm6, ran 6 tests: https://testing.whamcloud.com/test_sessions/4794d3c7-f2ba-4eb6-91e5-d1d8dd1d1d0b
- Passed enforced test review-dne-part-2 on CentOS 7.0/x86_64, uploaded by Trevis Autotest from trevis-5vm5, ran 11 tests: https://testing.whamcloud.com/test_sessions/f91519d8-1cd0-403a-9972-43f30ac39629
- Passed enforced test review-dne-selinux on CentOS 7.0/x86_64, uploaded by Trevis Autotest2 from trevis-38vm1, ran 5 tests: https://testing.whamcloud.com/test_sessions/70e2ccc7-5c9b-45bf-b02e-946b26a67832
- Passed enforced test review-dne-zfs-part-2 on CentOS 7.0/x86_64, uploaded by Trevis Autotest from trevis-19vm1, ran 11 tests: https://testing.whamcloud.com/test_sessions/2890691e-2230-44cc-bfbe-c4b8eef434b0
- Passed enforced test review-dne-zfs-part-3 on CentOS 7.0/x86_64, uploaded by Trevis Autotest2 from trevis-40vm1, ran 3 tests: https://testing.whamcloud.com/test_sessions/9f1fd8a4-e676-45a3-b9cd-3dd020c56e3e
- Passed enforced test review-dne-part-3 on CentOS 7.0/x86_64, uploaded by Trevis Autotest from trevis-3vm1, ran 3 tests: https://testing.whamcloud.com/test_sessions/4b48f1bf-b7db-406f-96be-6509b33e17b5
- Passed enforced test review-dne-part-4 on CentOS 7.0/x86_64, uploaded by Onyx Autotest from onyx-61vm6, ran 10 tests: https://testing.whamcloud.com/test_sessions/19adeb6b-7658-4c08-9da8-b7474a328dfc
- Passed enforced test review-dne-part-4 on CentOS 7.0/x86_64, uploaded by Onyx Autotest from onyx-49vm1, ran 10 tests: https://testing.whamcloud.com/test_sessions/885887f7-e0e0-486c-a830-8993e5f284f5
- Passed enforced test review-dne-zfs-part-1 on CentOS 7.0/x86_64, uploaded by Trevis Autotest from trevis-6vm6, ran 6 tests: https://testing.whamcloud.com/test_sessions/75bec5ab-37ed-4f46-b70b-b4204c539a76
- Passed enforced test review-ldiskfs-arm on CentOS 7.0/x86_64, CentOS 8.0/aarch64, uploaded by Onyx Autotest from onyx-90vm27, ran 5 tests: https://testing.whamcloud.com/test_sessions/de70c50f-fe3e-44cb-8961-e205ee6a3d1c
- Passed enforced test review-dne-zfs-part-4 on CentOS 7.0/x86_64, uploaded by Trevis Autotest from trevis-5vm5, ran 10 tests: https://testing.whamcloud.com/test_sessions/32aa0561-a287-437f-8ac3-577f351d571e
There are now no known issues related to OST Pool Quotas.
New feature testing
To test the new feature, the following tests were added to sanity-quota.sh: 1b, 1c, 1d, 1e, 1f, 1g, 3b, 3c, 67, 68, 69, 70, 71a, 71b, 72. See the test descriptions in the OST Pool Quota Test Plan.
Tests 71a and 71b sometimes failed, resulting in https://jira.whamcloud.com/browse/LU-13677.
This ticket is now closed after the landing of "LU-13677 quota: qunit sorting doesn't work".
The last test to land was sanity-quota_1g. See the results of https://review.whamcloud.com/#/c/39469/7:
- review-zfs on CentOS 7.0/x86_64: https://testing.whamcloud.com/test_sets/8909687c-6505-4df8-ac3b-ee5060698872
- review-dne-part-4 on CentOS 7.0/x86_64: https://testing.whamcloud.com/test_sets/c9b86b3a-619e-4eff-9266-2c999ec4552c
- review-dne-zfs-part-4 on CentOS 7.0/x86_64: https://testing.whamcloud.com/test_sets/5798be5c-701e-40a7-a62a-b8f97621d381
Cluster used for performance, stress, and failover testing
For these types of testing, an internal HPE cluster with the following configuration was used:
Hostname | Role | Power State | Service State | Targets | HA Partner | HA Resources |
---|---|---|---|---|---|---|
cslmo1700 | MGMT | On | N/a | 0 / 0 | cslmo1701 | None |
cslmo1701 | (MGMT) | On | N/a | 0 / 0 | cslmo1700 | None |
cslmo1702 | MGS,(MDS) | On | Started | 1 / 1 | cslmo1703 | Local |
cslmo1703 | MDS,(MGS) | On | Started | 1 / 1 | cslmo1702 | Local |
cslmo1704 | OSS | On | Started | 1 / 1 | cslmo1705 | Local |
cslmo1705 | OSS | On | Started | 1 / 1 | cslmo1704 | Local |
cslmo1706 | OSS | On | Started | 1 / 1 | cslmo1707 | Local |
cslmo1707 | OSS | On | Started | 1 / 1 | cslmo1706 | Local |
Lustre version on the server side: lustre-2.13.56_3.10.0_957.1.3957.1.3.x4.4.35.x86_64.
All server nodes had about 62GB of total memory.
Disk sizes:
cslmo1702: /dev/md65 806G 5.2M 798G 1% /data/cslmo1703:md65
cslmo1703: /dev/md66 2.9T 117M 2.8T 1% /data/cslmo1703:md66
cslmo1706: /dev/md0 112T 11G 111T 1% /data/cslmo1706:md0
cslmo1707: /dev/md1 112T 11G 111T 1% /data/cslmo1706:md1
cslmo1704: /dev/md0 112T 17G 111T 1% /data/cslmo1704:md0
cslmo1705: /dev/md1 112T 16G 111T 1% /data/cslmo1704:md1
Load was generated from 5 clients running lustre-client-2.12.4.1_cray_180_gee19431_3.10.0_957.5.1.el7.x86_64.
Each client had about 62GB of total memory.
Performance testing
Commit d0452cf, the patch immediately before "LU-11023 quota: quota pools for OSTs" (09f9fb32), was used as the baseline.
It was compared with 09f9fb32 + 8704d14c ("LU-13677 quota: qunit sorting doesn't work"), referred to as dd35e17 in the tables below. 8704d14c was applied directly on top of 09f9fb32 because this patch solves a problem introduced in the main pool quotas patch.
One user generated load from 5 clients with mdtest and IOR. Pool configuration:
global0: OST0000, OST0001
global1: OST0002, OST0003
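For reference, pools like these are normally defined on the MGS with lctl pool commands. A minimal sketch, assuming the filesystem name cslmo17 (the pool names cslmo17.global0 and cslmo17.global1 appear in the quota output in the stress-testing section):

# create the two pools and populate them with the OSTs listed above
lctl pool_new cslmo17.global0
lctl pool_add cslmo17.global0 cslmo17-OST[0000-0001]
lctl pool_new cslmo17.global1
lctl pool_add cslmo17.global1 cslmo17-OST[0002-0003]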
max_dirty_mb was set to 10MB on each client:
lctl set_param osc.*.max_dirty_mb=10
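The setting can be verified on each client with:

lctl get_param osc.*.max_dirty_mb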
Quota limits:
- Each user: block hard limit 1T to enforce quota.
- Each user: block hard limit 200G per pool.
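A minimal sketch of how such limits can be set with lfs setquota, assuming the user name quota15_1 and the mount point /mnt/cslmo17 seen in the stress-testing output below:

# global hard limit, then a per-pool hard limit for each pool
lfs setquota -u quota15_1 -B 1T /mnt/cslmo17
lfs setquota -u quota15_1 -B 200G --pool global0 /mnt/cslmo17
lfs setquota -u quota15_1 -B 200G --pool global1 /mnt/cslmo17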
cslmo17 HDD IOR Throughput (DIO) | d0452cf | dd35e17 | Delta |
---|---|---|---|
Max Write: | 3670.7 MiB/sec | 3586.99 MiB/sec | -2.28% |
Max Read: | 10687.23 MiB/sec | 10694.86 MiB/sec | -0.07% |

cslmo17 HDD IOR Throughput (BIO) | d0452cf | dd35e17 | Delta |
---|---|---|---|
Max Write: | 1515.42 MiB/sec | 2109.58 MiB/sec | 39.21% |
Max Read: | 16530.82 MiB/sec | 16180.79 MiB/sec | -2.12% |
Stress testing
Five users generated load from 5 clients with mdtest and IOR. Pool configuration:
global0: OST0000, OST0001
global1: OST0002, OST0003
max_dirty_mb was set to 10MB on each client:
lctl set_param osc.*.max_dirty_mb=10
Quota limits:
- Each user: block hard limit 1T to enforce quota.
- Each user: block hard limit 1G per pool.
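This setup differs from the performance run only in the pool limit size; a sketch under the same assumptions as above:

lfs setquota -u quota15_1 -B 1T /mnt/cslmo17
lfs setquota -u quota15_1 -B 1G --pool global0 /mnt/cslmo17
lfs setquota -u quota15_1 -B 1G --pool global1 /mnt/cslmo17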
Typical quota output for one of the users:
Disk quotas for usr quota15_1 (uid 61501):
     Filesystem   kbytes   quota       limit   grace   files   quota   limit   grace
   /mnt/cslmo17  1527304       0  1073741824       -       0       0       0       -
cslmo17-MDT0000_UUID   0       -           0       -       0       -       0       -
cslmo17-OST0000_UUID  773000*  -      773000       -       -       -       -       -
cslmo17-OST0001_UUID  754304*  -      754304       -       -       -       -       -
cslmo17-OST0002_UUID       0   -           0       -       -       -       -       -
cslmo17-OST0003_UUID       0   -           0       -       -       -       -       -
Total allocated inode limit: 0, total allocated block limit: 1527304
uid 61501 is using default file quota setting

Disk quotas for usr quota15_1 (uid 61501):
     Filesystem   kbytes   quota     limit   grace   files   quota   limit   grace
   /mnt/cslmo17  1527304*      0   1048576       -       0       0       0       -
Pool: cslmo17.global0
cslmo17-OST0000_UUID  773000*  -    773000       -       -       -       -       -
cslmo17-OST0001_UUID  754304*  -    754304       -       -       -       -       -
Total allocated inode limit: 0, total allocated block limit: 1527304

Disk quotas for usr quota15_1 (uid 61501):
     Filesystem   kbytes   quota     limit   grace   files   quota   limit   grace
   /mnt/cslmo17        0       0   1048576       -       0       0       0       -
Pool: cslmo17.global1
cslmo17-OST0002_UUID       0   -         0       -       -       -       -       -
cslmo17-OST0003_UUID       0   -         0       -       -       -       -       -
Total allocated inode limit: 0, total allocated block limit: 0
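The three blocks above correspond to one global query and two per-pool queries; a sketch of the commands that typically produce this kind of output, assuming the mount point /mnt/cslmo17:

lfs quota -u quota15_1 -v /mnt/cslmo17
lfs quota -u quota15_1 --pool global0 /mnt/cslmo17
lfs quota -u quota15_1 --pool global1 /mnt/cslmo17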
When IOR generates load from the odd-numbered clients (1, 3, 5), it writes 7.8GB, well over the pool limit (1G).
When IOR generates load from the even-numbered clients (2, 4), it writes 400MB, staying under the pool limit.
Pass criteria:
- clients 1,3,5 should fail with EDQUOT on each IOR iteration
- no write failures on clients 2,4
- no kernel panics
- used space for each user should be correct at the end of testing, i.e. 0 when all files are removed (see the check sketched below)
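A hypothetical post-test check for the last criterion, assuming the five users follow the quota15_N naming seen in the output above:

# after all files are removed, each user's global usage should return to 0
for i in 1 2 3 4 5; do
    lfs quota -u quota15_${i} /mnt/cslmo17
done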
Results: passed.
Failover testing
The same quota and pool configuration as in stress testing was used, with the addition of periodic failover/failback of servers. Depending on the test, the "victim node" was the MDS only, the MDS and an OSS, or 2 OSSes.
LU-14033 was encountered during failover testing. It is a filesystem corruption caused by an IO failure at the ldiskfs layer. There is currently no evidence that Pool Quotas are connected with the issue; furthermore, LU-14033 was also hit in a test where IOR and mdtest generated load from a user without quota limits (ha-ost-ior-mdtest).
ha-mds-ior-QP
-v cslmo1703 (MDS) IORP="$(echo '"-a POSIX -b 1G -t 1M -v -C -w -r -W -i 1 -T 15 -k" "-a POSIX -b 50M -t 1M -v -C -w -r -W -i 25 -T 15 -k"')" ha_mpi_loads="ior" POWER_DOWN="sysrqcrash"
ha-mdsostConcurrent-ior-mdtest-QP
-v cslmo1703 (MDS),cslmo1706(OSS) ha_mpi_loads="ior mdtest" IORP="$(echo '"-a POSIX -b 3G -t 1M -v -C -w -r -W -i 1 -T 15 -k" "-a POSIX -b 1G -t 1M -v -C -w -r -W -i 1 -T 15 -k"')" MDTESTP="$(echo '" -r -u -L -V 2 -i 10 -I 50 -z 1 -b 10 " "-T -F -r -C -u -L -V 2 -i 10 -I 50 -z 1 -b 10"')" POWER_DOWN="sysrqcrash"
Failed with LU-14033.
ha-ost-ior-QP
-v cslmo1704(OSS),cslmo1706(OSS) ha_mpi_loads="ior" IORP="$(echo '"-a POSIX -b 1G -t 1M -v -C -w -r -W -i 1 -T 15 -k" "-a POSIX -b 50M -t 1M -v -C -w -r -W -i 20 -T 15 -k"')" POWER_DOWN="sysrqcrash"
Results: passed.
ha-ost-ior-mdtest
Note: this test was run by a user without any quota limits.
-v cslmo1704(OSS),cslmo1706(OSS) IORP="$(echo '"-f /storage/shared/cslmo17/ssf" "-f /storage/shared/cslmo17/fpp"')" MDTESTP="$(echo '"-T -F -r -C -u -L -V 1 -i 500 -I 50 -z 1 -b 10"')" ha_mpi_loads="ior mdtest" POWER_DOWN="sysrqcrash"
Failed with LU-14033.
ha-ost-ior-mdtest-QP-no-FAIL
-v cslmo1704(OSS),cslmo1706(OSS) ha_mpi_loads="ior mdtest" IORP="$(echo '"-f /storage/shared/cslmo17/ssf" "-f /storage/shared/cslmo17/fpp"')" MDTESTP="$(echo '"-T -F -r -C -u -L -V 1 -i 500 -I 50 -z 1 -b 10"')" POWER_DOWN="sysrqcrash"
Failed with LU-14033.
Interoperability testing
To check interoperability, https://review.whamcloud.com/#/c/40175/ was submitted with the following test parameters:
Test-Parameters: clientversion=2.12.3 testlist=sanity-quota clientdistro=el7.6
Test-Parameters: serverversion=2.12.3 testlist=sanity-quota serverdistro=el7.6
Test-Parameters: clientversion=2.10.8 testlist=sanity-quota clientdistro=el7.6
Test-Parameters: serverversion=2.10.8 testlist=sanity-quota serverdistro=el7.6
If the server doesn't support Pool Quotas, the newly added sanity-quota tests (1b, 1c, 1d, 1e, 1f, 1g, 3b, 3c, 67, 68, 69, 71a, 71b, 72) are skipped. Only sanity-quota test 70 runs even if the server doesn't support Pool Quotas; it checks that an "old" server works correctly with a client that supports Pool Quotas.
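In Lustre test-framework style, such a guard typically looks like the following sketch; the exact version threshold here is an assumption, not a quote from the landed tests:

# skip pool quota tests when the server predates the feature (assumed threshold)
[ "$MDS1_VERSION" -lt $(version_code 2.13.56) ] &&
	skip "MDS does not support OST pool quotas"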
Below are links to the results:
- server 2.13.56.7, client 2.10.8: https://testing.whamcloud.com/test_sets/016a537c-cf20-4f57-a7b9-e13c7624438f
- server 2.13.56.7, client 2.12.3: https://testing.whamcloud.com/test_sets/32dc544b-df98-4df0-9669-b87130166af4
- server 2.12.3, client 2.13.56.7: https://testing.whamcloud.com/test_sets/15326174-13f4-4970-873c-dc830e1ea1d1
- server 2.10.8, client 2.13.56.7: https://testing.whamcloud.com/test_sets/0b1c0133-2040-41f4-bc35-68a035aa60bd
Similar testing was also performed while preparing the patch for landing; see patch set 39 (https://review.whamcloud.com/#/c/39469/39).
No new issues were found during testing.
Compatibility testing
OST Pool Quotas with PFL
sanity-quota_71a was added to check Pool Quotas with PFL.
- Passed custom-101 on CentOS 7.0/x86_64, sanity-quota_71a: https://testing.whamcloud.com/sub_tests/6418f188-5602-4036-87ae-5f7dda454c1d
- Passed review-dne-part-4 on RHEL 7.8/x86_64, sanity-quota_71a: https://testing.whamcloud.com/sub_tests/b92aa52f-d3d5-4ced-b15a-820ef5da8b16
OST Pool Quotas with SEL
sanity-quota_71b was added to check Pool Quotas with SEL.
- Passed custom-101 on CentOS 7.0/x86_64, sanity-quota_71b: https://testing.whamcloud.com/sub_tests/d8667a2d-2ca4-4e90-8844-916a323aaebf
- Passed review-dne-part-4 on RHEL 7.8/x86_64, sanity-quota_71b: https://testing.whamcloud.com/sub_tests/0e822ed2-1308-4d02-9b97-9bd549878c82
OST Pool Quotas with DOM
sanity-quota_69 was added to check Pool Quotas with DOM.
- Passed enforced test review-zfs on CentOS 7.0/x86_64, sanity-quota_69: https://testing.whamcloud.com/sub_tests/7283ef98-fc5d-4cbc-a2c2-890ae742b4d5
- Passed enforced test review-dne-part-4 on CentOS 7.0/x86_64, sanity-quota_69: https://testing.whamcloud.com/sub_tests/81fed51d-f50b-4bc5-ad02-80bc5cff9570
- Passed enforced test review-dne-part-4 on CentOS 7.0/x86_64, sanity-quota_69: https://testing.whamcloud.com/sub_tests/ba00ae21-17af-4cdb-94c0-1a078b9f5aa2
- Passed enforced test review-dne-zfs-part-4 on CentOS 7.0/x86_64, sanity-quota_69: https://testing.whamcloud.com/sub_tests/59439aec-fa4b-4cd4-a658-3f77136f7362
OST Pool Quotas with DNE
The goal of the DNE feature is to distribute metadata between MDTs. As Pool Quotas currently work only for OSTs and cannot control metadata, DNE test cases are not needed. From the OST Pool Quotas point of view, it does not matter where metadata is stored: only quota acquiring requests from OSTs are taken into account.