OST Pool Quotas Test Report


Regression testing

All issues found during work on "LU-11023 quota: quota pools for OSTs" were fixed before landing.

Below are links to the test results from the latest patchset before landing (https://review.whamcloud.com/#/c/35615/51):

Passed enforced test review-ldiskfs on CentOS 7.0/x86_64 uploaded by Trevis Autotest2 from trevis-47vm1: https://testing.whamcloud.com/test_sessions/bcfa089d-338e-4d33-9a1d-c73d053f072a ran 5 tests.
Passed enforced test review-zfs on CentOS 7.0/x86_64 uploaded by Trevis Autotest from trevis-9vm1: https://testing.whamcloud.com/test_sessions/4d96f1b7-c651-4a50-b4b7-a8d51cb7ffcb ran 8 tests.
Passed enforced test review-dne-part-1 on CentOS 7.0/x86_64 uploaded by Trevis Autotest from trevis-10vm6: https://testing.whamcloud.com/test_sessions/4794d3c7-f2ba-4eb6-91e5-d1d8dd1d1d0b ran 6 tests.
Passed enforced test review-dne-part-2 on CentOS 7.0/x86_64 uploaded by Trevis Autotest from trevis-5vm5: https://testing.whamcloud.com/test_sessions/f91519d8-1cd0-403a-9972-43f30ac39629 ran 11 tests.
Passed enforced test review-dne-selinux on CentOS 7.0/x86_64 uploaded by Trevis Autotest2 from trevis-38vm1: https://testing.whamcloud.com/test_sessions/70e2ccc7-5c9b-45bf-b02e-946b26a67832 ran 5 tests.
Passed enforced test review-dne-zfs-part-2 on CentOS 7.0/x86_64 uploaded by Trevis Autotest from trevis-19vm1: https://testing.whamcloud.com/test_sessions/2890691e-2230-44cc-bfbe-c4b8eef434b0 ran 11 tests.
Passed enforced test review-dne-zfs-part-3 on CentOS 7.0/x86_64 uploaded by Trevis Autotest2 from trevis-40vm1: https://testing.whamcloud.com/test_sessions/9f1fd8a4-e676-45a3-b9cd-3dd020c56e3e ran 3 tests.
Passed enforced test review-dne-part-3 on CentOS 7.0/x86_64 uploaded by Trevis Autotest from trevis-3vm1: https://testing.whamcloud.com/test_sessions/4b48f1bf-b7db-406f-96be-6509b33e17b5 ran 3 tests.
Passed enforced test review-dne-part-4 on CentOS 7.0/x86_64 uploaded by Onyx Autotest from onyx-61vm6: https://testing.whamcloud.com/test_sessions/19adeb6b-7658-4c08-9da8-b7474a328dfc ran 10 tests.
Passed enforced test review-dne-part-4 on CentOS 7.0/x86_64 uploaded by Onyx Autotest from onyx-49vm1: https://testing.whamcloud.com/test_sessions/885887f7-e0e0-486c-a830-8993e5f284f5 ran 10 tests.
Passed enforced test review-dne-zfs-part-1 on CentOS 7.0/x86_64 uploaded by Trevis Autotest from trevis-6vm6: https://testing.whamcloud.com/test_sessions/75bec5ab-37ed-4f46-b70b-b4204c539a76 ran 6 tests.
Passed enforced test review-ldiskfs-arm on CentOS 7.0/x86_64, CentOS 8.0/aarch64 uploaded by Onyx Autotest from onyx-90vm27: https://testing.whamcloud.com/test_sessions/de70c50f-fe3e-44cb-8961-e205ee6a3d1c ran 5 tests.
Passed enforced test review-dne-zfs-part-4 on CentOS 7.0/x86_64 uploaded by Trevis Autotest from trevis-5vm5: https://testing.whamcloud.com/test_sessions/32aa0561-a287-437f-8ac3-577f351d571e ran 10 tests.

There are currently no known issues related to OST Pool Quotas.

New feature testing

To test the new feature, the following tests were added to sanity-quota.sh: 1b, 1c, 1d, 1e, 1f, 1g, 3b, 3c, 67, 68, 69, 70, 71a, 71b, 72. See the test descriptions in the OST Pool Quota Test Plan.
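
For reference, the new subtests can also be run individually from a configured lustre/tests environment; a minimal sketch, assuming the standard ONLY filter supported by the test scripts:

# run only a subset of the new OST Pool Quotas subtests (example selection)
cd lustre/tests
ONLY="67 68 69 70 71a 71b 72" ./sanity-quota.sh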

Tests 71a and 71b sometimes failed, resulting in https://jira.whamcloud.com/browse/LU-13677.

This ticket is now closed after the landing of "LU-13677 quota: qunit sorting doesn't work".

The last test to land was sanity-quota_1g. See the results for https://review.whamcloud.com/#/c/39469/7:

review-zfs on CentOS 7.0/x86_64 https://testing.whamcloud.com/test_sets/8909687c-6505-4df8-ac3b-ee5060698872
review-dne-part-4 on CentOS 7.0/x86_64 https://testing.whamcloud.com/test_sets/c9b86b3a-619e-4eff-9266-2c999ec4552c
review-dne-zfs-part-4 on CentOS 7.0/x86_64 https://testing.whamcloud.com/test_sets/5798be5c-701e-40a7-a62a-b8f97621d381

Cluster used for performance, stress and failover testing

For these types of testing, an internal HPE cluster with the following configuration was used:

--------------------------------------------------------------------------------------
 Hostname   Role       Power State  Service State  Targets  HA Partner  HA Resources  
--------------------------------------------------------------------------------------
 cslmo1700  MGMT       On           N/a            0 / 0    cslmo1701   None          
 cslmo1701  (MGMT)     On           N/a            0 / 0    cslmo1700   None          
 cslmo1702  MGS,(MDS)  On           Started        1 / 1    cslmo1703   Local         
 cslmo1703  MDS,(MGS)  On           Started        1 / 1    cslmo1702   Local         
 cslmo1704  OSS        On           Started        1 / 1    cslmo1705   Local         
 cslmo1705  OSS        On           Started        1 / 1    cslmo1704   Local         
 cslmo1706  OSS        On           Started        1 / 1    cslmo1707   Local         
 cslmo1707  OSS        On           Started        1 / 1    cslmo1706   Local         
--------------------------------------------------------------------------------------

Lustre version on the server side: lustre-2.13.56_3.10.0_957.1.3957.1.3.x4.4.35.x86_64.

All server nodes had about 62GB of total memory.

Disk sizes:

cslmo1702: /dev/md65                                                     806G  5.2M  798G   1% /data/cslmo1703:md65
cslmo1703: /dev/md66                                                     2.9T  117M  2.8T   1% /data/cslmo1703:md66
cslmo1706: /dev/md0                                                      112T   11G  111T   1% /data/cslmo1706:md0
cslmo1707: /dev/md1                                                      112T   11G  111T   1% /data/cslmo1706:md1
cslmo1704: /dev/md0                                                      112T   17G  111T   1% /data/cslmo1704:md0
cslmo1705: /dev/md1                                                      112T   16G  111T   1% /data/cslmo1704:md1

Load was generated from 5 clients running lustre-client-2.12.4.1_cray_180_gee19431_3.10.0_957.5.1.el7.x86_64.

Each client had about 62GB of total memory.

Performance testing

In progress.

Stress testing

5 users on 5 clients generated load with mdtest and IOR. Pool configuration:

global0: OST0000, OST0001
global1: OST0002, OST0003
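
A minimal sketch of how such pools could be created, assuming the filesystem name cslmo17 (commands run on the MGS node):

# create the two pools and add two OSTs to each
lctl pool_new cslmo17.global0
lctl pool_add cslmo17.global0 cslmo17-OST[0000-0001]
lctl pool_new cslmo17.global1
lctl pool_add cslmo17.global1 cslmo17-OST[0002-0003]
lctl pool_list cslmo17.global0    # verify pool membership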

max_dirty_mb was set to 10MB on each client:

lctl set_param osc.*.max_dirty_mb=10
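
The resulting value can be checked on each client with:

lctl get_param osc.*.max_dirty_mb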

Quota limits:

Each user: global block hard limit of 1T to enforce quota.
Each user: block hard limit of 1G for each pool.
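
A sketch of how these limits could be set and queried with lfs, assuming user quota15_1, mount point /mnt/cslmo17 and the pools above (the --pool option was introduced by LU-11023):

# global block hard limit of 1T so the user is under quota enforcement
lfs setquota -u quota15_1 -B 1T /mnt/cslmo17
# per-pool block hard limits of 1G
lfs setquota -u quota15_1 -B 1G --pool global0 /mnt/cslmo17
lfs setquota -u quota15_1 -B 1G --pool global1 /mnt/cslmo17
# report usage and limits for one pool
lfs quota -u quota15_1 --pool global0 /mnt/cslmo17

The per-pool reports below are the kind of output such a query produces.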

Typical quota output for one of the users:

Disk quotas for usr quota15_1 (uid 61501):
    Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
   /mnt/cslmo17 1527304       0 1073741824       -       0       0       0       -
cslmo17-MDT0000_UUID
                      0       -       0       -       0       -       0       -
cslmo17-OST0000_UUID
                 773000*      -  773000       -       -       -       -       -
cslmo17-OST0001_UUID
                 754304*      -  754304       -       -       -       -       -
cslmo17-OST0002_UUID
                      0       -       0       -       -       -       -       -
cslmo17-OST0003_UUID
                      0       -       0       -       -       -       -       -
Total allocated inode limit: 0, total allocated block limit: 1527304
uid 61501 is using default file quota setting
Disk quotas for usr quota15_1 (uid 61501):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
   /mnt/cslmo17 1527304*      0 1048576       -       0       0       0       -
Pool: cslmo17.global0
cslmo17-OST0000_UUID
                 773000*      -  773000       -       -       -       -       -
cslmo17-OST0001_UUID
                 754304*      -  754304       -       -       -       -       -
Total allocated inode limit: 0, total allocated block limit: 1527304
Disk quotas for usr quota15_1 (uid 61501):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
   /mnt/cslmo17       0       0 1048576       -       0       0       0       -
Pool: cslmo17.global1
cslmo17-OST0002_UUID
                      0       -       0       -       -       -       -       -
cslmo17-OST0003_UUID
                      0       -       0       -       -       -       -       -
Total allocated inode limit: 0, total allocated block limit: 0

When IOR generates load from the odd-numbered clients (1, 3, 5), it writes 7.8GB, which is over the pool limit (1G).

When IOR generates load from the even-numbered clients (2, 4), it writes 400MB.
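
A rough sketch of why the odd-numbered clients are expected to hit EDQUOT, assuming a file placed on pool global0 (the path and sizes are illustrative only):

# place the file on pool global0, then write past the 1G pool limit
lfs setstripe -p global0 /mnt/cslmo17/quota15_1/ior_file
dd if=/dev/zero of=/mnt/cslmo17/quota15_1/ior_file bs=1M count=8000
# expected to fail with "Disk quota exceeded" (EDQUOT) once about 1G is consumed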

Pass criteria:

  • clients 1,3,5 should fail with EDQUOT on each IOR iteration
  • no write failures on clients 2,4
  • no kernel panics
  • used space for each user should be correct at the end of testing, i.e. 0 when all files are removed

Results: passed.

Failover testing

The same quota and pool configuration as for stress testing was used, except with periodic failover/failback of the servers. Depending on the test, the "victim node" was the MDS only, the MDS and an OSS, or two OSSs.

LU-14033 was encountered during failover testing. It is a file system corruption caused by an IO failure at the ldiskfs layer. There is currently no indication that Pool Quotas are connected with the issue. Furthermore, LU-14033 was also hit in a test where IOR and mdtest generated load from a user without quota limits (ha-ost-ior-mdtest).

ha-mds-ior-QP

-v cslmo1703 (MDS)
IORP="$(echo '"-a POSIX -b 1G -t 1M -v -C -w -r -W -i 1 -T 15 -k" "-a POSIX -b 50M -t 1M -v -C -w -r -W -i 25 -T 15 -k"')"
ha_mpi_loads="ior"
POWER_DOWN="sysrqcrash"

ha-mdsostConcurrent-ior-mdtest-QP

-v cslmo1703 (MDS),cslmo1706(OSS)
ha_mpi_loads="ior mdtest"
IORP="$(echo '"-a POSIX -b 3G -t 1M -v -C -w -r -W -i 1 -T 15 -k" "-a POSIX -b 1G -t 1M -v -C -w -r -W -i 1 -T 15 -k"')"
MDTESTP="$(echo '" -r -u -L -V 2 -i 10 -I 50 -z 1 -b 10 " "-T -F -r -C -u -L -V 2 -i 10 -I 50 -z 1 -b 10"')" 
POWER_DOWN="sysrqcrash"

Failed with LU-14033.

ha-ost-ior-QP

-v cslmo1704(OSS),cslmo1706(OSS)
ha_mpi_loads="ior"
IORP="$(echo '"-a POSIX -b 1G -t 1M -v -C -w -r -W -i 1 -T 15 -k" "-a POSIX -b 50M -t 1M -v -C -w -r -W -i 20 -T 15 -k"')"
POWER_DOWN="sysrqcrash"

Results: passed.

ha-ost-ior-mdtest

Note: this test was run by a user that has no quota limits.

-v cslmo1704(OSS),cslmo1706(OSS)
IORP="$(echo '"-f /storage/shared/cslmo17/ssf" "-f /storage/shared/cslmo17/fpp"')"
MDTESTP="$(echo '"-T -F -r -C -u -L -V 1 -i 500 -I 50 -z 1 -b 10"')"
ha_mpi_loads="ior mdtest"
POWER_DOWN="sysrqcrash"

Failed with LU-14033.

ha-ost-ior-mdtest-QP-no-FAIL

-v cslmo1704(OSS),cslmo1706(OSS)
ha_mpi_loads="ior mdtest"
IORP="$(echo '"-f /storage/shared/cslmo17/ssf" "-f /storage/shared/cslmo17/fpp"')"
MDTESTP="$(echo '"-T -F -r -C -u -L -V 1 -i 500 -I 50 -z 1 -b 10"')"
POWER_DOWN="sysrqcrash"

Results: passed.

Interoperability testing

To check interoperability, the following test parameters were used:

Test-Parameters: clientversion=2.12.3 testlist=sanity-quota
Test-Parameters: serverversion=2.12.3 testlist=sanity-quota
Test-Parameters: clientversion=2.10.8 testlist=sanity-quota clientdistro=el7.6
Test-Parameters: serverversion=2.10.8 testlist=sanity-quota serverdistro=el7.6

These parameters were last used at patchset 39 (https://review.whamcloud.com/#/c/39469/39). Below are links to the results:

Compatibility testing

OST Pool Quotas with PFL

sanity-quota_71a was added to check Pool Quotas with PFL.
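
As an illustration only (not necessarily the exact layout used by sanity-quota_71a), a PFL file whose components land on different pools could be created like this:

# first 1G of the file on pool global0, the remainder on pool global1
lfs setstripe -E 1G -c 1 -p global0 -E -1 -c 2 -p global1 /mnt/cslmo17/pfl_file
lfs getstripe /mnt/cslmo17/pfl_file    # show the composite layout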

Passed custom-101 on CentOS 7.0/x86_64, sanity-quota_71a: https://testing.whamcloud.com/sub_tests/6418f188-5602-4036-87ae-5f7dda454c1d
Passed review-dne-part-4 on RHEL 7.8/x86_64, sanity-quota_71a: https://testing.whamcloud.com/sub_tests/b92aa52f-d3d5-4ced-b15a-820ef5da8b16

OST Pool Quotas with SEL

sanity-quota_71b was added to check Pool Quotas with SEL.
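
Similarly, an illustrative self-extending layout on pools (not necessarily the layout used by sanity-quota_71b) could be created with the -z extension-size option:

# self-extending component on pool global0 growing in 64M steps up to 1G,
# then a final self-extending component on pool global1
lfs setstripe -E 1G -z 64M -p global0 -E -1 -z 256M -p global1 /mnt/cslmo17/sel_file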

Passed custom-101 on CentOS 7.0/x86_64, sanity-quota_71b: https://testing.whamcloud.com/sub_tests/d8667a2d-2ca4-4e90-8844-916a323aaebf
Passed review-dne-part-4 on RHEL 7.8/x86_64, sanity-quota_71b: https://testing.whamcloud.com/sub_tests/0e822ed2-1308-4d02-9b97-9bd549878c82

OST Pool Quotas with DOM

sanity-quota_69 was added to check Pool Quotas with DOM.
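
For reference, an illustrative Data-on-MDT layout combined with a pool (not necessarily the exact layout used by sanity-quota_69):

# first 1M of the file on the MDT, the remainder on pool global0
lfs setstripe -E 1M -L mdt -E -1 -p global0 /mnt/cslmo17/dom_file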

Passed enforced test review-zfs on CentOS 7.0/x86_64, sanity-quota_69: https://testing.whamcloud.com/sub_tests/7283ef98-fc5d-4cbc-a2c2-890ae742b4d5
Passed enforced test review-dne-part-4 on CentOS 7.0/x86_64, sanity-quota_69: https://testing.whamcloud.com/sub_tests/81fed51d-f50b-4bc5-ad02-80bc5cff9570
Passed enforced test review-dne-part-4 on CentOS 7.0/x86_64, sanity-quota_69: https://testing.whamcloud.com/sub_tests/ba00ae21-17af-4cdb-94c0-1a078b9f5aa2
Passed enforced test review-dne-zfs-part-4 on CentOS 7.0/x86_64, sanity-quota_69: https://testing.whamcloud.com/sub_tests/59439aec-fa4b-4cd4-a658-3f77136f7362

OST Pool Quotas with DNE

The goal of the DNE feature is to distribute metadata across multiple MDTs. Since Pool Quotas currently work only for OSTs and cannot control metadata, DNE-specific test cases are not needed. From the OST Pool Quotas point of view it does not matter where metadata is stored; only quota acquiring requests from OSTs are taken into account.
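
For example (a hypothetical sketch, assuming a filesystem with at least two MDTs), a file created under a directory on a remote MDT still has its block usage accounted against the OST pool its objects live on:

# create a directory on MDT0001 (DNE), then place a file in it on pool global0
lfs mkdir -i 1 /mnt/cslmo17/dir_mdt1
lfs setstripe -p global0 /mnt/cslmo17/dir_mdt1/file
# the block usage of "file" counts against the global0 pool quota regardless of the MDT
lfs quota -u quota15_1 --pool global0 /mnt/cslmo17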

References

OST Pool Quota Test Plan