TestingLustreCode

Revision as of 14:20, 20 August 2015

(Copied from old wiki June 2015 - last update Dec 2009)

We recommend a "test early, test often" approach to testing.

  • If you are developing a new feature for Lustre™, designing tests to exercise the new feature early in the development process will allow you to test your code as you develop it.
  • If you are fixing a bug in Lustre, creating a regression test up front will ensure that you can reproduce the reported problem and then verify that it has been fixed. And it will save you the effort of testing the fix manually and then creating a separate regression test later to submit with your bug fix.

We provide several tools to help with testing Lustre code:

To find out more about testing for upcoming Lustre releases, see Lustre Test Plans.

Using the Lustre Testing Framework

Before you submit code, it must pass the acceptance-small test suite. We recommend running the suite often so that you find out as soon as possible whether your code changes cause a regression.

The acceptance-small test suite is run using the script acceptance-small.sh, which is located in the lustre/tests directory of a compiled Lustre tree. For more details, see Acceptance Small (acc-sm) Testing on Lustre.

Note: When you submit a patch (see Submitting Changes), Gerrit automatically launches a series of tests for that patch with different configurations (ldiskfs vs. ZFS, one vs. multiple MDS nodes) and marks the patch with Verified: +1 if it passes.

Creating a Test Case

The first step in fixing a bug is to create or find a test case that reproduces it. Then fix the bug, and finally verify that the fixed code passes the test.

Often, when a defect is found in Lustre, it is because no test script in the Lustre acceptance test suite exercises the failing code. Before starting work on a bug, first check whether an existing test script reproduces it. If not, you will need to create a new sub-test and add it to the test suite, normally in one of the existing test scripts. See Test Descriptions for a full list of tests. The list below contains only the most commonly run tests.

Test script           Description
runtests              Very basic test of mounting, copying files, remounting, verifying files are still correct, remounting, and unlinking files.
sanity.sh             Tests to verify basic functionality under normal operating conditions.
sanityn.sh            Tests to verify operations from two clients under normal operating conditions.
conf-sanity.sh        Tests configuration issues (formatting, mounting, different tunables).
lnet-selftest.sh      Tests basic LNet functionality and the LNet Self Test tools.
recovery-small.sh     Tests to verify RPC replay after communications failure (message loss).
replay-single.sh      Tests to verify recovery after MDS failure.
replay-dual.sh        Tests to verify recovery from two clients after server failure.
replay-ost-single.sh  Tests to verify recovery after OSS failure.
sanity-hsm.sh         Tests Hierarchical Storage Management and ChangeLog functionality.
sanity-lfsck.sh       Tests LFSCK to detect and fix Lustre-level inter-server consistency issues.
sanity-scrub.sh       Tests LFSCK to detect and fix OSD-level filesystem consistency/corruption issues.
sanity-quota.sh       Tests quota accounting and enforcement.
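
Checking whether an existing sub-test already covers a bug can start with a simple search of the test scripts. The following is a minimal sketch and not part of the Lustre test framework: the function name find_subtests is hypothetical, and it assumes that sub-tests are registered on lines of the form run_test <id> "<description>" in the scripts under lustre/tests.

```shell
# Hypothetical helper (not part of the Lustre test framework).
# Assumes sub-tests are registered as: run_test <id> "<description>"
# Usage: find_subtests <keyword> [tests-directory]
find_subtests() {
    keyword=$1
    dir=${2:-lustre/tests}
    # -H prints the file name, -i ignores case; the keyword is used as a
    # basic regular expression and must appear on a run_test line.
    grep -H -i -- "^run_test.*$keyword" "$dir"/*.sh
}
```

For example, find_subtests quota lustre/tests lists every registered sub-test whose description mentions "quota", prefixed by the script that contains it. The function returns a non-zero status if nothing matches.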

Bypassing Failures

If one or more tests in acceptance-small are regularly failing due to an issue not related to the bug you are fixing, you may want to bypass these tests so that you can test your bug fix. Complete these steps:

1. Check to see if a bug has been logged for the failure and file a new bug if one has not yet been opened.

2. Set environment variables to skip these specific tests until you or someone else fixes them. For example, to skip sanity.sh subtests 36g and 65, replay-single.sh subtest 42, and all of insanity.sh, set in your environment:

export SANITY_EXCEPT="36g 65"
export REPLAY_SINGLE_EXCEPT=42
export INSANITY=no
You can also set these variables on the command line for a single run. For example, when running acceptance-small, enter:
SANITY_EXCEPT="36g 65" REPLAY_SINGLE_EXCEPT=42 INSANITY=no ./acceptance-small.sh
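
The two skip-list variables above appear to follow one naming pattern: the script name in upper case, with hyphens replaced by underscores and _EXCEPT appended. The sketch below derives the variable name from a script name under that assumption. The function except_var_name is hypothetical, and the pattern should be checked against acceptance-small.sh before relying on it for other scripts.

```shell
# Hypothetical helper: derive the skip-list variable name for a test script,
# following the pattern seen in the examples
# (sanity.sh -> SANITY_EXCEPT, replay-single.sh -> REPLAY_SINGLE_EXCEPT).
except_var_name() {
    name=${1%.sh}    # strip a trailing .sh, if present
    # upper-case the letters and turn hyphens into underscores
    printf '%s_EXCEPT\n' "$(printf '%s' "$name" | tr 'a-z-' 'A-Z_')"
}
```

For example, except_var_name replay-single.sh prints REPLAY_SINGLE_EXCEPT.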

The test framework is very flexible and provides an easy "hands-off" way of running tests while you are doing other things like coding. By running the entire test suite regularly, you will soon know whether your code has introduced a new bug or revived an old one.

Note: Questions or problems with the test framework should be emailed to the lustre-discuss mailing list so that all Lustre users can benefit from the solution.

Test Framework Options

The examples below show how to run a full test or sub-tests from the acceptance-small suite.

  • Run all tests including the standard tests (sanity*) with the default (local.sh) setup.
$ cd lustre/tests
$ sh acceptance-small.sh
  • Run only the recovery-small.sh, replay-single.sh, and conf-sanity.sh tests.
$ ACC_SM_ONLY="recovery-small replay-single conf-sanity" sh acceptance-small.sh
  • Run acceptance-small with a different configuration. The script looks for the configuration file myth.sh under lustre/tests/cfg.
$ NAME="myth" sh acceptance-small.sh
  • Run only tests 1, 3, 4, 6, 9 in sanity.sh with a custom configuration example1.sh.
$ ONLY="1 3 4 6 9" NAME=example1 sh sanity.sh
  • Skip tests 1 ... 30 and run remaining tests in sanity.sh.
$ EXCEPT="$(seq 1 30)" sh sanity.sh
  • Clean up after an example1.sh test failure (normally the system is left mounted for debugging after a failure).
$ NAME=example1 sh llmountcleanup.sh
  • Clean up replay-single.sh after a test failure (normally the system is left mounted for debugging after a failure).
$ ONLY=cleanup sh replay-single.sh
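
Because NAME selects a configuration file under lustre/tests/cfg, a small pre-flight check can catch a mistyped name before a long run starts. This is a minimal sketch: the function check_cfg is hypothetical, and it assumes it is called from lustre/tests or is given that directory as its second argument.

```shell
# Hypothetical pre-flight check: confirm that cfg/<NAME>.sh exists.
# Usage: check_cfg [name] [tests-directory]
# The name defaults to "local", matching the default local.sh setup.
check_cfg() {
    name=${1:-local}
    tests_dir=${2:-.}
    cfg="$tests_dir/cfg/$name.sh"
    if [ -f "$cfg" ]; then
        echo "using configuration $cfg"
    else
        echo "configuration $cfg not found" >&2
        return 1
    fi
}
```

It could be combined with one of the commands above, for example: check_cfg myth && NAME=myth sh acceptance-small.sh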

Note: The acceptance-small suite includes two configuration files: lustre/tests/cfg/local.sh is used as the default configuration file and lustre/tests/cfg/ncli.sh is used for environments with multiple clients.

Adding New Tests

You can easily add a test to one of the above scripts. Failures can be injected into the Lustre kernel code and monitored using:

  • OBD_FAIL_CHECK()
  • OBD_FAIL_RACE()
  • OBD_FAIL_TIMEOUT()

Or you can make runtime changes using lctl set_param fail_loc=....
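
As a rough illustration of how a sub-test might set and clear fail_loc, the dry-run sketch below prints the lctl commands instead of executing them. Everything in it is an assumption made for illustration: the run_test and error functions are simple stand-ins defined here rather than the real framework functions, the sub-test number 900 is made up, and FAIL_LOC is a placeholder that the test author must replace with the value of a real failure location. Consult existing scripts such as sanity.sh for the actual conventions.

```shell
# Dry-run sketch of a new sub-test. The stubs below stand in for the real
# test framework; FAIL_LOC is a placeholder (0 is used here so that no
# particular failure location is implied).
LCTL=${LCTL:-"echo lctl"}    # stub: print lctl commands instead of running them
FAIL_LOC=${FAIL_LOC:-0}

error() { echo "FAIL: $*" >&2; return 1; }
run_test() { echo "== test $1: $2"; "test_$1"; }

test_900() {                 # hypothetical sub-test number
    # arm the failure injection point
    $LCTL set_param fail_loc="$FAIL_LOC" || { error "could not set fail_loc"; return 1; }
    # ... exercise the code path expected to hit the injected failure ...
    # clear the injection point so later tests are not affected
    $LCTL set_param fail_loc=0
}
run_test 900 "sketch: inject a failure and clear it"
```

Setting LCTL=lctl on a test system would run the commands for real, which should only be done with a FAIL_LOC value taken from the Lustre source.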