A lot of work has been put into stabilizing Lustre over the years, and we have evolved processes to ensure we maintain the stability that so many sites worldwide now rely on for their site-wide production filesystems. At many sites, when Lustre stops working, many millions of dollars of computing equipment sits idle, so stability is the top criterion for all development.
The processes described below ensure that proposed changes receive careful attention before they are landed.
Landing a Patch to a Lustre Release
Note that landings for a release will freeze one month prior to GA to ensure sufficient time to do large-scale stress testing, performance benchmarking, and interoperability testing.
Patch Landing Checklist
- A Jira ticket has been opened to track the issue
- Test your change locally using acceptance-small.sh
- Commit the change after verifying it follows the requirements for patch submission:
  - The Commit Comments are well formatted and useful
  - Verify patch follows Lustre Coding Style Guidelines (at least git show | contrib/scripts/checkpatch.pl - passes)
  - A regression test has been created that fails without the patch and passes with the patch
  - The patch has the appropriate Signed-off-by: line
- Upload the patch to Gerrit and review the test results
  - Check that the newly-added test(s) for the change are passing (new or modified tests will be run repeatedly to ensure they are passing consistently)
  - Review any other test failures (there may be some intermittent failures hit)
    - Associate known failures with existing LU tickets (by searching Jira and/or Maloo for similar subtest failures)
    - Raise bug(s) for failures that are also seen on other patches and that do not have an existing LU ticket
    - Fix failures that are associated with only your ticket even if you think they are unrelated, as patches that cause failures cannot be landed
  - Retest any sessions that failed with known issues, or resubmit your patch if it needs changes
- Request at least two Patch Inspection approvals on the Gerrit change (preferably from reviewers with experience in this area of code, as indicated by contrib/scripts/get_maintainer.pl or on the Code Reviewers page)
- Record the Gerrit change URL in the Jira ticket (this is normally done automatically by Gerrit)
- Additional test results appropriate for the patch being landed (interoperability, performance, etc) should be attached to the Jira ticket.
- Once Maloo has reported the tests all pass and the patch has two positive reviews (other than the Author), the Gatekeeper will automatically be notified that the patch is ready to be merged
- The Gatekeeper will review the patch, confirm the test results and inspections, run additional merge testing, and submit it after local merge testing. This will take about a week after the patch is first ready to land.
- If the submission failed due to patch conflict(s) or regressions, the Gatekeeper will ask you to rebase your patch with the target branch and repeat the above steps.
- If your patch is needed for other branch(es), please repeat the above steps against the corresponding branch(es):
  - Use the same Jira ticket and Gerrit Change-Id labels
  - Add Lustre-commit: and Lustre-change: labels
  - Remove the Reviewed-on: and Tested-by: labels from the previous commit message
  - If the patch has no conflicts, it can be cherry-picked to the other branches directly from Gerrit after editing the commit message as described.
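The commit message edits for landing on another branch can be sketched in a few lines of Python. This is only an illustration of the checklist items above, not a project tool: the function name `port_commit_message` and its parameters `orig_commit` and `orig_change` are hypothetical, the values to place after the Lustre-commit: and Lustre-change: labels are supplied by the caller, and appending those labels at the end of the message is an assumption.

```python
# Labels that record review/test results of the original change; the checklist
# says to remove them from the previous commit message when porting.
STALE_LABELS = ("Reviewed-on:", "Tested-by:")


def port_commit_message(message: str, orig_commit: str, orig_change: str) -> str:
    """Return a commit message edited for landing on another branch.

    orig_commit and orig_change are hypothetical parameters holding the
    values to put after the Lustre-commit: and Lustre-change: labels.
    """
    kept = [
        line
        for line in message.rstrip("\n").split("\n")
        if not line.startswith(STALE_LABELS)
    ]
    # Change-Id: and Signed-off-by: lines are left untouched, so the ported
    # patch keeps the same Gerrit Change-Id as the original.
    # Assumption: the new labels are appended at the end of the message.
    kept.append(f"Lustre-change: {orig_change}")
    kept.append(f"Lustre-commit: {orig_commit}")
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # Entirely made-up commit message, for illustration only.
    original = (
        "LU-0000 example: hypothetical summary line\n"
        "\n"
        "Body text.\n"
        "\n"
        "Signed-off-by: A Developer <dev@example.com>\n"
        "Change-Id: I0123456789abcdef\n"
        "Reviewed-on: https://review.example.com/1\n"
        "Tested-by: Some Tester <tester@example.com>\n"
    )
    print(port_commit_message(original, "<original commit>", "<original change>"), end="")
```

The same Jira ticket number stays in the summary line, and the edited message can then be used when cherry-picking the change in Gerrit.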
Landing a Feature to a Feature Release
There are typically new releases made every 6 months. Check the Projects page to find out if someone is already working on your project. At the beginning of the development cycle, the features that will be included in the upcoming release are decided, and a landing schedule is worked out to ensure that not all of the features try to land in the week before the feature freeze. The feature freeze will be roughly 3 months prior to the release date, depending on the number and scope of features that are to be landed.
Before thinking about the logistics of developing your feature, it is imperative to share your plans with the Lustre community.
You should check the Projects page to see if anyone is already working on something similar. If someone is, add yourself as a watcher to the Jira ticket and offer to collaborate.
If you are unable to find a match, open a new Jira ticket outlining your plans (more detail is better than less), including the intended purpose of the development and any initial thoughts on design. An entry should be added to the Projects page under Future Projects with a brief summary of your feature/project. Then, mail the lustre-devel mailing list to draw attention to the ticket. This will alert community members to your intentions and may well result in potential collaborators stepping forward.
If you choose not to do this, you may find that you are either duplicating work with someone else, or that your code needs to be reworked to accommodate other changes occurring in the same part of the Lustre code.
While features will not be scheduled for landing into a release until they are already close to completion, it is still important that the features themselves be discussed before or during early development. This allows developers to take into account other changes that are being worked on, to avoid conflicts in network protocol changes, code restructuring, and to ensure interoperability between releases.
It is also strongly suggested that you gain experience in the Lustre landing process by fixing one or more bugs for a maintenance release before attempting to tackle writing a Lustre feature. Look at the easy projects list, and/or feel free to ask for suggestions on the lustre-devel mailing list for a suitable bug to get started with.
Schedule and Timing
Lustre releases follow a "train model". The schedule is fixed and will not wait for features that are not ready in time; they are deferred to the next release.
History has shown that a lengthy stabilization period is needed after all features have landed to work through any bugs introduced by the new code or due to interactions with other features that were not caught by normal regression testing. If there is sufficient testing of intermediate development releases at a large enough scale, and the release branch is stable, additional features may be landed as time permits.
For a feature release scheduled for month T, the schedule is roughly as follows. For more precise dates, keep up to date on the lustre-devel mailing list.
- T-7 A call for features is sent out to the lustre-devel mailing list. The amount of change that can be landed for a given release is limited, so it is prudent to respond early if you feel that you will have a feature that warrants consideration for inclusion. Expect to be asked to provide the information on the Feature Landing Checklist below - either completed or with estimates as to when any missing portions will be completed. Typically, feature development is already well underway before a feature is scheduled for landing.
- T-6 Initial review of candidate features to define the scope of the release. A test plan is created and the Projects page is updated. A landing schedule is created so that feature landings are spaced out to make it easier for intermediate testing to identify when features introduce regressions. If serious regressions are found when a feature is landed then it will be reverted from master until the problems have been addressed. It should be obvious that not all changes can land in the last weeks before the feature freeze, hence the requirement that features already be close to completion before scheduling them for landing.
- T-3 Feature Freeze - feature landing is finished, bug fixes only from now on. The appropriate release page is updated.
- T-1 Code Freeze - critical bug fixes only from now on. A release candidate (RC) is tagged and release testing commences.
- T0 GA announced and RPMs available for download from the download site.
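The month offsets above can be turned into calendar months with a little arithmetic. The following is a minimal sketch, assuming a GA month is already known; the names `MILESTONES`, `shift_month`, and `release_schedule` are made up for this example, and the April 2025 GA month used below is purely hypothetical.

```python
from typing import Dict, Tuple

# Month offsets relative to the GA month T, taken from the schedule above.
MILESTONES = {
    "Call for features": -7,
    "Initial feature review": -6,
    "Feature freeze": -3,
    "Code freeze": -1,
    "GA": 0,
}


def shift_month(year: int, month: int, offset: int) -> Tuple[int, int]:
    """Shift a (year, month) pair by `offset` months, handling year rollover."""
    index = year * 12 + (month - 1) + offset
    return index // 12, index % 12 + 1


def release_schedule(ga_year: int, ga_month: int) -> Dict[str, Tuple[int, int]]:
    """Map each milestone name to the (year, month) in which it falls."""
    return {
        name: shift_month(ga_year, ga_month, offset)
        for name, offset in MILESTONES.items()
    }


if __name__ == "__main__":
    # Hypothetical GA month, for illustration only.
    for name, (year, month) in release_schedule(2025, 4).items():
        print(f"{name}: {year}-{month:02d}")
```

For a GA in April 2025, this places the call for features in September 2024 and the feature freeze in January 2025. The actual dates are only approximate, as noted above, and are announced on the lustre-devel mailing list.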
Feature Landing Checklist
- High level design has been reviewed and signed off by a senior Lustre engineer
- Test plan has been reviewed and signed off by a senior Lustre engineer. The test plan should include performance testing, version interoperability for both old and new servers and clients, and any feature-specific tests that may fall outside normal testing.
- Results from executing the test plan have been uploaded to Maloo.
- Proposed revisions to the manual have been provided.
- The criteria from the Patch Landing Checklist (above) are met.
Please alert other Lustre developers via the lustre-devel mailing list if you need some extra guidance in getting your patch submitted for the release.