Difference between revisions of "Using Maloo"

From Lustre Wiki
(Created page with "__NOTOC__ = Using Maloo = == Introduction == Maloo is Whamcloud's shared database of Lustre testing results, provided to make it easy for those testing Lustre to collaborat...")
 
(describe patch failure review process, general improvements/updates)
Line 5: Line 5:
 
== Introduction ==
 
== Introduction ==
  
Maloo is Whamcloud's shared database of Lustre testing results, provided to make it easy for those testing Lustre to collaborate and track progress. Maloo is available for interested parties to upload and view test data.
+
Maloo is Whamcloud's shared database of Lustre testing results, allowing those testing Lustre to collaborate and track progress. Maloo is available for interested parties to upload and view test data in order to review results for automated patch regression testing, help isolate intermittent test failures, and review past test results.  The Maloo database contains several years of test results, from testing every version of every patch submitted to the Lustre tree.  
  
 
== Getting started ==
 
== Getting started ==
  
The Maloo server is located at https://testing.whamcloud.com. At present, all access to view and upload results requires a user account. You can register for an account using the "Sign up now" link on the login page. A newly created account is not usable until an administrator validates it – we aim to validate accounts within 24 hours (usually much sooner).
+
Maloo is located at [https://testing.whamcloud.com]. At present, all access to view and upload results requires a user account in order to avoid abuse of a publicly-available system resource. You can register for an account using the [https://testing.whamcloud.com/signup New User?] link on the login page. A newly created account is not usable until an administrator validates it – we aim to validate accounts within 24 hours (usually much sooner).
  
Once your account has been validated, when you log in you will see a list of the latest test results.
+
Once your account has been validated, when you log in you will by default see a list of the latest test results.
  
== Viewing test results ==
+
== Viewing Patch Test Results ==
  
The main page view (''sessions'' from the menu at the top of the page) shows all test sessions which have been run, in reverse chronological order. This page shows the list of test runs uploaded, which can be filtered using the criteria available at the top of the page.
+
Most Lustre developers will interact with Maloo because of automatic test session results linked to patches in [[Using_Gerrit|Gerrit]].  Each patch will run several different ''test sessions'' for different system configurations (e.g. ''<code>review-dne-zfs-part-4</code>'', ''<code>review-dne-selinux-ssk-part-1</code>'', ''<code>review-ldiskfs-ubuntu</code>'', etc.) and each test session typically contains a number of ''test scripts'' (e.g. ''<code>sanity</code>'', ''<code>conf-sanity</code>'', ''<code>replay-single</code>'', ''<code>racer</code>'', etc.).  Each test script in a session may have a number of ''subtests'' (e.g. ''<code>test_1b</code>''), which are the individual test items that get marked PASS/FAIL, and their results are aggregated up to the session level.  The purpose of splitting the testing for a single patch into multiple test sessions is to be able to run these tests in parallel and obtain PASS/FAIL results for all sooner.
  
Each test session contains a number of test sets that represent the results for a single patch. Each test set in a session may have a number of sub tests, these are the individual test items that get marked pass/fail and their results are aggregated up to the session level. There will often be multiple test sessions for a single patch build, each one running a different test group. The purpose of splitting a single patch into multiple test groups is to be able to run these tests in parallel and obtain pass/fail results for all sooner.
+
Each session will report its status into Gerrit on the patch that triggered the testing in a format similar to that shown below:
 +
 
 +
    '''Maloo'''                                                                                                      10-16 07:56
 +
   
 +
    Patch Set 4:
 +
   
 +
    Failed enforced test ''review-dne-zfs-part-1'' on CentOS 8.3/x86_64 uploaded by Trevis Autotest 2 from trevis-206vm5:
 +
    https://testing.whamcloud.com/test_sessions/74f65aac-d2c0-47ed-8b3a-85d526cc4f31 ran 4 tests. '''1 tests failed''': sanity.
 +
 
 +
The unique link allows viewing the aggregate results of all test scripts run for that session (<code>PASS</code> results in green, <code>FAIL</code> results in red, <code>SKIP</code> results in orange), and further links allow drilling down into the results of the failing script(s), and the failing subtest(s) to see the test output, client and server console logs, as well as kernel debug logs.
  
 
== Searching Maloo ==
 
== Searching Maloo ==
  
On the top of each Maloo page the [results-&gt;search] menu item allows searching the test results by a variety of parameters. Commonly, one wants to search for specific test results (e.g. sanity) that have FAILed in the past week to see if a test failure is unique to the patch being tested, or if it is being hit intermittently for other patches as well. In addition to searching for the full test results, it is also possible to select the [Sub test] option at the top of the page to search for a specific subtest result (e.g. sanity test_102). Other search parameters include the test group, host distro, host architecture, backing filesystem type (ldiskfs or zfs), and others. 
+
On the top of each Maloo page the [https://testing.whamcloud.com/search Results-&gt;Search] menu item allows searching the test results by a variety of parameters. Commonly, one wants to search for specific <code>Sub test</code> results (e.g. [https://testing.whamcloud.com/search?horizon=518400&status%5B%5D=FAIL&test_set_script_id=f9516376-32bc-11e0-aaee-52540025f9ae&sub_test_script_id=ae2fb3e0-f265-11e8-815b-52540065bddc&source=sub_tests#redirect <code>sanity</code> subtest <code>test_418</code> that have <code>FAIL</code>ed in the past week]) to see if a test failure is unique to the patch being tested, or if it is being hit intermittently for other patches as well, and whether an existing JIRA ticket has already been associated with the failure. Available search parameters include the test group, date range, Git development branch, server filesystem type (ldiskfs or zfs), client or server distro, architecture, and others.
 +
 
 +
Because Maloo stores the results for millions of subtests, searches should typically be constrained to the smallest region of interest possible, to reduce the volume of returned results, and to minimize the search time.  Typically, test results are most relevant within the past week, and should be limited to a specific test script and subtest, as well as a failure type (<code>FAIL</code> or <code>TIMEOUT</code>).  When trying to isolate the introduction of an intermittent subtest failure, selecting a specific date range in 4- or 8-week intervals provides results in a reasonable timeframe.
 +
 
 +
== Annotating Test Failures ==
 +
 
 +
When reviewing test results for a patch, any test failures for that patch should be annotated with a specific LU ticket number in Jira to help track the source of failures, and to allow intermittent test failures to be resubmitted for testing.  The <code>Associate bug...</code> link at the bottom of the '''subtest''' failure should be used (e.g. <code>test_418</code> failed with LU-13997), rather than for the whole test script, unless the script reported a failure with no associated subtest failure.  If there is a test failure that is '''not''' associated with an existing LU ticket '''and''' has been seen on multiple different (unrelated) patches, then the <code>Raise bug...</code> button for the '''subtest''' can be used to file a new Jira ticket for tracking that failure.
 +
 
 +
When searching Maloo for specific <code>Sub test</code> failures, it is possible to mass-Associate the failed subtests to a specific Jira LU ticket number, if you are confident that all of the failures have the same cause.  One or more subtests can be selected in the leftmost column, and then <code>Associate bugs...</code> selected to specify the LU ticket number.
  
== Uploading test results ==
+
== Viewing All Sessions ==
  
''Please note that Maloo only accepts properly formatted result uploads.''
+
The main page view ([https://testing.whamcloud.com/test_sessions Sessions] from the menu at the top of the page) shows test sessions which have completed recently, in reverse chronological order. This page shows the list of test sessions uploaded, which can be filtered using the criteria available at the top of the page, such as the build number, client or server distro, CPU architecture, etc.
  
The format of test results is the YAML markup generated by the Lustre test framework. The output of the tests consists of a number of YAML files and log files. Maloo receives these results in a gzipped tar file of the tests results for each test session. The files must be in the root of the tar file.
+
== Uploading test results ==
  
 
=== Autotest ===
 
=== Autotest ===
  
The vast majority of test uploads are initiated by the Whamcloud Autotest application which continually kicks off test runs as patches land in the Lustre code repository, the application then monitors when the test completes and uploads the results automatically to Maloo.
+
The vast majority of test uploads are initiated by the Whamcloud Autotest application which automatically runs test sessions as patches are pushed to Gerrit, the application then monitors when the test completes and uploads the results automatically to Maloo.
  
 
=== Manual upload ===
 
=== Manual upload ===
  
Select [results-&gt;upload] from the menu at the top of the page to access the manual upload page, there are three ways to manually upload results:
+
''Please note that Maloo only accepts properly formatted result uploads.''
 +
 
 +
The format of test results is the YAML markup generated by the Lustre test framework. The output of the tests consists of a number of YAML files and log files. Maloo receives these results in a gzipped tar file of the tests results for each test session. The files must be in the root of the tar file.
 +
 
 +
Select [https://testing.whamcloud.com/import_tasks/new Results-&gt;Upload] link from the menu at the top of the page to access the manual upload page.  There are three ways to manually upload results:
  
 
==== Web Upload ====
 
==== Web Upload ====
Line 41: Line 62:
 
If you have a set of results that need to be uploaded, simply choose the file and select upload from the page, the results will be available a few seconds after the upload completes.
 
If you have a set of results that need to be uploaded, simply choose the file and select upload from the page, the results will be available a few seconds after the upload completes.
  
==== command-line ====
+
==== Command-line ====
  
For command-line uploads you will need to download the .maloorc configuration file which is available from the [results-&gt;upload] page.
+
For command-line uploads you will need to download the <code>.maloorc</code> configuration file which is available from the [https://testing.whamcloud.com/import_tasks/new Results-&gt;Upload] page.
  
 
===== Using auster =====
 
===== Using auster =====
  
The auster script in the lustre source tree can automatically upload results when a test run completes. In order to do this, the .maloorc file is required. This file provides the address of the server, and a unique token associating uploaded results with your user account.
+
The <code>lustre/tests/auster</code> script in the Lustre source tree can automatically upload results when a test run completes. In order to do this, the <code>.maloorc</code> file is required. This file provides the address of the server, and a unique token associating uploaded results with your user account.
  
 
=====  maloo_upload.sh =====
 
=====  maloo_upload.sh =====
  
You can download the maloo_upload.sh shell script and configuration file from the [results-&gt;upload] page which can be used to perform uploads from the command line.
+
You can use the <code>lustre/tests/maloo_upload.sh</code> shell script in the Lustre source tree and configuration file from the [https://testing.whamcloud.com/import_tasks/new Results-&gt;Upload] page which can be used to perform uploads from the command line.
  
  
 
[[Category:Howto]]
 
[[Category:Howto]]

Revision as of 22:56, 19 October 2021


Using Maloo

Introduction

Maloo is Whamcloud's shared database of Lustre testing results, allowing those testing Lustre to collaborate and track progress. Maloo is available for interested parties to upload and view test data in order to review results for automated patch regression testing, help isolate intermittent test failures, and review past test results. The Maloo database contains several years of test results, from testing every version of every patch submitted to the Lustre tree.

Getting started

Maloo is located at https://testing.whamcloud.com. At present, all access to view and upload results requires a user account in order to avoid abuse of a publicly available system resource. You can register for an account using the New User? link on the login page. A newly created account is not usable until an administrator validates it; we aim to validate accounts within 24 hours (usually much sooner).

Once your account has been validated, when you log in you will by default see a list of the latest test results.

Viewing Patch Test Results

Most Lustre developers will interact with Maloo because of automatic test session results linked to patches in Gerrit. Each patch will run several different test sessions for different system configurations (e.g. review-dne-zfs-part-4, review-dne-selinux-ssk-part-1, review-ldiskfs-ubuntu, etc.) and each test session typically contains a number of test scripts (e.g. sanity, conf-sanity, replay-single, racer, etc.). Each test script in a session may have a number of subtests (e.g. test_1b), which are the individual test items that get marked PASS/FAIL, and their results are aggregated up to the session level. The purpose of splitting the testing for a single patch into multiple test sessions is to be able to run these tests in parallel and obtain PASS/FAIL results for all of them sooner.
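
The session, test script, and subtest hierarchy described above can be pictured with a small Python sketch. It is only an illustration: the class names are hypothetical, and the aggregation rule (a script fails if any subtest is FAIL or TIMEOUT, and a session fails if any script fails) is an assumption, not Maloo's documented behaviour.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumption: these statuses count as failures when aggregating upwards.
FAILING = {"FAIL", "TIMEOUT"}


@dataclass
class ScriptResult:
    """One test script (e.g. sanity) with its subtest statuses."""
    name: str
    subtests: Dict[str, str] = field(default_factory=dict)  # subtest -> status

    def status(self) -> str:
        failed = any(s in FAILING for s in self.subtests.values())
        return "FAIL" if failed else "PASS"


@dataclass
class SessionResult:
    """One test session (e.g. review-dne-zfs-part-1) holding several scripts."""
    name: str
    scripts: List[ScriptResult] = field(default_factory=list)

    def failed_scripts(self) -> List[str]:
        return [s.name for s in self.scripts if s.status() == "FAIL"]

    def status(self) -> str:
        return "FAIL" if self.failed_scripts() else "PASS"
```

With this model, a session containing a sanity script whose test_418 subtest is FAIL would report FAIL at the session level and list sanity as the failing script.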

Each session will report its status into Gerrit on the patch that triggered the testing in a format similar to that shown below:

   Maloo                                                                                                       10-16 07:56
   
   Patch Set 4:
   
   Failed enforced test review-dne-zfs-part-1 on CentOS 8.3/x86_64 uploaded by Trevis Autotest 2 from trevis-206vm5:
   https://testing.whamcloud.com/test_sessions/74f65aac-d2c0-47ed-8b3a-85d526cc4f31 ran 4 tests. 1 tests failed: sanity.

The unique link allows viewing the aggregate results of all test scripts run for that session (PASS results in green, FAIL results in red, SKIP results in orange), and further links allow drilling down into the results of the failing script(s), and the failing subtest(s) to see the test output, client and server console logs, as well as kernel debug logs.
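
For scripted triage, the session link and the failing script names can be pulled out of a Maloo comment like the one shown above. The following Python sketch assumes the comment always follows that sample wording, and that multiple failing scripts are listed comma-separated; both are assumptions drawn from the single example, and the function name is hypothetical.

```python
import re

# Session links are assumed to end in a 36-character UUID, as in the sample.
SESSION_URL = re.compile(
    r"https://testing\.whamcloud\.com/test_sessions/[0-9a-f-]{36}"
)
# Matches e.g. "ran 4 tests. 1 tests failed: sanity."
SUMMARY = re.compile(
    r"ran (\d+) tests\.\s+(\d+) tests failed:\s*(.*?)\.?\s*$", re.MULTILINE
)


def parse_maloo_comment(text):
    """Return session URL, counts, and failing scripts, or None if no match."""
    url = SESSION_URL.search(text)
    summary = SUMMARY.search(text)
    if url is None or summary is None:
        return None
    failed = [n.strip() for n in summary.group(3).split(",") if n.strip()]
    return {
        "session_url": url.group(0),
        "tests_run": int(summary.group(1)),
        "tests_failed": int(summary.group(2)),
        "failed_scripts": failed,
    }
```

The sketch only handles the failure message format shown above; comments for passing sessions are not covered because their wording does not appear on this page.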

Searching Maloo

At the top of each Maloo page, the Results->Search menu item allows searching the test results by a variety of parameters. Commonly, one wants to search for specific Sub test results (e.g. sanity subtest test_418 results that have FAILed in the past week) to see whether a test failure is unique to the patch being tested or is being hit intermittently for other patches as well, and whether an existing Jira ticket has already been associated with the failure. Available search parameters include the test group, date range, Git development branch, server filesystem type (ldiskfs or zfs), client or server distro, architecture, and others.

Because Maloo stores the results for millions of subtests, searches should typically be constrained to the smallest region of interest possible, to reduce the volume of returned results, and to minimize the search time. Typically, test results are most relevant within the past week, and should be limited to a specific test script and subtest, as well as a failure type (FAIL or TIMEOUT). When trying to isolate the introduction of an intermittent subtest failure, selecting a specific date range in 4- or 8-week intervals provides results in a reasonable timeframe.
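
A constrained subtest search can also be expressed as a URL. The Python sketch below rebuilds the example search link embedded in this page's wiki source (sanity test_418, status FAIL); the query parameter names horizon, status[], test_set_script_id, sub_test_script_id, and source are taken from that one link, and it is an assumption that Maloo accepts other values in the same way. The function name is hypothetical, and the script IDs must be copied from an existing Maloo search, since this page does not describe how to look them up.

```python
from urllib.parse import urlencode

MALOO_SEARCH = "https://testing.whamcloud.com/search"


def build_subtest_search_url(test_set_script_id, sub_test_script_id,
                             statuses=("FAIL",), horizon_seconds=518400):
    """Build a Sub test search URL limited to one script, subtest, and status.

    horizon_seconds is the look-back window; the example link uses 518400,
    which is six days.
    """
    params = [("horizon", horizon_seconds)]
    # One "status[]" entry per requested status, e.g. FAIL and TIMEOUT.
    params += [("status[]", status) for status in statuses]
    params += [
        ("test_set_script_id", test_set_script_id),
        ("sub_test_script_id", sub_test_script_id),
        ("source", "sub_tests"),
    ]
    return f"{MALOO_SEARCH}?{urlencode(params)}#redirect"
```

Opening the resulting URL still requires a logged-in Maloo account, as with any other page on the server.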

Annotating Test Failures

When reviewing test results for a patch, any test failures for that patch should be annotated with a specific LU ticket number in Jira to help track the source of failures, and to allow intermittent test failures to be resubmitted for testing. The Associate bug... link at the bottom of the subtest failure should be used (e.g. test_418 failed with LU-13997), rather than for the whole test script, unless the script reported a failure with no associated subtest failure. If there is a test failure that is not associated with an existing LU ticket and has been seen on multiple different (unrelated) patches, then the Raise bug... button for the subtest can be used to file a new Jira ticket for tracking that failure.

When searching Maloo for specific Sub test failures, it is possible to mass-Associate the failed subtests to a specific Jira LU ticket number, if you are confident that all of the failures have the same cause. One or more subtests can be selected in the leftmost column, and then Associate bugs... selected to specify the LU ticket number.

Viewing All Sessions

The main page view (Sessions from the menu at the top of the page) shows test sessions which have completed recently, in reverse chronological order. This page shows the list of test sessions uploaded, which can be filtered using the criteria available at the top of the page, such as the build number, client or server distro, CPU architecture, etc.

Uploading test results

Autotest

The vast majority of test uploads are initiated by the Whamcloud Autotest application, which automatically runs test sessions as patches are pushed to Gerrit. The application then monitors each test session until it completes and uploads the results to Maloo automatically.

Manual upload

Please note that Maloo only accepts properly formatted result uploads.

The format of test results is the YAML markup generated by the Lustre test framework. The output of the tests consists of a number of YAML files and log files. Maloo receives these results in a gzipped tar file of the tests results for each test session. The files must be in the root of the tar file.
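
The packaging requirement above (a gzipped tar file with all files in its root) can be met with a short Python sketch such as the following. The function names and the idea of a single flat results directory are assumptions for illustration; the sketch does not check that the YAML content itself is what Maloo expects.

```python
import tarfile
from pathlib import Path


def pack_results(results_dir, archive_path):
    """Pack every regular file in results_dir into the root of a .tar.gz."""
    results_dir = Path(results_dir)
    with tarfile.open(str(archive_path), "w:gz") as tar:
        for path in sorted(results_dir.iterdir()):
            if path.is_file():
                # arcname drops the directory prefix so that the file
                # lands in the root of the archive.
                tar.add(str(path), arcname=path.name)
    return Path(archive_path)


def members_outside_root(archive_path):
    """List archive members that are not in the root of the tar file."""
    with tarfile.open(str(archive_path), "r:gz") as tar:
        return [m.name for m in tar.getmembers() if "/" in m.name.strip("/")]
```

Running members_outside_root on an archive before uploading gives a quick check that nothing was packed inside a subdirectory.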

Select the Results->Upload link from the menu at the top of the page to access the manual upload page. There are three ways to manually upload results:

Web Upload

If you have a set of results that needs to be uploaded, simply choose the file and select upload from the page. The results will be available a few seconds after the upload completes.

Command-line

For command-line uploads you will need to download the .maloorc configuration file, which is available from the Results->Upload page.

Using auster

The lustre/tests/auster script in the Lustre source tree can automatically upload results when a test run completes. In order to do this, the .maloorc file is required. This file provides the address of the server, and a unique token associating uploaded results with your user account.

maloo_upload.sh

You can perform uploads from the command line using the lustre/tests/maloo_upload.sh shell script in the Lustre source tree, together with the configuration file from the Results->Upload page.