Lustre Wiki - User contributions [en] (Atom feed, MediaWiki 1.39.3, retrieved 2024-03-29T09:37:15Z)

----
Simple Gerrit Builder Howto (2019-11-25T22:58:08Z) Green: Added link to bigger lustretester
----
== Simple build testing framework for Gerrit patches ==

To ensure Lustre builds and works properly on other architectures, we need to run some tests on every commit, so that patches that break those architectures/distributions are flagged before they are committed to the tree.

This page explains how to set up a simple build bot for the desired environment. Extending it to perform additional verifications is left as an exercise for the reader.

==== Step-by-step guide ====

# Download [http://linuxhacker.ru/lustre-gerrit/gerrit_buildpatch.py gerrit_buildpatch.py] and [http://linuxhacker.ru/lustre-gerrit/run_build.sh run_build.sh]
# Install Python and the python-requests module
# Choose a working directory for the framework; this is the directory where the framework will keep its data
# Go to Gerrit and either create a new account or log into an existing one you'd like to use
# Go to the Gerrit account settings and switch to the "HTTP Password" tab. If you do not already have a username and password, select a username and generate a new password
# In the working directory, create a file named GERRIT_AUTH with this content, replacing USERNAME and PASSWORD with the username and password selected (the credentials are used for Gerrit's REST API; see the curl sketch after this list):
 {
   "review.whamcloud.com": {
     "gerrit/http": {
       "username": "USERNAME",
       "password": "PASSWORD"
     }
   }
 }
# <li value="7">Check out the Lustre tree from Git</li>
# Build a kernel against which you want to build Lustre
# Apply any necessary patches to the kernel, then configure and build it
# Try to build Lustre against the just-built kernel to ensure that the build actually works and all necessary -devel packages are installed
# Edit run_build.sh to update the path to your built kernel tree and to where you checked out the Lustre source
# Edit gerrit_buildpatch.py and update at least CHECKPATCH_PATHS to point at run_build.sh, and BUILDER_TYPE to describe what your builder is testing
# In the working directory, create the REVIEW_HISTORY file with echo "0 - - 0" >REVIEW_HISTORY. A leading 0 starts at the last 500 patches or so; to start from a specific point, replace the 0 with a Unix time in seconds, e.g. the current time minus one hour (1437884668 at the time of writing)
# Now run gerrit_buildpatch.py to make sure all is working as intended. Watch /tmp/builder_out.txt for build progress and the gerrit_buildpatch.py output for overall progress
# Stop gerrit_buildpatch.py
# Edit gerrit_buildpatch.py and change the self.post_enabled setting from False to True; this enables posting of the build results
# Update REVIEW_HISTORY: add a line referencing more or less the current time as the last line, to avoid building stale changes for no good reason
# Add gerrit_buildpatch.py to your startup scripts or otherwise make it run automatically (mind the current working directory, which must always be where the auth and history files are)
# Once you are really sure the results are sound and you want to make your testing results "binding", you can also set USE_CODE_REVIEW_SCORE in gerrit_buildpatch.py to True, so that a build failure sets a real -1 review score
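For orientation, the two Gerrit REST operations the framework builds on can be exercised by hand with curl. This is only a sketch of the API shape, not what the script literally does; the query string, label, and the CHANGE_ID/REVISION_ID placeholders are illustrative:

 # List open changes on the Lustre project (the /a/ prefix means authenticated):
 curl -u USERNAME:PASSWORD \
     "https://review.whamcloud.com/a/changes/?q=project:fs/lustre-release+status:open"
 # Note: Gerrit prefixes JSON responses with )]}' to defeat XSSI; strip it before parsing.
 
 # Post a build result as a review on one revision of a change; with
 # USE_CODE_REVIEW_SCORE enabled the script would also set a label, e.g.:
 curl -u USERNAME:PASSWORD -X POST -H "Content-Type: application/json" \
     -d '{"message": "Build failed: ...", "labels": {"Code-Review": -1}}' \
     "https://review.whamcloud.com/a/changes/CHANGE_ID/revisions/REVISION_ID/review"
 # Older Gerrit releases use HTTP digest auth; add --digest if basic auth is rejected.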


==== Now some explanations of the constraints ====

run_build.sh depends on a prebuilt kernel source tree (on my old PowerMac a kernel build takes over 1 hour, so it's impractical to build the kernel for every patch). Its outputs are the exit code and stdout. If the exit code is 0, the build is considered a success and the output is ignored. If the code is not 0, the framework appends the output to the Gerrit comment. Currently the output is the last 20 lines preceding the failure, but you could instead copy the whole build output to some web server and print a link to it, or do something similar. Currently only a very minimal Lustre build is performed, but you can extend it as much as you want, including a full RPM build if desired.
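To make that contract concrete, here is a minimal sketch of a script honoring it. This is not the real run_build.sh (download that above); the paths and the job count are assumptions:

 #!/bin/bash
 # Sketch: build Lustre against a prebuilt kernel tree; exit 0 on success.
 KERNEL=/home/builder/kernel/linux         # prebuilt, patched, configured kernel
 LUSTRE=/home/builder/lustre-release       # checked-out Lustre tree under test
 cd "$LUSTRE" || exit 1
 {
     sh autogen.sh &&
     ./configure --with-linux="$KERNEL" &&
     make -j4
 } >/tmp/builder_out.txt 2>&1
 rc=$?
 # On failure, emit the tail of the log; the framework appends stdout to the Gerrit comment.
 [ "$rc" -ne 0 ] && tail -20 /tmp/builder_out.txt
 exit $rc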

It's possible to extend run_build.sh to also kick off some basic testing in a VM or elsewhere (e.g. an nfs-root hosted VM that runs a subset of sanity.sh with one or several nodes right out of the build tree) and either include the results in the same run (simple) or implement some sort of dispatcher that runs those tests in parallel and posts test results as they complete (more involved; you'll need to write a new posting module for this too). One attempt at achieving this is implemented at https://github.com/verygreen/lustretester

Finally, run_build.sh has some extra locking in place that allows you to pause builds while you perform maintenance, without stopping the gerrit_buildpatch script. A simple maintenance script is provided as [http://linuxhacker.ru/lustre-gerrit/tree_update.sh tree_update.sh], which you would run from cron once a day to update the git repo. You might want to extend it to e.g. refresh the kernel you are using with new patches every day, update the kernel and other system component versions (zfs?), and so on.
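To illustrate how the pause locking and the cron updater can fit together (purely a sketch; the real scripts define their own lock file and paths):

 #!/bin/bash
 # Hypothetical tree updater: hold the same lock run_build.sh checks, so a
 # build in progress finishes and new builds wait while the tree is updated.
 exec 9>/var/lock/lustre-builder.lock
 flock 9
 cd /home/builder/lustre-release || exit 1
 git fetch origin && git reset --hard origin/master
 # The lock is released when the script exits and fd 9 is closed.
 # Example crontab entry to run it daily at 04:00:
 #   0 4 * * * /home/builder/tree_update.sh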
[[Category:Development]]

----
Testing Setup To Induce Race Conditions (2015-07-27T01:18:41Z) Green: /* My particular setup */
----
=== Rationale ===
Typically, developer-scale testing only ensures the basic correctness of the code, but as complex products such as Lustre get deployed on really large-scale systems and subjected to really high loads, all sorts of unlikely race conditions and failure modes tend to crop up.
This write-up explains how to achieve much harsher testing on regular hardware, without involving super-scaled systems.
The approach turned out to be a lot more powerful than originally anticipated; for example, early in its life, a rare race condition that took about a week to manifest itself on a Top 10 class supercomputer took only about 15 minutes to hit in this setup.

=== Opening up race windows: theory ===
Typically, race conditions have a very small race window, sometimes only one CPU instruction long, so they are hard to hit. This is where virtual machines come to the rescue. While it's feasible to create a full CPU emulator with random delays between every instruction, that would be too labor-intensive.
An alternative approach is to create several virtual machines with many CPU cores allocated, such that the total number of cores across these VMs is much larger than the actual number of CPU cores available on the host. When all of these virtual machines run CPU-heavy loads at the same time, the host kernel preempts them at random intervals, introducing big delays into what one core perceives as a single instruction stream while the other cores in the same VM continue at full speed. Additional CPU pressure can be exerted from outside the virtual machines by running some other CPU-heavy loads, as sketched below.

When creating these VMs, another important consideration is memory allocation: we don't really want the virtual machines to dip into swap, as that would make them really slow.
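As a concrete example of that last point, a few throwaway busy loops on the host will do (the count of 8 is arbitrary):

 # Spawn 8 low-priority CPU hogs on the host to increase preemption of the VMs:
 for i in $(seq 1 8); do nice -n 19 sh -c 'while :; do :; done' & done
 # Later, from the same shell: kill %1 %2 %3 %4 %5 %6 %7 %8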

=== My particular setup ===
Initially I had two systems at my disposal: a 4-core i7 with HT (showing 8 cores to the host) desktop with 32G of RAM, and a 4-core mobile i7 with HT (also showing 8 cores to the host) laptop with 16G of RAM.
I decided that 3G of RAM should be enough for the virtual machines in question, which gave me 7 VMs on the desktop (occupying 21G of RAM) and 4 VMs on the laptop (occupying 12G of RAM). Every VM also got a dedicated "virtual block device", backed by an SSD, that it used as swap. Virtual-CPU-wise, every VM got 8 CPU cores allocated.
Here's the config of one such VM (libvirt on Fedora is used):
 <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
   <name>centos6-0</name>
   <uuid>c80b11ad-552b-aaaa-9bdf-48807db09054</uuid>
   <memory unit='KiB'>3097152</memory>
   <currentMemory unit='KiB'>3097152</currentMemory>
   <vcpu>8</vcpu>
   <os>
     <type arch='x86_64' machine='pc'>hvm</type>
     <boot dev='network'/>
   </os>
   <features>
     <acpi/>
     <apic/>
     <pae/>
   </features>
   <clock offset='utc'/>
   <on_poweroff>destroy</on_poweroff>
   <on_reboot>restart</on_reboot>
   <on_crash>restart</on_crash>
   <devices>
     <emulator>/usr/libexec/qemu-kvm</emulator>
     <disk type='block' device='disk'>
       <driver name='qemu' type='raw' cache='none'/>
       <source dev='/dev/vg_intelbox/centos6-0'/>
       <target dev='vda' bus='virtio'/>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
     </disk>
     <controller type='usb' index='0'>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
     </controller>
     <controller type='virtio-serial' index='0'>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
     </controller>
     <interface type='bridge'>
       <mac address='52:54:00:a1:ce:de'/>
       <source bridge='br1'/>
       <model type='virtio'/>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
     </interface>
     <serial type='pty'>
       <target port='0'/>
     </serial>
     <console type='pty'>
       <target type='serial' port='0'/>
     </console>
     <channel type='spicevmc'>
       <target type='virtio' name='com.redhat.spice.0'/>
       <address type='virtio-serial' controller='0' bus='0' port='1'/>
     </channel>
     <input type='tablet' bus='usb'/>
     <input type='mouse' bus='ps2'/>
     <graphics type='spice' autoport='yes'/>
     <video>
       <model type='qxl' vram='65536' heads='1'/>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
     </video>
     <memballoon model='virtio'>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
     </memballoon>
   </devices>
   <qemu:commandline>
     <qemu:arg value='-gdb'/>
     <qemu:arg value='tcp::1200'/>
   </qemu:commandline>
 </domain>

To conserve space and effort, all these VMs are network-booted and nfs-rooted off the same source NFS root (Red Hat based distros make this really easy; I cannot find the exact howto I followed, but it was roughly along the lines of [https://fedoraproject.org/wiki/Features/Opensharedroot Open Shared Root]).
The build happens inside a session chrooted into the NFS root, which allows me to then go into that same directory in every VM and run the tests out of the build tree directly, without any need for building RPMs and such.
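A sketch of that workflow (the NFS-root path is an assumption; the build directory matches the test commands later on this page):

 # On the machine exporting the NFS root: build once, inside the shared root.
 chroot /srv/nfsroot /bin/bash -c \
     'cd /home/green/git/lustre-release && sh autogen.sh && ./configure && make -j8'
 # Every VM then sees the built tree at /home/green/git/lustre-release.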
It's also very important to configure kernel crash dumping support.
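On a RHEL6-style guest, that boils down to reserving crash-kernel memory and enabling the kdump service (a sketch; the reservation size is an assumption):

 # Add to the guest kernel command line in the bootloader config:
 crashkernel=128M
 # Enable and start kdump; vmcores land under /var/crash by default:
 chkconfig kdump on
 service kdump start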

=== Additional protections with kernel debug options ===
Since we are after correctness here, another natural thing to do is to enable all the heavy-handed kernel checks that are typically left off in production deployments: especially really expensive things like unmapping of freed memory, which makes use-after-free errors very easy to detect, but also spinlock checking and other useful checks.
This is what the kernel-debugging part of my kernel config looks like for the RHEL6 kernel (it's even more extensive for newer ones):
 CONFIG_DEBUG_KERNEL=y
 CONFIG_DEBUG_SHIRQ=y
 CONFIG_DETECT_SOFTLOCKUP=y
 CONFIG_LOCKUP_DETECTOR=y
 CONFIG_HARDLOCKUP_DETECTOR=y
 CONFIG_BOOTPARAM_HARDLOCKUP_ENABLED=y
 CONFIG_BOOTPARAM_HARDLOCKUP_ENABLED_VALUE=1
 CONFIG_DETECT_HUNG_TASK=y
 CONFIG_SCHED_DEBUG=y
 CONFIG_SCHEDSTATS=y
 CONFIG_DEBUG_NMI_TIMEOUT=30
 CONFIG_TIMER_STATS=y
 CONFIG_DEBUG_OBJECTS=y
 # CONFIG_DEBUG_OBJECTS_SELFTEST is not set
 CONFIG_DEBUG_OBJECTS_FREE=y
 # CONFIG_DEBUG_OBJECTS_TIMERS is not set
 CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT=1
 CONFIG_DEBUG_SLAB=y
 CONFIG_DEBUG_SPINLOCK=y
 CONFIG_DEBUG_MUTEXES=y
 CONFIG_DEBUG_SPINLOCK_SLEEP=y
 CONFIG_STACKTRACE=y
 CONFIG_DEBUG_BUGVERBOSE=y
 CONFIG_DEBUG_INFO=y
 # CONFIG_DEBUG_VM is not set
 # CONFIG_DEBUG_VIRTUAL is not set
 CONFIG_DEBUG_WRITECOUNT=y
 CONFIG_DEBUG_MEMORY_INIT=y
 CONFIG_DEBUG_LIST=y
 CONFIG_ARCH_WANT_FRAME_POINTERS=y
 CONFIG_FRAME_POINTER=y
 CONFIG_DEBUG_PAGEALLOC=y

=== Running actual tests ===
Since the majority of the bugs we are aiming at here manifest as crashes, I decided to have groups of tests run in infinite loops until something crashes (this is where kernel crash dumps become useful) or hangs (this is where the gdb support you can see in my VM config is useful; you can also forcefully dump the core of a hung VM with the virsh dump command, and the crash tool knows how to use such dumps too, which is convenient at times).
Because of these choices, the testing discussed here is not a replacement for regular sanity testing, where you do want to look at ordinary test failures.
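For example, when a VM hangs, the debugging hooks from the domain XML above can be used like this (the vmlinux path and host name are placeholders; port 1200 matches the -gdb argument):

 # Attach gdb to the qemu gdb stub exposed by '-gdb tcp::1200':
 gdb /path/to/guest-debug/vmlinux
 (gdb) target remote vmhost:1200
 
 # Or force a memory-only core dump of the hung VM for offline analysis:
 virsh dump --memory-only centos6-0 /var/tmp/centos6-0.core
 crash /path/to/guest-debug/vmlinux /var/tmp/centos6-0.core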

So far, all my testing starts with:

 slogin root@$VMTESTNODE
 cd /home/green/git/lustre-release/lustre/tests

and then one of the actual testing lines below:
 while :; do rm -rf /tmp/* ; SLOW=yes REFORMAT=yes sh sanity.sh ; sh llmountcleanup.sh ; rm -rf /tmp/* ; SLOW=yes REFORMAT=yes sh sanityn.sh ; sh llmountcleanup.sh ; done

 while :; do rm -rf /tmp/* ; SLOW=yes REFORMAT=yes DURATION=$((900*3)) PTLDEBUG="vfstrace rpctrace dlmtrace neterror ha config ioctl super cache" DEBUG_SIZE=100 sh racer.sh ; sh llmountcleanup.sh ; done

 SLOW=yes TSTID=500 TSTID2=499 TSTUSR=green TSTUSR2=saslauth sh sanity-quota.sh

 while :; do rm -rf /tmp/* ; SLOW=yes REFORMAT=yes sh replay-single.sh ; sh llmountcleanup.sh ; SLOW=yes REFORMAT=yes sh replay-ost-single.sh ; sh llmountcleanup.sh ; SLOW=yes REFORMAT=yes sh replay-dual.sh ; sh llmountcleanup.sh ; done

 while :; do rm -rf /tmp/* ; SLOW=yes REFORMAT=yes sh recovery-small.sh ; sh llmountcleanup.sh ; done

 while :; do rm -rf /tmp/* ; SLOW=yes REFORMAT=yes sh conf-sanity.sh ; sh llmountcleanup.sh ; for i in `seq 0 7` ; do losetup -d /dev/loop$i ; done ; rm -rf /tmp/* ; done

Typically, with 11 VMs at my disposal, I have 2 running the sanity* loop, and 3 each running the racer loop, the replay* loop, and the recovery-small loop.

The conf-sanity loop is rarely used, since it's currently broken when run out of a build tree.
As you can see, quite a few extra tests could also be made to run like this that I am not yet running.
----
How To (2015-07-26T23:42:46Z) Green
----
A list of Lustre-related How-To guides.

* [http://wiki.lustre.org/KVM_Quick_Start_Guide How to set up a virtual Lustre cluster on KVM]
* [https://wiki.hpdd.intel.com/display/PUB/Testing+a+Lustre+filesystem How to test a Lustre filesystem] (Intel HPDD Wiki page)
* [http://wiki.lustre.org/TestingLustreCode How to test Lustre Code]
* [http://wiki.lustre.org/LibLustre_How-To_Guide How to use LibLustre]
* [http://wiki.lustre.org/Lustre_with_ZFS_Install How to install Lustre on ZFS]
* [http://wiki.lustre.org/MDT_Mirroring_with_ZFS_and_SRP How to mirror an MDT with ZFS and SRP]
* [http://wiki.lustre.org/Simple_Gerrit_Builder_Howto How to add your own builder or testing bot to Lustre gerrit]

----
Testing (2015-07-26T23:40:33Z) Green
----
* [[Test Descriptions | Test Framework and Descriptions of Unit, Regression, and Feature Tests]]
* [[Test Configuration Variables | Test Configuration and Environment Variables]]
* [[Testing Howto | How To Run Lustre Tests]]
* [http://wiki.lustre.org/Testing_Setup_To_Induce_Race_Conditions Testing setup to induce otherwise hard-to-hit race conditions]

----

Intel HPDD's testing results database for patches: [https://testing.hpdd.intel.com/ Maloo]

Intel HPDD Wiki pages related to testing:
* [https://wiki.hpdd.intel.com/display/PUB/Testing+a+Lustre+filesystem Testing a Lustre filesystem]
* [https://wiki.hpdd.intel.com/display/PUB/Lustre+Test+Tools+Environment+Variables Lustre Test Tools Environment Variables]
* [https://wiki.hpdd.intel.com/display/PUB/Auster Auster]

wiki.old.lustre.org pages related to testing:
* [http://wiki.old.lustre.org/index.php/Acceptance_Small_%28acc-sm%29_Testing_on_Lustre Acceptance Small (acc-sm) Testing on Lustre]
* [http://wiki.old.lustre.org/index.php/Testing_Lustre_Code Testing Lustre Code]

[[Category:Testing]]
<hr />
<div></div>Greenhttp://wiki.lustre.org/index.php?title=SimpleGerritBuilderHowto&diff=820SimpleGerritBuilderHowto2015-07-26T23:37:39Z<p>Green: Green moved page SimpleGerritBuilderHowto to Simple Gerrit Builder Howto</p>
<hr />
<div>#REDIRECT [[Simple Gerrit Builder Howto]]</div>Greenhttp://wiki.lustre.org/index.php?title=Simple_Gerrit_Builder_Howto&diff=819Simple Gerrit Builder Howto2015-07-26T23:37:39Z<p>Green: Green moved page SimpleGerritBuilderHowto to Simple Gerrit Builder Howto</p>
<hr />
<div>== Simple build testing framework for gerit patches ==<br />
<br />
To ensure Lustre is properly building and working on other architectures, we need to run some tests on every commit to get early warnings for patches that break those architectures/distributions before the patches get committed to the tree.<br />
<br />
This page explains how to set up a simple build bot for the desired environment. Extending it to perform additional verifications is left as an exercise for the reader.<br />
Step-by-step guide<br />
<br />
# Download [http://linuxhacker.ru/lustre-gerrit/gerrit_buildpatch.py gerrit_buildpatch.py] and [http://linuxhacker.ru/lustre-gerrit/run_build.sh run_build.sh]<br />
# Install Python and the python-requests module<br />
# Choose a working directory for the framework; this is the directory where the framework will hold its data<br />
# Go to the HPDD Gerrit and either create a new account or log into an existing one you'd like to use.<br />
# Go to the Gerrit account settings and switch to the "HTTP Password" tab, then select a username and generate a new password.<br />
# In the Gerrit working directory, create a file named GERRIT_AUTH with this content (replacing USERNAME and PASSWORD with the username and password you selected):<br />
{<br />
"review.whamcloud.com": {<br />
"gerrit/http": {<br />
"username": "USERNAME",<br />
"password": "PASSWORD"<br />
}<br />
}<br />
}<br />
# Check out the gerrit Lustre tree from HPDD Git<br />
# Create the kernel against which you want to build Lustre<br />
# Apply any necessary patches to the kernel, then configure and build it<br />
# Edit run_build.sh to update the path to your built kernel tree and to where you checked out the Lustre source.<br />
# Edit gerrit_buildpatch.py and update at least CHECKPATCH_PATHS to point at run_build.sh, and BUILDER_TYPE to describe what your builder is testing<br />
# In the Gerrit working directory, create the REVIEW_HISTORY file with echo "0 - - 0" >REVIEW_HISTORY (this will start at the last 500 patches or so; replace the leading 0 with the Unix time of the change you want to start from, e.g. the current Unix time in seconds minus 1 hour, such as 1437884668; see the sketch after this list)<br />
# Now run gerrit_buildpatch.py to make sure all is working as intended. Watch /tmp/builder_out.txt for build progress and gerrit_buildpatch.py output for overall progress.<br />
# Stop gerrit_buildpatch.py<br />
# Edit gerrit_buildpatch.py and change the self.post_enabled setting from False to True; this enables posting of the build results<br />
# Update REVIEW_HISTORY: add a line referencing roughly the current time as the last line, to avoid rebuilding stale changes for no good reason.<br />
# Add gerrit_buildpatch.py to your startup scripts or otherwise make it run automatically (mind the current working directory, which should always be where the auth and history files are)<br />
# Once you are really sure the results are good and you want to make your testing results "binding", you can also set USE_CODE_REVIEW_SCORE in gerrit_buildpatch.py to True; a build failure will then set a real -1 checkmark.<br />
<br />
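For example, to seed REVIEW_HISTORY so the builder starts from roughly an hour in the past (as suggested in the REVIEW_HISTORY step above), something like this works on any system with GNU date:<br />
<br />
# Seed REVIEW_HISTORY one hour in the past; the builder will then<br />
# skip changes older than that.<br />
echo "$(( $(date +%s) - 3600 )) - - 0" > REVIEW_HISTORY<br />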
<br />
==== Some explanations of the constraints ====<br />
<br />
run_build.sh depends on a prebuilt kernel source tree (on my old PowerMac a kernel build takes over 1 hour, so it's impractical to build the kernel for every patch). Its outputs are the error code and stdout. If the error code is 0, the build is considered a success and the output is ignored. If the code is not 0, the output is appended to the Gerrit comment by the framework. Currently the output is the last 20 lines preceding the failure, but you could instead copy the whole build output to some web server and print a link to it, or do something similar. Currently only a very minimal Lustre build is performed, but you can extend it as much as you want, including a full RPM build if desired.<br />
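<br />
To make that contract concrete, here is a minimal run_build.sh skeleton that follows the conventions above; the kernel and source paths are placeholders to substitute with your own, and this is a sketch rather than the actual script:<br />
<br />
#!/bin/bash<br />
# Exit 0 = success (stdout ignored); nonzero = failure (stdout gets<br />
# appended to the Gerrit comment). Paths below are placeholders.<br />
KERNEL_TREE=/path/to/prebuilt/kernel<br />
LUSTRE_TREE=/path/to/lustre-release<br />
cd "$LUSTRE_TREE" || exit 1<br />
sh autogen.sh || exit 1<br />
./configure --with-linux="$KERNEL_TREE" || exit 1<br />
# Build output goes to stdout so the framework can quote the last<br />
# lines preceding a failure.<br />
make -j"$(nproc)" 2>&1<br />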
<br />
It's possible to extend run_build.sh to also kick off some basic testing in a VM or otherwise (e.g. an nfs-root hosted VM that would run a subset of sanity.sh with one or several nodes right out of the build tree) and either include the results in the same run (simple) or implement some sort of dispatcher that runs those tests in parallel and posts test results as they complete (more involved; you'll need to write a new posting module for this too).<br />
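<br />
If you do go the dispatcher route, note that the posting side can stay small: Gerrit's REST API accepts reviews over HTTP with the same credentials as in GERRIT_AUTH. A rough curl sketch (not the framework's actual posting code; CHANGE and REVISION are placeholders, and older Gerrit versions expect digest rather than basic auth):<br />
<br />
# Post a test result as a review comment via Gerrit's REST API.<br />
curl --digest -u "USERNAME:PASSWORD" \<br />
     -H "Content-Type: application/json" \<br />
     -d '{"message": "Test run finished, log excerpt attached."}' \<br />
     "https://review.whamcloud.com/a/changes/CHANGE/revisions/REVISION/review"<br />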
<br />
Finally, run_build.sh has some extra locking in place that lets you pause builds while you perform maintenance without stopping the gerrit_buildpatch script. A simple maintenance script is provided as [http://linuxhacker.ru/lustre-gerrit/tree_update.sh tree_update.sh], which you would run from cron once a day to update the Git repo. You might want to extend it to, e.g., refresh the kernel you are using with new patches every day, update the kernel and other system component versions (ZFS?), and so on.</div>Greenhttp://wiki.lustre.org/index.php?title=Testing_Setup_To_Induce_Race_Conditions&diff=818Testing Setup To Induce Race Conditions2015-07-26T23:35:27Z<p>Green: Created page with "=== Rationale === Typically developer-scale testing only ensures basic grade correctness of the code, but as complex products such as Lustre get deployed at the really large s..."</p>
<hr />
<div>=== Rationale ===<br />
Typically, developer-scale testing only ensures basic correctness of the code, but as complex products such as Lustre get deployed on really large systems and subjected to really high loads, all sorts of unlikely race conditions and failure modes tend to crop up.<br />
This write-up explains how to achieve much harsher testing on regular hardware, without involving super-scaled systems.<br />
The approach turned out to be a lot more powerful than originally anticipated; for example, early in its life a rare race condition that took about a week to manifest on a Top 10 class supercomputer took only about 15 minutes to hit in this setup.<br />
<br />
=== Opening up race windows: the theory ===<br />
Typically race conditions have a very small race window, sometimes only one CPU instruction long, so they are hard to hit. This is where virtual machines come to the rescue. While it would be feasible to create a full CPU emulator with random delays between every instruction, that is too labor-intensive.<br />
An alternative approach is to create several virtual machines with so many CPU cores allocated that the total number of cores across these VMs greatly exceeds the actual number of CPU cores available on the host. When all of these virtual machines run at the same time with CPU-heavy loads, the host kernel preempts them at random intervals, introducing big delays in what each core perceives as a single instruction stream. Additional CPU pressure can be exerted from outside the virtual machines by running some other CPU-heavy loads.<br />
When creating these VMs, another important consideration is memory allocation: we don't really want the virtual machines to dip into swap, as that would make them really slow.<br />
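<br />
As a quick way to check how oversubscribed a host is, you can compare its core count with the total number of vCPUs defined across all libvirt domains; a small sketch, assuming all your VMs are managed by libvirt:<br />
<br />
# Compare host cores against the sum of vCPUs across defined VMs.<br />
HOST_CORES=$(nproc)<br />
TOTAL_VCPUS=0<br />
for dom in $(virsh list --all --name); do<br />
    VCPUS=$(virsh dominfo "$dom" | awk '/^CPU\(s\)/ {print $2}')<br />
    TOTAL_VCPUS=$(( TOTAL_VCPUS + VCPUS ))<br />
done<br />
echo "host cores: $HOST_CORES, total guest vCPUs: $TOTAL_VCPUS"<br />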
<br />
=== My particular setup ===<br />
Initially I had two systems at my disposal: a desktop with a 4-core i7 with HT (showing 8 cores to the host) and 32G of RAM, and a laptop with a 4-core mobile i7 with HT (also showing 8 cores) and 16G of RAM.<br />
I decided that 3G of RAM should be enough for the virtual machines in question, which gave me 7 VMs on the desktop (occupying 21G of RAM) and 4 VMs on the laptop (occupying 12G of RAM). Every VM also got a dedicated "virtual block device" backed by an SSD that it used as swap. Virtual-CPU-wise, every VM got 8 CPU cores allocated.<br />
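<br />
For the swap backing devices, one logical volume per VM on the SSD is an easy way to do it; a sketch, where vg_ssd is a placeholder volume group name:<br />
<br />
# One SSD-backed swap LV per VM; vg_ssd is a placeholder VG name.<br />
lvcreate -L 4G -n centos6-0-swap vg_ssd<br />
# Attach it to the guest as a second virtio disk, then inside the guest:<br />
#   mkswap /dev/vdb && swapon /dev/vdb<br />
<br />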
Here's the config of one such VM (libvirt on Fedora is used):<br />
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'><br />
<name>centos6-0</name><br />
<uuid>c80b11ad-552b-aaaa-9bdf-48807db09054</uuid><br />
<memory unit='KiB'>3097152</memory><br />
<currentMemory unit='KiB'>3097152</currentMemory><br />
<vcpu>8</vcpu><br />
<os><br />
<type arch='x86_64' machine='pc'>hvm</type><br />
<boot dev='network'/><br />
</os><br />
<features><br />
<acpi/><br />
<apic/><br />
<pae/><br />
</features><br />
<clock offset='utc'/><br />
<on_poweroff>destroy</on_poweroff><br />
<on_reboot>restart</on_reboot><br />
<on_crash>restart</on_crash><br />
<devices><br />
<emulator>/usr/libexec/qemu-kvm</emulator><br />
<disk type='block' device='disk'><br />
<driver name='qemu' type='raw' cache='none'/><br />
<source dev='/dev/vg_intelbox/centos6-0'/><br />
<target dev='vda' bus='virtio'/><br />
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/><br />
</disk><br />
<controller type='usb' index='0'><br />
<address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/><br />
</controller><br />
<controller type='virtio-serial' index='0'><br />
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/><br />
</controller><br />
<interface type='bridge'><br />
<mac address='52:54:00:a1:ce:de'/><br />
<source bridge='br1'/><br />
<model type='virtio'/><br />
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/><br />
</interface><br />
<serial type='pty'><br />
<target port='0'/><br />
</serial><br />
<console type='pty'><br />
<target type='serial' port='0'/><br />
</console><br />
<channel type='spicevmc'><br />
<target type='virtio' name='com.redhat.spice.0'/><br />
<address type='virtio-serial' controller='0' bus='0' port='1'/><br />
</channel><br />
<input type='tablet' bus='usb'/><br />
<input type='mouse' bus='ps2'/><br />
<graphics type='spice' autoport='yes'/><br />
<video><br />
<model type='qxl' vram='65536' heads='1'/><br />
<address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/><br />
</video><br />
<memballoon model='virtio'><br />
<address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/><br />
</memballoon><br />
</devices><br />
<qemu:commandline><br />
<qemu:arg value='-gdb'/><br />
<qemu:arg value='tcp::1200'/><br />
</qemu:commandline><br />
</domain><br />
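<br />
With the XML saved to a file, the VM is registered and booted with the usual libvirt commands; note that the qemu:commandline stanza above also exposes a gdb stub on TCP port 1200 of the host:<br />
<br />
# Register and boot the VM from the definition above.<br />
virsh define centos6-0.xml<br />
virsh start centos6-0<br />
# The -gdb tcp::1200 arguments let you attach to the guest kernel:<br />
#   gdb vmlinux, then "target remote localhost:1200" at the gdb prompt<br />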
<br />
In order to conserve space and effort, all these VMs are network-booted and NFS-rooted off the same NFS root (Red Hat-based distros make this really easy).<br />
The build happens inside a chroot session into the NFS root, which allows me to then go into that same directory in every VM and run the tests directly out of the build tree, without any need to build RPMs and such.<br />
It's also very important to configure kernel crash dump support.<br />
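<br />
A minimal sketch of such a chrooted build session, assuming a hypothetical NFS-root path of /srv/nfsroot:<br />
<br />
# Build inside the shared NFS root so every VM sees the same tree.<br />
mount --bind /proc /srv/nfsroot/proc<br />
mount --bind /dev /srv/nfsroot/dev<br />
chroot /srv/nfsroot /bin/bash -c \<br />
    "cd /home/green/git/lustre-release && sh autogen.sh && ./configure && make -j8"<br />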
<br />
=== Additional protections with kernel options ===<br />
Since we are after correctness here, another natural thing to do is to enable all the heavy-handed kernel checks that are typically left off for production deployments, especially really expensive ones like unmapping freed memory, which makes use-after-free errors very easy to detect, but also spinlock checking and other useful checks.<br />
This is what the kernel-debugging part of my kernel config looks like for the RHEL6 kernel (it's even more extensive for newer ones):<br />
CONFIG_DEBUG_KERNEL=y<br />
CONFIG_DEBUG_SHIRQ=y<br />
CONFIG_DETECT_SOFTLOCKUP=y<br />
CONFIG_LOCKUP_DETECTOR=y<br />
CONFIG_HARDLOCKUP_DETECTOR=y<br />
CONFIG_BOOTPARAM_HARDLOCKUP_ENABLED=y<br />
CONFIG_BOOTPARAM_HARDLOCKUP_ENABLED_VALUE=1<br />
CONFIG_DETECT_HUNG_TASK=y<br />
CONFIG_SCHED_DEBUG=y<br />
CONFIG_SCHEDSTATS=y<br />
CONFIG_DEBUG_NMI_TIMEOUT=30<br />
CONFIG_TIMER_STATS=y<br />
CONFIG_DEBUG_OBJECTS=y<br />
# CONFIG_DEBUG_OBJECTS_SELFTEST is not set<br />
CONFIG_DEBUG_OBJECTS_FREE=y<br />
# CONFIG_DEBUG_OBJECTS_TIMERS is not set<br />
CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT=1<br />
CONFIG_DEBUG_SLAB=y<br />
CONFIG_DEBUG_SPINLOCK=y<br />
CONFIG_DEBUG_MUTEXES=y<br />
CONFIG_DEBUG_SPINLOCK_SLEEP=y<br />
CONFIG_STACKTRACE=y<br />
CONFIG_DEBUG_BUGVERBOSE=y<br />
CONFIG_DEBUG_INFO=y<br />
# CONFIG_DEBUG_VM is not set<br />
# CONFIG_DEBUG_VIRTUAL is not set<br />
CONFIG_DEBUG_WRITECOUNT=y<br />
CONFIG_DEBUG_MEMORY_INIT=y<br />
CONFIG_DEBUG_LIST=y<br />
CONFIG_ARCH_WANT_FRAME_POINTERS=y<br />
CONFIG_FRAME_POINTER=y<br />
CONFIG_DEBUG_PAGEALLOC=y<br />
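<br />
You can confirm the debug options actually made it into a built kernel by inspecting its config file, for example:<br />
<br />
# Spot-check a few of the debug options in the installed kernel config.<br />
grep -E 'CONFIG_DEBUG_(PAGEALLOC|SLAB|SPINLOCK)=' /boot/config-$(uname -r)<br />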
<br />
=== Running actual tests ===<br />
Since the majority of the bugs we are aiming at here manifest as crashes, I decided to have groups of tests that run in infinite loops until something crashes (this is where kernel crash dumps become useful) or hangs (this is where the gdb support visible in my VM config is useful; you can also forcefully dump the core of a hung VM with the virsh dump command, and the crash tool knows how to use such dumps too, which is convenient at times).<br />
Because of these choices, the testing discussed here is not a replacement for regular sanity testing, where you do want to look at individual test failures.<br />
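<br />
For the hang case, the forced dump mentioned above looks roughly like this (the --memory-only flag needs a reasonably recent libvirt and produces an ELF core that the crash utility can read; the vmlinux path is a placeholder):<br />
<br />
# Force a core dump of a hung guest, then inspect it with crash.<br />
virsh dump centos6-0 /var/tmp/centos6-0.core --memory-only<br />
crash /path/to/guest/vmlinux /var/tmp/centos6-0.core<br />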
<br />
So far all my testing starts with:<br />
<br />
slogin root@$VMTESTNODE<br />
cd /home/green/git/lustre-release/lustre/tests<br />
<br />
and then one of the actual testing lines below:<br />
<br />
while :; do rm -rf /tmp/* ; SLOW=yes REFORMAT=yes sh sanity.sh ; sh llmountcleanup.sh ; rm -rf /tmp/* ; SLOW=yes REFORMAT=yes sh sanityn.sh ; sh llmountcleanup.sh ;done<br />
<br />
while :; do rm -rf /tmp/* ; SLOW=yes REFORMAT=yes DURATION=$((900*3)) PTLDEBUG="vfstrace rpctrace dlmtrace neterror ha config ioctl super cache" DEBUG_SIZE=100 sh racer.sh ; sh llmountcleanup.sh ; done<br />
<br />
SLOW=yes TSTID=500 TSTID2=499 TSTUSR=green TSTUSR2=saslauth sh sanity-quota.sh<br />
<br />
while :; do rm -rf /tmp/* ; SLOW=yes REFORMAT=yes sh replay-single.sh ; sh llmountcleanup.sh ; SLOW=yes REFORMAT=yes sh replay-ost-single.sh ; sh llmountcleanup.sh ; SLOW=yes REFORMAT=yes sh replay-dual.sh ; sh llmountcleanup.sh ; done<br />
<br />
while :; do rm -rf /tmp/* ; SLOW=yes REFORMAT=yes sh recovery-small.sh ; sh llmountcleanup.sh ; done<br />
<br />
while :; do rm -rf /tmp/* ; SLOW=yes REFORMAT=yes sh conf-sanity.sh ; sh llmountcleanup.sh ; for i in `seq 0 7` ; do losetup -d /dev/loop$i ; done ; rm -rf /tmp/* ; done<br />
<br />
Typically, with 11 VMs at my disposal, I have two running the sanity* loop and three each running the racer, replay*, and recovery-small loops.<br />
<br />
The conf-sanity loop is rarely used, since it's currently broken when run out of a build tree.<br />
You can see that quite a few other tests could also be made to run like this; I am not yet doing so.</div>Greenhttp://wiki.lustre.org/index.php?title=User:Green&diff=814User:Green2015-07-26T22:22:40Z<p>Green: </p>
<hr />
<div>I have been working on Lustre since 2003 when I first joined CFS, transitioning across companies as Lustre changed hands.<br />
<br />
Currently I work for Intel Corporation as part of the High Performance Data Division, and one of my roles is Lustre community tree gatekeeper.<br />
<br />
I also perform architectural and code oversight for additions to Lustre.</div>Green