Lustre Coding Style Guidelines

From Lustre Wiki
Revision as of 19:20, 15 June 2015 by Adilger (talk | contribs) (initial import from https://wiki.hpdd.intel.com/display/PUB/Coding+Guidelines using http://extensions.services.openoffice.org/en/project/wikipublisher)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

Lustre Coding Guidelines

Beautiful Code

A note from Eric Barton, a Lustre pioneer and Senior Principal Engineer at Intel:

More important than the physical layout of code (which is covered in detail below) is the idea that the code should be beautiful to read.

What makes code beautiful to me? Fundamentally, it's readability and obviousness. The code must not have secrets but should flow easily, pleasurably and accurately off the page and into the mind of the reader.

How do I think beautiful code is written? Like this...

  • The author must be confident and knowledgeable and proud of her work. She must understand what the code should do, the environment it must work in, all the combinations of inputs, all the valid outputs, all the possible races and all the reachable states. She must grok it.
  • Names must be well chosen. The meaning a human reader attaches to a name can be orthogonal to what the compiler does with it, so it's just as easy to mislead as it is to inform. "Does exactly what it says on the tin" is a popular UK English expression describing something that does exactly what it tells you it's going to do, no more and no less. For example, if I open a tin labeled "soap", I expect the contents to help me wash and maybe even smell nice. If it's no good at removing dirt, I'll be disappointed. If it removes the dirt but burns off a layer of skin with it, I'll be positively upset. The name of a procedure, a variable or a structure member should tell you something informative about the entity without misleading - just "what it says on the tin".
  • Names must be well chosen. Local, temporary variables can almost always remain relatively short and anonymous, while names in global scope must be unique. In general, the wider the context you expect to use the name in, the more unique and informative the name should be. Don't be scared of long names if they help to make_the_code_clearer, but do_not_let_things_get_out_of_hand either - we don't write COBOL. Related names should be obvious, unambiguous and avoid naming conflicts with other unrelated names, e.g. by using a consistent prefix. This applies to all API procedures (if not all procedures period) within a given subsystem. Similarly, unique member names for global structures, using a prefix to identify the parent structure type, helps readability.
  • Names must be well chosen. Don't choose names that are easily confused - especially not if the compiler can't even tell the difference when you make a spelling mistake. i and j aren't the worst example - rq_reqmsg and rq_repmsg are much worse (and taken from our own code!!!).
  • Names must be well chosen. I can't emphasize this issue enough - I hope you get the point.
  • Assertions must be used intelligently. They combine the roles of active comment and software fuse. As an active comment they tell you something about the program that you can trust more than a comment. And as a software fuse, they provide fault isolation between subsystems by letting you know when and where invariant assumptions are violated. Overuse must be avoided - it hurts performance without helping readability - and any other use is just plain wrong. For example, assertions must never be used to validate data read from disk or the network. Network and disk hardware does fail and Lustre has to handle that - it can't just crash. The same goes for user input. Checking data copied in from userspace with assertions just opens the door for a denial of service attack.
  • Formatting and indentation rules should be followed intelligently. The visual layout of the code on the page should lend itself to being read easily and accurately - it just looks clean and good.
    • Separate "ideas" should be separated clearly in the code layout using blank lines that group related statements and separate unrelated statements.
    • Procedures should not ramble on. You must be able to take in the meaning of a procedure without scrolling past page after page of code or parsing deeply nested conditionals and loops. The 80-column rule is there for a reason.
    • Declarations are easier to refer to while scanning the code if placed in a block locally to, but visually separate from, the code that uses them. Readability is further enhanced by limiting declarations to one per line and aligning types and names vertically.
    • Parameters in multi-line procedure calls should be aligned so that they are visually contained by their brackets.
    • Brackets should be used in complex expressions to make operator precedence clear, but not excessively.
    • Conditional boolean (if (expr)), scalar (if (val != 0)) and pointer (if (ptr != NULL)) expressions should be written consistently.
    • Formatting and indentation rules should not be followed slavishly. If you're faced with either breaking the 80-chars-per-line rule or the parameter indentation rule or creating an obscure helper function, then the 80-chars-per-line rule might have to suffer. The overriding consideration is how the code reads.

I could go on, but I hope you get the idea. Just think about the poor reader when you're writing, and whether your code will convey its meaning naturally, quickly and accurately, without room for misinterpretation.

I didn't mention clever as a feature of beautiful code because it's only one step from clever to tricky - consider...

t = a; a = b; b = t; /* dumb swap */

a ^= b; b ^= a; a ^= b; /* clever swap */

You could feel quite pleased that the clever swap avoids the need for a local temporary variable - but is that such a big deal compared with how quickly, easily and accurately the reader will read it? This is a very minor example which can almost be excused because the "cleverness" is confined to a tiny part of the code. But when clever code gets spread out, it becomes much harder to modify without adding defects. You can only work on code without screwing up if you understand the code and the environment it works in completely. Or to put it more succinctly...

Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. - Brian W. Kernighan

IMHO, beautiful code helps code quality because it improves communication between the code author and the code reader. Since everyone maintaining and developing the code is a code reader as well as a code author, the quality of this communication can lead either to a virtuous circle of improving quality, or a vicious circle of degrading quality. You, dear reader, will determine which.

Style and Formatting Guidelines

All of our rules for formatting, wrapping, parenthesis, brace placement, etc., are originally derived from the Linux kernel rules, which are basically K&R style. Some of these rules are automatically verified at commit time by the contrib/scripts/checkpatch.pl script (formerly build/checkpatch.pl) included with newer versions of the Lustre code, but many depend on the good judgment of the coder and inspector.  Note that there is also a [../../../display/PUB/Test+Coding+Style Test Coding Style] document that describes the formatting for shell scripts used for testing.

Whitespace and Comments

Whitespace gets its own section because it is critical to helping the reader understand the logic of the code, and if there are inconsistencies between the whitespace used by different coders it can lead to confusion and hidden defects. Fixing incorrect whitespace changes can cause spurious merge conflicts when code is landed and updated in a distributed development environment. Please ensure that you comply with the guidelines in this section to avoid these issues. We've included default formatting rules for emacs and vim to help make it easier.

  • Tabs should be used in all lustre/, lnet/ and libcfs/ files. This matches the upstream Linux kernel coding style, and is the default method of code indentation.
  • NOTE NOTE NOTE The use of tabs for indentation is required since May 2012 (Lustre 2.3 and later). This is being done in order to facilitate code integration with the Linux kernel. All new patches should be submitted using tabs for ALL modified lines in the patch. If there are 6 or fewer lines using spaces for indentation between two lines changed by a patch, then all of the intervening lines should also have the indentation changed to use tabs. Similarly, if there are only a handful of lines (6-8) in or near a modified function or test that are still using spaces for indentation, convert all of the lines in that function or test to use tabs. In this manner, we can migrate consistent chunks of code over to tabs without having a 250kLOC patch breaking the commit history of every line of code, and also avoid breaking code that is in existing branches/patches that still need to merge. In a year or so, lines that still use spaces (i.e. those that are not under active development) will be converted to using tabs for indentation as well.
  • All lines should wrap at 80 characters. If it's getting too hard to wrap at 80 characters, you probably need to rearrange conditional order or break it up into more functions.
  • Tabs are preferred to align variable names in structures and function declarations.  This may temporarily misalign them with other variable names in the structure that are using spaces for indentation, but the other lines will eventually be updated to use tabs as well (maybe in the same patch if there aren't too many of them).  In some cases (long variable names, long comments, nested unions) then it isn't practical to use full tabs to align the fields and spaces can be used for partial indentation.
  • /* right */
  • void func_helper(...)
  • {
  • struct foobar_counter *foo;
  • unsigned long last;
  • int rc = 0;
  •  do_sth2_1;
  • if (cond3)
  • do_sth_which_needs_a_very_long_line_to_read_clearly;
  • do_sth2_2;
  • }
  • void func (...)
  • {
  • long first;
  • int i;
  • if (!cond1)
  • return;
  • do_sth1_1;
  • if (cond 2)
  • func_helper(...)
  • do_sth1_2;
  • }
  • /* wrong */
  • void func(...)
  • {
  • int rc;
  • struct foobar_counter *foo;
  • unsigned long last;
  • if (cond1) {
  • do_sth1_1;
  • if (cond2) {
  • do_sth2_1;
  • if (cond3) {
  • do_sth_which_needs_a_very_long_line_to_read_clearly;
  • }
  • do_sth2_2;
  • }
  • do_sth1_2;
  • }
  • }
  • Do not include spaces or tabs on blank lines or at the end of lines. Please ensure you remove all instances of these in any patches you submit to [../../../display/PUB/Using+Gerrit Gerrit]. You can find them with grep or in vim using the following regexps:
  • /\[ \t\]$/

or use git:

git diff --check

Alternatively, if you use vim, you can put this line in your .vimrc file, which will highlight whitespace at the end of lines and spaces followed by tabs in indentation (only works for C/C++ files):

let c_space_errors=1

Or you can use this command, which will make tabs and whitespace at the end of lines visible for all files (but a bit more discretely):

set list listchars=tab:>\ ,trail:$

In emacs, you can use (whitespace-mode) or (whitespace-visual-mode) depending on the version. You could also consider using (flyspell-prog-mode).

  • Functions that are more than a few lines long should be declared with a leading comment block that describes what the function does, any assumptions for the caller (locks already held, locks that should not be held, etc), and return values. Lustre uses the DOxygen markup style for formatting the code comments, as shown in the example here. "\a argument" can be used in the descriptive text to name a function argument. "\param argument definition" should be used to define each of the function parameters, and can be followed by [in], [out], or [in,out] to indicate whether each parameter is used for input, output, or both. "\retval" should be used to define the function return values.
  • /* right */
  • /**
  • * Initialize or update CLIO structures for the regular file \a inode
  • * when new meta-data arrives from the server.
  • *
  • * \param[in] inode regular file inode
  • * \param[in] md new file metadata from MDS
  • * - the \a md->body must have a valid FID (valid & OBD_MD_FLID)
  • * - allocates cl_object if necessary,
  • * - updated layout, if object was already here.
  • *
  • * \retval 0 if the inode was initialized/updated properly
  • * \retval negative errno if there was a problem
  • */
  • int cl_file_inode_init(struct inode *inode, struct lustre_md *md)
  • {
  • /* wrong */
  • /* finish inode */
  • void
  • cl_inode_fini(struct inode *inode)
  • {

/**

* Implements cl_page_operations::cpo_make_ready() method for Linux.

*

* This is called to yank a page referred to by \a slice from the transfer

* cache and to send it out as a part of transfer. This function try-locks

* the page. If try-lock failed, page is owned by some concurrent IO, and

* should be skipped (this is bad, but hopefully rare situation, as it usually

* results in transfer being shorter than possible).

*

* \param[in] env lu environment for large temporary stack variables

* \param[in] slice per-layer page structure being prepared

*

 * \retval 0 success, page can be placed into transfer

* \retval -EAGAIN page is either used by concurrent IO has been

* truncated. Skip it.

*/

static int vvp_page_make_ready(const struct lu_env *env, const struct cl_page_slice *slice)

{

C Language Features

  • Don't use inline unless you're doing something so performance critical that the function call overhead will make a difference -- in other words - almost never. It makes debugging harder and overuse can actually hurt performance by causing instruction cache thrashing or crashes due to stack overflow.
  • Do not use typedef for new declarations, as this hides the type of a field for very little benefit, and is against Linux kernel coding stye.  Never typedef pointers, as the * makes C pointer declarations obvious. Hiding it inside a typedef just obfuscates the code.
  • Do not embed assignments inside boolean expressions. Although this can make the code shorter, it doesn't make it more understandable and you increase the risk of confusing "=" with "==" or getting operator precedence wrong if you skimp on brackets. It's even easier to make mistakes when reading the code, so it's much safer simply to avoid it altogether.
  • right:
  • ptr = malloc(size);
  • if (ptr != NULL) {
  • ...
  • wrong:
  • if ((ptr = malloc(size)) != NULL) {
  • ...
  • Conditional expressions read more clearly if only boolean expressions are implicit (i.e., non-boolean and pointer expressions compare explicitly with 0 and NULL respectively.)
  • right:
  • void *pointer = NULL;
  • bool writing = false;
  • int ref_count = 0;
  •  if (!writing && /* not writing? */
  • pointer != NULL && /* valid pointer? */
  • ref_count == 0) /* no more references? */
  • do_this();
  • wrong:
  • if (writing == 0 && /* not writing? */
  • pointer && /* valid pointer? */
  • !ref_count) /* no more references? */
  • do_this();
  • Use parentheses to help readability and reduce the chance of operator precedence errors, but not so heavily that it is difficult to determine which parentheses are a matched pair.
  • right:
  • if (a->a_field == 3 ||
  • ((b->b_field & BITMASK1) && (c->c_field & BITMASK2)))
  • do this();
  • wrong:
  • if (a->a_field == 3 || b->b_field & BITMASK1 && c->c_field & BITMASK2)
  • do this()
  • wrong:
  • if ((((a->a_field) == 3) || (((b->b_field) & (BITMASK1)) &&
  • ((c->c_field) & (BITMASK2)))))
  • do this()
  • Function return values should not mix NULL and ERR_PTR() values.  This avoids complexity and bugs in each of the callers of that function.
  • /* right */
  • struct foo *foo_init()
  • {
  • struct foo *foo;
  • int rc;
  • OBD_ALLOC_PTR(foo);
  • if (foo == NULL)
  • RETURN(ERR_PTR(-ENOMEM));
  • rc = init_foo(foo);
  • if (rc != 0) {
  • OBD_FREE_PTR(foo);
  • RETURN(ERR_PTR(rc));
  • }
  • RETURN(foo);
  • }
  • /* wrong */
  • struct foo *foo_init()
  • {
  • struct foo *foo;
  • int rc;
  • OBD_ALLOC_PTR(foo);
  • if (foo == NULL)
  • RETURN(NULL);
  • rc = init_foo(foo);
  •  if (rc != 0)
  • RETURN(ERR_PTR(rc));
  • RETURN(foo);

}

Layout

  • Code can be much more readable and efficient if the simple or common actions are taken first in a set of tests. Re-ordering conditions like this also eliminates excessive nesting.
  • right:
  • list_for_each_entry(...) {
  • if (!condition1) {
  • do_sth1;
  • continue;
  • }
  • do_sth2_1;
  • do_sth2_2;
  • ...
  • do_sth2_N;
  • if (!condition2)
  • break;
  • do_sth3_1;
  • do_sth3_2;
  • if (!condition3)
  • break;
  • ...
  • do_sth3_N;
  • }
  • wrong:
  • list_for_each_entry(...) {
  • if (condition1) {
  • do_sth2_1;
  • do_sth2_2;
  • ...
  • do_sth2_N;
  • if (condition2) {
  • do_sth3_1;
  • do_sth3_2;
  • if (condition3) {
  • ...
  • do_sth3_N;
  • continue;
  • }
  • }
  • break;
  • } else {
  • do_sth1;
  • }
  • }
  • Function bodies that have complex state to clean up in case of an error should do so only once at the end of the function with a single RETURN(), and use GOTO() to jump there in case of an error:

/* right */

struct bar *bar_init(...)

{

foo = alloc_foo();

if (foo == NULL)

RETURN(-ENOMEM);

 rc = setup_foo(foo);

 if (rc != 0)

GOTO(free_foo, rc);

bar = init_bar(foo);

if (IS_ERR(bar))

GOTO(cleanup_foo, rc = PTR_ERR(bar));

RETURN(bar);

cleanup_foo:

cleanup_foo(foo);

free_foo:

free_foo(foo);

return ERR_PTR(rc);

}

/* wrong */

struct bar *bad_func(...)

{

foo = alloc_foo();

if (foo == NULL)

RETURN(NULL);

rc = setup_foo(foo);

if (rc != 0) {

free_foo(foo);

 RETURN(ERR_PTR(rc));

bar = init_bar(foo);

if (IS_ERR(bar)) {

cleanup_foo(foo);

free_foo(foo);

RETURN(PTR_ERR(bar));

}

RETURN(bar);

}

 

  • Variable should be declared one per line, type and name, even if there are multiple variables of the same type. For maximum readability, the names should be aligned on the same column using tabs, preferably with longer or more important declarations at the top. There should always be an empty line after the variable declarations, before the start of the code.
  • right:
  • struct inode *dir_inode;
  • int count;
  • int len;
  • len = path_name(mnt, dir_inode);
  • wrong:
  • int len, count;
  • struct inode *dir_inode;

len = path_name(mnt, dir_inode);

  • Variable declarations should be kept to the most internal scope, if practical and reasonable, to simplify understanding of where these variables are accessed and modified, and to avoid errors of using/reusing variables that only have meaning if certain branches are used:
  • right:
  • int len;
  • if (len > 0) {
  • int count;
  • struct inode *inode = iget(foo);
  • count = inode->i_size;
  • if (count > 32) {
  • int left = count;
  • :
  • }
  • }
  • Even for short conditionals, the operation should be on a separate line:
  • right:
  • if (foo)
  • bar();
  • wrong:
  • if (foo) bar();
  • When you wrap a line containing parenthesis, start the continued line after the parenthesis so that the expression or argument is visually bracketed.
  • right:
  • variable = do_something_complicated(long_argument, longer_argument,
  • longest_argument(sub_argument,
  • foo_argument),
  • last_argument);
  • if (some_long_condition(arg1, arg2, arg3) < some_long_value &&
  • another_long_condition(very_long_argument_name,
  • another_long_argument_name) >
  • second_long_value) {
  • do_something();
  • ...
  • wrong:

variable = do_something_complicated(long_argument, longer_argumen