cl_lock Struct Reference

Layered client lock. More...

#include <cl_object.h>

Data Fields

cfs_atomic_t cll_ref
 Reference counter.
cfs_list_t cll_layers
 List of slices.
cfs_list_t cll_linkage
 Linkage into cl_lock::cll_descr::cld_obj::coh_locks list.
cl_lock_descr cll_descr
 Parameters of this lock.
enum cl_lock_state cll_state
 Protected by cl_lock::cll_guard.
cfs_waitq_t cll_wq
 Signals state changes.
cfs_mutex_t cll_guard
 Recursive lock, most fields in cl_lock{} are protected by this.
cfs_task_t * cll_guarder
 Task currently holding cl_lock::cll_guard (the mutex is recursive).
int cll_depth
 Recursion depth of cl_lock::cll_guard.
cfs_task_t * cll_intransit_owner
 the owner for INTRANSIT state
int cll_error
int cll_holds
 Number of holds on a lock.
int cll_users
 Number of lock users.
unsigned long cll_flags
 Flag bit-mask.
cfs_list_t cll_inclosure
 A linkage into a list of locks in a closure.
cl_lock * cll_conflict
 Conflict lock at queuing time.
lu_ref cll_reference
 A list of references to this lock, for debugging.
lu_ref cll_holders
 A list of holds on this lock, for debugging.
lu_ref_link * cll_obj_ref
 A reference for cl_lock::cll_descr::cld_obj.

Detailed Description

Layered client lock.


The locking model of the new client code is built around

struct cl_lock

data-type representing an extent lock on a regular file. cl_lock is a layered object (much like cl_object and cl_page): it consists of a header (struct cl_lock) and a list of layers (struct cl_lock_slice), linked into the cl_lock::cll_layers list through cl_lock_slice::cls_linkage.
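The header-plus-slices layout can be sketched in user-space C. This is a minimal model: `list_head` stands in for cfs_list_t, and the struct and helper names are illustrative, not the real kernel definitions.

```c
/* List node, standing in for cfs_list_t. */
struct list_head { struct list_head *next, *prev; };

/* Simplified slice: each layer contributes one of these. */
struct cl_lock_slice_model {
    struct list_head cls_linkage;  /* links the slice into cll_layers */
    const char      *cls_layer;    /* owning layer, e.g. "vvp" or "lov" */
};

/* Simplified header: per-lock state shared by all layers. */
struct cl_lock_model {
    struct list_head cll_layers;   /* head of the slice list */
};

void list_init(struct list_head *h) { h->next = h->prev = h; }

void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Walk cll_layers and count the slices, the way layered code iterates. */
int lock_nr_layers(const struct cl_lock_model *lock)
{
    int nr = 0;
    const struct list_head *p;

    for (p = lock->cll_layers.next; p != &lock->cll_layers; p = p->next)
        nr++;
    return nr;
}
```

The same header/slice split is used by cl_object and cl_page, so generic code can iterate over layers without knowing which ones are stacked.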

All locks for a given object are linked into the cl_object_header::coh_locks list (protected by the cl_object_header::coh_lock_guard spin-lock) through cl_lock::cll_linkage. Currently this list is not sorted in any way. It could be sorted by starting lock offset, or an altogether different data structure, such as a tree, could be used.

A typical cl_lock consists of two layers:

vvp_lock, and
lov_lock.

lov_lock contains an array of sub-locks. Each of these sub-locks is a normal cl_lock: it has a header (struct cl_lock) and a list of layers:

lovsub_lock, and
osc_lock.

Each sub-lock is associated with a cl_object (representing a stripe sub-object or the file with which the top-level cl_lock is associated), and is linked into that object's cl_object::coh_locks list. In this respect cl_lock is similar to cl_object (which at the lov layer also fans out into multiple sub-objects), and is different from cl_page, which doesn't fan out (there is usually exactly one osc_page for every vvp_page). We shall call the vvp-lov portion of the lock a "top-lock" and its lovsub-osc portion a "sub-lock".


cl_lock is reference counted. When the reference counter drops to 0, the lock is placed into the cache, except when the lock is in the CLS_FREEING state. A CLS_FREEING lock is destroyed when the last reference is released. Referencing between a top-lock and its sub-locks is described in the lov documentation module.
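The put-side rule described above can be sketched in user-space C. A plain counter stands in for cfs_atomic_t, the state names mirror the text, and the helper names are invented for illustration:

```c
/* Only the two states relevant to the rule are modelled. */
enum cl_lock_state_model { CLS_CACHED_M, CLS_FREEING_M };

struct refcounted_lock {
    int ref;                         /* models cll_ref */
    enum cl_lock_state_model state;  /* models cll_state */
    int destroyed;                   /* side-effect flag for the sketch */
};

/* Drop a reference: on the last put, a CLS_FREEING lock is destroyed;
 * any other lock stays in the cache, findable by later lookups. */
void lock_put(struct refcounted_lock *lock)
{
    if (--lock->ref > 0)
        return;
    if (lock->state == CLS_FREEING_M)
        lock->destroyed = 1;   /* the kernel frees the memory here */
    /* otherwise: remain linked on the object's coh_locks list */
}
```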


Also, cl_lock is a state machine. This requires some clarification. One of the goals of the client IO rewrite was to make the IO path non-blocking, or at least to make it easier to make non-blocking in the future. Here "non-blocking" means that when a system call (read, write, truncate) reaches a situation where it has to wait for communication with the server, it should, instead of waiting, remember its current state and switch to some other work. E.g., instead of waiting for a lock enqueue, the client should proceed doing IO on the next stripe, etc. Obviously this is a rather radical redesign, and it is not planned to be fully implemented at this time; instead, some infrastructure is being put in place that would make it easier to do asynchronous non-blocking IO in the future.

Specifically, where the old locking code goes to sleep (waiting for an enqueue, for example), the new code returns cl_lock_transition::CLO_WAIT. When the enqueue reply comes, its completion handler signals that the lock state machine is ready to transit to the next state. There is some generic code in cl_lock.c that sleeps waiting for these signals. As a result, to users of this cl_lock.c code it looks like locking is done in the normal blocking fashion, while at the same time it remains possible to switch to non-blocking locking (simply by returning cl_lock_transition::CLO_WAIT from cl_lock.c functions).
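The CLO_WAIT convention can be modelled in user-space C. The step function is non-blocking and returns CLO_WAIT when it must be retried after a signal; a generic wrapper gives callers a blocking view. The names below are illustrative stand-ins, not the real cl_lock.c entry points, and a countdown simulates outstanding server replies:

```c
enum { CLO_DONE = 0, CLO_WAIT = 1 };

struct enqueue_sim {
    int replies_pending;   /* server replies still outstanding */
};

/* One non-blocking attempt: if a reply is still outstanding, report
 * CLO_WAIT instead of sleeping (the completion handler would signal). */
int enqueue_try(struct enqueue_sim *s)
{
    if (s->replies_pending > 0) {
        s->replies_pending--;
        return CLO_WAIT;
    }
    return CLO_DONE;
}

/* Generic blocking wrapper, like the sleep/retry code in cl_lock.c:
 * callers see a blocking API, but each step itself never blocks. */
int enqueue_blocking(struct enqueue_sim *s, int *iterations)
{
    int rc;

    *iterations = 0;
    while ((rc = enqueue_try(s)) == CLO_WAIT)
        (*iterations)++;   /* real code sleeps on cll_wq here */
    return rc;
}
```

Switching to fully non-blocking IO would then mean propagating CLO_WAIT to the caller instead of looping in the wrapper.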

For a description of state machine states and transitions see enum cl_lock_state.

There are two ways to restrict the set of states a lock might move to:

placing a hold on the lock (cl_lock::cll_holds), which prevents the lock from being canceled and destroyed, and
placing a user on the lock (cl_lock::cll_users), which pins the lock in the CLS_HELD state.

A user is used to ensure that a lock is not canceled or destroyed while it is being enqueued, or actively used by some IO.

Currently, a user always comes with a hold (cl_lock_invariant() checks that the number of holds is not less than the number of users).
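The holds/users relationship can be sketched as a tiny invariant check in C. Only the two counters are modelled, and the helper names are illustrative, not the real cl_lock_hold()/cl_wait() API:

```c
/* Just the two counters from cl_lock: cll_holds and cll_users. */
struct hold_user_counts { int holds; int users; };

/* cl_lock_invariant()-style check from the text: every user comes
 * with a hold, so users can never exceed holds. */
int counts_invariant(const struct hold_user_counts *c)
{
    return c->users >= 0 && c->holds >= c->users;
}

/* Taking a user reference always takes a hold as well, preserving
 * the invariant; dropping releases both. */
void take_user(struct hold_user_counts *c) { c->holds++; c->users++; }
void drop_user(struct hold_user_counts *c) { c->users--; c->holds--; }
```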


This is how the lock state machine operates. struct cl_lock contains a mutex, cl_lock::cll_guard, that protects the struct's fields.

The top-lock and a sub-lock have separate mutexes, and the sub-lock's mutex has to be taken first to avoid deadlock.

To see an example of the interaction of all these issues, take a look at the lov_cl.c:lov_lock_enqueue() function. It is called as a part of cl_enqueue_try() and tries to advance the top-lock to the ENQUEUED state by advancing the state machines of its sub-locks (lov_lock_enqueue_one()). Note also that it uses trylock to grab a sub-lock's mutex, to avoid deadlock. It also has to handle CEF_ASYNC enqueue, when sub-lock enqueues have to be done in parallel rather than one after another (this is used for glimpse locks, which cannot deadlock).
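The try-lock-repeat pattern can be sketched in user-space C. Toy integer flags stand in for cfs_mutex_t, and the function name is illustrative: with the top-lock mutex already held, the sub-lock mutex may only be tried, never blocked on, because the documented ordering is sub-lock first.

```c
/* Toy mutex: 0 = free, 1 = held; stands in for cfs_mutex_t. */
typedef int toy_mutex;

int  toy_trylock(toy_mutex *m) { if (*m) return -1; *m = 1; return 0; }
void toy_lock(toy_mutex *m)    { *m = 1; }   /* would block if held */
void toy_unlock(toy_mutex *m)  { *m = 0; }

/* With the top-lock mutex held, only try-lock the sub-lock mutex;
 * on failure drop the top-lock and report "repeat" instead of
 * blocking in the wrong order and deadlocking. */
int enqueue_one_sub(toy_mutex *top, toy_mutex *sub)
{
    toy_lock(top);
    if (toy_trylock(sub) != 0) {
        toy_unlock(top);
        return 1;              /* caller retries the whole operation */
    }
    /* ... advance the sub-lock state machine here ... */
    toy_unlock(sub);
    toy_unlock(top);
    return 0;
}
```

Dropping the top-lock mutex before retrying is what breaks the circular wait between a thread going top-to-sub and one going sub-to-top.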


struct cl_lock_operations provides a number of call-backs that are invoked when events of interest occur. Layers can intercept and handle glimpse, blocking, and cancel ASTs, and the reception of the reply from the server.

One important difference from the old client locking model is that the new client has a representation for the top-lock, whereas in the old code only sub-locks existed as real data structures, and file-level locks were represented by "request sets" that were created and destroyed on each and every lock creation.

Top-locks are cached and can be found in the cache by system calls. It is possible that a top-lock is in the cache while some of its sub-locks have been canceled and destroyed. In that case the top-lock has to be enqueued again before it can be used.

The overall process of locking during an IO operation is as follows:

Striping introduces major additional complexity into locking. The fundamental problem is that it is generally unsafe to actively use (hold) two locks on the different OST servers at the same time, as this introduces inter-server dependency and can lead to cascading evictions.

The basic solution is to sub-divide large read/write IOs into smaller pieces so that no multi-stripe locks are taken (note that this design abandons POSIX read/write semantics). Ideally, such pieces can be executed concurrently. At the same time, certain types of IO cannot be sub-divided without sacrificing correctness. This includes:

Also, in the case of read(fd, buf, count) or write(fd, buf, count), where buf is a part of memory mapped Lustre file, a lock or locks protecting buf has to be held together with the usual lock on [offset, offset + count].
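The sub-division step can be sketched in C: given a stripe size, an IO extent [off, off + count) is cut at stripe boundaries, and each resulting piece touches exactly one stripe, so each needs only a single-stripe lock. The function below is a hypothetical helper (real striping logic lives in the lov layer and handles stripe counts, patterns, and object mapping):

```c
/* Number of single-stripe pieces the extent [off, off + count) splits
 * into, for a given stripe size (assumed uniform for the sketch). */
int nr_pieces(unsigned long long off, unsigned long long count,
              unsigned long long stripe)
{
    unsigned long long first, last;

    if (count == 0)
        return 0;
    first = off / stripe;               /* stripe of the first byte */
    last  = (off + count - 1) / stripe; /* stripe of the last byte  */
    return (int)(last - first + 1);
}
```

An extent entirely inside one stripe yields one piece and never needs a multi-stripe lock; an extent spanning a boundary yields one piece per stripe touched.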

As multi-stripe locks have to be allowed, it makes sense to cache them, so that, for example, a sequence of O_APPEND writes can proceed quickly without going down to the individual stripes to do lock matching. On the other hand, multi-stripe locks shouldn't be used by normal read/write calls. To achieve this, every layer can implement the ->clo_fits_into() method, which is called by the lock matching code (cl_lock_lookup()) and can be used to selectively disable matching of certain locks for certain IOs. For example, the lov layer implements lov_lock_fits_into(), which allows multi-stripe locks to be matched only for truncates and O_APPEND writes.
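A ->clo_fits_into()-style predicate, following the policy just described, can be sketched in C. The enum values and field names below are illustrative, not the real lov types: multi-stripe locks match only truncate and O_APPEND-write IO, while single-stripe locks match anything.

```c
/* Kinds of IO relevant to the matching policy (illustrative). */
enum io_kind { IO_READ, IO_WRITE, IO_TRUNC, IO_APPEND };

/* The one attribute of the lock description the policy looks at. */
struct lock_descr_model {
    int nr_stripes;   /* how many stripes the lock spans */
};

/* Return non-zero if a cached lock with this description may be
 * matched for the given IO, per the lov_lock_fits_into() policy. */
int lov_fits_into_model(const struct lock_descr_model *d, enum io_kind io)
{
    if (d->nr_stripes <= 1)
        return 1;                       /* single-stripe: always usable */
    return io == IO_TRUNC || io == IO_APPEND;
}
```

cl_lock_lookup() would call such a predicate on each cached candidate, skipping locks the current IO is not allowed to reuse.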

Interaction with DLM

In the expected setup, cl_lock is ultimately backed by a collection of DLM locks (struct ldlm_lock). The association between a cl_lock and a DLM lock is implemented in the osc layer, which also translates DLM events (ASTs, cancellation, etc.) into cl_lock_operations calls. See struct osc_lock for a more detailed description of the interaction with DLM.

Field Documentation

struct cl_lock_descr cl_lock::cll_descr

Parameters of this lock.

Protected by cl_lock::cll_descr::cld_obj::coh_lock_guard nested within cl_lock::cll_guard. Modified only on lock creation and in cl_lock_modify().

unsigned long cl_lock::cll_flags

Flag bit-mask.

Values from enum cl_lock_flags. Updates are protected by cl_lock::cll_guard.

cfs_mutex_t cl_lock::cll_guard

Recursive lock, most fields in cl_lock{} are protected by this.

Locking rules: this mutex is never held across network communication, except when lock is being canceled.

Lock ordering: a mutex of a sub-lock is taken first, then a mutex of a top-lock. The other direction is implemented through a try-lock-repeat loop. Mutexes of unrelated locks can be taken only by try-locking.

See also:
osc_lock_enqueue_wait(), lov_lock_cancel(), lov_sublock_wait().

int cl_lock::cll_holds

Number of holds on a lock.

A hold prevents a lock from being canceled and destroyed. Protected by cl_lock::cll_guard.

See also:
cl_lock_hold(), cl_lock_unhold(), cl_lock_release()

cfs_list_t cl_lock::cll_inclosure

A linkage into a list of locks in a closure.

See also:
cl_lock_closure

cfs_list_t cl_lock::cll_layers

List of slices.

Immutable after creation.

cfs_list_t cl_lock::cll_linkage

Linkage into cl_lock::cll_descr::cld_obj::coh_locks list.

Protected by cl_lock::cll_descr::cld_obj::coh_lock_guard.

struct lu_ref_link* cl_lock::cll_obj_ref

A reference for cl_lock::cll_descr::cld_obj.

For debugging.

int cl_lock::cll_users

Number of lock users.

Valid in cl_lock_state::CLS_HELD state only. Lock user pins lock in CLS_HELD state. Protected by cl_lock::cll_guard.

See also:
cl_wait(), cl_unuse().

The documentation for this struct was generated from the following file:
cl_object.h
Generated on Mon Apr 12 04:18:21 2010 for Lustre by doxygen 1.4.7
