Lustre NRS Configuration
Overview
The Network Request Scheduler (NRS) controls how Lustre servers order incoming RPCs for processing. By default, requests are handled in FIFO order, but NRS policies allow administrators to reorder requests to improve throughput or fairness, or to implement quality-of-service (QoS) controls.
NRS is configured independently for each service partition (e.g., ost_io, mdt, mdt_readpage, ldlm_canceld). Most tuning focuses on ost_io, which handles bulk data RPCs.
For general tuning guidance, see Lustre Tuning.
NRS Policy Types
| Policy | Description | Applicable Services | Notes |
|---|---|---|---|
| fifo | First-in, first-out (default) | All | No tunables. Requests processed in arrival order. |
| crrn | Client Round-Robin | All | Ensures each client gets a fair share of server time. Cycles through clients, processing a configurable number of requests per client before moving to the next. |
| orr | Object-based Round-Robin | ost_io only | Orders requests by object (file) and offset to optimize disk access patterns. Reduces seek overhead on HDDs. |
| trr | Target-based Round-Robin | ost_io only | Similar to ORR but operates at the target (OST) level rather than per-object. Good for mixed workloads with many files. |
| tbf | Token Bucket Filter | All | Advanced QoS policy. Rate-limits requests based on NID, JobID, UID/GID, or opcode. Supports multiple rules with priorities. |
| delay | Delay (testing only) | All | Artificially delays requests. Used for testing and debugging only — never use in production. |
Checking the Current NRS Policy
To see which policy is active on the OSS ost_io service:
lctl get_param ost.OSS.ost_io.nrs_policies
Output shows each policy with its state:
regular_requests:
  - name: fifo
    state: started
  - name: crrn
    state: stopped
  - name: orr
    state: stopped
  ...
high_priority_requests:
  ...
Each service has two queues:
- regular_requests (reg) — normal priority RPCs
- high_priority_requests (hp) — high-priority RPCs (e.g., lock callbacks)
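For scripting, the listing can be reduced to the names of the started policies in one queue. The following is a minimal sketch that assumes the output keeps the `name:` / `state:` layout shown above; `nrs_started_policy` is a hypothetical helper name, not part of lctl. If more than one policy reports `started` in a queue, each name is printed on its own line.

```shell
# Hypothetical helper: print the policies whose state is "started" in one queue.
# Reads nrs_policies output on stdin; $1 is the queue header to inspect
# (regular_requests or high_priority_requests).
nrs_started_policy() {
    awk -v queue="$1" '
        /^[ \t]*[a-z_]+_requests:/ {            # queue header line
            sub(/^[ \t]*/, ""); sub(/:.*$/, "")
            in_queue = ($0 == queue)            # track whether we are in the wanted queue
            next
        }
        in_queue && /name:/ { name = $NF }      # remember the most recent policy name
        in_queue && /state:/ && $NF == "started" { print name }
    '
}
```

A possible invocation on an OSS would be `lctl get_param -n ost.OSS.ost_io.nrs_policies | nrs_started_policy regular_requests`.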
Enabling an NRS Policy
To set a policy on the regular queue:
lctl set_param ost.OSS.ost_io.nrs_policies="trr reg"
To set a policy on the high-priority queue:
lctl set_param ost.OSS.ost_io.nrs_policies="crrn hp"
To set a policy on both queues, omit the queue name:
lctl set_param ost.OSS.ost_io.nrs_policies="orr"
To revert to FIFO:
lctl set_param ost.OSS.ost_io.nrs_policies="fifo reg"
Note: Only one policy can be active per queue at a time. Enabling a new policy automatically stops the previous one.
CRR-N (Client Round-Robin)
CRR-N ensures fairness across clients by cycling through them in round-robin order.
lctl set_param ost.OSS.ost_io.nrs_policies="crrn reg"
Tunable: quantum — number of RPCs to process per client before rotating.
lctl set_param ost.OSS.ost_io.nrs_crrn_quantum=16
Default quantum is 16. Higher values improve throughput for individual clients at the cost of fairness.
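The effect of the quantum can be illustrated with a small simulation. This is a simplified model rather than the server's scheduler: it assumes all requests are already queued, visits clients in order of first arrival, and the name `crrn_order` and the two-column input format are made up for the example.

```shell
# Simplified round-robin model: serve up to QUANTUM requests per client per pass.
# Usage: crrn_order QUANTUM < requests   (each input line: "<client> <request-id>")
crrn_order() {
    [ "$1" -ge 1 ] 2>/dev/null || { echo "quantum must be a positive integer" >&2; return 1; }
    awk -v quantum="$1" '
        {
            if (!($1 in count)) order[++nclients] = $1   # first request from this client
            queue[$1, ++count[$1]] = $2                  # append to the per-client queue
            pending++
        }
        END {
            while (pending > 0)
                for (i = 1; i <= nclients; i++) {        # one pass over all clients
                    c = order[i]
                    for (n = 0; n < quantum && head[c] < count[c]; n++) {
                        print c, queue[c, ++head[c]]
                        pending--
                    }
                }
        }
    '
}
```

With three queued requests from client A followed by one from client B, a quantum of 2 serves A twice, then B, then the last request from A; a quantum of 1 lets B in after A's first request.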
ORR (Object-based Round-Robin)
ORR reorders RPCs to access the same object (file) sequentially by offset, reducing disk seeks:
lctl set_param ost.OSS.ost_io.nrs_policies="orr reg"
This is most effective on HDD-backed OSTs with large sequential file workloads.
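The ordering idea can be sketched with a plain sort. This illustrates only the grouping by object and the ascending offsets within each object, not the round-robin among objects that the policy also performs; `orr_order` and the two-column input format are hypothetical.

```shell
# Illustration only: group queued requests by object, then order by offset.
# Each input line: "<object-id> <offset-in-bytes>", in arrival order.
orr_order() {
    # -k1,1 groups lines by object ID; -k2,2n orders offsets numerically
    LC_ALL=C sort -k1,1 -k2,2n
}
```

Requests that arrive interleaved across two objects come out as two runs, each with ascending offsets, which is the access pattern that reduces seeks on rotating disks.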
TRR (Target-based Round-Robin)
TRR operates like ORR but groups requests at the target level rather than per-object:
lctl set_param ost.OSS.ost_io.nrs_policies="trr reg"
TRR is better suited for mixed workloads where many different files are accessed concurrently.
TBF (Token Bucket Filter)
TBF is the most flexible NRS policy. It classifies requests by NID, JobID, UID/GID, or opcode and uses a token bucket algorithm to rate-limit each class.
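To make the token bucket idea concrete, here is a minimal single-class sketch. It assumes the bucket starts full, that each RPC costs one token, and that requests are dispatched in arrival order; `tbf_schedule`, its arguments, and the input format are invented for the example and do not mirror the server implementation.

```shell
# Token bucket sketch for one class of requests.
# Usage: tbf_schedule RATE DEPTH < arrivals
#   RATE   tokens added per second (must be > 0)
#   DEPTH  bucket capacity, i.e. the largest burst served without waiting
#   stdin  one arrival time in seconds per line, in ascending order
# Prints "<arrival> <dispatch>" for every request.
tbf_schedule() {
    awk -v rate="$1" -v depth="$2" '
        BEGIN { tokens = depth; clock = 0 }       # assumption: bucket starts full
        {
            start = ($1 > clock) ? $1 : clock     # not before arrival or the previous dispatch
            tokens += (start - clock) * rate      # refill for the elapsed time
            if (tokens > depth) tokens = depth    # the bucket cannot overflow
            if (tokens < 1) {                     # no token yet: wait for the next one
                start += (1 - tokens) / rate
                tokens = 1
            }
            tokens -= 1                           # each request consumes one token
            clock = start
            printf "%.3f %.3f\n", $1, start
        }
    '
}
```

With RATE=2 and DEPTH=1, three requests arriving together at time 0 are dispatched at 0, 0.5, and 1.0 seconds, so the class settles at two RPCs per second. A larger DEPTH lets a short burst through before the limit takes effect.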
Enabling TBF
lctl set_param ost.OSS.ost_io.nrs_policies="tbf reg"
Rate-Limiting by NID
Limit a specific client to 100 RPCs/second:
lctl set_param ost.OSS.ost_io.nrs_tbf_rule=\
"start slow_client nid={10.0.1.50@tcp} rate=100"
Limit a subnet:
lctl set_param ost.OSS.ost_io.nrs_tbf_rule=\
"start slow_subnet nid={10.0.2.[0-255]@tcp} rate=50"
Rate-Limiting by JobID
Limit a specific job:
lctl set_param ost.OSS.ost_io.nrs_tbf_rule=\
"start limit_backup jobid={backup_job.*} rate=200"
This requires JobID to be enabled on clients (lctl set_param jobid_var=procname_uid).
Rate-Limiting by UID
Limit a specific user:
lctl set_param ost.OSS.ost_io.nrs_tbf_rule=\
"start limit_user uid={1001} rate=50"
Compound Rules
TBF supports compound expressions combining NID, JobID, UID, and GID:
lctl set_param ost.OSS.ost_io.nrs_tbf_rule=\
"start limit_user_on_net nid={10.0.1.[0-255]@tcp}&uid={1001} rate=25"
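When rules are generated by a script, the compound expression can be assembled from its parts. The sketch below only builds the rule string and joins every condition with `&`, as in the example above; `tbf_rule_expr` and `tbf_start_rule` are hypothetical helper names, and the list of accepted keys is an assumption based on the classifiers named in this page.

```shell
# Join KEY=VALUE conditions into key={value}&key={value}.
tbf_rule_expr() {
    joined=
    for cond in "$@"; do
        key=${cond%%=*}        # text before the first "="
        value=${cond#*=}       # text after the first "="
        case $key in
            nid|jobid|uid|gid|opcode) ;;
            *) echo "unsupported key: $key" >&2; return 1 ;;
        esac
        joined="${joined:+$joined&}$key={$value}"
    done
    printf '%s\n' "$joined"
}

# Build a complete "start" rule. Usage: tbf_start_rule NAME RATE KEY=VALUE...
tbf_start_rule() {
    [ "$#" -ge 3 ] || { echo "usage: tbf_start_rule NAME RATE KEY=VALUE..." >&2; return 1; }
    name=$1 rate=$2
    shift 2
    conditions=$(tbf_rule_expr "$@") || return 1
    printf 'start %s %s rate=%s\n' "$name" "$conditions" "$rate"
}
```

The generated string can then be passed to lctl, for example `lctl set_param ost.OSS.ost_io.nrs_tbf_rule="$(tbf_start_rule limit_user_on_net 25 'nid=10.0.1.[0-255]@tcp' uid=1001)"`.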
Listing TBF Rules
lctl get_param ost.OSS.ost_io.nrs_tbf_rule
Removing a TBF Rule
lctl set_param ost.OSS.ost_io.nrs_tbf_rule="stop slow_client"
Default Rule
The built-in default rule applies to all requests not matched by any explicit rule. You can change its rate:
lctl set_param ost.OSS.ost_io.nrs_tbf_rule="change default rate=500"
Rules are evaluated in priority order (most recently added first). The default rule always has the lowest priority.
Applying NRS to MDS Services
NRS policies also apply to MDT services:
lctl get_param mds.MDS.mdt.nrs_policies
lctl set_param mds.MDS.mdt.nrs_policies="tbf reg"
lctl set_param mds.MDS.mdt.nrs_tbf_rule=\
"start limit_metadata nid={10.0.3.[0-255]@tcp} rate=1000"
Making NRS Settings Persistent
NRS settings applied via lctl set_param do not survive a restart. To make them persistent, run lctl set_param -P on the MGS:
lctl set_param -P ost.OSS.ost_io.nrs_policies="tbf reg"
lctl set_param -P ost.OSS.ost_io.nrs_tbf_rule=\
"start slow_subnet nid={10.0.2.[0-255]@tcp} rate=50"