Summary
- Using the `sas_mpath_snic_alias` script in udev rules
- Intermittently missing a handful of symlinks after udev runs
- Adding a delay to `sas_mpath_snic_alias` seems to alleviate the problem
- Possibly the enclosure controllers are overwhelmed by too many requests at once?
- In this report, some WWNs have been redacted to protect the guilty
Environment
- Hardware
  - 2 x Broadcom HBA 9500-16e
  - 84 x Seagate ST20000NM002D disks
  - 2 x Supermicro CSE-847E2C-R1K23JBOD enclosures (w/ redundant expanders)
- Software
  - Debian 12.5 "bookworm"
  - Kernel `6.1.0-21-amd64` / `6.1.90-1` (2024-05-03)
  - Python 3.11.2
  - multipath-tools 0.9.4-3+deb12u1
  - sasutils 0.5.0
- SAS topology
  - Single SFF-8644 cable from each HBA to an expander in each enclosure
  - Thus: two SFF-8644 cables from host to each enclosure
  - The second host-facing SFF-8644 port on each expander is not used
  - Downstream daisy-chain ports on the enclosures are not used
Configuration
`/etc/multipath.conf` says in part:

```
user_friendly_names no
find_multipaths yes
path_grouping_policy multibus
```

`/etc/udev/rules.d/sasutils.rules` says:

```
KERNEL=="dm-[0-9]*", PROGRAM="/usr/local/bin/sas_mpath_snic_alias_delayed %k", SYMLINK+="mapper/%c"
```

`sg_ses` has been used to assign nicknames to the enclosures, such as:
- `SHLF_1_FRNT_PRI` (disk shelf 1, front backplane, primary expander)
- `SHLF_1_FRNT_SEC` (disk shelf 1, front backplane, secondary expander)
- `SHLF_1_REAR_PRI` (disk shelf 1, rear backplane, primary expander)
- `SHLF_2_FRNT_PRI` (disk shelf 2, front backplane, primary expander)
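For anyone unfamiliar with this udev pattern: udev runs the `PROGRAM` with the kernel name substituted for `%k`, captures its stdout, and substitutes that output for `%c` in the `SYMLINK+=` assignment. A minimal Python sketch of that substitution (the `program` parameter here is just for illustration; the default path is the wrapper from the rule above):

```python
import subprocess

def resolve_symlink(kernel_name: str,
                    program: str = "/usr/local/bin/sas_mpath_snic_alias_delayed") -> str:
    """Mimic udev: run PROGRAM with %k, use its stdout as %c in SYMLINK+="mapper/%c"."""
    out = subprocess.run([program, kernel_name],
                         capture_output=True, text=True, check=True)
    return "mapper/" + out.stdout.strip()

# Example, using echo as a stand-in for the alias script:
# resolve_symlink("dm-0", program="echo") returns "mapper/dm-0"
```

If the program exits nonzero (as in the udev log errors below), udev gets no `%c` value and the symlink is simply not created.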
- The disks are not partitioned
Symptoms
- I am expecting symlinks like `/dev/mapper/SHLF_1_FRNT-bay00` to appear for every physical disk
- Intermittently, a handful of these will be missing
Investigation
Good behaviors
- Each disk appears twice at the SAS block layer (`/dev/sd*`)
- All `/dev/mapper/35000000000000000` symlinks always appear for all disks
- I/O seems completely reliable; it is just the udev aliases that have trouble
- `sas_devices -v` has always reported all devices and enclosures, with proper slots
- `lsscsi` has always reported all devices
- `multipath -l` has always reported all devices, with two disks per map
- When the links are there, I/O works fine; almost 1 petabyte written
Problem behaviors
- Not always the same nodes missing
- Persists through multiple reboots and kernel updates
- Persists through a full shutdown, power-off, and power-source-disconnect
- Typically only 2 to 4 nodes missing, but once saw as many as 14
- Lower-numbered devices seem slightly more likely to be missing
  - For example, `/dev/mapper/SHLF_1_FRNT-bay00` has been missing several times
  - However, connecting just one shelf does not make the problem go away
- I have tried running `udevadm trigger` a few times; it has always caused the missing nodes to appear
- Does not appear to be specific to multipath
  - Tried a single SAS cable per enclosure and using just `sas_sd_snic_alias`
  - Most disks then appeared as `SHLF_1_FRNT_PRI-bay00`
  - Still randomly and intermittently missing some nodes
  - A few links showed up with names like `/dev/disk/by-bay/naa.5000000000000000-bay09`
- I set `udev_log` to `debug` in `/etc/udev/udev.conf`, but the results have not been particularly illuminating
  - All I have seen is the occasional `/etc/udev/rules.d/sasutils.rules:11 Command "/usr/local/bin/sas_mpath_snic_alias_delayed dm-0" returned 1 (error)`
  - Nothing more informative
  - Not all `dm-` devices even appear in the log, even when everything is working (???)
Workaround
- Introducing a delay in `sas_mpath_snic_alias` seems to have alleviated the problem
- Theory:
  - udev events are firing for all 84 disks at once
  - Possibly udev events fire for each path, so 168 invocations
  - Plus invocations for all 84 multipath maps
  - Each invocation would then query the enclosure controllers independently
  - Hundreds of inquiries hitting the enclosure controller at once might have been too much for its little brain to handle
- Delay is proportional to the multipath map number, so it should scale with more/fewer disks
  - An extremely large number of disks might still lead to udev timeouts
- My implementation is kludgey and fragile
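A back-of-the-envelope check on the theory above, using the figures from this report (84 disks, 2 SAS paths each):

```python
# Rough numbers behind the theory: 84 disks, 2 SAS paths each.
disks, paths_per_disk = 84, 2
path_events = disks * paths_per_disk   # udev events for the /dev/sd* paths
map_events = disks                     # plus one event per multipath map
total_invocations = path_events + map_events
print(total_invocations)               # 252 near-simultaneous enclosure queries

# Spread from the workaround's schedule: 250 ms base + 40 ms per map number,
# so the last map (dm-83) waits roughly 3.6 seconds.
max_delay = 0.25 + 0.04 * (disks - 1)
print(round(max_delay, 2))             # 3.57
```

A few seconds of total spread stays comfortably under udev's default event timeout while thinning the burst of SES inquiries considerably.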
Details
- Modified the `sas_mpath_snic_alias` file itself
- Added `from time import sleep` near the top
- Added a delay proportional to the `dm-NN` number passed as the first argument, after `sys.argv` processing but before `load_entry_point`, as follows:

```python
sys.argv[0] = re.sub(r'(-script\.pyw?|\.exe)?$', '', sys.argv[0])
delay = sys.argv[1]   # assuming a single argument, I hope that's right
delay = delay[3:]     # extract the number out of an argument like "dm-37"
delay = int(delay)    # make sure it is an integer
delay = delay * 0.04  # add a 40 millisecond delay for each additional map
delay = delay + 0.25  # minimum 250 millisecond delay
sleep(delay)
sys.exit(load_entry_point('sasutils==0.5.0', 'console_scripts', 'sas_mpath_snic_alias')())
```
A proper solution would likely belong in the sasutils library itself, but I had neither the time nor the skill to delve that deeply.
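A less fragile variant of the same idea might be a small standalone wrapper instead of a patched entry-point script. The sketch below is hypothetical, not the author's actual implementation: it assumes the wrapper is installed under the name used in the udev rule above and that the real `sas_mpath_snic_alias` is on `$PATH`.

```python
#!/usr/bin/env python3
# Hypothetical standalone sas_mpath_snic_alias_delayed: stagger startup by
# the dm-NN map number, then hand off to the real sasutils tool unchanged.
import os
import re
import sys
from time import sleep

def compute_delay(devname: str, per_map: float = 0.04, base: float = 0.25) -> float:
    """250 ms base plus 40 ms per multipath map number (e.g. 'dm-37')."""
    m = re.fullmatch(r"dm-(\d+)", devname)
    n = int(m.group(1)) if m else 0  # unexpected name: fall back to base delay
    return base + per_map * n

if __name__ == "__main__" and len(sys.argv) > 1:
    sleep(compute_delay(sys.argv[1]))
    # Replace this process with the real entry point, passing arguments through
    os.execvp("sas_mpath_snic_alias", ["sas_mpath_snic_alias"] + sys.argv[1:])
```

Because it never touches the generated entry-point script, this version would survive sasutils package upgrades, and a malformed argument degrades to the minimum delay instead of crashing on `int()`.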