Intermittent missing symlinks with sas_mpath_snic_alias; possible timing issue? #29

@OsmiumBalloon

Description

Summary

  • Using sas_mpath_snic_alias script in udev rules
  • Intermittently missing a handful of symlinks after udev runs
  • Adding delay to sas_mpath_snic_alias seems to alleviate the problem
  • Possibly the enclosure controllers are overwhelmed with too many requests at once?
  • In this report, some WWNs have been redacted to protect the guilty

Environment

  • Hardware
    • 2 x Broadcom HBA 9500-16e
    • 84 x Seagate ST20000NM002D disks
    • 2 x Supermicro CSE-847E2C-R1K23JBOD enclosures (w/ redundant expanders)
  • Software
    • Debian 12.5 "bookworm"
    • Kernel 6.1.0-21-amd64 / 6.1.90-1 (2024-05-03)
    • Python 3.11.2
    • multipath-tools 0.9.4-3+deb12u1
    • sasutils 0.5.0
  • SAS topology
    • Single SFF-8644 cable from each HBA, to an expander in each enclosure
    • Thus: Two SFF-8644 cables from host to each enclosure
    • The second host-facing SFF-8644 port on each expander is not used
    • Downstream daisy-chain ports on enclosures are not used

Configuration

  • /etc/multipath.conf says in part:
    • user_friendly_names no
    • find_multipaths yes
    • path_grouping_policy multibus
  • /etc/udev/rules.d/sasutils.rules says:
    • KERNEL=="dm-[0-9]*", PROGRAM="/usr/local/bin/sas_mpath_snic_alias_delayed %k", SYMLINK+="mapper/%c"
  • sg_ses has been used to assign nicknames to the enclosures, such as:
    • SHLF_1_FRNT_PRI (disk shelf 1, front backplane, primary expander)
    • SHLF_1_FRNT_SEC (disk shelf 1, front backplane, secondary expander)
    • SHLF_1_REAR_PRI (disk shelf 1, rear backplane, primary expander)
    • SHLF_2_FRNT_PRI (disk shelf 2, front backplane, primary expander)
  • The disks are not partitioned
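The nickname assignment mentioned above can be done with sg_ses from sg3_utils; a hypothetical invocation for illustration (the /dev/sg2 device path and the specific nickname are placeholders for your system, and the options are per the sg_ses documentation):

```shell
# Assign a SES subenclosure nickname that sasutils will then report
# (/dev/sg2 is a placeholder; point it at the enclosure's sg device).
sg_ses --nickname=SHLF_1_FRNT_PRI /dev/sg2

# Read back the subenclosure nickname status page to verify.
sg_ses --page=snic /dev/sg2
```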

Symptoms

  • I am expecting symlinks like /dev/mapper/SHLF_1_FRNT-bay00 to appear for every physical disk
  • Intermittently, a handful of these will be missing

Investigation

Good behaviors

  • Each disk appears twice at the SAS block layer (/dev/sd*)
  • All /dev/mapper/35000000000000000 symlinks always appear for all disks
  • I/O seems completely reliable; it is just the udev aliases that have trouble
  • sas_devices -v has always reported all devices and enclosures, with proper slots
  • lsscsi has always reported all devices
  • multipath -l has always reported all devices, with two disks per map
  • When the links are there, I/O works fine; almost 1 petabyte written

Problem behaviors

  • Not always the same nodes missing
  • Persists through multiple reboots and kernel updates
  • Persists through a full shutdown, power-off, and power-source-disconnect
  • Typically only 2 to 4 nodes missing, but once saw as many as 14
  • Lower numbered devices seem slightly more likely to be missing
    • For example, /dev/mapper/SHLF_1_FRNT-bay00 missing several times
    • However, connecting just one shelf does not make the problem go away
  • I have tried running udevadm trigger a few times; it has always caused the missing nodes to appear
  • Does not appear to be specific to multipath
    • Tried a single SAS cable per enclosure and using just sas_sd_snic_alias
    • Most disks then appeared as SHLF_1_FRNT_PRI-bay00
    • Still randomly and intermittently missing some nodes
    • A few links showed up with names like /dev/disk/by-bay/naa.5000000000000000-bay09
  • I set udev_log to debug in /etc/udev/udev.conf but the results have not been particularly illuminating
    • All I have seen is the occasional /etc/udev/rules.d/sasutils.rules:11 Command "/usr/local/bin/sas_mpath_snic_alias_delayed dm-0" returned 1 (error)
    • Nothing more informative
    • Not all dm- devices even appear in the log, even when everything is working (???)
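For anyone reproducing this, the re-trigger step above can be limited to just the device-mapper nodes rather than every device (an admin-command sketch; run against a live udev instance):

```shell
# Re-run udev rules for device-mapper block devices only, so the
# dm-* events (and thus the sasutils rule) fire again.
udevadm trigger --subsystem-match=block --sysname-match='dm-*'

# Wait until the queued events are processed before checking /dev/mapper.
udevadm settle
```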

Workaround

  • Introducing a delay in sas_mpath_snic_alias seems to have alleviated the problem
  • Theory is
    • udev events are firing for all 84 disks at once
    • Possibly udev events fire for each path, so 168 invocations
    • Plus invocations for all 84 multipath maps
    • Each invocation would then query the enclosure controllers independently
    • Hundreds of inquiries hitting the enclosure controller at once might have been too much for its little brain to handle
  • Delay is proportional to the multipath map number, so it should scale with more/fewer disks
  • An extremely large number of disks might still lead to udev timeouts
  • My implementation is kludgey and fragile

Details

  • Modified sas_mpath_snic_alias file itself
  • Added from time import sleep near top
  • Added a delay proportional to the dm-NN number passed as the first argument, after the sys.argv processing but before load_entry_point, as follows:
sys.argv[0] = re.sub(r'(-script\.pyw?|\.exe)?$', '', sys.argv[0])
delay = sys.argv[1]     # assuming a single argument like "dm-37"; I hope that's right
delay = int(delay[3:])  # extract the number out of the "dm-NN" argument
delay = delay * 0.04    # 40 milliseconds of additional delay per map
delay = delay + 0.25    # minimum 250 millisecond delay
sleep(delay)
sys.exit(load_entry_point('sasutils==0.5.0', 'console_scripts', 'sas_mpath_snic_alias')())

A proper solution would likely be in the main part of the code library, but I had neither the time nor the skill to delve that deeply.
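For reference, the same delay could live in a standalone wrapper script (e.g. installed as the /usr/local/bin/sas_mpath_snic_alias_delayed referenced by the udev rule) instead of patching the installed entry point, which would survive package upgrades. A sketch under the same 40 ms per map / 250 ms minimum assumptions; the helper's installed path is an assumption:

```python
#!/usr/bin/env python3
"""Hypothetical wrapper: sleep proportionally to the dm-NN map number,
then exec the real sas_mpath_snic_alias."""
import os
import re
import sys
import time

def compute_delay(devname, per_map=0.04, minimum=0.25):
    """Return the sleep in seconds for a device name like 'dm-37'."""
    m = re.fullmatch(r"dm-(\d+)", devname)
    if m is None:
        raise ValueError(f"expected a dm-NN name, got {devname!r}")
    return int(m.group(1)) * per_map + minimum

def main():
    devname = sys.argv[1]  # udev passes %k, e.g. "dm-37"
    time.sleep(compute_delay(devname))
    # Replace this process with the real helper (installed path assumed
    # to be on PATH; adjust to an absolute path if needed):
    os.execvp("sas_mpath_snic_alias", ["sas_mpath_snic_alias", devname])

# main() would be invoked under an `if __name__ == "__main__":` guard.
```

Raising ValueError on an unexpected argument makes the wrapper exit non-zero, which would show up in the udev debug log the same way as the "returned 1 (error)" lines above.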
