On a node with both Mellanox InfiniBand and Intel Omni-Path (OPA) adapters installed, you might encounter the following errors while mounting Lustre, most likely when Lustre is served over o2ib using InfiniBand.
Error logged on the client:
LustreError: 15c-8: MGC10.1.4.64@o2ib: The configuration from log 'lustre-client' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
Lustre: Unmounted lustre-client
LustreError: 52174:0:(obd_mount.c:1505:lustre_fill_super()) Unable to mount (-5)
Error logged on the MGS:
LNetError: 1079:0:(o2iblnd_cb.c:2325:kiblnd_passive_connect()) Can't accept conn from 10.1.4.7@o2ib, queue depth too large: 128 (<=8 wanted)
LNetError: 1079:0:(o2iblnd_cb.c:2325:kiblnd_passive_connect()) Skipped 6 previous similar messages
LNet: 1079:0:(o2iblnd_cb.c:2352:kiblnd_passive_connect()) Can't accept conn from 10.1.4.7@o2ib (version 12): max_frags 32 incompatible without FMR pool (256 wanted)
LNet: 1079:0:(o2iblnd_cb.c:2352:kiblnd_passive_connect()) Skipped 11 previous similar messages
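This happens because the modprobe configuration file ko2iblnd.conf applies a different set of tunings to ko2iblnd when an OPA interface is present. The client then connects with those tunings, and the MGS, which runs with the defaults, rejects the connection.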
To skip the OPA tunings, comment out the line in ko2iblnd.conf that starts with "options ko2iblnd-opa".
Restarting the lnet service should then reload ko2iblnd with its default settings and restore connectivity to the MGS.
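The edit to ko2iblnd.conf described above can be sketched in Python as follows. This is only a minimal illustration, not a supported tool: the function name comment_out_opa_options and the SAMPLE text are hypothetical, and the sample contents (including the install line and the two option values, chosen to match the 128 and 32 seen in the MGS log) are assumptions rather than a copy of any shipped file. The sketch works on a string so that it makes no assumption about where the file lives on your system.

```python
def comment_out_opa_options(conf_text: str) -> str:
    """Return conf_text with every 'options ko2iblnd-opa' line commented out.

    Lines that are already commented start with '#', so they do not match
    and are left untouched; running the function twice is harmless.
    """
    out = []
    for line in conf_text.splitlines(keepends=True):
        if line.lstrip().startswith("options ko2iblnd-opa"):
            out.append("#" + line)  # disable the OPA-specific tunings
        else:
            out.append(line)
    return "".join(out)


# Hypothetical sample contents, for illustration only (assumed, not from the source).
SAMPLE = (
    "alias ko2iblnd-opa ko2iblnd\n"
    "options ko2iblnd-opa peer_credits=128 map_on_demand=32\n"
    "install ko2iblnd /usr/sbin/ko2iblnd-probe\n"
)

if __name__ == "__main__":
    print(comment_out_opa_options(SAMPLE), end="")
```

To apply this to a real system you would read the actual ko2iblnd.conf, keep a backup copy, write the result back as root, and then restart the lnet service as described above.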