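I installed a ConnectX-5 NIC in my R7515, specifically the MCX542B-ACAN. The R7515's OCP slot is OCP 2.0 Type 1 with PCIe 3.0 x8 lanes, so this model is about the highest spec that fits. The seller said the card was pulled from an Inspur server; my guess is that it actually came from servers decommissioned by Baidu.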
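Driver installation

Proxmox 7 is based on Debian 11, but it does not use Debian 11's default 5.10 kernel. It ships the 5.15 kernel used by Ubuntu 22.04. As a result, the MLNX_OFED driver cannot be installed from the official Debian repo, and the Ubuntu repo does not work either because the dependency versions are completely different. Instead, follow the official documentation and build deb packages for the kernel version running on this machine.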
Reference: Installing MLNX_OFED
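
Before picking a package source it can help to confirm which kernel series the host is really running. The following is a minimal sketch; `kernel_series` is a helper name introduced here for illustration and is not part of any Mellanox or Proxmox tool.

```shell
# Hypothetical helper: reduce a kernel release string such as
# "5.15.107-2-pve" to its "major.minor" series.
kernel_series() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

running="$(uname -r)"
series="$(kernel_series "$running")"
echo "running kernel: $running (series $series)"

# Debian 11 defaults to 5.10. Any other series suggests the prebuilt
# Debian 11 packages will not fit and a rebuild for this kernel is needed.
if [ "$series" != "5.10" ]; then
    echo "kernel differs from stock Debian 11; build packages for this kernel"
fi
```

On a Proxmox 7 host this would be expected to report the 5.15 series and print the rebuild hint.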
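Download the driver package: open the official site -> pick a suitable version (I chose the latest LTS release at the time, 5.8-3.0.7.0-LTS) -> Debian -> Debian 11.3 -> x86_64 -> tgz. Proxmox 8 needs the latest 24.04 release, which is the one that supports the 6.8 kernel.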
```shell
wget https://content.mellanox.com/ofed/MLNX_OFED-5.8-3.0.7.0/MLNX_OFED_LINUX-5.8-3.0.7.0-debian11.3-x86_64.tgz
tar -xzvf MLNX_OFED_LINUX-5.8-3.0.7.0-debian11.3-x86_64.tgz
```
Generate a local repo
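```shell
# run the script from inside the directory extracted from the tgz
cd MLNX_OFED_LINUX-5.8-3.0.7.0-debian11.3-x86_64
# rebuild the packages for the running kernel
./mlnx_add_kernel_support.sh -m "$(pwd)"

# the generated tarball lands in /tmp; note it is named debian11.7, not debian11.3
cd /tmp/
tar -xzvf MLNX_OFED_LINUX-5.8-3.0.7.0-debian11.7-x86_64-ext.tgz

cd /usr/local/src
mv /tmp/MLNX_OFED_LINUX-5.8-3.0.7.0-debian11.7-x86_64-ext ./
```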
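
Before pointing apt at the new tree, a quick sanity check that it actually contains packages can save a confusing `apt update` later. This is only a sketch: `count_debs` is a hypothetical helper, and the path is the one used in this post.

```shell
# Hypothetical helper: count the .deb files anywhere under a directory.
# Prints 0 when the directory does not exist.
count_debs() {
    if [ ! -d "$1" ]; then
        echo 0
        return 0
    fi
    find "$1" -type f -name '*.deb' | wc -l | tr -d ' '
}

repo=/usr/local/src/MLNX_OFED_LINUX-5.8-3.0.7.0-debian11.7-x86_64-ext/DEBS
n="$(count_debs "$repo")"
if [ "$n" -gt 0 ]; then
    echo "$repo holds $n packages"
else
    echo "no .deb files under $repo; check the mlnx_add_kernel_support.sh output"
fi
```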
Add the local repo to apt
```shell
cd /etc/apt/sources.list.d
echo "deb [trusted=yes] file:/usr/local/src/MLNX_OFED_LINUX-5.8-3.0.7.0-debian11.7-x86_64-ext/DEBS ./" > mlnx_ofed.list
```
Install the mlnx-ofed driver
```shell
apt update
apt install mlnx-ofed-basic
```
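
To confirm the install took effect, the version reported by `ofed_info -s` can be checked. The sketch below assumes that `ofed_info` was installed as part of the driver stack and that it prints a string of the form `MLNX_OFED_LINUX-5.8-3.0.7.0:`; `parse_ofed_version` is a hypothetical helper.

```shell
# Hypothetical helper: turn "MLNX_OFED_LINUX-5.8-3.0.7.0:" into "5.8-3.0.7.0".
parse_ofed_version() {
    printf '%s\n' "$1" | sed -e 's/^MLNX_OFED_LINUX-//' -e 's/:$//'
}

# Only query the tool if it is actually on the PATH.
if command -v ofed_info >/dev/null 2>&1; then
    echo "installed MLNX_OFED: $(parse_ofed_version "$(ofed_info -s)")"
else
    echo "ofed_info not found; the install may not have completed"
fi
```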
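Firmware update

The PSID on the card I received differs from the official one, so the official firmware tool cannot update it online automatically. I suspect this is a batch of cards that was built for Baidu.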
```text
# An online update finds no firmware because the PSID does not match
$ mlxfwmanager --online -u -d 02:00.0
Querying Mellanox devices firmware ...

Device #1:
----------

  Device Type:      ConnectX5
  Part Number:      MCX542B-ACAN_C07_Ax
  Description:      ConnectX-5 EN network interface card for OCP; with host management; 25GbE dual-port SFP28; PCIe3.0 x8; no bracket Halogen free
  PSID:             BAI0000000010
  PCI Device Name:  02:00.1
  Base GUID:        b8599f0300ab1a7c
  Base MAC:         b8599fab1a7c
  Versions:         Current        Available
     FW             16.25.4062     N/A
     PXE            3.5.0701       N/A
     UEFI           14.18.0019     N/A

  Status:           No matching image found
```
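
The PSID can be pulled out of that output with a small filter, which makes the mismatch easy to spot in a script. This is a sketch only: both function names are hypothetical, and the `MT_` prefix test is an assumption based on the stock PSID seen in this post (`MT_0000000248`).

```shell
# Hypothetical helper: print the PSID field(s) from mlxfwmanager output on stdin.
psid_from_query() {
    awk '$1 == "PSID:" { print $2 }'
}

# Assumption: stock images use an "MT_" PSID prefix, as MT_0000000248 does
# in this post; anything else is treated as an OEM PSID.
is_stock_psid() {
    case "$1" in
        MT_*) return 0 ;;
        *)    return 1 ;;
    esac
}

# On a real host: psid="$(mlxfwmanager -d 02:00.0 | psid_from_query)"
psid="$(printf '  PSID:             BAI0000000010\n' | psid_from_query)"
if is_stock_psid "$psid"; then
    echo "$psid: stock PSID, an online update should find an image"
else
    echo "$psid: OEM PSID, expect 'No matching image found'"
fi
```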
Reference: Updating Firmware After Installation
Download the firmware: open the official site -> pick a suitable version (I chose the latest LTS release at the time, 16.35.3006-LTS) -> MCX542B-ACA -> MT_0000000248
```shell
wget https://www.mellanox.com/downloads/firmware/fw-ConnectX5-rel-16_35_3006-MCX542B-ACA_Ax_Bx-UEFI-14.29.15-FlexBoot-3.6.902.bin.zip
unzip fw-ConnectX5-rel-16_35_3006-MCX542B-ACA_Ax_Bx-UEFI-14.29.15-FlexBoot-3.6.902.bin.zip
```
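Back up the current firmware. After the update, the part number, description and similar fields all change, so it is worth backing those up as well. I moved too fast and did not back them up T_T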
```shell
flint -d /dev/mst/mt4119_pciconf0 ri BAI0000000010.bin
```
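Force-flash the new firmware. Changing the PSID requires the flint command. Think twice before you do this, and make sure the model is correct.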
```text
$ flint --allow_psid_change -d /dev/mst/mt4119_pciconf0 -i fw-ConnectX5-rel-16_35_3006-MCX542B-ACA_Ax_Bx-UEFI-14.29.15-FlexBoot-3.6.902.bin burn
Done.
    Current FW version on flash:  16.25.4062
    New FW version:               16.35.3006

    You are about to replace current PSID on flash - "BAI0000000010" with a different PSID - "MT_0000000248".
    Note: It is highly recommended not to change the PSID.

 Do you want to continue ? (y/n) [n] : y
Burning FW image without signatures - OK
Burning FW image without signatures - OK
Restoring signature                     - OK
-I- To load new FW run mlxfwreset or reboot machine.
```
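
Since a cross-PSID burn is hard to undo, a small pre-flight check can be run before it. The sketch below compares the board name embedded in the image file name with the card's part number. Both helpers are hypothetical, and a matching prefix is only a heuristic, not proof that the image is safe for the card.

```shell
# Hypothetical helper: pull the board name (e.g. "MCX542B-ACA") out of a
# firmware image file name.
image_board() {
    printf '%s\n' "$1" | sed -n 's/.*-\(MCX[0-9A-Z]*-[0-9A-Z]*\)_.*/\1/p'
}

# Hypothetical helper: succeed only if the card's part number ($1) starts
# with the board name found in the image file name ($2).
board_matches() {
    _board="$(image_board "$2")"
    [ -n "$_board" ] || return 1
    case "$1" in
        "$_board"*) return 0 ;;
        *)          return 1 ;;
    esac
}

part="MCX542B-ACAN_C07_Ax"   # "Part Number" as reported by mlxfwmanager
image="fw-ConnectX5-rel-16_35_3006-MCX542B-ACA_Ax_Bx-UEFI-14.29.15-FlexBoot-3.6.902.bin"

if board_matches "$part" "$image"; then
    echo "image board $(image_board "$image") matches part $part"
else
    echo "image does not look like it belongs to $part; do not burn"
fi
```

The same idea extends to checking that the backup file exists and is non-empty before running the burn.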
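Querying the card shows the new image was written successfully. The FW row now reports 16.35.3006, while FW (Running) is still 16.25.4062 because the card has not been reset yet.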
```text
$ mlxfwmanager
Querying Mellanox devices firmware ...

Device #1:
----------

  Device Type:      ConnectX5
  Part Number:      MCX542B-ACAN_C07_Ax
  Description:      ConnectX-5 EN network interface card for OCP; with host management; 25GbE dual-port SFP28; PCIe3.0 x8; no bracket Halogen free
  PSID:             BAI0000000010
  PCI Device Name:  /dev/mst/mt4119_pciconf0
  Base GUID:        b8599f0300ab1a7c
  Base MAC:         b8599fab1a7c
  Versions:         Current        Available
     FW             16.35.3006     16.25.4062
     FW (Running)   16.25.4062     N/A
     PXE            3.5.0701       3.5.0701
     UEFI           14.18.0019     14.18.0019

  Status:           Up to date
```
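
The `FW` and `FW (Running)` rows can also be compared in a script to tell whether a reset is still pending. A minimal sketch, assuming the row layout shown in this post; `fw_pending_reset` is a hypothetical helper.

```shell
# Hypothetical helper: read mlxfwmanager output on stdin and succeed when a
# "FW (Running)" row exists and differs from the version in the "FW" row.
fw_pending_reset() {
    awk '
        $1 == "FW" && $2 == "(Running)" { running = $3; next }
        $1 == "FW"                      { flash = $2 }
        END { exit !(running != "" && running != flash) }
    '
}

# On a real host: mlxfwmanager | fw_pending_reset
sample='     FW             16.35.3006     16.25.4062
     FW (Running)   16.25.4062     N/A'

if printf '%s\n' "$sample" | fw_pending_reset; then
    echo "new firmware is on flash but not running yet; reset or reboot needed"
else
    echo "running firmware matches flash"
fi
```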
Reset the hardware
```text
$ mlxfwreset -d /dev/mst/mt4119_pciconf0 reset

Minimal reset level for device, /dev/mst/mt4119_pciconf0:

3: Driver restart and PCI reset
Continue with reset?[y/N] y
-I- Sending Reset Command To Fw             -Done
-I- Stopping Driver                         -Done
-I- Resetting PCI                           -Done
-I- Starting Driver                         -Done
-I- Restarting MST                          -Done
-I- FW was loaded successfully.
```
The card is now running the new firmware.
```text
$ mlxfwmanager
Querying Mellanox devices firmware ...

Device #1:
----------

  Device Type:      ConnectX5
  Part Number:      MCX542B-ACA_Ax_Bx
  Description:      ConnectX-5 EN network interface card for OCP; with host management; 25GbE dual-port SFP28; PCIe3.0 x8; no bracket; ROHS R6 Halogen free
  PSID:             MT_0000000248
  PCI Device Name:  /dev/mst/mt4119_pciconf0
  Base GUID:        b8599f0300ab1a7c
  Base MAC:         b8599fab1a7c
  Versions:         Current        Available
     FW             16.35.3006     16.35.3006
     PXE            3.6.0902       3.6.0902
     UEFI           14.29.0015     14.29.0015

  Status:           Up to date
```
Online update

With the new PSID in place, mlxfwmanager can update the firmware online directly from now on.
```text
$ mlxfwmanager --online -u -d 02:00.0
Querying Mellanox devices firmware ...

Device #1:
----------

  Device Type:      ConnectX5
  Part Number:      MCX542B-ACA_Ax_Bx
  Description:      ConnectX-5 EN network interface card for OCP; with host management; 25GbE dual-port SFP28; PCIe3.0 x8; no bracket; ROHS R6 Halogen free
  PSID:             MT_0000000248
  PCI Device Name:  02:00.0
  Base GUID:        b8599f0300ab1a7c
  Base MAC:         b8599fab1a7c
  Versions:         Current        Available
     FW             16.35.3006     16.35.3502
     PXE            3.6.0902       3.6.0902
     UEFI           14.29.0015     14.29.0015

  Status:           Update required

Release notes for the available Firmware:
-----------------------------------------

For more details, please refer to the following FW release notes:
  1- ConnectX3 (2.42.5000): http://www.mellanox.com/pdf/firmware/ConnectX3-FW-2_42_5000-release_notes.pdf
  2- ConnectX3Pro (2.42.5000): http://www.mellanox.com/pdf/firmware/ConnectX3Pro-FW-2_42_5000-release_notes.pdf
  3- Connect-IB (10.16.1200): http://www.mellanox.com/pdf/firmware/ConnectIB-FW-10_16_1200-release_notes.pdf
  4- ConnectX4 (12.28.2006): http://docs.mellanox.com/display/ConnectX4Firmwarev12282006
  5- ConnectX4Lx (14.32.1010): http://docs.mellanox.com/display/ConnectX4LxFirmwarev14321010
  6- ConnectX5 (16.35.3502): http://docs.mellanox.com/display/ConnectX5Firmwarev16353502
  7- ConnectX6 (20.41.1000): http://docs.mellanox.com/display/ConnectX6Firmwarev20411000
  8- ConnectX6Dx (22.41.1000): http://docs.mellanox.com/display/ConnectX6DxFirmwarev22411000
  9- ConnectX6Lx (26.41.1000): http://docs.mellanox.com/display/ConnectX6LxFirmwarev26411000
  10- BlueField2 (24.41.1000): http://docs.mellanox.com/display/BlueField2Firmwarev24411000
  11- ConnectX7 (28.41.1000): http://docs.mellanox.com/display/ConnectX7Firmwarev28411000
  12- BlueField3 (32.41.1000): http://docs.mellanox.com/display/BlueField3Firmwarev32411000

---------
Found 1 device(s) requiring firmware update...

Perform FW update? [y/N]: y
Please wait while downloading MFA(s) 100%

Device
FSMST_INITIALIZE - OK
Writing Boot image component - OK
Done

Restart needed for updates to take effect.
```
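
The "Update required" decision comes down to comparing the Current and Available columns. A rough equivalent for scripting is sketched below; `is_newer` is a hypothetical helper, and it relies on the version sort (`sort -V`) from GNU coreutils, which Debian provides.

```shell
# Hypothetical helper: succeed if version $2 is strictly newer than $1.
is_newer() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$2" ]
}

current=16.35.3006     # "Current" column of the FW row
available=16.35.3502   # "Available" column of the FW row

if is_newer "$current" "$available"; then
    echo "firmware update available: $current -> $available"
else
    echo "firmware is up to date"
fi
```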