
System Config Info

This tool automatically collects system information from the tested GPU nodes, covering the following categories: System, Memory, CPU, Disk, Networking, Accelerator, and PCIe.

Usage

Usage on local machine

  1. Install SuperBench on the local machine with root privileges.

  2. Collect the system info by running the sb node info --output-dir ${output-dir} command with root privileges.

  3. After the command finishes, the system info of the local node is written to the JSON file sys-info.json under ${output-dir}.
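The local workflow above can also be scripted. The following is a minimal sketch, not part of SuperBench itself: the helper names build_command, load_sys_info, and collect_local_info are hypothetical, and the sketch assumes that sb is on the PATH and that the script runs with root privileges. Only the sb node info --output-dir command and the sys-info.json file name are taken from the steps above.

```python
import json
import subprocess
from pathlib import Path


def build_command(output_dir):
    """Return the argv list for the collection command from step 2."""
    return ["sb", "node", "info", "--output-dir", str(output_dir)]


def load_sys_info(output_dir):
    """Load sys-info.json from the output directory and return the parsed JSON."""
    path = Path(output_dir) / "sys-info.json"
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def collect_local_info(output_dir):
    """Run the collection, then load the result.

    Assumption: `sb` is installed and this process has root privileges.
    Raises subprocess.CalledProcessError if the command fails.
    """
    subprocess.run(build_command(output_dir), check=True)
    return load_sys_info(output_dir)
```

Calling collect_local_info("outputs") would run the command and return the parsed content of outputs/sys-info.json. The sketch makes no assumption about the layout inside the file.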

Usage on multiple remote machines

  1. Install SuperBench on the local machine.

  2. Deploy SuperBench onto the remote machines.

  3. Prepare the host file of the tested GPU nodes using Ansible Inventory on the local machine.

  4. Once SuperBench is installed and the host file is ready, collect the system info automatically with the sb run --get-info command. See the SuperBench CLI documentation for the full command details.

    sb run --get-info -f host.ini --output-dir ${output-dir} -C superbench.enable=none
  5. After the command finishes, the system info of each node is written to the JSON file sys-info.json under ${output-dir}/nodes/${node_name}.
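Once the per-node files from step 5 exist, they can be gathered for comparison. The sketch below is illustrative only: the function names load_all_nodes and differing_keys are hypothetical, and it assumes that each sys-info.json holds a JSON object at the top level. It relies only on the nodes/<node_name>/sys-info.json layout described above.

```python
import json
from pathlib import Path


def load_all_nodes(output_dir):
    """Map each node name to its parsed sys-info.json under <output_dir>/nodes/."""
    results = {}
    nodes_dir = Path(output_dir) / "nodes"
    for info_file in sorted(nodes_dir.glob("*/sys-info.json")):
        with info_file.open(encoding="utf-8") as f:
            # The directory name is the node name.
            results[info_file.parent.name] = json.load(f)
    return results


def differing_keys(results):
    """Return the top-level keys whose values are not identical on every node.

    Assumption: each node's data is a dict. Values are compared through
    their canonical JSON text, so nested structures are handled too.
    """
    all_keys = set()
    for info in results.values():
        all_keys.update(info)
    return sorted(
        key
        for key in all_keys
        if len({json.dumps(info.get(key), sort_keys=True)
                for info in results.values()}) > 1
    )
```

A check like differing_keys(load_all_nodes("outputs")) gives a quick list of categories in which the nodes are not configured identically.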

Parameter and Details

System

| SubCategory | Key | Command | Description | Example |
|---|---|---|---|---|
| OS | system-manufacturer | dmidecode -s system-manufacturer | manufacturer of the system | Microsoft Corporation |
| | system-product name(virtual machine) | dmidecode -s system-product-name | product name or virtual machine | Virtual Machine |
| | operating_system | cat /proc/version | version of the currently running os | Ubuntu 9.3.0-17ubuntu1~20.04 |
| | uname | uname | brief system information | Linux sb-test-wu-000000 5.8.0-1039-azure #42~20.04.1-Ubuntu |
| Docker | docker_server_version | docker version | server version of docker engine | 20.10.3 |
| | docker_client_version | docker version | client version of docker engine | 20.10.3 |
| VM | vmbus | lsvmbus | devices attached to the Hyper-V VMBus | "VMBUS ID 1": "[Dynamic Memory]", "VMBUS ID 2": "Synthetic mouse", ... |
| Kernel | kernel_modules | lsmod | list of active kernel modules | "Module": "binfmt_misc", "Size": "24576", "Used": "1" ... |
| | kernel_parameters | sysctl | kernel parameters | "abi.vsyscall32": "1", "debug.exception-trace": "1", ... |
| DMI | dmidecode | dmidecode | DMI table dump (info on hardware components) | "dmidecode": "# dmidecode 3.2\nGetting SMBIOS data from sysfs..." |
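To illustrate how a command's text output relates to the key/value examples in the table, here is a minimal sketch that turns lsmod output into records with the Module, Size, and Used fields shown for kernel_modules. It is not SuperBench's actual parser; the function name parse_lsmod is hypothetical, and the sample text reuses the values from the table.

```python
# Sample lsmod output: a header line followed by one row per module.
SAMPLE = """\
Module                  Size  Used by
binfmt_misc            24576  1
"""


def parse_lsmod(text):
    """Parse lsmod text output into a list of dicts (values kept as strings)."""
    modules = []
    for line in text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) >= 3:
            # Third column is the use count; any dependent modules follow it.
            modules.append(
                {"Module": fields[0], "Size": fields[1], "Used": fields[2]}
            )
    return modules


print(parse_lsmod(SAMPLE))
# [{'Module': 'binfmt_misc', 'Size': '24576', 'Used': '1'}]
```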

Memory

| SubCategory | Key | Command | Description | Example |
|---|---|---|---|---|
| General | model | dmidecode -t memory | distinct model name of the memory | Samsung M393A4K40DB3-CWE |
| | type | dmidecode -t memory | distinct type of memory | DDR4-3200 |
| | clock frequency | dmidecode -t memory | distinct clock frequency of memory | 3200 MT/s |
| | channels | dmidecode -t memory | the number of memory chips | 16 |
| | capacity | lsmem | the total capacity of memory | 511.9G |
| | block_size | lsmem | the block size of memory | 128M |

CPU

| SubCategory | Key | Command | Description | Example |
|---|---|---|---|---|
| General | architecture | lscpu | architecture of cpu | x86_64 |
| | model name | lscpu | model name of cpu | AMD EPYC 7662 64-Core Processor |
| | cpu op-mode | lscpu | cpu mode: 32bit/64bit | 32-bit, 64-bit |
| | byte order | lscpu | byte order | Little Endian |
| | address size | lscpu | size of address | 48 bits physical, 48 bits virtual |
| | cpus | lscpu | logical cpu cores count | 256 |
| | On-line CPU(s) list | lscpu | on-line logical cpu cores | 0-255 |
| | Thread(s) per core | lscpu | threads per core | 2 |
| | Core(s) per socket | lscpu | cores per socket | 64 |
| | Socket(s) | lscpu | socket count | 2 |
| | NUMA node(s) | lscpu | numa node count | 4 |
| | L<x> caches | lscpu | cache size | "L1d cache": "4 MiB", "L1i cache": "4 MiB", "L2 cache": "64 MiB", "L3 cache": "512 MiB" |
| | NUMA node<x> CPU(s) | lscpu | cpu core list of the numa node | "NUMA node0 CPU(s)": "0-31,128-159", "NUMA node1 CPU(s)": "32-63,160-191", "NUMA node2 CPU(s)": "64-95,192-223", "NUMA node3 CPU(s)": "96-127,224-255" |
| | Flags | lscpu | cpu flags | fpu vme de pse tsc msr pae mce cx8 apic ... |
| | max_speed | sudo dmidecode -t processor \| grep "Speed" | distinct cpu max frequency | 3700 MHz |
| | current_speed | sudo dmidecode -t processor \| grep "Speed" | distinct cpu current frequency | 2000 MHz |
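The lscpu fields in this section are related to each other: threads per core times cores per socket times sockets gives the logical CPU count. The sketch below parses lscpu-style "Key: value" text and checks that relation with the example values from this section (2 x 64 x 2 = 256). The function names parse_lscpu and logical_cpus are hypothetical, and this is not necessarily how SuperBench processes the output.

```python
# Sample lscpu-style output built from the example values in this section.
SAMPLE = """\
Architecture:        x86_64
CPU(s):              256
Thread(s) per core:  2
Core(s) per socket:  64
Socket(s):           2
NUMA node0 CPU(s):   0-31,128-159
"""


def parse_lscpu(text):
    """Parse "Key: value" lines into a dict, splitting at the first colon."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            info[key.strip()] = value.strip()
    return info


def logical_cpus(info):
    """Derive the logical CPU count from the topology fields."""
    return (
        int(info["Thread(s) per core"])
        * int(info["Core(s) per socket"])
        * int(info["Socket(s)"])
    )


info = parse_lscpu(SAMPLE)
print(logical_cpus(info) == int(info["CPU(s)"]))  # True
```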

Disk

| SubCategory | Key | Command | Description | Example |
|---|---|---|---|---|
| FileSystem | filesystem | df -Th | the name/path of the filesystem | /dev/nvme0n1p2 |
| | avail | df -Th | available size of the filesystem | 1.4T |
| | size | df -Th | total size of the filesystem | 1.8T |
| | type | df -Th | the type of the filesystem | ext4 |
| | block_size | blockdev --getbsz /dev/<device> | the block size of the filesystem | 4096 |
| | 4k_alignment | parted $DEVICE align-check opt 1 (run per device, e.g. DEVICE=/dev/sdb1) | whether the filesystem is 4K-aligned | 1 aligned |
| BlockDevice | name | lsblk -e 7 -o NAME,ROTA,SIZE,MODEL | the name of the block device | nvme0n1 |
| | model | lsblk -e 7 -o NAME,ROTA,SIZE,MODEL | the model name of the block device | VO001920KXAVP |
| | rotational | lsblk -e 7 -o NAME,ROTA,SIZE,MODEL | whether the device is rotational, that is, HDD or SSD | 0 |
| | size | lsblk -e 7 -o NAME,ROTA,SIZE,MODEL | the total size of the block device | 1.8T |
| | block_size | fdisk -l -u /dev/ \| grep "Sector size" | the sector size of the block device | Sector size (logical/physical): 512 bytes / 512 bytes |
| General | mapping | mount | mount relationship between filesystem and block device | |

Networking

| SubCategory | Key | Command | Description | Example |
|---|---|---|---|---|
| NIC | nic_logical_name | lshw -c network | logical name of the nic | ib1 |
| | nic_model | lshw -c network | model name of the nic | Mellanox Technologies MT28908 Family [ConnectX-6] |
| | nic_firmware | lshw -c network | firmware version | 20.30.1004 (MT_0000000594) |
| | nic_driver | lshw -c network | driver version | mlx5_core[ib_ipoib] 5.3-1.0.0 |
| | nic_speed | lshw -c network | speed spec of the nic | 200 Gbit/s |
| | nic_disabled | lshw -c network | whether the nic is disabled | false |
| IB | device_info | ibv_devinfo -v | list of device information for each ib device | "hca_id:\tmlx5_0": ... |
| | device_status | ibstat | list of device status for each ib device | "CA 'mlx5_0'": ... |
| General | ofed_version | ofed_info -s | the version of ofed | MLNX_OFED_LINUX-5.3-1.0.5.0: |

Accelerator

| SubCategory | Key | Command | Description | Example(NVIDIA) | Example(AMD) |
|---|---|---|---|---|---|
| General | driver_version | nvidia-smi -q -x/rocm-smi -a | driver version | 460.27.04 | 5.9.25 |
| | topology | nvidia-smi topo -m/rocm-smi --showtopo | gpu connection topology (nvidia only) | / | / |
| | nvidia-container-runtime_version | nvidia-container-runtime -v | version of nvidia-container-runtime (nvidia only) | 1.0.0-rc92 | N/A |
| | nvidia-fabricmanager_version | nv-fabricmanager --version | version of nvidia-fabricmanager (nvidia only) | 460.27.04 | N/A |
| | nv_peer_mem_version | dpkg -l \| grep 'nvidia-peer-memory' | version of nv_peer_mem (nvidia only) | 1.1-0 | N/A |
| GPUCard | rocm_info | rocm-smi -a & rocm-smi --showmeminfo vram | amd gpu info of each gpu<index>, including firmware, frequency, memory, etc. (amd only) | N/A | "card0": ... "card1": ... |
| | nvidia_info | nvidia-smi -q | nvidia gpu info list of each gpu, including firmware, frequency, memory, etc. (nvidia only) | "timestamp": "Fri Aug 20 05:36:24 2021", "driver_version": "460.27.04", "cuda_version": "11.2", "attached_gpus": "8", "gpu": [...] ... | N/A |

PCIe

| SubCategory | Key | Command | Description | Example |
|---|---|---|---|---|
| General | topology | lspci -t -vvv | topology of installed PCI devices | / |
| | device_info | lspci -vvv | device info on installed PCI devices | 00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Root Complex... |