## Preface

During model training and evaluation, it is very useful to record system hardware and software information alongside the intermediate outputs, so that runs can be looked up and reproduced later.

Hardware model and usage:

* CPU model
* RAM usage and total
* GPU model, plus VRAM usage and total

Software environment:

* torch version
* CUDA / cuDNN / driver versions

---

## Hardware Information

### CPU

**Method 1: via the `psutil` library**

* Install: `pip install psutil`

Code

```python
import psutil

cpu_pc_count = psutil.cpu_count(logical=False)          # number of physical CPU cores
cpu_count = psutil.cpu_count()                          # number of logical threads
cpu_percent = round(psutil.cpu_percent(interval=1), 3)  # CPU utilization sampled over a 1 s window

# psutil has no API for the CPU model string, so it is printed as None here
print(f'CPU info: physical cores: {cpu_pc_count}, threads: {cpu_count}, usage: {cpu_percent}%, model: {None}')
```

Sample output

```shell
CPU info: physical cores: 12, threads: 24, usage: 0.3%, model: None
```

Note: the `psutil` library cannot retrieve a detailed CPU model string (a workaround sketch follows).
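As a workaround, on Linux the model string can be read from `/proc/cpuinfo`. This is a minimal sketch of my own, not part of the original post or of psutil's API; it assumes a Linux system, and the helper name `cpu_model_from_proc` is illustrative:

```python
# Linux-only workaround (assumption): parse the model string out of /proc/cpuinfo.
def cpu_model_from_proc():
    try:
        with open('/proc/cpuinfo') as f:
            for line in f:
                if line.startswith('model name'):
                    # e.g. "model name : AMD Ryzen 9 3900X 12-Core Processor"
                    return line.split(':', 1)[1].strip()
    except OSError:
        pass  # not Linux, or /proc is unavailable
    return None


print(f'CPU model: {cpu_model_from_proc()}')
```

The `cpuinfo` library shown next retrieves the same information in a cross-platform way.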
**Method 2: via the `cpuinfo` library**

* Install: `pip install py-cpuinfo`

Code

```python
import cpuinfo

cpu_info = cpuinfo.get_cpu_info()
cpu_model = cpu_info['brand_raw']  # full model string
cpu_arch = cpu_info['arch']        # architecture, e.g. X86_64
cpu_count = cpu_info['count']      # number of logical threads
print(f'CPU info: model: {cpu_model}, arch: {cpu_arch}, threads: {cpu_count}')
```

Sample output

```shell
CPU info: model: AMD Ryzen 9 3900X 12-Core Processor, arch: X86_64, threads: 24
```

---

### RAM

Requires the `psutil` library; install with `pip install psutil`.

Code

```python
import os

import psutil

cvt_base = 1024
memory = psutil.virtual_memory()
# bytes => MB
mem_total = round(memory.total / cvt_base / cvt_base, 1)
mem_free = round(memory.available / cvt_base / cvt_base, 1)
mem_used = round(memory.used / cvt_base / cvt_base, 1)
# resident set size (RSS) of the current process
mem_process_used = round(psutil.Process(os.getpid()).memory_info().rss / cvt_base / cvt_base, 1)
print(f'Memory info: total: {mem_total:,} MB, available: {mem_free:,} MB, used: {mem_used:,} MB, current process: {mem_process_used:,} MB')
```

Sample output

```shell
Memory info: total: 64,237.3 MB, available: 61,047.1 MB, used: 2,538.1 MB, current process: 412.8 MB
```

---

### GPU

**Querying count, model, and total VRAM**

**Method 1: via the PyTorch library**

```python
import torch

num_gpus = torch.cuda.device_count()
print(f' GPU count: {num_gpus} '.center(64, '='))
cvt_base = 1024
for index in range(num_gpus):
    gpu_name = torch.cuda.get_device_name(index)
    gpu_memory = torch.cuda.get_device_properties(index).total_memory
    gpu_memory_gb = round(gpu_memory / cvt_base / cvt_base / cvt_base, 1)  # bytes => GB, 1 decimal place
    print(f'GPU-{index}, {gpu_name} VRAM: {gpu_memory_gb} GB')
```

**Method 2: via the pynvml library (provided by NVIDIA)**

* Install: `pip install pynvml`

```python
import pynvml

# initialize NVML
pynvml.nvmlInit()
num_gpus = pynvml.nvmlDeviceGetCount()
print(f' GPU count: {num_gpus} '.center(64, '='))
cvt_base = 1024
for index in range(num_gpus):
    handle = pynvml.nvmlDeviceGetHandleByIndex(index)
    gpu_name = pynvml.nvmlDeviceGetName(handle)
    memory_info = pynvml.nvmlDeviceGetMemoryInfo(handle)
    gpu_memory = memory_info.total
    gpu_memory_gb = round(gpu_memory / cvt_base / cvt_base / cvt_base, 1)  # bytes => GB, 1 decimal place
    print(f'GPU-{index}, {gpu_name} VRAM: {gpu_memory_gb} GB')
# shut down NVML when done
pynvml.nvmlShutdown()
```

**Method 3: via the GPUtil library**

* Install: `pip install gputil`
* Repository: https://github.com/anderskm/gputil

Sample code

```python
import GPUtil

gpus = GPUtil.getGPUs()
gpu_count = len(gpus)
print(f' GPUs={gpu_count} '.center(64, '='))
for gpu in gpus:
    gpu_id = gpu.id
    gpu_name = gpu.name
    driver = gpu.driver
    load = gpu.load              # compute load, 0.0-1.0
    mem_util = gpu.memoryUtil    # VRAM utilization, 0.0-1.0
    mem_total = gpu.memoryTotal  # MB
    mem_used = gpu.memoryUsed    # MB
    mem_free = gpu.memoryFree    # MB
    print(f'GPU-{gpu_id} model: {gpu_name} driver: {driver} load: {load:.2%}; VRAM: util={mem_util:.2%} total: {mem_total} MB used: {mem_used} MB free: {mem_free} MB')
```

**Method 4: via the pycuda library**

* Install: `pip install pycuda` (the package compiles locally, so the install takes a while)

```python
import pycuda.driver as cuda

# initialize the CUDA driver
cuda.init()
num_gpus = cuda.Device.count()
print(f' GPU count: {num_gpus} '.center(64, '='))
cvt_base = 1024
for index in range(num_gpus):
    gpu = cuda.Device(index)
    gpu_name = gpu.name()
    gpu_memory = gpu.total_memory()
    gpu_memory_gb = round(gpu_memory / cvt_base / cvt_base / cvt_base, 1)  # bytes => GB, 1 decimal place
    print(f'GPU-{index}, {gpu_name} VRAM: {gpu_memory_gb} GB')
```

Sample output

```shell
========================= GPU count: 2 =========================
GPU-0, NVIDIA GeForce RTX 4060 Ti VRAM: 16.0 GB
GPU-1, NVIDIA GeForce RTX 4060 Ti VRAM: 16.0 GB
```

**Querying real-time VRAM usage and total**

* Requires the pynvml library; install with `pip install pynvml`
* First obtain a handle to the target GPU, then query its memory breakdown
* pynvml can look up a handle by device index, PCI bus ID, UUID, etc. (index is used here; a sketch of the other two lookups follows the sample output below)

Sample code

```python
import pynvml

# initialize NVML
pynvml.nvmlInit()
num_gpus = pynvml.nvmlDeviceGetCount()
gpu_driver = pynvml.nvmlSystemGetDriverVersion()
print(f' GPU count: {num_gpus} '.center(64, '='))
print(f'Driver: v{gpu_driver}')
cvt_base = 1024
for index in range(num_gpus):
    handle = pynvml.nvmlDeviceGetHandleByIndex(index)
    gpu_name = pynvml.nvmlDeviceGetName(handle)
    memory_info = pynvml.nvmlDeviceGetMemoryInfo(handle)
    # bytes => MB
    total = round(memory_info.total / cvt_base / cvt_base, 1)
    used = round(memory_info.used / cvt_base / cvt_base, 1)
    free = round(memory_info.free / cvt_base / cvt_base, 1)
    print(f'GPU-{index} Model: {gpu_name}; VRAM: total={total} MB, free={free} MB, used={used} MB')
# shut down NVML when done
pynvml.nvmlShutdown()
```

Sample output

```shell
========================= GPU count: 2 =========================
Driver: v550.54.15
GPU-0 Model: NVIDIA GeForce RTX 4060 Ti; VRAM: total=16380.0 MB, free=16052.1 MB, used=327.9 MB
GPU-1 Model: NVIDIA GeForce RTX 4060 Ti; VRAM: total=16380.0 MB, free=16064.8 MB, used=315.2 MB
```
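As mentioned in the list above, NVML also supports handle lookup by UUID and by PCI bus ID. Below is a minimal sketch of those two lookups, assuming pynvml's standard NVML bindings (`nvmlDeviceGetUUID`, `nvmlDeviceGetHandleByUUID`, `nvmlDeviceGetPciInfo`, `nvmlDeviceGetHandleByPciBusId`); note that older pynvml versions return `bytes` rather than `str` for these identifiers, so the values are fed back exactly as returned:

```python
import pynvml

pynvml.nvmlInit()

# start from an index handle, then recover the identifiers NVML accepts
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
uuid = pynvml.nvmlDeviceGetUUID(handle)             # e.g. 'GPU-xxxxxxxx-...'
bus_id = pynvml.nvmlDeviceGetPciInfo(handle).busId  # e.g. '00000000:01:00.0'

# look the same device up again via the other two identifier types
handle_by_uuid = pynvml.nvmlDeviceGetHandleByUUID(uuid)
handle_by_bus_id = pynvml.nvmlDeviceGetHandleByPciBusId(bus_id)

print(pynvml.nvmlDeviceGetName(handle_by_uuid))
print(pynvml.nvmlDeviceGetName(handle_by_bus_id))

pynvml.nvmlShutdown()
```

UUID lookup is handy on multi-GPU machines, where index order can change across reboots while UUIDs stay stable.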
Addendum: terminology. Video RAM, or VRAM, is the GPU's dedicated memory, where it stores the information to process graphical tasks. Nvidia, AMD and Intel are the main producers of consumer GPUs.

---

## Software Information

**torch-related information**

Sample code

```python
import torch

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"cuDNN version: {torch.backends.cudnn.version()}")
else:
    print("CUDA is not available.")
```

Sample output

```shell
PyTorch version: 2.2.1
CUDA available: True
CUDA version: 12.1
cuDNN version: 8902
```

More to be added...
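Finally, since the goal stated in the preface is to log this information with each run, here is a minimal wrap-up sketch that gathers the fields above into a single dict. The function name `collect_env_info` and the field names are illustrative choices of my own, not from the post; it uses only the psutil / cpuinfo / pynvml / torch calls demonstrated above and assumes an NVIDIA GPU with a working driver:

```python
import cpuinfo
import psutil
import pynvml
import torch


def collect_env_info():
    """Gather the hardware/software fields above into one loggable dict.

    Sketch only: the field names are illustrative, not a standard schema.
    """
    mb = 1024 * 1024
    mem = psutil.virtual_memory()
    info = {
        'cpu_model': cpuinfo.get_cpu_info()['brand_raw'],
        'cpu_threads': psutil.cpu_count(),
        'ram_total_mb': round(mem.total / mb, 1),
        'ram_used_mb': round(mem.used / mb, 1),
        'torch': torch.__version__,
        'cuda': torch.version.cuda if torch.cuda.is_available() else None,
        'cudnn': torch.backends.cudnn.version() if torch.cuda.is_available() else None,
        'gpus': [],
    }
    pynvml.nvmlInit()
    try:
        info['driver'] = pynvml.nvmlSystemGetDriverVersion()
        for index in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(index)
            vram = pynvml.nvmlDeviceGetMemoryInfo(handle)
            info['gpus'].append({
                'name': pynvml.nvmlDeviceGetName(handle),
                'vram_total_mb': round(vram.total / mb, 1),
                'vram_used_mb': round(vram.used / mb, 1),
            })
    finally:
        pynvml.nvmlShutdown()
    return info


if __name__ == '__main__':
    print(collect_env_info())
```

Dumping this dict as JSON next to each run's outputs is usually enough to answer "which machine and which versions produced this result?" later.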