CUDA 13.2  ·  cuDNN 9.21.0

Xuehang Cang

Machine Learning Engineer

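Lately focused on model fine-tuning and knowledge distillation,
transferring the capabilities of large models to smaller ones at a lower cost.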

I believe good supervision signals beat piling on compute, and small models can do big things too.

> Garbage in, garbage out. Don't waste my GPU.
train.log — pretrain.py
Epoch 1/3  |  8×H200 SXM  |  bs=512  |  lr=3e-4
train/loss
0/4800  ·  0%  ·  ETA 06:26:43
>
nvidia-smi
Driver Version: 595.71  |  CUDA Version: 13.2
GPU  |  Name  |  Temp  |  Power  |  Util
0  |  H200 SXM 141GB  |  62°C  |  891W / 1000W  |  98%
1  |  H200 SXM 141GB  |  64°C  |  878W / 1000W  |  99%
2  |  H200 SXM 141GB  |  61°C  |  903W / 1000W  |  97%
3  |  H200 SXM 141GB  |  65°C  |  867W / 1000W  |  99%
4  |  H200 SXM 141GB  |  63°C  |  912W / 1000W  |  98%
5  |  H200 SXM 141GB  |  60°C  |  875W / 1000W  |  99%
6  |  H200 SXM 141GB  |  66°C  |  895W / 1000W  |  97%
7  |  H200 SXM 141GB  |  63°C  |  882W / 1000W  |  99%
VRAM GPU-0  |  133120 MiB / 143360 MiB
VRAM GPU-1 ~ 7  (avg)  |  130048 MiB / 143360 MiB
Processes
GPU  |  PID  |  Process Name  |  MiB
0-7  |  31024  |  torchrun pretrain.py  |  131072×8
# No idle VRAM. This is how it should be.
$