2025-02-19 Setting up a PyTorch environment

This article is categorized as "Garbage". It should NEVER appear in your search engine's results.

Test code:

$  python -c "import torch; print(torch.__version__); print(torch.version.cuda); print(torch.backends.cudnn.version()); print(torch.cuda.is_available()); print(torch.randn(1).cuda())"
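The same check can also be written as a small script. This is a minimal sketch that collects the values from the one-liner above into a dict. It assumes only the torch attributes the one-liner already uses, and torch_env_report is a hypothetical helper name. Unlike the one-liner, it catches a failed import, such as the undefined-symbol error below, and it only touches the GPU when torch.cuda.is_available() is True.

```python
def torch_env_report():
    """Hypothetical helper: gather the values printed by the one-liner into a dict."""
    try:
        import torch
    except (ImportError, OSError) as exc:  # e.g. "undefined symbol" errors from libtorch_cpu.so
        return {"torch": None, "import_error": str(exc)}

    report = {
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,         # None for a CPU-only build
        "cudnn": torch.backends.cudnn.version(),  # None when cuDNN is unavailable
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        # Same idea as torch.randn(1).cuda(): really allocate a tensor on the GPU
        report["gpu_tensor_device"] = str(torch.randn(1).cuda().device)
    return report


if __name__ == "__main__":
    for key, value in torch_env_report().items():
        print(f"{key}: {value}")
```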

2025-01-03

Ran into lib/python3.11/site-packages/torch/lib/libtorch_cpu.so: undefined symbol: iJIT_NotifyEvent

The Google results are all from the last few months, so this looks like a recently introduced problem.

Following 🔗 [Installing PyTorch (CUDA 11.8) and PyTorch3D on Python 3.11. | by Gaurav Yadav | Medium] https://pro2017001.medium.com/installing-pytorch-cuda-11-8-and-pytorch-on-python-3-11-1fe872f29368

(This tutorial did turn out to work. Its key fix is already reflected in the updated command later in this note.)

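In a Python 3.12 environment, hit AssertionError: Torch not compiled with CUDA enabled

Giving up on that one. It's either a Python 3.12 issue or the result of assorted dependencies getting tangled together, so let's try a fresh, clean environment.

In a new Python 3.11 environment, hit RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW

It turned out nvidia-smi was broken as well: Failed to initialize NVML: Driver/library version mismatch

A reboot fixed it. (This mismatch typically appears after the NVIDIA driver is upgraded while the old kernel module is still loaded. Rebooting loads the matching module.)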
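Each error in this log ended up with a specific fix. This is a minimal sketch of a lookup from error message to the fix recorded in this note. The substrings, KNOWN_FIXES and suggest_fix are hypothetical names chosen for illustration, and the table covers only the cases seen here. It is not a general diagnosis.

```python
# Error substrings seen in this note, mapped to the fix that worked here
KNOWN_FIXES = {
    "undefined symbol: iJIT_NotifyEvent":
        'pin "mkl<2024.1" and "intel-openmp<2024.1" in the conda install command',
    "Torch not compiled with CUDA enabled":
        "CPU-only build: reinstall PyTorch in a fresh environment with pytorch-cuda",
    "Error 804: forward compatibility was attempted":
        "driver and loaded kernel module out of sync: reboot",
    "Driver/library version mismatch":
        "driver and loaded kernel module out of sync: reboot",
}


def suggest_fix(error_message):
    """Hypothetical helper: return the first matching fix, or None if the error is unknown."""
    for pattern, fix in KNOWN_FIXES.items():
        if pattern in error_message:
            return fix
    return None


print(suggest_fix("Failed to initialize NVML: Driver/library version mismatch"))
```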

1. PyTorch was maxing out the CPU and not using the GPU at all

torch.cuda.is_available() returned False, which means CUDA was not set up correctly.
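In this note, is_available() returning False had two different causes. One is a CPU-only PyTorch build, which is what produces "Torch not compiled with CUDA enabled". The other is a CUDA build that cannot reach the GPU, as with the driver errors above. This minimal sketch tells the two apart using torch.version.cuda, which is None for CPU-only builds. explain_cuda_status is a hypothetical name, and it takes plain values so the logic can be checked without a GPU.

```python
def explain_cuda_status(cuda_build, cuda_available):
    """cuda_build: torch.version.cuda; cuda_available: torch.cuda.is_available()."""
    if cuda_available:
        return "ok"
    if cuda_build is None:
        # CPU-only package: calling .cuda() raises "Torch not compiled with CUDA enabled"
        return "cpu-only build: reinstall a CUDA build of PyTorch"
    # A CUDA build, but the runtime found no usable driver/GPU
    return f"built for CUDA {cuda_build}, but no usable GPU: check the driver with nvidia-smi"
```

Usage: explain_cuda_status(torch.version.cuda, torch.cuda.is_available()).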

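2. Install a working environment with conda

$ nvcc --version    # reports 11.5

(The conda pytorch-cuda package ships its own CUDA runtime, so the system toolkit reported by nvcc does not need to be 11.8. The NVIDIA driver just has to be new enough for CUDA 11.8.)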

Create a new Python 3.11 environment

Then use the command from 🔗 [PyTorch installation with GPU support on Ubuntu - PyTorch Forums] https://discuss.pytorch.org/t/pytorch-installation-with-gpu-support-on-ubuntu/196350:

# 2026-01-03: use this one (the mkl / intel-openmp pins avoid the iJIT_NotifyEvent error above)
$ conda install -y "mkl<2024.1" "intel-openmp<2024.1" pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia


# Previous version
$ conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

Once installed, torch.cuda.is_available() returned True and the problem was gone.

3. Put the model on the GPU

import torch
device = torch.device("cuda")
model = model.to(device)  # same effect as model.cuda()
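Here is a minimal, self-contained version of step 3. It uses a toy nn.Linear model in place of the real one. The CPU fallback is my addition, not part of the original setup, so that the same script still runs on a machine without CUDA.

```python
import torch
from torch import nn

# Fall back to the CPU so the script still runs where CUDA is unavailable
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2)   # hypothetical stand-in for a real model
model = model.to(device)  # same effect as model.cuda() when device is "cuda"

# Every parameter now lives on the chosen device
print({p.device.type for p in model.parameters()})
```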

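4. During training, move X and y onto the same device as the model (otherwise you get the error Expected all tensors to be on the same device, but found at least two devices...)

# Original code
for X, y in train_data:
    ...  # training step

# New code
import torch

device = torch.device("cuda")
for X, y in train_data:
    X, y = X.to(device), y.to(device)
    ...  # training step, now with X and y on the GPU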
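Here is a minimal end-to-end sketch of step 4. It assumes toy random data wrapped in a DataLoader, a linear classifier, and SGD. All of these are hypothetical stand-ins for the real train_data and model. The point is that the DataLoader yields CPU tensors, so each batch has to be moved to the model's device before the forward pass.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical toy data: 64 samples, 4 features, binary labels
X_all = torch.randn(64, 4)
y_all = torch.randint(0, 2, (64,))
train_data = DataLoader(TensorDataset(X_all, y_all), batch_size=16, shuffle=True)

model = nn.Linear(4, 2).to(device)  # move the model before building the optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for X, y in train_data:
    # Batches come out of the DataLoader on the CPU; move them next to the model
    X, y = X.to(device), y.to(device)
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

print(f"last batch loss: {loss.item():.4f}")
```

The model is moved before the optimizer is constructed. This follows the torch.optim documentation, which asks for .cuda() to be called before creating the optimizer.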

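5. To evaluate test_data with the GPU-trained model, the original CPU code also has to change, and fairly substantially. Feeding the entire test_data to the GPU model at once can run out of GPU memory. In that case, split it into chunks of batch_size and concatenate the results. In practice, most of the time test_data can simply be evaluated with the model moved back to the CPU:

cpu_model = gpu_model.to('cpu')  # Module.to() works in place, so gpu_model itself is now on the CPU too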
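If the test set does have to go through the GPU model, this is a minimal sketch of the batching approach described above. predict_in_batches, test_X and the batch size of 256 are hypothetical. torch.no_grad() is my addition: it stops autograd from keeping activations for backprop, which by itself cuts inference memory considerably. Each batch's output is moved back to the CPU, collected in a list, and joined with torch.cat, which is the tensor equivalent of appending.

```python
import torch
from torch import nn


def predict_in_batches(model, test_X, batch_size=256):
    """Hypothetical helper: run model over test_X chunk by chunk and return CPU outputs."""
    device = next(model.parameters()).device
    model.eval()  # switch dropout/batchnorm layers to inference behaviour
    outputs = []
    with torch.no_grad():  # no autograd graph, so far less memory per batch
        for start in range(0, len(test_X), batch_size):
            batch = test_X[start:start + batch_size].to(device)
            # Move each result back to the CPU so GPU memory does not pile up
            outputs.append(model(batch).cpu())
    return torch.cat(outputs)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
gpu_model = nn.Linear(4, 2).to(device)  # stand-in for the trained model
test_X = torch.randn(1000, 4)
preds = predict_in_batches(gpu_model, test_X)
print(preds.shape)  # torch.Size([1000, 2])
```

If you take the CPU route instead but still need the GPU model afterwards, copy.deepcopy(gpu_model).to('cpu') gives a separate CPU copy and leaves the original on the GPU.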

Last modified: 2026-01-03
