Problems encountered while installing caffe2 and Detectron
1. The cmake version is too old; cmake 3 or later is required.
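The version requirement above can be checked before starting the build. The following is a minimal sketch, assuming Python 3 and that `cmake --version` prints a line of the form "cmake version X.Y.Z"; the helper names parse_cmake_version and cmake_is_new_enough are made up for illustration.

```python
import re
import subprocess


def parse_cmake_version(text):
    """Extract (major, minor, patch) from the output of `cmake --version`."""
    match = re.search(r"cmake version (\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("unrecognized cmake --version output")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def cmake_is_new_enough(text, minimum=(3, 0, 0)):
    """Return True if the reported version is at least `minimum`."""
    return parse_cmake_version(text) >= minimum


if __name__ == "__main__":
    try:
        raw = subprocess.check_output(["cmake", "--version"])
        text = raw.decode("utf-8", "replace")
        print(parse_cmake_version(text), cmake_is_new_enough(text))
    except (OSError, subprocess.CalledProcessError, ValueError) as exc:
        # cmake is missing, broken (e.g. the CMAKE_ROOT error), or prints something unexpected
        print("could not determine cmake version:", exc)
```

Keeping the parsing separate from the subprocess call makes it possible to check saved output from another machine as well.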
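2. Running cmake --version produces the following error:
CMake Error: Could not find CMAKE_ROOT !!!
CMake has most likely not been installed correctly.
Modules directory not found in
/home/kelvin/local/bin
Check the environment variables: the shell must pick up the cmake binary from the complete installation, so make sure its bin directory has been added to PATH.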
3. git clone https://github.com/RLovelett/eigen.git prompts for an account and password.
eigen is no longer available at this URL; access has been closed to all users. Use the mirror instead:
https://github.com/eigenteam/eigen-git-mirror.git
4. nvcc -V
nvcc command not found
# Configure the environment variables
echo 'export PATH=/usr/local/cuda-8.0/bin:$PATH'>>~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:$LD_LIBRARY_PATH'>>~/.bashrc
source ~/.bashrc
# After this, running nvcc -V again prints the detailed version information
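After editing ~/.bashrc it is worth confirming that PATH and LD_LIBRARY_PATH refer to the same CUDA toolkit, since a mismatch between the two is easy to miss. The sketch below assumes Python 3 and install directories named like /usr/local/cuda-X.Y; the helper names cuda_versions and check_cuda_env are hypothetical.

```python
import os
import re
import shutil


def cuda_versions(path_value):
    """Return the set of CUDA versions named in a colon-separated path list."""
    return set(re.findall(r"cuda-(\d+\.\d+)", path_value))


def check_cuda_env(environ):
    """Return (versions in PATH, versions in LD_LIBRARY_PATH, whether they agree)."""
    path_versions = cuda_versions(environ.get("PATH", ""))
    lib_versions = cuda_versions(environ.get("LD_LIBRARY_PATH", ""))
    return path_versions, lib_versions, path_versions == lib_versions


if __name__ == "__main__":
    print("nvcc found at:", shutil.which("nvcc"))  # None means nvcc is not on PATH
    print("PATH / LD_LIBRARY_PATH CUDA versions:", check_cuda_env(os.environ))
```

Run it in a new shell (or after source ~/.bashrc) so that the updated variables are visible.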
5. ImportError: No module named past.builtins
pip install future (do not use sudo)
6. No module named google.protobuf.internal
pip install protobuf
7. ImportError: No module named hypothesis
pip install hypothesis
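The three missing-module errors above can be detected in one pass instead of one failed run at a time. This is a minimal sketch; the mapping from module name to pip package is taken from the items above, and the helper name missing_packages is hypothetical.

```python
import importlib

# module that fails to import -> pip package that provides it (from the notes above)
REQUIRED = {
    "past.builtins": "future",
    "google.protobuf.internal": "protobuf",
    "hypothesis": "hypothesis",
}


def missing_packages(required=REQUIRED):
    """Return the pip package names whose modules cannot be imported."""
    missing = []
    for module_name, pip_name in sorted(required.items()):
        try:
            importlib.import_module(module_name)
        except ImportError:
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    for name in missing_packages():
        print("pip install " + name)
```

Run it with the same interpreter that will run caffe2, otherwise the result describes a different environment.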
8. No module named caffe2.python
# Add the Python environment variables
# Note: caffe2 is not installed into the Python environment, so code has to be run from the caffe2 install directory
export PATH=/usr/local/pytorch/build:$PATH
export PYTHONPATH=/usr/local/pytorch/build:$PYTHONPATH
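When changing the shell environment is not convenient, the same effect can be had from inside a script by extending sys.path before the import. A minimal sketch, assuming the build directory used above (/usr/local/pytorch/build); the helper names add_to_sys_path and caffe2_importable are hypothetical.

```python
import importlib
import sys

CAFFE2_BUILD_DIR = "/usr/local/pytorch/build"  # path from the notes above; adjust to your build


def add_to_sys_path(directory, path_list=None):
    """Put `directory` at the front of the import path if it is not already listed."""
    path_list = sys.path if path_list is None else path_list
    if directory not in path_list:
        path_list.insert(0, directory)
    return path_list


def caffe2_importable():
    """Return True if `caffe2.python` can be imported with the current path."""
    try:
        importlib.import_module("caffe2.python")
        return True
    except ImportError:
        return False


if __name__ == "__main__":
    add_to_sys_path(CAFFE2_BUILD_DIR)
    print("caffe2.python importable:", caffe2_importable())
```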
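9. CPU-only build: the pybind11 module is missing
Install pybind11 separately and add /path/to/pybind11/include to the environment variables.
(Adding it to the environment variables sometimes has no effect; in that case pass it when running cmake for caffe2: cmake -D pybind11_INCLUDE_DIR=/root/pybind11/build)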
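10. RuntimeError: [enforce fail at common_cudnn.h:118] version_match. cuDNN compiled (5105) and runtime (6021) versions mismatch
Rebuild and reinstall caffe2, specifying the cuDNN and CUDA directories explicitly, with cuDNN set to version 6.0.
Add the following to the cmake command: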
-DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-8.0
-DCUDNN_ROOT_DIR=/usr/local/cuda
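The two numbers in the error message are encoded cuDNN versions. A small sketch for decoding them, assuming the encoding major*1000 + minor*100 + patch that cuDNN releases of that era use for CUDNN_VERSION; the function name decode_cudnn_version is hypothetical.

```python
def decode_cudnn_version(code):
    """Split an encoded cuDNN version (major*1000 + minor*100 + patch) into its parts."""
    major, rest = divmod(code, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


if __name__ == "__main__":
    for label, code in (("compiled", 5105), ("runtime", 6021)):
        print(label, "cuDNN %d.%d.%d" % decode_cudnn_version(code))
```

Under that assumption, caffe2 was compiled against cuDNN 5.1.5 but loaded 6.0.21 at runtime, which is why the rebuild has to point at the cuDNN 6.0 installation.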
11. File "/home/gxwtest/object-detection/detectron/lib/utils/vis.py", line 327, in vis_one_image
ValueError: need more than 2 values to unpack
In OpenCV 2.x, cv2.findContours returns only two values, so remove (comment out) the leading '_' in the unpacking on line 327 of vis.py.
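An alternative to editing the unpacking by hand is to take the last two elements of whatever cv2.findContours returns, which works whether the call returns two values or three. This is a sketch of that idiom, not the code Detectron ships; the helper name unpack_contours is hypothetical.

```python
def unpack_contours(result):
    """Return (contours, hierarchy) from a cv2.findContours result of length 2 or 3."""
    contours, hierarchy = result[-2:]
    return contours, hierarchy


# Usage inside vis.py would look roughly like this (mask is the binary image being traced):
#   contours, hier = unpack_contours(
#       cv2.findContours(mask, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_NONE))
```

The benefit is that the same source keeps working if the OpenCV version on the machine changes later.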
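12. _png.write_png()
RuntimeError: Could not create write struct
The matplotlib version installed by pip is too new: its _png.so appears to depend on libpng16.so, while this machine has libpng15.so. Uninstall matplotlib and install an older version with yum install python-matplotlib.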
13. Multi-GPU training fails with:
RuntimeError: [enforce fail at context_gpu.h:314] error == cudaSuccess. 77 vs 0. Error at: /home/gxwtest/object-detection/caffe2/caffe2/core/context_gpu.h:314: an illegal memory access was encountered
# Run in Python
from caffe2.python import workspace
print(workspace.GetCudaPeerAccessPattern())
An output of [True,False,False,True] means that only single-GPU or dual-GPU training is possible, and that dual-GPU training works only with the GPU pairs (1,4) and (2,3); every other configuration fails.
# For multi-GPU training, run:
python2.7 tools/train_net.py \
--multi-gpu-testing \
--cfg configs/getting_started/tutorial_2gpu_e2e_faster_rcnn_R--FPN.yaml \
OUTPUT_DIR ./tmp/detectron-output USE_NCCL True
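Reading the usable GPU pairs off the peer access pattern by eye is error-prone, so it can help to compute them. The sketch below assumes the pattern is a square boolean matrix where entry [i][j] says whether GPU i can access GPU j; the 4-GPU matrix is a made-up example whose first row matches the output quoted above, and the function name peer_pairs is hypothetical. Indices are 0-based, so (0, 3) corresponds to the pair (1,4) in the text.

```python
def peer_pairs(pattern):
    """Return all GPU index pairs (i, j), i < j, with peer access in both directions."""
    n = len(pattern)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if pattern[i][j] and pattern[j][i]
    ]


if __name__ == "__main__":
    # Hypothetical pattern for a 4-GPU machine; on a real machine pass
    # workspace.GetCudaPeerAccessPattern() instead.
    example = [
        [True, False, False, True],
        [False, True, True, False],
        [False, True, True, False],
        [True, False, False, True],
    ]
    print(peer_pairs(example))
```

The GPUs of a chosen pair can then be exposed to the training run, for example through CUDA_VISIBLE_DEVICES.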
14. Multi-GPU training with NCCL fails with:
cudaSuccess. 2 vs 0.
Reinstall NCCL version 2.1.15:
wget https://developer.download.nvidia.com/compute/redist/nccl/v2.1/nccl_2.1.15-1%2Bcuda8.0_x86_64.txz
See https://github.com/facebookresearch/video-nonlocal-net/issues/14
Then rebuild caffe2, passing the NCCL directory to cmake:
cmake -DNCCL_ROOT_DIR=/usr/local/nccl
Official installation link
Reference links for other errors:
https://blog.csdn.net/wfei101/article/details/79451754
https://blog.csdn.net/zziahgf/article/details/79141879
https://blog.csdn.net/zziahgf/article/details/72461175