Summary of problems encountered when installing Caffe2 and Detectron

Problems I ran into while installing Caffe2 and Detectron:

1.The CMake version is too old; CMake 3 or later is required.

2.Running cmake --version produces the following error:

CMake Error: Could not find CMAKE_ROOT !!!
CMake has most likely not been installed correctly.
Modules directory not found in
/home/kelvin/local/bin

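Check the environment variables. This error typically means the cmake binary found on PATH cannot locate its own Modules directory, so make sure PATH points at the bin directory of a complete CMake installation.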
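A quick way to confirm which CMake is being picked up, and whether it meets the "3 or later" requirement from item 1, is to parse the text that cmake --version prints. The following is only a sketch: the helper names are made up for illustration, it assumes Python 3.7+, and it assumes the usual "cmake version X.Y.Z" first line.

```python
import re
import shutil
import subprocess


def parse_cmake_version(output):
    """Extract (major, minor, patch) from the text printed by `cmake --version`.

    Returns None when no version line is present, e.g. when cmake printed
    the CMAKE_ROOT error instead.
    """
    match = re.search(r"cmake version (\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        return None
    return tuple(int(part or 0) for part in match.groups())


def cmake_is_new_enough(output, minimum=(3, 0, 0)):
    """True if the reported version is at least `minimum` (CMake 3 by default)."""
    version = parse_cmake_version(output)
    return version is not None and version >= minimum


def installed_cmake_output():
    """Return (path, output) for the cmake found on PATH, or (None, '') if absent."""
    path = shutil.which("cmake")
    if path is None:
        return None, ""
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    return path, result.stdout + result.stderr
```

Calling installed_cmake_output() shows the full path of the binary that the shell would run, which helps spot a stale copy such as one under ~/local/bin; passing its output to cmake_is_new_enough() then checks the version.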

3.git clone https://github.com/RLovelett/eigen.git prompts for a username and password

Eigen is no longer available at this URL (access has been closed for all users), so use the mirror instead:
https://github.com/eigenteam/eigen-git-mirror.git

4.nvcc -V

nvcc command not found

# Add the CUDA directories to the environment
echo 'export PATH=/usr/local/cuda-8.0/bin:$PATH'>>~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:$LD_LIBRARY_PATH'>>~/.bashrc
source ~/.bashrc
# After this, running nvcc -V again prints the version details
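Both variables should point at the same CUDA installation. As a rough check, the sketch below inspects PATH and LD_LIBRARY_PATH for the bin and lib64 directories of one CUDA root. The function name and the default cuda_home value are assumptions for illustration, and Python 3 is assumed.

```python
import os
import shutil


def cuda_env_report(cuda_home="/usr/local/cuda-8.0", environ=None):
    """Report whether PATH and LD_LIBRARY_PATH reference the same CUDA root.

    `cuda_home` is an assumed install location; adjust it to the local machine.
    """
    environ = os.environ if environ is None else environ
    path_value = environ.get("PATH", "")
    path_dirs = path_value.split(os.pathsep)
    lib_dirs = environ.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    return {
        # Directory that should contain the nvcc binary
        "bin_on_path": cuda_home + "/bin" in path_dirs,
        # Directory that should contain the CUDA runtime libraries
        "lib64_on_ld_library_path": cuda_home + "/lib64" in lib_dirs,
        # Full path of the nvcc that would actually run, or None
        "nvcc": shutil.which("nvcc", path=path_value),
    }
```

Running cuda_env_report() in a fresh shell after source ~/.bashrc should show both flags as True; a False on the second flag usually means LD_LIBRARY_PATH names a different CUDA version than PATH does.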

5.ImportError: No module named past.builtins

pip install future (do not use sudo)

6.No module named google.protobuf.internal

pip install protobuf

7.ImportError: No module named hypothesis

pip install hypothesis

8.No module named caffe2.python

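# Add Caffe2 to the Python environment variables
# Note: Caffe2 is not installed into the Python environment, so either run code from inside the Caffe2 build directory or export the paths below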
export PATH=/usr/local/pytorch/build:$PATH
export PYTHONPATH=/usr/local/pytorch/build:$PYTHONPATH
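Items 5 to 8 are all import failures, so it can save time to test every module in one pass before running Detectron. This is a minimal sketch: the REQUIRED table and the function name are illustrative, and the suggested fixes are simply the ones listed above.

```python
import importlib

# Module name -> fix suggested in the notes above
REQUIRED = {
    "past.builtins": "pip install future",
    "google.protobuf.internal": "pip install protobuf",
    "hypothesis": "pip install hypothesis",
    "caffe2.python": "add the Caffe2 build directory to PYTHONPATH",
}


def missing_modules(required=None):
    """Return {module_name: suggested_fix} for every module that fails to import."""
    required = REQUIRED if required is None else required
    missing = {}
    for name, fix in required.items():
        try:
            importlib.import_module(name)
        except ImportError:
            # Also covers ModuleNotFoundError on Python 3
            missing[name] = fix
    return missing
```

An empty dictionary from missing_modules() means all four imports resolve in the current interpreter; run it with the same python binary that will later run Detectron.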

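9.only in cpu: the pybind11 module is missing

Install pybind11 separately and add /path/to/pybind11/include to the environment variables.
(If the environment variable has no effect, pass the path directly when running cmake for Caffe2: cmake -D pybind11_INCLUDE_DIR=/root/pybind11/build)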

10.RuntimeError: [enforce fail at common_cudnn.h:118] version_match. cuDNN compiled (5105) and runtime (6021) versions mismatch

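Rebuild and reinstall Caffe2, explicitly specifying the cuDNN and CUDA directories, with cuDNN pointing at version 6.0.
Add the following to the cmake command: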
-DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-8.0
-DCUDNN_ROOT_DIR=/usr/local/cuda
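The two numbers in the error message are packed version codes. Assuming the encoding used by these cuDNN releases (major * 1000 + minor * 100 + patch, as in the CUDNN_VERSION macro), a small helper makes the mismatch readable. The function name is made up for illustration.

```python
def decode_cudnn_version(code):
    """Split a packed cuDNN version code into (major, minor, patch).

    Assumes the pre-cuDNN-9 encoding: major * 1000 + minor * 100 + patch.
    """
    major, rest = divmod(code, 1000)
    minor, patch = divmod(rest, 100)
    return (major, minor, patch)


compiled = decode_cudnn_version(5105)  # version Caffe2 was built against
runtime = decode_cudnn_version(6021)   # version loaded at run time
print("compiled against cuDNN %d.%d.%d" % compiled)
print("running with cuDNN %d.%d.%d" % runtime)
```

Under that assumption, Caffe2 here was compiled against a cuDNN 5.1 header while a cuDNN 6.0 library was loaded at run time, which is why the rebuild has to point at the 6.0 installation.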

11.File "/home/gxwtest/object-detection/detectron/lib/utils/vis.py", line 327, in vis_one_image

ValueError: need more than 2 values to unpack

In OpenCV 2.x, cv2.findContours returns only two values (contours and hierarchy), so remove the leading '_' from the unpacking on line 327 of vis.py.
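Instead of editing the unpacking by hand for each OpenCV version, one common idiom is to take the last two elements of whatever cv2.findContours returns, since OpenCV 3.x returns three values and 2.x returns two. The sketch below shows the idea with a hypothetical helper; it is not the code that ships in vis.py.

```python
def contours_and_hierarchy(result):
    """Return (contours, hierarchy) from a cv2.findContours result.

    Works whether the result has two elements (contours, hierarchy) or
    three (image, contours, hierarchy), because the wanted pair is always last.
    """
    return tuple(result[-2:])


# Intended use (requires OpenCV, so it is left as a comment here):
#   contours, hierarchy = contours_and_hierarchy(
#       cv2.findContours(mask, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_NONE))
```

This keeps the same line working if OpenCV is later upgraded or downgraded.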

12._png.write_png()

RuntimeError: Could not create write struct

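The matplotlib installed via pip is too new: its _png.so appears to depend on libpng16.so, while this machine has libpng15.so. Uninstall matplotlib and install the older distribution package with yum install python-matplotlib.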

13.Error during multi-GPU training:

RuntimeError: [enforce fail at context_gpu.h:314] error == cudaSuccess. 77 vs 0. Error at: /home/gxwtest/object-detection/caffe2/caffe2/core/context_gpu.h:314: an illegal memory access was encountered

# Run on the Python side
from caffe2.python import workspace
print(workspace.GetCudaPeerAccessPattern())

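The output was [True,False,False,True], which means only single-GPU or two-GPU training works on this machine, and two-GPU training only works with the GPU pairs (1,4) and (2,3); any other configuration fails.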
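To read the peer access pattern more systematically, the matrix can be scanned for GPU pairs that can reach each other in both directions. The sketch below uses a hypothetical 4x4 matrix chosen to be consistent with the pairs described above, not output captured from this machine; the function name is illustrative, and indices are 0-based while the text counts GPUs from 1.

```python
def mutual_peer_pairs(pattern):
    """List (i, j) GPU index pairs, i < j, with peer access in both directions.

    `pattern` is a square matrix of booleans (a list of lists or an array
    that supports pattern[i][j]) where pattern[i][j] means GPU i can access GPU j.
    """
    n = len(pattern)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if pattern[i][j] and pattern[j][i]
    ]


# Hypothetical pattern for a 4-GPU machine (assumption, for illustration only)
example_pattern = [
    [True, False, False, True],
    [False, True, True, False],
    [False, True, True, False],
    [True, False, False, True],
]
print(mutual_peer_pairs(example_pattern))
```

On a real machine, the result of workspace.GetCudaPeerAccessPattern() can be passed in place of example_pattern.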

# For multi-GPU training, run:
python2.7 tools/train_net.py \
    --multi-gpu-testing \
    --cfg configs/getting_started/tutorial_2gpu_e2e_faster_rcnn_R--FPN.yaml \
    OUTPUT_DIR ./tmp/detectron-output USE_NCCL True

14.Multi-GPU training with NCCL fails:

cudaSuccess. 2 vs 0.

Reinstall NCCL 2.1.15:
wget https://developer.download.nvidia.com/compute/redist/nccl/v2.1/nccl_2.1.15-1%2Bcuda8.0_x86_64.txz
See https://github.com/facebookresearch/video-nonlocal-net/issues/14
Then rebuild Caffe2, pointing cmake at the NCCL installation:
cmake -DNCCL_ROOT_DIR=/usr/local/nccl