Preface
caffe2 is the upgraded, non-backward-compatible successor to caffe, and it fixes many of caffe's problems: caffe has no native multi-machine training, mobile support was weak, and so on. In any case, caffe is no longer updated and has been tagged 1.0, so hurry over to caffe2 (but tensorflow may be an even better choice).
Problem description
An aside
Why might tensorflow be the better choice? Is caffe2 not good enough? caffe2 is in fact very capable; it just feels like its community is far less active than tensorflow's. It is maintained mainly by the facebook folks, the issue count is low, few questions get answered, and many pull requests never get merged. In short, purely for convenience of use, anyone who hasn't taken the plunge yet may want to think it over carefully.
Multi-task support in caffe
Multi-task learning is a very common, high-frequency need, whether the goal is to improve model quality via joint learning or to cut inference time. caffe can train from several data formats; the most efficient are LMDB/LevelDB, which store key-value pairs. What the key is doesn't really matter as long as it is unique, while the value is a serialized Datum struct.
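To make the storage model concrete, here is a minimal sketch in which a plain Python dict stands in for the LMDB environment and placeholder bytes stand in for serialized Datum records; the zero-padded key scheme is the usual convention, not something caffe mandates:

```python
# Minimal stand-in for an LMDB-style store: unique keys -> serialized values.
# A plain dict replaces the real lmdb environment; the "serialized Datum"
# is just placeholder bytes here.
def make_key(index):
    # Zero-padded keys keep lexicographic order equal to insertion order,
    # the usual convention for caffe-style LMDBs.
    return "{:08d}".format(index).encode()

store = {}
for i, record in enumerate([b"datum-0", b"datum-1", b"datum-2"]):
    store[make_key(i)] = record
```

Any key scheme works as long as the keys are unique; zero-padding just keeps iteration order deterministic.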
The protobuf definition of Datum is as follows:
message Datum {
  optional int32 channels = 1;
  optional int32 height = 2;
  optional int32 width = 3;
  // the actual image data, in bytes
  optional bytes data = 4;
  optional int32 label = 5;
  // Optionally, the datum could also hold float data.
  repeated float float_data = 6;
  // If true data contains an encoded image that need to be decoded
  optional bool encoded = 7 [default = false];
}
As you can see, label is a single value: you cannot store multiple labels, and the Data layer hard-codes this in many places, so changing it is painful. You can of course write a custom data layer instead; in a previous project I wrote a python layer to support multiple tasks. But a hand-rolled layer is never as efficient as caffe's native path, and once a python layer is involved, multi-GPU training no longer works.
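One common workaround is to smuggle all labels into the repeated float_data field and slice them apart again in the custom data layer. The sketch below is hypothetical pure Python (plain dicts in place of the real protobuf Datum, made-up helper names) just to illustrate the packing scheme:

```python
# Hypothetical sketch: pack several labels into Datum.float_data and slice
# them back out, the way a custom (e.g. python) data layer would.
# Plain dicts stand in for the real protobuf Datum.
def make_record(image_bytes, labels):
    # "data" holds the image bytes; float_data carries every label.
    return {"data": image_bytes, "float_data": [float(x) for x in labels]}

def unpack_labels(record, label_len):
    floats = record["float_data"]
    assert len(floats) == label_len, "record does not match expected label count"
    return floats

rec = make_record(b"\x00" * 8, [23, 1])  # e.g. age=23, gender=1
age, gender = unpack_labels(rec, label_len=2)
```

This works, but every consumer of the LMDB has to agree on the ad-hoc packing, which is exactly the fragility caffe2's richer format removes.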
Multi-task support in caffe2
caffe2's native and efficient data IO format is likewise LMDB, and as in caffe the value is a serialized string. In caffe2, though, it is no longer a Datum but a new protobuf message called TensorProtos (note the plural), i.e. a list of TensorProto messages, where TensorProto is defined as follows:
message TensorProto {
  // The dimensions in the tensor.
  repeated int64 dims = 1;
  enum DataType {
    UNDEFINED = 0;
    FLOAT = 1;   // float
    INT32 = 2;   // int
    BYTE = 3;    // BYTE, when deserialized, is going to be restored as uint8.
    STRING = 4;  // string
    // Less-commonly used data types.
    BOOL = 5;     // bool
    UINT8 = 6;    // uint8_t
    INT8 = 7;     // int8_t
    UINT16 = 8;   // uint16_t
    INT16 = 9;    // int16_t
    INT64 = 10;   // int64_t
    FLOAT16 = 12; // caffe2::__f16, caffe2::float16
    DOUBLE = 13;  // double
  }
  optional DataType data_type = 2 [default = FLOAT];
  // For float
  repeated float float_data = 3 [packed = true];
  // For int32, uint8, int8, uint16, int16, bool, and float16
  // Note about float16: in storage we will basically convert float16 byte-wise
  // to unsigned short and then store them in the int32_data field.
  repeated int32 int32_data = 4 [packed = true];
  // For bytes
  optional bytes byte_data = 5;
  // For strings
  repeated bytes string_data = 6;
  // For double
  repeated double double_data = 9 [packed = true];
  // For int64
  repeated int64 int64_data = 10 [packed = true];
  // Optionally, a name for the tensor.
  optional string name = 7;
  // Optionally, a TensorProto can contain the details about the device that
  // it was serialized from. This is useful in cases like snapshotting a whole
  // workspace in a multi-GPU environment.
  optional DeviceOption device_detail = 8;
  // When loading from chunks this is going to indicate where to put data in the
  // full array. When not used full data have to be present
  message Segment {
    required int64 begin = 1;
    required int64 end = 2;
  }
  optional Segment segment = 11;
}
As you can see, the new format is far more expressive: it natively accommodates many data types and tasks (segmentation, detection, and so on), and since no field is designated as the label, we also have much more freedom in how to lay things out.
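Concretely, one LMDB record is a serialized TensorProtos holding one TensorProto for the image and one for the labels. The sketch below uses hand-rolled Python classes as a stand-in for the generated protobuf code (the real classes come from caffe2.proto), so only the field layout, not the API, should be taken literally:

```python
class TensorProto:
    # Minimal stand-in for caffe2's TensorProto: only the fields used here.
    def __init__(self, data_type, string_data=None, int32_data=None):
        self.data_type = data_type
        self.string_data = string_data or []
        self.int32_data = int32_data or []

class TensorProtos:
    # Stand-in for the plural TensorProtos wrapper: just a list of protos.
    def __init__(self):
        self.protos = []

record = TensorProtos()
# Tensor 0: the (encoded) image bytes.
record.protos.append(TensorProto("STRING", string_data=[b"<jpeg bytes>"]))
# Tensor 1: as many labels as the tasks need, e.g. age and gender.
record.protos.append(TensorProto("INT32", int32_data=[23, 1]))
```

With the real protobuf classes you would call record.SerializeToString() and put the result into the LMDB value, one record per key.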
Looking at the code, caffe2 implements an ImageInput op to read images and labels efficiently; it returns both the data and the label:
data, label = brew.image_input(
    model,
    reader, ["data", "label"],
    batch_size=batch_size,
    use_caffe_datum=True,
    mean=128.,   # typical preprocessing values; adjust for your data
    std=128.,
    scale=256,
    crop=img_size,
    mirror=1
)
The stock op does not support multiple label outputs, but a small change fixes that; the diff is as follows:
diff --git a/caffe2/image/image_input_op.cc b/caffe2/image/image_input_op.cc
index f804..dec218b
--- a/caffe2/image/image_input_op.cc
+++ b/caffe2/image/image_input_op.cc
@@ -, +, @@ The dimension of the output image will always be cropxcrop
.Arg("db", "Name of the database (if not passed as input)")
.Arg("db_type", "Type of database (if not passed as input)."
" Defaults to leveldb")
+ .Arg("label_len", "len of labels, for multi-task or multi-dim regression purpose")
.Input(0, "reader", "The input reader (a db::DBReader)")
.Output(0, "data", "Tensor containing the images")
.Output(1, "label", "Tensor containing the labels");
diff --git a/caffe2/image/image_input_op.h b/caffe2/image/image_input_op.h
index ec5e9..d514b2
--- a/caffe2/image/image_input_op.h
+++ b/caffe2/image/image_input_op.h
@@ -, +, @@ class ImageInputOp final
bool is_test_;
bool use_caffe_datum_;
bool gpu_transform_;
+ bool mean_std_copied_ = false;
+ int label_len_;
// thread pool for parse + decode
int num_decode_threads_;
@@ -, +, @@ ImageInputOp<Context>::ImageInputOp(
num_decode_threads_(OperatorBase::template GetSingleArgument<int>(
"decode_threads", 4)),
thread_pool_(std::make_shared<TaskThreadPool>(num_decode_threads_)),
+ label_len_(OperatorBase::template GetSingleArgument<int>("label_len", 1)),
// output type only supported with CUDA and use_gpu_transform for now
output_type_(cast::GetCastDataType(this->arg_helper(), "output_type"))
{
@@ -, +, @@ ImageInputOp<Context>::ImageInputOp(
LOG(INFO) << " Outputting images as "
<< OperatorBase::template GetSingleArgument<string>("output_type", "unknown") << ".";
- if (gpu_transform_) {
- if (!std::is_same<Context, CUDAContext>::value) {
- throw std::runtime_error("use_gpu_transform only for GPUs");
- } else {
- mean_gpu_.Resize(mean_.size());
- std_gpu_.Resize(std_.size());
-
- context_.template Copy<float, CPUContext, Context>(
- mean_.size(), mean_.data(), mean_gpu_.template mutable_data<float>());
- context_.template Copy<float, CPUContext, Context>(
- std_.size(), std_.data(), std_gpu_.template mutable_data<float>());
- }
- }
-
std::mt19937 meta_randgen(time(nullptr));
for (int i = 0; i < num_decode_threads_; ++i) {
randgen_per_thread_.emplace_back(meta_randgen());
@@ -, +, @@ ImageInputOp<Context>::ImageInputOp(
TIndex(crop_),
TIndex(crop_),
TIndex(color_ ? 3 : 1));
- prefetched_label_.Resize(vector<TIndex>(1, batch_size_));
+ //prefetched_label_.Resize(vector<TIndex>(label_len_, batch_size_));
+ prefetched_label_.Resize(TIndex(batch_size_), TIndex(label_len_));
}
template <class Context>
@@ -, +, @@ bool ImageInputOp<Context>::GetImageAndLabelAndInfoFromDBValue(
if (label_proto.data_type() == TensorProto::FLOAT) {
DCHECK_EQ(label_proto.float_data_size(), 1);
-
- prefetched_label_.mutable_data<float>()[item_id] =
- label_proto.float_data(0);
+ for (int t = 0; t < label_len_; t++) {
+ prefetched_label_.mutable_data<float>()[label_len_*item_id+t] =
+ label_proto.float_data(t);
+ }
} else if (label_proto.data_type() == TensorProto::INT32) {
DCHECK_EQ(label_proto.int32_data_size(), 1);
- prefetched_label_.mutable_data<int>()[item_id] =
- label_proto.int32_data(0);
+ for (int t = 0; t < label_len_; t++) {
+ prefetched_label_.mutable_data<int>()[label_len_*item_id+t] =
+ label_proto.int32_data(t);
+ }
} else {
LOG(FATAL) << "Unsupported label type.";
}
@@ -, +, @@ bool ImageInputOp<Context>::CopyPrefetched() {
label_output->CopyFrom(prefetched_label_, &context_);
} else {
if (gpu_transform_) {
+ if (!mean_std_copied_) {
+ mean_gpu_.Resize(mean_.size());
+ std_gpu_.Resize(std_.size());
+
+ context_.template Copy<float, CPUContext, Context>(
+ mean_.size(), mean_.data(), mean_gpu_.template mutable_data<float>());
+ context_.template Copy<float, CPUContext, Context>(
+ std_.size(), std_.data(), std_gpu_.template mutable_data<float>());
+ mean_std_copied_ = true;
+ }
// GPU transform kernel allows explicitly setting output type
if (output_type_ == TensorProto_DataType_FLOAT) {
TransformOnGPU<uint8_t,float,Context>(prefetched_image_on_device_,
After this change,
data, label = brew.image_input(
    model,
    reader, ["data", "labels"],
    batch_size=batch_size,
    use_caffe_datum=False,
    mean=128.,   # typical preprocessing values; adjust for your data
    std=128.,
    scale=256,
    crop=img_size,
    mirror=1,
    label_len=2
)
now outputs a label blob carrying multiple labels per example. A further
model.net.Split("labels", ["label_age", "label_gender"], axis=1)
then gives label_age and label_gender as standalone labels, each of which can be used to compute its own loss.
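To see what the patch and the Split together do, here is a small pure-Python model of the label buffer: the op fills a flat (batch_size, label_len) buffer at label_len * item_id + t, and Split along axis=1 then peels each column off into its own label blob (the sizes and values below are made up):

```python
# Pure-Python model of the patched op's label layout plus Split(axis=1).
# A flat list stands in for the (batch_size, label_len) prefetched_label_
# tensor; no caffe2 is needed to see the indexing.
batch_size, label_len = 3, 2
labels_per_item = [[23, 1], [35, 0], [19, 1]]  # (age, gender) per example

flat = [0] * (batch_size * label_len)
for item_id, labels in enumerate(labels_per_item):
    for t in range(label_len):
        # Same index arithmetic as the patch: label_len_*item_id + t.
        flat[label_len * item_id + t] = labels[t]

# Split along axis=1: column t becomes its own label blob.
label_age = [flat[label_len * i + 0] for i in range(batch_size)]
label_gender = [flat[label_len * i + 1] for i in range(batch_size)]
```

Each resulting blob has shape (batch_size,), exactly what a per-task loss op expects.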
Postscript
It has been a while since I last updated this blog: partly work has been busy, partly I have simply gotten lazy (:<).
PS: I took another look at caffe2's github today, and it seems multiple labels are now supported natively. Good. So this write-up is, apparently, not much use anymore (:>).