Preface
caffe2 is the upgraded, non-backward-compatible successor to caffe, and it fixes many of caffe's problems: caffe has no native multi-machine training, mobile support was weak, and so on. In any case, caffe is no longer updated and has been tagged 1.0, so hurry over to caffe2 (but tensorflow may be an even better choice).
Problem description
An aside
Why might tensorflow be the better choice? Is caffe2 not good enough? caffe2 is in fact very capable; it just feels like its community is far less active than tensorflow's. It is maintained mainly by the facebook folks, the issue count is low, few questions get answered, and many pull requests never get merged. In short, purely for convenience of use, anyone who hasn't taken the plunge yet may want to think it over carefully.
Multi-task support in caffe
Multi-task learning is a very common, high-frequency need, whether the goal is to improve model quality via joint learning or to cut inference time. caffe can train from several data formats; the most efficient are LMDB/LevelDB, which store key-value pairs. What the key is doesn't really matter as long as it is unique, while the value is a serialized Datum struct.
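To make the storage model concrete, here is a minimal sketch in which a plain Python dict stands in for the LMDB environment and placeholder bytes stand in for serialized Datum records; the zero-padded key scheme is the usual convention, not something caffe mandates:

```python
# Minimal stand-in for an LMDB-style store: unique keys -> serialized values.
# A plain dict replaces the real lmdb environment; the "serialized Datum"
# is just placeholder bytes here.
def make_key(index):
    # Zero-padded keys keep lexicographic order equal to insertion order,
    # the usual convention for caffe-style LMDBs.
    return "{:08d}".format(index).encode()

store = {}
for i, record in enumerate([b"datum-0", b"datum-1", b"datum-2"]):
    store[make_key(i)] = record
```

Any key scheme works as long as the keys are unique; zero-padding just keeps iteration order deterministic.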
The protobuf definition of Datum is as follows:
message Datum {
  optional int32 channels = 1;
  optional int32 height = 2;
  optional int32 width = 3;
  // the actual image data, in bytes
  optional bytes data = 4;
  optional int32 label = 5;
  // Optionally, the datum could also hold float data.
  repeated float float_data = 6;
  // If true data contains an encoded image that need to be decoded
  optional bool encoded = 7 [default = false];
}
As you can see, label is a single value: you cannot store multiple labels, and the Data layer hard-codes this in many places, so changing it is painful. You can of course write a custom data layer instead; in a previous project I wrote a python layer to support multiple tasks. But a hand-rolled layer is never as efficient as caffe's native path, and once a python layer is involved, multi-GPU training no longer works.
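One common workaround is to smuggle all labels into the repeated float_data field and slice them apart again in the custom data layer. The sketch below is hypothetical pure Python (plain dicts in place of the real protobuf Datum, made-up helper names) just to illustrate the packing scheme:

```python
# Hypothetical sketch: pack several labels into Datum.float_data and slice
# them back out, the way a custom (e.g. python) data layer would.
# Plain dicts stand in for the real protobuf Datum.
def make_record(image_bytes, labels):
    # "data" holds the image bytes; float_data carries every label.
    return {"data": image_bytes, "float_data": [float(x) for x in labels]}

def unpack_labels(record, label_len):
    floats = record["float_data"]
    assert len(floats) == label_len, "record does not match expected label count"
    return floats

rec = make_record(b"\x00" * 8, [23, 1])  # e.g. age=23, gender=1
age, gender = unpack_labels(rec, label_len=2)
```

This works, but every consumer of the LMDB has to agree on the ad-hoc packing, which is exactly the fragility caffe2's richer format removes.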
Multi-task support in caffe2
caffe2's native and efficient data IO format is likewise LMDB, and as in caffe the value is a serialized string. In caffe2, though, it is no longer a Datum but a new protobuf message called TensorProtos (note the plural), i.e. a list of TensorProto messages, where TensorProto is defined as follows:
message TensorProto {
  // The dimensions in the tensor.
  repeated int64 dims = 1;
  enum DataType {
    UNDEFINED = 0;
    FLOAT = 1;   // float
    INT32 = 2;   // int
    BYTE = 3;    // BYTE, when deserialized, is going to be restored as uint8.
    STRING = 4;  // string
    // Less-commonly used data types.
    BOOL = 5;     // bool
    UINT8 = 6;    // uint8_t
    INT8 = 7;     // int8_t
    UINT16 = 8;   // uint16_t
    INT16 = 9;    // int16_t
    INT64 = 10;   // int64_t
    FLOAT16 = 12; // caffe2::__f16, caffe2::float16
    DOUBLE = 13;  // double
  }
  optional DataType data_type = 2 [default = FLOAT];
  // For float
  repeated float float_data = 3 [packed = true];
  // For int32, uint8, int8, uint16, int16, bool, and float16
  // Note about float16: in storage we will basically convert float16 byte-wise
  // to unsigned short and then store them in the int32_data field.
  repeated int32 int32_data = 4 [packed = true];
  // For bytes
  optional bytes byte_data = 5;
  // For strings
  repeated bytes string_data = 6;
  // For double
  repeated double double_data = 9 [packed = true];
  // For int64
  repeated int64 int64_data = 10 [packed = true];
  // Optionally, a name for the tensor.
  optional string name = 7;
  // Optionally, a TensorProto can contain the details about the device that
  // it was serialized from. This is useful in cases like snapshotting a whole
  // workspace in a multi-GPU environment.
  optional DeviceOption device_detail = 8;
  // When loading from chunks this is going to indicate where to put data in the
  // full array. When not used full data have to be present
  message Segment {
    required int64 begin = 1;
    required int64 end = 2;
  }
  optional Segment segment = 11;
}
As you can see, the new format is far more expressive: it natively accommodates many data types and tasks (segmentation, detection, and so on), and since no field is designated as the label, we also have much more freedom in how to lay things out.
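Concretely, one LMDB record is a serialized TensorProtos holding one TensorProto for the image and one for the labels. The sketch below uses hand-rolled Python classes as a stand-in for the generated protobuf code (the real classes come from caffe2.proto), so only the field layout, not the API, should be taken literally:

```python
class TensorProto:
    # Minimal stand-in for caffe2's TensorProto: only the fields used here.
    def __init__(self, data_type, string_data=None, int32_data=None):
        self.data_type = data_type
        self.string_data = string_data or []
        self.int32_data = int32_data or []

class TensorProtos:
    # Stand-in for the plural TensorProtos wrapper: just a list of protos.
    def __init__(self):
        self.protos = []

record = TensorProtos()
# Tensor 0: the (encoded) image bytes.
record.protos.append(TensorProto("STRING", string_data=[b"<jpeg bytes>"]))
# Tensor 1: as many labels as the tasks need, e.g. age and gender.
record.protos.append(TensorProto("INT32", int32_data=[23, 1]))
```

With the real protobuf classes you would call record.SerializeToString() and put the result into the LMDB value, one record per key.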
Looking at the code, caffe2 implements an ImageInput op to read images and labels efficiently; it returns both the data and the label:
data, label = brew.image_input(
    model,
    reader, ["data", "label"],
    batch_size=batch_size,
    use_caffe_datum=True,
    mean=128.,   # typical preprocessing values; adjust for your data
    std=128.,
    scale=256,
    crop=img_size,
    mirror=1
)
The stock op does not support multiple label outputs, but a small change fixes that; the diff is as follows:
diff --git a/caffe2/image/image_input_op.cc b/caffe2/image/image_input_op.cc
index f804..dec218b
--- a/caffe2/image/image_input_op.cc
+++ b/caffe2/image/image_input_op.cc
@@ -, +, @@ The dimension of the output image will always be cropxcrop
.Arg("db", "Name of the database (if not passed as input)")
.Arg("db_type", "Type of database (if not passed as input)."
" Defaults to leveldb")
+ .Arg("label_len", "len of labels, for multi-task or multi-dim regression purpose")
.Input(0, "reader", "The input reader (a db::DBReader)")
.Output(0, "data", "Tensor containing the images")
.Output(1, "label", "Tensor containing the labels");
diff --git a/caffe2/image/image_input_op.h b/caffe2/image/image_input_op.h
index ec5e9..d514b2
--- a/caffe2/image/image_input_op.h
+++ b/caffe2/image/image_input_op.h
@@ -, +, @@ class ImageInputOp final
bool is_test_;
bool use_caffe_datum_;
bool gpu_transform_;
+ bool mean_std_copied_ = false;
+ int label_len_;
// thread pool for parse + decode
int num_decode_threads_;
@@ -, +, @@ ImageInputOp<Context>::ImageInputOp(
num_decode_threads_(OperatorBase::template GetSingleArgument<int>(
"decode_threads", 4)),
thread_pool_(std::make_shared<TaskThreadPool>(num_decode_threads_)),
+ label_len_(OperatorBase::template GetSingleArgument<int>("label_len", 1)),
// output type only supported with CUDA and use_gpu_transform for now
output_type_(cast::GetCastDataType(this->arg_helper(), "output_type"))
{
@@ -, +, @@ ImageInputOp<Context>::ImageInputOp(
LOG(INFO) << " Outputting images as "
<< OperatorBase::template GetSingleArgument<string>("output_type", "unknown") << ".";
- if (gpu_transform_) {
- if (!std::is_same<Context, CUDAContext>::value) {
- throw std::runtime_error("use_gpu_transform only for GPUs");
- } else {
- mean_gpu_.Resize(mean_.size());
- std_gpu_.Resize(std_.size());
-
- context_.template Copy<float, CPUContext, Context>(
- mean_.size(), mean_.data(), mean_gpu_.template mutable_data<float>());
- context_.template Copy<float, CPUContext, Context>(
- std_.size(), std_.data(), std_gpu_.template mutable_data<float>());
- }
- }
-
std::mt19937 meta_randgen(time(nullptr));
for (int i = 0; i < num_decode_threads_; ++i) {
randgen_per_thread_.emplace_back(meta_randgen());
@@ -, +, @@ ImageInputOp<Context>::ImageInputOp(
TIndex(crop_),
TIndex(crop_),
TIndex(color_ ? 3 : 1));
- prefetched_label_.Resize(vector<TIndex>(1, batch_size_));
+ //prefetched_label_.Resize(vector<TIndex>(label_len_, batch_size_));
+ prefetched_label_.Resize(TIndex(batch_size_), TIndex(label_len_));
}
template <class Context>
@@ -, +, @@ bool ImageInputOp<Context>::GetImageAndLabelAndInfoFromDBValue(
if (label_proto.data_type() == TensorProto::FLOAT) {
DCHECK_EQ(label_proto.float_data_size(), 1);
-
- prefetched_label_.mutable_data<float>()[item_id] =
- label_proto.float_data(0);
+ for (int t = 0; t < label_len_; t++) {
+ prefetched_label_.mutable_data<float>()[label_len_*item_id+t] =
+ label_proto.float_data(t);
+ }
} else if (label_proto.data_type() == TensorProto::INT32) {
DCHECK_EQ(label_proto.int32_data_size(), 1);
- prefetched_label_.mutable_data<int>()[item_id] =
- label_proto.int32_data(0);
+ for (int t = 0; t < label_len_; t++) {
+ prefetched_label_.mutable_data<int>()[label_len_*item_id+t] =
+ label_proto.int32_data(t);
+ }
} else {
LOG(FATAL) << "Unsupported label type.";
}
@@ -, +, @@ bool ImageInputOp<Context>::CopyPrefetched() {
label_output->CopyFrom(prefetched_label_, &context_);
} else {
if (gpu_transform_) {
+ if (!mean_std_copied_) {
+ mean_gpu_.Resize(mean_.size());
+ std_gpu_.Resize(std_.size());
+
+ context_.template Copy<float, CPUContext, Context>(
+ mean_.size(), mean_.data(), mean_gpu_.template mutable_data<float>());
+ context_.template Copy<float, CPUContext, Context>(
+ std_.size(), std_.data(), std_gpu_.template mutable_data<float>());
+ mean_std_copied_ = true;
+ }
// GPU transform kernel allows explicitly setting output type
if (output_type_ == TensorProto_DataType_FLOAT) {
TransformOnGPU<uint8_t,float,Context>(prefetched_image_on_device_,
After this change,
data, label = brew.image_input(
    model,
    reader, ["data", "labels"],
    batch_size=batch_size,
    use_caffe_datum=False,
    mean=128.,   # typical preprocessing values; adjust for your data
    std=128.,
    scale=256,
    crop=img_size,
    mirror=1,
    label_len=2
)
now outputs a label blob carrying multiple labels per example. A further
model.net.Split("labels", ["label_age", "label_gender"], axis=1)
then gives label_age and label_gender as standalone labels, each of which can be used to compute its own loss.
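To see what the patch and the Split together do, here is a small pure-Python model of the label buffer: the op fills a flat (batch_size, label_len) buffer at label_len * item_id + t, and Split along axis=1 then peels each column off into its own label blob (the sizes and values below are made up):

```python
# Pure-Python model of the patched op's label layout plus Split(axis=1).
# A flat list stands in for the (batch_size, label_len) prefetched_label_
# tensor; no caffe2 is needed to see the indexing.
batch_size, label_len = 3, 2
labels_per_item = [[23, 1], [35, 0], [19, 1]]  # (age, gender) per example

flat = [0] * (batch_size * label_len)
for item_id, labels in enumerate(labels_per_item):
    for t in range(label_len):
        # Same index arithmetic as the patch: label_len_*item_id + t.
        flat[label_len * item_id + t] = labels[t]

# Split along axis=1: column t becomes its own label blob.
label_age = [flat[label_len * i + 0] for i in range(batch_size)]
label_gender = [flat[label_len * i + 1] for i in range(batch_size)]
```

Each resulting blob has shape (batch_size,), exactly what a per-task loss op expects.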
Postscript
It has been a while since I last updated this blog: partly work has been busy, partly I have simply gotten lazy (:<).
PS: I took another look at caffe2's github today, and it seems multiple labels are now supported natively. Good. So this write-up is, apparently, not much use anymore (:>).