Paper: https://arxiv.org/pdf/1506.02640.pdf
Code for this article is on GitHub: https://github.com/shankezh/DL_HotNet_Tensorflow
If you are interested in machine learning and are not content to use deep learning models as black boxes, but want to understand why machine learning can be trained to fit a good model, see my earlier blog posts, which derive several classic machine-learning cases mathematically and build a simple neural-network framework from scratch in Python to deepen understanding: https://blog.csdn.net/shankezh/article/category/7279585
For project help or job opportunities, contact me by email: [email protected]
---------------------
This article reproduces the paper's method in code;
---------------------
Paper Reading
Key Information Extraction
1. Prior detection systems repurpose classifiers to perform detection, e.g. DPM and R-CNN;
2. YOLO instead frames object detection as a regression problem; with this system you only need to look at an image once to predict what objects are present and where they are;
3. The system divides the input image into an SxS grid; if the center of an object falls into a grid cell, that cell is responsible for detecting the object;
4. Each grid cell predicts B bounding boxes (bboxes) and confidence scores for those boxes; the score reflects how confident the model is that the box contains an object and how accurate it believes the predicted box is; confidence is defined as Pr(Object) * IOU;
5. If no object exists in a cell, the confidence score should be 0; otherwise we want the confidence score to equal the IOU between the predicted box and the ground-truth box;
6. Each bbox consists of five predictions: x, y, w, h and confidence; (x, y) represents the center of the box relative to the bounds of the grid cell; width and height are predicted relative to the whole image; the confidence represents the IOU between the predicted box and the ground-truth box;
7. Each grid cell also predicts C conditional class probabilities, Pr(Class_i | Object); these probabilities are conditioned on the grid cell containing an object;
8. Only one set of class probabilities is predicted per grid cell, regardless of the number of boxes B;
9. At test time the conditional class probabilities are multiplied by the individual box confidence predictions:
Pr(Class_i | Object) * Pr(Object) * IOU = Pr(Class_i) * IOU
which gives class-specific confidence scores for each box; here Pr(Object) is 1 if the cell contains an object and 0 if it does not;
10. To evaluate YOLO on PASCAL VOC we use S=7 and B=2; PASCAL VOC has 20 labeled classes, so C=20, and the final prediction is therefore a 7x7x30 tensor, following S x S x (B * 5 + C);
11. In the designed network, the convolutional layers extract features from the image and the fully connected layers predict the output probabilities and coordinates;
12. The network design is inspired by the GoogLeNet image-classification model: 24 convolutional layers followed by 2 fully connected layers, with 1x1 reduction layers followed by 3x3 convolutions in place of GoogLeNet's inception modules; see the architecture figure in the paper;
The authors pretrain the classification task on ImageNet at 224x224 resolution, then double the resolution to 448x448 for detection;
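The output-shape formula in item 10 is easy to sanity-check with a couple of lines (a trivial sketch that just restates S x S x (B * 5 + C)):

```python
# Output tensor size for the paper's PASCAL VOC settings.
S, B, C = 7, 2, 20            # grid size, boxes per cell, number of classes
per_cell = B * 5 + C          # each box contributes x, y, w, h, confidence
print((S, S, per_cell))       # (7, 7, 30)
print(S * S * per_cell)       # 1470 values in the flattened prediction vector
```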
Training
1. Pretrain on ImageNet;
2. Pretraining uses the first 20 convolutional layers, followed by an average-pooling layer and a fully connected layer;
3. Since another paper showed that adding both convolutional and fully connected layers to a pretrained network can improve performance, four convolutional layers and two fully connected layers with randomly initialized weights are added; and because detection often requires fine-grained visual information, the input resolution of the network is increased from 224x224 to 448x448;
4. The final layer predicts both class probabilities and bbox coordinates; the authors normalize the bbox width and height by the image width and height so they fall between 0 and 1, and parameterize the bbox x, y coordinates as offsets of a particular grid-cell location, so they are also bounded between 0 and 1;
5. The final layer uses a linear activation function; all other layers use the leaky ReLU activation: f(x) = x if x > 0, otherwise 0.1x;
6. The model output is optimized with sum-squared error because it is easy to optimize, but it does not perfectly match the goal of maximizing average precision: it weights localization error equally with classification error, which is not ideal; moreover, in every image many grid cells contain no object, which pushes their confidence scores toward zero, and the gradient from these cells often overpowers the gradient from cells that do contain objects; this makes the model unstable and causes training to diverge early;
7. To remedy this, the loss from bbox coordinate predictions is increased and the loss from confidence predictions of boxes that contain no objects is decreased, using two parameters λcoord = 5 and λnoobj = 0.5;
8. Sum-squared error also weights errors in large boxes and small boxes equally, but the same deviation matters much more in a small box than in a large one (a 5-pixel error is negligible for a 300-pixel box but severe for a 20-pixel one); to partially address this, the square root of the bbox width and height is predicted instead of the width and height directly;
9. The loss is designed as follows (written out in text form, since the paper's figure is not reproduced here):
loss = λcoord Σ(i=0..S²) Σ(j=0..B) 1(obj # ij) [(x_i − x̂_i)² + (y_i − ŷ_i)²]
     + λcoord Σ(i=0..S²) Σ(j=0..B) 1(obj # ij) [(√w_i − √ŵ_i)² + (√h_i − √ĥ_i)²]
     + Σ(i=0..S²) Σ(j=0..B) 1(obj # ij) (C_i − Ĉ_i)²
     + λnoobj Σ(i=0..S²) Σ(j=0..B) 1(noobj # ij) (C_i − Ĉ_i)²
     + Σ(i=0..S²) 1(obj # i) Σ(c∈classes) (p_i(c) − p̂_i(c))²
Here X(superscript # subscript) stands in for variables that carry superscripts and subscripts: 1(obj # i) denotes that an object appears in cell i, and 1(obj # ij) denotes that the j-th bbox predictor in cell i is responsible for that prediction;
10. The network is trained for about 135 epochs on VOC 2007 and 2012 with a batch size of 64, momentum 0.9 and weight decay 0.0005; the learning-rate schedule is as follows: over the first epochs the rate is slowly raised from 0.001 to 0.01, then held at 0.01 for 75 epochs, then 0.001 for 30 epochs, and finally 0.0001 for 30 epochs;
11. To avoid overfitting, dropout and data augmentation are used: dropout with rate 0.5 after the first fully connected layer; for augmentation, random scaling and translation of up to about 20% of the original image size, plus random adjustment of the image's exposure and saturation in the HSV color space by up to a factor of 1.5;
12. Large objects, or objects near cell borders, can be localized by multiple cells; this is where non-maximum suppression (NMS) comes in, removing the duplicate detections;
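Item 12's NMS can be sketched as a generic greedy procedure in NumPy (the corner-format boxes and the 0.5 IOU threshold are illustrative choices, not values from the paper):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS sketch: boxes are (x1, y1, x2, y2); the highest-scored box is kept,
    and any remaining box overlapping it above iou_thresh is suppressed."""
    order = np.argsort(scores)[::-1]          # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IOU of the kept box against all remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                 (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_r - inter + 1e-10)
        order = order[1:][iou < iou_thresh]   # drop suppressed boxes
    return keep
```

For example, two heavily overlapping boxes with scores 0.9 and 0.8 collapse to the single 0.9 box, while a distant box survives.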
Limitations of YOLO
1. YOLO imposes strong spatial constraints on bbox predictions, since each grid cell predicts only two boxes and can only have one class; this limits the number of nearby objects the model can predict;
2. The model also uses relatively coarse features to predict bboxes, because the architecture has multiple downsampling layers;
TensorFlow implementation:
Model code (covers both the pretraining and detection networks)
model.py:
import tensorflow as tf
import tensorflow.contrib.slim as slim
import net.Detection.YOLOV1.config as cfg
import numpy as np
class YOLO_Net(object):
    def __init__(self, is_pre_training=False, is_training=True):
        self.classes = cfg.VOC07_CLASS
        self.pre_train_num = cfg.PRE_TRAIN_NUM
        self.det_cls_num = len(self.classes)
        self.image_size = cfg.DET_IMAGE_SIZE
        self.cell_size = cfg.CELL_SIZE
        self.boxes_per_cell = cfg.PER_CELL_CHECK_BOXES
        self.output_size = (self.cell_size * self.cell_size) * (5 * self.boxes_per_cell + self.det_cls_num)
        self.scale = 1.0 * self.image_size / self.cell_size
        self.boundary1 = self.cell_size * self.cell_size * self.det_cls_num
        self.boundary2 = self.boundary1 + self.cell_size * self.cell_size * self.boxes_per_cell
        self.object_scale = cfg.OBJ_CONFIDENCE_SCALE
        self.no_object_scale = cfg.NO_OBJ_CONFIDENCE_SCALE
        self.class_scale = cfg.CLASS_SCALE
        self.coord_scale = cfg.COORD_SCALE
        self.learning_rate = 0.0001
        self.batch_size = cfg.BATCH_SIZE
        self.keep_prob = cfg.KEEP_PROB
        self.pre_training = is_pre_training
        self.offset = np.transpose(
            np.reshape(
                np.array(
                    [np.arange(self.cell_size)] * self.cell_size * self.boxes_per_cell
                ), (self.boxes_per_cell, self.cell_size, self.cell_size)
            ), (1, 2, 0)
        )
        self.bn_params = cfg.BATCH_NORM_PARAMS
        self.is_training = tf.placeholder(tf.bool)
        if self.pre_training:
            self.images = tf.placeholder(tf.float32, [None, 224, 224, 3], name='images')
        else:
            self.images = tf.placeholder(tf.float32, [None, self.image_size, self.image_size, 3], name='images')
        self.logits = self.build_network(self.images, is_training=self.is_training)
        if is_training:
            if self.pre_training:
                self.labels = tf.placeholder(tf.float32, [None, self.pre_train_num])
                self.classify_loss(self.logits, self.labels)
                self.total_loss = tf.losses.get_total_loss()
                self.evalution = self.classify_evalution(self.logits, self.labels)
                print('pretraining network')
            else:
                self.labels = tf.placeholder(tf.float32, [None, self.cell_size, self.cell_size, 5 + self.det_cls_num])
                self.det_loss_layer(self.logits, self.labels)
                self.total_loss = tf.losses.get_total_loss()
                tf.summary.scalar('total_loss', self.total_loss)
                print('detection network')
    def build_network(self, images, is_training=True, scope='yolov1'):
        net = images
        with tf.variable_scope(scope):
            with slim.arg_scope([slim.conv2d, slim.fully_connected],
                                weights_regularizer=slim.l2_regularizer(0.00004)):
                with slim.arg_scope([slim.conv2d],
                                    weights_initializer=slim.xavier_initializer(),
                                    normalizer_fn=slim.batch_norm,
                                    activation_fn=slim.nn.leaky_relu,
                                    normalizer_params=self.bn_params):
                    with slim.arg_scope([slim.batch_norm, slim.dropout], is_training=is_training):
                        net = slim.conv2d(net, 64, [7, 7], stride=2, padding='SAME', scope='layer1')
                        net = slim.max_pool2d(net, [2, 2], stride=2, padding='SAME', scope='pool1')
                        net = slim.conv2d(net, 192, [3, 3], stride=1, padding='SAME', scope='layer2')
                        net = slim.max_pool2d(net, [2, 2], stride=2, padding='SAME', scope='pool2')
                        net = slim.conv2d(net, 128, [1, 1], stride=1, padding='SAME', scope='layer3_1')
                        net = slim.conv2d(net, 256, [3, 3], stride=1, padding='SAME', scope='layer3_2')
                        net = slim.conv2d(net, 256, [1, 1], stride=1, padding='SAME', scope='layer3_3')
                        net = slim.conv2d(net, 512, [3, 3], stride=1, padding='SAME', scope='layer3_4')
                        net = slim.max_pool2d(net, [2, 2], stride=2, padding='SAME', scope='pool3')
                        net = slim.conv2d(net, 256, [1, 1], stride=1, padding='SAME', scope='layer4_1')
                        net = slim.conv2d(net, 512, [3, 3], stride=1, padding='SAME', scope='layer4_2')
                        net = slim.conv2d(net, 256, [1, 1], stride=1, padding='SAME', scope='layer4_3')
                        net = slim.conv2d(net, 512, [3, 3], stride=1, padding='SAME', scope='layer4_4')
                        net = slim.conv2d(net, 256, [1, 1], stride=1, padding='SAME', scope='layer4_5')
                        net = slim.conv2d(net, 512, [3, 3], stride=1, padding='SAME', scope='layer4_6')
                        net = slim.conv2d(net, 256, [1, 1], stride=1, padding='SAME', scope='layer4_7')
                        net = slim.conv2d(net, 512, [3, 3], stride=1, padding='SAME', scope='layer4_8')
                        net = slim.conv2d(net, 512, [1, 1], stride=1, padding='SAME', scope='layer4_9')
                        net = slim.conv2d(net, 1024, [3, 3], stride=1, padding='SAME', scope='layer4_10')
                        net = slim.max_pool2d(net, [2, 2], stride=2, padding='SAME', scope='pool4')
                        net = slim.conv2d(net, 512, [1, 1], stride=1, padding='SAME', scope='layer5_1')
                        net = slim.conv2d(net, 1024, [3, 3], stride=1, padding='SAME', scope='layer5_2')
                        net = slim.conv2d(net, 512, [1, 1], stride=1, padding='SAME', scope='layer5_3')
                        net = slim.conv2d(net, 1024, [3, 3], stride=1, padding='SAME', scope='layer5_4')
                        if self.pre_training:
                            net = slim.avg_pool2d(net, [7, 7], stride=1, padding='VALID', scope='clssify_avg5')
                            net = slim.flatten(net)
                            net = slim.fully_connected(net, self.pre_train_num, activation_fn=slim.nn.leaky_relu,
                                                       scope='classify_fc1')
                            return net
                        net = slim.conv2d(net, 1024, [3, 3], stride=1, padding='SAME', scope='layer5_5')
                        net = slim.conv2d(net, 1024, [3, 3], stride=2, padding='SAME', scope='layer5_6')
                        net = slim.conv2d(net, 1024, [3, 3], stride=1, padding='SAME', scope='layer6_1')
                        net = slim.conv2d(net, 1024, [3, 3], stride=1, padding='SAME', scope='layer6_2')
                        net = slim.flatten(net)
                        net = slim.fully_connected(net, 1024, activation_fn=slim.nn.leaky_relu, scope='fc1')
                        net = slim.dropout(net, 0.5)
                        net = slim.fully_connected(net, 4096, activation_fn=slim.nn.leaky_relu, scope='fc2')
                        net = slim.dropout(net, 0.5)
                        net = slim.fully_connected(net, self.output_size, activation_fn=None, scope='fc3')
                        # output is [N, 1470] == [N, 7*7*30]
                        # net = tf.reshape(net, [-1, S, S, B*5+C])
        return net
    def classify_loss(self, logits, labels):
        with tf.name_scope('classify_loss') as scope:
            _loss = tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits, labels=labels)
            mean_loss = tf.reduce_mean(_loss)
            tf.losses.add_loss(mean_loss)
            tf.summary.scalar(scope + 'classify_mean_loss', mean_loss)

    def classify_evalution(self, logits, labels):
        with tf.name_scope('classify_evaluation') as scope:
            correct_pre = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
            accurary = tf.reduce_mean(tf.cast(correct_pre, 'float'))
            # tf.summary.scalar(scope + 'accuracy:', accurary)
            return accurary

    '''
    @:param predicts shape -> [N, 7x7x30]
    @:param labels shape -> [N, 7, 7, 25] <==> [N, h direction, w direction, 25]
            (channel layout: 1: responsible-for-detection flag, 2-5: coordinates, 6-25: one-hot class)
    '''
    def det_loss_layer(self, predicts, labels, scope='det_loss'):
        with tf.variable_scope(scope):
            predict_classes = tf.reshape(predicts[:, :self.boundary1],
                                         [-1, 7, 7, 20])  # class predictions -> [batch_size, cell_size, cell_size, num_cls]
            predict_scale = tf.reshape(predicts[:, self.boundary1:self.boundary2],
                                       [-1, 7, 7, 2])  # confidence predictions -> [batch_size, cell_size, cell_size, boxes_per_cell]
            predict_boxes = tf.reshape(predicts[:, self.boundary2:],
                                       [-1, 7, 7, 2, 4])  # coordinate predictions -> [batch_size, cell_size, cell_size, boxes_per_cell, 4]
            response = tf.reshape(labels[:, :, :, 0], [-1, 7, 7, 1])  # label confidence flag: is this cell responsible for detection
            boxes = tf.reshape(labels[:, :, :, 1:5], [-1, 7, 7, 1, 4])  # label coordinates
            boxes = tf.tile(boxes,
                            [1, 1, 1, 2, 1]) / self.image_size  # two boxes are predicted per cell, so tile the label box to match, and normalize the coordinates YOLO-style
            classes = labels[:, :, :, 5:]  # label classes
            offset = tf.constant(self.offset, dtype=tf.float32)
            offset = tf.reshape(offset, [1, 7, 7, 2])
            offset = tf.tile(offset, [tf.shape(boxes)[0], 1, 1, 1])
            predict_boxes_tran = tf.stack([
                1. * (predict_boxes[:, :, :, :, 0] + offset) / self.cell_size,
                1. * (predict_boxes[:, :, :, :, 1] + tf.transpose(offset, (0, 2, 1, 3))) / self.cell_size,
                tf.square(predict_boxes[:, :, :, :, 2]),
                tf.square(predict_boxes[:, :, :, :, 3])
            ], axis=-1)
            # predict_boxes_tran = tf.transpose(predict_boxes_tran, [1, 2, 3, 4, 0])
            iou_predict_truth = self.calc_iou(predict_boxes_tran, boxes)
            object_mask = tf.reduce_max(iou_predict_truth, 3, keep_dims=True)
            object_mask = tf.cast((iou_predict_truth >= object_mask), tf.float32) * response
            no_object_mask = tf.ones_like(object_mask, dtype=tf.float32) - object_mask
            boxes_tran = tf.stack([
                1. * boxes[:, :, :, :, 0] * 7 - offset,
                1. * boxes[:, :, :, :, 1] * 7 - tf.transpose(offset, (0, 2, 1, 3)),
                tf.sqrt(boxes[:, :, :, :, 2]),
                tf.sqrt(boxes[:, :, :, :, 3])
            ], axis=-1)
            # class loss
            class_delta = response * (predict_classes - classes)
            class_loss = tf.reduce_mean(tf.reduce_sum(tf.square(class_delta), axis=[1, 2, 3]),
                                        name='class_loss') * self.class_scale
            # object loss
            object_delta = object_mask * (predict_scale - iou_predict_truth)
            object_loss = tf.reduce_mean(tf.reduce_sum(tf.square(object_delta), axis=[1, 2, 3]),
                                         name='object_loss') * self.object_scale
            # no-object loss
            no_object_delta = no_object_mask * predict_scale
            no_object_loss = tf.reduce_mean(tf.reduce_sum(tf.square(no_object_delta), axis=[1, 2, 3]),
                                            name='no_object_loss') * self.no_object_scale
            # coordinate loss
            coord_mask = tf.expand_dims(object_mask, 4)
            boxes_delta = coord_mask * (predict_boxes - boxes_tran)
            coord_loss = tf.reduce_mean(tf.reduce_sum(tf.square(boxes_delta), axis=[1, 2, 3, 4]),
                                        name='coord_loss') * self.coord_scale
            tf.losses.add_loss(class_loss)
            tf.losses.add_loss(object_loss)
            tf.losses.add_loss(no_object_loss)
            tf.losses.add_loss(coord_loss)
            tf.summary.scalar('class_loss', class_loss)
            tf.summary.scalar('object_loss', object_loss)
            tf.summary.scalar('noobject_loss', no_object_loss)
            tf.summary.scalar('coord_loss', coord_loss)
            tf.summary.histogram('boxes_delta_x', boxes_delta[:, :, :, :, 0])
            tf.summary.histogram('boxes_delta_y', boxes_delta[:, :, :, :, 1])
            tf.summary.histogram('boxes_delta_w', boxes_delta[:, :, :, :, 2])
            tf.summary.histogram('boxes_delta_h', boxes_delta[:, :, :, :, 3])
            tf.summary.histogram('iou', iou_predict_truth)
    def calc_iou(self, boxes1, boxes2, scope='iou'):
        """calculate ious
        Args:
            boxes1: 5-D tensor [BATCH_SIZE, CELL_SIZE, CELL_SIZE, BOXES_PER_CELL, 4] ===> (x_center, y_center, w, h)
            boxes2: 5-D tensor [BATCH_SIZE, CELL_SIZE, CELL_SIZE, BOXES_PER_CELL, 4] ===> (x_center, y_center, w, h)
        Return:
            iou: 4-D tensor [BATCH_SIZE, CELL_SIZE, CELL_SIZE, BOXES_PER_CELL]
        """
        with tf.variable_scope(scope):
            boxes1 = tf.stack([boxes1[:, :, :, :, 0] - boxes1[:, :, :, :, 2] / 2.0,
                               boxes1[:, :, :, :, 1] - boxes1[:, :, :, :, 3] / 2.0,
                               boxes1[:, :, :, :, 0] + boxes1[:, :, :, :, 2] / 2.0,
                               boxes1[:, :, :, :, 1] + boxes1[:, :, :, :, 3] / 2.0], axis=-1)
            # boxes1 = tf.transpose(boxes1, [1, 2, 3, 4, 0])
            boxes2 = tf.stack([boxes2[:, :, :, :, 0] - boxes2[:, :, :, :, 2] / 2.0,
                               boxes2[:, :, :, :, 1] - boxes2[:, :, :, :, 3] / 2.0,
                               boxes2[:, :, :, :, 0] + boxes2[:, :, :, :, 2] / 2.0,
                               boxes2[:, :, :, :, 1] + boxes2[:, :, :, :, 3] / 2.0], axis=-1)
            # boxes2 = tf.transpose(boxes2, [1, 2, 3, 4, 0])
            lu = tf.maximum(boxes1[:, :, :, :, :2], boxes2[:, :, :, :, :2])
            rd = tf.minimum(boxes1[:, :, :, :, 2:], boxes2[:, :, :, :, 2:])
            intersection = tf.maximum(0.0, rd - lu)
            inter_square = intersection[:, :, :, :, 0] * intersection[:, :, :, :, 1]
            square1 = (boxes1[:, :, :, :, 2] - boxes1[:, :, :, :, 0]) * \
                      (boxes1[:, :, :, :, 3] - boxes1[:, :, :, :, 1])
            square2 = (boxes2[:, :, :, :, 2] - boxes2[:, :, :, :, 0]) * \
                      (boxes2[:, :, :, :, 3] - boxes2[:, :, :, :, 1])
            union_square = tf.maximum(square1 + square2 - inter_square, 1e-10)
            return tf.clip_by_value(inter_square / union_square, 0.0, 1.0)
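Before moving on to training, here is a quick NumPy cross-check of the IOU computation in calc_iou above, written for a single pair of (x_center, y_center, w, h) boxes (the box values are hand-made examples):

```python
import numpy as np

def iou_xywh(b1, b2):
    """IOU of two boxes in (x_center, y_center, w, h) form,
    mirroring the corner conversion done in calc_iou."""
    # convert to corner form (x1, y1, x2, y2)
    c1 = [b1[0] - b1[2] / 2, b1[1] - b1[3] / 2, b1[0] + b1[2] / 2, b1[1] + b1[3] / 2]
    c2 = [b2[0] - b2[2] / 2, b2[1] - b2[3] / 2, b2[0] + b2[2] / 2, b2[1] + b2[3] / 2]
    lu = np.maximum(c1[:2], c2[:2])            # left-upper corner of the intersection
    rd = np.minimum(c1[2:], c2[2:])            # right-lower corner of the intersection
    inter = np.maximum(0.0, rd - lu).prod()    # clamp to 0 when the boxes do not overlap
    union = b1[2] * b1[3] + b2[2] * b2[3] - inter
    return inter / max(union, 1e-10)

print(iou_xywh([0.5, 0.5, 0.2, 0.2], [0.5, 0.5, 0.2, 0.2]))  # ≈ 1.0 for identical boxes
print(iou_xywh([0.2, 0.2, 0.2, 0.2], [0.8, 0.8, 0.2, 0.2]))  # 0.0 for disjoint boxes
```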
Training (covers both classification pretraining and detection training):
solver.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Created by solver on 19-5-6
import tensorflow as tf
from net.Detection.YOLOV1.model import YOLO_Net
import net.Detection.YOLOV1.config as cfg
import tensorflow.contrib.slim as slim
from net.Detection.YOLOV1.voc07_img import Pascal_voc
from coms.learning_rate import CLR_EXP_RANGE
from coms.utils import isHasGpu,isLinuxSys
import time,os
from coms.pre_process import get_cifar10_batch
import net.Detection.YOLOV1.voc07_tfrecord as VOC07RECORDS
class Solver(object):
    def __init__(self, net, data, tf_records=False):
        self.net = net
        self.data = data
        self.tf_records = tf_records
        self.batch_size = cfg.BATCH_SIZE
        self.clr = CLR_EXP_RANGE()
        self.log_dir = cfg.LOG_DIR
        self.model_cls_dir = cfg.CLS_MODEL_DIR
        self.model_det_dir = cfg.DET_MODEL_DIR
        self.learning_rate = tf.placeholder(tf.float32)
        self.re_train = True
        tf.summary.scalar('learning_rate', self.learning_rate)
        self.optimizer = self.optimizer_bn(lr=self.learning_rate, loss=self.net.total_loss)
        if isHasGpu():
            gpu_option = tf.GPUOptions(allow_growth=True)
            config = tf.ConfigProto(allow_soft_placement=True, gpu_options=gpu_option)
        else:
            config = tf.ConfigProto(allow_soft_placement=True)
        self.sess = tf.Session(config=config)
        self.sess.run(tf.global_variables_initializer())
        self.summary_op = tf.summary.merge_all()
        n_time = time.strftime("%Y-%m-%d %H-%M", time.localtime())
        self.writer = tf.summary.FileWriter(os.path.join(self.log_dir, n_time), self.sess.graph)
        self.saver = tf.train.Saver(max_to_keep=4)
    def train_classify(self):
        self.set_classify_params()
        max_acc = 0.
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=self.sess, coord=coord)
        for epoch in range(cfg.EPOCH):
            for step in range(1, cfg.ITER_STEP + 1):
                learning_rate_val = self.clr.calc_lr(step, cfg.ITER_STEP + 1, 0.001, 0.01, gamma=0.9998)
                train_img_batch, train_label_batch = self.sess.run([self.train_img_batch, self.train_label_batch])
                feed_dict_train = {self.net.images: train_img_batch, self.net.labels: train_label_batch, self.net.is_training: True, self.learning_rate: learning_rate_val}
                _, summary_op, batch_train_loss, batch_train_acc = self.sess.run([self.optimizer, self.summary_op, self.net.total_loss, self.net.evalution], feed_dict=feed_dict_train)
                global_step = int(epoch * cfg.ITER_STEP + step + 1)
                print("epoch %d , step %d train end ,loss is : %f ,accuracy is %f ... ..." % (epoch, step, batch_train_loss, batch_train_acc))
                train_summary = tf.Summary(
                    value=[tf.Summary.Value(tag='train_loss', simple_value=batch_train_loss)
                        , tf.Summary.Value(tag='train_batch_accuracy', simple_value=batch_train_acc)
                        , tf.Summary.Value(tag='learning_rate', simple_value=learning_rate_val)])
                self.writer.add_summary(train_summary, global_step=global_step)
                self.writer.add_summary(summary_op, global_step=global_step)
                self.writer.flush()
                if step % 100 == 0:
                    print('test sets evaluation start ...')
                    ac_iter = int(10000 / self.batch_size)  # the CIFAR-10 test set holds 10000 images
                    ac_sum = 0.
                    loss_sum = 0.
                    for ac_count in range(ac_iter):
                        batch_test_img, batch_test_label = self.sess.run([self.test_img_batch, self.test_label_batch])
                        feed_dict_test = {self.net.images: batch_test_img, self.net.labels: batch_test_label, self.net.is_training: False, self.learning_rate: learning_rate_val}
                        test_loss, test_accuracy = self.sess.run([self.net.total_loss, self.net.evalution], feed_dict=feed_dict_test)
                        ac_sum += test_accuracy
                        loss_sum += test_loss
                    ac_mean = ac_sum / ac_iter
                    loss_mean = loss_sum / ac_iter
                    print('epoch {} , step {} , accuracy is {}'.format(str(epoch), str(step), str(ac_mean)))
                    test_summary = tf.Summary(
                        value=[tf.Summary.Value(tag='test_loss', simple_value=loss_mean)
                            , tf.Summary.Value(tag='test_accuracy', simple_value=ac_mean)])
                    self.writer.add_summary(test_summary, global_step=global_step)
                    self.writer.flush()
                    if ac_mean >= max_acc:
                        max_acc = ac_mean
                        self.saver.save(self.sess, self.model_cls_dir + '/' + 'cifar10_{}_step_{}.ckpt'.format(str(epoch), str(step)), global_step=step)
                        print('max accuracy has reaching ,save model successful ...')
        print('train network task was run over')
    def set_classify_params(self):
        self.train_img_batch, self.train_label_batch = get_cifar10_batch(is_train=True, batch_size=self.batch_size, num_cls=cfg.PRE_TRAIN_NUM, img_prob=[224, 224, 3])
        self.test_img_batch, self.test_label_batch = get_cifar10_batch(is_train=False, batch_size=self.batch_size, num_cls=cfg.PRE_TRAIN_NUM, img_prob=[224, 224, 3])
    def train_detector(self):
        self.set_detector_params()
        for epoch in range(cfg.EPOCH):
            for step in range(1, cfg.ITER_STEP + 1):
                global_step = int(epoch * cfg.ITER_STEP + step + 1)
                learning_rate_val = self.clr.calc_lr(step, cfg.ITER_STEP + 1, 0.0001, 0.0005, gamma=0.9998)
                if self.tf_records:
                    train_images, train_labels = self.sess.run(self.train_next_elements)
                else:
                    train_images, train_labels = self.data.next_batch(self.gt_labels_train, self.batch_size)
                feed_dict_train = {self.net.images: train_images, self.net.labels: train_labels, self.learning_rate: learning_rate_val, self.net.is_training: True}
                _, summary_str, train_loss = self.sess.run([self.optimizer, self.summary_op, self.net.total_loss], feed_dict=feed_dict_train)
                print("epoch %d , step %d train end ,loss is : %f ... ..." % (epoch, step, train_loss))
                self.writer.add_summary(summary_str, global_step)
                if step % 50 == 0:
                    print('test sets start ...')
                    # test sets sum :4962
                    sum_loss = 0.
                    # test_iter = int(4962 / self.batch_size)
                    test_iter = 10  # average the loss over 10 test batches
                    for _ in range(test_iter):
                        if self.tf_records:
                            test_images, test_labels = self.sess.run(self.test_next_elements)
                        else:
                            test_images, test_labels = self.data.next_batch(self.gt_labels_test, self.batch_size)
                        feed_dict_test = {self.net.images: test_images, self.net.labels: test_labels, self.net.is_training: False}
                        loss_iter = self.sess.run(self.net.total_loss, feed_dict=feed_dict_test)
                        sum_loss += loss_iter
                    mean_loss = sum_loss / test_iter
                    print('epoch {} , step {} , test loss is {}'.format(str(epoch), str(step), str(mean_loss)))
                    test_summary = tf.Summary(
                        value=[tf.Summary.Value(tag='test_loss', simple_value=mean_loss)])
                    self.writer.add_summary(test_summary, global_step=global_step)
                    self.writer.flush()
                    self.saver.save(self.sess, self.model_det_dir + '/' + 'det_voc07_{}_step_{}.ckpt'.format(str(epoch), str(step)), global_step=step)
                    print('save model successful ...')
    def set_detector_params(self):
        if self.tf_records:
            train_records_path = r'/home/ws/DataSets/pascal_VOC/VOC07/tfrecords' + '/trainval.tfrecords'
            test_records_path = r'/home/ws/DataSets/pascal_VOC/VOC07/tfrecords' + '/test.tfrecords'
            train_datasets = VOC07RECORDS.DataSets(record_path=train_records_path, batch_size=self.batch_size)
            train_gen = train_datasets.transform(shuffle=True)
            train_iterator = train_gen.make_one_shot_iterator()
            self.train_next_elements = train_iterator.get_next()
            test_datasets = VOC07RECORDS.DataSets(record_path=test_records_path, batch_size=self.batch_size)
            test_gen = test_datasets.transform(shuffle=True)
            test_iterator = test_gen.make_one_shot_iterator()
            self.test_next_elements = test_iterator.get_next()
        else:
            self.gt_labels_train = self.data.prepare('train')
            self.gt_labels_test = self.data.prepare('test')
        if self.re_train:
            self.load_det_model()
        else:
            self.load_pre_train_model()
    def load_pre_train_model(self):
        net_vars = slim.get_model_variables()
        model_file = tf.train.latest_checkpoint(self.model_cls_dir)
        reader = tf.train.NewCheckpointReader(model_file)
        model_vars = reader.get_variable_to_shape_map()
        exclude = ['yolov1/classify_fc1/weights', 'yolov1/classify_fc1/biases']
        vars_restore_map = {}
        for var in net_vars:
            if var.op.name in model_vars and var.op.name not in exclude:
                vars_restore_map[var.op.name] = var
        self.saver = tf.train.Saver(vars_restore_map, max_to_keep=4)
        self.saver.restore(self.sess, model_file)
        self.saver = tf.train.Saver(var_list=net_vars, max_to_keep=4)

    def load_det_model(self):
        # self.saver = tf.train.Saver(max_to_keep=4)
        net_vars = slim.get_model_variables()
        self.saver = tf.train.Saver(net_vars, max_to_keep=4)
        model_file = tf.train.latest_checkpoint(self.model_det_dir)
        self.saver.restore(self.sess, model_file)
    # training op that also runs the batch-norm update ops
    def optimizer_bn(self, lr, loss, mom=0.9, fun='mm'):
        with tf.name_scope('optimzer_bn'):
            update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
            with tf.control_dependencies([tf.group(*update_ops)]):
                optim = tf.train.MomentumOptimizer(learning_rate=lr, momentum=mom)
                train_op = slim.learning.create_train_op(loss, optim)
                return train_op
def train_classify():
    yolov1 = YOLO_Net(is_pre_training=True)
    solver = Solver(net=yolov1, data=0)
    print('start ...')
    solver.train_classify()

def train_detector():
    yolov1 = YOLO_Net(is_pre_training=False)
    pasvoc07 = Pascal_voc()
    solver = Solver(net=yolov1, data=pasvoc07)
    print('start train ...')
    solver.train_detector()

def train_detector_with_records():
    yolov1 = YOLO_Net(is_pre_training=False)
    solver = Solver(net=yolov1, data=0, tf_records=True)
    print('start train ...')
    solver.train_detector()

if __name__ == '__main__':
    train_detector_with_records()
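For reference, the learning-rate schedule described in the paper (Training, item 10) can be sketched as a simple piecewise function; solver.py above uses my CLR_EXP_RANGE cyclic schedule instead, and the warm-up length below is an assumption, since the paper only says the rate is raised "for the first epochs":

```python
def yolov1_lr(epoch, warmup_epochs=5):
    """Piecewise schedule following the YOLOv1 paper
    (the warm-up length is an assumption, not stated exactly in the paper)."""
    if epoch < warmup_epochs:
        # slowly raise the rate from 1e-3 to 1e-2 during warm-up
        return 1e-3 + (1e-2 - 1e-3) * epoch / warmup_epochs
    if epoch < warmup_epochs + 75:
        return 1e-2   # 75 epochs at 1e-2
    if epoch < warmup_epochs + 75 + 30:
        return 1e-3   # 30 epochs at 1e-3
    return 1e-4       # final 30 epochs at 1e-4
```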
As for the test code (detector and classifier), it is all on GitHub, so it is not pasted here. YOLO performs recognition and localization in one shot, and forcing the network to learn this way is the most neural-network-like style; but without question, since each cell predicts a fixed number of boxes, objects that crowd close together risk being missed, and recall suffers. YOLOv3 later improved on these problems.
Personal training notes:
Pretraining used the CIFAR-10 dataset, which is small and low-resolution, so the detection training was also strongly affected, especially since the CIFAR-10 categories largely do not match the VOC07 ones;
Detection training used the VOC07 dataset; in total well over a thousand epochs were trained, with the batch size changed among 32, 64, 96 and 128 when resuming from checkpoints; the screenshots here show only the very first run, and later resumed runs are not shown;
My network follows the YOLOv1 architecture, except that BN is added to speed up training;
Classification training:
Detection training:
Detection results:
Conclusion:
The detection results this time are mediocre, with quite a few false detections and missed detections;
YOLO's design seems odd to many people coming from the two-stage R-CNN school, and many struggle to understand it, especially when writing the code; I recommend studying other people's implementations. I too was only able to write this YOLO code after referring to others' work. The most important part is the construction of the YOLO labels, which differs greatly from other detectors. The mediocre detection quality is largely due to my pretraining dataset, but also to genuine design limitations of YOLOv1 itself, which later papers in the series go on to fix. In short, good results require a good dataset, a good model, and good training technique.
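Since label construction is the part that differs most from other detectors, here is a minimal NumPy sketch (a hypothetical helper, not the GitHub version) of building the [7, 7, 25] label tensor described in the model.py docstring, where channel 0 is the responsibility flag, channels 1-4 are the pixel-space box (x, y, w, h), and the rest is the one-hot class:

```python
import numpy as np

def make_label(boxes_xywh, class_ids, image_size=448, S=7, C=20):
    """Build the [S, S, 5+C] YOLO label: channel 0 = responsibility flag,
    channels 1-4 = (x, y, w, h) in pixels, channels 5.. = one-hot class."""
    label = np.zeros((S, S, 5 + C), dtype=np.float32)
    for (x, y, w, h), cls in zip(boxes_xywh, class_ids):
        # find the cell the box center falls into; label is indexed [h dir, w dir]
        col = int(x * S / image_size)
        row = int(y * S / image_size)
        if label[row, col, 0] == 1:   # cell already responsible for an object
            continue
        label[row, col, 0] = 1
        label[row, col, 1:5] = [x, y, w, h]
        label[row, col, 5 + cls] = 1
    return label

# one object centered mid-image; the class index 11 is just an example value
label = make_label([(224, 224, 100, 150)], [11])
```

The loss layer above divides the stored pixel coordinates by image_size, so the label keeps raw pixels here.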
All of the code is on my GitHub, covering the YOLO model, training (pretraining and detection training), detection (detection and classification inference), and YOLO label creation (an image version and a tfrecords version); it should be just about the most complete YOLOv1 TensorFlow implementation currently available online;
My trained weights are uploaded to Baidu Cloud and can be downloaded to try out: https://pan.baidu.com/s/1BdMZYvkiYT9Fts0dLIgrog
Extraction code: 0rmi
Code references: