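The authors split a standard convolution into two steps: a depthwise convolution and a pointwise convolution. Taken literally, each input channel is first convolved on its own (depthwise), and then a 1×1 pointwise convolution combines the per-channel features across channels. This greatly reduces the computational cost.
They also introduce two hyperparameters, the width multiplier and the resolution multiplier. The width multiplier thins the number of feature channels in each layer, and the resolution multiplier shrinks the input image resolution.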

Depthwise separable convolution

Assume the kernel size is $D_k \times D_k$.
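To make the saving concrete, here is the cost comparison from the MobileNet paper written out as a sketch. The symbols $M$ (input channels), $N$ (output channels) and $D_F$ (spatial size of the feature map) are not defined in the text above; they are assumed here, following the paper's notation. Stride 1 with padding is also assumed, so the output feature map is $D_F \times D_F$ as well.

```latex
\begin{align*}
\text{standard conv:} \quad
  & D_k \cdot D_k \cdot M \cdot N \cdot D_F \cdot D_F \\
\text{depthwise + pointwise:} \quad
  & D_k \cdot D_k \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F \\
\text{ratio:} \quad
  & \frac{D_k \cdot D_k \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F}
         {D_k \cdot D_k \cdot M \cdot N \cdot D_F \cdot D_F}
    = \frac{1}{N} + \frac{1}{D_k^2}
\end{align*}
```

With a 3×3 kernel the $1/D_k^2$ term dominates, so the separable version needs roughly 8 to 9 times fewer multiply-adds.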

Depthwise separable convolution in PyTorch

Diagram of a depthwise separable convolution:

The PyTorch code is as follows:

import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise conv followed by a 1x1 pointwise conv, each with BN + ReLU."""

    def __init__(self, in_channels, out_channels):
        super(DepthwiseSeparableConv, self).__init__()
        # Depthwise: groups=in_channels gives every input channel its own 3x3 filter.
        self.depthwise = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1, groups=in_channels),
            nn.BatchNorm2d(in_channels),
            nn.ReLU(),
        )
        # Pointwise: a 1x1 conv mixes the channels and sets the output width.
        self.pointwise = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(),
        )

    def forward(self, x):
        x = self.depthwise(x)
        out = self.pointwise(x)
        return out
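As a rough check on how much smaller the module above is than a plain 3×3 convolution, the weights can be counted by hand. This is only a sketch: the helper names are made up for illustration, bias and BatchNorm parameters are ignored, and the 32-to-64 channel example is an arbitrary choice.

```python
def conv_weights(in_channels, out_channels, kernel_size, groups=1):
    """Number of weights in a 2D convolution (bias and BatchNorm not counted)."""
    return (in_channels // groups) * out_channels * kernel_size * kernel_size


def standard_conv_weights(in_channels, out_channels, kernel_size=3):
    """Weights of one ordinary k x k convolution."""
    return conv_weights(in_channels, out_channels, kernel_size)


def separable_conv_weights(in_channels, out_channels, kernel_size=3):
    """Weights of a depthwise k x k conv plus a pointwise 1x1 conv."""
    depthwise = conv_weights(in_channels, in_channels, kernel_size, groups=in_channels)
    pointwise = conv_weights(in_channels, out_channels, 1)
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
standard = standard_conv_weights(32, 64)
separable = separable_conv_weights(32, 64)
print(standard, separable, round(separable / standard, 3))
```

For this example the separable version keeps about 13% of the weights, which matches the $1/N + 1/D_k^2$ ratio for $N = 64$ and $D_k = 3$.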

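MobileNet-V1 architecture diagram

Counting the depthwise and pointwise convolutions as separate layers, MobileNet-V1 has 28 layers. The network ends with a 7×7 average pooling layer. Nowhere in the architecture is max pooling used for feature extraction or downsampling; convolutions with stride=2 do the downsampling instead. The reasoning: a small network is unlikely to overfit and is more likely to underfit, and pooling throws features away, which makes underfitting even more likely.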
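The stride-2 downsampling described above can be traced with the usual convolution output-size formula. This is a sketch under stated assumptions: a 224×224 input, 3×3 kernels with padding 1 (PyTorch-style), and five stride-2 convolutions in total; the function names are made up for illustration.

```python
def conv_out_size(size, kernel_size=3, stride=1, padding=1):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel_size) // stride + 1


def downsample_trace(input_size=224, num_stride2_convs=5):
    """Feature-map size after each stride-2 convolution (assumed count: five)."""
    sizes = [input_size]
    for _ in range(num_stride2_convs):
        sizes.append(conv_out_size(sizes[-1], stride=2))
    return sizes


print(downsample_trace())  # [224, 112, 56, 28, 14, 7]
```

The final 7×7 map is exactly what the 7×7 average pooling collapses to a single value per channel.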

Width multiplier and resolution multiplier

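Although MobileNet is already small, the authors note that many scenarios still call for smaller and faster models, so they introduce the width multiplier α. Its job is to slim the model down: with $M$ input channels and $N$ output channels in a layer (the paper's notation), the number of input channels becomes $\alpha M$ and the number of output channels becomes $\alpha N$. With the width multiplier, the computational cost of a depthwise separable layer in MobileNet becomes: $D_k \cdot D_k \cdot \alpha M \cdot D_F \cdot D_F + \alpha M \cdot \alpha N \cdot D_F \cdot D_F$, where $D_F$ is the spatial size of the feature map.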
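Here is a small sketch of how the two multipliers change the multiply-add count of one depthwise separable layer. The function name and the example layer sizes are hypothetical. The resolution multiplier, written `rho` here, is assumed to scale the feature-map size $D_F$ the same way α scales the channel counts, as in the paper. Plain rounding is also an assumption; real implementations often round channel counts to a multiple of 8.

```python
def separable_cost(dk, m, n, df, alpha=1.0, rho=1.0):
    """Multiply-adds of a depthwise separable layer with width/resolution multipliers.

    dk: kernel size, m: input channels, n: output channels, df: feature-map size.
    """
    m_thin = int(round(alpha * m))   # thinned input channels (alpha * M)
    n_thin = int(round(alpha * n))   # thinned output channels (alpha * N)
    df_small = int(round(rho * df))  # reduced feature-map size (rho * D_F)
    depthwise = dk * dk * m_thin * df_small * df_small
    pointwise = m_thin * n_thin * df_small * df_small
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 512 -> 512 channels, 14x14 feature map.
base = separable_cost(3, 512, 512, 14)
thin = separable_cost(3, 512, 512, 14, alpha=0.5)
small = separable_cost(3, 512, 512, 14, rho=0.5)
print(base, thin, small)
```

Because the pointwise term dominates and scales with $\alpha^2$, halving the width cuts the cost to roughly a quarter; halving the resolution has a similar effect through $\rho^2$.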

MobileNet-V2

Paper link

Overview

The contributions of this paper: 1. a new architecture, MobileNet-V2; 2. SSDLite, a model for object detection; 3. Mobile DeepLabv3, a model for semantic segmentation; 4. MobileNet-V2 uses the inverted residual structure.

Linear Bottlenecks & Inverted residuals

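The authors' write-up of this part is rather cryptic, so here is my understanding in the plainest terms I can manage.
The authors found that the depthwise separable convolutions in MobileNet-V1 produce a large number of negative values (my guess: each depthwise filter sees only one channel, so few values are summed and the result tends to be small or even negative?). After the ReLU activation these become a large number of zeros, i.e. dead neurons. Linear bottlenecks and inverted residuals were introduced to address this problem.
The paper includes a figure showing that ReLU in a high-dimensional space preserves more of the features.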

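The figure above comes from a survey article and compares a residual block with a bottleneck against a conventional residual block. The "with bottleneck" block is the inverted residual. It is called inverted because a conventional ResNet block has wide input and output channels and a narrow convolution in the middle, whereas an inverted residual has narrow input and output channels and a wide middle. As the figure shows, an inverted residual block first applies a 1×1 convolution that maps the features into a higher-dimensional space, then a 3×3 convolution (depthwise separable), and finally another 1×1 convolution that maps the features back to a low-dimensional space.
I still read it this way: depthwise separable convolutions produce many negative values because the number of feature channels is small, so the features need to be mapped to a higher-dimensional space before they enter the depthwise convolution. The channel count just before the shortcut must match the block's input channel count, which is why another 1×1 convolution is added before the shortcut.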
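The "narrow-wide-narrow" versus "wide-narrow-wide" contrast is easy to see by listing the channel widths through each block. This is only an illustration: the function names are invented, the 4× reduction for the classic bottleneck and the 6× expansion for the inverted residual are assumed typical values, and the channel counts are arbitrary.

```python
def classic_bottleneck_widths(channels, reduction=4):
    """Channel widths through a classic bottleneck: 1x1 reduce, 3x3, 1x1 restore."""
    mid = channels // reduction
    return [channels, mid, mid, channels]


def inverted_residual_widths(in_channels, out_channels, expansion_factor=6):
    """Channel widths through an inverted residual: 1x1 expand, 3x3 depthwise, 1x1 project."""
    mid = in_channels * expansion_factor
    return [in_channels, mid, mid, out_channels]


print(classic_bottleneck_widths(256))    # [256, 64, 64, 256]
print(inverted_residual_widths(24, 24))  # [24, 144, 144, 24]
```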

Inverted residual block code

Reference code link

import torch
import torch.nn as nn


def Conv3x3BNReLU(in_channels, out_channels, stride, groups):
    # 3x3 conv + BN + ReLU6; with groups == in_channels this is a depthwise conv.
    return nn.Sequential(
            nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=3, stride=stride, padding=1, groups=groups),
            nn.BatchNorm2d(out_channels),
            nn.ReLU6(inplace=True)
        )


def Conv1x1BNReLU(in_channels, out_channels):
    # 1x1 conv + BN + ReLU6, used to expand to the wider intermediate width.
    return nn.Sequential(
            nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=1, stride=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU6(inplace=True)
        )


def Conv1x1BN(in_channels, out_channels):
    # 1x1 conv + BN with no activation: the linear bottleneck.
    return nn.Sequential(
            nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=1, stride=1),
            nn.BatchNorm2d(out_channels)
        )


class InvertedResidual(nn.Module):
    def __init__(self, in_channels, out_channels, stride, expansion_factor=6):
        super(InvertedResidual, self).__init__()
        self.stride = stride
        mid_channels = (in_channels * expansion_factor)

        # Expand (1x1) -> depthwise (3x3) -> linear projection (1x1).
        self.bottleneck = nn.Sequential(
            Conv1x1BNReLU(in_channels, mid_channels),
            Conv3x3BNReLU(mid_channels, mid_channels, stride, groups=mid_channels),
            Conv1x1BN(mid_channels, out_channels)
        )

        # This reference code projects the shortcut with a 1x1 conv whenever stride == 1.
        # The paper's block instead uses an identity shortcut, and only when stride == 1
        # and the input and output channel counts match.
        if self.stride == 1:
            self.shortcut = Conv1x1BN(in_channels, out_channels)

    def forward(self, x):
        out = self.bottleneck(x)
        # The residual is added only when the spatial size is unchanged (stride 1).
        out = (out + self.shortcut(x)) if self.stride == 1 else out
        return out
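One reason the 6× expansion is affordable is that the wide middle only ever sees a depthwise convolution. A rough weight count makes this visible. It is a sketch with an invented function name that ignores BatchNorm, biases and the shortcut projection, and the 24-channel example is arbitrary.

```python
def inverted_residual_weights(in_channels, out_channels, expansion_factor=6, kernel_size=3):
    """Conv weights of the expand / depthwise / project path (BN, bias, shortcut ignored)."""
    mid = in_channels * expansion_factor
    expand = in_channels * mid                  # 1x1 conv, in -> mid
    depthwise = kernel_size * kernel_size * mid  # k x k depthwise, one filter per channel
    project = mid * out_channels                # 1x1 linear projection, mid -> out
    return {
        "expand": expand,
        "depthwise": depthwise,
        "project": project,
        "total": expand + depthwise + project,
    }


# Hypothetical block: 24 -> 24 channels with the default 6x expansion (mid = 144).
counts = inverted_residual_weights(24, 24)
dense_3x3_at_mid_width = 144 * 144 * 3 * 3  # an ordinary 3x3 conv at the expanded width
print(counts, dense_3x3_at_mid_width)
```

In this example the whole block has 8,208 convolution weights, while a single ordinary 3×3 convolution at the expanded width would need 186,624.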

MobileNet-V3

I haven't read it yet; V2 seems to be the more widely used one. That wraps up this summary for now.


MobileNet V1 to V3


MobileNet-V1

Overview