RTSEG: Real-time semantic segmentation comparative study
Abstract: Most research on semantic segmentation focuses only on increasing the accuracy of segmentation models, with little attention to computationally efficient solutions. So real-time segmentation is well worth working on. The comparison is organized around feature extraction and decoding methods.
Feature extraction: VGG16, ResNet18, MobileNet and ShuffleNet
Decoding: SkipNet, UNet, Dilation Frontend
Dataset: Cityscapes, a dataset of urban scenes
1. Introduction
FCN: upsampling with transposed convolution
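A minimal sketch of what a transposed convolution does, in 1-D and pure Python. The function name, kernel, and stride are illustrative assumptions, not values taken from FCN; the point is only that each coarse value scatters a scaled copy of the kernel into a longer output.

```python
def transposed_conv1d(x, kernel, stride):
    """1-D transposed convolution without padding.

    Each input value scatters a scaled copy of the kernel into the output,
    with consecutive copies offset by `stride` positions.
    """
    out_len = (len(x) - 1) * stride + len(kernel)
    out = [0.0] * out_len
    for i, value in enumerate(x):
        for k, weight in enumerate(kernel):
            out[i * stride + k] += value * weight
    return out


# Hypothetical coarse feature row, upsampled by a factor of 2.
coarse = [1.0, 2.0, 3.0]
upsampled = transposed_conv1d(coarse, kernel=[1.0, 1.0], stride=2)
print(upsampled)  # [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
```

With a kernel of all ones and a stride equal to the kernel length this reduces to nearest-neighbour upsampling; in a network the kernel weights would be learned instead.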
Datasets: Pascal, NYU RGBD, Cityscapes and Mapillary
ENet's accuracy is too poor; other real-time methods, including ICNet, do not perform well either
- Provides feature extraction and decoding methods; the decoding method is termed the meta-architecture
- Presents the trade-off between accuracy and computational efficiency
- ShuffleNet leads to a 143x GFLOPs reduction in comparison to SegNet
2. Benchmarking framework
2.1 Meta-architectures
downsampling factor is 32
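As a quick arithmetic check, a downsampling factor of 32 applied to the 512x1024 input listed under the experimental setup gives a 16x32 encoder output. The helper name below is hypothetical, and integer division assumes the input size is a multiple of the factor.

```python
def feature_map_size(height, width, downsampling_factor):
    """Spatial size of the encoder output for an input of height x width."""
    return height // downsampling_factor, width // downsampling_factor


# 512x1024 is the input resolution listed under the experimental setup.
size = feature_map_size(512, 1024, 32)
print(size)  # (16, 32)
```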
SkipNet
UNet
Dilation Frontend
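Uses dilated convolution in place of downsampled feature maps. Dilated convolution lets the network keep a sufficiently large receptive field without destroying pixel structure through pooling and strided convolution.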
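The receptive-field argument can be checked with a small calculation. The sketch below uses two hypothetical three-layer stacks of 3x3 convolutions (not the layer configuration of any network in these notes): one downsamples with stride 2 at every layer, the other keeps stride 1 and doubles the dilation instead.

```python
def trace_layers(layers):
    """Return (output_stride, receptive_field) after a stack of conv/pool layers.

    Each layer is (kernel_size, stride, dilation). The receptive field grows by
    (kernel_size - 1) * dilation * jump, where jump is the product of the
    strides of all earlier layers.
    """
    jump = 1
    receptive_field = 1
    for kernel_size, stride, dilation in layers:
        receptive_field += (kernel_size - 1) * dilation * jump
        jump *= stride
    return jump, receptive_field


# Hypothetical toy stacks, chosen only to illustrate the trade-off.
strided = [(3, 2, 1), (3, 2, 1), (3, 2, 1)]  # downsample at every layer
dilated = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]  # keep resolution, grow dilation

print(trace_layers(strided))  # (8, 15)
print(trace_layers(dilated))  # (1, 15)
```

Both stacks see 15 input pixels per output position, but the dilated stack keeps the output at full resolution while the strided one reduces it by a factor of 8.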
2.2 Feature extraction architectures
3. Experiments
3.1 Experimental setup
Weighted cross entropy loss
Adam optimizer
Learning rate is set to 1e-4
Batch normalization (BN)
L2 regularization with a weight decay rate of 5e-4 is used to avoid overfitting
The feature extractor part of the network is initialized from the corresponding encoder pre-trained on ImageNet
Input image resolution is 512x1024
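A minimal sketch of the weighted cross-entropy loss from the setup above, in pure Python. The class weights are hypothetical (the notes do not say how they are computed), and averaging over the number of pixels is an assumption; some frameworks normalize by the sum of the applied weights instead.

```python
import math


def weighted_cross_entropy(logits, labels, class_weights):
    """Mean weighted cross-entropy over pixels.

    logits: per-pixel lists of raw class scores.
    labels: per-pixel ground-truth class indices.
    class_weights: one weight per class (hypothetical values below).
    """
    total = 0.0
    for scores, label in zip(logits, labels):
        # Log-softmax with the max subtracted for numerical stability.
        peak = max(scores)
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
        log_prob = scores[label] - log_norm
        total += -class_weights[label] * log_prob
    return total / len(labels)


# Two pixels, two classes; class 1 is assumed rare and weighted 3x.
logits = [[0.0, 0.0], [0.0, 0.0]]
labels = [0, 1]
loss = weighted_cross_entropy(logits, labels, class_weights=[1.0, 3.0])
print(loss)  # 2 * ln(2), about 1.386
```

Up-weighting rare classes makes their pixels contribute more to the loss, which helps when a few classes cover most of the image area.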
3.2 Semantic Segmentation results
Semantic segmentation is evaluated using mean intersection over union (mIoU), per-class IoU, and per-category IoU
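For reference, per-class IoU is TP / (TP + FP + FN) and mIoU is its mean over classes. The sketch below computes both from a confusion matrix on a made-up six-pixel example; the function names and the choice to skip classes absent from both prediction and ground truth are assumptions, not details from the paper.

```python
def confusion_matrix(predictions, targets, num_classes):
    """matrix[t][p] counts pixels of true class t predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for pred, target in zip(predictions, targets):
        matrix[target][pred] += 1
    return matrix


def per_class_iou(matrix):
    ious = []
    for c in range(len(matrix)):
        true_positive = matrix[c][c]
        false_negative = sum(matrix[c]) - true_positive
        false_positive = sum(row[c] for row in matrix) - true_positive
        union = true_positive + false_negative + false_positive
        # Classes absent from both prediction and ground truth yield None.
        ious.append(true_positive / union if union else None)
    return ious


def mean_iou(ious):
    present = [v for v in ious if v is not None]
    return sum(present) / len(present)


# Made-up labels for six pixels and three classes.
targets = [0, 0, 1, 1, 2, 2]
predictions = [0, 1, 1, 1, 2, 0]
ious = per_class_iou(confusion_matrix(predictions, targets, num_classes=3))
miou = mean_iou(ious)
print(ious, miou)
```

Per-category IoU can reuse the same functions after mapping each class id to its category id (Cityscapes groups its classes into categories such as vehicle or human).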