RTSeg: Real-time Semantic Segmentation Comparative Study

Abstract: Most research on semantic segmentation focuses only on increasing the accuracy of segmentation models, with little attention to computationally efficient solutions. Real-time segmentation is therefore well worth working on. The study is built around feature extraction and decoding methods.

Feature extraction: VGG16, ResNet18, MobileNet and ShuffleNet

Decoding: SkipNet, UNet, Dilation Frontend

Dataset: Cityscapes dataset for urban scenes

1. Introduction

FCN: transposed convolution for upsampling

Pascal, NYU RGBD, Cityscapes and Mapillary

ENet performs too poorly; real-time methods in general, ICNet included, do not give good results.

  1. Provide feature extraction and decoding methods; the decoding method is termed the meta-architecture
  2. Present the trade-off between accuracy and computational efficiency
  3. ShuffleNet leads to a 143x GFLOPs reduction in comparison to SegNet


2. Benchmarking framework

2.1 Meta-architectures

Downsampling factor is 32
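
As a quick check of what a downsampling factor of 32 means for the 512x1024 inputs used in the experiments, the sketch below computes the spatial size of the final encoder feature map. The function name feature_map_size is made up for illustration, and it assumes the input size divides evenly by the factor.

```python
def feature_map_size(height, width, factor=32):
    """Spatial size of the encoder output for a total downsampling factor.

    Hypothetical helper: assumes height and width are divisible by factor.
    """
    if height % factor or width % factor:
        raise ValueError("input size must be divisible by the downsampling factor")
    return height // factor, width // factor


print(feature_map_size(512, 1024))     # (16, 32)
print(feature_map_size(512, 1024, 8))  # (64, 128)
```

A decoder therefore has to recover a factor of 32 in each dimension to get back to per-pixel predictions.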



Dilation frontend

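Uses dilated convolution in place of downsampled feature maps. Dilated convolution lets the network keep a sufficiently large receptive field without destroying pixel structure through pooling and strided convolution.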
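
To make the receptive-field argument concrete, here is a minimal 1-D sketch in plain Python. It is not the paper's implementation: dilated_conv1d and receptive_field are illustrative names, there is no padding, and the layers are assumed to be stride-1.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (unpadded) 1-D cross-correlation with a dilated kernel."""
    span = (len(kernel) - 1) * dilation + 1  # input samples covered by one output
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(w * signal[start + i * dilation]
                       for i, w in enumerate(kernel)))
    return out


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 convolutions, one dilation per layer."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


signal = list(range(10))
print(dilated_conv1d(signal, [1, 1, 1], dilation=2))  # [6, 9, 12, 15, 18, 21]
print(receptive_field(3, [1, 1, 1]))  # 7
print(receptive_field(3, [1, 2, 4]))  # 15
```

Three 3-tap layers with dilations 1, 2, 4 see 15 input samples instead of 7, with no pooling or striding, so the output keeps the input's sampling density.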

2.2 Feature extraction architectures


3. Experiments

3.1 Experimental setup

Weighted cross entropy loss
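
A minimal sketch of a weighted cross entropy over a handful of pixels, in plain Python. The notes do not say how the class weights are chosen, so class_weights here is a placeholder, and normalizing by the sum of weights (rather than by pixel count) is an assumption.

```python
import math


def weighted_cross_entropy(logits, labels, class_weights):
    """Weighted cross entropy averaged over pixels.

    logits: list of per-pixel score lists (one score per class)
    labels: list of ground-truth class indices, one per pixel
    class_weights: per-class weights (placeholder values, not from the paper)
    """
    total, weight_sum = 0.0, 0.0
    for scores, label in zip(logits, labels):
        m = max(scores)  # subtract the max for a numerically stable log-softmax
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        log_prob = scores[label] - log_z
        w = class_weights[label]
        total += -w * log_prob
        weight_sum += w
    return total / weight_sum  # assumption: normalize by the sum of weights


logits = [[2.0, 0.0], [2.0, 0.0]]  # both pixels predicted as class 0
labels = [0, 1]                    # the second pixel is misclassified
print(weighted_cross_entropy(logits, labels, [1.0, 1.0]))
print(weighted_cross_entropy(logits, labels, [1.0, 5.0]))  # larger: class 1 errors cost more
```

Raising the weight of a rare class makes mistakes on it dominate the loss, which is the usual motivation for weighting on a class-imbalanced dataset such as urban scenes.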

Adam optimizer

Learning rate is set to 1e-4


L2 regularization with weight decay rate of 5e-4 is utilized to avoid over-fitting
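
The optimizer settings above can be sketched as a single Adam update with the L2 penalty folded into the gradient. This is a plain-Python illustration, not the paper's training code: beta1, beta2 and eps are the commonly used Adam defaults and are assumptions, as are the names adam_step and init_state.

```python
import math


def init_state(n):
    """Fresh Adam state for n scalar parameters."""
    return {"t": 0, "m": [0.0] * n, "v": [0.0] * n}


def adam_step(params, grads, state, lr=1e-4, weight_decay=5e-4,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with L2 regularization (weight_decay * p added to the gradient)."""
    state["t"] += 1
    t = state["t"]
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        g = g + weight_decay * p  # gradient of (weight_decay / 2) * p**2
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        m_hat = state["m"][i] / (1 - beta1 ** t)  # bias-corrected first moment
        v_hat = state["v"][i] / (1 - beta2 ** t)  # bias-corrected second moment
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params


state = init_state(2)
params = adam_step([1.0, -2.0], [0.5, -0.25], state)
print(params)  # each parameter moves by about lr = 1e-4 on the first step
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the step size is roughly the learning rate regardless of gradient scale.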

The feature extractor part of the network is initialized with the corresponding encoder pre-trained on ImageNet

Input image resolution is 512x1024

3.2 Semantic segmentation results

Semantic segmentation is evaluated using mean intersection over union (mIoU), per-class IoU, and per-category IoU.
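
A minimal sketch of how per-class IoU and mIoU can be computed from flattened predictions, using IoU = TP / (TP + FP + FN) per class. The function names are illustrative, ignore_label=255 is an assumption about how unlabeled pixels are marked, and classes that never appear are skipped in the mean.

```python
def confusion_matrix(pred, target, num_classes, ignore_label=255):
    """cm[t][p] counts pixels with ground truth t that were predicted as p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_label:  # assumption: unlabeled pixels carry this value
            continue
        cm[t][p] += 1
    return cm


def per_class_iou(cm):
    """IoU per class; None for classes absent from both prediction and target."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(n)) - tp
        denom = tp + fp + fn
        ious.append(tp / denom if denom else None)
    return ious


def mean_iou(ious):
    """Mean over the classes that have a defined IoU."""
    valid = [v for v in ious if v is not None]
    return sum(valid) / len(valid)


pred = [0, 0, 1, 1, 2, 0]
target = [0, 1, 1, 1, 2, 255]
ious = per_class_iou(confusion_matrix(pred, target, num_classes=3))
print(ious)            # class 0: 1/2, class 1: 2/3, class 2: 1
print(mean_iou(ious))  # about 0.722
```

Per-category IoU can be obtained the same way after mapping each class label to its category before building the confusion matrix.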