MobileNet


Deep Learning Neural Network Feature Extraction (Part 3)

Introduction to MobileNet

In earlier articles we covered the VGG and ResNet architectures. As deep learning has advanced, the pursuit of ever higher accuracy has driven networks to become deeper or wider. However, as networks keep growing in depth and width, the number of parameters rises sharply, which degrades runtime performance. MobileNet was proposed precisely to address this problem.

MobileNetv1

The defining feature of MobileNetv1 is the depthwise separable convolution it introduces; apart from that, the network is a plain sequential stack of layers, as shown in the figure below.

(Figure: MobileNetv1 architecture)

Depthwise Separable Convolution

To explain the depthwise separable convolution, it is easiest to compare it with a standard convolution. A standard convolution operates as shown in the figure below.

(Figure: standard convolution)

For an input image with 3 channels and a spatial size of 5×5, a standard convolution with 3×3 kernels and 4 output channels uses the kernels shown in the figure above, giving a parameter count of 4×3×3×3 = 108.
With a depthwise separable convolution performing the same 3×3 convolution with 4 output channels, the operation proceeds as shown in the two figures below.

(Figure 1: depthwise convolution)
(Figure 2: pointwise convolution)

In a depthwise separable convolution, the input is first processed by N 3×3 depthwise kernels (N being the number of input channels; N = 3 in Figure 1), one kernel per channel, to extract spatial features. Then M pointwise 1×1 convolutions, each spanning all N channels, rescale the number of channels (Figure 2). The parameter count is therefore 3×3×3 + 1×1×3×4 = 39. Compared with the standard convolution, the parameter count drops dramatically, which greatly improves runtime performance while having only a small impact on final accuracy.
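To make the arithmetic concrete, here is a minimal sketch (assuming a standalone Keras installation; the layer choices simply mirror the description above) that builds both variants without biases and prints their parameter counts, which should come out to 108 and 27 + 12 = 39 respectively:

from keras.layers import Conv2D, DepthwiseConv2D, Input
from keras.models import Model

inp = Input(shape=(5, 5, 3))

# standard convolution: 4 output channels, 3x3 kernels -> 4*3*3*3 = 108 parameters
standard = Model(inp, Conv2D(4, (3, 3), use_bias=False)(inp))
print('standard conv params:', standard.count_params())

# depthwise separable: 3x3 depthwise (3*3*3 = 27) + 1x1 pointwise (1*1*3*4 = 12) = 39 parameters
x = DepthwiseConv2D((3, 3), use_bias=False)(inp)
x = Conv2D(4, (1, 1), use_bias=False)(x)
separable = Model(inp, x)
print('depthwise separable params:', separable.count_params())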

MobileNetv1 Network Structure

The figure above shows the MobileNetv1 architecture. The main processing flow is:

  • (stride-2 convolution + batch normalization) x 1
  • (stride-1 depthwise convolution + BN, stride-1 pointwise convolution + BN, stride-2 depthwise convolution + BN, stride-1 pointwise convolution + BN) x 3
  • (stride-1 depthwise convolution + BN, stride-1 pointwise convolution + BN) x 5
  • (stride-2 depthwise convolution + BN, stride-1 pointwise convolution + BN) x 1, then (stride-1 depthwise convolution + BN, stride-1 pointwise convolution + BN) x 1
  • one 7×7 global average pooling, one fully connected layer
  • a final softmax layer

The code is as follows:

from keras import backend as K
from keras.layers import (Activation, BatchNormalization, Conv2D, DepthwiseConv2D,
                          Dropout, GlobalAveragePooling2D, Input, Reshape)
from keras.models import Model

#-------------------------------------------------------------#
#   MobileNet network definition
#-------------------------------------------------------------#
def MobileNet(input_shape=[224,224,3], depth_multiplier=1, dropout=1e-3, classes=1000):
    img_input = Input(shape=input_shape)

    # 224,224,3 -> 112,112,32
    x = _conv_block(img_input, 32, strides=(2, 2))

    # 112,112,32 -> 112,112,64
    x = _depthwise_conv_block(x, 64, depth_multiplier, block_id=1)

    # 112,112,64 -> 56,56,128
    x = _depthwise_conv_block(x, 128, depth_multiplier, strides=(2, 2), block_id=2)
    # 56,56,128 -> 56,56,128
    x = _depthwise_conv_block(x, 128, depth_multiplier, block_id=3)

    # 56,56,128 -> 28,28,256
    x = _depthwise_conv_block(x, 256, depth_multiplier, strides=(2, 2), block_id=4)

    # 28,28,256 -> 28,28,256
    x = _depthwise_conv_block(x, 256, depth_multiplier, block_id=5)

    # 28,28,256 -> 14,14,512
    x = _depthwise_conv_block(x, 512, depth_multiplier, strides=(2, 2), block_id=6)

    # 14,14,512 -> 14,14,512
    x = _depthwise_conv_block(x, 512, depth_multiplier, block_id=7)
    x = _depthwise_conv_block(x, 512, depth_multiplier, block_id=8)
    x = _depthwise_conv_block(x, 512, depth_multiplier, block_id=9)
    x = _depthwise_conv_block(x, 512, depth_multiplier, block_id=10)
    x = _depthwise_conv_block(x, 512, depth_multiplier, block_id=11)

    # 14,14,512 -> 7,7,1024
    x = _depthwise_conv_block(x, 1024, depth_multiplier, strides=(2, 2), block_id=12)
    x = _depthwise_conv_block(x, 1024, depth_multiplier, block_id=13)

    # 7,7,1024 -> 1,1,1024
    x = GlobalAveragePooling2D()(x)
    x = Reshape((1, 1, 1024), name='reshape_1')(x)
    x = Dropout(dropout, name='dropout')(x)
    x = Conv2D(classes, (1, 1),padding='same', name='conv_preds')(x)
    x = Activation('softmax', name='act_softmax')(x)
    x = Reshape((classes,), name='reshape_2')(x)

    inputs = img_input

    model = Model(inputs, x, name='mobilenet_1_0_224_tf')
    return model

def _conv_block(inputs, filters, kernel=(3, 3), strides=(1, 1)):
    x = Conv2D(filters, kernel, padding='same', use_bias=False, strides=strides, name='conv1')(inputs)
    x = BatchNormalization(name='conv1_bn')(x)
    return Activation(relu6, name='conv1_relu')(x)


def _depthwise_conv_block(inputs, pointwise_conv_filters, depth_multiplier=1, strides=(1, 1), block_id=1):

    x = DepthwiseConv2D((3, 3), padding='same', depth_multiplier=depth_multiplier, strides=strides, use_bias=False, name='conv_dw_%d' % block_id)(inputs)

    x = BatchNormalization(name='conv_dw_%d_bn' % block_id)(x)
    x = Activation(relu6, name='conv_dw_%d_relu' % block_id)(x)

    x = Conv2D(pointwise_conv_filters, (1, 1), padding='same', use_bias=False, strides=(1, 1), name='conv_pw_%d' % block_id)(x)
    x = BatchNormalization(name='conv_pw_%d_bn' % block_id)(x)
    return Activation(relu6, name='conv_pw_%d_relu' % block_id)(x)

def relu6(x):
    return K.relu(x, max_value=6)
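A quick sanity check, as a sketch assuming the standard Keras/TensorFlow setup used above, is simply to build the model and inspect it:

model = MobileNet()
model.summary()  # prints the layer output shapes and the total parameter count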

MobileNetv2

Compared with MobileNetv1, MobileNetv2 introduces the inverted residual structure and the linear bottleneck. The overall network structure is shown in the figure below.

(Figure: MobileNetv2 architecture)

Inverted Residuals and Linear Bottlenecks

The inverted residual structure is named in contrast to ResNet50; apart from the inversion, MobileNetv2's basic block is like ResNet's in that it also uses a two-branch residual connection:

(Figure: structure comparison)
(Figure: basic block)

In a ResNet50 bottleneck, a 1×1 convolution first reduces the number of channels, a 3×3 convolution then extracts features, and another 1×1 convolution expands the channels again; in practice this has been shown to work better than a single plain 3×3 convolution. MobileNetv2 performs the operations in the opposite order: it first expands the channels, then applies the 3×3 depthwise convolution, and finally projects the channels back down.
The so-called linear bottleneck means that after this channel-reducing (projection) convolution no ReLU6 activation is applied, so the extracted features are not destroyed, and the result is added directly to the block input.
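As a minimal sketch of the two orderings (schematic only: the channel counts, the plain ReLU in place of ReLU6, and the helper names are simplifying assumptions, not the exact blocks used in the full code below):

from keras import backend as K
from keras.layers import Activation, Add, BatchNormalization, Conv2D, DepthwiseConv2D

def resnet_style_bottleneck(x, mid_channels, out_channels):
    # ResNet50 ordering: 1x1 reduce -> 3x3 conv -> 1x1 expand, activation after every conv
    y = Activation('relu')(BatchNormalization()(Conv2D(mid_channels, 1, padding='same')(x)))
    y = Activation('relu')(BatchNormalization()(Conv2D(mid_channels, 3, padding='same')(y)))
    y = BatchNormalization()(Conv2D(out_channels, 1, padding='same')(y))
    return Activation('relu')(Add()([x, y]))  # assumes x already has out_channels channels

def inverted_residual(x, expansion, out_channels):
    # MobileNetV2 ordering: 1x1 expand -> 3x3 depthwise -> 1x1 project,
    # with NO activation after the projection (the linear bottleneck)
    in_channels = K.int_shape(x)[-1]
    y = Activation('relu')(BatchNormalization()(Conv2D(expansion * in_channels, 1, padding='same')(x)))
    y = Activation('relu')(BatchNormalization()(DepthwiseConv2D(3, padding='same')(y)))
    y = BatchNormalization()(Conv2D(out_channels, 1, padding='same')(y))
    return Add()([x, y])  # linear: added directly to the input (assumes matching shapes)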

MobileNetv2 Network Structure

The figure above shows the MobileNetv2 architecture. The main processing flow is (each line below is one stage; the listed stride applies to the first bottleneck of the stage, and the remaining bottlenecks in that stage use stride 1):

  • stride-2 convolution layer x 1
  • stride-1 bottleneck stage x 1
  • stride-2 bottleneck stage x 3
  • stride-1 bottleneck stage x 1
  • stride-2 bottleneck stage x 1
  • stride-1 bottleneck stage x 1
  • stride-1 convolution layer x 1
  • 7×7 average pooling layer x 1
  • fully connected layer with softmax classification

The code is as follows:

from keras import backend as K
from keras.layers import (Activation, Add, BatchNormalization, Conv2D, Dense,
                          DepthwiseConv2D, GlobalAveragePooling2D, Input, ZeroPadding2D)
from keras.models import Model

#-------------------------------------------------------------#
#   MobileNetV2 network definition
#-------------------------------------------------------------#
# ReLU capped at 6
def relu6(x):
    return K.relu(x, max_value=6)


def MobileNetV2(input_shape=[224,224,3], classes=1000):

    img_input = Input(shape=input_shape)

    # 224,224,3 -> 112,112,32
    x = ZeroPadding2D(padding=(1, 1), name='Conv1_pad')(img_input)
    x = Conv2D(32, kernel_size=3, strides=(2, 2), padding='valid', use_bias=False, name='Conv1')(x)
    x = BatchNormalization(epsilon=1e-3, momentum=0.999, name='bn_Conv1')(x)
    x = Activation(relu6, name='Conv1_relu')(x)

    # 112,112,32 -> 112,112,16
    x = _inverted_res_block(x, filters=16, stride=1,expansion=1, block_id=0)

    # 112,112,16 -> 56,56,24
    x = _inverted_res_block(x, filters=24, stride=2, expansion=6, block_id=1)
    x = _inverted_res_block(x, filters=24, stride=1, expansion=6, block_id=2)

    # 56,56,24 -> 28,28,32
    x = _inverted_res_block(x, filters=32, stride=2, expansion=6, block_id=3)
    x = _inverted_res_block(x, filters=32, stride=1, expansion=6, block_id=4)
    x = _inverted_res_block(x, filters=32, stride=1, expansion=6, block_id=5)

    # 28,28,32 -> 14,14,64
    x = _inverted_res_block(x, filters=64, stride=2, expansion=6, block_id=6)
    x = _inverted_res_block(x, filters=64, stride=1, expansion=6, block_id=7)
    x = _inverted_res_block(x, filters=64, stride=1, expansion=6, block_id=8)
    x = _inverted_res_block(x, filters=64, stride=1, expansion=6, block_id=9)

    # 14,14,64 -> 14,14,96
    x = _inverted_res_block(x, filters=96, stride=1, expansion=6, block_id=10)
    x = _inverted_res_block(x, filters=96, stride=1, expansion=6, block_id=11)
    x = _inverted_res_block(x, filters=96, stride=1, expansion=6, block_id=12)
    # 14,14,96 -> 7,7,160
    x = _inverted_res_block(x, filters=160, stride=2, expansion=6, block_id=13)
    x = _inverted_res_block(x, filters=160, stride=1, expansion=6, block_id=14)
    x = _inverted_res_block(x, filters=160, stride=1, expansion=6, block_id=15)

    # 7,7,160 -> 7,7,320
    x = _inverted_res_block(x, filters=320, stride=1, expansion=6, block_id=16)

    # 7,7,320 -> 7,7,1280
    x = Conv2D(1280, kernel_size=1, use_bias=False, name='Conv_1')(x)
    x = BatchNormalization(epsilon=1e-3, momentum=0.999, name='Conv_1_bn')(x)
    x = Activation(relu6, name='out_relu')(x)

    x = GlobalAveragePooling2D()(x)
    x = Dense(classes, activation='softmax', use_bias=True, name='Logits')(x)

    inputs = img_input

    model = Model(inputs, x)

    return model


def _inverted_res_block(inputs, expansion, stride, filters, block_id):
    # the callers above pass the output channel count as `filters`
    pointwise_filters = filters
    in_channels = K.int_shape(inputs)[-1]
    x = inputs
    prefix = 'block_{}_'.format(block_id)

    # Part 1: expand the channels with a 1x1 convolution
    if block_id:
        # Expand
        x = Conv2D(expansion * in_channels, kernel_size=1, padding='same', use_bias=False, activation=None, name=prefix + 'expand')(x)
        x = BatchNormalization(epsilon=1e-3, momentum=0.999, name=prefix + 'expand_BN')(x)
        x = Activation(relu6, name=prefix + 'expand_relu')(x)
    else:
        prefix = 'expanded_conv_'

    if stride == 2:
        x = ZeroPadding2D(padding=(1,1), name=prefix + 'pad')(x)

    # Part 2: 3x3 depthwise convolution
    x = DepthwiseConv2D(kernel_size=3, strides=stride, activation=None, use_bias=False, padding='same' if stride == 1 else 'valid', name=prefix + 'depthwise')(x)
    x = BatchNormalization(epsilon=1e-3, momentum=0.999, name=prefix + 'depthwise_BN')(x)

    x = Activation(relu6, name=prefix + 'depthwise_relu')(x)

    # Part 3: project (compress) the features with a 1x1 convolution; no ReLU here, so the features are not destroyed
    x = Conv2D(pointwise_filters, kernel_size=1, padding='same', use_bias=False, activation=None, name=prefix + 'project')(x)

    x = BatchNormalization(epsilon=1e-3, momentum=0.999, name=prefix + 'project_BN')(x)

    if in_channels == pointwise_filters and stride == 1:
        return Add(name=prefix + 'add')([inputs, x])
    return x

MobileNetv3

Compared with MobileNetv2, MobileNetv3 mainly adds the following features:

  • a lightweight attention mechanism (squeeze-and-excitation)
  • h-swish in place of the swish activation (a short sketch of the two follows this list)
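As a point of reference, here is a minimal sketch of the two activations (the hard_swish definition matches the one used in the code below; swish is shown only for comparison):

from keras import backend as K

def swish(x):
    # original swish: x * sigmoid(x)
    return x * K.sigmoid(x)

def hard_swish(x):
    # h-swish: replaces the costly sigmoid with the piecewise-linear ReLU6(x + 3) / 6
    return x * K.relu(x + 3.0, max_value=6.0) / 6.0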

There are two main variants of the network, large and small, which differ mainly in channel counts and the number of basic blocks. This article covers the small variant; its structure is shown below:

(Figure: MobileNetv3 architecture)

Introducing the Lightweight Attention Mechanism

With the lightweight attention mechanism introduced in MobileNetv3, the original basic block changes somewhat; the new structure is shown in the figure:

(Figure: MobileNetv3 block)

As the figure shows, the lightweight attention mechanism is mainly used to reweight the individual feature channels.
If you have followed the code in the previous sections, the code below should be easy to understand.
The code is as follows:

from keras import backend as K
from keras.layers import (Activation, Add, BatchNormalization, Conv2D, Dense,
                          DepthwiseConv2D, GlobalAveragePooling2D, Input,
                          Multiply, Reshape)
from keras.models import Model

alpha = 1
def relu6(x):
    # ReLU capped at 6
    return K.relu(x, max_value=6.0)

def hard_swish(x):
    # h-swish: x multiplied by a hard sigmoid built from ReLU6
    return x * K.relu(x + 3.0, max_value=6.0) / 6.0

def return_activation(x, nl):
    # select the activation function: HS = hard swish, RE = ReLU6
    if nl == 'HS':
        x = Activation(hard_swish)(x)
    if nl == 'RE':
        x = Activation(relu6)(x)

    return x

def conv_block(inputs, filters, kernel, strides, nl):
    # one convolution unit: Conv2D + BatchNormalization + activation
    channel_axis = 1 if K.image_data_format() == 'channels_first' else -1

    x = Conv2D(filters, kernel, padding='same', strides=strides)(inputs)
    x = BatchNormalization(axis=channel_axis)(x)

    return return_activation(x, nl)

def squeeze(inputs):
    # squeeze-and-excitation (attention) unit
    input_channels = int(inputs.shape[-1])

    x = GlobalAveragePooling2D()(inputs)
    x = Dense(int(input_channels/4))(x)
    x = Activation(relu6)(x)
    x = Dense(input_channels)(x)
    x = Activation(hard_swish)(x)
    x = Reshape((1, 1, input_channels))(x)
    x = Multiply()([inputs, x])

    return x

def bottleneck(inputs, filters, kernel, up_dim, stride, sq, nl):
    channel_axis = 1 if K.image_data_format() == 'channels_first' else -1

    input_shape = K.int_shape(inputs)

    tchannel = int(up_dim)
    cchannel = int(alpha * filters)

    r = stride == 1 and input_shape[3] == filters
    # 1x1 convolution to expand the number of channels
    x = conv_block(inputs, tchannel, (1, 1), (1, 1), nl)
    # depthwise convolution with the given kernel size
    x = DepthwiseConv2D(kernel, strides=(stride, stride), depth_multiplier=1, padding='same')(x)
    x = BatchNormalization(axis=channel_axis)(x)
    x = return_activation(x, nl)
    # optionally apply the squeeze-and-excitation attention
    if sq:
        x = squeeze(x)
    # 1x1 convolution to project the channels back down
    x = Conv2D(cchannel, (1, 1), strides=(1, 1), padding='same')(x)
    x = BatchNormalization(axis=channel_axis)(x)


    if r:
        x = Add()([x, inputs])

    return x

def MobileNetv3_small(shape = (224,224,3),n_class = 1000):
    inputs = Input(shape)
    # 224,224,3 -> 112,112,16
    x = conv_block(inputs, 16, (3, 3), strides=(2, 2), nl='HS')

    # 112,112,16 -> 56,56,16
    x = bottleneck(x, 16, (3, 3), up_dim=16, stride=2, sq=True, nl='RE')

    # 56,56,16 -> 28,28,24
    x = bottleneck(x, 24, (3, 3), up_dim=72, stride=2, sq=False, nl='RE')
    x = bottleneck(x, 24, (3, 3), up_dim=88, stride=1, sq=False, nl='RE')

    # 28,28,24 -> 14,14,40
    x = bottleneck(x, 40, (5, 5), up_dim=96, stride=2, sq=True, nl='HS')
    x = bottleneck(x, 40, (5, 5), up_dim=240, stride=1, sq=True, nl='HS')
    x = bottleneck(x, 40, (5, 5), up_dim=240, stride=1, sq=True, nl='HS')
    # 14,14,40 -> 14,14,48
    x = bottleneck(x, 48, (5, 5), up_dim=120, stride=1, sq=True, nl='HS')
    x = bottleneck(x, 48, (5, 5), up_dim=144, stride=1, sq=True, nl='HS')

    # 14,14,48 -> 7,7,96
    x = bottleneck(x, 96, (5, 5), up_dim=288, stride=2, sq=True, nl='HS')
    x = bottleneck(x, 96, (5, 5), up_dim=576, stride=1, sq=True, nl='HS')
    x = bottleneck(x, 96, (5, 5), up_dim=576, stride=1, sq=True, nl='HS')

    x = conv_block(x, 576, (1, 1), strides=(1, 1), nl='HS')
    x = GlobalAveragePooling2D()(x)
    x = Reshape((1, 1, 576))(x)

    x = Conv2D(1024, (1, 1), padding='same')(x)
    x = return_activation(x, 'HS')

    x = Conv2D(n_class, (1, 1), padding='same', activation='softmax')(x)
    x = Reshape((n_class,))(x)

    model = Model(inputs, x)

    return model

Author: FanrenCLI
Copyright: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit FanrenCLI when reposting!