[Hands-On] Fine-Grained Image Classification of 200 Bird Species


I'm back!!!!

I. Image Classification

This time it's a hands-on project: fine-grained bird classification. Before getting into fine-grained classification, let's first review plain image classification.

Image classification is the most fundamental task in computer vision, from the entry-level MNIST handwritten-digit recognition and cat-vs-dog binary classification to the later ImageNet challenge. As datasets grew, classification models improved step by step to today's level; on benchmarks like ImageNet, machine classification accuracy has surpassed human performance.

Here I divide image classification into two kinds: single-label classification and multi-label classification.

Multi-label classification better matches how people actually perceive images, since real-world pictures usually contain objects of several categories.

Single-label classification can be further split into three levels. The first is cross-species, semantic-level classification: distinguishing objects of different species, such as the familiar cat-vs-dog task.

The second is instance-level classification, which distinguishes individual objects; the most typical example is face recognition. That leaves the last one, fine-grained classification. So what exactly is fine-grained classification?

II. Fine-Grained Image Classification

Fine-grained image classification sits one level below the cross-species classification described above, and one level above instance-level classification.

Conceptually, it means classifying the subcategories within one broad category.

In plain terms, it addresses situations like seeing a dog in daily life but being unable to tell which breed it is.

As shown in the figure below, we can tell which one is the Alaskan Malamute and which is the Husky: the left is the Husky, the right is the Malamute.

Here, one discriminative part is that the Malamute's nose bridge connects to its black fur; that is what is meant by a "discriminative part".

III. Current Challenges in Fine-Grained Image Classification

Next, let's talk about the challenges fine-grained classification currently faces.

Fine-grained classification today faces three major problems: large intra-class variance, small inter-class variance, and limited datasets.

Intra-class variance is large because of lighting, object pose, viewpoint, occlusion, background clutter and so on. Take the Black-footed Albatross here: under different illumination, backgrounds, and poses, it is hard to tell by eye that the images belong to the same subcategory.

Inter-class variance is small: individuals fall into different subcategories because of subtle differences, such as the color of a bird's wings or the color of its beak.

And datasets are limited: annotation usually requires expert knowledge and a great deal of labeling time.

Because of these challenges, it is hard to get accurate results from existing coarse-grained network models.

IV. Research Status of Fine-Grained Image Classification

So what does current research look like?

Current fine-grained classification works mainly by finding discriminative features. The research methods fall into two camps: strongly supervised learning and weakly supervised learning.

Strongly supervised methods use bounding boxes and part-level annotations to obtain the object's location and size and thereby improve classification accuracy. In other words, the annotations explicitly mark certain discriminative features of the object.

Weakly supervised methods use only image-level class labels, without any extra annotations.

The main idea in weak supervision is to localize the discriminative parts and use their features as an aid to classification.

This matches how humans identify fine-grained objects: first look at the global picture to determine the broad category, then draw on experience to focus attention on a few key parts and make the call. Those parts are exactly the discriminative parts a weakly supervised network tries to find.

Existing strongly supervised methods include the following:

  • Part-based R-CNN uses the R-CNN algorithm to detect local regions. It refines the regions R-CNN proposes using geometric constraints, extracts convolutional features from them, and concatenates the features of the different regions into a stronger representation, which is then used to train an SVM classifier.
  • Pose-normalized CNN runs part detection on each image, crops the detected boxes to obtain image regions at different levels and positions, pose-aligns the cropped patches before feeding them into the CNN, and concatenates the resulting features for SVM classification.
  • Multi-proposal Net obtains image patches with the Edge Box Crop method and introduces an output layer for keypoints and visual features, further strengthening the spatial association between local features and global information.

Among weakly supervised methods there is image filtering, which relies only on image-level class labels to filter out regions unrelated to the object; the most representative is the Two-level attention algorithm. It exploits both object-level and part-level information: the Selective Search algorithm filters out irrelevant background, the surviving object-level patches are fed into a CNN to obtain object-level classification results, a clustering algorithm then separates the features of different parts, and the concatenated region features are finally sent to an SVM classifier for training.

When people recognize objects, they both understand the object's features and recall its category name. Modeled on how the brain attends to salient features while recognizing the category, B-CNN (bilinear CNN) builds two parallel convolutional streams that cooperate to carry out local feature extraction and classification.
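For intuition, B-CNN's bilinear feature is usually written as the outer product of the two streams' local features, pooled over all spatial locations. This is the standard formulation from the bilinear-CNN literature in my own notation, not an equation from the original post:

$$\Phi(I) = \sum_{l \in L} f_A(l, I)^{\top} f_B(l, I)$$

followed by signed square-root and L2 normalization, $y = \mathrm{sign}(x)\sqrt{|x|}$ and $z = y/\lVert y\rVert_2$; these are exactly the sign_sqrt and l2_norm steps that appear in the bilinear VGG16 code near the end of this post.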

That's about it for the background; now for the main event.

V. Hands-On: 200-Class Fine-Grained Bird Classification

1. The CUB200-2011 dataset

As always, let's start by introducing our driving force: the data.

Oops, wrong picture. It should be the one below.

The dataset used for this fine-grained task is CUB200-2011, the Caltech-UCSD Birds dataset (the 2011 extension of the 2010 CUB-200) and the standard benchmark for fine-grained recognition research. It contains 11,788 bird images covering 200 bird subcategories, split into 5,994 training images and 5,794 test images. Each image comes with a class label, a bounding box for the bird, key part locations, and attribute annotations.

The evaluation metric is simply classification accuracy.
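For reference, here is a minimal sketch (my own, not from the original post) of how the official train/test split can be recovered from the plain-text index files that ship with the CUB_200_2011 archive:

import pandas as pd

# Hypothetical loader for the official CUB-200-2011 metadata files.
root = "CUB_200_2011"
images = pd.read_csv(root + "/images.txt", sep=" ", names=["img_id", "path"])
labels = pd.read_csv(root + "/image_class_labels.txt", sep=" ", names=["img_id", "class_id"])
split = pd.read_csv(root + "/train_test_split.txt", sep=" ", names=["img_id", "is_train"])
df = images.merge(labels, on="img_id").merge(split, on="img_id")
train_df = df[df.is_train == 1]  # 5,994 images
test_df = df[df.is_train == 0]   # 5,794 images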

Alright, time to bring on the models!

2. The VGG16 model

First, let's use VGG16 to test the waters.

Before that, let's set up our fine-tuning routine.

# fine-tune model
def fine_tune_model(model, optimizer, batch_size, epochs, freeze_num):
    '''
    description: fine-tune the given pre-trained model and save it in .hdf5 format

    model: the model passed in (VGG16, ResNet50, ...)
    optimizer: optimizer used when fine-tuning all layers (the first stage reuses it as well)
    batch_size: size of each batch, 32/64/128 recommended
    epochs: number of epochs for fine-tuning all layers
    freeze_num: number of layers to freeze in the first stage
    '''
    # optional data augmentation (left commented out in the original run):
    # datagen = ImageDataGenerator(
    #     rescale=1./255,
    #     # shear_range=0.2,
    #     # zoom_range=0.2,
    #     # horizontal_flip=True,
    #     # vertical_flip=True,
    #     # fill_mode="nearest"
    # )
    # datagen.fit(x_train)

    # first stage: train only the (randomly initialized) fully connected head,
    # with the convolutional layers frozen
    for layer in model.layers[:freeze_num]:
        layer.trainable = False
    model.compile(optimizer=optimizer,
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    # x_train/y_train and x_valid/y_valid are globals prepared beforehand
    model.fit(x_train,
              y_train,
              batch_size=batch_size,
              epochs=3,
              shuffle=True,
              verbose=1,
              validation_data=(x_valid, y_valid))
    print('Finish step_1')

    # second stage: fine-tune all layers
    for layer in model.layers[:]:
        layer.trainable = True
    rc = ReduceLROnPlateau(monitor="val_acc",
                           factor=0.2,
                           patience=4,
                           verbose=1,
                           mode='max')
    model_name = model.name + ".hdf5"
    mc = ModelCheckpoint(model_name,
                         monitor="val_acc",
                         save_best_only=True,
                         verbose=1,
                         mode='max')
    el = EarlyStopping(monitor="val_acc",
                       min_delta=0,
                       patience=5,
                       verbose=1,
                       restore_best_weights=True)
    model.compile(optimizer=optimizer,
                  loss='categorical_crossentropy',
                  metrics=["accuracy"])
    history_fit = model.fit(x_train,
                            y_train,
                            batch_size=batch_size,
                            epochs=epochs,
                            shuffle=True,
                            verbose=1,
                            validation_data=(x_valid, y_valid),
                            callbacks=[mc, rc, el])
    print('Finish fine-tune')
    return history_fit
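One caveat: fine_tune_model reads x_train, y_train, x_valid and y_valid as globals, and the cell that builds them is not shown in the post. A rough sketch of what that preparation might look like (the array names, resizing, and split ratio are my assumptions, not the original code):

import numpy as np
from keras.utils import to_categorical
from sklearn.model_selection import train_test_split

# Hypothetical preparation of the global arrays used by fine_tune_model.
# `imgs` is a list/array of RGB images already resized to (img_rows, img_cols),
# `lbls` the matching integer class ids in [0, n_classes).
n_classes = 200
x_all = np.asarray(imgs, dtype="float32")
y_all = to_categorical(np.asarray(lbls), n_classes)
x_train, x_valid, y_train, y_valid = train_test_split(
    x_all, y_all, test_size=0.3, random_state=42)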


# Define a VGG16-based model
def vgg16_model(img_rows, img_cols):
    x = Input(shape=(img_rows, img_cols, 3))
    x = Lambda(imagenet_utils.preprocess_input)(x)
    base_model = VGG16(input_tensor=x, weights="imagenet", include_top=False, pooling='avg')
    x = base_model.output
    x = Dense(1024, activation="relu", name="fc1")(x)
    x = Dropout(0.5)(x)
    predictions = Dense(n_classes, activation="softmax", name="predictions")(x)
    vgg16_model = Model(inputs=base_model.input, outputs=predictions, name="vgg16")
    return vgg16_model
# Build the VGG16 model
img_rows, img_cols = 300, 300
vgg16_model = vgg16_model(img_rows, img_cols)
for i, layer in enumerate(vgg16_model.layers):
    print(i, layer.name)
0 input_3
1 lambda_3
2 block1_conv1
3 block1_conv2
4 block1_pool
5 block2_conv1
6 block2_conv2
7 block2_pool
8 block3_conv1
9 block3_conv2
10 block3_conv3
11 block3_pool
12 block4_conv1
13 block4_conv2
14 block4_conv3
15 block4_pool
16 block5_conv1
17 block5_conv2
18 block5_conv3
19 block5_pool
20 global_average_pooling2d_3
21 fc1
22 dropout_3
23 predictions
optimizer = optimizers.Adam(lr=0.0001)
batch_size = 32
epochs = 30
freeze_num = 21
%time vgg16_history = fine_tune_model(vgg16_model, optimizer, batch_size, epochs, freeze_num)
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/optimizers.py:790: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.

Train on 7013 samples, validate on 3006 samples
Epoch 1/3
7013/7013 [==============================] - 53s 8ms/step - loss: 7.1095 - acc: 0.0211 - val_loss: 4.4823 - val_acc: 0.0915
Epoch 2/3
7013/7013 [==============================] - 46s 7ms/step - loss: 4.4798 - acc: 0.0914 - val_loss: 3.6892 - val_acc: 0.2239
Epoch 3/3
7013/7013 [==============================] - 46s 7ms/step - loss: 3.6436 - acc: 0.1925 - val_loss: 3.0040 - val_acc: 0.3440
Finish step_1
Train on 7013 samples, validate on 3006 samples
Epoch 1/30
7013/7013 [==============================] - 47s 7ms/step - loss: 2.9171 - acc: 0.3087 - val_loss: 2.1821 - val_acc: 0.4667
Epoch 00001: val_loss improved from inf to 2.18212, saving model to vgg16.hdf5
Epoch 2/30
7013/7013 [==============================] - 46s 7ms/step - loss: 1.9944 - acc: 0.4840 - val_loss: 1.8748 - val_acc: 0.5226
Epoch 00002: val_loss improved from 2.18212 to 1.87480, saving model to vgg16.hdf5
Epoch 3/30
7013/7013 [==============================] - 46s 7ms/step - loss: 1.6493 - acc: 0.5551 - val_loss: 1.7540 - val_acc: 0.5492
Epoch 00003: val_loss improved from 1.87480 to 1.75400, saving model to vgg16.hdf5
Epoch 4/30
7013/7013 [==============================] - 46s 7ms/step - loss: 1.4144 - acc: 0.6144 - val_loss: 1.6711 - val_acc: 0.5655
Epoch 00004: val_loss improved from 1.75400 to 1.67106, saving model to vgg16.hdf5
Epoch 5/30
7013/7013 [==============================] - 46s 7ms/step - loss: 1.2055 - acc: 0.6628 - val_loss: 1.6020 - val_acc: 0.5749
Epoch 00005: val_loss improved from 1.67106 to 1.60200, saving model to vgg16.hdf5
...
Epoch 00026: val_loss improved from 1.32242 to 1.32005, saving model to vgg16.hdf5
Epoch 27/30
7013/7013 [==============================] - 46s 7ms/step - loss: 0.1979 - acc: 0.9511 - val_loss: 1.3209 - val_acc: 0.6517
Epoch 00027: val_loss did not improve from 1.32005
Epoch 28/30
7013/7013 [==============================] - 46s 7ms/step - loss: 0.1996 - acc: 0.9528 - val_loss: 1.3206 - val_acc: 0.6514
Epoch 00028: val_loss did not improve from 1.32005
Epoch 29/30
7013/7013 [==============================] - 46s 7ms/step - loss: 0.1956 - acc: 0.9555 - val_loss: 1.3216 - val_acc: 0.6517
Epoch 00029: val_loss did not improve from 1.32005
Epoch 00029: ReduceLROnPlateau reducing learning rate to 3.999999898951501e-06.
Epoch 30/30
7013/7013 [==============================] - 46s 7ms/step - loss: 0.1884 - acc: 0.9558 - val_loss: 1.3194 - val_acc: 0.6514
Epoch 00030: val_loss improved from 1.32005 to 1.31935, saving model to vgg16.hdf5
Finish fine-tune
CPU times: user 10min, sys: 3min 58s, total: 13min 58s
Wall time: 25min 37s
history_plot(vgg16_history)
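history_plot is called after every experiment but its definition is not included in the post; below is a minimal matplotlib sketch that would produce accuracy/loss curves like the ones shown (my reconstruction of the helper, not the author's exact code):

import matplotlib.pyplot as plt

def history_plot(history):
    # Plot training/validation accuracy and loss from a Keras History object.
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
    ax1.plot(history.history["acc"], label="train acc")
    ax1.plot(history.history["val_acc"], label="val acc")
    ax1.set_xlabel("epoch")
    ax1.legend()
    ax2.plot(history.history["loss"], label="train loss")
    ax2.plot(history.history["val_loss"], label="val loss")
    ax2.set_xlabel("epoch")
    ax2.legend()
    plt.show()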

[Figure: VGG16 training/validation accuracy and loss curves (output_38_0.png)]

After all of the steps above we can see that VGG16's classification performance is not great: a best validation accuracy of about 0.65, barely a passing grade.

So let's bring on contestant number two: EfficientNet.

3. EfficientNetB4

Knock knock, here it comes, here it comes, riding in on a seven-colored cloud!!!

Alright, enough talk; straight to the code to build the EfficientNet architecture.

# Define an EfficientNet-based model
def efficient_model(img_rows, img_cols):
    K.clear_session()
    x = Input(shape=(img_rows, img_cols, 3))
    # NOTE: imagenet_utils.preprocess_input applies Caffe-style preprocessing;
    # the EfficientNet family normally uses its own (torch-style) preprocessing.
    x = Lambda(imagenet_utils.preprocess_input)(x)
    base_model = EfficientNetB4(input_tensor=x, weights="imagenet", include_top=False, pooling="avg")
    x = base_model.output
    x = Dense(1024, activation="relu", name="fc1")(x)
    x = Dropout(0.5)(x)
    predictions = Dense(n_classes, activation="softmax", name="predictions")(x)
    eB_model = Model(inputs=base_model.input, outputs=predictions, name="eB4")
    return eB_model
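EfficientNetB3/EfficientNetB4 were not part of stock Keras at the time this was written, so they most likely come from a third-party package. A hedged guess at the missing imports (assuming the qubvel `efficientnet` PyPI package; the exact source is not shown in the post):

# pip install efficientnet
from efficientnet.keras import EfficientNetB3, EfficientNetB4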
# Build the EfficientNet model
img_rows, img_cols = 224, 224
eB_model = efficient_model(img_rows, img_cols)
optimizer = optimizers.Adam(lr=0.0001)
batch_size = 32
epochs = 30
freeze_num = 469
eB_model_history = fine_tune_model(eB_model, optimizer, batch_size, epochs, freeze_num)
Train on 8251 samples, validate on 1768 samples
Epoch 1/3
8251/8251 [==============================] - 49s 6ms/step - loss: 9.3405 - acc: 0.0053 - val_loss: 5.5664 - val_acc: 0.0051
Epoch 2/3
8251/8251 [==============================] - 38s 5ms/step - loss: 6.8968 - acc: 0.0052 - val_loss: 5.3289 - val_acc: 0.0040
Epoch 3/3
8251/8251 [==============================] - 39s 5ms/step - loss: 5.8723 - acc: 0.0061 - val_loss: 5.3021 - val_acc: 0.0040
Finish step_1
Train on 8251 samples, validate on 1768 samples
Epoch 1/30
8251/8251 [==============================] - 261s 32ms/step - loss: 4.4794 - acc: 0.0980 - val_loss: 2.7448 - val_acc: 0.3399
Epoch 00001: val_loss improved from inf to 2.74482, saving model to eB4.hdf5
Epoch 2/30
8251/8251 [==============================] - 155s 19ms/step - loss: 2.2635 - acc: 0.4157 - val_loss: 1.4371 - val_acc: 0.5973
Epoch 00002: val_loss improved from 2.74482 to 1.43707, saving model to eB4.hdf5
Epoch 3/30
8251/8251 [==============================] - 155s 19ms/step - loss: 1.3465 - acc: 0.6244 - val_loss: 1.1637 - val_acc: 0.6719
Epoch 00003: val_loss improved from 1.43707 to 1.16373, saving model to eB4.hdf5
Epoch 4/30
8251/8251 [==============================] - 154s 19ms/step - loss: 0.8824 - acc: 0.7488 - val_loss: 0.9904 - val_acc: 0.7110
...
Epoch 00016: val_loss did not improve from 0.89365
Epoch 17/30
8251/8251 [==============================] - 154s 19ms/step - loss: 0.0718 - acc: 0.9867 - val_loss: 0.8993 - val_acc: 0.7749
Epoch 00017: val_loss did not improve from 0.89365
Restoring model weights from the end of the best epoch
Epoch 00017: early stopping
Finish fine-tune
history_plot(eB_model_history)

[Figure: EfficientNetB4 training/validation accuracy and loss curves (output_49_0.png)]

Very nice! EfficientNet is a Google creation, and it shows. Now that EfficientNet works this well, you might be tempted to stop reading here and rush off to try it yourself.

Not so fast; a few more small experiments follow. The first adds an attention mechanism to EfficientNet. For background on attention, see my earlier blog posts, written back before I let my personality loose, so they are quite proper. And of course, so is this one!!!

4. efficientnet-with-attention

# Define an EfficientNet architecture with an attention module, i.e. efficientnet-with-attention
def efficient_attention_model(img_rows, img_cols):
    K.clear_session()
    in_lay = Input(shape=(img_rows, img_cols, 3))
    base_model = EfficientNetB3(input_shape=(img_rows, img_cols, 3), weights="imagenet", include_top=False)
    pt_depth = base_model.get_output_shape_at(0)[-1]
    pt_features = base_model(in_lay)
    bn_features = BatchNormalization()(pt_features)
    # here we do an attention mechanism to turn pixels in the GAP on and off
    atten_layer = Conv2D(64, kernel_size=(1, 1), padding="same", activation="relu")(Dropout(0.5)(bn_features))
    atten_layer = Conv2D(16, kernel_size=(1, 1), padding="same", activation="relu")(atten_layer)
    atten_layer = Conv2D(8, kernel_size=(1, 1), padding="same", activation="relu")(atten_layer)
    atten_layer = Conv2D(1, kernel_size=(1, 1), padding="valid", activation="sigmoid")(atten_layer)  # H,W,1
    # fan it out to all of the channels
    up_c2_w = np.ones((1, 1, 1, pt_depth))  # 1,1,1,C
    up_c2 = Conv2D(pt_depth, kernel_size=(1, 1), padding="same",
                   activation="linear", use_bias=False, weights=[up_c2_w])
    up_c2.trainable = False
    atten_layer = up_c2(atten_layer)  # H,W,C
    mask_features = multiply([atten_layer, bn_features])  # H,W,C
    gap_features = GlobalAveragePooling2D()(mask_features)  # C
    # gap_mask = GlobalAveragePooling2D()(atten_layer)
    # # to account for missing values from the attention model
    # gap = Lambda(lambda x: x[0] / x[1], name="RescaleGAP")([gap_features, gap_mask])
    gap_dr = Dropout(0.25)(gap_features)
    dr_steps = Dropout(0.25)(Dense(1000, activation="relu")(gap_dr))
    out_layer = Dense(200, activation="softmax")(dr_steps)
    eb_atten_model = Model(inputs=[in_lay], outputs=[out_layer])
    return eb_atten_model
img_rows, img_cols = 224, 224
eB_atten_model = efficient_attention_model(img_rows, img_cols)
eB_atten_model.save("eb_atten_model.h5")
for i, layer in enumerate(eB_atten_model.layers):
    print(i, layer.name)
0 input_1
1 efficientnet-b3
2 batch_normalization_1
3 dropout_1
4 conv2d_1
5 conv2d_2
6 conv2d_3
7 conv2d_4
8 conv2d_5
9 multiply_1
10 global_average_pooling2d_1
11 dropout_2
12 dense_1
13 dropout_3
14 dense_2
optimizer = optimizers.Adam(lr=0.0001)
batch_size = 32
epochs = 30
freeze_num = 12
eB_atten_model_history = fine_tune_model(eB_atten_model, optimizer, batch_size, epochs, freeze_num)
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/optimizers.py:793: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.
Train on 8251 samples, validate on 1768 samples
Epoch 1/3
8251/8251 [==============================] - 39s 5ms/step - loss: 5.2083 - acc: 0.0221 - val_loss: 16.0324 - val_acc: 0.0040
Epoch 2/3
8251/8251 [==============================] - 28s 3ms/step - loss: 4.7719 - acc: 0.1130 - val_loss: 16.0147 - val_acc: 0.0057
Epoch 3/3
8251/8251 [==============================] - 28s 3ms/step - loss: 4.3135 - acc: 0.2112 - val_loss: 16.0056 - val_acc: 0.0062
Finish step_1
Train on 8251 samples, validate on 1768 samples
Epoch 1/30
8251/8251 [==============================] - 168s 20ms/step - loss: 2.1612 - acc: 0.4549 - val_loss: 1.1888 - val_acc: 0.6725
Epoch 00001: val_loss improved from inf to 1.18880, saving model to model_1.hdf5
Epoch 2/30
8251/8251 [==============================] - 121s 15ms/step - loss: 0.9003 - acc: 0.7442 - val_loss: 0.9400 - val_acc: 0.7330
Epoch 00002: val_loss improved from 1.18880 to 0.94002, saving model to model_1.hdf5
Epoch 3/30
8251/8251 [==============================] - 121s 15ms/step - loss: 0.5455 - acc: 0.8467 - val_loss: 0.8569 - val_acc: 0.7574
...
Epoch 00013: val_loss did not improve from 0.78748
Epoch 14/30
8251/8251 [==============================] - 121s 15ms/step - loss: 0.0417 - acc: 0.9924 - val_loss: 0.7958 - val_acc: 0.7924
Epoch 00014: val_loss did not improve from 0.78748
Epoch 00014: ReduceLROnPlateau reducing learning rate to 3.999999898951501e-06.
Epoch 15/30
8251/8251 [==============================] - 121s 15ms/step - loss: 0.0370 - acc: 0.9936 - val_loss: 0.7938 - val_acc: 0.7941
Epoch 00015: val_loss did not improve from 0.78748
Epoch 16/30
8251/8251 [==============================] - 121s 15ms/step - loss: 0.0379 - acc: 0.9933 - val_loss: 0.7932 - val_acc: 0.7952
Epoch 00016: val_loss did not improve from 0.78748
Restoring model weights from the end of the best epoch
Epoch 00016: early stopping
Finish fine-tune
history_plot(eB_atten_model_history)

[Figure: efficientnet-with-attention training/validation accuracy and loss curves (output_59_0.png)]

The result does improve by a small margin. Next comes yet another way of writing attention, this time using SENet and CBAM blocks. If you don't know them, same routine as before: see my blog post on the history of convolutional neural networks.

5. EfficientNetB3 with attention v2

from keras.layers import GlobalAveragePooling2D, GlobalMaxPooling2D, Reshape, Dense, multiply, Permute, Concatenate, Conv2D, Add, Activation, Lambda
from keras import backend as K
from keras.activations import sigmoid


def attach_attention_module(net, attention_module):
    if attention_module == 'se_block':  # SE_block
        net = se_block(net)
    elif attention_module == 'cbam_block':  # CBAM_block
        net = cbam_block(net)
    else:
        raise Exception("'{}' is not supported attention module!".format(attention_module))
    return net


def se_block(input_feature, ratio=8):
    """Contains the implementation of Squeeze-and-Excitation (SE) block.
    As described in https://arxiv.org/abs/1709.01507.
    """
    channel_axis = 1 if K.image_data_format() == "channels_first" else -1
    channel = input_feature._keras_shape[channel_axis]

    se_feature = GlobalAveragePooling2D()(input_feature)
    se_feature = Reshape((1, 1, channel))(se_feature)
    assert se_feature._keras_shape[1:] == (1, 1, channel)
    se_feature = Dense(channel // ratio,
                       activation='relu',
                       kernel_initializer='he_normal',
                       use_bias=True,
                       bias_initializer='zeros')(se_feature)
    assert se_feature._keras_shape[1:] == (1, 1, channel // ratio)
    se_feature = Dense(channel,
                       activation='sigmoid',
                       kernel_initializer='he_normal',
                       use_bias=True,
                       bias_initializer='zeros')(se_feature)
    assert se_feature._keras_shape[1:] == (1, 1, channel)
    if K.image_data_format() == 'channels_first':
        se_feature = Permute((3, 1, 2))(se_feature)

    se_feature = multiply([input_feature, se_feature])
    return se_feature


def cbam_block(cbam_feature, ratio=8):
    """Contains the implementation of Convolutional Block Attention Module (CBAM) block.
    As described in https://arxiv.org/abs/1807.06521.
    """
    cbam_feature = channel_attention(cbam_feature, ratio)
    cbam_feature = spatial_attention(cbam_feature)
    return cbam_feature


def channel_attention(input_feature, ratio=8):
    channel_axis = 1 if K.image_data_format() == "channels_first" else -1
    channel = input_feature._keras_shape[channel_axis]

    shared_layer_one = Dense(channel // ratio,
                             activation='relu',
                             kernel_initializer='he_normal',
                             use_bias=True,
                             bias_initializer='zeros')
    shared_layer_two = Dense(channel,
                             kernel_initializer='he_normal',
                             use_bias=True,
                             bias_initializer='zeros')

    avg_pool = GlobalAveragePooling2D()(input_feature)
    avg_pool = Reshape((1, 1, channel))(avg_pool)
    assert avg_pool._keras_shape[1:] == (1, 1, channel)
    avg_pool = shared_layer_one(avg_pool)
    assert avg_pool._keras_shape[1:] == (1, 1, channel // ratio)
    avg_pool = shared_layer_two(avg_pool)
    assert avg_pool._keras_shape[1:] == (1, 1, channel)

    max_pool = GlobalMaxPooling2D()(input_feature)
    max_pool = Reshape((1, 1, channel))(max_pool)
    assert max_pool._keras_shape[1:] == (1, 1, channel)
    max_pool = shared_layer_one(max_pool)
    assert max_pool._keras_shape[1:] == (1, 1, channel // ratio)
    max_pool = shared_layer_two(max_pool)
    assert max_pool._keras_shape[1:] == (1, 1, channel)

    cbam_feature = Add()([avg_pool, max_pool])
    cbam_feature = Activation('sigmoid')(cbam_feature)

    if K.image_data_format() == "channels_first":
        cbam_feature = Permute((3, 1, 2))(cbam_feature)

    return multiply([input_feature, cbam_feature])


def spatial_attention(input_feature):
    kernel_size = 7

    if K.image_data_format() == "channels_first":
        channel = input_feature._keras_shape[1]
        cbam_feature = Permute((2, 3, 1))(input_feature)
    else:
        channel = input_feature._keras_shape[-1]
        cbam_feature = input_feature

    avg_pool = Lambda(lambda x: K.mean(x, axis=3, keepdims=True))(cbam_feature)
    assert avg_pool._keras_shape[-1] == 1
    max_pool = Lambda(lambda x: K.max(x, axis=3, keepdims=True))(cbam_feature)
    assert max_pool._keras_shape[-1] == 1
    concat = Concatenate(axis=3)([avg_pool, max_pool])
    assert concat._keras_shape[-1] == 2
    cbam_feature = Conv2D(filters=1,
                          kernel_size=kernel_size,
                          strides=1,
                          padding='same',
                          activation='sigmoid',
                          kernel_initializer='he_normal',
                          use_bias=False)(concat)
    assert cbam_feature._keras_shape[-1] == 1

    if K.image_data_format() == "channels_first":
        cbam_feature = Permute((3, 1, 2))(cbam_feature)

    return multiply([input_feature, cbam_feature])
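A quick smoke test (my addition, not in the original notebook) showing that attach_attention_module wraps any 4-D feature map without changing its shape:

from keras.layers import Input
from keras.models import Model

# Hypothetical check: an SE block applied to a dummy 8x8x64 feature map.
feat_in = Input(shape=(8, 8, 64))
feat_out = attach_attention_module(feat_in, 'se_block')  # output shape stays (8, 8, 64)
Model(feat_in, feat_out).summary()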
# Define an EfficientNet model with an SE attention block
def efficient__atten2_model(img_rows, img_cols):
    K.clear_session()
    in_lay = Input(shape=(img_rows, img_cols, 3))
    base_model = EfficientNetB3(input_shape=(img_rows, img_cols, 3), weights="imagenet", include_top=False)
    pt_features = base_model(in_lay)
    bn_features = BatchNormalization()(pt_features)
    atten_features = attach_attention_module(bn_features, "se_block")
    gap_features = GlobalAveragePooling2D()(atten_features)
    gap_dr = Dropout(0.25)(gap_features)
    dr_steps = Dropout(0.25)(Dense(1000, activation="relu")(gap_dr))
    out_layer = Dense(n_classes, activation="softmax")(dr_steps)
    eb_atten_model = Model(inputs=[in_lay], outputs=[out_layer])
    return eb_atten_model
img_rows, img_cols = 224, 224
eB_atten2_model = efficient__atten2_model(img_rows, img_cols)
optimizer = optimizers.Adam(lr=0.0001)
batch_size = 32
epochs = 30
freeze_num = 19
eB_atten2_model_history = fine_tune_model(eB_atten2_model, optimizer, batch_size, epochs, freeze_num)
Train on 8251 samples, validate on 1768 samples
Epoch 1/3
8251/8251 [==============================] - 33s 4ms/step - loss: 5.3202 - acc: 0.0061 - val_loss: 16.0269 - val_acc: 0.0057
Epoch 2/3
8251/8251 [==============================] - 26s 3ms/step - loss: 5.3261 - acc: 0.0051 - val_loss: 16.0269 - val_acc: 0.0057
Epoch 3/3
8251/8251 [==============================] - 26s 3ms/step - loss: 5.3248 - acc: 0.0048 - val_loss: 16.0269 - val_acc: 0.0057
Finish step_1
Train on 8251 samples, validate on 1768 samples
Epoch 1/30
8251/8251 [==============================] - 153s 19ms/step - loss: 3.9559 - acc: 0.1742 - val_loss: 2.1066 - val_acc: 0.4712
Epoch 00001: val_loss improved from inf to 2.10657, saving model to model_1.hdf5
Epoch 2/30
8251/8251 [==============================] - 119s 14ms/step - loss: 1.6183 - acc: 0.5708 - val_loss: 1.1768 - val_acc: 0.6618
Epoch 00002: val_loss improved from 2.10657 to 1.17679, saving model to model_1.hdf5
Epoch 3/30
8251/8251 [==============================] - 119s 14ms/step - loss: 0.9172 - acc: 0.7374 - val_loss: 0.9507 - val_acc: 0.7189
Epoch 00003: val_loss improved from 1.17679 to 0.95071, saving model to model_1.hdf5
Epoch 4/30
8251/8251 [==============================] - 119s 14ms/step - loss: 0.5897 - acc: 0.8317 - val_loss: 0.8628 - val_acc: 0.7562
Epoch 00004: val_loss improved from 0.95071 to 0.86283, saving model to model_1.hdf5
Epoch 5/30
8251/8251 [==============================] - 119s 14ms/step - loss: 0.3838 - acc: 0.8956 - val_loss: 0.8359 - val_acc: 0.7636
Epoch 00005: val_loss improved from 0.86283 to 0.83592, saving model to model_1.hdf5
Epoch 6/30
8251/8251 [==============================] - 119s 14ms/step - loss: 0.2797 - acc: 0.9234 - val_loss: 0.8280 - val_acc: 0.7647
Epoch 00006: val_loss improved from 0.83592 to 0.82797, saving model to model_1.hdf5
Epoch 7/30
8251/8251 [==============================] - 119s 14ms/step - loss: 0.1997 - acc: 0.9495 - val_loss: 0.8620 - val_acc: 0.7602
Epoch 00007: val_loss did not improve from 0.82797
Epoch 8/30
8251/8251 [==============================] - 119s 14ms/step - loss: 0.1408 - acc: 0.9667 - val_loss: 0.8602 - val_acc: 0.7800
Epoch 00008: val_loss did not improve from 0.82797
Epoch 9/30
8251/8251 [==============================] - 119s 14ms/step - loss: 0.1103 - acc: 0.9739 - val_loss: 0.9202 - val_acc: 0.7545
Epoch 00009: val_loss did not improve from 0.82797
Epoch 00009: ReduceLROnPlateau reducing learning rate to 1.9999999494757503e-05.
Epoch 10/30
8251/8251 [==============================] - 119s 14ms/step - loss: 0.0803 - acc: 0.9824 - val_loss: 0.8677 - val_acc: 0.7709
Epoch 00010: val_loss did not improve from 0.82797
Epoch 11/30
8251/8251 [==============================] - 119s 14ms/step - loss: 0.0772 - acc: 0.9833 - val_loss: 0.8560 - val_acc: 0.7771
Epoch 00011: val_loss did not improve from 0.82797
Restoring model weights from the end of the best epoch
Epoch 00011: early stopping
Finish fine-tune
history_plot(eB_atten2_model_history)

[Figure: EfficientNetB3 with SE attention training/validation accuracy and loss curves (output_67_0.png)]

I honestly can't tell you why this attention variant edges out the previous one by a little; that's deep-learning alchemy for you.

6. Bilinear EfficientNet

Next I tried a bilinear network architecture. I even drew a diagram for this one!!!

The overall pipeline of the model is as follows:

  • Feed in the image and apply data-augmentation operations;
  • Use the EfficientNet architecture (proposed by Google in 2019) as the backbone to extract feature maps;
  • Combine it with an attention module to extract attention maps;
  • Multiply the attention maps with the feature maps element by element, then attach a fully connected layer for classification to obtain the final result.

Here is the diagram of the attention mechanism as well!

# Define a bilinear EfficientNet Attention model
def blinear_efficient__atten_model(img_rows, img_cols):
    K.clear_session()
    in_lay = Input(shape=(img_rows, img_cols, 3))
    base_model = EfficientNetB3(input_shape=(img_rows, img_cols, 3), weights="imagenet", include_top=False)
    pt_depth = base_model.get_output_shape_at(0)[-1]
    cnn_features_a = base_model(in_lay)
    cnn_bn_features_a = BatchNormalization()(cnn_features_a)

    # attention mechanism
    # here we do an attention mechanism to turn pixels in the GAP on and off
    atten_layer = Conv2D(64, kernel_size=(1, 1), padding="same", activation="relu")(Dropout(0.5)(cnn_bn_features_a))
    atten_layer = Conv2D(16, kernel_size=(1, 1), padding="same", activation="relu")(atten_layer)
    atten_layer = Conv2D(8, kernel_size=(1, 1), padding="same", activation="relu")(atten_layer)
    atten_layer = Conv2D(1, kernel_size=(1, 1), padding="valid", activation="sigmoid")(atten_layer)  # H,W,1
    # fan it out to all of the channels
    up_c2_w = np.ones((1, 1, 1, pt_depth))  # 1,1,1,C
    up_c2 = Conv2D(pt_depth, kernel_size=(1, 1), padding="same",
                   activation="linear", use_bias=False, weights=[up_c2_w])
    up_c2.trainable = True
    atten_layer = up_c2(atten_layer)  # H,W,C
    cnn_atten_out_a = multiply([atten_layer, cnn_bn_features_a])  # H,W,C
    # "bilinear": both branches share the same attended features here, so the
    # product below amounts to an element-wise square of the feature map
    cnn_atten_out_b = cnn_atten_out_a
    cnn_out_dot = multiply([cnn_atten_out_a, cnn_atten_out_b])
    gap_features = GlobalAveragePooling2D()(cnn_out_dot)
    gap_dr = Dropout(0.25)(gap_features)
    dr_steps = Dropout(0.25)(Dense(1000, activation="relu")(gap_dr))
    out_layer = Dense(200, activation="softmax")(dr_steps)
    b_eff_atten_model = Model(inputs=[in_lay], outputs=[out_layer], name="blinear_efficient_atten")
    return b_eff_atten_model
# Build the bilinear EfficientNet Attention model
img_rows, img_cols = 256, 256
befficient_model = blinear_efficient__atten_model(img_rows, img_cols)
befficient_model.save("befficient_model.h5")
optimizer = optimizers.Adam(lr=0.0001)
batch_size = 32
epochs = 30
freeze_num = 19
befficient_model_history = fine_tune_model(befficient_model, optimizer, batch_size, epochs, freeze_num)
Train on 8251 samples, validate on 1768 samples
Epoch 1/3
8251/8251 [==============================] - 38s 5ms/step - loss: 5.3903 - acc: 0.0052 - val_loss: 14.1897 - val_acc: 0.0040
Epoch 2/3
8251/8251 [==============================] - 33s 4ms/step - loss: 5.3926 - acc: 0.0052 - val_loss: 14.1897 - val_acc: 0.0040
Epoch 3/3
8251/8251 [==============================] - 33s 4ms/step - loss: 5.3948 - acc: 0.0068 - val_loss: 14.1897 - val_acc: 0.0040
Finish step_1
Train on 8251 samples, validate on 1768 samples
Epoch 1/30
8251/8251 [==============================] - 193s 23ms/step - loss: 4.7127 - acc: 0.0749 - val_loss: 2.9079 - val_acc: 0.3060
Epoch 00001: val_acc improved from -inf to 0.30600, saving model to blinear_efficient_atten.hdf5
Epoch 2/30
8251/8251 [==============================] - 148s 18ms/step - loss: 2.1653 - acc: 0.4462 - val_loss: 1.3817 - val_acc: 0.6160
Epoch 00002: val_acc improved from 0.30600 to 0.61595, saving model to blinear_efficient_atten.hdf5
Epoch 3/30
8251/8251 [==============================] - 149s 18ms/step - loss: 1.1834 - acc: 0.6676 - val_loss: 1.0714 - val_acc: 0.7002
Epoch 00003: val_acc improved from 0.61595 to 0.70023, saving model to blinear_efficient_atten.hdf5
Epoch 4/30
8251/8251 [==============================] - 149s 18ms/step - loss: 0.8070 - acc: 0.7666 - val_loss: 0.9743 - val_acc: 0.7342
Epoch 00004: val_acc improved from 0.70023 to 0.73416, saving model to blinear_efficient_atten.hdf5
Epoch 5/30
...
Epoch 00007: val_acc improved from 0.74830 to 0.75735, saving model to blinear_efficient_atten.hdf5
...
Epoch 00010: val_acc did not improve from 0.76867
Epoch 11/30
8251/8251 [==============================] - 149s 18ms/step - loss: 0.1421 - acc: 0.9547 - val_loss: 1.1319 - val_acc: 0.7692
Epoch 00011: val_acc improved from 0.76867 to 0.76923, saving model to blinear_efficient_atten.hdf5
Epoch 12/30
8251/8251 [==============================] - 149s 18ms/step - loss: 0.1232 - acc: 0.9622 - val_loss: 1.0809 - val_acc: 0.7704
...
Epoch 00018: val_acc improved from 0.77489 to 0.78224, saving model to blinear_efficient_atten.hdf5
Epoch 19/30
8251/8251 [==============================] - 149s 18ms/step - loss: 0.0880 - acc: 0.9714 - val_loss: 1.2171 - val_acc: 0.7721
...
Epoch 00022: ReduceLROnPlateau reducing learning rate to 1.9999999494757503e-05.
Epoch 23/30
8251/8251 [==============================] - 148s 18ms/step - loss: 0.0465 - acc: 0.9859 - val_loss: 1.1591 - val_acc: 0.7930
Epoch 00023: val_acc improved from 0.78224 to 0.79299, saving model to blinear_efficient_atten.hdf5
Epoch 24/30
8251/8251 [==============================] - 148s 18ms/step - loss: 0.0360 - acc: 0.9893 - val_loss: 1.1312 - val_acc: 0.7969
Epoch 00024: val_acc improved from 0.79299 to 0.79695, saving model to blinear_efficient_atten.hdf5
Epoch 25/30
8251/8251 [==============================] - 148s 18ms/step - loss: 0.0275 - acc: 0.9920 - val_loss: 1.1477 - val_acc: 0.8015
...
Epoch 00028: val_acc did not improve from 0.80147
Epoch 29/30
8251/8251 [==============================] - 148s 18ms/step - loss: 0.0248 - acc: 0.9922 - val_loss: 1.1467 - val_acc: 0.8020
Epoch 00029: val_acc improved from 0.80147 to 0.80204, saving model to blinear_efficient_atten.hdf5
Epoch 30/30
8251/8251 [==============================] - 148s 18ms/step - loss: 0.0232 - acc: 0.9919 - val_loss: 1.1427 - val_acc: 0.8003
Epoch 00030: val_acc did not improve from 0.80204
Finish fine-tune
history_plot(befficient_model_history)

[Figure: bilinear EfficientNet Attention training/validation accuracy and loss curves (output_78_0.png)]

As the curves show, the bilinear structure pushes accuracy up a bit further, to a best validation accuracy of about 0.802.

We have finally reached the end of the story. As a last experiment, let's try a bilinear VGG16.

7. Bilinear VGG16

# Define a bilinear VGG16 model
from keras import backend as K

def batch_dot(cnn_ab):
    # X^T X over the spatial axis: (B, N, C) x (B, N, C) -> (B, C, C)
    return K.batch_dot(cnn_ab[0], cnn_ab[1], axes=[1, 1])

def sign_sqrt(x):
    # signed square-root normalization
    return K.sign(x) * K.sqrt(K.abs(x) + 1e-10)

def l2_norm(x):
    return K.l2_normalize(x, axis=-1)

def bilinear_vgg16(img_rows, img_cols):
    input_tensor = Input(shape=(img_rows, img_cols, 3))
    input_tensor = Lambda(imagenet_utils.preprocess_input)(input_tensor)
    model_vgg16 = VGG16(include_top=False, weights="imagenet",
                        input_tensor=input_tensor, pooling="avg")

    # take the block5_pool feature map (the layer before the global pooling)
    cnn_out_a = model_vgg16.layers[-2].output
    cnn_out_shape = model_vgg16.layers[-2].output_shape
    cnn_out_a = Reshape([cnn_out_shape[1] * cnn_out_shape[2],
                         cnn_out_shape[-1]])(cnn_out_a)
    cnn_out_b = cnn_out_a
    cnn_out_dot = Lambda(batch_dot)([cnn_out_a, cnn_out_b])
    cnn_out_dot = Reshape([cnn_out_shape[-1] * cnn_out_shape[-1]])(cnn_out_dot)
    sign_sqrt_out = Lambda(sign_sqrt)(cnn_out_dot)
    l2_norm_out = Lambda(l2_norm)(sign_sqrt_out)

    fc1 = Dense(1024, activation="relu", name="fc1")(l2_norm_out)
    dropout = Dropout(0.5)(fc1)
    output = Dense(n_classes, activation="softmax", name="output")(dropout)
    bvgg16_model = Model(inputs=model_vgg16.input, outputs=output, name="bvgg16")
    return bvgg16_model
# Build the bilinear VGG16 model
img_rows, img_cols = 300, 300
bvgg16_model = bilinear_vgg16(img_rows, img_cols)
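For intuition about the shapes (my own check, not from the post): with 300x300 inputs, block5_pool is 9x9x512, so the reshape gives a matrix X with N = 81 spatial positions and C = 512 channels, and batch_dot with axes=[1, 1] computes the CxC Gram matrix X^T X, i.e. a 262,144-dimensional bilinear feature:

import numpy as np

# Hypothetical shape check for the bilinear pooling step.
X = np.random.randn(2, 81, 512).astype("float32")  # (batch, positions, channels)
gram = np.einsum("bnc,bnd->bcd", X, X)              # equivalent to K.batch_dot(X, X, axes=[1, 1])
print(gram.shape)                                   # (2, 512, 512) -> flattened to 512*512 = 262144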
for i, layer in enumerate(bvgg16_model.layers):
    print(i, layer.name)
0 input_1
1 lambda_1
2 block1_conv1
3 block1_conv2
4 block1_pool
5 block2_conv1
6 block2_conv2
7 block2_pool
8 block3_conv1
9 block3_conv2
10 block3_conv3
11 block3_pool
12 block4_conv1
13 block4_conv2
14 block4_conv3
15 block4_pool
16 block5_conv1
17 block5_conv2
18 block5_conv3
19 block5_pool
20 reshape_1
21 lambda_2
22 reshape_2
23 lambda_3
24 lambda_4
25 fc1
26 dropout_1
27 output
optimizer = optimizers.Adam(lr=0.0001)
batch_size = 32
epochs = 100
freeze_num = 25
bvgg16_history = fine_tune_model(bvgg16_model, optimizer, batch_size, epochs, freeze_num)
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/optimizers.py:793: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.
Train on 8251 samples, validate on 1768 samples
Epoch 1/3
8251/8251 [==============================] - 80s 10ms/step - loss: 5.1197 - acc: 0.0572 - val_loss: 4.8534 - val_acc: 0.2002
Epoch 2/3
8251/8251 [==============================] - 71s 9ms/step - loss: 4.4758 - acc: 0.1863 - val_loss: 4.1177 - val_acc: 0.3569
Epoch 3/3
8251/8251 [==============================] - 71s 9ms/step - loss: 3.7386 - acc: 0.2743 - val_loss: 3.4439 - val_acc: 0.4378
Finish step_1
Train on 8251 samples, validate on 1768 samples
Epoch 1/100
8251/8251 [==============================] - 76s 9ms/step - loss: 2.9186 - acc: 0.3475 - val_loss: 2.5064 - val_acc: 0.5334
Epoch 00001: val_loss improved from inf to 2.50638, saving model to bvgg16.hdf5
Epoch 2/100
8251/8251 [==============================] - 70s 9ms/step - loss: 2.3073 - acc: 0.4696 - val_loss: 2.1717 - val_acc: 0.5888
Epoch 00002: val_loss improved from 2.50638 to 2.17170, saving model to bvgg16.hdf5
Epoch 3/100
8251/8251 [==============================] - 70s 9ms/step - loss: 2.0086 - acc: 0.5355 - val_loss: 1.9604 - val_acc: 0.6222
...
Epoch 00067: val_loss did not improve from 0.89483
Epoch 68/100
8251/8251 [==============================] - 71s 9ms/step - loss: 0.0539 - acc: 0.9971 - val_loss: 0.8984 - val_acc: 0.7590
Epoch 00068: val_loss did not improve from 0.89483
Epoch 00068: ReduceLROnPlateau reducing learning rate to 3.999999898951501e-06.
Epoch 69/100
8251/8251 [==============================] - 71s 9ms/step - loss: 0.0536 - acc: 0.9972 - val_loss: 0.8972 - val_acc: 0.7602
Epoch 00069: val_loss did not improve from 0.89483
Epoch 70/100
8251/8251 [==============================] - 71s 9ms/step - loss: 0.0517 - acc: 0.9973 - val_loss: 0.8968 - val_acc: 0.7630
Epoch 00070: val_loss did not improve from 0.89483
Restoring model weights from the end of the best epoch
Epoch 00070: early stopping
Finish fine-tune
history_plot(bvgg16_history)

[Figure: bilinear VGG16 training/validation accuracy and loss curves (output_87_0.png)]

And with that, we are done. As for the result, judge from the curves: the bilinear VGG16 (best validation accuracy around 0.76) clearly does not match EfficientNet.

Of course, you can keep tuning the input resolution, the learning rate, the batch_size and so on. Happy alchemy!

Originally published at: https://codingchaozhang.blog.csdn.net/article/details/104354525
