Pruning Deep Models (白红宇's personal blog)

Pruning Deep Models

Published: 2021-05-14 15:19:54



Abstract:

The deployment of deep convolutional neural networks (CNNs) in many real world applications is largely hindered by their high computational cost. In this paper, we propose a novel learning scheme for CNNs to simultaneously 1) reduce the model size; 2) decrease the run-time memory footprint; and 3) lower the number of computing operations, without compromising accuracy. This is achieved by enforcing channel-level sparsity in the network in a simple but effective way. Different from many existing approaches, the proposed method directly applies to modern CNN architectures, introduces minimum overhead to the training process, and requires no special software/hardware accelerators for the resulting models. We call our approach network slimming, which takes wide and large networks as input models, but during training insignificant channels are automatically identified and pruned afterwards, yielding thin and compact models with comparable accuracy. We empirically demonstrate the effectiveness of our approach with several state-of-the-art CNN models, including VGGNet, ResNet and DenseNet, on various image classification datasets. For VGGNet, a multi-pass version of network slimming gives a 20x reduction in model size and a 5x reduction in computing operations.


The overall pruning pipeline: train, prune, then retrain.


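Pruning

Rank the feature maps, then filter them: keep the feature maps judged important and load only those. That selection is the pruning step.

What is the principle behind it?

A convolution produces many feature maps, and not all of them are equally important.

During training we can add a strategy that makes the weights show which ones matter more and which matter less.

How do we tell how important a weight is?

Network slimming uses the scaling factor γ of the BN layer.

BN layer: subtract the mean, divide by the standard deviation, then introduce two trainable parameters, `γ` and `β`.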
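Written out, the standard batch normalization transform for one channel over a mini-batch $B = \{x_1, \dots, x_m\}$ looks like this. The small constant $\epsilon$ is not mentioned above; it is the usual safeguard against division by zero.

$$
\begin{aligned}
\mu_B &= \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad
\sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2, \\
\hat{x}_i &= \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad
y_i = \gamma\,\hat{x}_i + \beta .
\end{aligned}
$$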


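When training a model, especially a deep one, the distribution of each layer's input can change from step to step. This makes convergence harder and can also lead to overfitting. Why overfitting? The intuition is simple: if the inputs keep shifting with no constraint, the layer has to cover a much wider range, so a problem that a straight line could solve ends up being fit with a curve. Adding a BN layer constrains what each layer learns by keeping its input distribution stable.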

Why is BN also said to mitigate vanishing (diffusing) gradients?

See the figure below:

[Figure]

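BN standardizes each channel's output to roughly zero mean and unit variance (it rescales rather than clips, so values are not confined to a fixed interval such as [-1, 1]). Most values then fall near zero, where a saturating activation such as sigmoid or tanh still has a noticeable gradient, and this is how BN mitigates vanishing gradients.

This is also why BN is usually placed between the Conv layer and the ReLU layer: the convolution output is brought back to a well-scaled range before the activation is applied.

In short, BN pulls a drifting distribution back to zero mean and unit variance (which does not by itself make it Gaussian). The activation function then operates where it is numerically sensitive, and training converges faster.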
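A small numerical check makes the gradient argument concrete. This sketch takes the sigmoid as an example of a saturating activation (an assumption, since the text does not name one) and prints its derivative at a few inputs.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


# The gradient peaks at x = 0 and shrinks quickly as |x| grows.
for x in (0.0, 2.0, 5.0, 10.0):
    print(f"x = {x:5.1f}  gradient = {sigmoid_grad(x):.6f}")
```

The derivative is 0.25 at zero and falls below 1e-4 by x = 10, so keeping pre-activations near zero keeps the backward signal alive.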

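BN must also preserve some nonlinearity (expressive power), so the normalized result $\hat{x}$ is transformed once more with a scale and a shift: $y = \gamma \hat{x} + \beta$.

Both parameters, γ and β, are learned during training.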
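A minimal, framework-free sketch of this transform for a single channel, assuming training-mode batch statistics. The function name `batch_norm` and the default `eps` are illustrative choices, not taken from any library.

```python
import math


def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one channel's activations over a mini-batch, then scale and shift."""
    m = len(xs)
    mean = sum(xs) / m
    var = sum((x - mean) ** 2 for x in xs) / m  # biased variance over the batch
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in xs]


activations = [10.0, 12.0, 14.0, 16.0]
normalized = batch_norm(activations)                    # zero mean, roughly unit variance
scaled = batch_norm(activations, gamma=0.0, beta=0.5)   # gamma = 0 erases the input
```

With `gamma=0.0` every output equals `beta` whatever the input, so the channel no longer carries information. That is why a near-zero γ marks a channel as a pruning candidate.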

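L1 and L2 regularization

The paper proposes applying L1 regularization during training, which pushes parameters toward sparsity.

L1: sparsity and feature selection;

L2: smoothing; weights shrink but rarely become exactly zero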
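The difference can be seen in a single shrinkage step. The sketch below is an illustration under stated assumptions, not the paper's training code: it uses soft-thresholding (the proximal step for an L1 penalty) and plain weight decay (a gradient step for a squared L2 penalty), with hypothetical helper names and a step size of 0.1.

```python
def l1_shrink(w: float, t: float) -> float:
    """Soft-thresholding: the proximal step for the penalty t * |w|."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0  # anything within [-t, t] is set exactly to zero


def l2_shrink(w: float, t: float) -> float:
    """Gradient step for the penalty (t / 2) * w**2, i.e. weight decay."""
    return w * (1.0 - t)


weights = [1.5, -0.8, 0.05, -0.02]
after_l1 = [l1_shrink(w, 0.1) for w in weights]
after_l2 = [l2_shrink(w, 0.1) for w in weights]
print("L1:", after_l1)
print("L2:", after_l2)
```

L1 removes the two small weights outright, while L2 only scales every weight down by the same factor and leaves all of them non-zero.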


Core idea of the paper

The starting point is γ in BN: the smaller γ is, the less important its corresponding feature map.
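Channel selection by γ can be sketched as follows, assuming a single global threshold taken as a percentile of all |γ| values across the BN layers. The function `select_channels` and the sample values are hypothetical.

```python
def select_channels(gammas_per_layer, prune_ratio):
    """Return, for each BN layer, the indices of channels to keep.

    A channel is kept when |gamma| is strictly above a global threshold chosen
    so that roughly `prune_ratio` of all channels fall at or below it.
    """
    all_abs = sorted(abs(g) for layer in gammas_per_layer for g in layer)
    n_prune = int(len(all_abs) * prune_ratio)
    # Largest |gamma| among the channels to drop; -1.0 keeps everything.
    threshold = all_abs[n_prune - 1] if n_prune > 0 else -1.0
    return [
        [i for i, g in enumerate(layer) if abs(g) > threshold]
        for layer in gammas_per_layer
    ]


gammas = [[0.9, 0.01, 0.5], [0.02, 0.7, 0.03, 0.8]]
kept = select_channels(gammas, prune_ratio=0.4)
print(kept)
```

Two caveats of this simple version: ties at the threshold are all pruned, and nothing stops a layer from losing every channel, so a real pipeline would need a guard for that case.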

To get this feature-selection effect, an L1 penalty is imposed on γ.
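The training objective in the Network Slimming paper has this form, where $(x, y)$ is a training input and its target, $W$ are the trainable weights, $l$ is the usual task loss, $\Gamma$ is the set of BN scaling factors, and $\lambda$ balances the two terms:

$$
L = \sum_{(x,\,y)} l\big(f(x, W),\, y\big) + \lambda \sum_{\gamma \in \Gamma} g(\gamma), \qquad g(s) = |s| .
$$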
