MyDLNote - Networks: [2017CVPR] Dilated Residual Networks
Published: 2021-06-23 · Category: Technical Articles

Dilated Residual Networks

These notes aim to extract the main information a paper conveys; they are neither full translations nor rough summaries. A paper's motivation and the details of its network design are the focus of these posts.

[arxiv] 

Abstract

Convolutional networks for image classification progressively reduce resolution until the image is represented by tiny feature maps in which the spatial structure of the scene is no longer discernible. Such loss of spatial acuity can limit image classification accuracy and complicate the transfer of the model to downstream applications that require detailed scene understanding.

Conventional classification CNNs reduce feature maps to very low resolution, losing the ability to discriminate spatial structure. Dilated convolution can address this, since it preserves the receptive field without shrinking the feature map, but it introduces gridding artifacts; this paper proposes a degridding method.

 

Introduction

While convolutional networks have done well, the almost complete elimination of spatial acuity may be preventing these models from achieving even higher accuracy, for example by preserving the contribution of small and thin objects that may be important for correctly understanding the image.

In some cases, recognizing small objects improves the correct understanding of an image. Conventional CNNs are ill-suited here, because they must reduce resolution drastically to learn high-level features. Preserving such detail may matter little for handwritten-digit recognition, but it can be useful in more complex scenes.

Image classification is most often a proxy task that is used to pretrain a model before it is transferred to other applications that involve more detailed scene understanding [4, 10]. In such tasks, severe loss of spatial acuity is a significant handicap. Existing techniques compensate for the lost resolution by introducing up-convolutions [10, 11], skip connections [5], and other post-hoc measures.

Image classification is also often a "proxy task": classification networks are commonly used to pretrain models that are then transferred to scene-understanding tasks, where spatial acuity matters. Existing techniques compensate for the lost resolution with up-convolutions, skip connections, and other post-hoc measures that recover fine structure.

Must convolutional networks crush the image in order to classify it? In this paper, we show that this is not necessary, or even desirable.

Must a classification network reduce resolution this far? The paper argues that it is not necessary, and that avoiding it can even improve accuracy.

The output resolution of a DRN on typical ImageNet input is 28×28, comparable to small thumbnails that convey the structure of the image when examined by a human [15]. While it may not be clear a priori that average pooling can properly handle such high-resolution output, we show that it can, yielding a notable accuracy gain.

The proposed DRN outputs 28×28 feature maps, a resolution at which a human can still discern the structure of a scene from a thumbnail. Although there is no a priori guarantee that average pooling can handle feature maps this large, the results show that it can, yielding a notable accuracy gain.

We also show that DRNs yield improved accuracy on downstream applications such as weakly-supervised object localization and semantic segmentation.

The paper also applies DRNs to downstream tasks such as weakly-supervised object localization and semantic segmentation, with strong results.

 

Dilated Residual Networks

Natural images often feature many objects whose identities and relative configurations are important for understanding the scene. The classification task becomes difficult when a key object is not spatially dominant – for example, when the labeled object is thin (e.g., a tripod) or when there is a big background object such as a mountain. In these cases, the background response may suppress the signal from the object of interest. What’s worse, if the object’s signal is lost due to downsampling, there is little hope to recover it during training. However, if we retain high spatial resolution throughout the model and provide output signals that densely cover the input field, backpropagation can learn to preserve important information about smaller and less salient objects.

For understanding natural images, the identities and relative configurations of objects are essential. Conventional CNNs are limited when the labeled object is thin (e.g., the legs of a tripod) or when a large background object drowns out the object of interest; once the object's signal is lost to downsampling, no amount of subsequent training can recover it. If, instead, the network keeps high spatial resolution throughout and its output densely covers the input field, backpropagation can learn to preserve information about smaller and less salient objects.

The starting point of our construction is the set of network architectures presented by He et al. [Deep residual learning for image recognition].

A naive approach to increasing resolution in higher layers of the network would be to simply remove subsampling (striding) from some of the interior layers. This does increase downstream resolution, but has a detrimental side effect that negates the benefits: removing subsampling correspondingly reduces the receptive field in subsequent layers. Thus removing striding such that the resolution of the output layer is increased by a factor of 4 also reduces the receptive field of each output unit by a factor of 4. This severely reduces the amount of context that can inform the prediction produced by each unit. Since contextual information is important in disambiguating local cues [3], such reduction in receptive field is an unacceptable price to pay for higher resolution.

DRN is built by modifying a standard ResNet, which consists of five groups of residual blocks; DRN modifies groups 4 and 5.

The simplest way to keep the output resolution would be to remove the striding in groups 4 and 5. But this shrinks the receptive field: the output resolution grows by a factor of 4, and the receptive field of each output unit shrinks by a factor of 4, greatly reducing the amount of context available to each prediction. Since contextual information is important for disambiguating local cues [3], this reduction in receptive field is an unacceptable price to pay for higher resolution.

Dilated convolution is exactly what is needed: it enlarges the receptive field while keeping the resolution intact.
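A minimal sketch of this property (assuming PyTorch; channel and feature-map sizes are illustrative): with dilation d, a 3×3 kernel covers a (2d+1)×(2d+1) window, yet with matching padding the output resolution is unchanged.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 28, 28)  # N x C x H x W feature map

plain   = nn.Conv2d(64, 64, kernel_size=3, padding=1)              # 3x3 window
dilated = nn.Conv2d(64, 64, kernel_size=3, padding=2, dilation=2)  # 5x5 window

print(plain(x).shape)    # torch.Size([1, 64, 28, 28])
print(dilated(x).shape)  # torch.Size([1, 64, 28, 28]) -- same resolution, larger receptive field
```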

The first step in the conversion to DRN is to remove the striding in both G^4_1 and G^5_1. Note that the receptive field of each unit in G^4_1 remains unaffected: we just doubled the output resolution of G^4_1 without affecting the receptive field of its units. However, subsequent layers are all affected: their receptive fields have been reduced by a factor of 2 in each dimension. We therefore replace the convolution operators in those layers by 2-dilated convolutions.

Concretely, DRN takes the original ResNet and sets stride=1 in the first convolution of group 4, with dilation=2 in the group's subsequent convolutions; likewise it sets stride=1 in the first convolution of group 5, with dilation=4 in its subsequent convolutions.
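As a sketch of this conversion (assuming torchvision is installed; its built-in `replace_stride_with_dilation` option applies the same stride-removal-plus-dilation recipe to `layer3`/`layer4`, which correspond to the paper's G^4/G^5):

```python
import torch
from torchvision.models import resnet50

# Remove striding in layer3/layer4 and substitute dilation 2/4 instead.
drn_a = resnet50(replace_stride_with_dilation=[False, True, True])

# Keep everything up to (but excluding) global average pooling and the fc layer.
backbone = torch.nn.Sequential(*list(drn_a.children())[:-2])

x = torch.randn(1, 3, 224, 224)
print(backbone(x).shape)  # torch.Size([1, 2048, 28, 28]) -- 8x downsampling, not 32x
```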

The converted DRN has the same number of layers and parameters as the original ResNet. The key difference is that the original ResNet downsamples the input image by a factor of 32 in each dimension (a thousand-fold reduction in area), while the DRN downsamples the input by a factor of 8.

Thus the DRN has the same number of layers and parameters as the original ResNet. The difference is that ResNet downsamples the input by a factor of 32 in each dimension, whereas the DRN downsamples it by only a factor of 8.

Global average pooling therefore takes in 2^4 = 16 times more values, which can help the classifier recognize objects that cover a smaller number of pixels in the input image and take such objects into account in its prediction.

The global average pooling at the end of the DRN therefore operates over 16 times as many values as in ResNet: the feature map is 4× larger in each dimension (e.g., 28×28 instead of 7×7 for a 224×224 input), i.e., 4² = 16 times the area.

The presented construction could also be applied to earlier groups of layers (G^1 , G^2 , or G^3 ), in the limit retaining the full resolution of the input. We chose not to do this because a downsampling factor of 8 is known to preserve most of the information necessary to correctly parse the original image at pixel level [10]. Furthermore, a 28×28 thumbnail, while small, is sufficiently resolved for humans to discern the structure of the scene [15]. Additional increase in resolution has costs and should not be pursued without commensurate gains: when feature map resolution is increased by a factor of 2 in each dimension, the memory consumption of that feature map increases by a factor of 4.

Why not apply the same construction to groups 1, 2, and 3? Two reasons:

1. A downsampling factor of 8 already preserves most of the information needed to parse the image at pixel level; higher resolution is unnecessary.

2. Resolution is costly: increasing a feature map's resolution by a factor of 2 in each dimension increases its memory consumption by a factor of 4.

 

Localization

To obtain high-resolution class activation maps, we remove the global average pooling operator. We then connect the final 1×1 convolution directly to G^5 . A softmax is applied to each column in the resulting volume to convert the pixelwise prediction scores to proper probability distributions. The output of the resulting network is a set of activation maps that have the same spatial resolution as G^5 (28×28). Each classification category y has a corresponding activation map. For each pixel in this map, the map contains the probability that the object observed at this pixel is of category y.

In short: if the final global average pooling is removed and the last 1×1 convolution is connected directly to G^5, the output is a set of 28×28 pixelwise activation maps, one per category y; each pixel of map y holds the probability that the object observed at that pixel belongs to category y.
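A minimal sketch of this localization head (assuming PyTorch; the channel and class counts are illustrative, and `classifier_1x1` is a hypothetical name):

```python
import torch
import torch.nn as nn

num_classes, feat_channels = 1000, 2048  # illustrative values

# The final 1x1 convolution, applied directly to G^5 instead of pooled features.
classifier_1x1 = nn.Conv2d(feat_channels, num_classes, kernel_size=1)

feats = torch.randn(1, feat_channels, 28, 28)            # output of G^5
activation_maps = classifier_1x1(feats).softmax(dim=1)   # per-pixel distributions
print(activation_maps.shape)                             # torch.Size([1, 1000, 28, 28])
# activation_maps[0, y] is the 28x28 map of P(category y) at each pixel.
```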

 

Degridding

Gridding artifacts occur when a feature map has higher-frequency content than the sampling rate of the dilated convolution.

The input feature map has a single active pixel. A 2-dilated convolution induces a corresponding grid pattern in the output.

Gridding artifacts occur when the feature map contains frequency content above the sampling rate of the dilated convolution (reminiscent of the Nyquist-Shannon sampling criterion). In the paper's figure, a single active input pixel becomes nine active output pixels after one 2-dilated convolution.
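This is easy to reproduce; a small sketch (assuming PyTorch) pushing one active pixel through a 2-dilated 3×3 convolution with all-ones weights:

```python
import torch
import torch.nn as nn

x = torch.zeros(1, 1, 9, 9)
x[0, 0, 4, 4] = 1.0  # a single active input pixel

conv = nn.Conv2d(1, 1, kernel_size=3, padding=2, dilation=2, bias=False)
with torch.no_grad():
    conv.weight.fill_(1.0)
    y = conv(x)

# The nine nonzero outputs lie on a sparse grid at offsets {-2, 0, +2}
# around (4, 4) -- a grid pattern, not a solid 3x3 patch.
print(y[0, 0])
```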

The degridding construction is shown in the paper's corresponding figure.

An intermediate stage of the construction described in the present section is referred to as DRN-B. The final construction is referred to as DRN-C.

DRN-A denotes the model described in the previous section; DRN-B is an intermediate construction; DRN-C is the final model. The degridding measures are as follows (see the sketch after this list):

1. Removing max pooling

We found that this max pooling operation leads to high-amplitude high-frequency activations. Such high-frequency activations can be propagated to later layers and ultimately exacerbate gridding artifacts. We thus replace max pooling by convolutional filters.

Max pooling produces high-amplitude, high-frequency activations that propagate to later convolutional layers and exacerbate gridding artifacts, so it is replaced by convolutional filters.

2. Adding layers

To remove gridding artifacts, we add convolutional layers at the end of the network, with progressively lower dilation. Specifically, after the last 4-dilated layer in DRN-A, we add a 2-dilated residual block followed by a 1-dilated block. This is akin to removing aliasing artifacts using filters with appropriate frequency [Empirical filter estimation for subpixel interpolation and matching]. (This feels a bit like the complementary pairing of dilation and erosion in morphological filtering.)

3. Removing residual connections

Adding layers with decreasing dilation, as described in the preceding paragraph, does not remove gridding artifacts entirely because of residual connections. The residual connections in levels 7 and 8 of DRN-B can propagate gridding artifacts from level 6. To remove gridding artifacts more effectively, we remove the residual connections in levels 7 and 8.

Measure 2 (adding layers) cannot remove gridding entirely on its own, because of the residual connections: in DRN-B, the skip connections in levels 7 and 8 propagate the gridding artifacts of level 6 unchanged. DRN-C therefore removes the residual connections in levels 7 and 8.
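A sketch of these three measures combined (assuming PyTorch; channel counts, module names, and block structure are illustrative rather than the paper's exact DRN-C definition):

```python
import torch.nn as nn

def plain_block(channels, dilation):
    """A residual-free degridding block: two 3x3 convs at a fixed dilation."""
    return nn.Sequential(
        nn.Conv2d(channels, channels, 3, padding=dilation, dilation=dilation, bias=False),
        nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
        nn.Conv2d(channels, channels, 3, padding=dilation, dilation=dilation, bias=False),
        nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
    )

# Measure 1: a strided convolution takes the place of the stride-2 max pooling.
stem_downsample = nn.Sequential(
    nn.Conv2d(64, 64, 3, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(64), nn.ReLU(inplace=True),
)

# Measures 2 and 3: appended after the last 4-dilated layer, with decreasing
# dilation and no residual connections (levels 7 and 8 of DRN-C).
degridding_tail = nn.Sequential(
    plain_block(512, dilation=2),
    plain_block(512, dilation=1),
)
```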

 

Experiments

The experiments in this paper are quite thorough; I will not go through them in detail here. When I need them, I will come back and study this paper's experimental methodology.

Source: https://blog.csdn.net/u014546828/article/details/101062870
