[Classification] 07. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning

2020. 8. 26. 13:47

Paper: Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning

Abstract

In recent years, very deep convolutional networks have been central to the largest advances in image recognition performance. One example is the Inception architecture, which achieves very good performance at relatively low computational cost. Recently, combining residual connections with a more traditional architecture yielded state-of-the-art performance in the 2015 ILSVRC challenge, with performance similar to the latest-generation Inception-v3 network. This raises the question of whether combining the Inception architecture with residual connections brings any benefit. Here we give clear empirical evidence that training with residual connections significantly accelerates the training of Inception networks. There is also some evidence of residual Inception networks outperforming similarly expensive Inception networks without residual connections by a thin margin.

We also present several new streamlined architectures for both residual and non-residual Inception networks. These variants significantly improve single-frame recognition performance on the ILSVRC 2012 classification task. We further show how proper activation scaling stabilizes the training of very wide residual Inception networks.

With an ensemble of three residual networks and one Inception-v4, we achieve a 3.08% top-5 error on the test set of the ImageNet classification (CLS) challenge.

1. Introduction

Since Krizhevsky et al. won the 2012 ImageNet competition, their network "AlexNet" has been successfully applied to a wide variety of computer vision tasks, for example object-detection [4], segmentation [10], human pose estimation [17], video classification [7], object tracking [18], and superresolution [3]. These examples are but a few of all the applications to which deep convolutional networks have been very successfully applied ever since.

In this work we study the combination of two of the most recent ideas:

Residual connections introduced by He et al. in [5] (ResNet), where it is argued that residual connections are inherently necessary for training very deep architectures.

The latest revised version of the Inception architecture [15].

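Because Inception networks tend to be very deep, it is natural to combine Inception with residual connections. This lets Inception reap the benefits of the residual approach while retaining its computational efficiency.

Besides a straightforward integration with residual connections, we also studied whether Inception itself can be made more efficient by making it deeper and wider. To that end, we designed a new version named Inception-v4, which has a more uniform, simplified architecture and more Inception modules than Inception-v3. Inception-v3 had inherited a lot of the baggage of earlier network designs; the main constraint was the need for partitioning the model for distributed training using DistBelief [2]. After the move to TensorFlow [1], this constraint no longer applies. The architecture is described in Section 3.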

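In this report, we compare the two pure Inception variants, Inception-v3 and v4, with similarly expensive hybrid Inception-ResNet versions. Admittedly, those models were picked in a somewhat ad hoc manner, the main constraint being that their parameter count and computational complexity should be similar to those of the non-residual models. In fact, we also tested bigger and wider Inception-ResNet variants, and they performed very similarly on the ImageNet classification challenge [11] dataset.

The last experiment evaluates an ensemble of the best-performing models presented in the paper. Inception-v4 and Inception-ResNet-v2 performed similarly well, exceeding state-of-the-art single-frame performance on the ImageNet validation dataset, so we wanted to see how far a combination of them pushes the state of the art on this well-studied dataset. Surprisingly, the gains in single-frame performance did not translate into similarly large gains in ensemble performance. Nonetheless, the best ensemble achieves a 3.1% top-5 error on the ImageNet validation set.

In the last section, we look at some of the classification failures and conclude that the ensemble still has not reached the label noise of the annotations on this dataset, so there is still room to improve the predictions.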

2. Related Work

Since Krizhevsky et al. [8] proposed AlexNet, convolutional networks have become increasingly popular in large scale image recognition tasks. Some of the next important milestones were Network-in-network [9] by Lin et al., VGGNet [12] by Simonyan et al. and GoogLeNet (Inception-v1) [14] by Szegedy et al.

Residual connections were introduced in ResNet [5], which gives convincing theoretical and practical evidence for the advantages of additive merging of signals in both image recognition and object detection. Its authors argue that residual connections are inherently necessary for training very deep convolutional models. The findings of this paper do not seem to support that view, at least for image recognition. However, more measurement points with deeper architectures may be needed to understand the true extent of the benefit that residual connections offer. The experiments show that it is not very difficult to train competitive very deep networks without residual connections. Even so, residual connections seem to improve training speed greatly, which alone is a strong argument for using them.

The Inception deep convolutional architecture was introduced in [14] and is called GoogLeNet or Inception-v1. The Inception architecture was later refined in various ways, first by the introduction of batch normalization [6] (Inception-v2) by Ioffe et al. It was then improved by additional factorization ideas in the third iteration [15], referred to as Inception-v3.

Figure 1. Residual connections as introduced in He et al. [5].

Figure 2. Optimized version of ResNet connections by [5] to shield computation.

3. Architectural Choices

3.1. Pure Inception blocks

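Our older Inception models used to be trained in a partitioned manner: each replica was split into multiple sub-networks so that the whole model could fit in memory. However, the Inception architecture is highly tunable, meaning that the number of filters in the various layers can be changed in many ways without affecting the quality of the fully trained network. To optimize training speed, we used to tune the layer sizes carefully so as to balance the computation between the sub-networks. In contrast, with the introduction of the TensorFlow distributed training framework, our most recent models can be trained without partitioning the replicas.

This is enabled in part by recent optimizations of the memory used by backpropagation, achieved by carefully considering which tensors are needed for gradient computation and structuring the computation to reduce the number of such tensors. Historically, we have been relatively conservative about changing architectural choices and restricted our experiments to varying isolated components while keeping the rest of the network stable. Not simplifying the earlier choices resulted in networks that looked more complicated than they needed to be. In the newer experiments for Inception-v4, we shed this unnecessary baggage and made uniform choices for the Inception blocks at each grid size.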

Figure 9 shows the large-scale structure of the Inception-v4 network.

Figures 3, 4, 5, 6, 7 and 8 show the detailed structure of its components.

In the figures of the Inception modules:

Convolutions not marked with "V" are same-padded, meaning that their output grid matches the size of their input: padding = "same"

Convolutions marked with "V" are valid-padded, meaning that the input patch of each unit is fully contained in the previous layer and the grid size of the output activation map is reduced accordingly: padding = "valid"
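
To make the two padding modes concrete, here is a minimal sketch of how the output grid size follows from the input size. The helper name `conv_output_size` is hypothetical, and the formulas assume the usual TensorFlow conventions for "same" and "valid"; the 299-pixel input and the three 3 × 3 convolutions are only an illustrative walk-through.

```python
import math


def conv_output_size(input_size, kernel_size, stride=1, padding="same"):
    """Spatial output size along one dimension of a convolution (hypothetical helper)."""
    if padding == "same":
        # Zero padding is added so that, at stride 1, output size == input size.
        return math.ceil(input_size / stride)
    if padding == "valid":
        # No padding: every kernel window lies fully inside the input.
        return math.ceil((input_size - kernel_size + 1) / stride)
    raise ValueError(f"unknown padding: {padding!r}")


# Illustrative walk through three 3x3 convolutions on a 299-pixel-wide input.
size = 299
size = conv_output_size(size, 3, stride=2, padding="valid")  # marked "V", stride 2
size = conv_output_size(size, 3, stride=1, padding="valid")  # marked "V"
size = conv_output_size(size, 3, stride=1, padding="same")   # not marked "V"
print(size)  # prints 147
```

Only the "V" layers shrink the grid; a same-padded layer with stride 1 leaves it unchanged.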

3.2. Residual Inception Blocks

For the residual versions of the Inception networks, we use cheaper Inception blocks than in the original Inception. Each Inception block is followed by a filter-expansion layer (a 1 × 1 convolution without activation) that scales up the dimensionality of the filter bank before the addition so that it matches the depth of the input. This is needed to compensate for the dimensionality reduction induced by the Inception block.
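
The role of the filter-expansion layer can be sketched for a single pixel, where a 1 × 1 convolution without activation is just a matrix-vector product over channels. This is a toy illustration and not the paper's implementation: `conv1x1_linear`, `inception_resnet_unit`, `toy_branch` and the weights are all hypothetical.

```python
def conv1x1_linear(pixel, weights):
    """1x1 convolution without activation at one pixel: one output per weight row."""
    return [sum(w * v for w, v in zip(row, pixel)) for row in weights]


def inception_resnet_unit(x, inception_branch, expand_weights):
    """Add the expanded branch output back onto the input (per-pixel view)."""
    reduced = inception_branch(x)                       # fewer channels than x
    expanded = conv1x1_linear(reduced, expand_weights)  # back to len(x) channels
    if len(expanded) != len(x):
        raise ValueError("filter expansion must match the input depth")
    return [a + b for a, b in zip(x, expanded)]


def toy_branch(x):
    """Stand-in for a cheap Inception block that reduces 4 channels to 2."""
    return [x[0] + x[1], x[2] + x[3]]


# Hypothetical 4x2 expansion weights: 2 channels in, 4 channels out.
expand = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
out = inception_resnet_unit([1.0, 2.0, 3.0, 4.0], toy_branch, expand)
print(out)  # prints [4.0, 9.0, 13.0, 4.0]
```

Without the expansion step the branch output (2 channels) could not be summed with the input (4 channels), which is the mismatch the filter-expansion layer is there to remove.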

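We tried several residual versions of Inception. Only two of them are described in detail here. Inception-ResNet-v1 has roughly the computational cost of Inception-v3, while Inception-ResNet-v2 matches the cost of the Inception-v4 network from Section 3.1. See Figure 15 for the large-scale structure of both variants. (In practice, however, Inception-v4 proved to be slower to train, probably because of its larger number of layers.)

Another small technical difference between our residual and non-residual Inception variants is that in Inception-ResNet we used batch-normalization only on top of the traditional layers, not on top of the summations. A thorough use of batch-normalization would be expected to help, but we wanted to keep each model replica trainable on a single GPU. It turned out that layers with a large activation size consumed a disproportionate amount of GPU memory. By omitting batch-normalization on top of those layers, we were able to increase the overall number of Inception blocks substantially. We hope that with better utilization of computing resources this trade-off will become unnecessary.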

Figure 3. The schema for the stem of the pure Inception-v4 and Inception-ResNet-v2 networks. This is the input part of those networks. Cf. Figures 9 and 15.

Figure 4. The schema for 35 × 35 grid modules of the pure Inception-v4 network. This is the Inception-A block of Figure 9.

Figure 5. The schema for 17 × 17 grid modules of the pure Inception-v4 network. This is the Inception-B block of Figure 9.

Figure 6. The schema for 8 × 8 grid modules of the pure Inception-v4 network. This is the Inception-C block of Figure 9.

Figure 7. The schema for 35 × 35 to 17 × 17 reduction module. Different variants of this block (with various numbers of filters) are used in Figures 9 and 15 in each of the new Inception (-v4, -ResNet-v1, -ResNet-v2) variants presented in this paper. The k, l, m, n numbers represent filter bank sizes which can be looked up in Table 1.

Figure 8. The schema for 17 × 17 to 8 × 8 grid-reduction module. This is the reduction module used by the pure Inception-v4 network in Figure 9.

Figure 9. The overall schema of the Inception-v4 network. For the detailed structure of the various components, please refer to Figures 3, 4, 5, 6, 7 and 8.

Figure 10. The schema for 35 × 35 grid (Inception-ResNet-A) module of Inception-ResNet-v1 network.

Figure 11. The schema for 17 × 17 grid (Inception-ResNet-B) module of Inception-ResNet-v1 network.

Figure 12. "Reduction-B" 17 × 17 to 8 × 8 grid-reduction module. This module is used by the smaller Inception-ResNet-v1 network in Figure 15.

Figure 13. The schema for 8×8 grid (Inception-ResNet-C) module of Inception-ResNet-v1 network.

Figure 14. The stem of the Inception-ResNet-v1 network.

Figure 15. Schema for Inception-ResNet-v1 and Inception-ResNet-v2 networks. This schema applies to both networks but the underlying components differ. Inception-ResNet-v1 uses the blocks as described in Figures 14, 10, 7, 11, 12 and 13. Inception-ResNet-v2 uses the blocks as described in Figures 3, 16, 7, 17, 18 and 19. The output sizes in the diagram refer to the activation vector tensor shapes of Inception-ResNet-v1.

Figure 16. The schema for 35 × 35 grid (Inception-ResNet-A) module of the Inception-ResNet-v2 network.

Figure 17. The schema for 17 × 17 grid (Inception-ResNet-B) module of the Inception-ResNet-v2 network.

Figure 18. The schema for 17 × 17 to 8 × 8 grid-reduction module. Reduction-B module used by the wider Inception-ResNet-v2 network in Figure 15.

Figure 19. The schema for 8×8 grid (Inception-ResNet-C) module of the Inception-ResNet-v2 network.

Table 1. The number of filters of the Reduction-A module for the three Inception variants presented in this paper. The four numbers in the columns of the table parametrize the four convolutions of Figure 7.

Figure 20. The general schema for scaling combined Inception-ResNet modules. We expect that the same idea is useful in the general ResNet case, where instead of the Inception block an arbitrary subnetwork is used. The scaling block just scales the last linear activations by a suitable constant, typically around 0.1.

3.3. Scaling of the Residuals

When the number of filters exceeded 1000, the residual variants started to become unstable and the network "died" early in training: the last layer before the average pooling started to produce only zeros after a few tens of thousands of iterations. This could not be prevented either by lowering the learning rate or by adding an extra batch-normalization.

Scaling down the residuals before adding them to the accumulated layer activations seemed to stabilize training. Scaling factors between 0.1 and 0.3 were used for this (cf. Figure 20).

He et al. [5] observed a similar instability in very deep residual networks and addressed it with a two-phase learning-rate schedule: a first "warm-up" phase with a very low learning rate, followed by a second phase with a high learning rate. The authors found that when the number of filters is very high, even a very low learning rate of 0.00001 is not enough to cope with the instability, and that the subsequent training with a high learning rate risks destroying the effect of the warm-up. They found it much more reliable to simply scale the residuals.

Even where this scaling was not strictly necessary, it did not seem to harm the final accuracy, and it helped to stabilize training.
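
The scaling described in this section amounts to multiplying the residual branch by a small constant before the addition. Below is a minimal per-pixel sketch: the function names are hypothetical, the default `scale=0.1` is taken from the 0.1 to 0.3 range mentioned above, and applying a ReLU after the sum is an assumption about the block layout rather than something stated in this text.

```python
def relu(values):
    """Element-wise ReLU."""
    return [max(0.0, v) for v in values]


def scaled_residual_add(x, residual, scale=0.1):
    """Add a scaled-down residual to the accumulated activations x.

    `residual` stands for the last linear activations of the residual branch;
    the ReLU after the sum is an assumed detail of the block.
    """
    return relu([a + scale * r for a, r in zip(x, residual)])


activations = [1.0, -1.0]
branch_output = [10.0, 5.0]
print(scaled_residual_add(activations, branch_output, scale=0.1))
```

With `scale=1.0` the same call would add the full branch output, so large branch activations would dominate the sum; the small constant keeps each block's contribution modest.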

4. Training Methodology

The networks were trained with stochastic gradient using the TensorFlow [1] distributed training system, with 20 replicas each running on an NVidia Kepler GPU.

Early experiments used momentum [13] with a decay of 0.9.

The best models were obtained with RMSProp [16] with a decay of 0.9 and ε = 1.0.

The learning rate was 0.045, multiplied by 0.94 every two epochs.
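
The learning-rate rule above can be written as a small function. This is a sketch that assumes a staircase schedule, meaning the rate stays constant within each two-epoch window; the function and parameter names are hypothetical.

```python
def learning_rate(epoch, base_lr=0.045, decay=0.94, epochs_per_decay=2):
    """Learning rate at a given zero-based epoch, decayed once every two epochs."""
    return base_lr * decay ** (epoch // epochs_per_decay)


for epoch in range(6):
    print(epoch, learning_rate(epoch))
```

Epochs 0 and 1 use 0.045, epochs 2 and 3 use 0.045 × 0.94, and so on.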

Models were evaluated using a running average of the parameters computed over time.
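
One common way to keep such a running average is an exponential moving average of the weights, updated after every training step and used in place of the raw weights at evaluation time. The sketch below assumes that form; the class name, the `decay` value and the dictionary-of-floats parameter layout are all hypothetical, since the text does not say how the average was computed.

```python
class ParameterAverager:
    """Keeps an exponential moving average ("shadow" copy) of model parameters."""

    def __init__(self, params, decay=0.9999):
        self.decay = decay
        self.shadow = dict(params)  # start from the initial parameter values

    def update(self, params):
        """Blend the current parameters into the running average."""
        for name, value in params.items():
            self.shadow[name] = (
                self.decay * self.shadow[name] + (1.0 - self.decay) * value
            )


# Toy usage with a single scalar weight and an exaggerated decay of 0.5.
averager = ParameterAverager({"w": 0.0}, decay=0.5)
averager.update({"w": 2.0})
averager.update({"w": 2.0})
print(averager.shadow["w"])  # prints 1.5
```

Evaluation would then read `averager.shadow` instead of the latest weights, which smooths out step-to-step noise.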

Figure 21. Top-1 error evolution during training of pure Inception-v3 vs a residual network of similar computational cost. The evaluation is measured on a single crop on the non-blacklist images of the ILSVRC-2012 validation set. The residual model was training much faster, but reached slightly worse final accuracy than the traditional Inception-v3.

5. Experimental Results

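First, we observe the evolution of the top-1 and top-5 validation error of the four variants during training. After the experiments were run, the authors found that the continuous evaluation had been performed on a subset of the validation set that omitted about 1700 blacklisted entities with poor bounding boxes. This omission should have been applied only to the CLS-LOC benchmark; as a result, the numbers are somewhat more optimistic and not directly comparable with other reports, including the authors' own earlier work. The difference is about 0.3% for top-1 error and about 0.15% for top-5 error. Because the difference is consistent, the authors consider the comparison between the curves to be fair.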
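
For readers unfamiliar with the two metrics, top-k error is the fraction of images whose true label is not among the k highest-scoring classes. The following is a minimal sketch with a hypothetical three-class example; `top_k_error` and the scores are illustrative only.

```python
def top_k_error(scores, labels, k):
    """Fraction of examples whose true label is not in the k best-scoring classes."""
    errors = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the best k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top_k:
            errors += 1
    return errors / len(labels)


# Three images, three classes (toy numbers).
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
labels = [1, 1, 0]
print(top_k_error(scores, labels, k=1))
print(top_k_error(scores, labels, k=2))
```

In this toy case two of three images are wrong at top-1 and one of three at top-2; ImageNet results use k = 1 and k = 5 over 1000 classes.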

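The multi-crop and ensemble results, on the other hand, were rerun on the complete validation set of 50000 images. The final ensemble was also evaluated on the test set and submitted to the ILSVRC test server to verify that tuning had not led to overfitting. The authors stress that this final validation was done only once and that they submitted results only twice in the last year:

First for the BN-Inception paper.

Then during the ILSVRC-2015 CLS-LOC competition.

Given this small number of submissions, they believe the test-set numbers are a true estimate of the generalization capability of the model.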

Finally, we compare the performance of the various versions of Inception and Inception-ResNet. Inception-v3 and Inception-v4 are deep convolutional networks that do not use residual connections, while Inception-ResNet-v1 and Inception-ResNet-v2 are Inception-style networks that use residual connections instead of filter concatenation.

Figure 22. Top-5 error evolution during training of pure Inception-v3 vs a residual Inception of similar computational cost. The evaluation is measured on a single crop on the non-blacklist images of the ILSVRC-2012 validation set. The residual version has trained much faster and reached slightly better final recall on the validation set.

Figure 23. Top-1 error evolution during training of pure Inception-v4 vs a residual Inception of similar computational cost. The evaluation is measured on a single crop on the non-blacklist images of the ILSVRC-2012 validation set. The residual version was training much faster and reached slightly better final accuracy than the traditional Inception-v4.

Table 2. Single crop - single model experimental results. Reported on the non-blacklisted subset of the validation set of ILSVRC 2012.

Figure 24. Top-5 error evolution during training of pure Inception-v4 vs a residual Inception of similar computational cost. The evaluation is measured on a single crop on the non-blacklist images of the ILSVRC-2012 validation set. The residual version trained faster and reached slightly better final recall on the validation set.

Figure 25. Top-5 error evolution of all four models (single model, single crop). Showing the improvement due to larger model size. Although the residual version converges faster, the final accuracy seems to mainly depend on the model size.

Figure 26. Top-1 error evolution of all four models (single model, single crop). This paints a similar picture as the top-5 evaluation.

Table 2 shows the single-model, single-crop performance of the various architectures on the validation set.

Table 3 shows the performance of the various models with a small number of crops: 10 crops for ResNet, as reported in [5], and the 12-crop evaluation described in [14] for the Inception variants. These are single-model results.

Table 3. 10/12 crops evaluations - single model experimental results. Reported on all 50000 images of the validation set of ILSVRC 2012.

Table 4. 144 crops evaluations - single model experimental results. Reported on all 50000 images of the validation set of ILSVRC 2012.

Table 5. Ensemble results with 144 crops/dense evaluation. Reported on all 50000 images of the validation set of ILSVRC 2012. For Inception-v4(+Residual), the ensemble consists of one pure Inception-v4 and three Inception-ResNet-v2 models, which were evaluated both on the validation set and on the test set. The test-set performance was 3.08% top-5 error, verifying that we don’t overfit on the validation set.

Table 4 shows the single-model performance of the various models. For the residual network, the dense evaluation result is reported from [5]. For the Inception networks, the 144-crop strategy was used as described in [14].

Table 5 compares ensemble results. For the pure residual network, the 6-model dense evaluation result is reported from [5]. For the Inception networks, 4 models were ensembled using the 144-crop strategy as described in [14].
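
Multi-crop and ensemble evaluation both come down to combining several class-probability vectors for the same image. The sketch below assumes the combination is a plain average of probabilities, first over crops and then over models; this text does not state the exact rule, and the function names and numbers are hypothetical. For reference, the 144-crop count in [14] comes from 4 scales × 3 squares × 6 crops × 2 mirrors.

```python
def average_predictions(prob_vectors):
    """Element-wise mean of several class-probability vectors."""
    n = len(prob_vectors)
    return [sum(column) / n for column in zip(*prob_vectors)]


def ensemble_predict(per_model_crop_probs):
    """Average over crops within each model, then over models (assumed rule)."""
    per_model = [average_predictions(crops) for crops in per_model_crop_probs]
    return average_predictions(per_model)


# Two toy models, two crops each, two classes.
model_a = [[0.6, 0.4], [0.8, 0.2]]
model_b = [[0.4, 0.6], [0.4, 0.6]]
probs = ensemble_predict([model_a, model_b])
predicted_class = max(range(len(probs)), key=lambda c: probs[c])
print(probs, predicted_class)
```

Here model_b alone would pick class 1, but the averaged ensemble picks class 0, which shows how the combined prediction can differ from an individual model.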

6. Conclusions

This paper presented three new network architectures:

• Inception-ResNet-v1: a hybrid Inception version that has a similar computational cost to Inception-v3 from [15].

• Inception-ResNet-v2: a costlier hybrid Inception version with significantly improved recognition performance.

• Inception-v4: a pure Inception variant without residual connections with roughly the same recognition performance as Inception-ResNet-v2.

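We studied how much the introduction of residual connections improves the training speed of the Inception architecture. In addition, the proposed models (with and without residual connections) outperform all of the authors' previous networks, simply by virtue of their increased model size.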

References:

https://blog.csdn.net/kxh123456/article/details/102828148

https://sike6054.github.io/blog/paper/fourth-post/
