
https://arxiv.org/abs/1804.02767

 

YOLOv3: An Incremental Improvement


 

Code:

https://pjreddie.com/darknet/yolo/

 


Abstract

Made some small design changes → improved performance and speed ⇒ ideas taken from other people

new network that’s pretty swell.

320 x 320 YOLOv3 runs in 22ms at 28.2 mAP, as accurate as SSD but three times faster.

Code: https://pjreddie.com/darknet/yolo/

1. Introduction

YOLOv3

 

2. The Deal

2.1. Bounding Box Prediction

As in YOLO9000, bounding boxes are predicted using dimension clusters as anchor boxes.

The network predicts 4 coordinates for each bounding box: tx, ty, tw, th (see the decoding sketch below).

Loss: sum of squared error loss is used during training for the coordinate predictions.

predicts an objectness score for each bounding box using logistic regression

Threshold of 0.5: priors that are not the best match but overlap a ground truth object by more than 0.5 are ignored.

Unlike Faster R-CNN, our system assigns only one bounding box prior to each ground truth object.
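
The paper decodes these offsets as bx = σ(tx) + cx, by = σ(ty) + cy, bw = pw·e^tw, bh = ph·e^th, where (cx, cy) is the offset of the grid cell from the top-left corner of the image and (pw, ph) is the size of the bounding box prior. A minimal NumPy sketch of that decoding and of the logistic objectness score (function and argument names are mine, not from the paper or Darknet):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Turn the 4 predicted offsets into an absolute box.

    (cx, cy): grid cell offset from the image's top-left corner (in cells).
    (pw, ph): width/height of the anchor (bounding box prior).
    """
    bx = sigmoid(tx) + cx   # center x, constrained to stay inside the cell
    by = sigmoid(ty) + cy   # center y
    bw = pw * np.exp(tw)    # width as a scaling of the prior
    bh = ph * np.exp(th)    # height as a scaling of the prior
    return bx, by, bw, bh

def objectness(to):
    # the objectness score is likewise produced by a logistic (sigmoid) unit
    return sigmoid(to)

print(decode_box(0.2, -0.1, 0.5, 0.3, cx=6, cy=4, pw=3.0, ph=2.0))
print(objectness(1.5))
```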

 

2.2. Class Prediction

Each box predicts the classes it may contain using multilabel classification.

A softmax is not used; simple independent logistic classifiers are used instead. A softmax does not work well when each bounding box can have many overlapping labels.

During training, binary cross-entropy loss is used for the class predictions.

A multilabel approach better models the data ⇒ especially for overlapping labels.
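
A small PyTorch sketch of this idea (shapes and target values are made up for illustration): a sigmoid per class instead of one softmax over all classes, trained with binary cross-entropy, so one box can carry several overlapping labels.

```python
import torch
import torch.nn.functional as F

num_boxes, num_classes = 8, 80                       # e.g. 80 COCO classes
class_logits = torch.randn(num_boxes, num_classes)   # raw network outputs

# Multi-hot targets: a single box may have overlapping labels
# (e.g. "person" and "woman"), which one softmax distribution cannot express.
targets = torch.zeros(num_boxes, num_classes)
targets[0, [0, 14]] = 1.0                            # hypothetical box with two labels

class_probs = torch.sigmoid(class_logits)            # independent logistic classifiers
loss = F.binary_cross_entropy_with_logits(class_logits, targets)
print(class_probs.shape, loss.item())
```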

 

2.3. Predictions Across Scales

Boxes are predicted at 3 different scales.

we predict 3 boxes at each scale so the tensor is N × N × [3 ∗ (4 + 1 + 80)] for the 4 bounding box offsets, 1 objectness prediction, and 80 class predictions.

Next, the feature map from 2 layers back is taken and upsampled by 2x. A feature map from earlier in the network is also taken and concatenated with the upsampled features. ⇒ This way, more meaningful semantic information is obtained from the upsampled features and finer-grained information from the earlier feature map.

Feature Pyramid Network

We then add a few more convolutional layers to process this combined feature map, and eventually predict a similar tensor, although now twice the size.

On the COCO dataset, the 9 clusters were: (10×13), (16×30), (33×23), (30×61), (62×45), (59×119), (116×90), (156×198), (373×326).
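
A rough PyTorch sketch of one merge step of this head. The 3 × (4 + 1 + 80) = 255 output channels and the upsample-then-concatenate step follow the description above; the channel sizes, spatial sizes, and layer counts are illustrative assumptions, not the exact Darknet configuration.

```python
import torch
import torch.nn as nn

num_anchors, num_classes = 3, 80
out_channels = num_anchors * (4 + 1 + num_classes)   # 3 * (4 + 1 + 80) = 255

coarse = torch.randn(1, 512, 13, 13)   # deep feature map: strong semantics
fine = torch.randn(1, 256, 26, 26)     # earlier feature map: finer detail

upsample = nn.Upsample(scale_factor=2, mode="nearest")
merged = torch.cat([upsample(coarse), fine], dim=1)  # (1, 768, 26, 26)

# A few more convolutional layers process the combined map, then a 1x1 conv
# produces the N x N x [3 * (4 + 1 + 80)] prediction tensor for this scale.
head = nn.Sequential(
    nn.Conv2d(768, 256, kernel_size=3, padding=1),
    nn.LeakyReLU(0.1),
    nn.Conv2d(256, out_channels, kernel_size=1),
)
pred = head(merged)
print(pred.shape)   # torch.Size([1, 255, 26, 26])
```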

 

2.4. Feature Extractor

hybrid approach between the network used in YOLOv2, Darknet-19, and that newfangled residual network stuff

successive 3 × 3 and 1 × 1 convolutional layers but now has some shortcut connections

Darknet-53
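
A sketch of a single Darknet-53 style residual unit, assuming the usual Darknet convention of 1×1 then 3×3 convolutions with batch norm and leaky ReLU; the full 53-layer stack and its exact hyperparameters are not reproduced here.

```python
import torch
import torch.nn as nn

class DarknetResidual(nn.Module):
    """One residual unit: 1x1 conv halves the channels, 3x3 conv restores
    them, and a shortcut connection adds the input back."""

    def __init__(self, channels: int):
        super().__init__()
        half = channels // 2
        self.block = nn.Sequential(
            nn.Conv2d(channels, half, kernel_size=1, bias=False),
            nn.BatchNorm2d(half),
            nn.LeakyReLU(0.1),
            nn.Conv2d(half, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.LeakyReLU(0.1),
        )

    def forward(self, x):
        return x + self.block(x)   # shortcut connection

x = torch.randn(1, 64, 52, 52)
print(DarknetResidual(64)(x).shape)   # same shape as the input
```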

 

 

2.5. Training

Trained on full images, with no hard negative mining.

multi-scale training, lots of data augmentation, batch normalization, all the standard stuff.

The Darknet neural network framework is used for training and testing.
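
A tiny sketch of the multi-scale training idea: the input resolution is periodically redrawn from multiples of 32. The 320–608 range follows YOLOv2's setup and the resizing function is an assumption, not the Darknet implementation.

```python
import random
import torch
import torch.nn.functional as F

scales = [320 + 32 * i for i in range(10)]   # 320, 352, ..., 608 (multiples of 32)

def resize_batch(images: torch.Tensor) -> torch.Tensor:
    """Resize a batch to a randomly chosen training resolution."""
    size = random.choice(scales)
    return F.interpolate(images, size=(size, size), mode="bilinear",
                         align_corners=False)

batch = torch.randn(4, 3, 416, 416)          # dummy batch of RGB images
print(resize_batch(batch).shape)
```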

 

3. How We Do

 

4. Things We Tried That Didn’t Work

Anchor box x, y offset predictions.

Using the normal anchor box prediction mechanism (predicting the x, y offset as a multiple of the box width/height) ⇒ found that it decreased model stability and did not work very well.

Linear x, y predictions instead of logistic.

Using a linear activation to directly predict the x, y offset instead of the logistic activation ⇒ dropped mAP by a couple of points.

Focal loss.

Focal loss dropped mAP by about 2 points. Because YOLOv3 has separate objectness predictions and conditional class predictions, it is probably already robust to the problem focal loss tries to solve.

Dual IOU thresholds and truth assignment.

Faster R-CNN uses two IOU thresholds during training (positive above 0.7, ignored in between, negative below 0.3) ⇒ a similar strategy was tried but couldn't get good results.

 

 

 

Google Translate was used as a reference.
