Computer Vision in 2020: EfficientNet and YOLOv4

July 29, 2020

ai machine-learning computer-vision

Computer vision is maturing rapidly. EfficientNet showed how to scale a network found by neural architecture search in a principled way. YOLOv4 pushed real-time detection forward. Here’s what you need to know.

EfficientNet: Scaling Done Right

The Problem

How do you make a CNN better? Traditionally:

Make it deeper (more layers)
Make it wider (more channels)
Use higher resolution

But which combination works best?

Compound Scaling

EfficientNet’s insight: scale all three dimensions together in a principled way.

depth = α^φ
width = β^φ
resolution = γ^φ

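where α · β² · γ² ≈ 2 and α, β, γ ≥ 1 are constants found by a small grid search on the baseline network.

One scaling factor φ controls all three dimensions, and each increment of φ roughly doubles the FLOPs.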
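To make the formula concrete, here is a minimal Python sketch. It assumes the coefficients reported in the EfficientNet paper for the B0 baseline (α = 1.2, β = 1.1, γ = 1.15); the function names are illustrative and not part of any library, and the φ values are chosen only for demonstration.

```python
# Coefficients reported in the EfficientNet paper (found by grid search on B0).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi):
    """Return (depth, width, resolution) multipliers for scaling factor phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi

def flops_multiplier(phi):
    """FLOPs grow with depth * width^2 * resolution^2, i.e. roughly 2^phi."""
    depth, width, resolution = compound_scale(phi)
    return depth * width ** 2 * resolution ** 2

for phi in range(4):
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, "
          f"resolution x{r:.2f}, FLOPs x{flops_multiplier(phi):.2f}")
```

With these coefficients, φ = 1 multiplies the FLOPs by about 1.92, which is the "≈ 2" in the constraint.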

Results

Model	Params	Top-1 Accuracy (ImageNet)	FLOPs
ResNet-50	26M	76.0%	4.1B
EfficientNet-B0	5.3M	77.1%	0.39B
EfficientNet-B7	66M	84.3%	37B

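EfficientNet-B0 beats ResNet-50 with about 5x fewer parameters and 10x fewer FLOPs. B7 spends more compute to reach the highest accuracy.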

Using EfficientNet

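import torch
from efficientnet_pytorch import EfficientNet

num_classes = 10  # number of classes in your dataset (example value)

# Load ImageNet pre-trained weights
model = EfficientNet.from_pretrained('efficientnet-b0')

# Replace the final classifier layer to fine-tune for your task
model._fc = torch.nn.Linear(model._fc.in_features, num_classes)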

Or with TensorFlow:

import tensorflow as tf

num_classes = 10  # number of classes in your dataset (example value)

# Pre-trained backbone without the ImageNet classifier (requires TensorFlow 2.3+)
base = tf.keras.applications.EfficientNetB0(
    weights='imagenet',
    include_top=False,
    input_shape=(224, 224, 3)
)

# Add your classifier
x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
output = tf.keras.layers.Dense(num_classes, activation='softmax')(x)
model = tf.keras.Model(base.input, output)

When to Use Each Size

Model	Use Case
B0-B3	Mobile, edge deployment
B4-B5	Server-side, balanced
B6-B7	Maximum accuracy, compute available
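
The table can be turned into a small lookup. The sketch below assumes the native input resolutions listed in the EfficientNet paper (224 px for B0 up to 600 px for B7); the tier names and the helper are hypothetical and simply mirror the table above.

```python
# Native input resolution for each variant, as listed in the EfficientNet paper.
RESOLUTION = {
    "b0": 224, "b1": 240, "b2": 260, "b3": 300,
    "b4": 380, "b5": 456, "b6": 528, "b7": 600,
}

# Hypothetical tier names mirroring the table above.
TIERS = {
    "edge": ["b0", "b1", "b2", "b3"],
    "server": ["b4", "b5"],
    "max_accuracy": ["b6", "b7"],
}

def candidates(tier):
    """Return (variant, input_resolution) pairs for a deployment tier."""
    return [(name, RESOLUTION[name]) for name in TIERS[tier]]

print(candidates("edge"))
```

Larger variants expect larger inputs, so moving up a tier raises preprocessing and memory cost as well as model cost.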

YOLOv4: Real-Time Detection

YOLO Evolution

YOLOv1 (2016): Framed detection as a single-pass regression, fast enough for real time
YOLOv2/v3: Improved accuracy
YOLOv4 (2020): State-of-the-art speed/accuracy trade-off

Key Improvements

Bag of Freebies (training tricks that add no inference cost):

CutMix, Mosaic augmentation
DropBlock regularization
Self-adversarial training

Architecture and Bag of Specials (modules that add a small inference cost):

CSPDarknet53 backbone
SPP (Spatial Pyramid Pooling)
PANet neck
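
As an illustration of one of the training tricks, CutMix pastes a rectangular patch from one image into another and mixes the labels in proportion to the patch area. The sketch below covers only the box-sampling step in plain Python, following the recipe in the CutMix paper. The function name and the fixed seed are assumptions for the example, and this is not Darknet's implementation.

```python
import math
import random

def cutmix_box(width, height, lam, rng=random):
    """Sample a patch covering about (1 - lam) of a width x height image.

    Returns (x1, y1, x2, y2) and the adjusted label weight for the
    original image after clipping the patch to the image borders.
    """
    cut_ratio = math.sqrt(1.0 - lam)
    cut_w, cut_h = int(width * cut_ratio), int(height * cut_ratio)
    cx, cy = rng.randrange(width), rng.randrange(height)  # patch centre

    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, width)
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, height)

    # Recompute lam from the clipped patch so labels match the actual pixels.
    patch_area = (x2 - x1) * (y2 - y1)
    return (x1, y1, x2, y2), 1.0 - patch_area / (width * height)

box, lam = cutmix_box(416, 416, lam=0.7, rng=random.Random(0))
print(box, round(lam, 3))
```

The returned weight applies to the original image's label; the pasted image's label gets one minus that weight.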

Performance

YOLOv4: 43.5% mAP on MS COCO @ ~65 FPS (Tesla V100)

Real-time detection with high accuracy.
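
The mAP figure rests on intersection over union (IoU), the overlap score used to decide whether a predicted box matches a ground-truth box. COCO averages precision over IoU thresholds from 0.5 to 0.95. A minimal sketch, assuming boxes given as (x1, y1, x2, y2) corner coordinates; the function name is illustrative:

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(ix2 - ix1, 0) * max(iy2 - iy1, 0)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175, about 0.143
```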

Using YOLOv4

With Darknet (official):

./darknet detector test cfg/coco.data cfg/yolov4.cfg yolov4.weights image.jpg

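With PyTorch (the ultralytics/yolov5 hub repository has no 'yolov4' entry point; it serves YOLOv5, a separate PyTorch detector with a similar workflow):

import torch

# Load a pre-trained YOLOv5 model from PyTorch Hub (not YOLOv4)
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

# Inference
results = model('image.jpg')
results.print()
results.show()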

Training Custom Objects

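# 1. Prepare annotations in YOLO format: one .txt file per image
#    image.jpg → image.txt, one line per object, coordinates normalized to [0, 1]:
#    <class_id> <x_center> <y_center> <width> <height>

# 2. Create the dataset config (data/obj.data)
classes = 3
train = data/train.txt
valid = data/valid.txt
names = data/obj.names
backup = backup/

# 3. In cfg/yolov4-custom.cfg, set classes=3 in each [yolo] layer and
#    filters=24, i.e. (classes + 5) * 3, in the [convolutional] layer before it

# 4. Train, starting from the pre-trained convolutional weights
./darknet detector train data/obj.data cfg/yolov4-custom.cfg yolov4.conv.137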
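
Most labeling tools export pixel-space corner boxes, so annotations usually need converting to the normalized centre format shown above. A minimal sketch, assuming boxes arrive as (x_min, y_min, x_max, y_max) in pixels; the helper name is hypothetical:

```python
def to_yolo_line(class_id, box, image_width, image_height):
    """Convert a pixel-space (x_min, y_min, x_max, y_max) box to a YOLO label line."""
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / image_width
    y_center = (y_min + y_max) / 2 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

print(to_yolo_line(0, (100, 200, 300, 400), 640, 480))
# prints: 0 0.312500 0.625000 0.312500 0.416667
```

Write one such line per object into the .txt file that shares the image's base name.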

Choosing Between Models

Classification vs Detection

Task	Best Choice
Image classification	EfficientNet
Object detection	YOLOv4, EfficientDet
Instance segmentation	Mask R-CNN, YOLACT
Real-time detection	YOLOv4

Speed vs Accuracy

Speed (faster ↑), qualitative and not to scale
    │
    │  ● YOLOv4-tiny
    │
    │            ● YOLOv4
    │
    │                    ● Cascade R-CNN
    │                            ● EfficientDet-D7
    └────────────────────────────────────► Accuracy

Practical Tips

Data Augmentation

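import albumentations as A

transform = A.Compose([
    # Random crop resized to 224x224 (recent albumentations releases take size=(224, 224) instead)
    A.RandomResizedCrop(224, 224),
    A.HorizontalFlip(p=0.5),
    A.ColorJitter(brightness=0.2, contrast=0.2),
    A.Normalize(),  # defaults to ImageNet mean and std
])

# Apply to an RGB image stored as a NumPy array: augmented = transform(image=image)["image"]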

Augmentation is close to free accuracy: it costs extra training time but nothing at inference.
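
For detection, geometric augmentations must transform the labels along with the pixels. A minimal sketch of a horizontal flip applied to a YOLO-format box with normalized centre coordinates; the helper is hypothetical, and augmentation libraries can usually do this for you through their bounding-box support:

```python
def hflip_yolo_box(box):
    """Mirror a (class_id, x_center, y_center, width, height) box horizontally.

    Coordinates are normalized to [0, 1], so only x_center changes.
    """
    class_id, x_center, y_center, width, height = box
    return (class_id, 1.0 - x_center, y_center, width, height)

print(hflip_yolo_box((0, 0.25, 0.5, 0.2, 0.4)))  # (0, 0.75, 0.5, 0.2, 0.4)
```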

Transfer Learning

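# Freeze the backbone, train only the new head
# (model is the efficientnet_pytorch model from above; its classifier is model._fc)
for param in model.parameters():
    param.requires_grad = False
for param in model._fc.parameters():
    param.requires_grad = True

# Train classifier
optimizer = torch.optim.Adam(model._fc.parameters(), lr=1e-3)

# Later: unfreeze everything and fine-tune at a much lower learning rate
for param in model.parameters():
    param.requires_grad = True
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)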

Deployment

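import torch
from torch2trt import torch2trt  # import the function, not just the module

# Example input matching the model's expected shape (batch, channels, height, width)
model = model.eval().cuda()
dummy_input = torch.randn(1, 3, 224, 224).cuda()

# TensorRT for NVIDIA GPUs
model_trt = torch2trt(model, [dummy_input])
torch.save(model_trt.state_dict(), 'model_trt.pth')

# ONNX for cross-platform
# (efficientnet_pytorch models need model.set_swish(memory_efficient=False) first)
torch.onnx.export(model, dummy_input, 'model.onnx')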

What’s Next?

2020 trends continuing:

Vision Transformers (ViT): Transformers for images
Self-supervised learning: Less labeled data needed
Edge deployment: Smaller, faster models
Multi-modal: Vision + language

Final Thoughts

EfficientNet and YOLOv4 represent the 2020 state of the art. But the field moves fast.

Start with pre-trained models. Fine-tune for your task. Optimize for your deployment target. The fundamentals haven’t changed; only the architectures keep improving.

Stand on the shoulders of giants. Fine-tune, don’t train from scratch.