Computer Vision in 2020: EfficientNet and YOLOv4
Computer vision is maturing rapidly. EfficientNet paired a baseline network found by neural architecture search with a principled scaling rule, and YOLOv4 pushed real-time detection forward. Here’s what you need to know.
EfficientNet: Scaling Done Right
The Problem
How do you make a CNN better? Traditionally:
- Make it deeper (more layers)
- Make it wider (more channels)
- Use higher resolution
But which combination works best?
Compound Scaling
EfficientNet’s insight: scale all three dimensions together in a principled way.
depth = α^φ
width = β^φ
resolution = γ^φ
where α · β² · γ² ≈ 2 and α, β, γ ≥ 1 are constants, so each increment of φ roughly doubles the FLOPS.
One scaling factor φ controls all three dimensions.
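As a rough numeric sketch of the formulas above, the snippet below plugs in α = 1.2, β = 1.1, γ = 1.15, the constants reported in the EfficientNet paper for the B0 baseline. The helper name `compound_scale` and the 224-pixel base resolution argument are chosen here for illustration, and the published B1 to B7 variants use rounded multipliers, so treat the outputs as approximate.

```python
# Compound scaling sketch. ALPHA/BETA/GAMMA are the constants reported in the
# EfficientNet paper; compound_scale is a hypothetical helper name.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution bases


def compound_scale(phi, base_resolution=224):
    """Return (depth_mult, width_mult, resolution, flops_mult) for a given phi."""
    depth_mult = ALPHA ** phi
    width_mult = BETA ** phi
    resolution = round(base_resolution * GAMMA ** phi)
    # FLOPS grow roughly with depth * width^2 * resolution^2.
    flops_mult = (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi
    return depth_mult, width_mult, resolution, flops_mult


for phi in range(4):
    d, w, r, f = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, input {r}px, FLOPS x{f:.2f}")
```

With these constants α · β² · γ² comes to about 1.92, which is why the constraint is written as approximately 2.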
Results
| Model | Params | Top-1 Accuracy | FLOPS |
|---|---|---|---|
| ResNet-50 | 26M | 76.0% | 4.1B |
| EfficientNet-B0 | 5.3M | 77.1% | 0.39B |
| EfficientNet-B7 | 66M | 84.3% | 37B |
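EfficientNet-B0 beats ResNet-50 on accuracy with roughly 5x fewer parameters and 10x fewer FLOPS. B7 spends more compute than ResNet-50 to reach 84.3%.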
Using EfficientNet
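import torch
from efficientnet_pytorch import EfficientNet
num_classes = 10  # placeholder: set to the number of classes in your dataset
# Load ImageNet pre-trained weights
model = EfficientNet.from_pretrained('efficientnet-b0')
# Replace the final layer to fine-tune for your task
model._fc = torch.nn.Linear(model._fc.in_features, num_classes)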
Or with TensorFlow:
import tensorflow as tf
num_classes = 10  # placeholder: set to the number of classes in your dataset
# Pre-trained backbone without the ImageNet classification head
base = tf.keras.applications.EfficientNetB0(
    weights='imagenet',
    include_top=False,
    input_shape=(224, 224, 3)
)
# Add your classifier
x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
output = tf.keras.layers.Dense(num_classes, activation='softmax')(x)
model = tf.keras.Model(base.input, output)
When to Use Each Size
| Model | Use Case |
|---|---|
| B0-B3 | Mobile, edge deployment |
| B4-B5 | Server-side, balanced |
| B6-B7 | Maximum accuracy, compute available |
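Size choice also fixes the input resolution each variant was trained at, which drives memory use and latency. The lookup below lists the native resolutions used by the reference EfficientNet implementation. The helper `largest_variant_for` and the idea of choosing by a pixel budget are illustrative assumptions, not a rule from the paper.

```python
# Native training resolutions of the EfficientNet variants (pixels per side).
NATIVE_RESOLUTION = {
    "B0": 224, "B1": 240, "B2": 260, "B3": 300,
    "B4": 380, "B5": 456, "B6": 528, "B7": 600,
}


def largest_variant_for(max_side):
    """Return the largest variant whose native input fits within max_side pixels."""
    fitting = [name for name, res in NATIVE_RESOLUTION.items() if res <= max_side]
    if not fitting:
        raise ValueError(f"no variant fits an input of {max_side}px")
    return fitting[-1]  # dict keeps insertion order, smallest to largest


print(largest_variant_for(320))  # B3
```

In practice the pixel budget is only one constraint; measured latency on the target device should have the final say.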
YOLOv4: Real-Time Detection
YOLO Evolution
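- YOLOv1 (2016): Single-stage detector that made real-time detection practical
- YOLOv2/v3: Improved accuracy with anchor boxes (v2) and multi-scale predictions (v3)
- YOLOv4 (2020): Best speed/accuracy trade-off among real-time detectors at release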
Key Improvements
Bag of Freebies (training tricks):
- CutMix, Mosaic augmentation
- DropBlock regularization
- Self-adversarial training
Bag of Specials (architecture):
- CSPDarknet53 backbone
- SPP (Spatial Pyramid Pooling)
- PANet neck
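To make one of the training tricks concrete, here is a minimal NumPy sketch of CutMix in its classification form: a random rectangle from one image is pasted into another, and the labels are mixed in proportion to the pasted area. The function name, argument layout, and the Beta(1, 1) sampling default are assumptions made for this example, not YOLOv4's actual training code; detection pipelines also have to adjust bounding boxes, which is omitted here.

```python
import numpy as np


def cutmix(image_a, label_a, image_b, label_b, rng, beta=1.0):
    """Paste a random rectangle of image_b into image_a and mix one-hot labels.

    Images are HxWxC arrays of the same shape; labels are one-hot vectors.
    """
    height, width = image_a.shape[:2]
    lam = rng.beta(beta, beta)  # target share of image_a that survives

    # Rectangle whose area is about (1 - lam) of the image, at a random centre.
    cut_h = int(height * np.sqrt(1.0 - lam))
    cut_w = int(width * np.sqrt(1.0 - lam))
    cy, cx = rng.integers(height), rng.integers(width)
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, height)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, width)

    mixed = image_a.copy()
    mixed[y1:y2, x1:x2] = image_b[y1:y2, x1:x2]

    # Recompute lam from the clipped rectangle so labels match the pixels.
    lam = 1.0 - (y2 - y1) * (x2 - x1) / (height * width)
    mixed_label = lam * label_a + (1.0 - lam) * label_b
    return mixed, mixed_label


rng = np.random.default_rng(0)
a = np.zeros((32, 32, 3), dtype=np.float32)  # all-black image, class 0
b = np.ones((32, 32, 3), dtype=np.float32)   # all-white image, class 1
mixed, label = cutmix(a, np.array([1.0, 0.0]), b, np.array([0.0, 1.0]), rng)
print(label, mixed.mean())
```

Because the toy images are all black and all white, the mean pixel value of the result equals the weight given to class 1, which is a quick sanity check that pixels and labels stay in step.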
Performance
YOLOv4: 43.5% AP (65.7% AP50) on MS COCO at about 65 FPS on a Tesla V100
Real-time detection with high accuracy.
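The accuracy figure above rests on intersection over union (IoU) between predicted and ground-truth boxes: a prediction counts as correct only when its IoU clears a threshold, and COCO-style AP averages over thresholds from 0.50 to 0.95. A minimal IoU sketch follows, assuming boxes are given as (x1, y1, x2, y2) corner coordinates, a convention picked here for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(ix2 - ix1, 0.0) * max(iy2 - iy1, 0.0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


prediction = (10, 10, 50, 50)
ground_truth = (20, 20, 60, 60)
score = iou(prediction, ground_truth)
# COCO-style thresholds 0.50, 0.55, ..., 0.95
thresholds = [0.50 + 0.05 * i for i in range(10)]
passed = sum(score >= t for t in thresholds)
print(f"IoU = {score:.3f}, passes {passed} of {len(thresholds)} thresholds")
```

A box shifted by a quarter of its side in each direction already drops to an IoU near 0.39, which shows how strict the higher thresholds are.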
Using YOLOv4
With Darknet (official):
./darknet detector test cfg/coco.data cfg/yolov4.cfg yolov4.weights image.jpg
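With PyTorch (the ultralytics/yolov5 hub repository publishes YOLOv5 models, not YOLOv4, so this loads YOLOv5s, a later YOLO-family model; for YOLOv4 weights use Darknet as above):
import torch
# The ultralytics/yolov5 hub entry point has no 'yolov4' model;
# 'yolov5s' is the small YOLOv5 variant it does provide
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
# Inference
results = model('image.jpg')
results.print()
results.show()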
Training Custom Objects
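# Prepare annotations in YOLO format: one .txt file per image
# image.jpg → image.txt, one line per object:
# <class_id> <x_center> <y_center> <width> <height>
# (coordinates are normalized to [0, 1] by image width and height)
# Create dataset config (contents of data/obj.data)
classes = 3
train = data/train.txt
valid = data/valid.txt
names = data/obj.names
backup = backup/
# cfg/yolov4-custom.cfg must also set classes=3 in each [yolo] layer and
# filters=24, i.e. (classes + 5) * 3, in the convolutional layer before it
# Train, starting from the pre-trained convolutional weights
./darknet detector train data/obj.data cfg/yolov4-custom.cfg yolov4.conv.137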
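Annotation tools often export pixel-space corner coordinates, so a conversion step to the YOLO line format is usually needed. The sketch below assumes boxes arrive as (x_min, y_min, x_max, y_max) in pixels; the function name `to_yolo_line` is made up for this example.

```python
def to_yolo_line(class_id, box, image_width, image_height):
    """Convert a pixel-space (x_min, y_min, x_max, y_max) box to a YOLO label line."""
    x_min, y_min, x_max, y_max = box
    # YOLO stores the box centre and size, each divided by the image dimension.
    x_center = (x_min + x_max) / 2 / image_width
    y_center = (y_min + y_max) / 2 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A 200x100 px box in a 640x480 image, class 2.
print(to_yolo_line(2, (100, 200, 300, 300), 640, 480))
# 2 0.312500 0.520833 0.312500 0.208333
```

Each image's .txt file then holds one such line per object.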
Choosing Between Models
Classification vs Detection
| Task | Best Choice |
|---|---|
| Image classification | EfficientNet |
| Object detection | YOLOv4, EfficientDet |
| Instance segmentation | Mask R-CNN, YOLACT |
| Real-time detection | YOLOv4 |
Speed vs Accuracy
[Chart: speed vs. accuracy trade-off. EfficientDet-D7 and Cascade R-CNN sit at the high-accuracy, low-speed end; YOLOv4 is in the middle; YOLOv4-tiny is at the high-speed, low-accuracy end.]
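When weighing speed against accuracy, it can help to convert frames per second into a per-frame time budget. The sketch below uses the 65 FPS figure quoted earlier for YOLOv4; the 30 FPS target is a common working definition of real time, assumed here for illustration, and both function names are hypothetical.

```python
def frame_budget_ms(fps):
    """Milliseconds available to process one frame at the given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def headroom_ms(model_fps, target_fps=30.0):
    """Spare time per frame when a model running at model_fps serves a target_fps stream."""
    return frame_budget_ms(target_fps) - frame_budget_ms(model_fps)


print(f"{frame_budget_ms(65):.1f} ms per frame at 65 FPS")
print(f"{headroom_ms(65):.1f} ms left for pre- and post-processing at 30 FPS")
```

A negative headroom means the model alone cannot keep up with the stream, before any decoding or post-processing is counted.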
Practical Tips
Data Augmentation
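import albumentations as A
transform = A.Compose([
    # Positional (height, width) works in older Albumentations releases;
    # newer ones expect size=(224, 224)
    A.RandomResizedCrop(224, 224),
    A.HorizontalFlip(p=0.5),
    A.ColorJitter(brightness=0.2, contrast=0.2),
    A.Normalize(),  # defaults to ImageNet mean and std
])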
Augmentation often buys accuracy without collecting more labeled data.
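For detection, geometric augmentations must transform the boxes along with the pixels. Here is a minimal sketch of a horizontal flip applied to YOLO-format labels (normalized x_center, y_center, width, height). The function name and return convention are illustrative; libraries such as Albumentations can do this for you through their bounding-box parameters.

```python
import random


def hflip_yolo_boxes(boxes, p=0.5, rng=random):
    """Mirror YOLO-format boxes left-to-right with probability p.

    Each box is (class_id, x_center, y_center, width, height), normalized to [0, 1].
    Returns (flipped, new_boxes); the caller must flip the image when flipped is True.
    """
    if rng.random() >= p:
        return False, list(boxes)
    # Only x_center changes; sizes and y_center are unaffected by a mirror.
    return True, [(c, 1.0 - x, y, w, h) for c, x, y, w, h in boxes]


flipped, boxes = hflip_yolo_boxes([(0, 0.25, 0.5, 0.2, 0.4)], p=1.0)
print(flipped, boxes)  # True [(0, 0.75, 0.5, 0.2, 0.4)]
```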
Transfer Learning
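import torch
# model.backbone and model.head are placeholder attribute names;
# substitute the feature extractor and classifier of your model
# Freeze backbone, train head
for param in model.backbone.parameters():
    param.requires_grad = False
# Train classifier
optimizer = torch.optim.Adam(model.head.parameters(), lr=1e-3)
# Later: unfreeze and fine-tune all layers at a lower learning rate
for param in model.parameters():
    param.requires_grad = True
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)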
Deployment
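import torch
from torch2trt import torch2trt
# Example input; match the shape your model expects
dummy_input = torch.ones((1, 3, 224, 224)).cuda()
# TensorRT for NVIDIA GPUs (model must be on the GPU in eval mode)
model = model.eval().cuda()
model_trt = torch2trt(model, [dummy_input])
torch.save(model_trt.state_dict(), 'model_trt.pth')
# ONNX for cross-platform
torch.onnx.export(model, dummy_input, 'model.onnx')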
What’s Next?
Trends from 2020 that look set to continue:
- Vision Transformers (ViT): Transformers for images
- Self-supervised learning: Less labeled data needed
- Edge deployment: Smaller, faster models
- Multi-modal: Vision + language
Final Thoughts
EfficientNet and YOLOv4 represent the 2020 state of the art. But the field moves fast.
Start with pre-trained models. Fine-tune for your task. Optimize for your deployment target. The fundamentals haven’t changed; only the architectures keep getting better.
Stand on the shoulders of giants. Fine-tune, don’t train from scratch.