Computer Vision in 2020: EfficientNet and YOLOv4

ai machine-learning computer-vision

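Computer vision is maturing rapidly. EfficientNet (2019) showed how to scale convolutional networks in a principled way, starting from a baseline found by neural architecture search, and YOLOv4 (2020) pushed real-time object detection forward. Here’s what you need to know.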

EfficientNet: Scaling Done Right

The Problem

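How do you make a CNN better? Traditionally, you scale up one dimension at a time: depth (more layers, as in going from ResNet-18 to ResNet-200), width (more channels per layer), or input resolution (larger images).

But which combination works best?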

Compound Scaling

EfficientNet’s insight: scale all three dimensions together in a principled way.

depth = α^φ
width = β^φ
resolution = γ^φ

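where α · β² · γ² ≈ 2 and α, β, γ ≥ 1.

One scaling factor φ controls all dimensions. The constraint ties scaling to compute: FLOPs grow linearly with depth but quadratically with width and resolution, so total FLOPs scale by roughly (α · β² · γ²)^φ ≈ 2^φ. The paper fixes φ = 1, grid-searches α = 1.2, β = 1.1, γ = 1.15 on the EfficientNet-B0 baseline, and then raises φ to obtain B1 through B7.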
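To see how the constraint plays out, here is a small sketch that applies the three formulas with the coefficients reported for EfficientNet-B0 (α = 1.2, β = 1.1, γ = 1.15) and a 224px base resolution. It computes idealized multipliers only; the function name is illustrative, not part of any library.

```python
# Compound scaling sketch with the coefficients reported for EfficientNet-B0.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution bases


def compound_scale(phi, base_resolution=224):
    """Return idealized (depth mult, width mult, resolution px, FLOPs mult) for phi."""
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = round(base_resolution * GAMMA ** phi)
    # FLOPs are roughly proportional to depth * width^2 * resolution^2
    flops = (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi
    return depth, width, resolution, flops


for phi in range(4):
    d, w, r, f = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, {r}px input, FLOPs x{f:.2f}")
```

For φ = 1 this yields a 258px input, whereas the released EfficientNet-B1 uses 240px: the official configurations round and hand-adjust the multipliers, so treat this as a way to understand the rule rather than to reproduce the checkpoints.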

Results

| Model | Params | Top-1 accuracy (ImageNet) | FLOPs |
|---|---|---|---|
| ResNet-50 | 26M | 76.0% | 4.1B |
| EfficientNet-B0 | 5.3M | 77.1% | 0.39B |
| EfficientNet-B7 | 66M | 84.3% | 37B |

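Smaller and more accurate: EfficientNet-B0 beats ResNet-50 with about 5x fewer parameters and about 10x fewer FLOPs, and B7 was state of the art on ImageNet when it was published. FLOPs are only a proxy for speed, though; on GPUs, EfficientNet's depthwise convolutions do not always give a proportional drop in latency.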

Using EfficientNet

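import torch
from efficientnet_pytorch import EfficientNet  # pip install efficientnet_pytorch

num_classes = 10  # set to the number of classes in your dataset

# Load ImageNet pre-trained weights
model = EfficientNet.from_pretrained('efficientnet-b0')

# Replace the 1000-class ImageNet head with one sized for your task, then fine-tune
model._fc = torch.nn.Linear(model._fc.in_features, num_classes)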

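Or with TensorFlow (2.3 or later, where tf.keras.applications includes EfficientNet):

import tensorflow as tf

num_classes = 10  # set to the number of classes in your dataset

# Keras EfficientNet models rescale inputs internally,
# so feed raw pixel values in the 0-255 range
base = tf.keras.applications.EfficientNetB0(
    weights='imagenet',
    include_top=False,
    input_shape=(224, 224, 3)
)
base.trainable = False  # optional: freeze the backbone for the first training stage

# Add your classifier
x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
output = tf.keras.layers.Dense(num_classes, activation='softmax')(x)
model = tf.keras.Model(base.input, output)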

When to Use Each Size

| Model | Use case |
|---|---|
| B0-B3 | Mobile and edge deployment |
| B4-B5 | Server-side, balancing accuracy and cost |
| B6-B7 | Maximum accuracy when compute is available |
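One way to act on this table is to pick the largest variant that fits a compute budget. The sketch below uses the approximate per-variant FLOP counts from the EfficientNet paper and each variant's native input resolution; the helper name and the budget rule are illustrative assumptions, and FLOPs remain only a rough proxy for latency on real hardware.

```python
# (name, native input resolution in px, approx. FLOPs in billions)
EFFICIENTNET_VARIANTS = [
    ("B0", 224, 0.39),
    ("B1", 240, 0.70),
    ("B2", 260, 1.0),
    ("B3", 300, 1.8),
    ("B4", 380, 4.2),
    ("B5", 456, 9.9),
    ("B6", 528, 19.0),
    ("B7", 600, 37.0),
]


def pick_variant(flops_budget_billions):
    """Return (name, resolution) of the largest variant within the budget, or None."""
    best = None
    for name, resolution, gflops in EFFICIENTNET_VARIANTS:  # sorted smallest to largest
        if gflops <= flops_budget_billions:
            best = (name, resolution)
    return best


print(pick_variant(2.0))  # ('B3', 300)
```

The returned resolution can be passed as input_shape=(r, r, 3) to the matching tf.keras.applications constructor.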

YOLOv4: Real-Time Detection

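YOLO Evolution

YOLO (2016), YOLOv2 (2017), and YOLOv3 (2018) came from Joseph Redmon and collaborators. YOLOv4 (April 2020), by Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao, keeps the single-stage, one-pass design and the Darknet framework while combining many recent training and architecture techniques.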

Key Improvements

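Bag of Freebies (training tricks): techniques that change only how the model is trained, so they add accuracy at no inference cost. YOLOv4's include Mosaic data augmentation (four training images stitched into one), CutMix, DropBlock regularization, class label smoothing, the CIoU box-regression loss, and a cosine-annealing learning-rate schedule.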
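Class label smoothing, one of the training tricks in YOLOv4's bag of freebies, softens one-hot targets so the network is not pushed toward extreme confidence. A NumPy sketch of the common formulation (the function name is illustrative, and Darknet's implementation may differ in details):

```python
import numpy as np


def smooth_labels(class_ids, num_classes, epsilon=0.1):
    """Turn integer class ids into smoothed one-hot targets.

    The true class gets 1 - epsilon + epsilon / num_classes and every other class
    gets epsilon / num_classes, so each row still sums to 1.
    """
    one_hot = np.eye(num_classes)[np.asarray(class_ids)]
    return one_hot * (1.0 - epsilon) + epsilon / num_classes


print(smooth_labels([2], num_classes=4))
```

Because it only changes the training targets, the trained network runs exactly as fast as before, which is what makes it a "freebie".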

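Bag of Specials (architecture): modules and post-processing steps that add a small inference cost for a significant accuracy gain. YOLOv4's include the Mish activation, the CSPDarknet53 backbone (cross-stage partial connections), an SPP block, PANet path aggregation, a modified SAM attention block, and DIoU-NMS.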
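Mish, one of YOLOv4's bag-of-specials components, is defined as mish(x) = x · tanh(softplus(x)), where softplus(x) = ln(1 + eˣ). A NumPy sketch (function names are illustrative):

```python
import numpy as np


def softplus(x):
    # ln(1 + e^x), computed as logaddexp(0, x) so large x does not overflow
    return np.logaddexp(0.0, x)


def mish(x):
    """Mish activation: x * tanh(softplus(x)), applied element-wise."""
    x = np.asarray(x, dtype=float)
    return x * np.tanh(softplus(x))


print(mish([-2.0, 0.0, 2.0]))
```

Unlike ReLU, Mish is smooth everywhere and lets small negative values through (its minimum is about -0.31), while behaving like the identity for large positive inputs.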

Performance

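YOLOv4: 43.5% AP (65.7% AP50) on MS COCO at about 65 FPS (Tesla V100)

Real-time detection with high accuracy. COCO AP averages precision over IoU thresholds from 0.50 to 0.95, so it is much stricter than AP50, which counts a detection as correct once its IoU with the ground truth reaches 0.5.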

Using YOLOv4

With Darknet (official):

./darknet detector test cfg/coco.data cfg/yolov4.cfg yolov4.weights image.jpg

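From Python, OpenCV's DNN module (4.4 or later) can run YOLOv4 directly from the Darknet files:

import cv2

# Load the same Darknet weights and config used by the CLI
model = cv2.dnn_DetectionModel('yolov4.weights', 'cfg/yolov4.cfg')
model.setInputParams(size=(608, 608), scale=1 / 255, swapRB=True)

# Inference: class ids, confidence scores, and (x, y, w, h) pixel boxes
image = cv2.imread('image.jpg')
class_ids, scores, boxes = model.detect(image, confThreshold=0.25, nmsThreshold=0.45)
for class_id, score, box in zip(class_ids, scores, boxes):
    print(class_id, score, box)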
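Under the hood, a detector like YOLOv4 predicts many overlapping boxes per object and prunes them with non-maximum suppression (NMS) based on intersection over union (IoU). Below is a minimal pure-Python sketch of plain greedy NMS; YOLOv4 itself uses the DIoU-NMS variant, and real implementations usually run NMS per class and vectorized, so treat this as an illustration of the idea only.

```python
def iou(a, b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.45):
    """Greedy NMS: visit boxes by descending score, keep those not overlapping a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with IoU of about 0.68, so it is suppressed; the third box does not overlap either and survives.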

Training Custom Objects

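Annotations use one .txt file per image (image.jpg → image.txt), with one line per object:

<class_id> <x_center> <y_center> <width> <height>

Box coordinates are relative to the image width and height, so every value lies between 0 and 1.

Then create a dataset config, data/obj.data:

classes = 3
train = data/train.txt
valid = data/valid.txt
names = data/obj.names
backup = backup/

Here obj.names lists one class name per line, and train.txt and valid.txt list one image path per line. In cfg/yolov4-custom.cfg, set classes=3 in each of the three [yolo] layers and filters=(classes + 5) * 3 = 24 in the [convolutional] layer directly before each [yolo] layer.

Train, starting from the pre-trained convolutional weights in yolov4.conv.137:

./darknet detector train data/obj.data cfg/yolov4-custom.cfg yolov4.conv.137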
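Many labeling tools export boxes as pixel corners (x_min, y_min, x_max, y_max), as in Pascal VOC. Here is a small sketch that converts one such box into a YOLO annotation line; the helper name is hypothetical and the corner-format input is an assumption about your labeling tool.

```python
def to_yolo_line(class_id, box, image_width, image_height):
    """Convert a pixel-space (x_min, y_min, x_max, y_max) box into a YOLO label line."""
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / image_width
    y_center = (y_min + y_max) / 2 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    # Six decimals is plenty of precision for normalized coordinates
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


print(to_yolo_line(0, (100, 50, 300, 250), 640, 480))  # 0 0.312500 0.312500 0.312500 0.416667
```

Write one such line per object into the image's .txt file.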

Choosing Between Models

Classification vs Detection

| Task | Best choice |
|---|---|
| Image classification | EfficientNet |
| Object detection | YOLOv4, EfficientDet |
| Instance segmentation | Mask R-CNN, YOLACT |
| Real-time detection | YOLOv4 |

Speed vs Accuracy

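Figure: speed vs. accuracy for EfficientDet-D7, Cascade R-CNN, YOLOv4, and YOLOv4-tiny.

YOLOv4-tiny sits at the fast, less accurate end and YOLOv4 in the real-time middle, while heavier detectors such as Cascade R-CNN and EfficientDet-D7 trade speed for higher accuracy.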

Practical Tips

Data Augmentation

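import albumentations as A
import numpy as np

# Written against the 2020-era albumentations API; newer releases
# take RandomResizedCrop(size=(224, 224)) instead of (height, width)
transform = A.Compose([
    A.RandomResizedCrop(224, 224),
    A.HorizontalFlip(p=0.5),
    A.ColorJitter(brightness=0.2, contrast=0.2),
    A.Normalize(),  # ImageNet mean/std by default
])

# Apply to an HxWx3 uint8 RGB image
image = np.zeros((256, 256, 3), dtype=np.uint8)  # stand-in for a real image
augmented = transform(image=image)['image']

Augmentation costs nothing at inference time and usually improves generalization, especially on small datasets.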
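For detection, geometric augmentations must move the boxes along with the pixels; albumentations can do this when Compose is given bbox_params=A.BboxParams(format='yolo', label_fields=[...]). To show what that involves, here is a minimal NumPy sketch of a horizontal flip on YOLO-format labels (the helper name is hypothetical):

```python
import numpy as np


def hflip_with_yolo_boxes(image, boxes):
    """Horizontally flip an HxWxC image together with its YOLO-format boxes.

    boxes: float array of shape (N, 5), rows [class_id, x_center, y_center, width, height],
    coordinates normalized to [0, 1]. Returns new arrays; inputs are not modified.
    """
    flipped_image = image[:, ::-1, :].copy()  # reverse the column (x) axis
    flipped_boxes = boxes.copy()
    flipped_boxes[:, 1] = 1.0 - flipped_boxes[:, 1]  # mirror x_center; y and sizes unchanged
    return flipped_image, flipped_boxes
```

Because YOLO coordinates are normalized, a horizontal flip only mirrors x_center; crops and resizes need more bookkeeping, which is why a library is usually the better choice.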

Transfer Learning

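import torch

# `model` is the EfficientNet-PyTorch model from above; its new head is `model._fc`

# Stage 1: freeze the pre-trained backbone and train only the head
for param in model.parameters():
    param.requires_grad = False
for param in model._fc.parameters():
    param.requires_grad = True
optimizer = torch.optim.Adam(model._fc.parameters(), lr=1e-3)

# ... train for a few epochs ...

# Stage 2: unfreeze everything and fine-tune with a much smaller learning rate
for param in model.parameters():
    param.requires_grad = True
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)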

Deployment

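import torch
from torch2trt import torch2trt  # NVIDIA-AI-IOT/torch2trt; needs a CUDA GPU with TensorRT

# EfficientNet-PyTorch: switch to the plain Swish, which export tools can trace
model.set_swish(memory_efficient=False)
model = model.eval().cuda()
dummy_input = torch.randn(1, 3, 224, 224).cuda()  # must match the model's input size

# TensorRT for NVIDIA GPUs
model_trt = torch2trt(model, [dummy_input])
torch.save(model_trt.state_dict(), 'model_trt.pth')

# ONNX for cross-platform deployment
torch.onnx.export(model, dummy_input, 'model.onnx', opset_version=11)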

What’s Next?

2020 trends continuing:

Final Thoughts

EfficientNet and YOLOv4 represent the 2020 state of the art. But the field moves fast.

Start with pre-trained models. Fine-tune for your task. Optimize for your deployment target. The fundamentals haven’t changed; only the architectures keep getting better.


Stand on the shoulders of giants. Fine-tune, don’t train from scratch.
