
How to Train YOLOv5 with Custom Data
Training YOLOv5 with custom datasets transforms the general-purpose object detection model into a specialized tool tailored for your specific use case. Whether you’re building surveillance systems that detect specific vehicles, agricultural tools that identify crop diseases, or retail applications that recognize particular products, custom-trained YOLO models deliver significantly better accuracy than generic pre-trained models. This guide walks you through the complete process of preparing data, configuring training parameters, running the training pipeline, and optimizing your custom YOLOv5 model for production deployment.
How YOLOv5 Custom Training Works
YOLOv5 uses transfer learning to adapt weights pre-trained on the COCO dataset to your custom classes. The architecture consists of a backbone (CSPDarknet53), a neck (PANet), and a head (the YOLO detection layers) that together predict bounding boxes and class probabilities. During custom training, the detection head's output layers are rebuilt to match your number of classes, while the earlier layers retain general features learned from COCO, such as edges, shapes, and textures.
The training process involves feeding annotated images through the network, computing losses for box coordinates, objectness scores, and class predictions, then backpropagating gradients to update model weights. YOLOv5 implements several advanced techniques including mosaic augmentation, CIoU loss, and genetic algorithm hyperparameter evolution to improve training efficiency and final model performance.
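In simplified terms, the overall training objective is a weighted sum of those three components, with the weights supplied by the box, obj, and cls gains in the hyperparameter file shown later in this guide (exact scaling details vary slightly between YOLOv5 releases):
total_loss ≈ box · L_CIoU + obj · L_objectness + cls · L_classification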
Step-by-Step Implementation Guide
Start by setting up your development environment with the required dependencies:
git clone https://github.com/ultralytics/yolov5
cd yolov5
pip install -r requirements.txt
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
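Before launching a long training run, it is worth confirming that PyTorch can actually see your GPU. A quick sanity check, assuming the installation above completed successfully:
import torch

print(torch.__version__)                  # installed PyTorch version
print(torch.cuda.is_available())          # True when a CUDA-capable GPU is visible
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the first GPU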
Create your dataset directory structure following YOLOv5 conventions:
custom_dataset/
├── images/
│   ├── train/
│   ├── val/
│   └── test/
└── labels/
    ├── train/
    ├── val/
    └── test/
Prepare your annotations in YOLO format where each text file contains one line per object:
# Format: class_id center_x center_y width height (normalized 0-1)
0 0.5 0.3 0.2 0.4
1 0.7 0.6 0.15 0.25
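If your existing annotations use absolute pixel coordinates (for example Pascal VOC-style xmin/ymin/xmax/ymax boxes), they must be converted to this normalized center format first. A minimal sketch of that conversion, where the function name and arguments are illustrative rather than part of YOLOv5:
def voc_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    # Convert absolute corner coordinates to normalized center/size values
    center_x = (xmin + xmax) / 2.0 / img_w
    center_y = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return center_x, center_y, width, height

# A 200x400 px box with its top-left corner at (100, 120) in a 1000x1000 px image
# yields (0.2, 0.32, 0.2, 0.4), written as "0 0.2 0.32 0.2 0.4" for class 0
print(voc_to_yolo(100, 120, 300, 520, 1000, 1000))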
Create a dataset configuration file (dataset.yaml):
path: /path/to/custom_dataset
train: images/train
val: images/val
test: images/test
nc: 3 # number of classes
names: ['class1', 'class2', 'class3']
Launch training with appropriate parameters for your hardware and dataset size:
python train.py --img 640 --batch 16 --epochs 100 --data dataset.yaml --weights yolov5s.pt --cache
For larger datasets or production models, consider using more powerful base models and longer training:
python train.py --img 1280 --batch 8 --epochs 300 --data dataset.yaml --weights yolov5x.pt --device 0,1 --multi-scale
Real-World Examples and Use Cases
A manufacturing company successfully deployed YOLOv5 for quality control by training on 15,000 images of circuit boards with defect annotations. Their custom model achieved 94.3% mAP@0.5 compared to 31% with the pre-trained COCO model. The training configuration used YOLOv5l as the base model with 200 epochs and heavy augmentation:
python train.py --img 832 --batch 12 --epochs 200 --data pcb_defects.yaml --weights yolov5l.pt --hyp hyp.finetune.yaml
An agricultural startup trained YOLOv5 on drone imagery to detect pest damage across 50,000 crop field images. They implemented a multi-stage training approach, first training on general crop features for 100 epochs, then fine-tuning on pest-specific annotations:
# Stage 1: General crop detection
python train.py --img 640 --batch 24 --epochs 100 --data crops_general.yaml --weights yolov5m.pt
# Stage 2: Pest-specific fine-tuning
python train.py --img 640 --batch 24 --epochs 150 --data pest_damage.yaml --weights runs/train/exp/weights/best.pt --freeze 10
Security applications benefit significantly from custom training. A retail chain trained YOLOv5 on 25,000 CCTV frames to detect shoplifting behaviors, achieving real-time detection at 45 FPS on NVIDIA RTX 3070 hardware with custom data augmentation techniques.
Performance Comparisons and Model Selection
Model | Parameters (M) | FLOPs (G) | Speed V100 b1 (ms) | COCO mAP@0.5 | Best Use Case
---|---|---|---|---|---
YOLOv5n | 1.9 | 4.5 | 6.3 | 45.7 | Mobile/edge devices
YOLOv5s | 7.2 | 16.5 | 6.4 | 56.8 | Balanced speed/accuracy
YOLOv5m | 21.2 | 49.0 | 8.2 | 64.1 | Production systems
YOLOv5l | 46.5 | 109.1 | 10.1 | 67.3 | High accuracy needs
YOLOv5x | 86.7 | 205.7 | 12.1 | 68.9 | Maximum accuracy
Dataset size also influences which model and training schedule make sense:
Dataset Size | Recommended Model | Training Epochs | Expected mAP Improvement | Training Time (V100)
---|---|---|---|---
< 1,000 images | YOLOv5s | 100-150 | 15-25% | 2-4 hours
1,000-5,000 images | YOLOv5m | 150-250 | 25-40% | 8-12 hours
5,000-20,000 images | YOLOv5l | 200-300 | 40-60% | 1-2 days
> 20,000 images | YOLOv5x | 300-500 | 60-80% | 3-5 days
Advanced Training Techniques and Optimizations
Implement progressive resizing to improve training efficiency and final accuracy. Start with smaller image sizes and gradually increase resolution:
# Phase 1: Lower resolution for faster initial learning
python train.py --img 416 --batch 32 --epochs 50 --data dataset.yaml --weights yolov5m.pt --name phase1
# Phase 2: Medium resolution for detail refinement
python train.py --img 640 --batch 16 --epochs 100 --data dataset.yaml --weights runs/train/phase1/weights/best.pt --name phase2
# Phase 3: Full resolution for final optimization
python train.py --img 832 --batch 8 --epochs 50 --data dataset.yaml --weights runs/train/phase2/weights/best.pt --name final
Custom hyperparameter optimization delivers substantial performance improvements. Create a custom hyperparameter file (hyp.custom.yaml) based on your dataset characteristics:
lr0: 0.01  # initial learning rate
lrf: 0.2  # final learning rate fraction (lr0 * lrf)
momentum: 0.937  # SGD momentum / Adam beta1
weight_decay: 0.0005  # optimizer weight decay
warmup_epochs: 3.0  # warmup epochs
warmup_momentum: 0.8  # warmup initial momentum
warmup_bias_lr: 0.1  # warmup initial bias learning rate
box: 0.05  # box loss gain
cls: 0.5  # classification loss gain
cls_pw: 1.0  # classification BCELoss positive weight
obj: 1.0  # objectness loss gain
obj_pw: 1.0  # objectness BCELoss positive weight
iou_t: 0.20  # IoU training threshold
anchor_t: 4.0  # anchor-multiple threshold
fl_gamma: 0.0  # focal loss gamma (0 disables focal loss)
hsv_h: 0.015  # HSV hue augmentation (fraction)
hsv_s: 0.7  # HSV saturation augmentation (fraction)
hsv_v: 0.4  # HSV value augmentation (fraction)
degrees: 0.0  # rotation (+/- degrees)
translate: 0.1  # translation (+/- fraction)
scale: 0.9  # scale (+/- gain)
shear: 0.0  # shear (+/- degrees)
perspective: 0.0  # perspective (+/- fraction)
flipud: 0.0  # vertical flip probability
fliplr: 0.5  # horizontal flip probability
mosaic: 1.0  # mosaic augmentation probability
mixup: 0.15  # mixup augmentation probability
Use the custom hyperparameters with genetic algorithm evolution for automatic optimization:
python train.py --img 640 --batch 16 --epochs 300 --data dataset.yaml --weights yolov5m.pt --hyp hyp.custom.yaml --evolve 50
Common Issues and Troubleshooting
GPU memory issues are the most frequent problem during training. If you encounter CUDA out-of-memory errors, reduce the batch size; YOLOv5 automatically accumulates gradients toward a nominal batch size of 64, so a smaller --batch value still preserves the effective batch size (with --batch 4, gradients are accumulated over 16 steps before each optimizer update):
# Reduce the per-step batch size; gradient accumulation happens automatically
python train.py --img 640 --batch 4 --epochs 100 --data dataset.yaml --weights yolov5m.pt
Poor convergence often results from inappropriate learning rates or insufficient data augmentation. Monitor training metrics and adjust accordingly:
- If loss plateaus early: Reduce learning rate by 10x and increase epochs
- If loss oscillates wildly: Lower learning rate and momentum
- If validation mAP drops while training improves: Increase data augmentation strength
- If small objects aren’t detected well: Increase input image resolution and use multi-scale training
Class imbalance severely impacts model performance. YOLOv5 has no dedicated focal-loss flag; instead, set fl_gamma above 0 (for example 1.5) in a custom hyperparameter file and enable weighted image sampling with --image-weights:
# hyp.focal.yaml is a copy of your hyperparameter file with fl_gamma raised above 0
python train.py --img 640 --batch 16 --epochs 200 --data dataset.yaml --weights yolov5m.pt --hyp hyp.focal.yaml --image-weights
Annotation quality issues cause training instability. After an initial training run, sanity-check your dataset with YOLOv5's built-in scripts by validating the trained weights and visually inspecting predictions on the validation images:
# Run validation to report per-class metrics and flag classes with unusually low recall
python val.py --data dataset.yaml --weights runs/train/exp/weights/best.pt
# Visualize predictions on the validation images to spot labeling problems
python detect.py --weights runs/train/exp/weights/best.pt --source custom_dataset/images/val --save-txt --save-conf
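For a deeper check that does not require training first, a short script can scan the label files directly for malformed rows, out-of-range coordinates, or class ids outside your declared range. This is a minimal sketch; the directory path and NC value are assumptions to adapt to your dataset:
from pathlib import Path

NC = 3  # number of classes declared in dataset.yaml
label_dir = Path("custom_dataset/labels/train")

for label_file in label_dir.glob("*.txt"):
    for line_no, line in enumerate(label_file.read_text().splitlines(), start=1):
        parts = line.split()
        if len(parts) != 5:
            print(f"{label_file}:{line_no}: expected 5 fields, got {len(parts)}")
            continue
        cls, *coords = parts
        if not 0 <= int(cls) < NC:
            print(f"{label_file}:{line_no}: class id {cls} out of range")
        if any(not 0.0 <= float(v) <= 1.0 for v in coords):
            print(f"{label_file}:{line_no}: coordinates not normalized to 0-1")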
Best Practices and Production Deployment
Data preparation significantly impacts final model quality. Follow these guidelines for optimal results:
- Maintain consistent annotation quality across all team members using detailed annotation guidelines
- Include diverse lighting conditions, angles, and backgrounds in your training set
- Keep validation and test sets completely separate with no data leakage (see the overlap check after this list)
- Aim for at least 100-200 examples per class for basic functionality, 1000+ for production quality
- Use hard negative mining by including challenging backgrounds without target objects
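One simple way to enforce the no-leakage guideline is to verify that no file appears in more than one split. A minimal sketch, assuming the directory layout shown earlier and file names that are unique across the dataset:
from pathlib import Path

root = Path("custom_dataset/images")
splits = {s: {p.name for p in (root / s).glob("*")} for s in ("train", "val", "test")}

# Report file names shared between splits, a common source of data leakage
for a in splits:
    for b in splits:
        if a < b and (overlap := splits[a] & splits[b]):
            print(f"{len(overlap)} files in both {a} and {b}, e.g. {sorted(overlap)[:3]}")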
Model optimization for deployment requires several post-training steps. Export your trained model to various formats for different deployment scenarios:
# Export to ONNX for cross-platform deployment
python export.py --weights runs/train/exp/weights/best.pt --include onnx --img 640
# Export to TensorRT for NVIDIA GPU inference acceleration
python export.py --weights runs/train/exp/weights/best.pt --include engine --img 640 --device 0
# Export to CoreML for iOS deployment
python export.py --weights runs/train/exp/weights/best.pt --include coreml --img 640
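After exporting, load the exported model once and push a dummy input through it to confirm the file is usable before shipping it. A minimal check for the ONNX export, assuming onnxruntime is installed and the default 640x640 input size:
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("runs/train/exp/weights/best.onnx")
input_name = session.get_inputs()[0].name

# Dummy normalized image batch shaped (batch, channels, height, width)
dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)
outputs = session.run(None, {input_name: dummy})
print([o.shape for o in outputs])  # raw predictions, prior to non-max suppression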
Implement model versioning and performance monitoring in production environments. Track key metrics like inference time, accuracy degradation, and edge case failures:
import os
import time
import torch

def benchmark_model(model_path, test_images, conf_threshold=0.25):
    # Load the custom checkpoint through the YOLOv5 hub interface, which
    # wraps preprocessing and non-max suppression around the raw model
    model = torch.hub.load('ultralytics/yolov5', 'custom', path=model_path)
    model.conf = conf_threshold  # confidence threshold applied during NMS
    model.eval()

    inference_times = []
    with torch.no_grad():
        for img in test_images:  # file paths, URLs, PIL images, or numpy arrays
            start_time = time.time()
            model(img)
            inference_times.append(time.time() - start_time)

    return {
        'avg_inference_time': sum(inference_times) / len(inference_times),
        'fps': len(inference_times) / sum(inference_times),
        'model_size_mb': os.path.getsize(model_path) / (1024 * 1024),
    }
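Called with a handful of representative validation images (the paths below are placeholders), this returns a small dictionary you can log alongside each model version:
results = benchmark_model('runs/train/exp/weights/best.pt',
                          ['custom_dataset/images/val/sample1.jpg',
                           'custom_dataset/images/val/sample2.jpg'])
print(results)  # {'avg_inference_time': ..., 'fps': ..., 'model_size_mb': ...}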
Consider implementing A/B testing frameworks to compare model versions in production environments. Deploy new models to a subset of traffic first, monitor performance metrics, and gradually roll out successful improvements.
For comprehensive documentation and advanced techniques, refer to the official YOLOv5 documentation and explore the extensive community wiki for troubleshooting specific deployment scenarios.
