Training YOLOv8 on a Custom Dataset
A brief history of the YOLO algorithm
YOLO (You Only Look Once) is a state-of-the-art, real-time object detection and image segmentation model developed by Joseph Redmon and Ali Farhadi at the University of Washington; the first version of YOLO was released in 2015.
YOLOv2 was released in 2016 and improved the original model by incorporating batch normalization, anchor boxes, and dimension clusters.
The next version, YOLOv3, was released in 2018 and further improved the model's performance by using a more efficient backbone network, adding a feature pyramid, and making use of focal loss.
Version YOLOv4, released in 2020, introduced a number of innovations such as the use of Mosaic data augmentation, a new anchor-free detection head, and a new loss function.
The next version, YOLOv5, improved the model's performance further and added new features such as support for panoptic segmentation and object tracking.
YOLOv7 was released in 2022 and brought further improvements to the algorithm.
The latest version, YOLOv8, was released in January 2023 and, like every previous version, improves the algorithm's accuracy. YOLOv8 is designed with extensibility in mind: it is built as a framework that supports all previous versions of YOLO, making it easy to switch between different versions and compare their performance.
Install
There are two ways to install YOLOv8. It can be installed as a package:
# pip install ultralytics
or it can be cloned from GitHub and installed in editable mode:
# git clone https://github.com/ultralytics/ultralytics
# cd ultralytics
# pip install -e '.[dev]'
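A quick way to verify the installation is to import the package and print its version (a minimal sketch):
# Check that the ultralytics package can be imported and print its version
import ultralytics
print(ultralytics.__version__)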
Training YOLOv8 on a Custom Dataset
As already mentioned, this tutorial guides you through training the latest version, YOLOv8, on a custom dataset. For this purpose, the Rocket Dataset is used. You can download the dataset with:
# git clone https://github.com/sunn-e/Google-Image-Downloader-Rocket-Dataset
# cd Google-Image-Downloader-Rocket-Dataset
In order to train the algorithm on this dataset, the dataset was labeled using Label Studio.

The dataset was exported in YOLO format.
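In YOLO format, each image gets a .txt label file with one line per annotated object: the class index followed by the normalized center x, center y, width and height of the bounding box (all values between 0 and 1). A hypothetical label line for a rocket, assuming the class ordering defined later in rocket_dataset.yaml (where 3 is Rocket), could look like this:
3 0.512 0.430 0.210 0.640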

In order to prepare the dataset for training, the following Python split script is used:
import os
import shutil
from sklearn.model_selection import train_test_split

# Utility function to move images/labels to their destination folder
def move_files_to_folder(list_of_files, destination_folder):
    for f in list_of_files:
        try:
            shutil.move(f, destination_folder)
        except:
            print(f)
            assert False

if __name__ == '__main__':
    # Read images and annotations
    images = [os.path.join('images', x) for x in os.listdir('images') if x[-3:] == "jpg"]
    labels = [os.path.join('labels', x) for x in os.listdir('labels') if x[-3:] == "txt"]
    print(f'Images size {len(images)}')
    print(f'Labels size {len(labels)}')
    images.sort()
    labels.sort()

    # Split the dataset into train-valid-test splits (80% / 10% / 10%)
    train_images, val_images, train_labels, val_labels = train_test_split(images, labels, test_size=0.2, random_state=1)
    val_images, test_images, val_labels, test_labels = train_test_split(val_images, val_labels, test_size=0.5, random_state=1)

    # Move the splits into their folders
    move_files_to_folder(train_images, 'images/train/')
    move_files_to_folder(val_images, 'images/val/')
    move_files_to_folder(test_images, 'images/test/')
    move_files_to_folder(train_labels, 'labels/train/')
    move_files_to_folder(val_labels, 'labels/val/')
    move_files_to_folder(test_labels, 'labels/test/')
This script separates the images and labels into train, val and test subdirectories. Note that the destination subfolders (images/train, images/val, images/test and the corresponding labels folders) must exist before the script is run; a helper for that is sketched below.
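A minimal sketch for creating those subfolders beforehand (run from the dataset root, the same assumption the split script makes):
import os
# Create the train/val/test subfolders for both images and labels if they are missing
for parent in ('images', 'labels'):
    for split in ('train', 'val', 'test'):
        os.makedirs(os.path.join(parent, split), exist_ok=True)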
Copy the dataset (in my case rocket_dataset) into the root yolov8 directory.
Add a ./ultralytics/yolo/data/datasets/rocket_dataset.yaml file with the following content:
# Ultralytics YOLO 🚀, GPL-3.0 license
# Example usage: python train.py
# Expected directory layout:
# yolov8
# ├── ultralytics
# │   └── yolo
# │       └── data
# │           └── datasets
# │               └── rocket_dataset.yaml
# └── rocket_dataset
#     ├── images
#     └── labels

# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: ../yolov8/rocket_dataset  # dataset root dir
train: images/train  # train images (relative to 'path')
test: images/test  # test images (optional)
val: images/val  # val images (relative to 'path')

# Classes
names:
  0: Airplane
  1: Drone
  2: Helicopter
  3: Rocket
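As a quick sanity check (a minimal sketch; adjust root to wherever your dataset actually lives), you can verify that the split folders referenced by the YAML exist:
import os
# Root directory referenced by the 'path' entry in rocket_dataset.yaml
root = '../yolov8/rocket_dataset'
for split in ('train', 'val', 'test'):
    folder = os.path.join(root, 'images', split)
    print(folder, '->', 'OK' if os.path.isdir(folder) else 'MISSING')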
Training can be triggered with the following Python script:
from ultralytics import YOLO
# Load a model
model = YOLO("yolov8n.pt") # load a pretrained model (recommended for training)
# Train the model
results = model.train(data="rocket_dataset.yaml", epochs=100, imgsz=640)
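Training writes its outputs (weights, metric plots, logs) into a run directory, by default something like runs/detect/trainN, and the best checkpoint is stored as runs/detect/trainN/weights/best.pt; that is the file loaded in the following steps. A small sketch (assuming the default output location) to locate the most recent one:
import glob
import os
# Find the most recently written best.pt under the default runs/detect output location
candidates = glob.glob('runs/detect/train*/weights/best.pt')
print(max(candidates, key=os.path.getmtime) if candidates else 'no trained weights found yet')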
Validation can be triggered with the following Python script:
from ultralytics import YOLO
# Load a model
model = YOLO("./runs/detect/train16/weights/best.pt") # load a pretrained model (recommended for training)
# Train the model
results = model.val()
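The object returned by model.val() exposes the aggregate detection metrics; a minimal sketch of reading them, continuing from the script above (attribute names as in the Ultralytics metrics API, worth checking against your installed version):
# Inspect the aggregate detection metrics from the validation run
print(results.box.map)    # mAP@0.5:0.95
print(results.box.map50)  # mAP@0.5
print(results.box.map75)  # mAP@0.75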
And the last step (the most interesting one): prediction and display of the results can be triggered with the following Python script for a video:
from ultralytics import YOLO
# Load a model
model = YOLO("./runs/detect/train16/weights/best.pt") # load a pretrained model (recommended for training)
# Predict (detect objects) by using the model
results = model.predict(source="rocket_launch.mp4", show=True)
Prediction on a single image can be run with the following script:
from ultralytics import YOLO
from PIL import Image
# Load a model
model = YOLO("./runs/detect/train16/weights/best.pt") # load a pretrained model (recommended for training)
# Train the model
im = Image.open("rocket_launch.jpg")
results = model.predict(source=im, save=True)
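The returned results can also be inspected programmatically instead of only saving the annotated image; a small sketch continuing from the script above (attribute names as in the Ultralytics results API):
# Print class id, confidence and bounding box coordinates for each detection
for r in results:
    for box in r.boxes:
        print(int(box.cls), float(box.conf), box.xyxy.tolist())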

Enjoy using the YOLOv8 model!