Object detection in WOD Lidar BEV image using YOLOv8

YOLOv8, developed by Ultralytics, is the latest version of the YOLO (You Only Look Once) object detection and image segmentation model.

WOD (Waymo Open Dataset) is a comprehensive self-driving dataset provided by Waymo, and its Perception dataset is composed of high-resolution sensor data and labels for 2,030 scenes. The new WOD v2 format is based on the Apache Parquet column-oriented file format. It separates the data into multiple tables, allowing you to selectively download only the portion of the dataset needed for a specific use case. This modular format offers a significant advantage over the previous format by reducing the amount of data that needs to be downloaded and processed, saving time and resources.
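
For example, once a single component table has been downloaded, it can be read on its own with Dask without touching the rest of the dataset. A minimal sketch (the local directory layout and file name below are placeholders):

import dask.dataframe as dd

# Read only the lidar_box component of one downloaded segment (hypothetical path)
lidar_box_df = dd.read_parquet('datasets/wod/data/training/lidar_box/<context_name>.parquet')

# Each row is keyed by segment context name, frame timestamp and laser object id
print(lidar_box_df.columns)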

This tutorial describes how to train YOLOv8 on the WOD v2 dataset to detect objects in BEV (bird's-eye view) images created from the Lidar point cloud.

There are two options for training YOLOv8 on the WOD v2 dataset: either read the data and convert it into the YOLOv8 directory structure with images and labels sub-directories (sketched just below), or modify the YOLOv8 code so that it reads data directly from the WOD dataset and trains the model on it.
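
With option one, each BEV image would be rendered to disk and paired with a YOLO-format label file containing one normalized 'class x_center y_center width height' line per box. A minimal sketch of such a label writer (the function name is only illustrative):

def write_yolo_label(label_path, clss, bboxes):
    """Write one YOLO-format label file: one 'cls x y w h' line per box, normalized to [0, 1]."""
    with open(label_path, 'w') as f:
        for cls, (x, y, w, h) in zip(clss, bboxes):
            f.write(f'{int(cls)} {x:.6f} {y:.6f} {w:.6f} {h:.6f}\n')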

In this tutorial the second option is taken: the YOLOv8 code is modified so that it reads the data directly into memory and trains the model from it.

First of all, the new WOD Perception dataset files need to be downloaded, and a wod.yaml configuration file needs to be created:

# Ultralytics YOLO 🚀, GPL-3.0 license
# Example usage: python train.py
# ultralytics
#     ├── ultralytics
#      |       └── cfg
#      |            └── datasets
#      |                  └── wod.yaml
#      └── datasets
#             └── wod
#                 ├── config
#                 ├── data
#                 └── utils


path: wod/data  # dataset root dir
train: training # train wod subdirectory
test: training # test wod subdirectory
val: validation # val wod subdirectory


# Classes
names:
 0: none
 1: vehicle
 2: pedestrian
 3: sign
 4: cyclist


# Download script/URL (optional)
#download: https://ultralytics.com/assets/wod.zip

Modify

~/ultralytics/ultralytics/models/yolo/detect/__init__.py

script to add 'WodDetectionTrainer' and 'WodDetectionValidator':

# Ultralytics YOLO 🚀, AGPL-3.0 license

from .predict import DetectionPredictor
from .train import DetectionTrainer, WodDetectionTrainer
from .val import DetectionValidator, WodDetectionValidator

__all__ = 'DetectionPredictor', 'DetectionTrainer', 'DetectionValidator', 'WodDetectionTrainer', 'WodDetectionValidator'

Modify

~/ultralytics/ultralytics/data/__init__.py

script to add 'WodDataset' and 'build_wod_dataset':

# Ultralytics YOLO 🚀, AGPL-3.0 license

from .base import BaseDataset
from .build import build_dataloader, build_yolo_dataset, build_wod_dataset, load_inference_source
from .dataset import ClassificationDataset, SemanticDataset, YOLODataset, WodDataset

__all__ = ('BaseDataset', 'ClassificationDataset', 'SemanticDataset', 'YOLODataset', 'WodDataset', 'build_yolo_dataset',
           'build_wod_dataset',
           'build_dataloader', 'load_inference_source')

Modify

~/ultralytics/ultralytics/models/yolo/model.py

script to modify the 'detect' task configuration so that the WOD trainer is used (the WOD validator is returned by the trainer's get_validator method shown later):

'detect': {
   'model': DetectionModel,
   'trainer': yolo.detect.WodDetectionTrainer,
   'validator': yolo.detect.DetectionValidator,
   'predictor': yolo.detect.DetectionPredictor, },

To add the WOD dataset builder, the

~/ultralytics/ultralytics/data/build.py

script needs to be modified:

def build_wod_dataset(cfg, img_path, batch, data, mode='train', rect=False, stride=32):
   """Build WOD Dataset"""
   return WodDataset(
       mode=mode,
       img_path=img_path,
       imgsz=cfg.imgsz,
       batch_size=batch,
       # augment=mode == 'train',  # augmentation
       augment=False,  # augmentation
       hyp=cfg,  # TODO: probably add a get_hyps_from_cfg function
       rect=cfg.rect or rect,  # rectangular batches
       cache=cfg.cache or None,
       single_cls=cfg.single_cls or False,
       stride=int(stride),
       pad=0.0 if mode == 'train' else 0.5,
       prefix=colorstr(f'{mode}: '),
       use_segments=cfg.task == 'segment',
       use_keypoints=cfg.task == 'pose',
       classes=cfg.classes,
       data=data,
       fraction=cfg.fraction if mode == 'train' else 1.0)

The WOD dataset implementation is added to the

~/ultralytics/ultralytics/data/dataset.py

script file:

class WodDataset(BaseDataset):
   """
    Dataset class for loading Waymo Open Dataset (WOD) Lidar frames and object detection labels in YOLO format.

   Args:
       data (dict, optional): A dataset YAML dictionary. Defaults to None.
       use_segments (bool, optional): If True, segmentation masks are used as labels. Defaults to False.
       use_keypoints (bool, optional): If True, keypoints are used as labels. Defaults to False.

   Returns:
       (torch.utils.data.Dataset): A PyTorch dataset object that can be used for training an object detection model.
   """
   cache_version = '1.0.2'  # dataset labels *.cache version, >= 1.0.0 for YOLOv8
   rand_interp_methods = [cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4]

   def __init__(self, *args, data=None, use_segments=False, use_keypoints=False, mode='', **kwargs):
       self.cfg = config.load()
       self.use_segments = use_segments
       self.use_keypoints = use_keypoints
       self.data = data
       # self.dataset_dir = str(data.get('path')) + "/training"
       self.dataset_dir = str(data.get(mode))
       self.context_names = self.get_context_names(self.dataset_dir)
       self.laser_name = 1
       assert not (self.use_segments and self.use_keypoints), 'Can not use both segments and keypoints.'
       super().__init__(*args, **kwargs)

   def get_context_names(self, dataset_dir):
       # return [os.path.splitext(os.path.basename(name))[0][len("training_lidar_"):]
       #         for name in glob.glob(dataset_dir + "/lidar/*.*")]
       return [os.path.splitext(os.path.basename(name))[0] for name in glob.glob(dataset_dir + "/lidar/*.*")]

   def get_img_files(self, dataset_dir):
       """Read lidar file names"""
       try:
           context_names = []  # image files
           im_files = []
           for p in dataset_dir if isinstance(dataset_dir, list) else [dataset_dir]:
               p = Path(p)  # os-agnostic
               if p.is_dir():  # dir
                   # context_names = [os.path.splitext(os.path.basename(name))[0][len("training_lidar_"):] for name in
                   #      glob.glob(str(p / 'lidar' / '*.*'), recursive=True)]
                   context_names = [os.path.splitext(os.path.basename(name))[0] for name in
                                    glob.glob(str(p / 'lidar' / '*.*'), recursive=True)]
               else:
                   raise FileNotFoundError(f'{self.prefix}{p} does not exist')
           ##
           for context_name in context_names:
               lidar_lidar_box_df = wod_reader.read_lidar_lidar_box_df(dataset_dir, context_name, self.laser_name)

               for i, (_, r) in enumerate(lidar_lidar_box_df.iterrows()):
                   lidar = v2.LiDARComponent.from_dict(r)
                   im_files.append(lidar.key.segment_context_name + "#" + str(lidar.key.laser_name) + "#" + str(
                       lidar.key.frame_timestamp_micros))
           ##
           assert im_files, f'{self.prefix}No images found'
       except Exception as e:
           raise FileNotFoundError(f'{self.prefix}Error loading data from {dataset_dir}/lidar\n{HELP_URL}') from e
       if self.fraction < 1:
           im_files = im_files[:round(len(im_files) * self.fraction)]
       return im_files

   def cache_images(self, cache):
       """Cache images to memory or disk."""
       b, gb = 0, 1 << 30  # bytes of cached images, bytes per gigabytes
       with ThreadPool(NUM_THREADS) as pool:
           total = len(self.context_names)
           pbar = TQDM(self.context_names, total=total, disable=LOCAL_RANK > 0)
           idx = 0
           for context_name in pbar:
               lidar_lidar_box_df = wod_reader.read_lidar_lidar_box_df(self.dataset_dir, context_name, self.laser_name)
               lidar_calibration_df = wod_reader.read_lidar_calibration_df(self.dataset_dir, context_name,
                                                                           self.laser_name)
               lidar_pose_df = wod_reader.read_lidar_pose_df(self.dataset_dir, context_name, self.laser_name)
               vehicle_pose_df = wod_reader.read_vehicle_pose_df(self.dataset_dir, context_name)

               df = lidar_lidar_box_df.merge(lidar_calibration_df)
               df = v2.merge(df, lidar_pose_df)
               df = v2.merge(df, vehicle_pose_df)

               for i, (_, r) in enumerate(df.iterrows()):
                   lidar = v2.LiDARComponent.from_dict(r)
                   lidar_calibration = v2.LiDARCalibrationComponent.from_dict(r)
                   lidar_pose = v2.LiDARPoseComponent.from_dict(r)
                   vehicle_pose = v2.VehiclePoseComponent.from_dict(r)

                   pcl = _lidar_utils.convert_range_image_to_point_cloud(lidar.range_image_return1, lidar_calibration,
                                                                         lidar_pose.range_image_return1, vehicle_pose)
                   bev_img = self.pcl_to_bev(pcl)
                   bev_img = cv2.rotate(bev_img, cv2.ROTATE_90_COUNTERCLOCKWISE)
                   # img = Image.fromarray(bev_img).convert('RGB')

                   if cache == 'disk':
                       b += self.npy_files[i].stat().st_size
                   else:  # 'ram'
                       self.ims[idx], self.im_hw0[idx], self.im_hw[idx] = bev_img, (640, 640), (
                           640, 640)  # im, hw_orig, hw_resized = load_image(self, i)
                       b += self.ims[idx].nbytes
                   idx += 1
                   pbar.desc = f'{self.prefix}Caching images ({b / gb:.1f}GB {cache})'
           pbar.close()

   def check_cache_ram(self, safety_margin=0.5):
       """Check image caching requirements vs available memory."""
       b, gb = 0, 1 << 30  # bytes of cached images, bytes per gigabytes
       n = min(self.ni, 30)  # extrapolate from 30 random images
       for _ in range(n):
           ratio = self.imgsz / max(640, 640)  # max(h, w)  # ratio
           b += 50 * 1024 * ratio ** 2
       mem_required = b * self.ni / n * (1 + safety_margin)  # GB required to cache dataset into RAM
       mem = psutil.virtual_memory()
       cache = mem_required < mem.available  # to cache or not to cache, that is the question
       if not cache:
           LOGGER.info(f'{self.prefix}{mem_required / gb:.1f}GB RAM required to cache images '
                       f'with {int(safety_margin * 100)}% safety margin but only '
                       f'{mem.available / gb:.1f}/{mem.total / gb:.1f}GB available, '
                       f"{'caching images ✅' if cache else 'not caching images ⚠️'}")
       return cache

   def pcl_to_bev(self, pcl):
       pcl_npa = pcl.numpy()
       mask = np.where((pcl_npa[:, 0] >= self.cfg.range_x[0]) & (pcl_npa[:, 0] <= self.cfg.range_x[1]) &
                       (pcl_npa[:, 1] >= self.cfg.range_y[0]) & (pcl_npa[:, 1] <= self.cfg.range_y[1]) &
                       (pcl_npa[:, 2] >= self.cfg.range_z[0]) & (pcl_npa[:, 2] <= self.cfg.range_z[1]))
       pcl_npa = pcl_npa[mask]

       # compute bev-map discretization by dividing x-range by the bev-image height
       bev_discrete = (self.cfg.range_x[1] - self.cfg.range_x[0]) / self.cfg.bev_height

        # create a copy of the lidar pcl and transform all metric x-coordinates into bev-image coordinates
       pcl_cpy = np.copy(pcl_npa)
       pcl_cpy[:, 0] = np.int_(np.floor(pcl_cpy[:, 0] / bev_discrete))

        # transform all metric y-coordinates as well but center the forward-facing x-axis in the middle of the image
       pcl_cpy[:, 1] = np.int_(np.floor(pcl_cpy[:, 1] / bev_discrete) + (self.cfg.bev_width + 1) / 2)

       # shift level of ground plane to avoid flipping from 0 to 255 for neighboring pixels
       pcl_cpy[:, 2] = pcl_cpy[:, 2] - self.cfg.range_z[0]

       # re-arrange elements in lidar_pcl_cpy by sorting first by x, then y, then by decreasing height
       idx_height = np.lexsort((-pcl_cpy[:, 2], pcl_cpy[:, 1], pcl_cpy[:, 0]))
       lidar_pcl_hei = pcl_cpy[idx_height]

       # extract all points with identical x and y such that only the top-most z-coordinate is kept (use numpy.unique)
       _, idx_height_unique = np.unique(lidar_pcl_hei[:, 0:2], axis=0, return_index=True)
       lidar_pcl_hei = lidar_pcl_hei[idx_height_unique]

       # assign the height value of each unique entry in lidar_top_pcl to the height map and
       # make sure that each entry is normalized on the difference between the upper and lower height defined in the config file
       height_map = np.zeros((self.cfg.bev_height + 1, self.cfg.bev_width + 1))
       height_map[np.int_(lidar_pcl_hei[:, 0]), np.int_(lidar_pcl_hei[:, 1])] = lidar_pcl_hei[:, 2] / float(
           np.abs(self.cfg.range_z[1] - self.cfg.range_z[0]))

       # sort points such that in case of identical BEV grid coordinates, the points in each grid cell are arranged based on their intensity
       pcl_cpy[pcl_cpy[:, 2] > 1.0, 2] = 1.0
       idx_intensity = np.lexsort((-pcl_cpy[:, 2], pcl_cpy[:, 1], pcl_cpy[:, 0]))
       pcl_cpy = pcl_cpy[idx_intensity]

       # only keep one point per grid cell
       _, indices = np.unique(pcl_cpy[:, 0:2], axis=0, return_index=True)
       lidar_pcl_int = pcl_cpy[indices]

       # create the intensity map
       intensity_map = np.zeros((self.cfg.bev_height + 1, self.cfg.bev_width + 1))
       intensity_map[np.int_(lidar_pcl_int[:, 0]), np.int_(lidar_pcl_int[:, 1])] = lidar_pcl_int[:, 2] / (
               np.amax(lidar_pcl_int[:, 2]) - np.amin(lidar_pcl_int[:, 2]))

       # Compute density layer of the BEV map
       density_map = np.zeros((self.cfg.bev_height + 1, self.cfg.bev_width + 1))
       _, _, counts = np.unique(lidar_pcl_int[:, 0:2], axis=0, return_index=True, return_counts=True)
       normalized_counts = np.minimum(1.0, np.log(counts + 1) / np.log(64))
       density_map[np.int_(lidar_pcl_int[:, 0]), np.int_(lidar_pcl_int[:, 1])] = normalized_counts

       bev_map = np.zeros((3, self.cfg.bev_height, self.cfg.bev_width))
       bev_map[2, :, :] = density_map[:self.cfg.bev_height, :self.cfg.bev_width]  # r_map
       bev_map[1, :, :] = height_map[:self.cfg.bev_height, :self.cfg.bev_width]  # g_map
       bev_map[0, :, :] = intensity_map[:self.cfg.bev_height, :self.cfg.bev_width]  # b_map

       bev_map = (np.transpose(bev_map, (1, 2, 0)) * 255).astype(np.uint8)

       return bev_map

   def crop_bbox(self, x, y, size_x, size_y):
       if x - size_x / 2 < 0:
           if x < 0:
               size_x = x + size_x / 2
           if x > 0:
               size_x = (size_x / 2 - x) + size_x / 2
           x = size_x / 2
       if x + size_x / 2 > self.cfg.bev_width:
           if x < self.cfg.bev_width:
               size_x = (self.cfg.bev_width - x) + size_x / 2
           if x > self.cfg.bev_width:
               size_x = size_x / 2 - (x - self.cfg.bev_width)
           x = self.cfg.bev_width - size_x / 2
       if y - size_y / 2 < 0:
           if y < 0:
               size_y = y + size_y / 2
           if y > 0:
               size_y = (size_y / 2 - y) + size_y / 2
           y = size_y / 2
       if y + size_y / 2 > self.cfg.bev_height:
           if y < self.cfg.bev_height:
               size_y = (self.cfg.bev_height - y) + size_y / 2
           if y > self.cfg.bev_height:
               size_y = size_y / 2 - (y - self.cfg.bev_height)
           y = self.cfg.bev_height - size_y / 2

       return x, y, size_x, size_y

   def create_label(self, lidar, lidar_box):
       discrete = (self.cfg.range_x[1] - self.cfg.range_x[0]) / self.cfg.bev_width
       bboxes = []
       clss = []

       for i, (object_id, object_type, x, size_x, y, size_y, yaw) in enumerate(zip(
               lidar_box.key.laser_object_id, lidar_box.type, lidar_box.box.center.x, lidar_box.box.size.x,
               lidar_box.box.center.y,
               lidar_box.box.size.y, lidar_box.box.heading
       )):
           x = x / discrete
           y = (-y / discrete) + self.cfg.bev_width / 2

           size_x = size_x / discrete
           size_y = size_y / discrete

            if ((x + size_x / 2) < 0 or (x - size_x / 2) > self.cfg.bev_width) or (
                    (y + size_y / 2) < 0 or (y - size_y / 2) > self.cfg.bev_height):
               continue

           x, y, size_x, size_y = self.crop_bbox(x, y, size_x, size_y)

           bboxes.append([x / self.cfg.bev_width, y / self.cfg.bev_height, size_x / self.cfg.bev_width,
                          size_y / self.cfg.bev_height])
           clss.append([object_type])

       return dict(
           im_file=lidar.key.segment_context_name + "#" + str(lidar.key.laser_name) + "#" + str(
               lidar.key.frame_timestamp_micros),
           shape=(self.cfg.bev_width, self.cfg.bev_height),
           ori_shape=(self.cfg.bev_width, self.cfg.bev_height),
           resized_shape=(self.cfg.bev_width, self.cfg.bev_height),
           cls=np.array(clss) if len(clss) != 0 else np.array([[]]).reshape((0, 1)),  # n, 1
           bboxes=np.array(bboxes) if len(bboxes) != 0 else np.array([[]]).reshape((0, 4)),  # n, 4
           segments=[],
           # keypoints=None,
           normalized=True,
           bbox_format='xywh')

   def cache_labels(self, path=Path('./labels.cache')):
       """Cache dataset labels, check images and read shapes.
       Args:
           path (Path): path where to save the cache file (default: Path('./labels.cache')).
       Returns:
           (dict): labels.
       """
       x = {'labels': []}
       nm, nf, ne, nc, msgs = 0, 0, 0, 0, []  # number missing, found, empty, corrupt, messages
       desc = f'{self.prefix}Scanning {path.parent}/lidar...'
       total = len(self.context_names)
       nkpt, ndim = self.data.get('kpt_shape', (0, 0))
       if self.use_keypoints and (nkpt <= 0 or ndim not in (2, 3)):
           raise ValueError("'kpt_shape' in data.yaml missing or incorrect. Should be a list with [number of "
                            "keypoints, number of dims (2 for x,y or 3 for x,y,visible)], i.e. 'kpt_shape: [17, 3]'")
       with ThreadPool(NUM_THREADS) as pool:
           pbar = TQDM(self.context_names, desc=desc, total=total)
           for context_name in pbar:
               lidar_lidar_box_df = wod_reader.read_lidar_lidar_box_df(self.dataset_dir, context_name, self.laser_name)
               lidar_calibration_df = wod_reader.read_lidar_calibration_df(self.dataset_dir, context_name,
                                                                           self.laser_name)
               lidar_pose_df = wod_reader.read_lidar_pose_df(self.dataset_dir, context_name, self.laser_name)
               vehicle_pose_df = wod_reader.read_vehicle_pose_df(self.dataset_dir, context_name)

               df = lidar_lidar_box_df.merge(lidar_calibration_df)
               df = v2.merge(df, lidar_pose_df)
               df = v2.merge(df, vehicle_pose_df)

               for i, (_, r) in enumerate(df.iterrows()):
                   lidar = v2.LiDARComponent.from_dict(r)
                   lidar_box = v2.LiDARBoxComponent.from_dict(r)

                   nf += 1
                   x['labels'].append(self.create_label(lidar, lidar_box))
               nm = 0
               ne = 0
               nc = 0
               msg = ''
               if msg:
                   msgs.append(msg)
               pbar.desc = f'{desc} {nf} images, {nm + ne} backgrounds, {nc} corrupt'
           pbar.close()

       if msgs:
           LOGGER.info('\n'.join(msgs))
       if nf == 0:
           LOGGER.warning(f'{self.prefix}WARNING ⚠️ No labels found in {path}. {HELP_URL}')
       x['hash'] = get_hash(self.im_files)
       x['results'] = nf, nm, ne, nc, len(self.context_names)
       x['msgs'] = msgs  # warnings
       x['version'] = self.cache_version  # cache version
       if is_dir_writeable(path.parent):
           if path.exists():
               path.unlink()  # remove *.cache file if exists
           np.save(str(path), x)  # save cache for next time
           path.with_suffix('.cache.npy').rename(path)  # remove .npy suffix
           LOGGER.info(f'{self.prefix}New cache created: {path}')
       else:
           LOGGER.warning(f'{self.prefix}WARNING ⚠️ Cache directory {path.parent} is not writeable, cache not saved.')
       return x

   def get_labels(self):
       """Returns dictionary of labels for WOD training."""
       # self.label_files = img2label_paths(self.im_files)
       # cache_path = Path(str(self.data.get('path')) + "/training/labels").with_suffix('.cache')
       cache_path = Path(str(self.dataset_dir) + "/labels").with_suffix('.cache')
       try:
           import gc
           gc.disable()  # reduce pickle load time https://github.com/ultralytics/ultralytics/pull/1585
           cache, exists = np.load(str(cache_path), allow_pickle=True).item(), True  # load dict
           gc.enable()
           assert cache['version'] == self.cache_version  # matches current version
           assert cache['hash'] == get_hash(self.im_files)  # identical hash
       except (FileNotFoundError, AssertionError, AttributeError):
           cache, exists = self.cache_labels(cache_path), False  # run cache ops

       # Display cache
       nf, nm, ne, nc, n = cache.pop('results')  # found, missing, empty, corrupt, total
       if exists and LOCAL_RANK in (-1, 0):
           d = f'Scanning {cache_path}... {nf} images, {nm + ne} backgrounds, {nc} corrupt'
           TQDM(None, desc=self.prefix + d, total=n, initial=n)  # display cache results
           if cache['msgs']:
               LOGGER.info('\n'.join(cache['msgs']))  # display warnings
       if nf == 0:  # number of labels found
           raise FileNotFoundError(f'{self.prefix}No labels found in {cache_path}, can not start training. {HELP_URL}')

       # Read cache
       [cache.pop(k) for k in ('hash', 'version', 'msgs')]  # remove items
       labels = cache['labels']
       self.im_files = [lb['im_file'] for lb in labels]  # update im_files

       # Check if the dataset is all boxes or all segments
       lengths = ((len(lb['cls']), len(lb['bboxes']), len(lb['segments'])) for lb in labels)
       len_cls, len_boxes, len_segments = (sum(x) for x in zip(*lengths))
       if len_segments and len_boxes != len_segments:
           LOGGER.warning(
               f'WARNING ⚠️ Box and segment counts should be equal, but got len(segments) = {len_segments}, '
               f'len(boxes) = {len_boxes}. To resolve this only boxes will be used and all segments will be removed. '
               'To avoid this please supply either a detect or segment dataset, not a detect-segment mixed dataset.')
           for lb in labels:
               lb['segments'] = []
       if len_cls == 0:
           raise ValueError(f'All labels empty in {cache_path}, can not start training without labels. {HELP_URL}')
       return labels

   def build_transforms(self, hyp=None):
       """Builds and appends transforms to the list."""
       if self.augment:
           hyp.mosaic = hyp.mosaic if self.augment and not self.rect else 0.0
           hyp.mixup = hyp.mixup if self.augment and not self.rect else 0.0
           transforms = v8_transforms(self, self.imgsz, hyp)
       else:
           transforms = Compose([LetterBox(new_shape=(self.imgsz, self.imgsz), scaleup=False)])
       transforms.append(
           Format(bbox_format='xywh',
                  normalize=True,
                  return_mask=self.use_segments,
                  return_keypoint=self.use_keypoints,
                  batch_idx=True,
                  mask_ratio=hyp.mask_ratio,
                  mask_overlap=hyp.overlap_mask))
       return transforms

   def close_mosaic(self, hyp):
       """Sets mosaic, copy_paste and mixup options to 0.0 and builds transformations."""
       hyp.mosaic = 0.0  # set mosaic ratio=0.0
       hyp.copy_paste = 0.0  # keep the same behavior as previous v8 close-mosaic
       hyp.mixup = 0.0  # keep the same behavior as previous v8 close-mosaic
       self.transforms = self.build_transforms(hyp)

   def update_labels_info(self, label):
       """custom your label format here."""
       # NOTE: cls is not with bboxes now, classification and semantic segmentation need an independent cls label
       # we can make it also support classification and semantic segmentation by add or remove some dict keys there.
       bboxes = label.pop('bboxes')
        if bboxes.size == 0:
            bboxes = np.reshape(bboxes, (0, 4))
       segments = label.pop('segments')
       keypoints = label.pop('keypoints', None)
       bbox_format = label.pop('bbox_format')
       normalized = label.pop('normalized')
       # label['instances'] = Instances(bboxes, segments, keypoints, bbox_format=bbox_format, normalized=normalized)
       label['instances'] = Instances(bboxes, segments, keypoints, bbox_format=bbox_format, normalized=normalized)
       return label

   def set_rectangle(self):
       """Sets the shape of bounding boxes for YOLO detections as rectangles."""
       bi = np.floor(np.arange(self.ni) / self.batch_size).astype(int)  # batch index
       nb = bi[-1] + 1  # number of batches

       s = np.array([x.pop('shape') for x in self.labels])  # hw
       ar = s[:, 0] / s[:, 1]  # aspect ratio
       irect = ar.argsort()
       # self.im_files = [self.im_files[i] for i in irect]
       self.labels = [self.labels[i] for i in irect]
       ar = ar[irect]

       # Set training image shapes
       shapes = [[1, 1]] * nb
       for i in range(nb):
           ari = ar[bi == i]
           mini, maxi = ari.min(), ari.max()
           if maxi < 1:
               shapes[i] = [maxi, 1]
           elif mini > 1:
               shapes[i] = [1, 1 / mini]

       self.batch_shapes = np.ceil(np.array(shapes) * self.imgsz / self.stride + self.pad).astype(int) * self.stride
       self.batch = bi  # batch index of image

   @staticmethod
   def collate_fn(batch):
       """Collates data samples into batches."""
       new_batch = {}
       keys = batch[0].keys()
       values = list(zip(*[list(b.values()) for b in batch]))
       for i, k in enumerate(keys):
           value = values[i]
           if k == 'img':
               value = torch.stack(value, 0)
           if k in ['masks', 'keypoints', 'bboxes', 'cls']:
               value = torch.cat(value, 0)
           new_batch[k] = value
       new_batch['batch_idx'] = list(new_batch['batch_idx'])
       for i in range(len(new_batch['batch_idx'])):
           new_batch['batch_idx'][i] += i  # add target image index for build_targets()
       new_batch['batch_idx'] = torch.cat(new_batch['batch_idx'], 0)
       return new_batch

The WodDataset class implements image loading, label loading, and the conversion of the Lidar point cloud into a BEV (bird's-eye view) image.

The following utility functions (placed in a wod_reader.py module, as imported by the dataset class above) can be used to load data from the WOD Parquet files into Dask DataFrames:

import tensorflow as tf
import dask.dataframe as dd
from waymo_open_dataset import v2


def read_df(dataset_dir: str, context_name: str, tag: str) -> dd.DataFrame:
   """Creates a Dask DataFrame for the component specified by its tag."""
   # paths = tf.io.gfile.glob(f'{dataset_dir}/{tag}/training_{tag}_{context_name}.parquet')
   paths = tf.io.gfile.glob(f'{dataset_dir}/{tag}/{context_name}.parquet')
   return dd.read_parquet(paths)


def read_cam_img_cam_box_df(dataset_dir: str, context_name: str, camera_name: int):
   cam_img_df = read_df(dataset_dir, context_name, 'camera_image')
   cam_box_df = read_df(dataset_dir, context_name, 'camera_box')

   # Join all DataFrames using matching columns
   cam_img_df = cam_img_df[cam_img_df['key.camera_name'] == camera_name]
   cam_img_cam_box_df = v2.merge(cam_img_df, cam_box_df, right_group=True)

   return cam_img_cam_box_df


def read_lidar_df(dataset_dir: str, context_name: str, laser_name: int):
   lidar_df = read_df(dataset_dir, context_name, 'lidar')
   lidar_df = lidar_df[lidar_df['key.laser_name'] == laser_name]

   return lidar_df


def read_lidar_lidar_box_df(dataset_dir: str, context_name: str, laser_name: int):
   lidar_df = read_df(dataset_dir, context_name, 'lidar')
   lidar_box_df = read_df(dataset_dir, context_name, 'lidar_box')

   # Join all DataFrames using matching columns
   lidar_df = lidar_df[lidar_df['key.laser_name'] == laser_name]
   lidar_lidar_box_df = v2.merge(lidar_df, lidar_box_df, right_group=True)

   return lidar_lidar_box_df


def read_lidar_calibration_df(dataset_dir: str, context_name: str, laser_name: int):
   lidar_calibration_df = read_df(dataset_dir, context_name, 'lidar_calibration')
   lidar_calibration_df = lidar_calibration_df[lidar_calibration_df['key.laser_name'] == laser_name]

   return lidar_calibration_df


def read_lidar_pose_df(dataset_dir: str, context_name: str, laser_name: int):
   lidar_pose_df = read_df(dataset_dir, context_name, 'lidar_pose')
   lidar_pose_df = lidar_pose_df[lidar_pose_df['key.laser_name'] == laser_name]

   return lidar_pose_df


def read_vehicle_pose_df(dataset_dir: str, context_name: str):
   vehicle_pose_df = read_df(dataset_dir, context_name, 'vehicle_pose')

   return vehicle_pose_df
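
Assuming the functions above live in wod_reader.py and a segment has been downloaded locally (the path and context name below are placeholders), they can be combined the same way WodDataset does, for example to inspect the boxes of the first frame:

from waymo_open_dataset import v2

import wod_reader

dataset_dir = 'datasets/wod/data/training'  # hypothetical local path
context_name = '<segment_context_name>'     # one downloaded segment
laser_name = 1                              # TOP lidar

lidar_lidar_box_df = wod_reader.read_lidar_lidar_box_df(dataset_dir, context_name, laser_name)
for _, r in lidar_lidar_box_df.iterrows():
    lidar = v2.LiDARComponent.from_dict(r)
    lidar_box = v2.LiDARBoxComponent.from_dict(r)
    print(lidar.key.frame_timestamp_micros, len(lidar_box.key.laser_object_id))
    break  # first frame only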

The configuration for the range, width, and height of the BEV image can be loaded from the configuration file config.py:

from easydict import EasyDict as edict


def load():
   config = edict()

   config.range_x = [0, 50]
   config.range_y = [-25, 25]
   config.range_z = [-1, 3]
   config.bev_width = 640
   config.bev_height = 640

   return config
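
With these values each BEV pixel covers (range_x[1] - range_x[0]) / bev_height metres, matching the bev_discrete value computed in pcl_to_bev. A quick sanity check (the 4.5 m vehicle length is just an illustrative figure):

import config

cfg = config.load()
bev_discrete = (cfg.range_x[1] - cfg.range_x[0]) / cfg.bev_height  # metres per pixel
print(bev_discrete)        # 0.078125 m per pixel
print(4.5 / bev_discrete)  # ~57.6 pixels for a 4.5 m long vehicle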

Modify

~/ultralytics/ultralytics/models/yolo/detect/train.py

to add the WOD trainer:

class WodDetectionTrainer(DetectionTrainer):

   def build_dataset(self, img_path, mode='train', batch=None):
       """Build YOLO Dataset

       Args:
           img_path (str): Path to the folder containing images.
           mode (str): `train` mode or `val` mode, users are able to customize different augmentations for each mode.
           batch (int, optional): Size of batches, this is for `rect`. Defaults to None.
       """
       gs = max(int(de_parallel(self.model).stride.max() if self.model else 0), 32)
       return build_wod_dataset(self.args, img_path, batch, self.data, mode=mode, rect=mode == 'val', stride=gs)

   def get_validator(self):
       """Returns a DetectionValidator for YOLO model validation."""
       self.loss_names = 'box_loss', 'cls_loss', 'dfl_loss'
       return yolo.detect.WodDetectionValidator(self.test_loader, save_dir=self.save_dir, args=copy(self.args))

   def plot_training_labels(self):
       """Create a labeled training plot of the YOLO model."""
       boxes = np.concatenate([lb['bboxes'] for lb in self.train_loader.dataset.labels], 0)
       cls = np.concatenate([lb['cls'] for lb in self.train_loader.dataset.labels], 0)
       plot_labels(boxes, cls.squeeze(), names=self.data['names'], save_dir=self.save_dir, on_plot=self.on_plot)

and modify

~/ultralytics/ultralytics/models/yolo/detect/val.py

to add the WOD validator:

class WodDetectionValidator(DetectionValidator):

   def build_dataset(self, img_path, mode='val', batch=None):
       """
       Build YOLO Dataset.

       Args:
           img_path (str): Path to the folder containing images.
           mode (str): `train` mode or `val` mode, users are able to customize different augmentations for each mode.
           batch (int, optional): Size of batches, this is for `rect`. Defaults to None.
       """
       gs = max(int(de_parallel(self.model).stride if self.model else 0), 32)
       return build_wod_dataset(self.args, img_path, batch, self.data, mode=mode, stride=gs)

With these changes, the Ultralytics YOLOv8 code should be ready to read data from the WOD Perception dataset, convert the Lidar point cloud into a BEV image, and train the model to detect objects in the BEV images.

Before starting training, install the required packages:

tensorflow
dask
pandas
# export PATH=$PATH:/usr/sbin  # in case of problems installing OpenEXR
OpenEXR
waymo-open-dataset-tf-2-11-0
easydict

and set some environment variables (unbuffered Python output, and gradual GPU memory allocation for TensorFlow):

export PYTHONUNBUFFERED=1
export TF_FORCE_GPU_ALLOW_GROWTH=true

Training can be started with the following train_wod.py script:

from ultralytics import YOLO

# Load a model
model = YOLO("yolov8n.pt")  # load a pretrained model (recommended for training)

# Train the model
results = model.train(data="wod.yaml", epochs=30, imgsz=640, batch=32, cache='ram', augment=False)  # device=[0, 1]

Detection can then be tested with detect.py:

from PIL import Image
from ultralytics import YOLO

# Load a pretrained YOLOv8n model
model = YOLO('~/ultralytics/runs/detect/train35/weights/best.pt')

# Run inference on a BEV image
results = model('~/ultralytics/img.png')  # results list

# Show the results
for r in results:
   im_array = r.plot()  # plot a BGR numpy array of predictions
   im = Image.fromarray(im_array[..., ::-1])  # RGB PIL image
   im.show()  # show image
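
The detect.py script above expects a BEV image on disk. One way to produce such an img.png is to reuse the wod_reader utilities and the same point-cloud-to-BEV conversion implemented in WodDataset.pcl_to_bev. A rough sketch, assuming that conversion has been factored out into a standalone pcl_to_bev(cfg, pcl) helper (not shown above) and that the path and context name are placeholders:

import cv2
from waymo_open_dataset import v2
from waymo_open_dataset.v2.perception.utils import lidar_utils  # referenced as _lidar_utils in dataset.py

import config
import wod_reader
from bev import pcl_to_bev  # assumed standalone copy of WodDataset.pcl_to_bev

cfg = config.load()
dataset_dir = 'datasets/wod/data/validation'  # hypothetical local path
context_name = '<segment_context_name>'
laser_name = 1  # TOP lidar

lidar_df = wod_reader.read_lidar_df(dataset_dir, context_name, laser_name)
lidar_calibration_df = wod_reader.read_lidar_calibration_df(dataset_dir, context_name, laser_name)
lidar_pose_df = wod_reader.read_lidar_pose_df(dataset_dir, context_name, laser_name)
vehicle_pose_df = wod_reader.read_vehicle_pose_df(dataset_dir, context_name)

df = v2.merge(v2.merge(v2.merge(lidar_df, lidar_calibration_df), lidar_pose_df), vehicle_pose_df)

# Take the first frame and convert its range image into a point cloud
_, r = next(df.iterrows())
lidar = v2.LiDARComponent.from_dict(r)
lidar_calibration = v2.LiDARCalibrationComponent.from_dict(r)
lidar_pose = v2.LiDARPoseComponent.from_dict(r)
vehicle_pose = v2.VehiclePoseComponent.from_dict(r)

pcl = lidar_utils.convert_range_image_to_point_cloud(
    lidar.range_image_return1, lidar_calibration, lidar_pose.range_image_return1, vehicle_pose)

# Render the BEV image exactly as the dataset class does and save it for detect.py
bev_img = pcl_to_bev(cfg, pcl)
bev_img = cv2.rotate(bev_img, cv2.ROTATE_90_COUNTERCLOCKWISE)
cv2.imwrite('img.png', bev_img)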

That should be pretty much all that is needed to train a model on the WOD dataset. The complete implementation can be found in the GitHub repository.

References

Ultralytics YOLOv8 doc

Waymo Open Dataset

WOD v2 tutorial
