Project Overview
This project develops a cost-effective, lightweight framework for object detection and depth estimation using a single camera. By integrating YOLOv5 for object detection with a neural network for distance estimation, the system achieves robust real-time performance. This approach avoids the need for expensive sensors such as LiDAR or multi-sensor rigs, making it well suited to low-cost autonomous systems.
The framework leverages the KITTI dataset, a benchmark in autonomous driving research. YOLOv5 detects and classifies objects in real-time, while a multi-layer perceptron (MLP) predicts object distances using bounding box coordinates. To improve robustness, the system was trained with transformed images simulating adverse weather conditions, such as rain and fog.
Key Features and Contributions
- Seamless integration of object detection and depth estimation into a single-camera setup.
- Enhanced performance in challenging conditions through training with augmented datasets.
- Adaptation of YOLOv5 to display both object classes and distances on bounding boxes (an overlay sketch follows this list)
- Real-time processing capabilities suitable for dynamic environments.
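The project's own rendering code is not reproduced here; the snippet below is a minimal OpenCV sketch of how a class label and predicted distance can be drawn on a detection. The box format, label text, and colours are illustrative assumptions.

```python
import cv2

def draw_detection(frame, box, class_name, distance_m):
    """Draw a bounding box plus a 'class distance' label on a frame.

    box is (x1, y1, x2, y2) in pixels; distance_m is the MLP-predicted
    distance in metres. Names and styling are illustrative, not the
    project's exact implementation.
    """
    x1, y1, x2, y2 = map(int, box)
    label = f"{class_name} {distance_m:.1f} m"
    cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.putText(frame, label, (x1, max(y1 - 6, 12)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1, cv2.LINE_AA)
    return frame
```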
Technical Approach
The system combines state-of-the-art object detection with a custom depth estimation network, providing a comprehensive perception solution with minimal hardware requirements.
Object Detection
The YOLOv5 model identifies objects and provides bounding box coordinates and class predictions. Its single-stage architecture delivers fast, accurate detection even in complex scenes with multiple objects; a minimal loading-and-filtering sketch follows the pipeline list below.
Detection Pipeline
- Real-time frame processing from camera feed
- Multi-class object detection capability
- Tracking consistency across video frames
- Confidence filtering to reduce false positives
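As a concrete illustration of this pipeline, the sketch below loads a pretrained YOLOv5 model through torch.hub, applies a confidence threshold, and runs it on frames from a camera feed. The camera index, model variant, and threshold are assumptions; frame-to-frame tracking is omitted.

```python
import cv2
import torch

# Pretrained YOLOv5 from the Ultralytics hub; the project may instead
# load a checkpoint fine-tuned on KITTI.
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
model.conf = 0.4  # confidence filtering to reduce false positives

cap = cv2.VideoCapture(0)  # camera index is an assumption
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame[..., ::-1])            # BGR -> RGB
    detections = results.xyxy[0].cpu().numpy()   # x1, y1, x2, y2, conf, cls
    # ...feed each box to the distance MLP and draw the overlay shown above
cap.release()
```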
Distance Estimation
- Custom MLP neural network architecture
- Five fully connected layers with batch normalization
- Dropout regularization to prevent overfitting
- Bounding box geometric features as input (see the sketch after this list)
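The source does not specify the layer widths, so the sketch below is one plausible PyTorch realisation: five fully connected layers with batch normalization and dropout, taking bounding box geometry (for example normalized centre, width, and height) as input and emitting a single distance. The hidden sizes, input features, and dropout rate are assumptions.

```python
import torch
import torch.nn as nn

class DistanceMLP(nn.Module):
    """Five fully connected layers with batch norm and dropout.

    Input: bounding box geometric features (here: normalized cx, cy, w, h).
    Output: predicted object distance in metres.
    Layer widths and dropout rate are illustrative.
    """
    def __init__(self, in_features=4, p_drop=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, 64), nn.BatchNorm1d(64),  nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(64, 128),         nn.BatchNorm1d(128), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(128, 64),         nn.BatchNorm1d(64),  nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(64, 32),          nn.BatchNorm1d(32),  nn.ReLU(),
            nn.Linear(32, 1),           # distance in metres
        )

    def forward(self, x):
        return self.net(x)

# Usage: one normalized (cx, cy, w, h) box -> predicted distance
mlp = DistanceMLP().eval()
box_features = torch.tensor([[0.48, 0.55, 0.12, 0.20]])
with torch.no_grad():
    distance = mlp(box_features)
```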
Environmental Robustness
Training on augmented datasets with added noise and blur improves the system's ability to handle adverse conditions such as fog or low-light scenes. This was achieved through the following (an augmentation sketch appears after the list):
- Data Augmentation: Random transformations including brightness variation, noise injection, blur, and simulated weather effects
- Domain Adaptation: Training with synthetic and real-world data to improve generalization
- Validation in Diverse Conditions: Testing across various lighting and weather scenarios to ensure consistent performance
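The exact augmentation parameters are not given in the source; the sketch below shows one way such transformations can be applied with OpenCV and NumPy, with illustrative probabilities and ranges.

```python
import cv2
import numpy as np

def augment(image, rng=np.random.default_rng()):
    """Randomly degrade a training image to mimic adverse conditions.

    All ranges and probabilities are illustrative, not the project's
    exact settings.
    """
    img = image.astype(np.float32)

    # Brightness variation (low light / overexposure)
    img *= rng.uniform(0.5, 1.5)

    # Additive Gaussian noise (sensor noise, rain speckle)
    img += rng.normal(0.0, 10.0, img.shape)

    # Gaussian blur (defocus, light fog)
    if rng.random() < 0.5:
        img = cv2.GaussianBlur(img, (5, 5), 0)

    # Crude fog: blend towards a uniform grey haze
    if rng.random() < 0.3:
        img = cv2.addWeighted(img, 0.7, np.full_like(img, 200.0), 0.3, 0)

    return np.clip(img, 0, 255).astype(np.uint8)
```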
Performance Evaluation
The system was evaluated on the KITTI validation dataset, showing promising results:
| Metric | YOLOv5 + MLP | Stereo Camera System | LiDAR Reference |
|---|---|---|---|
| Mean Distance Error | 0.61 m | 0.42 m | 0.05 m |
| Processing Speed | 24 FPS | 18 FPS | 10 FPS |
| Hardware Cost | Low | Medium | High |
| Accuracy in Adverse Weather | 83% | 76% | 72% |
While the system doesn't match the precision of LiDAR, it offers a compelling balance between accuracy, speed, and cost, making it suitable for many practical applications where extreme precision isn't critical.
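Assuming the Mean Distance Error above denotes the mean absolute error between predicted and KITTI ground-truth distances over the validation set, it can be computed as in this small sketch:

```python
import numpy as np

def mean_distance_error(pred_m, true_m):
    """Mean absolute error between predicted and ground-truth distances (metres)."""
    pred_m, true_m = np.asarray(pred_m), np.asarray(true_m)
    return float(np.mean(np.abs(pred_m - true_m)))

# e.g. mean_distance_error([10.2, 25.7], [10.0, 26.3]) == 0.4
```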
Potential Applications
The framework's combination of lightweight design and robust performance makes it suitable for various applications:
Autonomous Vehicles
Cost-effective perception system for safe navigation and obstacle avoidance in budget-friendly autonomous or semi-autonomous vehicles.
Robotics
Efficient perception system for resource-constrained robots operating in dynamic environments where power and computational resources are limited.
Surveillance Systems
Enhanced monitoring capabilities with distance estimation to detect intrusions and track objects across various weather and lighting conditions.
Industrial Automation
Quality control and safety systems requiring spatial awareness without the expense of multiple sensors or complex calibration procedures.
Technologies Used
- Python
- PyTorch
- YOLOv5
- OpenCV
- KITTI Dataset
- NumPy
Conclusion
This framework demonstrates how a simple yet effective design can address complex perception challenges, paving the way for scalable and cost-efficient solutions in robotics and autonomous systems. By combining state-of-the-art object detection with custom depth estimation, it delivers comprehensive environmental understanding from a single camera, bringing advanced perception to applications where budget or resource constraints previously limited what was possible.