Project Overview

This project develops a cost-effective and lightweight framework for object detection and depth estimation using a single camera. By integrating YOLOv5 for object detection and a neural network for distance estimation, the system achieves robust real-time performance. This approach eliminates the need for expensive multi-sensor setups like LiDAR, making it ideal for low-cost autonomous systems.

The framework leverages the KITTI dataset, a benchmark in autonomous driving research. YOLOv5 detects and classifies objects in real-time, while a multi-layer perceptron (MLP) predicts object distances using bounding box coordinates. To improve robustness, the system was trained with transformed images simulating adverse weather conditions, such as rain and fog.

Key Features and Contributions

  • Seamless integration of object detection and depth estimation into a single-camera setup.
  • Enhanced performance in challenging conditions through training with augmented datasets.
  • Adaptation of YOLOv5 to display both object classes and distances on bounding boxes.
  • Real-time processing capabilities suitable for dynamic environments.

Technical Approach

The system combines state-of-the-art object detection with a custom depth estimation network, providing a comprehensive perception solution with minimal hardware requirements.

Object Detection

The YOLOv5 model identifies objects in each frame and returns bounding box coordinates, class predictions, and confidence scores, delivering fast and accurate detection even in complex scenes with multiple objects.
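
To make the detection step concrete, the sketch below loads a pretrained YOLOv5 model through PyTorch Hub and reads out bounding boxes, confidence scores, and class labels for a single frame. The weights, confidence threshold, and image path are illustrative placeholders rather than the project's exact configuration.

```python
import torch

# Load a pretrained YOLOv5 model from PyTorch Hub (yolov5s shown for illustration;
# project-specific weights could be loaded via the 'custom' entry point instead).
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
model.conf = 0.4  # confidence threshold used to filter out weak detections

# Run inference on a single frame (a file path, URL, or NumPy array is accepted).
results = model('kitti_frame.png')  # placeholder image path

# Each detection row is [x1, y1, x2, y2, confidence, class_id].
for x1, y1, x2, y2, conf, cls_id in results.xyxy[0].tolist():
    label = results.names[int(cls_id)]
    print(f'{label}: conf={conf:.2f}, box=({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})')
```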

Detection Pipeline

  • Real-time frame processing from camera feed
  • Multi-class object detection capability
  • Tracking consistency across video frames
  • Confidence filtering to reduce false positives

Distance Estimation

  • Custom MLP neural network architecture
  • Five fully connected layers with batch normalization
  • Dropout regularization to prevent overfitting
  • Bounding box geometric features as input
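
A minimal sketch of the distance-estimation network described above follows, assuming five fully connected layers with batch normalization and dropout, and bounding box geometry as input. The hidden-layer sizes, dropout rate, and exact feature set are illustrative assumptions, not the trained model's configuration.

```python
import torch
import torch.nn as nn

class DistanceMLP(nn.Module):
    """Distance regressor over bounding box geometry (e.g. x1, y1, x2, y2, width, height).

    Five fully connected layers with batch normalization and dropout, as described above;
    layer widths, dropout rate, and input features are illustrative assumptions.
    """

    def __init__(self, in_features: int = 6, dropout: float = 0.2):
        super().__init__()
        sizes = [in_features, 128, 64, 32, 16]
        layers = []
        for i in range(len(sizes) - 1):
            layers += [
                nn.Linear(sizes[i], sizes[i + 1]),
                nn.BatchNorm1d(sizes[i + 1]),
                nn.ReLU(),
                nn.Dropout(dropout),
            ]
        layers.append(nn.Linear(sizes[-1], 1))  # fifth linear layer: distance output in metres
        self.net = nn.Sequential(*layers)

    def forward(self, box_features: torch.Tensor) -> torch.Tensor:
        return self.net(box_features)

# Example: predict the distance for one detection from its box geometry
# (x1, y1, x2, y2, box width, box height) -- placeholder values.
model = DistanceMLP()
model.eval()
feats = torch.tensor([[610.0, 180.0, 690.0, 260.0, 80.0, 80.0]])
with torch.no_grad():
    print(model(feats))  # predicted distance for the detected object
```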

Environmental Robustness

Training on augmented datasets with added noise and blur enhances the system's ability to handle adverse conditions, such as fog or low-light scenarios. This was achieved through:

  • Data Augmentation: Random transformations including brightness variation, noise injection, blur, and simulated weather effects (a minimal code sketch follows this list)
  • Domain Adaptation: Training with synthetic and real-world data to improve generalization
  • Validation in Diverse Conditions: Testing across various lighting and weather scenarios to ensure consistent performance
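
The following sketch illustrates the kind of augmentation listed above using OpenCV and NumPy: random brightness variation, additive Gaussian noise, blur, and a simple fog-like haze. The transform choices and parameter ranges are illustrative assumptions rather than the project's exact pipeline.

```python
import cv2
import numpy as np

def augment_frame(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply random brightness, noise, blur, and a fog-like haze to a BGR frame.

    Parameter ranges are illustrative; the actual training pipeline may use
    different transforms or strengths.
    """
    out = img.astype(np.float32)

    # Brightness variation: scale pixel intensities up or down.
    out *= rng.uniform(0.6, 1.4)

    # Noise injection: additive Gaussian noise.
    out += rng.normal(0.0, 10.0, size=out.shape)

    out = np.clip(out, 0, 255).astype(np.uint8)

    # Blur: Gaussian blur with a randomly chosen odd kernel size.
    k = int(rng.choice([3, 5, 7]))
    out = cv2.GaussianBlur(out, (k, k), 0)

    # Simulated fog: blend the frame toward a uniform grey veil.
    fog = np.full_like(out, 200)
    alpha = rng.uniform(0.0, 0.4)
    out = cv2.addWeighted(out, 1.0 - alpha, fog, alpha, 0.0)

    return out

# Example: augment a KITTI frame before training (placeholder file path).
rng = np.random.default_rng(0)
frame = cv2.imread('kitti_frame.png')
if frame is not None:
    augmented = augment_frame(frame, rng)
    cv2.imwrite('kitti_frame_augmented.png', augmented)
```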

Performance Evaluation

The system was evaluated on the KITTI validation dataset, showing promising results:

Metric                      | YOLOv5 + MLP | Stereo Camera System | LiDAR Reference
Mean Distance Error         | 0.61 m       | 0.42 m               | 0.05 m
Processing Speed            | 24 FPS       | 18 FPS               | 10 FPS
Hardware Cost               | Low          | Medium               | High
Accuracy in Adverse Weather | 83%          | 76%                  | 72%

While the system doesn't match the precision of LiDAR, it offers a compelling balance between accuracy, speed, and cost, making it suitable for many practical applications where extreme precision isn't critical.

Potential Applications

The framework's combination of lightweight design and robust performance makes it suitable for various applications:

Autonomous Vehicles

Cost-effective perception system for safe navigation and obstacle avoidance in budget-friendly autonomous or semi-autonomous vehicles.

Robotics

Efficient perception system for resource-constrained robots operating in dynamic environments where power and computational resources are limited.

Surveillance Systems

Enhanced monitoring capabilities with distance estimation to detect intrusions and track objects across various weather and lighting conditions.

Industrial Automation

Quality control and safety systems requiring spatial awareness without the expense of multiple sensors or complex calibration procedures.

Technologies Used

Python
PyTorch
YOLOv5
OpenCV
KITTI Dataset
NumPy

Conclusion

This framework demonstrates how a simple yet effective design can address complex perception challenges, paving the way for scalable and cost-efficient solutions in robotics and autonomous systems. By combining state-of-the-art object detection with a custom depth estimation network, the system delivers comprehensive environmental understanding from a single camera, making advanced perception accessible to applications where budget or resource constraints previously limited capabilities.

Project Information

  • Category: Computer Vision, Deep Learning
  • Duration: 3 months
  • Completed: 2023
  • Institution: University at Buffalo

Interested in this project?

If you're interested in learning more about our object detection and depth estimation framework or exploring potential applications in computer vision, please get in touch.
