Project Overview
This project develops a cost-effective, lightweight framework for object detection and depth estimation using a single camera. By integrating YOLOv5 for object detection with a neural network for distance estimation, the system achieves robust real-time performance. This approach avoids the need for expensive sensors such as LiDAR or multi-sensor rigs, making it well suited to low-cost autonomous systems.
The framework leverages the KITTI dataset, a benchmark in autonomous driving research. YOLOv5 detects and classifies objects in real-time, while a multi-layer perceptron (MLP) predicts object distances using bounding box coordinates. To improve robustness, the system was trained with transformed images simulating adverse weather conditions, such as rain and fog.
Key Features and Contributions
- Seamless integration of object detection and depth estimation into a single-camera setup.
- Enhanced performance in challenging conditions through training with augmented datasets.
- Adaptation of YOLOv5 to display both object classes and distances on bounding boxes (an overlay sketch follows this list)
- Real-time processing capabilities suitable for dynamic environments.
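The project's own rendering code is not reproduced here; the snippet below is a minimal OpenCV sketch of how a class label and predicted distance can be drawn on a detection. The box format, label text, and colours are illustrative assumptions.

```python
import cv2

def draw_detection(frame, box, class_name, distance_m):
    """Draw a bounding box plus a 'class distance' label on a frame.

    box is (x1, y1, x2, y2) in pixels; distance_m is the MLP-predicted
    distance in metres. Names and styling are illustrative, not the
    project's exact implementation.
    """
    x1, y1, x2, y2 = map(int, box)
    label = f"{class_name} {distance_m:.1f} m"
    cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.putText(frame, label, (x1, max(y1 - 6, 12)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1, cv2.LINE_AA)
    return frame
```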
Technical Approach
The system combines state-of-the-art object detection with a custom depth estimation network, providing a comprehensive perception solution with minimal hardware requirements.
Object Detection
The YOLOv5 model identifies objects and provides bounding box coordinates and class predictions. Its single-stage architecture delivers fast, accurate detection even in complex scenes with multiple objects; a minimal loading-and-filtering sketch follows the pipeline list below.
Detection Pipeline
- Real-time frame processing from camera feed
- Multi-class object detection capability
- Tracking consistency across video frames
- Confidence filtering to reduce false positives
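As a concrete illustration of this pipeline, the sketch below loads a pretrained YOLOv5 model through torch.hub, applies a confidence threshold, and runs it on frames from a camera feed. The camera index, model variant, and threshold are assumptions; frame-to-frame tracking is omitted.

```python
import cv2
import torch

# Pretrained YOLOv5 from the Ultralytics hub; the project may instead
# load a checkpoint fine-tuned on KITTI.
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
model.conf = 0.4  # confidence filtering to reduce false positives

cap = cv2.VideoCapture(0)  # camera index is an assumption
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame[..., ::-1])            # BGR -> RGB
    detections = results.xyxy[0].cpu().numpy()   # x1, y1, x2, y2, conf, cls
    # ...feed each box to the distance MLP and draw the overlay shown above
cap.release()
```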
Distance Estimation
- Custom MLP neural network architecture
- Five fully connected layers with batch normalization
- Dropout regularization to prevent overfitting
- Bounding box geometric features as input (see the sketch after this list)
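The source does not specify the layer widths, so the sketch below is one plausible PyTorch realisation: five fully connected layers with batch normalization and dropout, taking bounding box geometry (for example normalized centre, width, and height) as input and emitting a single distance. The hidden sizes, input features, and dropout rate are assumptions.

```python
import torch
import torch.nn as nn

class DistanceMLP(nn.Module):
    """Five fully connected layers with batch norm and dropout.

    Input: bounding box geometric features (here: normalized cx, cy, w, h).
    Output: predicted object distance in metres.
    Layer widths and dropout rate are illustrative.
    """
    def __init__(self, in_features=4, p_drop=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, 64), nn.BatchNorm1d(64),  nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(64, 128),         nn.BatchNorm1d(128), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(128, 64),         nn.BatchNorm1d(64),  nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(64, 32),          nn.BatchNorm1d(32),  nn.ReLU(),
            nn.Linear(32, 1),           # distance in metres
        )

    def forward(self, x):
        return self.net(x)

# Usage: one normalized (cx, cy, w, h) box -> predicted distance
mlp = DistanceMLP().eval()
box_features = torch.tensor([[0.48, 0.55, 0.12, 0.20]])
with torch.no_grad():
    distance = mlp(box_features)
```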
Environmental Robustness
Training on augmented datasets with added noise and blur improves the system's ability to handle adverse conditions such as fog or low-light scenes. This was achieved through the following (an augmentation sketch appears after the list):
- Data Augmentation: Random transformations including brightness variation, noise injection, blur, and simulated weather effects
- Domain Adaptation: Training with synthetic and real-world data to improve generalization
- Validation in Diverse Conditions: Testing across various lighting and weather scenarios to ensure consistent performance
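The exact augmentation parameters are not given in the source; the sketch below shows one way such transformations can be applied with OpenCV and NumPy, with illustrative probabilities and ranges.

```python
import cv2
import numpy as np

def augment(image, rng=np.random.default_rng()):
    """Randomly degrade a training image to mimic adverse conditions.

    All ranges and probabilities are illustrative, not the project's
    exact settings.
    """
    img = image.astype(np.float32)

    # Brightness variation (low light / overexposure)
    img *= rng.uniform(0.5, 1.5)

    # Additive Gaussian noise (sensor noise, rain speckle)
    img += rng.normal(0.0, 10.0, img.shape)

    # Gaussian blur (defocus, light fog)
    if rng.random() < 0.5:
        img = cv2.GaussianBlur(img, (5, 5), 0)

    # Crude fog: blend towards a uniform grey haze
    if rng.random() < 0.3:
        img = cv2.addWeighted(img, 0.7, np.full_like(img, 200.0), 0.3, 0)

    return np.clip(img, 0, 255).astype(np.uint8)
```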
Performance Evaluation
The system was evaluated on the KITTI validation dataset, showing promising results:
| Metric | YOLOv5 + MLP | Stereo Camera System | LiDAR Reference |
|---|---|---|---|
| Mean Distance Error | 0.61 m | 0.42 m | 0.05 m |
| Processing Speed | 24 FPS | 18 FPS | 10 FPS |
| Hardware Cost | Low | Medium | High |
| Accuracy in Adverse Weather | 83% | 76% | 72% |
While the system doesn't match the precision of LiDAR, it offers a compelling balance between accuracy, speed, and cost, making it suitable for many practical applications where extreme precision isn't critical.
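Assuming the Mean Distance Error above denotes the mean absolute error between predicted and KITTI ground-truth distances over the validation set, it can be computed as in this small sketch:

```python
import numpy as np

def mean_distance_error(pred_m, true_m):
    """Mean absolute error between predicted and ground-truth distances (metres)."""
    pred_m, true_m = np.asarray(pred_m), np.asarray(true_m)
    return float(np.mean(np.abs(pred_m - true_m)))

# e.g. mean_distance_error([10.2, 25.7], [10.0, 26.3]) == 0.4
```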
Potential Applications
The framework's combination of lightweight design and robust performance makes it suitable for various applications:
Autonomous Vehicles
Cost-effective perception system for safe navigation and obstacle avoidance in budget-friendly autonomous or semi-autonomous vehicles.
Robotics
Efficient perception system for resource-constrained robots operating in dynamic environments where power and computational resources are limited.
Surveillance Systems
Enhanced monitoring capabilities with distance estimation to detect intrusions and track objects across various weather and lighting conditions.
Industrial Automation
Quality control and safety systems requiring spatial awareness without the expense of multiple sensors or complex calibration procedures.
Technologies Used
- Python
- PyTorch
- YOLOv5
- OpenCV
- KITTI Dataset
- NumPy
Conclusion
This framework demonstrates how a simple yet effective design can address complex perception challenges, paving the way for scalable and cost-efficient solutions in robotics and autonomous systems. By combining state-of-the-art object detection with custom depth estimation, it delivers comprehensive environmental understanding from a single camera, bringing advanced perception to applications where budget or resource constraints previously limited what was possible.