Project Overview
The Gaze Detection project addresses a common problem in multi-monitor video conferencing: when a participant looks at a monitor other than the one holding the webcam, others see the side of their face, creating an impression of disinterest and reducing call immersion.
This system automatically switches between multiple webcams based on which monitor the user is looking at, using gaze detection and machine learning to maintain natural eye contact during video calls.
Project Demo
Watch the gaze detection system automatically switch between webcams based on which monitor the user is looking at.
The Problem
In multi-monitor setups, webcams are typically attached to just one screen. When users look at different monitors during video calls, their gaze appears directed away from other participants, creating several issues:
- Reduced Engagement: Participants appear disinterested or distracted
- Poor Communication: Lack of eye contact affects meeting dynamics
- Unprofessional Appearance: Constant side-profile views come across as careless in professional settings
- Limited Functionality: Users can't effectively use multiple screens during calls
The Solution
By placing webcams on multiple monitors and using gaze detection algorithms, the system intelligently switches to the camera that best captures direct eye contact, regardless of which screen the user is actively viewing.
Technical Implementation
The system leverages computer vision and machine learning to track eye movements and determine gaze direction in real-time, then routes the appropriate camera feed through a virtual camera device.
Core Technology Stack
- Gaze Tracking Module: Pre-existing Python library for eye movement detection
- Computer Vision: OpenCV for camera feed processing and analysis
- Virtual Camera: Software camera device for video conferencing integration
- Real-time Processing: Low-latency frame analysis and switching logic
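As a concrete illustration of the output side of this stack, here is a minimal sketch of forwarding the currently selected OpenCV feed to a virtual camera with pyvirtualcam. The library choice, resolution, and device index are assumptions for illustration, not necessarily the project's exact configuration.

```python
import cv2
import pyvirtualcam  # assumed backend; requires OBS Virtual Camera or v4l2loopback

cap = cv2.VideoCapture(0)  # the currently selected physical webcam (index assumed)

with pyvirtualcam.Camera(width=1280, height=720, fps=30) as vcam:
    while True:
        ok, frame = cap.read()
        if not ok:
            continue  # skip dropped frames rather than crashing
        frame = cv2.resize(frame, (1280, 720))
        # OpenCV delivers BGR; pyvirtualcam expects RGB.
        vcam.send(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        vcam.sleep_until_next_frame()  # pace output to the declared fps
```

The conferencing app then selects the virtual device like any other webcam, so no plugin work is needed on its side.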
Algorithm Architecture
The system processes all connected camera feeds simultaneously through five stages (a condensed version of the loop follows the list):
- Frame Capture: Continuous capture from all connected webcams
- Gaze Analysis: Real-time eye tracking and direction estimation
- Confidence Scoring: Probability calculation for each potential camera
- Smart Switching: Intelligent camera selection with anti-flicker measures
- Virtual Output: Seamless feed routing to video conferencing apps
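A condensed version of that loop might look like the following, assuming a GazeTracking-style eye-tracking library (`refresh`, `pupils_located`, `horizontal_ratio`, and `vertical_ratio` follow that library's API); the scoring formula and camera indices are illustrative assumptions, not the project's actual code.

```python
import cv2
from gaze_tracking import GazeTracking  # assumed: a GazeTracking-style library

caps = [cv2.VideoCapture(i) for i in (0, 1)]  # one webcam per monitor (indices assumed)
trackers = [GazeTracking() for _ in caps]

def gaze_score(tracker, frame):
    """Rough 'looking at this lens' score; both ratios sit near 0.5 for a direct gaze."""
    tracker.refresh(frame)
    if not tracker.pupils_located:
        return 0.0  # detection failure counts as zero confidence
    return 1.0 - abs(tracker.horizontal_ratio() - 0.5) - abs(tracker.vertical_ratio() - 0.5)

while True:
    frames, scores = [], []
    for cap, tracker in zip(caps, trackers):
        ok, frame = cap.read()                                    # 1. frame capture
        frames.append(frame if ok else None)
        scores.append(gaze_score(tracker, frame) if ok else 0.0)  # 2-3. analysis + scoring
    active = max(range(len(scores)), key=scores.__getitem__)      # 4. raw camera selection
    # 5. frames[active] is what gets routed to the virtual camera
```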
Reliability Improvements
Several stabilization techniques ensure stable switching behavior (they are combined in the sketch after the list):
- Switching Timeout: Prevents rapid camera flickering
- Moving Averages: Smooths gaze readings over a sliding window of recent frames
- Confidence Thresholds: Only switches when detection confidence is high
- Fallback Logic: Graceful handling of detection failures
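One way to combine these measures is a small state machine wrapped around the raw scores. Everything here (window size, confidence margin, hold time) is an assumed tuning value, not the project's actual parameters:

```python
import time
from collections import deque

class CameraSwitcher:
    """Debounced camera selection: smooth scores with moving averages, require a
    clear winner, and enforce a minimum hold time before switching again."""

    def __init__(self, n_cams, window=15, threshold=0.15, hold_seconds=2.0):
        self.histories = [deque(maxlen=window) for _ in range(n_cams)]  # moving averages
        self.threshold = threshold        # confidence margin required to switch
        self.hold_seconds = hold_seconds  # switching timeout against flicker
        self.active = 0
        self.last_switch = 0.0

    def update(self, scores):
        for hist, s in zip(self.histories, scores):
            hist.append(s)
        means = [sum(h) / len(h) for h in self.histories]
        best = max(range(len(means)), key=means.__getitem__)
        # Fallback logic: if no camera is clearly better, keep the current one.
        if (best != self.active
                and means[best] - means[self.active] > self.threshold
                and time.monotonic() - self.last_switch > self.hold_seconds):
            self.active = best
            self.last_switch = time.monotonic()
        return self.active
```

In the pipeline loop above, `switcher.update(scores)` would replace the raw `max()` selection.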
Development Process
The project required extensive experimentation to balance detection accuracy against system stability, particularly under varying lighting conditions and natural user movement.
Hardware Setup
With multiple webcams positioned on different monitors, the system required careful calibration to account for:
- Camera Positioning: Optimal placement for gaze detection accuracy
- Lighting Variations: Consistent performance across different lighting conditions
- Monitor Angles: Accommodating various screen orientations and distances
- User Movement: Handling natural head and body movements during calls
Software Challenges
Key technical challenges included:
- Real-time Performance: Maintaining low latency for natural interactions
- Detection Accuracy: Minimizing false positives in gaze detection
- System Integration: Seamless compatibility with video conferencing platforms
- Resource Management: Efficient processing of multiple video streams (see the capture sketch after this list)
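For the latency and resource points in particular, a common pattern (assumed here, not confirmed as the project's implementation) is to read each camera on its own thread and keep only the newest frame, so one slow device never stalls the scoring loop:

```python
import threading
import cv2

class CameraReader:
    """Reads one webcam on a background thread; only the most recent frame is
    kept, a standard low-latency pattern for multi-stream capture."""

    def __init__(self, index):
        self.cap = cv2.VideoCapture(index)
        self.frame = None
        self.lock = threading.Lock()
        self.running = True
        threading.Thread(target=self._loop, daemon=True).start()

    def _loop(self):
        while self.running:
            ok, frame = self.cap.read()  # blocks at the camera's native rate
            if ok:
                with self.lock:
                    self.frame = frame   # overwrite: stale frames are dropped

    def latest(self):
        with self.lock:
            return self.frame

readers = [CameraReader(i) for i in (0, 1)]  # camera indices assumed
```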
Ongoing Improvements
The project continues to evolve, with ongoing work on:
- Machine Learning Enhancement: Training custom models for improved accuracy
- Adaptive Algorithms: Self-tuning parameters based on user behavior
- Cross-platform Support: Compatibility with various operating systems
- Performance Optimization: Reduced computational overhead
Impact & Applications
This project demonstrates practical applications of computer vision in solving everyday workflow problems, particularly relevant in the era of remote work.
Use Cases
- Remote Work: Enhanced video conferencing for multi-monitor professionals
- Content Creation: Streamers and content creators with complex setups
- Education: Teachers managing multiple screens during online classes
- Presentations: Speakers who need to reference multiple displays
Learning Outcomes
The project provided valuable experience in:
- Computer Vision: Practical application of eye tracking and gaze detection
- Real-time Systems: Building responsive, low-latency applications
- Hardware Integration: Managing multiple camera inputs simultaneously
- User Experience: Designing systems that feel natural and intuitive
- Problem Solving: Addressing complex technical and usability challenges
Future Potential
The concepts explored in this project have broader applications in accessibility technology, gaming interfaces, and human-computer interaction research.