Project Overview

The Gaze Detection project addresses a common problem in multi-monitor video conferencing: when participants look at different screens, others see the side of their face, creating an impression of disinterest and reducing call immersion.

This system automatically switches between multiple webcams based on which monitor the user is looking at, using gaze detection and machine learning to maintain natural eye contact during video calls.

Project Demo

Watch the gaze detection system automatically switch between webcams based on which monitor the user is looking at.

The Problem

In multi-monitor setups, webcams are typically attached to just one screen. When users look at different monitors during video calls, their gaze appears directed away from other participants, creating several issues:

  • Reduced Engagement: Participants appear disinterested or distracted
  • Poor Communication: Lack of eye contact affects meeting dynamics
  • Unprofessional Appearance: Side-profile views undermine a polished on-camera presence
  • Limited Functionality: Users can't effectively use multiple screens during calls

The Solution

By placing webcams on multiple monitors and using gaze detection algorithms, the system intelligently switches to the camera that best captures direct eye contact, regardless of which screen the user is actively viewing.

Technical Implementation

The system leverages computer vision and machine learning to track eye movements and determine gaze direction in real time, then routes the appropriate camera feed through a virtual camera device. A minimal end-to-end sketch follows the stack list below.

Core Technology Stack

  • Gaze Tracking Module: Pre-existing Python library for eye movement detection
  • Computer Vision: OpenCV for camera feed processing and analysis
  • Virtual Camera: Software camera device for video conferencing integration
  • Real-time Processing: Low-latency frame analysis and switching logic
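
To make the stack concrete, here is a minimal single-camera sketch of how these pieces can fit together. The write-up does not name the exact libraries, so this assumes the open-source GazeTracking package for eye tracking and pyvirtualcam for the virtual camera device; treat it as an illustration rather than the project's actual code.

```python
# Minimal sketch: OpenCV capture -> gaze analysis -> virtual camera output.
# Assumes the GazeTracking and pyvirtualcam packages are installed; the
# project's real library choices may differ.
import cv2
import pyvirtualcam
from gaze_tracking import GazeTracking

gaze = GazeTracking()
cap = cv2.VideoCapture(0)  # first physical webcam

with pyvirtualcam.Camera(width=1280, height=720, fps=30) as vcam:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gaze.refresh(frame)              # run pupil detection on this frame
        looking_here = gaze.is_center()  # True if gaze is on this monitor
        # pyvirtualcam expects RGB frames; OpenCV delivers BGR
        rgb = cv2.cvtColor(cv2.resize(frame, (1280, 720)), cv2.COLOR_BGR2RGB)
        vcam.send(rgb)
        vcam.sleep_until_next_frame()
```

Video conferencing apps then see the virtual device as an ordinary webcam.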

Algorithm Architecture

The system processes multiple camera feeds simultaneously through a five-stage pipeline (a code sketch follows this list):

  1. Frame Capture: Continuous capture from all connected webcams
  2. Gaze Analysis: Real-time eye tracking and direction estimation
  3. Confidence Scoring: Probability calculation for each potential camera
  4. Smart Switching: Intelligent camera selection with anti-flicker measures
  5. Virtual Output: Seamless feed routing to video conferencing apps
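
A sketch of stages 1 through 4 for a two-camera setup follows. The confidence rule (how centered the gaze appears) and the switching margin are illustrative assumptions, not the project's documented scoring logic; stage 5 would hand frames[active] to the virtual camera as shown earlier.

```python
# Hypothetical two-camera pipeline loop; the scoring rule and the 0.2
# hysteresis margin are made-up values for illustration.
import cv2
from gaze_tracking import GazeTracking

caps = [cv2.VideoCapture(i) for i in (0, 1)]  # 1. frame capture
trackers = [GazeTracking() for _ in caps]
active = 0

def confidence(tracker, frame):
    """2./3. Gaze analysis reduced to a score in [0, 1]."""
    tracker.refresh(frame)
    if not tracker.pupils_located:
        return 0.0
    ratio = tracker.horizontal_ratio()  # 0.0 = far right, 1.0 = far left
    if ratio is None:
        return 0.0
    return max(0.0, 1.0 - 2.0 * abs(ratio - 0.5))  # 1.0 when gaze is centered

while True:
    frames, scores = [], []
    for cap, tracker in zip(caps, trackers):
        ok, frame = cap.read()
        frames.append(frame if ok else None)
        scores.append(confidence(tracker, frame) if ok else 0.0)

    best = max(range(len(scores)), key=scores.__getitem__)
    if best != active and scores[best] > scores[active] + 0.2:
        active = best  # 4. switch only with a clear hysteresis margin
    # 5. frames[active] would be routed to the virtual camera here
```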

Reliability Improvements

Several optimization techniques ensure stable performance (see the sketch after this list):

  • Switching Timeout: Prevents rapid camera flickering by enforcing a minimum hold time between switches
  • Moving Averages: Smooths gaze readings over a sliding time window
  • Confidence Thresholds: Only switches when detection confidence is high
  • Fallback Logic: Graceful handling of detection failures
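
As an illustration of how these measures might combine, here is a small stabilization layer over the per-camera scores from the pipeline sketch above. The window size, threshold, and hold time are invented placeholder values, not the project's tuned parameters.

```python
# Illustrative stabilization of raw per-camera confidence scores; all three
# constants are placeholders, not the project's tuned parameters.
import time
from collections import deque

WINDOW = 10          # frames per moving average
THRESHOLD = 0.6      # minimum smoothed confidence needed to switch
HOLD_SECONDS = 1.5   # switching timeout: minimum time between switches

histories = [deque(maxlen=WINDOW) for _ in range(2)]
active, last_switch = 0, 0.0

def select_camera(raw_scores):
    """Feed one raw score per camera; return the camera index to show."""
    global active, last_switch
    for hist, score in zip(histories, raw_scores):
        hist.append(score)
    smoothed = [sum(h) / len(h) for h in histories]    # moving average
    best = max(range(len(smoothed)), key=smoothed.__getitem__)
    now = time.monotonic()
    if (best != active
            and smoothed[best] >= THRESHOLD            # confidence threshold
            and now - last_switch >= HOLD_SECONDS):    # anti-flicker timeout
        active, last_switch = best, now
    return active  # on detection failure, scores drop and the last camera holds
```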

Development Process

The project required extensive experimentation to balance detection accuracy with system stability, particularly under varying lighting conditions and natural user movement.

Hardware Setup

With multiple webcams positioned on different monitors, the system required careful calibration (a hypothetical record is sketched after this list) to account for:

  • Camera Positioning: Optimal placement for gaze detection accuracy
  • Lighting Variations: Consistent performance across different lighting conditions
  • Monitor Angles: Accommodating various screen orientations and distances
  • User Movement: Handling natural head and body movements during calls
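
One way to capture these factors is a small per-camera calibration record; the fields below are hypothetical, named after the list above rather than taken from the project's real configuration.

```python
# Hypothetical calibration record; every field name here is illustrative.
from dataclasses import dataclass

@dataclass
class CameraCalibration:
    device_index: int        # OpenCV device id for this webcam
    monitor_name: str        # which screen the camera is mounted on
    gaze_center: float       # horizontal_ratio observed when looking straight at it
    gaze_tolerance: float    # allowed drift before the gaze counts as elsewhere
    brightness_floor: float  # below this mean frame brightness, distrust detections

CALIBRATIONS = [
    CameraCalibration(0, "primary", gaze_center=0.50, gaze_tolerance=0.15,
                      brightness_floor=40.0),
    CameraCalibration(1, "secondary", gaze_center=0.50, gaze_tolerance=0.20,
                      brightness_floor=40.0),
]
```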

Software Challenges

Key technical challenges included:

  • Real-time Performance: Maintaining low latency for natural interactions
  • Detection Accuracy: Minimizing false positives in gaze detection
  • System Integration: Seamless compatibility with video conferencing platforms
  • Resource Management: Efficient processing of multiple video streams

Ongoing Improvements

The project continues to evolve with ongoing research into:

  • Machine Learning Enhancement: Training custom models for improved accuracy
  • Adaptive Algorithms: Self-tuning parameters based on user behavior
  • Cross-platform Support: Compatibility with various operating systems
  • Performance Optimization: Reduced computational overhead

Impact & Applications

This project demonstrates practical applications of computer vision in solving everyday workflow problems, particularly relevant in the era of remote work.

Use Cases

  • Remote Work: Enhanced video conferencing for multi-monitor professionals
  • Content Creation: Streamers and content creators with complex setups
  • Education: Teachers managing multiple screens during online classes
  • Presentations: Speakers who need to reference multiple displays

Learning Outcomes

The project provided valuable experience in:

  • Computer Vision: Practical application of eye tracking and gaze detection
  • Real-time Systems: Building responsive, low-latency applications
  • Hardware Integration: Managing multiple camera inputs simultaneously
  • User Experience: Designing systems that feel natural and intuitive
  • Problem Solving: Addressing complex technical and usability challenges

Future Potential

The concepts explored in this project have broader applications in accessibility technology, gaming interfaces, and human-computer interaction research.