Project Overview
The Gaze Detection project addresses a common problem in multi-monitor video conferencing: when a participant looks at a monitor other than the one holding the webcam, others see the side of their face, creating an impression of disinterest and reducing call immersion.
This system automatically switches between multiple webcams based on which monitor the user is looking at, using gaze detection and machine learning to maintain natural eye contact during video calls.
Project Demo
Watch the gaze detection system automatically switch between webcams based on which monitor the user is looking at.
The Problem
In multi-monitor setups, webcams are typically attached to just one screen. When users look at different monitors during video calls, their gaze appears directed away from other participants, creating several issues:
- Reduced Engagement: Participants appear disinterested or distracted
- Poor Communication: Lack of eye contact affects meeting dynamics
- Unprofessional Appearance: Constant side-profile views come across as careless in professional settings
- Limited Functionality: Users can't effectively use multiple screens during calls
The Solution
By placing webcams on multiple monitors and using gaze detection algorithms, the system intelligently switches to the camera that best captures direct eye contact, regardless of which screen the user is actively viewing.
Technical Implementation
The system leverages computer vision and machine learning to track eye movements and determine gaze direction in real-time, then routes the appropriate camera feed through a virtual camera device.
Core Technology Stack
- Gaze Tracking Module: Pre-existing Python library for eye movement detection
- Computer Vision: OpenCV for camera feed processing and analysis
- Virtual Camera: Software camera device for video conferencing integration
- Real-time Processing: Low-latency frame analysis and switching logic
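As a concrete illustration of the output side of this stack, here is a minimal sketch of forwarding the currently selected OpenCV feed to a virtual camera with pyvirtualcam. The library choice, resolution, and device index are assumptions for illustration, not necessarily the project's exact configuration.

```python
import cv2
import pyvirtualcam  # assumed backend; requires OBS Virtual Camera or v4l2loopback

cap = cv2.VideoCapture(0)  # the currently selected physical webcam (index assumed)

with pyvirtualcam.Camera(width=1280, height=720, fps=30) as vcam:
    while True:
        ok, frame = cap.read()
        if not ok:
            continue  # skip dropped frames rather than crashing
        frame = cv2.resize(frame, (1280, 720))
        # OpenCV delivers BGR; pyvirtualcam expects RGB.
        vcam.send(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        vcam.sleep_until_next_frame()  # pace output to the declared fps
```

The conferencing app then selects the virtual device like any other webcam, so no plugin work is needed on its side.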
Algorithm Architecture
The system processes all connected camera feeds simultaneously through five stages (a condensed version of the loop follows the list):
- Frame Capture: Continuous capture from all connected webcams
- Gaze Analysis: Real-time eye tracking and direction estimation
- Confidence Scoring: Probability calculation for each potential camera
- Smart Switching: Intelligent camera selection with anti-flicker measures
- Virtual Output: Seamless feed routing to video conferencing apps
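A condensed version of that loop might look like the following, assuming a GazeTracking-style eye-tracking library (`refresh`, `pupils_located`, `horizontal_ratio`, and `vertical_ratio` follow that library's API); the scoring formula and camera indices are illustrative assumptions, not the project's actual code.

```python
import cv2
from gaze_tracking import GazeTracking  # assumed: a GazeTracking-style library

caps = [cv2.VideoCapture(i) for i in (0, 1)]  # one webcam per monitor (indices assumed)
trackers = [GazeTracking() for _ in caps]

def gaze_score(tracker, frame):
    """Rough 'looking at this lens' score; both ratios sit near 0.5 for a direct gaze."""
    tracker.refresh(frame)
    if not tracker.pupils_located:
        return 0.0  # detection failure counts as zero confidence
    return 1.0 - abs(tracker.horizontal_ratio() - 0.5) - abs(tracker.vertical_ratio() - 0.5)

while True:
    frames, scores = [], []
    for cap, tracker in zip(caps, trackers):
        ok, frame = cap.read()                                    # 1. frame capture
        frames.append(frame if ok else None)
        scores.append(gaze_score(tracker, frame) if ok else 0.0)  # 2-3. analysis + scoring
    active = max(range(len(scores)), key=scores.__getitem__)      # 4. raw camera selection
    # 5. frames[active] is what gets routed to the virtual camera
```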
Reliability Improvements
Several stabilization techniques ensure stable switching behavior (they are combined in the sketch after the list):
- Switching Timeout: Prevents rapid camera flickering
- Moving Averages: Smooths gaze readings over a sliding window of recent frames
- Confidence Thresholds: Only switches when detection confidence is high
- Fallback Logic: Graceful handling of detection failures
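One way to combine these measures is a small state machine wrapped around the raw scores. Everything here (window size, confidence margin, hold time) is an assumed tuning value, not the project's actual parameters:

```python
import time
from collections import deque

class CameraSwitcher:
    """Debounced camera selection: smooth scores with moving averages, require a
    clear winner, and enforce a minimum hold time before switching again."""

    def __init__(self, n_cams, window=15, threshold=0.15, hold_seconds=2.0):
        self.histories = [deque(maxlen=window) for _ in range(n_cams)]  # moving averages
        self.threshold = threshold        # confidence margin required to switch
        self.hold_seconds = hold_seconds  # switching timeout against flicker
        self.active = 0
        self.last_switch = 0.0

    def update(self, scores):
        for hist, s in zip(self.histories, scores):
            hist.append(s)
        means = [sum(h) / len(h) for h in self.histories]
        best = max(range(len(means)), key=means.__getitem__)
        # Fallback logic: if no camera is clearly better, keep the current one.
        if (best != self.active
                and means[best] - means[self.active] > self.threshold
                and time.monotonic() - self.last_switch > self.hold_seconds):
            self.active = best
            self.last_switch = time.monotonic()
        return self.active
```

In the pipeline loop above, `switcher.update(scores)` would replace the raw `max()` selection.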
Development Process
The project required extensive experimentation to balance detection accuracy against system stability, particularly under varying lighting conditions and natural user movement.
Hardware Setup
With multiple webcams positioned on different monitors, the system required careful calibration to account for:
- Camera Positioning: Optimal placement for gaze detection accuracy
- Lighting Variations: Consistent performance across different lighting conditions
- Monitor Angles: Accommodating various screen orientations and distances
- User Movement: Handling natural head and body movements during calls
Software Challenges
Key technical challenges included:
- Real-time Performance: Maintaining low latency for natural interactions
- Detection Accuracy: Minimizing false positives in gaze detection
- System Integration: Seamless compatibility with video conferencing platforms
- Resource Management: Efficient processing of multiple video streams (see the capture sketch after this list)
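For the latency and resource points in particular, a common pattern (assumed here, not confirmed as the project's implementation) is to read each camera on its own thread and keep only the newest frame, so one slow device never stalls the scoring loop:

```python
import threading
import cv2

class CameraReader:
    """Reads one webcam on a background thread; only the most recent frame is
    kept, a standard low-latency pattern for multi-stream capture."""

    def __init__(self, index):
        self.cap = cv2.VideoCapture(index)
        self.frame = None
        self.lock = threading.Lock()
        self.running = True
        threading.Thread(target=self._loop, daemon=True).start()

    def _loop(self):
        while self.running:
            ok, frame = self.cap.read()  # blocks at the camera's native rate
            if ok:
                with self.lock:
                    self.frame = frame   # overwrite: stale frames are dropped

    def latest(self):
        with self.lock:
            return self.frame

readers = [CameraReader(i) for i in (0, 1)]  # camera indices assumed
```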
Ongoing Improvements
The project continues to evolve, with ongoing work on:
- Machine Learning Enhancement: Training custom models for improved accuracy
- Adaptive Algorithms: Self-tuning parameters based on user behavior
- Cross-platform Support: Compatibility with various operating systems
- Performance Optimization: Reduced computational overhead
Impact & Applications
This project demonstrates practical applications of computer vision in solving everyday workflow problems, particularly relevant in the era of remote work.
Use Cases
- Remote Work: Enhanced video conferencing for multi-monitor professionals
- Content Creation: Streamers and content creators with complex setups
- Education: Teachers managing multiple screens during online classes
- Presentations: Speakers who need to reference multiple displays
Learning Outcomes
The project provided valuable experience in:
- Computer Vision: Practical application of eye tracking and gaze detection
- Real-time Systems: Building responsive, low-latency applications
- Hardware Integration: Managing multiple camera inputs simultaneously
- User Experience: Designing systems that feel natural and intuitive
- Problem Solving: Addressing complex technical and usability challenges
Future Potential
The concepts explored in this project have broader applications in accessibility technology, gaming interfaces, and human-computer interaction research.