Tabletop Camera With Intelligent Multi-Participant Framing
When I was approached for this project in 2018, virtual conferencing had evolved far beyond the era of clunky conference phones. Yet no product fully met the needs of an increasingly hybrid work culture. After extensive research, collaboration, and testing, we developed the Logitech Sight, an AI-powered meeting room camera designed to give virtual participants a true seat at the table.
Logitech Sight’s innovative 315° camera captures in-room interactions from the center of the table, providing consistent frontal views of speakers and the people responding to them. The camera is designed to complement Rally Bar, a separate front-of-room video bar, to offer multiple perspectives. Sight blends individual views with a full-room perspective to capture both verbal and nonverbal exchanges, contributing to a more inclusive and engaging experience for all participants.
Reimagining Virtual Conferencing
I was tapped on the shoulder by the VP of Product B2B to develop a camera capable of capturing a 315° view from the center of a table. While front-of-room cameras offer a holistic image of the environment, they have their limitations: they provide no clear indication of who is speaking, and they offer only a single, unnatural perspective of the room. After reflecting on the B2B Business Group’s vision, it became clear that integrating facial recognition would be key to developing a tabletop camera that replicates the feel of in-person meetings for remote viewers.
I connected with Thomas Triquet, a colleague in Lausanne known for his work in facial recognition. Once Logitech greenlit the collaboration, Thomas joined the project as co-developer. Our partnership set the stage for designing an intelligent camera that would not only enhance remote participation but also set a new standard for inclusivity in hybrid meetings.
Translating Sound into Sight
We set forth with the goal of developing a meeting camera that would integrate facial recognition to capture specific participants rather than showing a single view of the entire room. However, performance quickly became a key challenge. Capturing everyone in the room meant that Sight would need to process vast amounts of data. Relying solely on facial recognition to distinguish active from inactive speakers was not ideal: it required analyzing much smaller pixel areas, which was too taxing for the camera. But without that level of facial recognition, we needed an alternative way to identify and map individual participants.
To close this design gap, I leveraged my knowledge of an existing Logitech innovation: the Rally Mic Pod. Thanks to earlier conversations with Logitech’s Director of Audio Engineering, I was able to apply insights about the Mic Pod’s capabilities to arrive at a solution: we could identify active speakers by combining the camera’s facial recognition with the Mic Pod’s machine-learning-based sound localization. With the Mic Pod placed at the base of the camera, active speakers could be located through sound rather than by having the camera process granular details like mouth movement.
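The matching step can be pictured roughly as follows. This is a minimal sketch, not Logitech’s implementation: the names (`Face`, `active_speaker`) and the matching tolerance are illustrative assumptions. The idea is simply that the mic pod reports a direction of arrival for the current sound, and that direction is associated with the detected face whose bearing is closest.

```python
from dataclasses import dataclass


@dataclass
class Face:
    """A face detection from the camera, with its bearing in the 315° field of view."""
    track_id: int
    azimuth_deg: float  # angle of the face center relative to the camera


def active_speaker(faces: list[Face], audio_azimuth_deg: float,
                   tolerance_deg: float = 15.0) -> Face | None:
    """Match the mic pod's sound-localization angle to the nearest detected face.

    Returns the face whose bearing is closest to the audio direction of
    arrival, or None if no face lies within the matching tolerance.
    """
    def angular_distance(a: float, b: float) -> float:
        # Smallest absolute difference between two bearings, wrapping at 360°.
        return abs((a - b + 180.0) % 360.0 - 180.0)

    if not faces:
        return None
    best = min(faces, key=lambda f: angular_distance(f.azimuth_deg, audio_azimuth_deg))
    if angular_distance(best.azimuth_deg, audio_azimuth_deg) <= tolerance_deg:
        return best
    return None


# Example: a speaker localized at 92° matches the face tracked at 88°.
faces = [Face(track_id=1, azimuth_deg=88.0), Face(track_id=2, azimuth_deg=203.0)]
print(active_speaker(faces, audio_azimuth_deg=92.0))  # -> Face(track_id=1, ...)
```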
Essential to this project was the Rally Mic Pod’s beamforming array of four omnidirectional microphones. Set in a compact puck design, the microphones triangulate a sound’s origin by calculating the difference in time at which the sound wave reaches each microphone in the pod. With the aid of the Mic Pod, Logitech Sight locates individuals within its 315° view and displays the speaking participants. The UI orchestrates the image cropping and audio selection to present adaptive views of the room. The result is a more natural meeting view that replicates the feel of in-person participation.
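To illustrate the time-difference idea, here is a simplified far-field model for a single microphone pair, not the Mic Pod’s actual algorithm: the arrival delay and the spacing between the two mics determine the angle of the source, and combining several pairs resolves the full bearing.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # speed of sound in air at ~20 °C


def doa_from_tdoa(delta_t_s: float, mic_spacing_m: float) -> float:
    """Estimate direction of arrival (degrees) for one microphone pair.

    Far-field assumption: the sound arrives as a plane wave, so the extra
    path length to the farther microphone is spacing * sin(angle), giving
    delta_t = spacing * sin(angle) / c.
    """
    sin_theta = SPEED_OF_SOUND_M_S * delta_t_s / mic_spacing_m
    sin_theta = max(-1.0, min(1.0, sin_theta))  # clamp numerical overshoot
    return math.degrees(math.asin(sin_theta))


# Example: a 100 µs arrival difference across mics 6 cm apart corresponds to a
# source roughly 35° off the pair's broadside axis.
print(round(doa_from_tdoa(100e-6, 0.06), 1))
```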
A Seat at the Table
As we continued refining the camera’s design, Logitech Sight’s ability to deliver more inclusive displays emerged as its defining feature. Rather than focusing solely on the active speaker, Sight simultaneously displays previous speakers side by side, creating a visual dialogue between participants and allowing remote viewers to experience reactions and responses in real time. When more than four speakers are detected, the camera cycles through set panels every three seconds.
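The cycling behavior can be thought of as a simple scheduler. The sketch below uses hypothetical names and constants (`PANEL_SIZE`, `CYCLE_SECONDS`, `cycle_panels`) chosen only to mirror the described behavior of rotating four-person panels every three seconds.

```python
import itertools
import time

PANEL_SIZE = 4        # speakers shown side by side at once
CYCLE_SECONDS = 3.0   # dwell time per panel when more speakers are present


def panels(speakers: list[str], size: int = PANEL_SIZE) -> list[list[str]]:
    """Split the detected speakers into fixed panels of up to `size` tiles."""
    return [speakers[i:i + size] for i in range(0, len(speakers), size)]


def cycle_panels(speakers: list[str]) -> None:
    """Show one panel if everyone fits; otherwise rotate panels on a timer."""
    groups = panels(speakers)
    if len(groups) <= 1:
        print("show", groups[0] if groups else [])
        return
    for group in itertools.cycle(groups):  # rotates indefinitely
        print("show", group)
        time.sleep(CYCLE_SECONDS)


# Example: six speakers produce two panels that alternate every three seconds.
# cycle_panels(["A", "B", "C", "D", "E", "F"])
```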
In place of a broad, static view of the room, participants have access to three different perspectives: a traditional Front View of the entire room, a Center View that maintains a head-on perspective of each individual speaker, and a combined Front and Center View for a comprehensive display of multiple speakers and their interactions. With speakers shown side by side, the camera’s versatile interface adapts to both visual and auditory cues to capture the dynamic interactions of everyone in the room.
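Conceptually, the three perspectives amount to choosing which image streams are composed into the outgoing meeting view. This sketch uses made-up names (`ViewMode`, `compose_frame`) purely to illustrate that idea; it is not the product’s interface.

```python
from enum import Enum


class ViewMode(Enum):
    FRONT = "front"                     # traditional view of the whole room
    CENTER = "center"                   # head-on crops of individual speakers
    FRONT_AND_CENTER = "front_center"   # room view combined with speaker crops


def compose_frame(mode: ViewMode, room_frame, speaker_crops):
    """Assemble the outgoing view for the selected mode.

    `room_frame` is the wide front-of-room image; `speaker_crops` are the
    head-on crops produced from the tabletop camera's 315° view.
    """
    if mode is ViewMode.FRONT:
        return [room_frame]
    if mode is ViewMode.CENTER:
        return list(speaker_crops)
    return [room_frame, *speaker_crops]
```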
Logitech Sight is not just the result of collaborative innovation; the conferencing camera was also designed to enhance collaboration by bridging the gap between in-person and remote participants.
- Award-winning design: Recognized on Time’s “Best Inventions of 2024” list, where Logitech Sight is described as using AI capabilities to create “more natural conversation for video conferences on a range of platforms.”
- Inclusive technology: Developed a meeting room camera with dynamic, adaptive multi-participant framing.
- Intelligent tabletop camera: Optimized facial recognition through the use of a versatile interface and responsive auditory cues.
- Integration of audio engineering innovations: Applied industry insights and developed strategic partnerships to create an adaptive video conferencing solution.