The technology that makes characters' mouths match their words, bridging the uncanny gap between voice and face.
Lip sync technology ranges from simple phoneme-based mouth flapping to cutting-edge performance capture that records every micro-expression. At the basic level, systems map audio waveforms or phoneme timings to a small set of mouth shapes (visemes) and blend between them. Mid-tier solutions like JALI or Oculus's OVRLipSync analyze dialogue audio and procedurally generate facial animation. At the high end, performance capture records the actor's face during the voice session, translating every lip movement, brow furrow, and eye squint directly onto the game character. The challenge scales with ambition: a stylized game can get away with simple sync, but a photorealistic game with close-up dialogue scenes needs near-perfect lip sync, or the uncanny valley devours any emotional impact.
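A minimal sketch of the basic viseme approach described above, assuming timed phoneme input (in practice produced by a forced aligner or lip sync middleware); the mapping table, viseme names, and helper functions here are illustrative, not taken from any particular engine:

```python
from dataclasses import dataclass

# Illustrative phoneme-to-viseme table (a tiny subset; production tables map
# ~40 phonemes down to roughly 12-15 visemes).
PHONEME_TO_VISEME = {
    "AA": "aa", "AE": "aa", "AH": "aa",
    "B": "pp", "P": "pp", "M": "pp",
    "F": "ff", "V": "ff",
    "IY": "ih", "IH": "ih",
    "OW": "oh", "UW": "ou",
    "sil": "sil",
}

@dataclass
class VisemeKey:
    time: float   # seconds into the audio clip
    viseme: str   # viseme id driving a blend shape on the face rig

def keys_from_phonemes(timed_phonemes):
    """Turn (start_time, phoneme) pairs into a viseme keyframe track.
    Unknown phonemes fall back to the neutral 'sil' shape."""
    return [VisemeKey(t, PHONEME_TO_VISEME.get(p, "sil")) for t, p in timed_phonemes]

def viseme_weights(keys, t):
    """Linearly cross-fade between the two keys bracketing time t.
    Returns {viseme_id: weight}; weights sum to 1 and feed the blend shapes."""
    if not keys:
        return {}
    if t <= keys[0].time:
        return {keys[0].viseme: 1.0}
    for prev, nxt in zip(keys, keys[1:]):
        if prev.time <= t < nxt.time:
            if prev.viseme == nxt.viseme:
                return {prev.viseme: 1.0}
            alpha = (t - prev.time) / (nxt.time - prev.time)
            return {prev.viseme: 1.0 - alpha, nxt.viseme: alpha}
    return {keys[-1].viseme: 1.0}

# Usage: phoneme timings would come from a forced aligner or middleware.
track = keys_from_phonemes([(0.00, "sil"), (0.05, "HH"), (0.12, "AH"),
                            (0.25, "L"), (0.40, "OW"), (0.55, "sil")])
print(viseme_weights(track, 0.18))  # ~{'aa': 0.54, 'sil': 0.46} ('L' is unmapped in this toy table)
```

Real systems go further than plain linear blending: they add coarticulation (neighboring phonemes influence each other's mouth shapes) and easing curves, and JALI-style approaches additionally separate jaw and lip contributions.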
Example
L.A. Noire pioneered MotionScan, a system that captured actors' full facial performances in high fidelity and made reading suspects' facial expressions a core gameplay mechanic. Unreal Engine's MetaHuman framework now enables real-time lip sync that would have required months of hand animation only a few years ago.
Why it matters
Bad lip sync is one of the fastest ways to break immersion during dialogue scenes. Humans are extraordinarily attuned to facial movements, especially around the mouth. As games push toward cinematic storytelling, lip sync technology becomes the bridge between voice acting quality and visual believability.
Related concepts