12 December, New Delhi
Film directors are notorious for demanding that scenes be shot and re-shot repeatedly until actors express just the right emotion at the right time. With the help of new film-editing software, however, directors will be able to fine-tune performances while editing the film instead of reshooting the scene.
Developed by Disney Research and the University of Surrey, the software system, called FaceDirector, will enable a director to seamlessly blend facial images from multiple video takes to achieve the desired effect.

“It’s not unheard of for a director to re-shoot a crucial scene dozens of times, even 100 or more times, until satisfied,” said Markus Gross, vice-president of research at Disney Research. “That not only takes a lot of time – it also can be quite expensive,” he pointed out. “Now our research team has shown that a director can exert control over an actor’s performance after the shoot with just a few takes, saving both time and money.”

Jean-Charles Bazin, associate research scientist at Disney Research, and Charles Malleson, a PhD student at the University of Surrey’s Centre for Vision, Speech and Signal Processing, showed that FaceDirector can create a variety of novel, visually plausible versions of an actor’s performance in close-up and mid-range shots.
Moreover, the system works with normal 2D video input acquired by standard cameras, without the need for additional hardware or 3D face reconstruction. The researchers will present their findings at ICCV 2015, the International Conference on Computer Vision, in Santiago, Chile.

“The central challenge in combining an actor’s performances from separate takes is video synchronization,” Bazin said.
“But differences in head pose, emotion, expression intensity, as well as pitch accentuation and even the wording of the speech, are just a few of many difficulties in syncing video takes.”

Bazin, Malleson and the rest of the team solved this problem by developing an automatic means of analysing both facial expressions and audio cues, and then identifying corresponding frames between the takes using a graph-based framework.

“To the best of our knowledge, our work is the first to combine audio and facial features for achieving an optimal nonlinear, temporal alignment of facial performance videos,” Malleson said.
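To give a rough sense of what nonlinear temporal alignment means here, the sketch below aligns two takes with classic dynamic time warping over per-frame feature vectors. This is a simplified stand-in, not the paper's method: the feature choice (e.g. concatenated facial-landmark coordinates and audio descriptors) and the graph-based cost used by FaceDirector are assumptions; only the general idea of a nonlinear frame-to-frame warp is illustrated.

```python
import numpy as np

def align_takes(feats_a, feats_b):
    """Nonlinear temporal alignment of two takes via dynamic time warping.

    feats_a, feats_b: (frames, dims) arrays of per-frame features
    (a hypothetical mix of facial and audio descriptors).
    Returns a list of (i, j) frame correspondences between the takes.
    """
    n, m = len(feats_a), len(feats_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    # Accumulate the cheapest warping cost frame by frame
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(feats_a[i - 1] - feats_b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i, j - 1],
                                 cost[i - 1, j - 1])
    # Backtrack the optimal path to recover frame correspondences
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([cost[i - 1, j - 1],
                              cost[i - 1, j],
                              cost[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]
```

For two identical takes, the recovered path is simply the diagonal (frame k of take A maps to frame k of take B); when one actor speaks faster, the path bends to stretch or compress time.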
Once the takes are synchronized, the system enables a director to control the performance by choosing the desired facial expressions and timing from either take; the chosen frames are then blended together using facial landmarks, optical flow and compositing.
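The final compositing step can be illustrated, in drastically simplified form, as a per-pixel weighted blend of two synchronized frames. The real system first warps the frames toward each other using facial landmarks and optical flow; that warping is omitted here, so this sketch shows only the cross-dissolve that combines the aligned images.

```python
import numpy as np

def blend_frames(frame_a, frame_b, alpha):
    """Cross-dissolve composite of two synchronized, pre-aligned frames.

    frame_a, frame_b: float arrays of identical shape (H, W, 3),
    with pixel values in [0, 1].
    alpha: blend weight in [0, 1]; 0 returns frame_a, 1 returns frame_b.
    A per-pixel (H, W, 1) alpha mask also works, giving spatially
    varying blends (e.g. take A's eyes with take B's mouth).
    """
    return (1.0 - alpha) * frame_a + alpha * frame_b
```

Varying alpha over time (or over the face, via a mask) is what lets a director mix, say, the timing of one take with the expression intensity of another.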