Several weeks into the Screen Actors Guild – American Federation of Television and Radio Artists’ (SAG-AFTRA) historic strike, neither the union nor its peers in the Writers Guild of America (WGA) seem close to an agreement with the Alliance of Motion Picture and Television Producers (AMPTP). Among the most pressing issues for SAG-AFTRA is the studios’ proposed use of artificial intelligence (A.I.) to reproduce actors’ likenesses in perpetuity. This would mean that renderings of actors could be used without their participation or compensation.
In response to skeptics who question how truly substantial a threat A.I. is to the actors of today and tomorrow, Science & Film recommends revisiting Anthony Kaufman’s 2014 interview with computer graphics researcher Paul Debevec about Ari Folman’s 2013 film THE CONGRESS. The film stars Robin Wright as an actor – Robin Wright – grappling with the fallout of having one’s likeness scanned and sold. Debevec helped to develop the real Light Stage scanning technology featured in the film, a system that creates photorealistic digital actors. Since our interview nearly a decade ago, Debevec’s research has remained at the heart of the industry’s future. He has since become Netflix’s Director of Research for Creative Algorithms and Technology, overseeing R&D for visual effects and virtual production with computer vision, graphics, and machine learning.
The interview has been re-published below in its entirety.
Sloan Science and Film: Can you take me through the Light Stage technology that creates photorealistic digital actors? What needs to happen on a technical or scientific level?
Paul Debevec: The first time we did something that we were happy with was the “Digital Emily” project in 2008. At the time, no one knew how to get a photoreal digital actor to work. Essentially, what we developed at the lab was a technology for scanning the face at high resolution and digitizing a 3D model of the actor’s face—of the surface of the skin and the texture maps: the coloration, the freckles, the skin color, where it’s shiny and where it isn’t shiny.
SSF: So how does the technology actually work?
PD: It uses polarized gradient illumination, which is a technique that we invented in the lab that looks at how light plays off of the shine of the skin to understand the high-resolution detail of the face. The other piece of the puzzle is that you need to capture the face in multiple facial expressions to understand what a smile looks like, how the face wrinkles or crinkles. And then there is the question of how you drive this digital face so that the way it moves has realistic motion.
SSF: Can you explain in more detail? How does the computer programming work, for instance, to make this happen?
PD: We solved the problem with a combination of hardware and software. We really worked it from both ends. So our hardware is a sphere of white LED light sources. For our high resolution facial scans, we light people with gradient polarized light. And by gradient, I mean, the first thing is all the lights are on, and then we’ll do gradients, where it’s bright at the top, halfway in the middle, all the way off at the bottom, and then we’ll do it left to right, front and back, and then reverse it, bottom to top, right to left, back to front. And each of those gradient conditions we’ll shoot in two polarization states—one with vertically polarized light onto the face and one with horizontally polarized light coming onto the face. And we have an array of 7-8 cameras that are all vertically polarized.
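For readers who want to see the shape of that capture sequence, here is a minimal sketch in Python; the pattern and polarization names are illustrative assumptions, not the lab’s actual conventions.

```python
# Sketch of the capture sequence described above: a full-on condition plus
# linear gradients along each axis and their reverses, each photographed
# under two polarization states of the incoming light.
from itertools import product

gradient_patterns = [
    "full_on",                              # every LED at full brightness
    "gradient_x", "gradient_x_reversed",    # left to right, then right to left
    "gradient_y", "gradient_y_reversed",    # top to bottom, then bottom to top
    "gradient_z", "gradient_z_reversed",    # front to back, then back to front
]
light_polarizations = ["vertical", "horizontal"]  # the cameras stay vertically polarized

capture_sequence = [
    {"pattern": pattern, "light_polarization": polarization}
    for pattern, polarization in product(gradient_patterns, light_polarizations)
]

print(len(capture_sequence))  # 14 photographs per camera, across the 7-8 cameras
```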
Still from THE CONGRESS
Now when light hits skin, a few different things can happen. It can reflect right off the skin—we call that a specular reflection—that’s the highlights of the skin. It can also refract into the skin and get absorbed, and that happens to most of the light. But the light that doesn’t get absorbed goes through a process of multiple scattering, and it eventually comes out in some random direction a millimeter or so from where it came in. This is called sub-surface scattering, or diffuse reflection. The result is that the light that ends up getting back to the camera has two components: specular reflection and sub-surface scattering, or diffuse reflection.

So to build a model of how an actor’s face reflects light, you need to image these two things separately. We do that with the polarization, because the light that reflects off of the surface remains polarized. Vertically polarized light stays vertically polarized, but horizontally polarized light won’t make it through the polarizers on the camera. That means if you light the face with horizontally polarized light, you strip the shine off of the skin, and you’re looking at just the sub-surface scattering, which is the light that picks up skin color. If you light the face with vertically polarized light, then both the specular reflection and the sub-surface scattering make it through, and the difference between those two images gives you an image of just the specular on its own.

If you then look at the different reflective components in the different gradients, it will produce a very high-resolution map of the human face. We get geometry down to the level of skin pores and fine creases by observing how the light reflects off of the shine of the skin when you change the direction of the light.
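To make the arithmetic concrete, here is a minimal NumPy sketch of the separation Debevec describes; the function and argument names are illustrative, and a real pipeline would work on calibrated, linear images.

```python
import numpy as np

def separate_reflectance(parallel_polarized: np.ndarray,
                         cross_polarized: np.ndarray):
    """Split a face image into diffuse and specular components.

    cross_polarized: lit with horizontally polarized light, so the surface
        shine is blocked by the camera polarizers and only sub-surface
        scattering (skin color) reaches the sensor.
    parallel_polarized: lit with vertically polarized light, so both the
        specular reflection and the sub-surface scattering get through.
    """
    diffuse = cross_polarized
    # The difference between the two images isolates the specular reflection.
    specular = np.clip(parallel_polarized - cross_polarized, 0.0, None)
    return diffuse, specular
```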
SSF: This is the hardware. What about the software?
PD: It’s the software that subtracts the cross-polarized image from the parallel-polarized image. Then we need to figure out the surface orientation for every pixel in the image. It’s actually pretty simple math. You do it by computing ratios of images. So if you divide the right gradient image by the full-on image, it gives you the measure of the surface orientation right to left. And so with pretty simple math, you can get an XYZ vector for where that pixel is pointing. In addition, our software runs a traditional computer vision algorithm: it will triangulate information from the seven cameras, searching for pixels that seem like they have the same color and surroundings, and when you locate those points, you can triangulate them to produce a 3D position for each one. So we end up with the 3D shape of the face that obeys the consistency of the different views that we have and also the detailed surface orientation within each scan. And that’s how we get a high-resolution facial scan.
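A simplified NumPy sketch of that ratio computation is below; it assumes linear images shot under single-axis gradients and a full-on condition, and it leaves out the multi-view triangulation step that anchors these orientations to 3D positions.

```python
import numpy as np

def estimate_normals(grad_x, grad_y, grad_z, full_on, eps=1e-6):
    """Per-pixel surface orientation from gradient-lit images.

    Dividing each gradient-lit image by the full-on image gives a value in
    [0, 1] that says how much the surface at that pixel faces the bright
    end of that gradient; remapping to [-1, 1] gives the XYZ components of
    a normal vector, which is then normalized to unit length.
    """
    nx = 2.0 * grad_x / (full_on + eps) - 1.0
    ny = 2.0 * grad_y / (full_on + eps) - 1.0
    nz = 2.0 * grad_z / (full_on + eps) - 1.0
    normals = np.stack([nx, ny, nz], axis=-1)
    length = np.linalg.norm(normals, axis=-1, keepdims=True)
    return normals / np.maximum(length, eps)
```

In the pipeline Debevec describes, these per-pixel orientations are combined with the positions triangulated across the camera views to produce the final high-resolution scan.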
SSF: What needs to be solved to get to the next level, where digital actors are indistinguishable from the real thing as seen in THE CONGRESS?
PD: We have a very nice solution for scanning faces. But we need better solutions for driving the animation of these faces. For every part of the face, how do you transition between the different scans and extrapolate from the different scans, for example? If you just have video of some actor shot with a cellphone, can you analyze that, and then use that to drive their digital character and have them pick up all the nuances that any human can see? Computers are still having trouble with this. And so we need better performance capture algorithms. There was great performance capture technology in the movie DAWN OF THE PLANET OF THE APES. But it still takes animators a lot of effort to clean it up and to get the little lip curls and twitches in the eyes. Finally, there is the need to eventually simulate the intelligence of the actors. In a videogame, you don’t want to be limited to playing recorded versions of everything the actor said when they were making the game. Digital characters should be able to react to things in new and unexpected ways. And that’s why there are lots of artificial intelligence researchers here as well, to figure out the digital minds of the actors that will be appropriate for interactive applications.
SSF: THE CONGRESS ends up being fairly critical of these technologies. What do you feel are the implications for your work?
PD: I feel like it’s going to affect the epistemology of how we know what we know. Seeing a video of something doesn’t mean that it actually happened. But people should be relatively aware of that after having seen STAR WARS in 1977 or TRANSFORMERS in 2014. There weren’t X-Wing Fighters attacking a Death Star and there weren’t giant robots destroying cities.
We helped a little bit with some facial scanning that helped make the Michael Jackson hologram for the Billboard Music Awards. It’s not really a hologram, but a 2-D image reflected towards the audience. But I watched that a couple times and it looks like Michael Jackson and moves and speaks like him. The face is totally digital. Because it was someone who was not available for scanning, there’s a ton of artistic endeavor in there, as well. But it looks like Michael Jackson.
SSF: But is that a problem? Is it a problem if you could make a digital Obama say something that the real Obama wouldn’t say, and no one knew?
PD: With enough money and a bit of time, you can make anybody from any time at any point in history look like they’re doing or saying anything. It’s not impossible and it hasn’t been impossible for five years now, since THE CURIOUS CASE OF BENJAMIN BUTTON. You can use a hammer to build a house, or you can use a hammer to bash somebody’s skull. It’s just a tool and it has multiple uses. And you hope that people will use it for good purposes. I don’t think anyone thinks we should ban hammers. We need to respect what the tool can do and use it appropriately and try to look after ourselves as a society in how we’re making use of these things.
♦