AI Case Study
Facebook provides technology that turns 2D images into 3D models
Facebook's AI research team has developed a deep-learning system that transforms 2D images and video into 3D models. The team demonstrated the technology on images of people, where the system generated 3D models of human bodies in real time. The system was trained on 50,000 labelled images of human bodies. However, the publication of the code for the technology, DensePose, has raised concerns about potential misuse for surveillance.
Internet Services Consumer
"In early 2018, Facebook's AI researchers unveiled a deep-learning system that can transform 2D photo and video images of people into 3D mesh models of those human bodies in motion.
DensePose goes beyond basic object recognition. Besides detecting humans in pictures, it can also make 3D models of their bodies by estimating the positions of their torsos and limbs. Those models can then enable the technology to create real-time 3D re-creations of human movement in 2D videos. For example, it could produce videos that show models of several people kicking soccer balls or a single individual riding on a motorcycle.
But there is a "troubling implication of this research" that could enable "real-time surveillance," said Jack Clark, strategy and communications director at OpenAI, a nonprofit AI research company, in his popular newsletter, called Import AI. Clark first discussed the implications of Facebook's DensePose paper in the February issue of his newsletter, and followed up in June after Facebook released the DensePose code on the software development platform GitHub."
R And D
Core Research And Development
"The result is a system that can perform the 2D to 3D conversion at a rate of "20-26 frames per second for a 240 × 320 image or 4-5 frames per second for a 800 × 1100 image," Facebook researchers wrote in their paper. In other words, it's generally capable of creating 3D models of humans in a 2D video in real time."
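To put the quoted frame rates in perspective, the short Python sketch below converts them into per-frame processing time and pixel throughput. It is only arithmetic on the figures quoted above; the function name `frame_stats` and its parameters are illustrative assumptions and do not come from the DensePose paper or code.

```python
def frame_stats(width, height, fps_low, fps_high):
    """Derive per-frame latency and pixel throughput from a frame-rate range.

    Hypothetical helper for illustration only; not part of DensePose.
    """
    pixels = width * height
    return {
        "pixels": pixels,
        # Time per frame in milliseconds: (best case, worst case).
        "latency_ms": (1000.0 / fps_high, 1000.0 / fps_low),
        # Megapixels processed per second: (low, high).
        "megapixels_per_s": (pixels * fps_low / 1e6, pixels * fps_high / 1e6),
    }


# Figures taken from the quoted passage.
small = frame_stats(240, 320, fps_low=20, fps_high=26)
large = frame_stats(800, 1100, fps_low=4, fps_high=5)

for name, stats in (("240 x 320", small), ("800 x 1100", large)):
    best, worst = stats["latency_ms"]
    print(f"{name}: {stats['pixels']} pixels, {best:.1f}-{worst:.1f} ms per frame")
```

On these numbers, the smaller image takes roughly 38 to 50 ms per frame, while the larger one takes 200 to 250 ms. Since video is commonly recorded at 24 to 30 frames per second, the "real time" description fits the lower resolution more comfortably than the higher one.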
"Facebook researchers first enlisted human annotators to create a training data set by manually labeling certain points on 50,000 images of human bodies. To make that job easier for the annotators and try to improve their accuracy, the researchers broke the task of labeling down into body segments such as head, torso, limbs, hands, and feet. They also "unfolded" each body part to present multiple viewpoints without requiring the annotators to manually rotate the image to get a better view."
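The segment-by-segment labelling described above can be pictured as a simple data structure. The sketch below is a minimal illustration, assuming each manual label is a point tied to one image and one body segment. The class `PointLabel`, the function `group_by_segment`, the field names, and the sample values are all hypothetical and do not reflect the actual DensePose annotation format.

```python
from collections import defaultdict
from dataclasses import dataclass

# Segment names as listed in the quoted passage.
SEGMENTS = ("head", "torso", "limbs", "hands", "feet")


@dataclass(frozen=True)
class PointLabel:
    """One manually labelled point (hypothetical structure)."""
    image_id: str
    segment: str
    x: float
    y: float

    def __post_init__(self):
        # Reject labels that do not belong to a known body segment.
        if self.segment not in SEGMENTS:
            raise ValueError(f"unknown segment: {self.segment!r}")


def group_by_segment(labels):
    """Group labels by body segment, so each segment can be handled as its own task."""
    grouped = defaultdict(list)
    for label in labels:
        grouped[label.segment].append(label)
    return dict(grouped)


# Made-up sample labels for a single image.
labels = [
    PointLabel("img_0001", "head", 120.0, 45.5),
    PointLabel("img_0001", "torso", 118.0, 140.0),
    PointLabel("img_0001", "torso", 131.5, 152.0),
]
by_segment = group_by_segment(labels)
print({segment: len(points) for segment, points in by_segment.items()})
```

Splitting the labels this way mirrors the idea in the passage: an annotator works on one body segment at a time rather than on the whole body at once.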