
News

Building 3D Scenes From 2D Images in an Instant

New algorithm speeds up computer’s reconstruction capabilities

Key takeaways

  • Harvard computer scientists have developed a new algorithm that allows computers to rapidly and accurately reconstruct high-quality 3D scenes from 2D images, addressing a longstanding challenge in computer vision and robotic perception.
  • The new method combines AI depth prediction with convex optimization, cutting the time to compute a 3D reconstruction from about two hours to roughly 10 seconds.
  • The innovation could make machines better able to interpret their environments.


A reconstructed 3D image of the Roman Colosseum using the new algorithm and about 2,000 camera frames. 

Imagine trying to make an accurate three-dimensional model of a building using only pictures taken from different angles — but you’re not sure where or how far away all the cameras were. Our big human brains can fill in a lot of those details, but computers have a much harder time doing so.

This scenario is a well-known problem in computer vision and robot navigation systems. Robots, for instance, must take in lots of 2D information and make 3D point clouds — collections of data points in 3D space — in order to interpret a scene. But the mathematics involved in this process is challenging and error-prone, with many ways for the computer to incorrectly estimate distances. It’s also slow, because it forces the computer to create its 3D point cloud bit by bit.
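The point clouds described above are typically built by lifting image pixels with estimated depths into 3D space. As a minimal sketch (the camera intrinsics and depth values here are hypothetical, not from the paper), the standard pinhole back-projection looks like this:

```python
import numpy as np

# Hypothetical pinhole-camera intrinsics: focal lengths (pixels) and principal point.
fx = fy = 500.0
cx, cy = 320.0, 240.0

def backproject(u, v, depth):
    """Lift a pixel (u, v) with a known depth into a 3D point in camera coordinates."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

# A tiny point cloud from a few pixels with assumed depth values (meters).
pixels = [(320, 240, 2.0), (420, 240, 2.0), (320, 340, 4.0)]
cloud = np.stack([backproject(u, v, d) for u, v, d in pixels])
print(cloud.shape)  # (3, 3): three points, each with x, y, z
```

Errors in the assumed depths or camera poses propagate into every point, which is one reason incremental reconstruction is fragile.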

Computer scientists at the Harvard John A. Paulson School of Engineering and Applied Sciences (SEAS) think they have a better method: a new algorithm that lets computers reconstruct high-quality 3D scenes from 2D images far more quickly than existing approaches.

Their research is described in a paper, “Building Rome with Convex Optimization,” which recently received the Best Systems Paper Award in Memory of Seth Teller at the Robotics: Science and Systems Conference. It was authored by graduate student Haoyu Han and Heng Yang, assistant professor of electrical engineering at SEAS.

“By combining state-of-the-art AI depth prediction with a powerful new approach in convex numerical optimization, the method can estimate the positions of all points in a scene at once, with no need for step-by-step guesswork,” Han said. “As a result, the reconstruction process is not only faster and more robust than traditional techniques, but is also free from the need for initial guesses by the computer.”
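The "all points at once" idea can be illustrated with a toy convex problem. This sketch is not the paper's actual formulation: it shows, in 1D with made-up measurements, how noisy relative displacements between points can be reconciled in a single global least-squares solve (a convex problem with no initial guess) rather than by chaining estimates point by point:

```python
import numpy as np

# Toy measurements: (i, j, observed x_j - x_i) between four 1D points,
# with point 0 anchored at position 0. Values are illustrative, with noise.
edges = [(0, 1, 1.05), (1, 2, 1.95), (2, 3, 3.02), (0, 2, 3.01), (1, 3, 4.98)]

# Build the linear system A x = b over the unknown positions x_1, x_2, x_3.
A = np.zeros((len(edges), 3))
b = np.zeros(len(edges))
for row, (i, j, d) in enumerate(edges):
    if j > 0:
        A[row, j - 1] += 1.0   # +x_j
    if i > 0:
        A[row, i - 1] -= 1.0   # -x_i
    b[row] = d

# One global least-squares solve estimates every position simultaneously,
# averaging out the inconsistencies among all measurements at once.
x, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.round(x, 2))
```

Because the objective is convex, the solver reaches the same global optimum regardless of any starting point, mirroring the "no initial guesses" property Han describes.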

The research had federal support from the Office of Naval Research.



A reconstructed image of St. Mark's Campanile, from over 10,000 camera frames. 

Topics: AI / Machine Learning, Computational Science & Engineering, Computer Science, Robotics


Press Contact

Anne J. Manning | amanning@seas.harvard.edu