Tom Monnier

Research Scientist at Meta

I am a Research Scientist at Meta working on computer vision, with an emphasis on 3D modeling and 3D generation. I did my PhD in the amazing Imagine lab at ENPC under the guidance of Mathieu Aubry. During my PhD, I was fortunate to work with Jean Ponce (Inria), Matthew Fisher (Adobe Research), Alyosha Efros and Angjoo Kanazawa (UC Berkeley). Before that, I completed my engineer's degree (equivalent to an M.Sc.) at Mines Paris.

My research focuses on learning from images without annotations, with a particular interest in recovering the underlying 3D structure (see representative papers). I am always looking for PhD interns, so feel free to reach out!

email | github | google scholar | twitter

Publications

Twinner: Shining Light on Digital Twins in a Few Snaps
Jesus Zarzar, Tom Monnier, Roman Shapovalov, Andrea Vedaldi, David Novotny
CVPR 2025
paper

We introduce a large reconstruction model (LRM) capable of recovering the illumination, geometry and material properties of real object scenes from a few posed images.

UnCommon Objects in 3D
Xingchen Liu, Piyush Tayal, Jianyuan Wang, Jesus Zarzar, Tom Monnier, Konstantinos Tertikas, Jiali Duan, Antoine Toisoul, Jason Y. Zhang, Natalia Neverova, Andrea Vedaldi, Roman Shapovalov, David Novotny
CVPR 2025
paper | webpage | code

We present a new object-centric dataset for 3D deep learning and 3D generative AI.

PartGen: Part-level 3D Generation and Reconstruction with Multi-View Diffusion Models
Minghao Chen, Roman Shapovalov, Iro Laina, Tom Monnier, Jianyuan Wang, David Novotny, Andrea Vedaldi
CVPR 2025
paper | webpage

We propose a method for compositional part-level 3D generation and reconstruction from various modalities including text, image or 3D models.

Meta 3D AssetGen: Text-to-Mesh Generation with High-Quality Geometry, Texture, and PBR Materials
Yawar Siddiqui, Tom Monnier, Filippos Kokkinos, Mahendra Kariya, Yanir Kleiman, Emilien Garreau, Oran Gafni, Natalia Neverova, Andrea Vedaldi, Roman Shapovalov, David Novotny
NeurIPS 2024
paper | webpage | video

We introduce a novel text- or image-conditioned generator of 3D assets with physically-based rendering materials and detailed geometry.

GOEmbed: Gradient Origin Embeddings for Representation Agnostic 3D Feature Learning
Animesh Karnewar, Roman Shapovalov, Tom Monnier, Andrea Vedaldi, Niloy J. Mitra, David Novotny
ECCV 2024
paper | webpage

We propose a method that encodes 2D images into any 3D representation, without requiring a pre-trained image feature extractor.

The Learnable Typewriter: A Generative Approach to Text Line Analysis
Ioannis Siglidis, Nicolas Gonthier, Julien Gaubil, Tom Monnier, Mathieu Aubry
ICDAR 2024 (IAPR best paper award)
paper | webpage | code | bibtex

We build upon sprite-based image decomposition approaches to design a generative method for character analysis and recognition in text lines.

Differentiable Blocks World: Qualitative 3D Decomposition by Rendering Primitives
Tom Monnier, Jake Austin, Angjoo Kanazawa, Alexei A. Efros, Mathieu Aubry
NeurIPS 2023
paper | webpage | code | slides | bibtex

We compute a primitive-based 3D reconstruction from multiple views by optimizing textured superquadric meshes with learnable transparency.
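For readers less familiar with the terminology, a superquadric is a compact family of solids whose roundness is controlled by two exponents, which is what makes it convenient as an optimizable primitive. As a point of reference only (this is the standard Barr formulation, not an excerpt from the paper), the implicit surface can be written as:

    \[
    \Bigl( \bigl(\tfrac{x}{a_x}\bigr)^{2/\epsilon_2} + \bigl(\tfrac{y}{a_y}\bigr)^{2/\epsilon_2} \Bigr)^{\epsilon_2/\epsilon_1} + \bigl(\tfrac{z}{a_z}\bigr)^{2/\epsilon_1} = 1,
    \]

where a_x, a_y, a_z are axis scales and epsilon_1, epsilon_2 control how boxy or rounded the primitive is.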

MACARONS: Mapping And Coverage Anticipation with RGB Online Self-supervision
Antoine Guédon, Tom Monnier, Pascal Monasse, Vincent Lepetit
CVPR 2023
paper | webpage | code | video | slides | bibtex

We introduce MACARONS, a method that learns in a self-supervised fashion to explore new environments and reconstruct them in 3D using RGB images only.

Towards Unsupervised Visual Reasoning: Do Off-The-Shelf Features Know How to Reason?
Monika Wysoczanska, Tom Monnier, Tomasz Trzcinski, David Picard
NeurIPS Workshops 2022
paper | bibtex

A Transformer-based framework for evaluating off-the-shelf features (object-centric and dense representations) on the visual question answering (VQA) reasoning task.

Share With Thy Neighbors: Single-View Reconstruction by Cross-Instance Consistency
Tom Monnier, Matthew Fisher, Alexei A. Efros, Mathieu Aubry
ECCV 2022
paper | webpage | code | video | slides | bibtex

We present UNICORN, a self-supervised approach leveraging the consistency across different single-view images for high-quality 3D reconstructions.

Representing Shape Collections with Alignment-Aware Linear Models
Romain Loiseau, Tom Monnier, Mathieu Aubry, Loïc Landrieu
3DV 2021
paper | webpage | code | bibtex

We characterize 3D shapes as affine transformations of linear families learned without supervision, and showcase the advantages of this representation on large shape collections.

Unsupervised Layered Image Decomposition into Object Prototypes
Tom Monnier, Elliot Vincent, Jean Ponce, Mathieu Aubry
ICCV 2021
paper | webpage | code | video | slides | bibtex

We discover the objects that recur in unlabeled image collections by modeling images as compositions of learnable sprites.

Deep Transformation-Invariant Clustering
Tom Monnier, Thibault Groueix, Mathieu Aubry
NeurIPS 2020 (oral presentation)
paper | webpage | code | video | slides | bibtex

A simple adaptation of K-means to make it work on pixels! We align prototypes to each sample image before computing cluster distances.
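To make the idea concrete, here is a minimal sketch, not the paper's implementation (which predicts rich transformations with deep networks and trains everything by gradient descent), assuming translation-only alignment found by brute-force search over small shifts:

    # Minimal transformation-invariant K-means sketch on small grayscale images.
    # Assumption: alignment is a brute-force search over small translations;
    # the actual method learns richer transformations with deep networks.
    import numpy as np

    def align_cost(prototype, image, max_shift=2):
        """Smallest squared error between the image and a shifted prototype."""
        best = np.inf
        for dy in range(-max_shift, max_shift + 1):
            for dx in range(-max_shift, max_shift + 1):
                shifted = np.roll(prototype, (dy, dx), axis=(0, 1))
                best = min(best, float(np.sum((shifted - image) ** 2)))
        return best

    def dti_kmeans(images, k, n_iters=10, seed=0):
        """images: float array of shape (N, H, W)."""
        rng = np.random.default_rng(seed)
        prototypes = images[rng.choice(len(images), size=k, replace=False)].copy()
        for _ in range(n_iters):
            # Assignment step: compare each image to every prototype *after*
            # aligning the prototype to that image.
            assignments = np.array([
                np.argmin([align_cost(p, im) for p in prototypes]) for im in images
            ])
            # Update step: plain mean of the assigned images (the paper instead
            # back-propagates through the alignment when updating prototypes).
            for c in range(k):
                members = images[assignments == c]
                if len(members) > 0:
                    prototypes[c] = members.mean(axis=0)
        return prototypes, assignments

The key design choice is simply that alignment happens before the distance computation, so the cluster prototypes are not blurred by pose variation across samples.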

docExtractor: An off-the-shelf historical document element extraction
Tom Monnier, Mathieu Aubry
ICFHR 2020 (oral presentation)
paper | webpage | code | video | slides | bibtex

Leveraging synthetic training data to efficiently extract visual elements from historical document images.

Academic activities

Last updated: March 2025