Tom Monnier

Research Scientist at Meta

I am a Research Scientist at Meta working on computer vision, with an emphasis on 3D modeling and 3D generation. I did my PhD in the amazing Imagine lab at ENPC under the guidance of Mathieu Aubry. During my PhD, I was fortunate to work with Jean Ponce (Inria), Matthew Fisher (Adobe Research), Alyosha Efros and Angjoo Kanazawa (UC Berkeley). Before that, I completed my engineer's degree (equivalent to an M.Sc.) at Mines Paris.

My research focuses on learning from images without annotations, with a particular interest in recovering the underlying 3D structure (see representative papers). I am always looking for PhD interns, so feel free to reach out!

email | github | google scholar | twitter

Publications

Twinner: Shining Light on Digital Twins in a Few Snaps
Jesus Zarzar, Tom Monnier, Roman Shapovalov, Andrea Vedaldi, David Novotny
CVPR 2025
paper

We introduce a large reconstruction model (LRM) capable of recovering the illumination, geometry and material properties of real object scenes from a few posed images.

UnCommon Objects in 3D
Xingchen Liu, Piyush Tayal, Jianyuan Wang, Jesus Zarzar, Tom Monnier, Konstantinos Tertikas, Jiali Duan, Antoine Toisoul, Jason Y. Zhang, Natalia Neverova, Andrea Vedaldi, Roman Shapovalov, David Novotny
CVPR 2025
paper | webpage | code

We present a new object-centric dataset for 3D deep learning and 3D generative AI.

PartGen: Part-level 3D Generation and Reconstruction with Multi-View Diffusion Models
Minghao Chen, Roman Shapovalov, Iro Laina, Tom Monnier, Jianyuan Wang, David Novotny, Andrea Vedaldi
CVPR 2025
paper | webpage

We propose a method for compositional part-level 3D generation and reconstruction from various modalities including text, image or 3D models.

Meta 3D AssetGen: Text-to-Mesh Generation with High-Quality Geometry, Texture, and PBR Materials
Yawar Siddiqui, Tom Monnier, Filippos Kokkinos, Mahendra Kariya, Yanir Kleiman, Emilien Garreau, Oran Gafni, Natalia Neverova, Andrea Vedaldi, Roman Shapovalov, David Novotny
NeurIPS 2024
paper | webpage | video

We introduce a novel text- or image-conditioned generator of 3D assets with physically-based rendering materials and detailed geometry.

GOEmbed: Gradient Origin Embeddings for Representation Agnostic 3D Feature Learning
Animesh Karnewar, Roman Shapovalov, Tom Monnier, Andrea Vedaldi, Niloy J. Mitra, David Novotny
ECCV 2024
paper | webpage

We propose a method that encodes 2D images into any 3D representation, without requiring a pre-trained image feature extractor.

The Learnable Typewriter: A Generative Approach to Text Line Analysis
Ioannis Siglidis, Nicolas Gonthier, Julien Gaubil, Tom Monnier, Mathieu Aubry
ICDAR 2024 (IAPR best paper award)
paper | webpage | code | bibtex

We build upon sprite-based image decomposition approaches to design a generative method for character analysis and recognition in text lines.

Differentiable Blocks World: Qualitative 3D Decomposition by Rendering Primitives
Tom Monnier, Jake Austin, Angjoo Kanazawa, Alexei A. Efros, Mathieu Aubry
NeurIPS 2023
paper | webpage | code | slides | bibtex

We compute a primitive-based 3D reconstruction from multiple views by optimizing textured superquadric meshes with learnable transparency.
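For readers less familiar with the terminology, a superquadric is a compact family of solids whose roundness is controlled by two exponents, which is what makes it convenient as an optimizable primitive. As a point of reference only (this is the standard Barr formulation, not an excerpt from the paper), the implicit surface can be written as:

    \[
    \Bigl( \bigl(\tfrac{x}{a_x}\bigr)^{2/\epsilon_2} + \bigl(\tfrac{y}{a_y}\bigr)^{2/\epsilon_2} \Bigr)^{\epsilon_2/\epsilon_1} + \bigl(\tfrac{z}{a_z}\bigr)^{2/\epsilon_1} = 1,
    \]

where a_x, a_y, a_z are axis scales and epsilon_1, epsilon_2 control how boxy or rounded the primitive is.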

MACARONS: Mapping And Coverage Anticipation with RGB Online Self-supervision
Antoine Guédon, Tom Monnier, Pascal Monasse, Vincent Lepetit
CVPR 2023
paper | webpage | code | video | slides | bibtex

We introduce MACARONS, a method that learns in a self-supervised fashion to explore new environments and reconstruct them in 3D using RGB images only.

Towards Unsupervised Visual Reasoning: Do Off-The-Shelf Features Know How to Reason?
Monika Wysoczanska, Tom Monnier, Tomasz Trzcinski, David Picard
NeurIPS Workshops 2022
paper | bibtex

A Transformer-based framework for evaluating off-the-shelf features (object-centric and dense representations) on the visual question answering (VQA) reasoning task.

Share With Thy Neighbors: Single-View Reconstruction by Cross-Instance Consistency
Tom Monnier, Matthew Fisher, Alexei A. Efros, Mathieu Aubry
ECCV 2022
paper | webpage | code | video | slides | bibtex

We present UNICORN, a self-supervised approach leveraging the consistency across different single-view images for high-quality 3D reconstructions.

Representing Shape Collections with Alignment-Aware Linear Models
Romain Loiseau, Tom Monnier, Mathieu Aubry, Loïc Landrieu
3DV 2021
paper | webpage | code | bibtex

We characterize 3D shapes as affine transformations of linear families learned without supervision, and showcase the advantages of this representation on large shape collections.

Unsupervised Layered Image Decomposition into Object Prototypes
Tom Monnier, Elliot Vincent, Jean Ponce, Mathieu Aubry
ICCV 2021
paper | webpage | code | video | slides | bibtex

We discover the objects that recur in unlabeled image collections by modeling images as compositions of learnable sprites.

Deep Transformation-Invariant Clustering
Tom Monnier, Thibault Groueix, Mathieu Aubry
NeurIPS 2020 (oral presentation)
paper | webpage | code | video | slides | bibtex

A simple adaptation of K-means to make it work on pixels! We align prototypes to each sample image before computing cluster distances.
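To make the idea concrete, here is a minimal sketch, not the paper's implementation (which predicts rich transformations with deep networks and trains everything by gradient descent), assuming translation-only alignment found by brute-force search over small shifts:

    # Minimal transformation-invariant K-means sketch on small grayscale images.
    # Assumption: alignment is a brute-force search over small translations;
    # the actual method learns richer transformations with deep networks.
    import numpy as np

    def align_cost(prototype, image, max_shift=2):
        """Smallest squared error between the image and a shifted prototype."""
        best = np.inf
        for dy in range(-max_shift, max_shift + 1):
            for dx in range(-max_shift, max_shift + 1):
                shifted = np.roll(prototype, (dy, dx), axis=(0, 1))
                best = min(best, float(np.sum((shifted - image) ** 2)))
        return best

    def dti_kmeans(images, k, n_iters=10, seed=0):
        """images: float array of shape (N, H, W)."""
        rng = np.random.default_rng(seed)
        prototypes = images[rng.choice(len(images), size=k, replace=False)].copy()
        for _ in range(n_iters):
            # Assignment step: compare each image to every prototype *after*
            # aligning the prototype to that image.
            assignments = np.array([
                np.argmin([align_cost(p, im) for p in prototypes]) for im in images
            ])
            # Update step: plain mean of the assigned images (the paper instead
            # back-propagates through the alignment when updating prototypes).
            for c in range(k):
                members = images[assignments == c]
                if len(members) > 0:
                    prototypes[c] = members.mean(axis=0)
        return prototypes, assignments

The key design choice is simply that alignment happens before the distance computation, so the cluster prototypes are not blurred by pose variation across samples.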

docExtractor: An off-the-shelf historical document element extraction
Tom Monnier, Mathieu Aubry
ICFHR 2020 (oral presentation)
paper | webpage | code | video | slides | bibtex

Leveraging synthetic training data to efficiently extract visual elements from historical document images.

Academic activities

Last updated: March 2025