Hiroki Furuta

I am a research scientist at Google DeepMind, Japan, working mainly on interactive multimodal AI agents (Project Astra) and alignment for video diffusion models (Veo). I received my Ph.D. from The University of Tokyo, advised by Yutaka Matsuo. I also received my BEng and MEng from The University of Tokyo, advised by Yutaka Matsuo, and collaborated closely with Shixiang Shane Gu. During my Ph.D., I was a Student Researcher at Google DeepMind, hosted by David Ha (in 2022) and Heiga Zen (in 2023 - 2024).

My recent research interests center on Multimodal Understanding and Generation: Multimodal AI agents for real-world applications, Diffusion Models for Multimodal Generation, Alignment for Generative AI through deep reinforcement learning, and Mechanistic Interpretability of LLMs.

Recent Preprints

Daisuke Oba, Hiroki Furuta, Naoaki Okazaki.
Diffusion-State Policy Optimization for Masked Diffusion Language Models
arXiv preprint arXiv:2602.06462, 2026.
[arxiv] [website]
Gouki Minegishi, Jingyuan Feng, Hiroki Furuta, Takeshi Kojima, Yusuke Iwasawa, Yutaka Matsuo.
Emergent Analogical Reasoning in Transformers
arXiv preprint arXiv:2602.01992, 2026.
[arxiv]
Yuta Oshima, Yusuke Iwasawa, Masahiro Suzuki, Yutaka Matsuo, Hiroki Furuta.
WorldPack: Compressed Memory Improves Spatial Consistency in Video World Modeling
arXiv preprint arXiv:2512.02473, 2025.
[arxiv]
Hiroki Furuta, Heiga Zen, Dale Schuurmans, Aleksandra Faust, Yutaka Matsuo, Percy Liang, Sherry Yang.
Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback
arXiv preprint arXiv:2412.02617, 2024.
[arxiv] [website]

Conference Publications

Yuta Oshima, Daiki Miyake, Kohsei Matsutani, Yusuke Iwasawa, Masahiro Suzuki, Yutaka Matsuo, Hiroki Furuta.
MultiBanana: A Challenging Benchmark for Multi-Reference Text-to-Image Generation
The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026).
[arxiv] [code] [HuggingFace]
Gouki Minegishi, Hiroki Furuta, Takeshi Kojima, Yusuke Iwasawa, Yutaka Matsuo.
Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties
Neural Information Processing Systems (NeurIPS 2025).
[arxiv]
Yuta Oshima, Masahiro Suzuki, Yutaka Matsuo, Hiroki Furuta.
Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search
Neural Information Processing Systems (NeurIPS 2025).
[arxiv]
Gouki Minegishi, Hiroki Furuta, Shohei Taniguchi, Yusuke Iwasawa, Yutaka Matsuo.
Beyond Induction Heads: In-Context Meta Learning Induces Multi-Phase Circuit Emergence
International Conference on Machine Learning (ICML 2025).
[arxiv]
Lutfi Eren Erdogan, Nicholas Lee, Sehoon Kim, Suhong Moon, Hiroki Furuta, Gopala Anumanchipalli, Kurt Keutzer, Amir Gholami.
Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks
International Conference on Machine Learning (ICML 2025).
[arxiv]
Gouki Minegishi, Hiroki Furuta, Yusuke Iwasawa, Yutaka Matsuo.
Rethinking Evaluation of Sparse Autoencoders through the Representation of Polysemous Words
International Conference on Learning Representations (ICLR 2025).
[arxiv] [code]
Hiroki Furuta, Kuang-Huei Lee, Shixiang Shane Gu, Yutaka Matsuo, Aleksandra Faust, Heiga Zen, Izzeddin Gur.
Geometric-Averaged Preference Optimization for Soft Preference Labels
Neural Information Processing Systems (NeurIPS 2024).
[arxiv]
Kuang-Huei Lee, Xinyun Chen, Hiroki Furuta, John Canny, Ian Fischer.
A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
International Conference on Machine Learning (ICML 2024).
[arxiv] [website]
Open X-Embodiment Collaboration, et al. (including Hiroki Furuta)
Open X-Embodiment: Robotic Learning Datasets and RT-X Models
IEEE International Conference on Robotics and Automation (ICRA 2024) (Best Conference Paper Award).
[arxiv] [website]
Izzeddin Gur*, Hiroki Furuta*, Austin Huang, Mustafa Safdari, Yutaka Matsuo, Douglas Eck, Aleksandra Faust. (*Equal Contribution)
A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis
International Conference on Learning Representations (ICLR 2024) (Oral, 1.2% of 7262 submissions).
[arxiv]
Hiroki Furuta, Kuang-Huei Lee, Ofir Nachum, Yutaka Matsuo, Aleksandra Faust, Shixiang Shane Gu, Izzeddin Gur.
Multimodal Web Navigation with Instruction-Finetuned Foundation Models
International Conference on Learning Representations (ICLR 2024).
[arxiv] [website]
Hiroki Furuta, Yusuke Iwasawa, Yutaka Matsuo, Shixiang Shane Gu.
A System for Morphology-Task Generalization via Unified Representation and Behavior Distillation
International Conference on Learning Representations (ICLR 2023) (Notable-top-25%, 8% of 4966 submissions).
[arxiv] [code] [website]
Hiroki Furuta, Yutaka Matsuo, Shixiang Shane Gu.
Generalized Decision Transformer for Offline Hindsight Information Matching
International Conference on Learning Representations (ICLR 2022) (Spotlight, 5% of 3391 submissions).
[arxiv] [code] [website]
Hiroki Furuta, Tadashi Kozuno, Tatsuya Matsushima, Yutaka Matsuo, Shixiang Shane Gu.
Co-Adaptation of Algorithmic and Implementational Innovations in Inference-based Deep Reinforcement Learning
Neural Information Processing Systems (NeurIPS 2021).
[arxiv] [code]
Hiroki Furuta, Tatsuya Matsushima, Tadashi Kozuno, Yutaka Matsuo, Sergey Levine, Ofir Nachum, Shixiang Shane Gu.
Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning
International Conference on Machine Learning (ICML 2021).
[arxiv] [code]
Tatsuya Matsushima*, Hiroki Furuta*, Yutaka Matsuo, Ofir Nachum, Shixiang Gu. (*Equal Contribution)
Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization
International Conference on Learning Representations (ICLR 2021).
[openreview] [code]

Journal Publications

Hiroki Furuta, Yutaka Matsuo, Aleksandra Faust, Izzeddin Gur.
Exposing Limitations of Language Model Agents in Sequential-Task Compositions on the Web
Transactions on Machine Learning Research (TMLR), 2024.
[arxiv] [code]
Hiroki Furuta, Gouki Minegishi, Yusuke Iwasawa, Yutaka Matsuo.
Towards Empirical Interpretation of Internal Circuits and Properties in Grokked Transformers on Modular Polynomials
Transactions on Machine Learning Research (TMLR), 2024.
[arxiv] [code]
So Kuroki, Tatsuya Matsushima, Junpei Arima, Hiroki Furuta, Yutaka Matsuo, Shixiang Shane Gu, Yujin Tang.
Collective Intelligence for 2D Push Manipulations With Mobile Robots
IEEE Robotics and Automation Letters (RA-L), 2023.
[paper]

Please check Publications for further details.

Talks

Hiroki Furuta. “Opportunities and Challenges of Language Model Agents in Web Automation”. Berkeley Artificial Intelligence Research Lab, 2023.
Hiroki Furuta. “Co-Adaptation of Algorithmic and Implementational Innovations in Inference-based Deep Reinforcement Learning”. NeurIPS Meetup Japan 2021, 2021.

Academic Activities

Reviewer for Neural Information Processing Systems (NeurIPS), 2021, 2022 (Top Reviewer), 2023 (Top Reviewer), 2024, 2025.
Reviewer for International Conference on Learning Representations (ICLR), 2022 (Highlighted Reviewer), 2023, 2024, 2025, 2026.
Reviewer for International Conference on Machine Learning (ICML), 2021, 2022, 2023, 2024, 2025.
Reviewer for IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025, 2026.
Reviewer for International Conference on Computer Vision (ICCV), 2025.
Reviewer for Association for Computational Linguistics (ACL) Rolling Review, 2025.
Reviewer for Transactions on Machine Learning Research (TMLR).
Reviewer for Advanced Robotics (AR).
Co-organizer for Workshop on Robotics World Modeling at CoRL 2025.
Co-organizer for Workshop on Building Physically Plausible World Models at ICML 2025.
Co-organizer for Ecological Theory of RL Workshop at NeurIPS 2021.
Program Committee for Foundation Models for Decision Making Workshop at NeurIPS 2022, 2023.

Honors & Awards

Dean’s Award (Ph.D.) (from Graduate School of Engineering, The University of Tokyo, 2025)
Forbes JAPAN 30 UNDER 30 2023 (August, 2023)
The Japan Society for the Promotion of Science Research Fellow (DC1) (April, 2022 - March, 2025)
Dean’s Award (Master) (from Graduate School of Engineering, The University of Tokyo, 2022)
Toyota/Dwango Scholarship for Advanced Artificial Intelligence Researcher (April, 2021 - March, 2022)

Education & Experience

Research Scientist at Google DeepMind (Jan, 2025 - Present)
Ph.D. from The University of Tokyo (March, 2025)
Student Researcher at Google DeepMind (May, 2023 - Jan, 2025)
Student Researcher at Google Research, Brain Team (July, 2022 - May, 2023)
MEng from The University of Tokyo (March, 2022)
BEng from The University of Tokyo (March, 2020)