Robotic Decision Making via Diffusion Models


Machine learning for robot decision making

In recent years, a growing body of research [1] has focused on enabling robots to perform diverse tasks in complex environments using machine learning (ML)-based techniques. Major technology companies, such as NVIDIA and Tesla, are also advancing efforts to introduce service robots capable of extensive human interaction in daily life. For robots to integrate fully into human life, the ability to make decisions based on task requirements and environmental changes has become crucial.

ML-based approaches, such as reinforcement learning and imitation learning, leverage training data to implicitly learn task requirements, as well as the dynamics of both the system and its environment, ultimately deriving adaptive strategies for various scenarios. However, these methods encounter significant challenges, particularly in terms of model training stability and their capacity to learn multimodal behaviors.

Diffusion models in robotic decision making

Deep generative models (DGMs) have demonstrated remarkable success in natural language processing and image generation, highlighting their potential for robot policy learning. Among the family of DGMs, diffusion models [3] have been widely adopted in robotics, including trajectory planning [4], control [6], and grasp generation [5], owing to their training stability and capability for long-horizon generation.

The core idea behind diffusion models is an iterative denoising procedure in which a neural network learns to guide samples from a noise distribution to the data distribution. In the forward process, Gaussian noise is gradually injected into a clean data sample until the data is fully perturbed. In the reverse process, the neural network is trained to denoise the perturbed sample and reconstruct the original, as depicted in Figure 1. Just as in image generation, a robot trajectory can be learned by a diffusion model: during generation, a random noise sample is denoised step by step into a path that accomplishes the task.

Fig. 1. Forward process and reverse process in diffusion models. Adapted from: Ho et al. (2020), Denoising Diffusion Probabilistic Models [3].
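The forward process can be made concrete with a small sketch. The code below is illustrative only and is not taken from any of the cited implementations: it uses the linear noise schedule reported by Ho et al. [3] (1000 steps, variances from 1e-4 to 0.02) and the closed-form expression for a noised sample, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. The function names and the toy five-waypoint trajectory are assumptions made for this example.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, linearly spaced (values as in Ho et al., 2020)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product of (1 - beta_s) for s <= t: signal fraction left at step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_noise(x0, t, alpha_bars, rng):
    """Sample x_t given x_0 in closed form; returns the noised sample and the noise used."""
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


def reconstruct_x0(xt, eps_pred, t, alpha_bars):
    """Invert the forward step given an estimate of the injected noise."""
    a = alpha_bars[t]
    return [(x - math.sqrt(1.0 - a) * e) / math.sqrt(a) for x, e in zip(xt, eps_pred)]


rng = random.Random(0)
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)

trajectory = [0.0, 0.25, 0.5, 0.75, 1.0]  # hypothetical 1-D waypoints
x_t, eps = forward_noise(trajectory, 500, alpha_bars, rng)

# With the true noise, the clean trajectory is recovered exactly (up to rounding).
# A trained network only approximates eps, which is why denoising is iterative.
x0_hat = reconstruct_x0(x_t, eps, 500, alpha_bars)
```

Training then amounts to regressing the network's noise prediction onto `eps` for randomly chosen steps `t`; the sketch leaves the network itself out.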

The challenge of diffusion-based decision making

Although diffusion models show promise for robotic decision making thanks to their strong capability to learn high-dimensional behavior, they still face several critical challenges. These include, but are not limited to:

  1. Real-time application challenges: The limited inference speed hinders practical deployment.
  2. Lack of safety guarantees: Diffusion-based policies do not inherently ensure safety.
  3. Generalization limitations: These models struggle to handle out-of-distribution scenarios beyond the training data.

Addressing these challenges remains an open research problem [2], and its resolution could play a key role in enabling robots to seamlessly integrate into human environments.
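The first challenge follows directly from how samples are generated. The sketch below implements the ancestral sampling loop from Ho et al. [3] (with the noise scale set to the square root of the step variance) and counts how often the noise-prediction model is evaluated. It is a minimal illustration, not a robot policy: `oracle_eps` is a hypothetical stand-in for a trained network, valid only under the assumption that every training trajectory equals the fixed `TARGET`, and the schedule values are those reported in [3].

```python
import math
import random


def ddpm_sample(eps_model, dim, betas, rng):
    """Ancestral sampling; returns the final sample and the number of model calls."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)

    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start from pure Gaussian noise
    calls = 0
    for t in reversed(range(len(betas))):
        eps = eps_model(x, t, alpha_bars)  # one model evaluation per denoising step
        calls += 1
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        scale = 1.0 / math.sqrt(1.0 - betas[t])
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0  # no noise added on the last step
        x = [scale * (xi - coef * ei) + sigma * rng.gauss(0.0, 1.0)
             for xi, ei in zip(x, eps)]
    return x, calls


# Hypothetical data: assume every demonstration is this one toy trajectory.
TARGET = [0.0, 0.25, 0.5, 0.75, 1.0]


def oracle_eps(x, t, alpha_bars):
    """Ideal noise prediction when all training data equals TARGET (an assumption)."""
    a = alpha_bars[t]
    return [(xi - math.sqrt(a) * x0) / math.sqrt(1.0 - a) for xi, x0 in zip(x, TARGET)]


num_steps = 1000
betas = [1e-4 + i * (0.02 - 1e-4) / (num_steps - 1) for i in range(num_steps)]
sample, calls = ddpm_sample(oracle_eps, len(TARGET), betas, random.Random(0))
```

Here `calls` equals `num_steps`: every generated trajectory costs one sequential model evaluation per denoising step, and with a real neural network in place of the oracle this is what limits the control rate. The loop also never checks the sample against any constraint, which illustrates the second challenge.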

References

[1] Ravichandar, Harish, et al. "Recent advances in robot learning from demonstration." Annual Review of Control, Robotics, and Autonomous Systems 3.1 (2020): 297-330. (https://doi.org/10.1146/annurev-control-100819-063206)
[2] Huang, Tzu-Yuan, et al. "SAD-Flower: Flow Matching for Safe, Admissible, and Dynamically Consistent Planning." arXiv preprint arXiv:2511.05355 (2025). (https://doi.org/10.48550/arXiv.2511.05355)
[3] Ho, Jonathan, Ajay Jain, and Pieter Abbeel. "Denoising diffusion probabilistic models." Advances in Neural Information Processing Systems 33 (2020): 6840-6851. (https://doi.org/10.48550/arXiv.2006.11239)
[4] Janner, Michael, et al. "Planning with diffusion for flexible behavior synthesis." arXiv preprint arXiv:2205.09991 (2022). (https://doi.org/10.48550/arXiv.2205.09991)
[5] Urain, Julen, et al. "SE(3)-DiffusionFields: Learning smooth cost functions for joint grasp and motion optimization through diffusion." 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023. (https://doi.org/10.48550/arXiv.2209.03855)
[6] Huang, Tzu-Yuan, et al. "Toward near-globally optimal nonlinear model predictive control via diffusion models." arXiv preprint arXiv:2412.08278 (2024). (https://doi.org/10.48550/arXiv.2412.08278)


About the Author

Tzu-Yuan Huang, M.Sc. is a research associate and doctoral candidate at the Chair of Information-Oriented Control (ITR), Technical University of Munich (TUM), under the supervision of Prof. Sandra Hirche. As a member of relAI, his research focuses on data-driven control and safe, constraint-satisfying generative models for robotic systems, including his recent work on diffusion- and flow-matching-based planning.

