WeTalkRobots: the AI and Robotics Research Archive

WeTalkRobots curates and summarizes recent advances in AI, robotics, humanoid control, reinforcement learning, and vision-language models. Explore research papers, podcasts, and videos focused on robot learning, manipulation, and autonomous systems. The curated papers come with summaries, images, and audio podcasts.

Browse some posts:

F1VLA
InterMimic: Towards Universal Whole-Body Control for Physics-Based Human-Object Interactions
ASAP: Aligning Simulation and Real-world Physics
DeepMimic
DreamVLA
Figure Helix VLA
Hitter: Humanoid Table Tennis Robot
OmniRetarget
Open-source Robot VLAs and VLMs
RDT-1: Robotic Diffusion Transformer
ResMimic
StarVLA: VLA Model Codebase
TWIST: Teleoperated Whole-Body Imitation System
SoftMimic: Learning Compliant Whole-body Control from Examples
Igniting VLMs toward the Embodied Space
EgoHumanoid: Unlocking In-the-Wild Loco-Manipulation with Robot-Free Egocentric Demonstration
Vision-Language Agent (VLA) from Scratch
BeyondMimic: From Motion Tracking to Versatile Humanoid Control via Guided Diffusion
VLAb: A Modular and Extensible Research Platform for Vision-Language Models
Recurrent-Depth VLA: Implicit Test-Time Compute Scaling of Vision–Language–Action Models via Latent Iterative Reasoning
Dyn-HaMR: Recovering 4D Interacting Hand Motion from a Dynamic Camera
Reconstructing Hands in 3D with Transformers
LARGE VIDEO PLANNER ENABLES GENERALIZABLE ROBOT CONTROL
ReMEmbR: Building and Reasoning Over Long-Horizon Spatio-Temporal Memory for Robot Navigation
COMPASS: Cross-embOdiment Mobility Policy via ResiduAl RL and Skill Synthesis
SONIC: Supersizing Motion Tracking for Natural Humanoid Whole-Body Control
DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos
DREAMGEN: Unlocking Generalization in Robot Learning through Video World Models
ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning
Scalable and General Whole-Body Control for Cross-Humanoid Locomotion
EgoScale: Scaling Dexterous Manipulation with Diverse Egocentric Human Data
OmniXtreme: Breaking the Generality Barrier in High-Dynamic Humanoid Control
Masquerade: Learning from In-the-wild Human Videos using Data-Editing
Phantom: Training Robots Without Robots Using Only Human Videos
Interactive World Simulator for Robot Policy Training and Evaluation
MEM: Multi-Scale Embodied Memory for Vision Language Action Models
Evaluating Gemini Robotics Policies in a Veo World Simulator
WoVR: World Models as Reliable Simulators for Post-Training VLA Policies with RL
Ψ0: An Open Foundation Model Towards Universal Humanoid Loco-Manipulation
TWIST2: Scalable, Portable, and Holistic Humanoid Data Collection System
GigaWorld-0: World Models as Data Engine to Empower Embodied AI
World Action Models are Zero-shot Policies
GigaWorld-Policy: An Efficient Action-Centered World–Action Model
Embodied COT
RL Token: Bootstrapping Online RL with Vision-Language-Action Models
Scalable Vision-Language-Action Model Pretraining for Robotic Manipulation with Real-Life Human Activity Videos
π∗0.6: a VLA That Learns From Experience
ROBOMETER: Scaling General-Purpose Robotic Reward Models via Trajectory Comparisons
Being-H0.5: Scaling Human-Centric Robot Learning for Cross-Embodiment Generalization
ABot-M0: VLA Foundation Model for Robotic Manipulation with Action Manifold Learning
π0.5: a Vision-Language-Action Model with Open-World Generalization
π0: A Vision-Language-Action Flow Model for General Robot Control
EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos
π0.7: a Steerable Generalist Robotic Foundation Model with Emergent Capabilities
SteadyTray: Learning Object Balancing Tasks in Humanoid Tray Transport via Residual Reinforcement Learning
ABot-Claw: A Foundation for Persistent, Cooperative, and Self-Evolving Robotic Agents
CaP-X: A Framework for Benchmarking and Improving Coding Agents for Robot Manipulation
TidyBot-Universe: A Modular and Service-Oriented Framework for Robotic Tidy-Up Tasks
Dexbotic: Open-Source Vision-Language-Action Toolbox
RoboInter: A Holistic Intermediate Representation Suite Towards Robotic Manipulation
π3: PERMUTATION-EQUIVARIANT VISUAL GEOMETRY LEARNING
ZipMap: Linear-Time Stateful 3D Reconstruction via Test-Time Training
RoboClaw: A Universal and Flexible Robotic Claw System for High-Performance Grasping
LingBot-Map: Geometric Context Transformer for Streaming 3D Reconstruction
PriorVLA: Prior-Preserving Adaptation for Vision-Language-Action Models
VGGT-Ω
mimic-video: Video-Action Models for Generalizable Robot Control Beyond VLAs
NaVILA: Legged Robot Vision-Language-Action Model for Navigation
UMI on Legs: Making Manipulation Policies Mobile with Manipulation-Centric Whole-body Controllers
DIT4DIT: JOINTLY MODELING VIDEO DYNAMICS AND ACTIONS FOR GENERALIZABLE ROBOT CONTROL
HaWoR: World-Space Hand Motion Reconstruction from Egocentric Videos
WiLoR: End-to-end 3D Hand Localization and Reconstruction in-the-wild
Being-H0.7: A Latent World-Action Model from Egocentric Videos
Fast-WAM: Do World Action Models Need Test-time Future Imagination?
Cross-Hand Latent Representation for Vision-Language-Action Models
Hy-Embodied-0.5-VLA: From Vision-Language-Action Models to a Real-World Robot Learning Stack

Categories: VLA, Humanoid Robots, Open-source VLA, VLA Loco-manipulation, Learning from Humans, Navigation, Cross-embodiment Learning, Video Generative Models, Reasoning VLA, Memory for VLA, World Action Models, RL for VLA, Agentic AI, 3D Reconstruction, Legged Robots