
Gemini Robotics-ER 1.5: Google's State-of-the-Art Robotics Model

Estimated Reading Time: ~11 minutes

Key Takeaways

  • Gemini Robotics-ER 1.5 is a robotics-focused brain that links language to action with spatial reasoning, agentic behavior, and built-in safety.
  • Quickly test ideas in Google AI Studio robotics or integrate via the Gemini API for robots.
  • Surpasses 50% accuracy on advanced pointing tasks, a key robotics pointing benchmark for grounded control. (See: DeepMind blog)
  • Developers can tune a flexible thinking budget to trade speed vs. accuracy, plus apply improved safety filters for real-world use. (See: Developers blog)
  • Works across labs, homes, logistics, and services—moving from scripted motions to agentic autonomy. (See: The Robot Report)

Introduction

Robots are only useful when they can see, think, and act. Gemini Robotics-ER 1.5 is Google's newest model built exactly for that. It brings embodied reasoning AI to real robots, so they can understand a scene, plan a task, and execute steps safely.

You can try it today through the Gemini API for robots or the Google AI Studio robotics tools. That means faster prototyping, clearer demos, and fewer custom hacks to get a robot moving with confidence. (See: Developers blog)

In this first half, you'll learn what the model is, how it thinks, and what it can do in the lab and in the real world. Keep scrolling to see where it truly shines.

What is Gemini Robotics-ER 1.5?

Gemini Robotics-ER 1.5 sits inside the Google Gemini ecosystem as a robotics-focused AI. Unlike general language or vision models, it is tuned to be the brain for robot control. It takes a plain-English command, looks at camera input, builds a plan, and calls tools or APIs to act.

Think of it like a careful pilot: it reads the map (video), asks "what's the goal?", and then charts the safest route step by step. The goal is robust, high-level reasoning that holds up in messy, real spaces—not just in clean demos. (See: DeepMind blog, Ars Technica)

Inside the broader Google Gemini Robotics model family, ER 1.5 integrates multi-modal inputs, spatial understanding, planning tools, and safety checks. (Docs: Robotics overview)

Core Features of Gemini Robotics-ER 1.5

Spatial reasoning AI

The model maps a scene quickly and with high precision. It can point to exact 2D points, draw boxes, and label objects for robot vision and labeling. Picture a robot in a kitchen: it can spot the dish rack, faucet, dish soap, or rice cooker, and choose a safe path to reach them. (Docs: Robotics overview)

  • Better localization → fewer misses when grabbing items.
  • Clearer labels → lower error rates in downstream planners.
  • Faster mapping → smoother motions and shorter task time.

Example: "Place the sponge next to the dish soap." The model identifies the bottle, computes a 2D target point, plans a short path, and places the sponge without knocking over nearby items.

Agentic behavior in robots

ER 1.5 supports agentic behavior: it breaks a complex command into ordered steps and adapts when the world changes. If a cup falls, it pauses, clears the path, and resumes—without a full reset. A structured-plan sketch follows the example steps below. (See: Developers blog, The Robot Report)

  • Goal: "Tidy the counter."
  • Step 1: Identify dishes and group by type.
  • Step 2: Move plates to the dish rack.
  • Step 3: Wipe spills near the faucet.
  • Step 4: Put the rice cooker back in its spot.
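
A plan like the one above can be requested as structured output so an executor can step through it one item at a time. This is an illustrative sketch: the model id is a placeholder and the step schema is an assumption, not a documented format.

```python
# Illustrative sketch: request an ordered plan as JSON. The model id and the
# step schema are assumptions, not documented formats.
import json
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

plan_prompt = (
    "Goal: tidy the counter. Break the goal into ordered steps. Respond as JSON: "
    '[{"step": 1, "action": "...", "target": "...", "done_when": "..."}]'
)

response = client.models.generate_content(
    model="gemini-robotics-er-1.5-preview",  # placeholder id; check the docs
    contents=[plan_prompt],
    config=types.GenerateContentConfig(response_mime_type="application/json"),
)

for step in json.loads(response.text):
    print(step["step"], step["action"], "->", step["target"])
```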

Flexible thinking budget

Not every task needs the same level of "thinking." With a flexible thinking budget, you tune compute per request. Quick checks use a light budget; tricky tasks can use more steps for higher accuracy. A configuration sketch follows the list below. (See: Developers blog)

  • High-speed pick-and-place → minimize latency.
  • Fragile handling → increase deliberation for safety.
  • Long-horizon chores → add budget to key decision points.
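
One way to wire this up, assuming the google-genai SDK's ThinkingConfig: keep two request configs and pick one per task. The specific budget values are illustrative assumptions, not documented limits for this model.

```python
# Sketch of per-request thinking budgets via the google-genai SDK. The budget
# values below are illustrative knobs, not documented limits for this model.
from google.genai import types

FAST = types.GenerateContentConfig(
    thinking_config=types.ThinkingConfig(thinking_budget=0)     # quick pick-and-place
)
CAREFUL = types.GenerateContentConfig(
    thinking_config=types.ThinkingConfig(thinking_budget=1024)  # fragile or long-horizon steps
)

def config_for(task_type: str) -> types.GenerateContentConfig:
    """Light budget for speed, heavier budget for delicate or long-horizon steps."""
    return CAREFUL if task_type in {"fragile", "long_horizon"} else FAST
```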

Improved safety filters

Robots share space with people. ER 1.5 adds improved safety filters to reduce risky moves and keep interactions controlled. Policies and guardrails catch unsafe intents early; simulation checks can stop a plan before motion. A robot-side guardrail sketch follows the list below. (Policy: Responsibly advancing AI & robotics)

  • Fewer near-collisions around hands, faces, and tools.
  • Blocklists for off-limits zones (hot surfaces, blades).
  • Clear fallbacks: pause, alert, or request human help.
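
Model-side filters still pair with robot-side checks. Below is a minimal guardrail sketch for exclusion zones and a pause-and-ask fallback; the zone names, coordinates, and policy are illustrative assumptions, not part of the Gemini API.

```python
# Illustrative robot-side guardrail: zone names, bounds, and the fallback
# policy are assumptions for this sketch, not part of the Gemini API.
from dataclasses import dataclass

@dataclass
class Zone:
    name: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max

# Off-limits regions in normalized workspace coordinates (illustrative).
EXCLUSION_ZONES = [
    Zone("hot_surface", 0.60, 0.00, 1.00, 0.30),
    Zone("knife_block", 0.00, 0.70, 0.20, 1.00),
]

def gate_target(x: float, y: float) -> str:
    """Return 'go', or a fallback instruction if the target is in a blocked zone."""
    for zone in EXCLUSION_ZONES:
        if zone.contains(x, y):
            return f"pause: target inside {zone.name}; request human confirmation"
    return "go"

print(gate_target(0.8, 0.1))  # pause: target inside hot_surface; ...
```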

Performance Benchmarks

Robotics pointing benchmark

A key test for spatial reasoning is the robotics pointing benchmark: the model must point to the correct item after an instruction like "point to the dish rack." ER 1.5 surpasses 50% accuracy on advanced pointing tasks. A simple scoring sketch follows the list below. (See: DeepMind blog, The Robot Report)

  • Proxy for precise object reference.
  • Stress-tests perception under occlusions and glare.
  • Ties language to coordinates a robot can act on.
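
For offline sanity checks, a pointing score can be computed by counting a prediction as correct when the returned point lands inside the labeled box. The data format below is an illustrative assumption, not the benchmark's actual schema.

```python
# Sketch of an offline pointing score: a prediction is correct when the
# predicted [y, x] point falls inside the labeled bounding box.
def point_in_box(point, box) -> bool:
    (y, x), (y0, x0, y1, x1) = point, box
    return y0 <= y <= y1 and x0 <= x <= x1

def pointing_accuracy(predictions, ground_truth) -> float:
    """predictions: {label: (y, x)}; ground_truth: {label: (y0, x0, y1, x1)}."""
    hits = sum(point_in_box(predictions[k], ground_truth[k])
               for k in ground_truth if k in predictions)
    return hits / len(ground_truth) if ground_truth else 0.0

print(pointing_accuracy({"dish rack": (300, 520)},
                        {"dish rack": (250, 480, 400, 610)}))  # 1.0
```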

2D points, robot vision and labeling

ER 1.5 can emit 2D points and structured labels in real time. That feeds motion and grasp planners without extra glue code and improves hand-eye coordination. A coordinate-conversion sketch follows the list below. (Docs: Robotics overview)

  • Faster grasp acquisition on small items.
  • Better repeatability for place actions.
  • Cleaner handoffs from perception to control.
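
Before handing a point to a grasp planner, convert it to pixel coordinates. The sketch below assumes a normalized 0-1000 [y, x] format; match whatever range your logs show the model actually returning.

```python
# Sketch: convert a normalized [y, x] point (0-1000 assumed) to pixel
# coordinates for a grasp or motion planner.
def to_pixels(point_yx, width: int, height: int) -> tuple[int, int]:
    y_norm, x_norm = point_yx
    return round(x_norm / 1000 * width), round(y_norm / 1000 * height)

u, v = to_pixels([412, 730], width=1280, height=720)
print(u, v)  # pixel target for the grasp planner: 934 297
```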

Beyond single tests

ER 1.5 performs well across embodied tasks, including long-horizon chores that need planning, memory, and recovery. Teams have observed skill transfer across robot types. A run-log sketch follows the metrics below. (See: Developers blog, Ars Technica)

  • Track success rate, interventions/hour, safety stops.
  • Measure latency vs. quality trade-offs.
  • Validate robustness across lighting and camera angles.
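
A small run log makes those numbers easy to collect. The field names below are assumptions chosen to mirror the list above.

```python
# Illustrative run log for the metrics above; field names are assumptions.
from dataclasses import dataclass, field

@dataclass
class RunLog:
    attempts: int = 0
    successes: int = 0
    safety_stops: int = 0
    interventions: int = 0
    latencies_s: list = field(default_factory=list)

    def success_rate(self) -> float:
        return self.successes / self.attempts if self.attempts else 0.0

    def p95_latency(self) -> float:
        xs = sorted(self.latencies_s)
        return xs[int(0.95 * (len(xs) - 1))] if xs else 0.0
```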

Practical Demonstrations

Given a prompt like "Point to dish soap, dish rack, faucet, and rice cooker," the model labels each item and keeps track as a robot arm moves them. This shows tight coupling between perception and AI for robot control. A recovery-and-retry sketch follows the list below. (See: DeepMind blog, Developers blog)

  • Visual tagging: Live labels with confidence scores.
  • Pick-and-place: Grasp dishware and stack in the dish rack.
  • Context checks: Pause if the faucet is running.
  • Recovery moves: If the rice cooker shifts, update the point and retry.
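
The recovery pattern reduces to: look again before committing, and retry if the target moved. In the sketch below, query_point() and move_to() are hypothetical stand-ins for your perception call and motion stack.

```python
# Sketch of the recovery move: re-check the point before acting and retry if
# the object drifted. query_point() and move_to() are hypothetical stand-ins.
import math

def drifted(p, q, tol: float = 25.0) -> bool:
    """True if two [y, x] points differ by more than tol (same units as the points)."""
    return math.dist(p, q) > tol

def place_with_retry(label: str, query_point, move_to, max_retries: int = 3) -> bool:
    target = query_point(label)
    for _ in range(max_retries):
        latest = query_point(label)   # fresh look before committing to motion
        if drifted(target, latest):
            target = latest           # the rice cooker shifted: update and try again
            continue
        move_to(target)
        return True
    return False                      # still drifting after retries: escalate to a human
```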

How to Try Gemini Robotics-ER 1.5

Google AI Studio robotics — quick start

You can test the model without wiring a full stack. Use Google AI Studio robotics to run demos, view labels, and export results.

  • Sign in and create a new robotics project.
  • Upload or stream a short camera feed.
  • Prompt: "Point to dish soap, dish rack, faucet, and rice cooker."
  • Enable outputs for 2D points and robot vision and labeling.
  • Inspect labels, points, confidence; export logs.

Tips: steady lighting, fixed camera first, start with a few objects, track time and near-misses. (See: Developers blog)

Gemini API for robots — integration paths

When ready to build, call the Gemini API from your control loop. Send frames plus a goal. Receive labels, 2D points, and plans.

  • ROS/ROS 2 node: API outputs to MoveIt or custom MPC.
  • Edge gateway: batch frames, add prompts, stream to arm controller.
  • Simulator-first: validate in a digital twin before hardware.

Checklist: normalize intrinsics, timestamp frames, set a flexible thinking budget, layer improved safety filters, and log everything. (See: Ars Technica)

Latency hygiene: resize frames, send deltas, cache static context.
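
Putting the checklist together, a minimal control loop timestamps each frame, sends it with the goal, logs the result, and hands it to the controller. In the sketch below, grab_frame(), send_to_gemini(), and execute() are hypothetical stand-ins for your camera driver, the API call sketched earlier, and your motion stack.

```python
# Sketch of a minimal control loop: timestamp frames, log everything, and hand
# results to the controller. grab_frame(), send_to_gemini(), and execute() are
# hypothetical stand-ins for your camera driver, API call, and motion stack.
import json
import logging
import time

logging.basicConfig(filename="robot_run.jsonl", level=logging.INFO, format="%(message)s")

def control_loop(goal: str, grab_frame, send_to_gemini, execute, hz: float = 2.0) -> None:
    period = 1.0 / hz
    while True:
        t0 = time.time()
        frame = grab_frame()                  # JPEG bytes from the camera driver
        result = send_to_gemini(goal, frame)  # labels, 2D points, plan steps
        logging.info(json.dumps({
            "t": t0,
            "goal": goal,
            "latency_s": round(time.time() - t0, 3),
            "result": result,
        }))
        execute(result)                       # e.g. MoveIt or a custom MPC
        time.sleep(max(0.0, period - (time.time() - t0)))
```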

The Future of Embodied Reasoning AI in Robotics

Embodied reasoning AI brings "common sense" to machines: map what they see to what they should do, and adjust mid-task. That means fewer brittle scripts and more robust autonomy. (Policy: Responsibly advancing AI & robotics)

  • Manufacturing: assembly, kitting, quick re-tasking.
  • Smart homes: fetch-and-carry, load the dish rack, wipe near the faucet.
  • Logistics: mixed-SKU picking, pallet rework, safer shared aisles.
  • Healthcare: supply runs, gentle handoffs with improved safety filters.
  • Service robots: room resets, bussing, real-time robot vision and labeling.

The Gemini family supplies the high-level brain. ER 1.5 adds embodied scene understanding, better plans, and guardrails. Two standouts: skill transfer and a tunable thinking budget. (See: Developers blog, The Robot Report)

Conclusion

Robots need sight, sense, and restraint. Gemini Robotics-ER 1.5 brings all three in a package you can try today. In minutes, you can label scenes, point to targets, and drive real motion with AI for robot control. In weeks, you can move from demos to pilots that stand up to clutter and change. (See: DeepMind blog)

Use Google AI Studio robotics for quick trials. Wire the Gemini API for robots into your stack for production. Log well, tune the flexible thinking budget, and layer improved safety filters. Do that, and you'll ship robots that feel sharp, careful, and useful—at once.

The path from lab to floor gets shorter when your model plans like a teammate. That's why Gemini Robotics-ER 1.5 is more than a benchmark win—it's a practical way to put embodied reasoning AI to work now.

FAQ

1) How do I access the model?

Use Google AI Studio robotics for rapid tests, or call the Gemini API from your app.

2) Does it only work with Google hardware?

No. Send frames and prompts from most robots or simulators via API. ROS/ROS 2 and edge gateways are common paths. (Docs: Robotics overview)

3) What outputs can I get for perception?

2D points, labels, and other structured signals that anchor language to actionable coordinates. (Docs: Robotics overview)

4) What is the robotics pointing benchmark and why should I care?

It tests whether the model can point to the correct item after a simple instruction. ER 1.5 surpasses 50% on advanced pointing tasks. (See: DeepMind blog, The Robot Report)

5) How do I tune speed vs. accuracy?

Adjust the flexible thinking budget. Use low budgets for fast, easy actions and higher budgets for delicate moves or long-horizon tasks. (See: Developers blog)

6) What safety measures are included?

Improved safety filters reduce risky actions. Pair them with robot limits (speed, torque, exclusion zones) and human-in-the-loop checks. (Policy: Responsibly advancing AI & robotics)

7) Can I prototype in simulation first?

Yes. Stream frames from a digital twin to the API, validate plans, then move to hardware. (Docs: Robotics overview)

8) How should I measure success?

Track end-to-end success rate, sub-task success, time-to-first-action, near-misses, safety stops, and human interventions.

9) Does it replace my motion planner?

No. Think of it as the high-level brain. It labels, reasons, and proposes steps; your motion planner executes trajectories and dynamics.

10) What types of tasks benefit most?

Cluttered pick-and-place, sorting, resets, and multi-step chores where agentic behavior in robots and precise 2D points pay off.

11) How is it different from a general LLM?

It's tuned for robots: fuses vision, language, and planning; supports robot vision and labeling; provides safety-aware outputs; and integrates with robot stacks. (See: DeepMind blog)

12) What's a good first project?

A kitchen demo with 3–5 objects: label, point, pick, place. Add clutter, then add motion. Log everything. Scale to bin-picking or cleaning once metrics are stable.

Related Articles

DeepAgent Desktop: The Smartest Coding Agent for Developers

Discover how DeepAgent Desktop outperforms GPT-5 Codex with top coding agent benchmarks, unique features, affordable pricing, and real-world demos.

OpenAI Pulse: How the New Daily Briefing Signals the Future of Proactive AI

Discover how OpenAI Pulse delivers a personalized daily briefing, while DeepMind robotics and Meta’s Vibes feed shape the future of proactive AI.