(Left) SplitCP calibrates dynamics uncertainty globally, possibly resulting in overconfident and/or overconservative motion. (Right) OCULAR calibrates dynamics uncertainty conditioned on velocity, action, and observation, and does not require data from the test-time environment.
The blue marker indicates the start location and the pink markers the subgoals.
Abstract
We introduce Observation-aware Conformal Uncertainty Local-Calibration (OCULAR), a conformal prediction-based algorithm that uses perception information to provide uncertainty quantification guarantees for unseen test-time environments. Previous conformal approaches cannot discriminate between state-action space regions leading to higher or lower model mismatch, and they require environment-specific data. In contrast, our method uses data collected from visually similar environments to provably calibrate a given linear Gaussian dynamics model of arbitrary fidelity. The prediction regions generated by OCULAR are guaranteed to contain future system states with at least a user-set likelihood, despite both aleatoric and epistemic uncertainty. Our guarantees are non-asymptotic and distribution-free, and do not require strong assumptions about the unknown real system dynamics. Our calibration procedure distinguishes between observation-velocity-action inputs leading to higher and lower next-state uncertainty, which is helpful for probabilistically safe planning. We validate our algorithm on a double-integrator system subject to random perturbations and significant model mismatch, using both a simplified sensor and a more realistic simulated camera. Our approach appropriately quantifies uncertainty both in-distribution and out-of-distribution, while being comparably volume-efficient to baselines that require environment-specific data.
Problem Statement
We consider a robot described at each time step by its state, action, and observation. The system is stochastic and evolves according to unknown dynamics. The available approximate dynamics model has arbitrary fidelity, with uncertainty arising from both external disturbances and model mismatch. In our experiments, the approximate model is linear-Gaussian.
Given access to an exchangeable dataset of robot transitions, collected in environments that are different from but visually similar to the deployment environment, our goal is to construct a state-action-observation-dependent, volume-efficient prediction region that is guaranteed to contain the unknown future system state with at least a user-selected likelihood.
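The coverage requirement can be written compactly. The notation below is an assumption made for illustration (the page's own symbols are not reproduced here): $x_t$, $u_t$, and $o_t$ denote state, action, and observation, $\mathcal{C}$ the prediction region, and $1-\alpha$ the user-selected likelihood. The second line is the standard split conformal threshold over $n$ calibration scores $s_1, \dots, s_n$, which the per-leaf calibration described later builds on.

```latex
\mathbb{P}\left( x_{t+1} \in \mathcal{C}(x_t, u_t, o_t) \right) \;\ge\; 1 - \alpha,
\qquad
\hat{q} \;=\; \text{the } \lceil (n+1)(1-\alpha) \rceil\text{-th smallest of } \{ s_1, \dots, s_n \}.
```

With a 90% target, as used in the experiments, this corresponds to $\alpha = 0.1$.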
OCULAR calibrates the approximate dynamics model to construct the prediction set without requiring data from the test-time environment.
Method: OCULAR
OCULAR performs local conformal calibration using robot-frame perception, body velocity, and action information. Given raw calibration samples, it maps each observation to a planar semantic footprint and encodes that footprint with a CAE. A processing step then constructs the processed calibration inputs, and a Regression Decision Tree partitions this processed input space using fitted scores. SplitCP is then performed per Decision Tree leaf, resulting in an input-dependent score threshold that acts as a local probabilistic upper bound on motion uncertainty. The calibration data is split into two disjoint subsets.
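The per-leaf calibration idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: a hand-written depth-1 regression stump stands in for the Regression Decision Tree, the two features (`ice`, `speed`) and the synthetic scores are hypothetical, and the roles assigned to the two data subsets (one to fit the partition, one to calibrate) are our assumption.

```python
import math
import random

def conformal_quantile(scores, alpha):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few samples to certify this level
    return sorted(scores)[rank - 1]

def fit_stump(features, scores):
    """Depth-1 regression tree: choose the (dimension, cut) minimizing squared error."""
    def sse(vals):
        mean = sum(vals) / len(vals)
        return sum((v - mean) ** 2 for v in vals)
    best = None
    for dim in range(len(features[0])):
        for cut in sorted({f[dim] for f in features}):
            left = [s for f, s in zip(features, scores) if f[dim] <= cut]
            right = [s for f, s in zip(features, scores) if f[dim] > cut]
            if not left or not right:
                continue
            err = sse(left) + sse(right)
            if best is None or err < best[0]:
                best = (err, dim, cut)
    if best is None:
        return lambda f: 0  # no valid split: a single leaf
    _, dim, cut = best
    return lambda f: 0 if f[dim] <= cut else 1  # returns a leaf index

def calibrate_per_leaf(leaf_of, features, scores, alpha):
    """Run split conformal prediction separately inside each leaf."""
    by_leaf = {}
    for f, s in zip(features, scores):
        by_leaf.setdefault(leaf_of(f), []).append(s)
    return {leaf: conformal_quantile(v, alpha) for leaf, v in by_leaf.items()}

rng = random.Random(0)

def sample(n):
    """Synthetic transitions: scores are larger when the hypothetical 'ice' feature is high."""
    feats, scores = [], []
    for _ in range(n):
        ice, speed = rng.random(), rng.random()
        noise_scale = 3.0 if ice > 0.5 else 1.0
        feats.append((ice, speed))
        scores.append(abs(rng.gauss(0.0, noise_scale)))
    return feats, scores

fit_x, fit_s = sample(300)    # subset used to fit the partition (assumed role)
cal_x, cal_s = sample(1000)   # disjoint subset used for calibration (assumed role)
leaf_of = fit_stump(fit_x, fit_s)
thresholds = calibrate_per_leaf(leaf_of, cal_x, cal_s, alpha=0.1)

test_x, test_s = sample(2000)
coverage = sum(s <= thresholds[leaf_of(f)] for f, s in zip(test_x, test_s)) / len(test_x)
print(thresholds, coverage)
```

The threshold returned for the high-uncertainty leaf is larger than for the low-uncertainty leaf, which is the input-dependent behavior a single global threshold cannot provide. A practical version would likely use a library regression tree with more than one split.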
Experiments
We validate OCULAR on a double-integrator system in Isaac Sim using a floating camera that provides depth and semantic segmentation images. We evaluate across three snowy T-junction environments. The white lower-friction regions are out-of-distribution (OOD) relative to the approximate linear-Gaussian model, which captures the robot dynamics over the asphalt road (ID). In all experiments, OCULAR uses no data from the test map (e.g., for icyMain evaluation, we use system transition data collected in icyMiddle and icySide).
The three tested Isaac Sim environments, based on Rivermark.
Using Monte Carlo propagation, we estimate the likelihood of the prediction regions containing the unknown future state. This containment likelihood (i.e., coverage) is reported separately for the nominal road regions (ID relative to the approximate model) and the icy regions (OOD relative to the approximate model). Prediction region volume-efficiency is reported as a ratio relative to a linear-Gaussian oracle with access to the unknown ground-truth dynamics.
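A Monte Carlo coverage estimate of this kind can be illustrated with a toy example. The sketch below is an assumption-laden simplification, not the evaluation code: it uses a 2D prediction error with independent axes, an ellipsoidal region from the model covariance, and a hypothetical `scale` factor that inflates the region. It relies on the fact that, for a 2D Gaussian, the squared Mahalanobis distance is chi-square with 2 degrees of freedom, whose (1 - alpha) quantile is -2 ln(alpha).

```python
import math
import random

def inside_region(error, variances, scale, alpha=0.1):
    """True if a 2D prediction error lies inside the (scaled) Gaussian ellipse."""
    d2 = sum(e * e / v for e, v in zip(error, variances))  # squared Mahalanobis distance
    return d2 <= scale * (-2.0 * math.log(alpha))

def monte_carlo_coverage(true_std, model_std, scale=1.0, n=20000, seed=0):
    """Fraction of sampled true errors that fall inside the model's prediction region."""
    rng = random.Random(seed)
    variances = (model_std ** 2, model_std ** 2)
    hits = 0
    for _ in range(n):
        error = (rng.gauss(0.0, true_std), rng.gauss(0.0, true_std))
        hits += inside_region(error, variances, scale)
    return hits / n

cov_id = monte_carlo_coverage(true_std=1.0, model_std=1.0)                 # model matches reality
cov_ood = monte_carlo_coverage(true_std=2.0, model_std=1.0)                # model is overconfident
cov_fixed = monte_carlo_coverage(true_std=2.0, model_std=1.0, scale=4.0)   # inflated region
print(cov_id, cov_ood, cov_fixed)
```

In this toy setting the matched model attains roughly the 90% target, the overconfident model undercovers, and inflating the region restores coverage at the cost of a larger area (in 2D, the ellipse area grows linearly with `scale`).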
Test-case results across three Isaac Sim roads.
| Method | Tested map not in calibration data? | icySide ID | icySide OOD | icyMain ID | icyMain OOD | icyMiddle ID | icyMiddle OOD |
|---|---|---|---|---|---|---|---|
| Marginal coverage (%) | | | | | | | |
| NoCP | N/A | 90.0 | 56.7 | 90.0 | 56.7 | 90.0 | 56.7 |
| SplitCP | × | 99.5 | 89.6 | 100.0 | 98.4 | 99.1 | 85.9 |
| LUCCa | × | 91.1 | 91.5 | 90.1 | 91.4 | 90.1 | 90.9 |
| OCULAR (ours) | ✓ | 91.5 | 90.1 | 90.4 | 90.1 | 91.1 | 90.6 |
| Median volume (relative to oracle) | | | | | | | |
| NoCP | N/A | 1.00 | 0.28 | 1.00 | 0.28 | 1.00 | 0.28 |
| SplitCP | × | 3.73 | 1.03 | 8.68 | 2.40 | 3.07 | 0.85 |
| LUCCa | × | 1.08 | 1.13 | 1.02 | 1.10 | 1.02 | 1.13 |
| OCULAR (ours) | ✓ | 1.03 | 1.02 | 1.02 | 1.02 | 1.06 | 1.06 |
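Entries below the 90% coverage target (NoCP on all OOD regions; SplitCP on the icySide and icyMiddle OOD regions) indicate undercoverage. Prediction region volume ratio is reported relative to an oracle using the minimum scaling factor needed to achieve 90% coverage. Each map has 4464 test transitions.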
OCULAR calibrates the approximate dynamics on both nominal and icy conditions without sacrificing volume efficiency relative to baselines using environment-specific data.
We also conduct motion planning experiments to demonstrate the utility of our method for probabilistically safe MPC under model mismatch and external perturbations. For all methods the objective function includes a distance-to-subgoal term, a collision penalty, and a penalty for transitions with high estimated uncertainty.
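The three-term objective described above could take a form like the following. This is a hedged sketch only: the function name `stage_cost`, the weights, the circular obstacle model, and the use of the calibrated score threshold as the uncertainty measure are all illustrative assumptions, not the planner used in the experiments.

```python
import math

def stage_cost(position, subgoal, obstacles, uncertainty_threshold,
               w_goal=1.0, w_collision=1000.0, w_uncertainty=10.0,
               robot_radius=0.5):
    """Cost of one predicted transition in an MPC rollout (hypothetical weights).

    obstacles: list of ((x, y), radius) circles.
    uncertainty_threshold: calibrated score threshold for this transition.
    """
    goal_term = math.dist(position, subgoal)  # distance-to-subgoal term
    in_collision = any(
        math.dist(position, center) <= radius + robot_radius
        for center, radius in obstacles
    )
    return (w_goal * goal_term
            + w_collision * float(in_collision)        # collision penalty
            + w_uncertainty * uncertainty_threshold)   # high-uncertainty penalty

# Two candidate transitions ending at the same position: one with low and one
# with high estimated uncertainty (e.g., nominal road versus ice).
cost_low = stage_cost((0.0, 0.0), (3.0, 4.0), [], uncertainty_threshold=1.0)
cost_high = stage_cost((0.0, 0.0), (3.0, 4.0), [], uncertainty_threshold=3.0)
cost_hit = stage_cost((0.0, 0.0), (3.0, 4.0), [((0.2, 0.0), 1.0)], uncertainty_threshold=1.0)
print(cost_low, cost_high, cost_hit)
```

With equal progress toward the subgoal, the planner prefers the transition with the smaller threshold, which is one way a penalty of this kind can encourage slower motion where the calibrated uncertainty is high.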
We observe that planning with the uncalibrated dynamics directly causes the robot to gain significant momentum over ice and hence to collide. SplitCP can produce overconfident behavior (i.e., collisions) or overconservative behavior (i.e., timeouts), depending on the calibration data distribution. LUCCa is a baseline that requires robot location data in an inertial frame and hence needs transitions collected in the test environment. OCULAR moves more slowly over ice and faster over the nominal road, being both safe and efficient. Because it uses perception information, it does not require data from the test environment. Below we visualize rollouts from all compared methods at once.
These results indicate that OCULAR can generalize to new unseen environments and achieve both adequate planning performance and safety by leveraging perception information obtained in other environments. Below are numerical results.
Planning results across three Isaac Sim roads (30 trials each).
| Method | Tested map not in calibration data? | Success (%): icySide | Success (%): icyMain | Success (%): icyMiddle | Steps to completion (mean ± std): icySide | Steps to completion (mean ± std): icyMain | Steps to completion (mean ± std): icyMiddle |
|---|---|---|---|---|---|---|---|
| NoCP | N/A | 0 | 0 | 0 | -- | -- | -- |
| SplitCP | × | 0 | 0 | 0 | -- | -- | -- |
| LUCCa | × | 100 | 100 | 100 | 401.4 ± 21.2 | 517.7 ± 9.5 | 302.3 ± 9.1 |
| OCULAR (ours) | ✓ | 100 | 100 | 100 | 238.6 ± 6.5 | 302.7 ± 5.6 | 294.7 ± 11.7 |
Success = reaching all subgoals without collisions.
BibTeX (cite this!)
@misc{marques2026localconformalcalibrationdynamics,
title={Local Conformal Calibration of Dynamics Uncertainty from Semantic Images},
author={Luís Marques and Dmitry Berenson},
year={2026},
eprint={2605.13028},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2605.13028},
}