Across the three test environments, OCULAR conditions calibration on velocity, action, and semantic observations, enabling uncertainty estimates to transfer to new maps without calibration data from the deployment environment.
Abstract
We introduce Observation-aware Conformal Uncertainty Local-Calibration (OCULAR), a conformal prediction-based algorithm that uses perception information to provide uncertainty quantification guarantees for unseen test-time environments. Previous conformal approaches cannot discriminate between regions of the state-action space that lead to higher or lower model mismatch, and they require environment-specific data. In contrast, our method uses data collected from visually similar environments to provably calibrate a given linear Gaussian dynamics model of arbitrary fidelity. The prediction regions generated by OCULAR are guaranteed to contain the future system states with at least a user-set probability, despite both aleatoric and epistemic uncertainty. Our guarantees are non-asymptotic and distribution-free, and do not require strong assumptions about the unknown real system dynamics. Our calibration procedure distinguishes between observation-velocity-action inputs that lead to higher and lower next-state uncertainty, which is helpful for probabilistically safe planning. We validate our algorithm on a double-integrator system subject to random perturbations and significant model mismatch, using both a simplified sensor and a more realistic simulated camera. Our approach appropriately quantifies uncertainty both in-distribution and out-of-distribution, while remaining comparably volume-efficient to baselines that require environment-specific data.
Problem Statement
Consider a robot with a state and an action at each time step. We study discrete-time stochastic systems that evolve according to unknown dynamics, while the robot observes depth and semantic images. The planner uses a fixed approximate probabilistic dynamics model of arbitrary fidelity; here, this model is linear Gaussian, and its uncertainty can be uncalibrated due to aleatoric disturbances and epistemic model mismatch.
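As a concrete picture of what a linear Gaussian model looks like, here is a minimal sketch of a one-step prediction for a 1D double integrator. It is a toy stand-in rather than the model used in the experiments: the function name, the time step `dt`, and the fixed process variances are all assumptions made for illustration.

```python
def predict_linear_gaussian(state, action, dt=0.1, process_var=(1e-4, 1e-3)):
    """One-step Gaussian prediction for a 1D double integrator.

    state  : (position, velocity)
    action : commanded acceleration
    Returns the predicted mean (a tuple) and a fixed diagonal covariance.
    dt and process_var are hypothetical values chosen for illustration.
    """
    pos, vel = state
    # The mean is linear in (state, action): constant-acceleration kinematics.
    mean = (pos + vel * dt + 0.5 * action * dt**2, vel + action * dt)
    # The covariance does not depend on the input. A fixed covariance like
    # this is the kind of uncertainty estimate that may be uncalibrated.
    cov = ((process_var[0], 0.0), (0.0, process_var[1]))
    return mean, cov


mean, cov = predict_linear_gaussian((0.0, 1.0), action=2.0, dt=0.5)
```

Because the covariance here is the same for every input, the model cannot by itself express that some regions (for example, lower-friction terrain) are harder to predict than others.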
Given a finite exchangeable calibration dataset collected in environments that are different from, but visually similar to, the deployment environment, our aim is to construct an adaptive and volume-efficient prediction region over the next state. For a user-selected acceptable failure rate, OCULAR calibrates the model without test-environment calibration data so that the region contains the true next state with probability at least one minus that failure rate.
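Written out, the target is a marginal coverage statement. The notation below is assumed for illustration, since the original symbols are not rendered on this page: \(x_{t+1}\) is the next state, \(o_t\), \(x_t\), and \(u_t\) are the current observation, state, and action, \(\mathcal{C}_{1-\alpha}\) is the prediction region, and \(\alpha\) is the acceptable failure rate.

```latex
\mathbb{P}\left( x_{t+1} \in \mathcal{C}_{1-\alpha}(o_t, x_t, u_t) \right) \;\ge\; 1 - \alpha
```

Under this notation, a failure rate of \(\alpha = 0.1\) would correspond to the 90% coverage level used as the reference in the results tables.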
Method: OCULAR
OCULAR performs local conformal calibration using robot-frame perception, body velocity, and action information. It projects each observation into a planar semantic footprint, encodes that footprint with a CAE, and partitions the resulting footprint-velocity-action input space using a decision tree (DTree) trained on nonconformity scores. Split conformal prediction (SplitCP) is then performed per leaf, yielding an input-dependent threshold and covariance scaling factor for the approximate Gaussian prediction.
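The per-leaf calibration step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: leaf assignments are taken as given (the `leaf_ids` list is hypothetical and stands in for the output of the trained tree), the nonconformity score is assumed to be the squared Mahalanobis distance of the observed next state under the predicted Gaussian, and the state is assumed to be 2D so that the nominal chi-square quantile has a closed form.

```python
import math
from collections import defaultdict


def conformal_quantile(scores, alpha):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration samples for this alpha
    return sorted(scores)[k - 1]


def calibrate_per_leaf(leaf_ids, scores, alpha):
    """Group calibration scores by leaf and compute one threshold per leaf."""
    by_leaf = defaultdict(list)
    for leaf, score in zip(leaf_ids, scores):
        by_leaf[leaf].append(score)
    return {leaf: conformal_quantile(s, alpha) for leaf, s in by_leaf.items()}


def covariance_scale(threshold, alpha):
    """Factor by which to multiply the covariance of a 2D Gaussian so that its
    nominal (1 - alpha) ellipse matches the conformal threshold.

    Assumes squared-Mahalanobis scores and a 2D state, where the chi-square
    (2 degrees of freedom) quantile at level 1 - alpha is -2 * ln(alpha).
    """
    nominal = -2.0 * math.log(alpha)
    return threshold / nominal


# Toy calibration set: ten scores in a well-modeled leaf, ten in a poorly modeled one.
leaf_ids = ["well_modeled"] * 10 + ["poorly_modeled"] * 10
scores = [0.5 * i for i in range(1, 11)] + [5.0 * i for i in range(1, 11)]
thresholds = calibrate_per_leaf(leaf_ids, scores, alpha=0.1)
scales = {leaf: covariance_scale(q, 0.1) for leaf, q in thresholds.items()}
```

At test time, an input would be routed to its leaf and the corresponding threshold (or scaled covariance) used for that prediction, so leaves with larger calibration errors produce larger regions.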

Experiments
We validate OCULAR on a double-integrator system in Isaac Sim using depth and semantic segmentation cameras across three snowy T-section environments. The white lower-friction regions are out-of-distribution (OOD) relative to the linear Gaussian model; OCULAR uses calibration data only from visually similar maps other than the tested map.
The three tested Isaac Sim environments (based on Rivermark).
Isaac Sim Test Cases
For held-out test cases, we propagate Monte Carlo particles under the true dynamics and report marginal coverage separately for in-distribution (ID) and out-of-distribution (OOD) regions. Relative volume is measured against an oracle Gaussian scaling that achieves the target coverage.
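The two metrics can be sketched as below. This is an illustrative reading rather than the authors' evaluation code: it assumes each particle is summarized by its squared Mahalanobis distance under the predicted Gaussian, that every method only rescales the same covariance (so volumes differ only through the threshold), and that the oracle is the smallest threshold covering the target fraction of particles. All function names are hypothetical.

```python
import math


def empirical_coverage(particle_scores, threshold):
    """Fraction of Monte Carlo particles that fall inside the region."""
    inside = sum(1 for s in particle_scores if s <= threshold)
    return inside / len(particle_scores)


def oracle_threshold(particle_scores, target):
    """Smallest threshold whose region contains at least `target` of the particles."""
    k = math.ceil(target * len(particle_scores))
    return sorted(particle_scores)[max(k, 1) - 1]


def relative_volume(threshold, oracle, dim):
    """Volume ratio of two ellipsoids sharing one shape matrix.

    With squared-Mahalanobis thresholds, ellipsoid volume scales as
    threshold ** (dim / 2), so the shape matrix cancels in the ratio.
    """
    return (threshold / oracle) ** (dim / 2)


# Toy particle scores for a single test transition.
particle_scores = [float(i) for i in range(1, 11)]
q_oracle = oracle_threshold(particle_scores, target=0.9)
coverage = empirical_coverage(particle_scores, q_oracle)
ratio = relative_volume(2.0 * q_oracle, q_oracle, dim=2)
```

Under this reading, a relative volume below 1 goes together with coverage below the target, which matches the pattern of the uncalibrated model in OOD regions in the table.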
| Metric | Method | Tested map not in calibration data? | icySide (ID) | icySide (OOD) | icyMain (ID) | icyMain (OOD) | icyMiddle (ID) | icyMiddle (OOD) |
|---|---|---|---|---|---|---|---|---|
| Marginal coverage (%) | No CP | N/A | 90.0 | **56.7** | 90.0 | **56.7** | 90.0 | **56.7** |
| Marginal coverage (%) | SplitCP | × | 99.5 | **89.6** | 100.0 | 98.4 | 99.1 | **85.9** |
| Marginal coverage (%) | LUCCa | × | 91.1 | 91.5 | 90.1 | 91.4 | 90.1 | 90.9 |
| Marginal coverage (%) | OCULAR (ours) | ✓ | 91.5 | 90.1 | 90.4 | 90.1 | 91.1 | 90.6 |
| Median volume (relative to oracle) ↓ | No CP | N/A | 1.00 | 0.28 | 1.00 | 0.28 | 1.00 | 0.28 |
| Median volume (relative to oracle) ↓ | SplitCP | × | 3.73 | 1.03 | 8.68 | 2.40 | 3.07 | 0.85 |
| Median volume (relative to oracle) ↓ | LUCCa | × | 1.08 | 1.13 | 1.02 | 1.10 | 1.02 | 1.13 |
| Median volume (relative to oracle) ↓ | OCULAR (ours) | ✓ | 1.03 | 1.02 | 1.02 | 1.02 | 1.06 | 1.06 |

Bold marks coverage below 90%. Each map has 4464 test transitions.
Isaac Sim Planning Performance
We run 30 planning trials on each Isaac Sim map. Success means reaching all subgoals without collisions; failures were collisions rather than timeouts.
| Method | Tested map not in calibration data? | Success (%) ↑, icySide | Success (%) ↑, icyMain | Success (%) ↑, icyMiddle | Steps to completion (mean ± std) ↓, icySide | Steps to completion (mean ± std) ↓, icyMain | Steps to completion (mean ± std) ↓, icyMiddle |
|---|---|---|---|---|---|---|---|
| No CP | N/A | 0 | 0 | 0 | -- | -- | -- |
| SplitCP | × | 0 | 0 | 0 | -- | -- | -- |
| LUCCa | × | 100 | 100 | 100 | 401.4 ± 21.2 | 517.7 ± 9.5 | 302.3 ± 9.1 |
| OCULAR (ours) | ✓ | 100 | 100 | 100 | 238.6 ± 6.5 | 302.7 ± 5.6 | 294.7 ± 11.7 |
BibTeX (cite this!)
@misc{marques2026localconformalcalibrationdynamics,
title={Local Conformal Calibration of Dynamics Uncertainty from Semantic Images},
author={Luís Marques and Dmitry Berenson},
year={2026},
eprint={2605.13028},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2605.13028},
}