Local Conformal Calibration of Dynamics Uncertainty from Semantic Images

17th International Workshop on the Algorithmic Foundations of Robotics (WAFR) 2026

Across the three test environments, OCULAR conditions calibration on velocity, action, and semantic observations, enabling uncertainty estimates to transfer to new maps without calibration data from the deployment environment.

Abstract

We introduce Observation-aware Conformal Uncertainty Local-Calibration (OCULAR), a conformal prediction-based algorithm that uses perception information to provide uncertainty quantification guarantees for unseen test-time environments. Previous conformal approaches cannot discriminate between state-action space regions that lead to higher or lower model mismatch, and they require environment-specific data. In contrast, our method uses data collected from visually similar environments to provably calibrate a given linear Gaussian dynamics model of arbitrary fidelity. The prediction regions generated by OCULAR are guaranteed to contain the future system states with at least a user-set likelihood, despite both aleatoric and epistemic uncertainty. Our guarantees are non-asymptotic and distribution-free, and do not require strong assumptions about the unknown real system dynamics. Our calibration procedure distinguishes between observation-velocity-action inputs that lead to higher and lower next-state uncertainty, which is helpful for probabilistically safe planning. We validate our algorithm on a double-integrator system subject to random perturbations and significant model mismatch, using both a simplified sensor and a more realistic simulated camera. Our approach appropriately quantifies uncertainty both in-distribution and out-of-distribution, while remaining comparable in volume efficiency to baselines that require environment-specific data.

Problem Statement

Let $s_t=(p_t,v_t)\in\mathcal S$ and $a_t\in\mathcal A$ denote a robot’s state and action. We consider discrete-time stochastic systems evolving according to unknown dynamics $s_{t+1}\sim f(s_t,a_t)$, while the robot observes depth and semantic images $o_t=(o_t^{\mathrm{depth}},o_t^{\mathrm{semantics}})$. The planner uses a fixed approximate probabilistic dynamics model $\tilde f$ of arbitrary fidelity; here, $\tilde f$ is a linear Gaussian model whose uncertainty can be uncalibrated due to aleatoric disturbances and epistemic model mismatch.

Given a finite exchangeable calibration dataset $D_{\mathrm{cal}}:=\{(s_t,a_t,o_t,s_{t+1})_i\}_{i=1}^{n}$ from environments different from, but visually similar to, the deployment environment, our aim is to construct an adaptive and volume-efficient prediction region $\hat{\mathcal C}(X)$ over $Y:=s_{t+1}$. For a user-selected acceptable failure rate $\alpha\in(0,1)$, OCULAR calibrates $\tilde f$ without test-environment calibration data so that

$$\mathbb P\!\left(Y \in \hat{\mathcal C}(X)\right) \ge 1-\alpha.$$
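For context, a coverage statement of this form is what standard split conformal prediction provides. The block below restates that textbook construction for a generic nonconformity score $s(X,y)$ with calibration scores $s_1,\dots,s_n$. The score function is left abstract here because this section does not specify the one OCULAR uses, so this is a reminder of the general mechanism rather than the authors' exact derivation.

```latex
% Standard split conformal prediction with a generic score (an assumption,
% not OCULAR's specific score). s_{(j)} is the j-th smallest calibration score.
% If \lceil (n+1)(1-\alpha) \rceil > n, set \hat q = +\infty.
\hat q \;=\; s_{\left(\lceil (n+1)(1-\alpha) \rceil\right)},
\qquad
\hat{\mathcal C}(X) \;=\; \{\, y : s(X, y) \le \hat q \,\}
\quad\Longrightarrow\quad
\mathbb P\!\left(Y \in \hat{\mathcal C}(X)\right) \;\ge\; 1-\alpha .
```

The implication holds when the calibration scores and the test score are exchangeable, which matches the exchangeability assumption placed on the calibration dataset above.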

Method: OCULAR

OCULAR performs local conformal calibration using robot-frame perception, body velocity, and action information. It projects each observation into a planar semantic footprint $o'_t=\varphi(o_t)$, encodes that footprint with a CAE, and partitions the space of inputs $X:=\mathrm{process}(X^{\mathrm{raw}})=(v_t,a_t,\mathrm{Encode}(o'_t))$ using a decision tree (DTree) trained on nonconformity scores. Split conformal prediction (SplitCP) is then performed per leaf, yielding an input-dependent threshold $\hat q_k$ and a covariance scaling factor $\xi_k$ for the approximate Gaussian prediction.

Figure: OCULAR offline calibration pipeline.
Offline component of OCULAR. 1: Observations $o_i$ from $D_{cal}^{part}$ are projected into planar footprints $O_i$ by $p(\cdot)$. A CAE is trained to reconstruct $O_i$, and the decoder is discarded. 2: All data in $D_{cal}$ is processed by $\mathrm{process}(\cdot)$ into a learned representation $X_i$, and nonconformity scores $s_i$ are computed. 3: A Decision Tree is trained on $D_{cal}^{part}$ to partition the learned input space $\mathcal{X}$ into regions of approximately constant score. 4: The holdout processed $D_{cal}^{CP}$ data is fed through the DTree and scores are grouped per leaf node $k$. SplitCP is performed on each input-space partition $\mathcal{X}_k$ to get an input-dependent probabilistic threshold $\hat{q}_k$.
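Step 4 of the offline pipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the leaf index of every holdout sample is assumed to be already available (the hypothetical `leaf_ids` input; with scikit-learn, a fitted tree's `apply` method returns such indices), and the threshold uses the standard finite-sample split-conformal rank. All function and variable names are illustrative.

```python
import math
from collections import defaultdict


def calibrate_per_leaf(leaf_ids, scores, alpha):
    """Group holdout nonconformity scores by tree leaf and run SplitCP per leaf.

    leaf_ids: hypothetical leaf index k for each holdout sample.
    scores:   nonconformity score s_i for each holdout sample.
    alpha:    acceptable failure rate in (0, 1).
    Returns a dict mapping leaf index k to its threshold q_hat_k.
    """
    by_leaf = defaultdict(list)
    for k, s in zip(leaf_ids, scores):
        by_leaf[k].append(s)

    thresholds = {}
    for k, leaf_scores in by_leaf.items():
        ordered = sorted(leaf_scores)
        n = len(ordered)
        # Finite-sample rank: the ceil((n + 1)(1 - alpha))-th smallest score.
        rank = math.ceil((n + 1) * (1.0 - alpha))
        # With too few samples in a leaf, only the trivial threshold is valid.
        thresholds[k] = ordered[rank - 1] if rank <= n else math.inf
    return thresholds


# Toy usage: two leaves with nine holdout scores each.
leaf_ids = [0] * 9 + [1] * 9
scores = [float(i) for i in range(1, 10)] + [10.0 * i for i in range(1, 10)]
print(calibrate_per_leaf(leaf_ids, scores, alpha=0.1))  # {0: 9.0, 1: 90.0}
```

In the toy data the second leaf has ten times larger scores and therefore receives a ten times larger threshold, which is the input-dependent behavior that per-leaf calibration is meant to capture.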
Figure: OCULAR online uncertainty calibration pipeline.
Online component of OCULAR. Given an estimated Gaussian at time $t$, a desired action $a_t$, and observation $o_t$, we create an approximate next-step Gaussian $\tilde{\mathcal N}_{t+1}$ via the approximate model $\tilde{f}$. The current-time information $X_i^{raw}$ is processed, and the learned representation $X_i$ passed to the Decision Tree. The resulting leaf node $\mathcal{X}_k$ has an associated $\hat{q}_k$, which is multiplied by a fixed constant to get $\xi_k$. The approximate uncertainty estimate is then calibrated by scaling its covariance by $\xi_k$, and the output is passed to the following planning step.
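The online scaling step can be illustrated with a small sketch. Several details here are assumptions made only for illustration and are not stated in this section: the state is taken to be 2-D, the nonconformity score is taken to be the squared Mahalanobis distance under the approximate Gaussian, and the "fixed constant" is taken to be the reciprocal of the chi-square quantile (which for 2 degrees of freedom has the closed form $-2\ln\alpha$). The function names are hypothetical.

```python
import math


def mahalanobis_sq_2d(y, mean, cov):
    """Squared Mahalanobis distance of y under N(mean, cov), 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = y[0] - mean[0], y[1] - mean[1]
    # Closed-form 2x2 inverse: cov^{-1} = [[d, -b], [-c, a]] / det.
    return (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det


def chi2_quantile_2dof(alpha):
    """(1 - alpha) quantile of a chi-square distribution with 2 dof."""
    return -2.0 * math.log(alpha)


def xi_from_threshold(q_hat_k, alpha):
    """Assumed mapping from leaf threshold to scaling: xi_k = q_hat_k / chi2 quantile."""
    return q_hat_k / chi2_quantile_2dof(alpha)


def calibrate_covariance(cov, xi_k):
    """Scale every entry of the approximate covariance by xi_k."""
    return [[xi_k * entry for entry in row] for row in cov]


def in_region(y, mean, cov, alpha):
    """True if y lies inside the (1 - alpha) Gaussian ellipse of N(mean, cov)."""
    return mahalanobis_sq_2d(y, mean, cov) <= chi2_quantile_2dof(alpha)


# Toy usage: a next state that the uncalibrated ellipse misses.
mean, cov, alpha = (0.0, 0.0), [[1.0, 0.0], [0.0, 1.0]], 0.1
y_next = (3.0, 0.0)
xi_k = xi_from_threshold(q_hat_k=10.0, alpha=alpha)  # q_hat_k from the leaf lookup
calibrated = calibrate_covariance(cov, xi_k)
print(in_region(y_next, mean, cov, alpha), in_region(y_next, mean, calibrated, alpha))
```

Under these assumptions, the calibrated ellipse contains exactly the points whose squared Mahalanobis distance under the original covariance is at most $\hat q_k$, so scaling the covariance and thresholding the score are two views of the same region.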

Experiments

We validate OCULAR on a double-integrator in Isaac Sim using depth and semantic segmentation cameras across three snowy T-section environments. The white lower-friction regions are OOD relative to the linear Gaussian model; OCULAR uses calibration data from visually similar maps other than the tested map.

Isaac Sim Test Cases

For held-out test cases, we propagate 10k Monte Carlo particles under the true dynamics and report marginal coverage by ID/OOD region. Relative volume is measured against an oracle Gaussian scaling that achieves 90% coverage.
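The two reported metrics can be sketched as below. This is an illustrative reading rather than the authors' evaluation code: it assumes each particle is summarized by its squared Mahalanobis distance under the uncalibrated Gaussian, that the oracle is the smallest covariance scaling whose ellipsoid covers the target fraction of particles, and that relative volume follows from the fact that scaling a covariance by $\xi$ scales the ellipsoid volume by $\xi^{d/2}$ in $d$ dimensions. All names are hypothetical.

```python
import math


def marginal_coverage(sq_dists, threshold):
    """Fraction of particles whose squared Mahalanobis distance is within threshold."""
    return sum(d <= threshold for d in sq_dists) / len(sq_dists)


def oracle_scaling(sq_dists, threshold, target=0.9):
    """Smallest covariance scaling whose ellipsoid covers `target` of the particles."""
    ordered = sorted(sq_dists)
    rank = math.ceil(target * len(ordered))
    return ordered[rank - 1] / threshold


def relative_volume(xi, xi_oracle, dim):
    """Volume of the xi-scaled ellipsoid relative to the oracle-scaled one."""
    return (xi / xi_oracle) ** (dim / 2.0)


# Toy usage with ten particles instead of 10k.
sq_dists = [float(i) for i in range(1, 11)]
threshold = 4.0  # stand-in for the Gaussian region threshold
xi_star = oracle_scaling(sq_dists, threshold)
print(marginal_coverage(sq_dists, threshold))            # uncalibrated coverage
print(marginal_coverage(sq_dists, threshold * xi_star))  # coverage at oracle scaling
print(relative_volume(1.0, xi_star, dim=2))              # uncalibrated volume vs oracle
```

A relative volume below 1 paired with coverage below the target, as in the toy uncalibrated case, indicates an overconfident region; a value well above 1 indicates a conservative one.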

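Test-case results across three Isaac Sim roads.

| Metric | Method | Tested map not in $D_{cal}$? | icySide ID | icySide OOD | icyMain ID | icyMain OOD | icyMiddle ID | icyMiddle OOD |
|---|---|---|---|---|---|---|---|---|
| Marginal coverage (%) | No CP | N/A | 90.0 | 56.7 | 90.0 | 56.7 | 90.0 | 56.7 |
| Marginal coverage (%) | SplitCP | × | 99.5 | 89.6 | 100.0 | 98.4 | 99.1 | 85.9 |
| Marginal coverage (%) | LUCCa | × | 91.1 | 91.5 | 90.1 | 91.4 | 90.1 | 90.9 |
| Marginal coverage (%) | OCULAR (ours) | ✓ | 91.5 | 90.1 | 90.4 | 90.1 | 91.1 | 90.6 |
| Median volume (relative to oracle) ↓ | No CP | N/A | 1.00 | 0.28 | 1.00 | 0.28 | 1.00 | 0.28 |
| Median volume (relative to oracle) ↓ | SplitCP | × | 3.73 | 1.03 | 8.68 | 2.40 | 3.07 | 0.85 |
| Median volume (relative to oracle) ↓ | LUCCa | × | 1.08 | 1.13 | 1.02 | 1.10 | 1.02 | 1.13 |
| Median volume (relative to oracle) ↓ | OCULAR (ours) | ✓ | 1.03 | 1.02 | 1.02 | 1.02 | 1.06 | 1.06 |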

In the original table, red marks coverage below 90%. Each map has 4464 test transitions.

Isaac Sim Planning Performance

We run 30 planning trials on each Isaac Sim map. Success means reaching all subgoals without collisions; failures were collisions rather than timeouts.

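Planning results across three Isaac Sim roads. Steps to completion are reported as mean ± std.

| Method | Tested map not in $D_{cal}$? | Success (%) ↑ icySide | Success (%) ↑ icyMain | Success (%) ↑ icyMiddle | Steps ↓ icySide | Steps ↓ icyMain | Steps ↓ icyMiddle |
|---|---|---|---|---|---|---|---|
| No CP | N/A | 0 | 0 | 0 | -- | -- | -- |
| SplitCP | × | 0 | 0 | 0 | -- | -- | -- |
| LUCCa | × | 100 | 100 | 100 | 401.4 ± 21.2 | 517.7 ± 9.5 | 302.3 ± 9.1 |
| OCULAR (ours) | ✓ | 100 | 100 | 100 | 238.6 ± 6.5 | 302.7 ± 5.6 | 294.7 ± 11.7 |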

BibTeX (cite this!)

@misc{marques2026localconformalcalibrationdynamics,
      title={Local Conformal Calibration of Dynamics Uncertainty from Semantic Images},
      author={Luís Marques and Dmitry Berenson},
      year={2026},
      eprint={2605.13028},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2605.13028},
}