Local Conformal Calibration of Dynamics Uncertainty from Semantic Images


17th International Workshop on the Algorithmic Foundations of Robotics (WAFR) 2026

(Left) SplitCP calibrates dynamics uncertainty globally, possibly resulting in overconfident and/or overconservative motion. (Right) OCULAR calibrates dynamics uncertainty conditioned on velocity, action, and observation, and does not require data from the test-time environment.

The blue marker indicates the start location and the pink markers the subgoals.

Abstract

We introduce Observation-aware Conformal Uncertainty Local-Calibration (OCULAR), a conformal prediction-based algorithm that uses perception information to provide uncertainty quantification guarantees in unseen test-time environments. Previous conformal approaches cannot discriminate between state-action regions that lead to higher or lower model mismatch, and they require environment-specific data. In contrast, our method uses data collected in visually similar environments to provably calibrate a given linear-Gaussian dynamics model of arbitrary fidelity. The prediction regions generated by OCULAR are guaranteed to contain future system states with at least a user-set likelihood, despite both aleatoric and epistemic uncertainty. Our guarantees are non-asymptotic and distribution-free, requiring no strong assumptions about the unknown real system dynamics. Our calibration procedure distinguishes between observation-velocity-action inputs that lead to higher and lower next-state uncertainty, which is helpful for probabilistically safe planning. We validate our algorithm on a double-integrator system subject to random perturbations and significant model mismatch, using both a simplified sensor and a more realistic simulated camera. Our approach appropriately quantifies uncertainty both in-distribution and out-of-distribution, while being comparably volume-efficient to baselines that require environment-specific data.

Problem Statement

Let $s_t:=(p_t,v_t)$, $a_t$, and $o_t$ denote a robot's state, action, and observation, respectively. We consider stochastic systems evolving according to the unknown dynamics $s_{t+1}\sim f(s_t,a_t)$. The available approximate dynamics model $\tilde f$ has arbitrary fidelity, with uncertainty arising from both external disturbances and model mismatch. In our experiments, $\tilde f$ is a linear-Gaussian model.

Given access to an exchangeable dataset of robot transitions $D_{\mathrm{cal}}:=\{(s_t,a_t,o_t,s_{t+1})_i\}_{i=1}^{n}$, collected in environments that are different from but visually similar to the deployment environment, our goal is to construct a state-action-observation-dependent and volume-efficient prediction region $\hat{\mathcal C}\subseteq \mathcal S$ that is guaranteed to contain the unknown future system state $s_{t+1}$ with at least a user-selected likelihood $1-\alpha \in (0,1)$, i.e., that satisfies

$$\mathbb P\!\left(s_{t+1} \in \hat{\mathcal C}\right) \ge 1-\alpha.$$

OCULAR calibrates the approximate dynamics model $\tilde f$ to construct the prediction set $\hat{\mathcal C}$ without requiring data from the test-time environment.
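Both the global SplitCP baseline and OCULAR's per-leaf step rest on the standard split-conformal quantile. The sketch below computes that threshold from a list of calibration scores; the score values are made up for illustration, and the function name `split_cp_threshold` is hypothetical rather than taken from the OCULAR code.

```python
import math


def split_cp_threshold(scores, alpha):
    """Split-conformal threshold for miscoverage level alpha.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest calibration score.
    An exchangeable test score is then at most this threshold with
    probability at least 1 - alpha. When the required rank exceeds n,
    no finite threshold suffices, so infinity is returned.
    """
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(scores)[rank - 1]


# Hypothetical nonconformity scores from nine calibration transitions.
scores = [0.3, 1.2, 0.7, 2.5, 0.9, 1.8, 0.4, 1.1, 0.6]
q_hat = split_cp_threshold(scores, alpha=0.25)  # rank ceil(10 * 0.75) = 8, so 1.8
```

With only nine scores, asking for $\alpha = 0.05$ requires rank 10 > 9, so the threshold is infinite; any scheme that calibrates per region needs enough samples in every region.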

Method: OCULAR

OCULAR performs local conformal calibration using robot-frame perception, body velocity, and action information. The calibration data is first split into two disjoint subsets, $D_{\mathrm{cal}} = D_{\mathrm{cal}}^{\mathrm{part}} \sqcup D_{\mathrm{cal}}^{\mathrm{CP}}$. Given raw calibration samples $X_i^{\mathrm{raw}}:=(s_t,a_t,o_t)_i$, OCULAR maps each observation to a planar semantic footprint $\mathrm{Project}(o_t)$ and encodes that footprint with a CAE. This processing step yields $X_i:=\mathrm{Process}(X_i^{\mathrm{raw}})=(v_t,a_t,\mathrm{Encode}(\mathrm{Project}(o_t)))_i$, and a Regression Decision Tree fit to the nonconformity scores partitions this processed input space. SplitCP is then performed separately in each Decision Tree leaf $k$, yielding an input-dependent score threshold $\hat q_k\in \mathbb R$ that acts as a local probabilistic upper bound on motion uncertainty.
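The per-leaf step can be sketched as grouping held-out scores by leaf and running SplitCP within each group. In this sketch a hand-written one-split function `leaf_of` (hypothetical) stands in for the fitted Regression Decision Tree; with scikit-learn, `DecisionTreeRegressor.apply` returns leaf indices that could play the same role. The processed inputs use a scalar placeholder in place of the CAE latent vector, and all values are invented.

```python
import math
from collections import defaultdict


def split_cp_threshold(scores, alpha):
    """ceil((n + 1)(1 - alpha))-th smallest score, or inf if the rank exceeds n."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    return math.inf if rank > n else sorted(scores)[rank - 1]


def per_leaf_thresholds(features, scores, leaf_of, alpha):
    """Group held-out (feature, score) pairs by leaf, then run SplitCP per leaf."""
    groups = defaultdict(list)
    for x, r in zip(features, scores):
        groups[leaf_of(x)].append(r)
    return {leaf: split_cp_threshold(rs, alpha) for leaf, rs in groups.items()}


def leaf_of(x):
    """Hypothetical stand-in for a fitted regression tree: one split on speed."""
    speed, _action, _latent = x
    return 0 if speed < 1.0 else 1


# Hypothetical processed inputs (speed, action, 1-D latent placeholder) and scores.
features = [(0.2, 0.1, 0.0), (0.5, 0.0, 0.1), (0.8, -0.1, 0.2), (0.4, 0.2, 0.0),
            (1.5, 0.3, 0.9), (2.0, 0.1, 0.8), (1.2, -0.2, 0.7), (1.8, 0.0, 1.0)]
scores = [0.5, 0.8, 0.6, 0.7, 2.0, 3.5, 2.6, 3.0]
q_by_leaf = per_leaf_thresholds(features, scores, leaf_of, alpha=0.25)
```

Here the slow leaf receives a small threshold (0.8) and the fast leaf a large one (3.5), which is the input-dependent behavior a single global threshold cannot express. Real leaves would hold far more than four samples.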

OCULAR offline calibration pipeline
Offline component of OCULAR. 1: Observations $o_t$ from $D_{\mathrm{cal}}^{\mathrm{part}}$ are projected into planar footprints $\mathrm{Project}(o_t)$. A CAE is trained to reconstruct these footprints, and the decoder is then discarded. 2: All data in $D_{\mathrm{cal}}$ starts as raw tuples $X_i^{\mathrm{raw}}:=(s_t,a_t,o_t)_i$; these are processed into $X_i:=\mathrm{Process}(X_i^{\mathrm{raw}})=(v_t,a_t,\mathrm{Encode}(\mathrm{Project}(o_t)))_i$ and scored by $R_i := r(s_{t+1,i}, \tilde{\mathcal N}_{t+1,i})$. 3: A Decision Tree is trained on $D_{\mathrm{cal}}^{\mathrm{part}}$ to partition the learned input space $\mathcal{X}$ into regions of approximately constant score. 4: The held-out processed data $D_{\mathrm{cal}}^{\mathrm{CP}}$ is passed through the Decision Tree, and scores are grouped by leaf node $k$. SplitCP is performed on each input-space partition $\mathcal{X}_k$ to obtain an input-dependent probabilistic threshold $\hat{q}_k$.
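The caption leaves the score $r$ unspecified. One common choice for Gaussian predictions, assumed in this sketch and not confirmed by the text, is the squared Mahalanobis distance of the observed next state under $\tilde{\mathcal N}_{t+1}$; its sublevel set $\{s : r \le \hat q_k\}$ is then an ellipse. For brevity the state is reduced to a 1-D double integrator (position, velocity), and `mahalanobis_sq` is a hypothetical helper.

```python
def mahalanobis_sq(s_next, mean, cov):
    """Squared Mahalanobis distance of s_next under N(mean, cov) for a 2-D state.

    Used here as an assumed score r(s_{t+1}, N~_{t+1}); cov is a 2x2 nested tuple.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("covariance must be positive definite")
    dx = s_next[0] - mean[0]
    dy = s_next[1] - mean[1]
    # [dx, dy] @ inv(cov) @ [dx, dy]^T, with inv(cov) = [[d, -b], [-c, a]] / det
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


# Score a hypothetical transition: state = (position, velocity).
r = mahalanobis_sq(s_next=(2.0, 0.0), mean=(0.0, 0.0), cov=((4.0, 0.0), (0.0, 1.0)))
```

A position error of 2 under a position variance of 4 gives a score of 1.0, i.e., one standard deviation squared.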
OCULAR online uncertainty calibration pipeline
Online component of OCULAR. Given an estimated Gaussian at time $t$, a desired action $a_t$, and an observation $o_t$, we create an approximate next-step Gaussian $\tilde{\mathcal N}_{t+1}$ via the approximate model $\tilde f$. The current raw tuple $(s_t,a_t,o_t)$ is processed into features $(v_t,a_t,\mathrm{Encode}(\mathrm{Project}(o_t)))$, and this learned representation is passed to the Decision Tree. The resulting leaf node $\mathcal{X}_k$ has an associated $\hat{q}_k$, which is multiplied by a fixed constant to obtain $\xi_k$. The approximate uncertainty estimate is then calibrated by scaling its covariance by $\xi_k$, and the result is passed to the subsequent planning step.
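The online step can be sketched as a linear-Gaussian prediction followed by covariance scaling. This is a minimal sketch for a 1-D double integrator; the matrices, the isotropic process noise, and especially the fixed constant are assumptions, since the text only says that $\hat q_k$ is multiplied by a fixed constant. The constant used below is one hypothetical choice that is self-consistent if the score is a squared Mahalanobis distance: scaling $\Sigma$ by $\xi = \hat q_k / c$ makes the $c$-level ellipse of $\mathcal N(\mu, \xi\Sigma)$ coincide with $\{s : r \le \hat q_k\}$.

```python
import math


def predict(mean, cov, accel, dt, q_noise):
    """One linear-Gaussian step for a 1-D double integrator (assumed model).

    x_{t+1} = A x_t + B a_t + w,  A = [[1, dt], [0, 1]],  B = [dt^2 / 2, dt],
    w ~ N(0, q_noise * I). Returns the predicted mean and covariance.
    """
    p, v = mean
    new_mean = (p + dt * v + 0.5 * dt * dt * accel, v + dt * accel)
    (s_pp, s_pv), (s_vp, s_vv) = cov
    # Entries of A @ cov @ A^T + Q, expanded by hand for this A.
    new_cov = (
        (s_pp + dt * (s_pv + s_vp) + dt * dt * s_vv + q_noise, s_pv + dt * s_vv),
        (s_vp + dt * s_vv, s_vv + q_noise),
    )
    return new_mean, new_cov


def calibrate(cov, q_hat, const):
    """Scale the predicted covariance by xi_k = q_hat * const."""
    xi = q_hat * const
    return tuple(tuple(xi * c for c in row) for row in cov), xi


# Hypothetical constant: 1 / chi2_2(1 - alpha), where the chi-square quantile
# with 2 degrees of freedom is -2 ln(alpha).
alpha = 0.1
const = 1.0 / (-2.0 * math.log(alpha))

mean1, cov1 = predict((0.0, 1.0), ((1.0, 0.0), (0.0, 1.0)), accel=2.0, dt=0.5, q_noise=0.01)
cov_cal, xi = calibrate(cov1, q_hat=6.0, const=const)  # q_hat from the active leaf
```

Because the hypothetical leaf threshold (6.0) exceeds the nominal chi-square quantile (about 4.61), $\xi_k > 1$ and the covariance is inflated; a leaf with a smaller $\hat q_k$ would shrink it instead.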

Experiments

We validate OCULAR on a double-integrator system in Isaac Sim, using a floating camera that provides depth and semantic segmentation images. We evaluate across three snowy T-junction environments: icySide, icyMain, and icyMiddle. The white, lower-friction regions are out-of-distribution (OOD) relative to the approximate linear-Gaussian model $\tilde f$, which captures the robot dynamics over the asphalt road (in-distribution, ID). In all experiments, OCULAR uses no data from the test map (e.g., for icyMain evaluation, we use system transition data collected in icyMiddle and icySide).
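The leave-one-map-out protocol, combined with the disjoint split of $D_{\mathrm{cal}}$ into $D_{\mathrm{cal}}^{\mathrm{part}}$ and $D_{\mathrm{cal}}^{\mathrm{CP}}$, can be sketched as follows. The page does not state the split ratio, so the 50/50 random split is an assumption, and the transition IDs are placeholders for $(s_t,a_t,o_t,s_{t+1})$ tuples.

```python
import random


def leave_one_map_out(transitions_by_map, test_map):
    """Calibration set D_cal: transitions from every map except the test map."""
    return [t for name, ts in transitions_by_map.items() if name != test_map for t in ts]


def split_part_cp(d_cal, frac_part, seed=0):
    """Randomly split D_cal into disjoint D_part (encoder/tree) and D_CP (SplitCP)."""
    idx = list(range(len(d_cal)))
    random.Random(seed).shuffle(idx)
    n_part = int(frac_part * len(d_cal))
    d_part = [d_cal[i] for i in idx[:n_part]]
    d_cp = [d_cal[i] for i in idx[n_part:]]
    return d_part, d_cp


# Placeholder transition IDs; real entries would be (s_t, a_t, o_t, s_{t+1}).
data = {
    "icySide": ["side_0", "side_1", "side_2"],
    "icyMain": ["main_0", "main_1", "main_2"],
    "icyMiddle": ["mid_0", "mid_1", "mid_2"],
}
d_cal = leave_one_map_out(data, test_map="icyMain")
d_part, d_cp = split_part_cp(d_cal, frac_part=0.5)
```

In standard split-conformal reasoning, keeping $D_{\mathrm{cal}}^{\mathrm{CP}}$ disjoint from the data used to fit the encoder and tree is what allows the per-leaf thresholds to retain their finite-sample guarantee.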

Using Monte Carlo propagation, we estimate the likelihood of the prediction regions $\hat{\mathcal C}\subseteq \mathcal S$ containing the future unknown state $s_{t+1}$. This containment likelihood (i.e., coverage) is reported separately for the nominal road regions (ID relative to $\tilde f$) and the icy regions (OOD relative to $\tilde f$). Prediction region volume efficiency is reported as a ratio relative to a linear-Gaussian oracle with access to the unknown ground-truth dynamics.
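The two metrics can be illustrated generically. This sketch assumes ellipsoidal regions $\{x : x^\top \Sigma^{-1} x \le \tau\}$, whose volume is proportional to $\tau^{d/2}\sqrt{\det\Sigma}$, so two regions sharing $\Sigma$ have volume ratio $(\tau_1/\tau_2)^{d/2}$. The 2-D toy state and the helper names are hypothetical; the actual Monte Carlo procedure is not described here.

```python
import math
import random


def empirical_coverage(sq_dists, threshold):
    """Fraction of next states whose score falls inside {r <= threshold}."""
    return sum(d <= threshold for d in sq_dists) / len(sq_dists)


def ellipsoid_volume(threshold, cov_det, dim):
    """Volume of {x : x^T Sigma^{-1} x <= threshold} in `dim` dimensions."""
    unit_ball = math.pi ** (dim / 2) / math.gamma(dim / 2 + 1)
    return unit_ball * threshold ** (dim / 2) * math.sqrt(cov_det)


# Monte Carlo check: sample next states from a known 2-D standard Gaussian and
# measure how often they land in the chi-square 90% ellipse (threshold -2 ln 0.1).
rng = random.Random(0)
samples = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(20000)]
sq_dists = [x * x + y * y for x, y in samples]
coverage = empirical_coverage(sq_dists, threshold=-2.0 * math.log(0.1))

# Volume ratio of a region using threshold 6.0 vs. an oracle using 4.0,
# same covariance: (6 / 4) ** (dim / 2) = 1.5 in 2-D.
ratio = ellipsoid_volume(6.0, 1.0, 2) / ellipsoid_volume(4.0, 1.0, 2)
```

The estimated coverage lands near 0.9, as expected for the chi-square 90% ellipse, and the ratio shows how a looser threshold translates into a larger relative volume.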

Test-case results across three Isaac Sim roads.

Method | Test map not in $D_{\mathrm{cal}}$? | icySide ID | icySide OOD | icyMain ID | icyMain OOD | icyMiddle ID | icyMiddle OOD

Marginal coverage (%)
NoCP | N/A | 90.0 | 56.7* | 90.0 | 56.7* | 90.0 | 56.7*
SplitCP | × | 99.5 | 89.6* | 100.0 | 98.4 | 99.1 | 85.9*
LUCCa | × | 91.1 | 91.5 | 90.1 | 91.4 | 90.1 | 90.9
OCULAR (ours) | ✓ | 91.5 | 90.1 | 90.4 | 90.1 | 91.1 | 90.6

Median volume (relative to oracle)
NoCP | N/A | 1.00 | 0.28 | 1.00 | 0.28 | 1.00 | 0.28
SplitCP | × | 3.73 | 1.03 | 8.68 | 2.40 | 3.07 | 0.85
LUCCa | × | 1.08 | 1.13 | 1.02 | 1.10 | 1.02 | 1.13
OCULAR (ours) | ✓ | 1.03 | 1.02 | 1.02 | 1.02 | 1.06 | 1.06

* Coverage below 90%. Prediction region volume is reported relative to an oracle that uses the minimum scaling factor needed to achieve 90% coverage. Each map has 4464 test transitions.

OCULAR calibrates the approximate dynamics in both nominal and icy conditions without sacrificing volume efficiency relative to baselines that use environment-specific data.

We also conduct motion planning experiments to demonstrate the utility of our method for probabilistically safe MPC under model mismatch and external perturbations. For all methods, the objective function includes a distance-to-subgoal term, a collision penalty, and a penalty for transitions with high estimated uncertainty.
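The page names the three objective terms but not their form or weights. The sketch below assumes Euclidean distance to the subgoal, a point collision check on the predicted mean, and the trace of the calibrated covariance as the uncertainty penalty; all weights, the collision radius, and the function names are hypothetical. A probabilistically safe planner would typically check collisions against the calibrated prediction region rather than the mean alone.

```python
import math


def stage_cost(pos, cov_trace, subgoal, obstacles,
               w_goal=1.0, w_coll=100.0, w_unc=1.0, radius=0.5):
    """Hypothetical MPC stage cost with the three terms named in the text."""
    goal_term = math.dist(pos, subgoal)                      # distance to subgoal
    hit = any(math.dist(pos, ob) < radius for ob in obstacles)
    collision_term = 1.0 if hit else 0.0                     # collision penalty
    uncertainty_term = cov_trace                             # calibrated-uncertainty proxy
    return w_goal * goal_term + w_coll * collision_term + w_unc * uncertainty_term


def rollout_cost(positions, cov_traces, subgoal, obstacles, **weights):
    """Sum stage costs along one candidate rollout."""
    return sum(stage_cost(p, c, subgoal, obstacles, **weights)
               for p, c in zip(positions, cov_traces))


cost = rollout_cost([(0.0, 0.0), (1.0, 0.0)], [0.1, 0.2],
                    subgoal=(3.0, 0.0), obstacles=[(1.2, 0.0)])
```

Since calibrated covariances grow over ice, the uncertainty term pushes such a planner toward slower, lower-uncertainty transitions there, which matches the qualitative behavior described for OCULAR.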


Using the uncalibrated dynamics directly causes the robot to build up significant momentum over ice, leading to collisions. SplitCP can produce overconfident (collisions) or overconservative (timeouts) behavior, depending on the calibration data distribution. LUCCa, a baseline that requires robot location data in an inertial frame, needs transitions collected in the test environment. OCULAR moves more slowly over ice and faster over the nominal road, making it both safe and efficient. Because it uses perception information, it does not require data from the test environment. Below, we visualize rollouts from all compared methods at once.


These results indicate that OCULAR can generalize to new unseen environments and achieve both adequate planning performance and safety by leveraging perception information obtained in other environments. Below are numerical results.

Planning results across three Isaac Sim roads (30 trials each).

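Method | Test map not in $D_{\mathrm{cal}}$? | Success icySide (%) | Success icyMain (%) | Success icyMiddle (%) | Steps icySide | Steps icyMain | Steps icyMiddle
NoCP | N/A | 0 | 0 | 0 | - | - | -
SplitCP | × | 0 | 0 | 0 | - | - | -
LUCCa | × | 100 | 100 | 100 | 401.4 ± 21.2 | 517.7 ± 9.5 | 302.3 ± 9.1
OCULAR (ours) | ✓ | 100 | 100 | 100 | 238.6 ± 6.5 | 302.7 ± 5.6 | 294.7 ± 11.7

Success = reaching all subgoals without collisions. Steps to completion are given as mean ± std; "-" marks methods with no successful trials.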

BibTeX (cite this!)

@misc{marques2026localconformalcalibrationdynamics,
      title={Local Conformal Calibration of Dynamics Uncertainty from Semantic Images},
      author={Luís Marques and Dmitry Berenson},
      year={2026},
      eprint={2605.13028},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2605.13028},
}