Abstention-Aware Personalized Object Rearrangement

Note: This work led to my first accepted research paper, “Abstention-Aware Personalized Object Rearrangement via Uncertainty-Guided LLM Assistance”, written with Dr. Ali Ayub and accepted at the 2026 IEEE 35th International Conference on Robot and Human Interactive Communication (RO-MAN 2026). The paper is available on arXiv, and the code is available in the APOLLO repository.

Abstract

Robotic assistants in homes need to do more than predict where an object should go. They also need to recognize when an object should remain unplaced, when a scene is too ambiguous, or when the available examples do not justify a confident action.

In this project, we introduce APOLLO, a hybrid framework for abstention-aware personalized object rearrangement. APOLLO combines a lightweight personalized embedding model with uncertainty-guided LLM assistance. The local model learns a user’s placement preferences from a small number of demonstrations and produces uncertainty estimates. Those estimates are then used to decide whether the system should place an object directly, abstain, or ask an LLM for additional reasoning on difficult cases.

We also introduce APOR, a synthetic benchmark designed to evaluate this setting under more realistic conditions. APOR includes multi-furniture rooms, diverse user organization profiles, explicit abstention behavior, and noisy partial scene context. Experiments on APOR and PARSEC show that APOLLO can improve over prior LLM-based baselines while making fewer expensive LLM calls.

Method

The core idea is to separate fast personalization from expensive reasoning. For each user-environment pair, APOLLO trains a small personalized embedding model on a few demonstrations. This model embeds objects and candidate surfaces, scores possible placements, and provides uncertainty estimates for its predictions.
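To make the scoring step concrete, here is a minimal sketch rather than the actual APOLLO model. It assumes objects and surfaces are already embedded as plain vectors, scores each candidate surface by cosine similarity, and uses the normalized entropy of the softmax distribution as the uncertainty estimate. The function names, the temperature value, and the choice of entropy are all illustrative assumptions, not details taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def score_placements(object_vec, surface_vecs, temperature=0.1):
    """Hypothetical scorer: returns (probabilities, uncertainty in [0, 1]).

    surface_vecs maps a surface name to its embedding. Uncertainty is the
    softmax entropy divided by its maximum possible value, log(num_surfaces).
    """
    names = list(surface_vecs)
    logits = [cosine(object_vec, surface_vecs[n]) / temperature for n in names]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = {n: e / total for n, e in zip(names, exps)}
    entropy = -sum(p * math.log(p) for p in probs.values() if p > 0)
    max_entropy = math.log(len(names)) if len(names) > 1 else 1.0
    return probs, entropy / max_entropy
```

With this definition, an object whose embedding matches one surface much better than the others gets an uncertainty near 0, while an object that matches every surface equally well gets an uncertainty of 1.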

When the model is confident, it acts locally. When it is uncertain, the router can defer the decision to an LLM-based reasoner or choose abstention if placing the object would be inappropriate. This makes the system more efficient than calling an LLM for every object, while still keeping reasoning capacity for cases where simple learned preferences are not enough.
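The routing logic described above can be sketched as follows. This is an illustration under stated assumptions, not the router from the paper: abstention is modeled as an extra candidate label (`ABSTAIN`), a single fixed threshold separates confident from uncertain predictions, and `ask_llm` is a hypothetical callable standing in for the LLM-based reasoner.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

ABSTAIN = "<unplaced>"  # hypothetical label meaning "leave the object where it is"


@dataclass
class Decision:
    surface: Optional[str]  # None means the object stays unplaced
    source: str             # "local" or "llm"


def route(
    probs: Dict[str, float],
    uncertainty: float,
    ask_llm: Callable[[Dict[str, float]], str],
    threshold: float = 0.5,  # assumed value, would need tuning in practice
) -> Decision:
    """Act locally when confident, otherwise defer to the LLM reasoner."""
    if uncertainty <= threshold:
        choice = max(probs, key=probs.get)
        source = "local"
    else:
        choice = ask_llm(probs)  # the expensive call, made only when needed
        source = "llm"
    return Decision(None if choice == ABSTAIN else choice, source)
```

Because the LLM is only invoked on the uncertain branch, the threshold directly trades off LLM cost against how often the small model's prediction is trusted.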

The project also required building the APOR dataset generation pipeline. User profiles define different organization strategies, environments define room and furniture context, and generated arrangements include both placed and unplaced objects. This made it possible to test not only placement accuracy, but also whether a model understands when no placement should be made.
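As a rough illustration of how profiles, environments, and arrangements relate, the sketch below uses hypothetical data structures that are much simpler than the real APOR pipeline. It assumes a profile is just a mapping from object category to preferred surface plus a set of categories the user leaves out, and it omits the noisy partial scene context entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class UserProfile:
    name: str
    category_to_surface: Dict[str, str]                 # assumed rule format
    leave_out: Set[str] = field(default_factory=set)    # categories left unplaced


@dataclass
class Environment:
    room: str
    surfaces: List[str]


def generate_arrangement(
    profile: UserProfile,
    env: Environment,
    objects: List[Tuple[str, str]],  # (object name, category) pairs
) -> Dict[str, Optional[str]]:
    """Map each object to a surface, or to None when it should stay unplaced."""
    arrangement: Dict[str, Optional[str]] = {}
    for name, category in objects:
        target = profile.category_to_surface.get(category)
        # Unplaced if the user leaves this category out, has no rule for it,
        # or the preferred surface does not exist in this environment.
        if category in profile.leave_out or target not in env.surfaces:
            arrangement[name] = None
        else:
            arrangement[name] = target
    return arrangement
```

Representing unplaced objects as an explicit `None` target keeps abstention in the ground truth, so a model can be scored on it rather than only on placements.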

What I learned

Research framing matters as much as implementation. Turning a technical idea into a paper required defining a clear problem, separating it from prior personalized rearrangement work, and making abstention a first-class part of the task.
Uncertainty is practical, not just theoretical. The project made me think about confidence as a routing signal: when to trust a small model, when to abstain, and when to spend compute on a stronger reasoner.
Dataset design shapes what a model can learn. Building APOR showed me how much the benchmark controls the behavior that can be evaluated, especially for personalization, noisy context, and unplaced objects.
Hybrid AI systems involve trade-offs. APOLLO forced me to balance accuracy, cost, privacy, latency, and interpretability instead of optimizing a single metric.
Writing an accepted paper is an engineering process. I learned to iterate on experiments, ablations, baselines, figures, code organization, and wording until the work became reproducible and defensible.
