AI Safety Research



2504.13180v1 PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding

Thu Apr 17 2025

Vision-language models are integral to computer vision research. Many high-performing models remain closed-source, obscuring their data, design, and training recipe. We analyze standard training pipelines without distillation from proprietary models. We explore large-scale synthetic data to identify critical data gaps.

Keywords: video understanding, vision language, video captions, challenging video, videobench

2504.13178v1 Aligning Constraint Generation with Design Intent in Parametric CAD

Thu Apr 17 2025

Engineering sketches consist of geometric primitives (e.g. points, lines) connected by constraints that define the relationships between them. We adapt alignment techniques from reasoning LLMs to the task of generating sketch constraints found in computer-aided design (CAD) models.

Keywords: sketch constraints, generate cad, constraint generation, generate constraints, generative cad

2504.13171v1 Sleep-time Compute: Beyond Inference Scaling at Test-time

Thu Apr 17 2025

Sleep-time compute allows models to "think" offline about contexts before queries are presented. By anticipating what queries users might ask and pre-computing useful quantities, we can significantly reduce the compute requirements at test-time.

Keywords: query gsm, stateful gsm, gsm symbolic, gsm, anticipating queries
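The pattern described above can be sketched in a few lines. This is a hypothetical illustration of splitting work into an offline "sleep-time" phase and a cheap test-time phase, not the paper's implementation; `sleep_time_precompute`, `answer`, and the toy context are names invented for the sketch.

```python
# Sleep-time phase: "think" about the context before any query arrives,
# caching derived quantities that anticipated queries are likely to need.
def sleep_time_precompute(context: dict) -> dict:
    prices = context["prices"]
    return {
        "total": sum(prices.values()),
        "most_expensive": max(prices, key=prices.get),
    }

# Test-time phase: answer from the precomputed cache instead of re-deriving.
def answer(query: str, cache: dict):
    if query == "total cost":
        return cache["total"]
    if query == "priciest item":
        return cache["most_expensive"]
    raise KeyError("query not covered by precomputed quantities")

context = {"prices": {"apple": 2, "steak": 12, "bread": 3}}
cache = sleep_time_precompute(context)  # done offline, before queries arrive
print(answer("total cost", cache))      # → 17, with no work at query time
```

The compute saving comes from amortization: the offline pass runs once per context, while every subsequent query is a cheap lookup.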

2504.13165v1 RUKA: Rethinking the Design of Humanoid Hands with Learning

Thu Apr 17 2025

RUKA is a tendon-driven humanoid hand that is compact, affordable, and capable. Made from 3D-printed parts and off-the-shelf components, RUKA has 5 fingers with 15 underactuated degrees of freedom enabling diverse human-like grasps.

Keywords: humanoid hand, robotic hands, leveraging hand, hands teleoperation, grasping

2504.13151v1 MIB: A Mechanistic Interpretability Benchmark

Thu Apr 17 2025

MIB favors methods that precisely and concisely recover relevant causal pathways or specific causal variables in neural language models. For causal variable localization, we find that the supervised DAS method performs best. SAE features are not better than neurons, i.e., standard dimensions of hidden vectors.

Keywords: autoencoders, sparse autoencoders, autoencoders saes, distributed alignment, mechanistic interpretability

2504.13134v1 Energy-Based Reward Models for Robust Language Model Alignment

Thu Apr 17 2025

Energy-Based Reward Model (EBRM) is a lightweight post-hoc framework that enhances RM robustness and generalization. EBRM models the reward distribution explicitly, capturing uncertainty in human preferences and mitigating the impact of noisy or misaligned annotations.

Keywords: reward models, models reward, reward model, based reward, language models
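A toy illustration (not the paper's EBRM) of why modeling a reward *distribution* p(r) ∝ exp(−E(r)) can be robust to noisy annotations: with a Huber-style energy, a single misaligned label pulls the selected reward far less than a plain mean would. `energy`, `robust_reward`, and the data are invented for this sketch, and the hand-written energy is a stand-in for a learned energy model.

```python
import numpy as np

def energy(r: float, annotations: np.ndarray, delta: float = 1.0) -> float:
    """Huber-style energy: quadratic near each annotation, linear far away,
    so distant (likely noisy) annotations exert only bounded pull on r."""
    d = np.abs(r - annotations)
    return float(np.sum(np.where(d <= delta, 0.5 * d**2, delta * (d - 0.5 * delta))))

def robust_reward(annotations: np.ndarray) -> float:
    """Pick the reward minimizing the energy over a coarse grid."""
    grid = np.linspace(annotations.min(), annotations.max(), 1001)
    energies = [energy(r, annotations) for r in grid]
    return float(grid[int(np.argmin(energies))])

annotations = np.array([1.0, 1.1, 0.9, 5.0])  # one clearly noisy label
print(robust_reward(annotations))  # ≈ 1.3, vs. a plain mean of 2.0
```

The energy view also exposes uncertainty for free: a flat energy landscape around the minimum signals disagreement among annotators, while a sharp one signals consensus.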

2504.13059v1 RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins

Thu Apr 17 2025

RoboTwin uses 3D generative foundation models to produce diverse expert datasets. It also introduces a spatial relation-aware code generation framework. Policies pre-trained on RoboTwin-generated data and fine-tuned with limited real-world samples demonstrate significant potential for enhancing dual-arm manipulation.

Keywords: trained robotwin, robotwin generative, robotics dual, magic robot, robotic tasks

2504.13048v1 Design Topological Materials by Reinforcement Fine-Tuned Generative Model

Thu Apr 17 2025

Topological insulators (TIs) and topological crystalline insulators (TCIs) are valuable for practical applications. Such materials, particularly those with a full band gap, remain scarce. We apply reinforcement fine-tuning to a pre-trained generative model.

Keywords: topological insulators, topological materials, topological crystalline, crystalline insulators, materials generative

2504.13042v1 Event-Enhanced Blurry Video Super-Resolution

Thu Apr 17 2025

Current BVSR methods often fail to restore sharp details at high resolutions. We propose a novel event-enhanced network, Ev-DeblurVSR. On real data, our method is +2.59 dB more accurate and 7.28× faster than existing methods.

Keywords: feature deblurring, deblurvsr effectively, blurry video, deblurring, deblur frame

2504.13035v1 Prototypes are Balanced Units for Efficient and Effective Partially Relevant Video Retrieval

Thu Apr 17 2025

A PRVR framework encodes diverse contexts within a video into a fixed number of prototypes. We introduce strategies to enhance text association and video understanding within prototypes. To keep the prototypes searchable via text queries while accurately encoding video contexts, we implement cross- and uni-modal reconstruction tasks.

Keywords: video contexts, video retrieval, contexts video, video understanding, context representations

2504.12966v1 Vision and Language Integration for Domain Generalization

Thu Apr 17 2025

We propose VLCA, which combines language space and vision space. We connect the multiple image domains by using semantic space as the bridge domain. In the end, the language representation is aligned with the vision representation through the multimodal space of text and image.

Keywords: image domains, domain generalization, multimodal space, capture semantic, image features

2504.12911v1 Benchmarking Multi-National Value Alignment for Large Language Models

Thu Apr 17 2025

NaVAB is a comprehensive benchmark to evaluate the alignment of LLMs with the values of five major nations. NaVAB implements a national value extraction pipeline to efficiently construct value assessment datasets. It can be combined with alignment techniques to effectively reduce value concerns.

Keywords: country values, national values, value extraction, assessment datasets, value assessment

2504.12875v1 A Client-level Assessment of Collaborative Backdoor Poisoning in Non-IID Federated Learning

Thu Apr 17 2025

Federated learning (FL) enables collaborative model training using decentralized private data from multiple clients. While FL has shown robustness against poisoning attacks with basic defenses, our research reveals new vulnerabilities stemming from non-independent and identically distributed data among clients. These vulnerabilities pose a significant threat to the integrity of the global model.

Keywords: malicious gradients, backdoor attacks, collaborative backdoor, backdoor defenses, federated learning

2504.12833v1 Image-Editing Specialists: An RLAIF Approach for Diffusion Models

Thu Apr 17 2025

We present a novel approach to training specialized instruction-based image-editing diffusion models. We create an online reinforcement learning framework that aligns the diffusion model with human preferences. This approach simplifies users' efforts to achieve highly specific edits.

Keywords: intricate edits, visual edits, editing diffusion, image editing, scenes maintaining

2504.12734v1 Pandora: A Code-Driven Large Language Model Agent for Unified Reasoning Across Diverse Structured Knowledge

Thu Apr 17 2025

Unified Structured Knowledge Reasoning (USKR) aims to answer natural language questions (NLQs) by using structured sources such as tables, databases, and knowledge graphs in a unified way. Existing USKR methods rely on employing task-specific strategies or custom representations.

Keywords: textual reasoning, unified knowledge, structured knowledge, knowledge representation, knowledge reasoning

2504.12722v1 SimUSER: Simulating User Behavior with Large Language Models for Recommender System Evaluation

Thu Apr 17 2025

SimUSER is an agent framework that serves as believable and cost-effective human proxies. It identifies self-consistent personas from historical data, enriching user profiles with unique backgrounds and personalities. Users equipped with persona, memory, perception, and brain modules engage in interactions with the recommender system.

Keywords: recommender simuser, profiles, recommender parameters, recommender systems, recommender

2504.12712v1 Convergence and Implicit Bias of Gradient Descent on Continual Linear Classification

Thu Apr 17 2025

We study continual learning on multiple linear classification tasks by sequentially running gradient descent (GD) for a fixed budget of iterations per task. When all tasks are jointly linearly separable and are presented in a cyclic/random order, we show the directional convergence of the trained linear classifier.

Keywords: continual learning, catastrophic forgetting, averaged forgetting, forgetting, transfer forgetting
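The setup above can be sketched numerically: cycle through jointly separable binary tasks, run GD on the logistic loss for a fixed per-task budget, and watch the weight *direction* stabilize across cycles. This is an illustrative toy of the training regime, not the paper's analysis; the two tasks, the budget, and the learning rate are invented for the sketch.

```python
import numpy as np

def task_grad(w, X, y):
    # gradient of the mean logistic loss  log(1 + exp(-y * (X @ w)))
    margins = y * (X @ w)
    coef = -y / (1.0 + np.exp(margins))
    return (X.T @ coef) / len(y)

# two jointly separable tasks (both correctly classified by w ∝ [1, 1])
X1 = np.array([[2.0, 1.0], [-2.0, -1.0]]); y1 = np.array([1.0, -1.0])
X2 = np.array([[1.0, 2.0], [-1.0, -2.0]]); y2 = np.array([1.0, -1.0])
tasks = [(X1, y1), (X2, y2)]

w = np.zeros(2)
budget, lr = 200, 0.5
directions = []
for cycle in range(30):
    for X, y in tasks:              # cyclic task order
        for _ in range(budget):     # fixed per-task iteration budget
            w -= lr * task_grad(w, X, y)
    directions.append(w / np.linalg.norm(w))

# the direction change between consecutive cycles shrinks toward zero
print(np.linalg.norm(directions[-1] - directions[-2]))
```

The weight norm keeps growing on separable data, so only the direction can converge, which is exactly the quantity the paper tracks.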

2504.12700v1 A Two-Phase Perspective on Deep Learning Dynamics

Thu Apr 17 2025

We propose that learning in deep neural networks proceeds in two phases: a rapid curve-fitting phase followed by a slower compression or coarse-graining phase. This view is supported by the shared temporal structure of three phenomena: grokking, double descent and the information bottleneck.

Keywords: learning deep, deep neural, generalization training, neural, learning