Supervised Program on Alignment Research (SPAR) 2024

1 minute read

Published:

About SPAR

SPAR is an effort to connect students and researchers with mentors working on AI Alignment and AI Safety issues. I worked in the SatisfIA Project under the SPAR program in 2024, working on reinforcement learning without maximizing rewards. The team was led by Jobst Heitzig of Postdam Institute for Climate Impact Research.

Overview

Previous work by the SatisfIA team worked on aspiration-based learning, where they design algorithms that do not maximize rewards, but make sure that the expected total reward obtained via a Markov Decision Process (MDP) lies in a particular set which they call the “aspiration set”. For further background on their work, I would suggest you look at their paper Non-maximizing policies that fulfil multi-criterion aspirations in expectation.

The topic I worked on during SPAR was to analyze the probability distribution of the total in a generic Markov decision process, as an idea about this density provides a better idea on what is the probability that the total lies in a given aspiration set, compared to only the expectation value. As a first step, we started with single-criterion aspirations and looked at its extension to multi-criteria aspirations.

Preliminary Results

Our findings for the total distribution mainly lies in the conjecture that the total distribution for a simple MDP and a fixed policy, is distributed as a mixture of ExGaussians, where each component ExGaussian is determined by the strongly connected components of the underlying Markov chain. We have experimentally verified this property on several environments, with the presence and absence of discount factors.

The below figure shows the total distribution of an MDP with two ergodic components with a transient part leading into either of the two components, where delta(2,2) denotes a parameter controlling the reward occurred during the transient part of the Markov chain.

Reward-Distribution-Follows-Mixture-of-ExGaussians

Further investigation of the total distribution for single and multiple dimensional criteria is ongoing work. If you’re interested in working on this problem, feel free to drop me an email!

Final Report for SPAR 2024 - SatisfIA

The final report for the SPAR program is available here.