Our new preprint, "KL Penalty Control via Perturbation for Direct Preference Optimization," has been released.