Text-to-Image · University of Science and Technology of China
Flow-OPD: On-Policy Distillation Fixes Reward Conflict in Text-to-Image RL
Flow-OPD trains one specialist teacher per reward, then distills them on-policy into a single SD 3.5 student, lifting GenEval from 0.63 to 0.92 and OCR from 0.59 to 0.94 without the aesthetic collapse of multi-reward GRPO.