Thank you for sharing this nice work. Could you explain why increasing the number of QA pairs per image has a negative impact? Off the top of my head, a single QA pair can only capture one small aspect of a caption's quality, so more QA pairs should reflect the quality better. Also, why is this method able to avoid reward hacking?