†: lead authors; *: major contributors
Accepted to TMLR 2023

How well do large language models perform on the MATH dataset?