†: lead authors. *: major contributors. Accepted to TMLR 2023.

How well do large language models perform on the MATH dataset?