frieda rong
Showing all blog posts, papers, service, talks tagged "math"
talk
Jun 2, 2024
Benchmarking and evaluation frameworks for reasoning in language models
Waabi
A version of the talk given to DGP, tailored to self-driving cars.
#ai #education #games #hci #math #nlp #reasoning #rl
talk
May 22, 2024
Evaluating artificial intelligence (AI) and human reasoning
Dynamic Graphics Project at the University of Toronto
A summary presentation of my work on evaluating reasoning across mathematics, creative writing, video games, and card games.
#ai #education #games #hci #math #nlp #physics #reasoning #rl
paper
Holistic Evaluation of Language Models
MATH dataset scenario • code • page
Many authors... via CRFM
Accepted to TMLR 2023.
How well do large language models perform on the MATH dataset?
#ai #math #nlp #reasoning
Tags: ai (3) • education (2) • games (2) • hci (2) • math (3) • nlp (3) • physics (1) • reasoning (3) • rl (2) • all (3)
© 2019–2025