Evaluating artificial intelligence (AI) and human reasoning

A summary presentation of my work on evaluating reasoning across mathematics, creative writing, video games, and card games.

Abstract: I will survey the background of AI, discuss what AI is today, and then compare it to human reasoning. We study AI in two settings, mathematics and creative writing, and humans in two others, video games and card games. We find that 1) the best-performing AI, GPT-4 Turbo, currently achieves 85.7% on level 1 of the MATH dataset, 2) no single model is strictly best for metaphor generation, 3) humans exhibit learning progress in the video game Baba Is You, and 4) humans adapt to a modified version of the card game Blackjack. Of these works, two were published in Transactions on Machine Learning Research (TMLR) in 2023, one at Educational Data Mining (EDM) 2023, and the last is in progress. In future work, we hope to study intrinsic motivation in machines using games, linking fun, creativity, and supercriticality.
