Benchmarking and evaluation frameworks for reasoning in language models

Waabi

A version of the talk given to DGP, tailored to self-driving cars.