Accepted to TMLR 2023. How well do large language models perform on an unnatural in-context learning task?