Can AI Pass Freshman CS?
Turns out, barely.
I was back TA-ing CS 2112, the Honors Data Structures course at Cornell, for my fifth and final time last semester.
Out of curiosity, I ran a little experiment: I took every single assignment, exam, and quiz from the class, ran them through the paid versions of ChatGPT, Claude, and Gemini (including their agentic coding models for the projects), and graded the results as if they came from students.
The results were both exactly what I expected and surprising in multiple ways. I expected the models to do really well on exams, and indeed they did. I expected them to fail catastrophically on the projects once they got big; some held out longer than I expected, but eventually all of them succumbed.
But I didn't expect just how unusably broken and buggy all these products are once you get past their investor presentations and actually try to use them. And I certainly wasn't expecting which AI did the... "best" is a strong word, but perhaps "least worst" is suitable.
I made a video essay about the whole experience, with all the gory technical details:
Posted By: Michael Xing