As a (senior) lecturer in a university, I’m with you on most of what you wrote. ...

jay_kyburz · 2024-11-14T09:17:09 1731575829

A genuine question, have you evaluated AI for marking written work?

I'm not an educator, but it seems to me like gippity would be better at analyzing a students paper than writing it in the first place.

Your prompt could provide the AI the marking criteria, or the rubric, and have it summarize how well the paper hits the important points.

low_tech_love · 2024-11-14T15:53:12 1731599592

Never say never, but I do not plan on doing this. This sounds quite surreal: a loop where the students pretend to learn and I pretend to teach? I would… hm… I’ve never heard of such… I mean, this is definitely not how it is in reality… right…

(Jokes aside, I have an unhealthy, unstoppable need to feel proud of my work, so no I won’t do that. For now…)

jay_kyburz · 2024-11-14T19:25:31 1731612331

I would have thought that the teaching comes before the test, and that the test is really just a way to measure how well the student soaked up the knowledge.

You could take pride in a well crafted technology that could mark an assignment and provide feedback in far more detail that you yourself could ever provide given time constraints.

I asked my partner about it last night, she teaches at ANU and she made some joke about how variable the quality of tutor marking is. At least the AI would be impartial and consistent.

I have no idea how well an AI can assess a paper against a rubric. Might be a complete waist of time, but if there were some teachers out there who wanted to do some tests, I would be interested in helping set up the tests and evaluating the results.

wrp · 2024-11-14T21:23:53 1731619433

In discussing how to adapt teaching methods, we have also looked at evaluation by LLM. The most talked about concern now is the unreliability of LLM output. However, say that in the future, accuracy of LLMs improves to the point that it is no longer a problem. Would it then be good to have evaluation by LLM?

I would say generally not, for two reasons. First, the teacher needs to know how the student is developing. To get a thorough understanding takes working through the student's output, not just checking a summary score. Second, the teacher needs to provide selective feedback, to focus student attention on the most important areas needing development. This requires knowledge of the goals of the teacher and the developmental history of the student.

I won't argue that LLM evaluation could never be applied usefully. If the task to be evaluated is simple and the skills to be learned are straightforward, I imagine that it could benefit the students of some grossly overloaded teacher.

Eisenstein · 2024-11-14T09:34:10 1731576850

I know I would have had a blast finding ways to direct the model into giving me top scores by manipulating it through the submitted text. I think that without a bespoke model that has been vetted, is supervised, and is constrained, you are going to end up with some interesting results running classwork through a language model for grading.