Neat, love new approaches to old problems. At first sight I thought it would be ...

jacques_chester · on May 26, 2013

I've seen cheating-detection programs that use tree diff scores to compare code from different students.

The theory goes that students who share solutions will probably change the variable names, do some reformatting, etc., which would fool a text diff. But the actual structure of the AST will be the same or very similar.

So if you find close matches, you inspect them more closely.

Some quick Googling reveals that plagiarism detection using tree comparisons is a common idea.
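A minimal sketch of the idea, assuming Python submissions and using only the standard `ast` and `difflib` modules. The helper names `structure` and `similarity` are hypothetical and not taken from any particular detector. The sketch renames variables, arguments and function names to positional placeholders, dumps the tree, and scores two dumps with `difflib`. A real tree-diff tool would compute a tree edit distance; comparing linearized dumps is a cheaper stand-in.

```python
import ast
import difflib


class _Renamer(ast.NodeTransformer):
    """Rename variables, arguments and function names to v0, v1, ...

    Names are numbered in order of first appearance. Builtins such as len
    are renamed too, so only the shape of the code and the pattern of name
    reuse survive. Attribute names (e.g. the .pop in xs.pop()) are kept.
    """

    def __init__(self):
        self.mapping = {}

    def _canon(self, name):
        if name not in self.mapping:
            self.mapping[name] = f"v{len(self.mapping)}"
        return self.mapping[name]

    def visit_FunctionDef(self, node):
        node.name = self._canon(node.name)
        self.generic_visit(node)  # then rename arguments and body
        return node

    def visit_arg(self, node):
        node.arg = self._canon(node.arg)
        self.generic_visit(node)
        return node

    def visit_Name(self, node):
        node.id = self._canon(node.id)
        return node


def structure(source):
    """Name- and whitespace-insensitive text form of the AST."""
    tree = _Renamer().visit(ast.parse(source))
    return ast.dump(tree, annotate_fields=False)


def similarity(a, b):
    """Score in [0, 1]; 1.0 means the normalized trees are identical."""
    # autojunk=False: the default heuristic ignores frequent characters in
    # long strings, which would distort scores on AST dumps.
    return difflib.SequenceMatcher(None, structure(a), structure(b),
                                   autojunk=False).ratio()


if __name__ == "__main__":
    a = "def mean(xs):\n    total = 0\n    for x in xs:\n        total += x\n    return total / len(xs)\n"
    b = "def avg(values):\n    s=0\n    for v in values: s+=v\n    return s/len(values)\n"
    print(similarity(a, b))  # 1.0: renamed and reformatted, same structure
```

Pairs that score close to 1.0 are the "close matches" worth inspecting by hand; the score itself proves nothing.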

RBerenguel · on May 26, 2013

Yup, when I thought "this looks neat!" I didn't think it was very novel; the idea is obvious once you are actually looking at copied code. (IIRC, a few years ago my brother-in-law told me he had used a similar method years earlier at another university; that was after I wrote my code, which predates my meeting him.) But I didn't bother much with looking for existing solutions: this was just so much fun! I love reinventing wheels for fun/learning :)

andrewflnr · on May 25, 2013

Am I correct in guessing, though, that it would only work for relatively sophisticated assignments that can be done in lots of different ways? Or is there always enough variety in different people's structure to tell when things are being copied?

RBerenguel · on May 25, 2013

It was more like a quick way to check without having to remember 90 assignments. Visual inspection was far, far better (almost like a game of match-two). I taught numerical analysis, and usually there was more than one way to structure the code; in fact, most assignments were clearly very different. You'd be surprised how many ways our students found to write the same Gauss inversion algorithm, for example (my personal experience is that I always write the exact same code with very minor variations, but of course I always refer to Golub-Van Loan for the pseudocode...). Other assignments were harder to check: there are not that many ways to write a Runge-Kutta integrator with variable step sizes (it's basically a set of for loops in a precise order, except for the variable-step part), but the "fingerprints" pinpointed some odd stuff that could then be checked manually.

Basically it was a fun project that got me a little into Lex, the kind of odd stuff I do on Saturday afternoons.
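The comments don't say how the Lex-based "fingerprints" were built. As one hypothetical reconstruction, here is a token-level sketch that swaps Lex for Python's standard `tokenize` module and assumes Python submissions. Identifiers, numbers and strings collapse to `ID`/`NUM`/`STR`, keywords and operators are kept, and each submission becomes a set of k-grams compared with Jaccard similarity. This is a simplified relative of the winnowing scheme used by tools such as MOSS; `k=5` and the function names are arbitrary choices for illustration.

```python
import io
import keyword
import tokenize

# Layout and comment tokens are dropped so reformatting has no effect.
_IGNORED = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE, tokenize.INDENT,
            tokenize.DEDENT, tokenize.ENCODING, tokenize.ENDMARKER}


def coarse_tokens(source):
    """Lex the source; collapse identifiers, numbers and strings to classes."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _IGNORED:
            continue
        if tok.type == tokenize.NAME:
            out.append(tok.string if keyword.iskeyword(tok.string) else "ID")
        elif tok.type == tokenize.NUMBER:
            out.append("NUM")
        elif tok.type == tokenize.STRING:
            out.append("STR")
        else:
            out.append(tok.string)  # operators and punctuation kept as-is
    return out


def fingerprint(source, k=5):
    """Set of k-grams (runs of k consecutive coarse tokens)."""
    toks = coarse_tokens(source)
    return {tuple(toks[i:i + k]) for i in range(len(toks) - k + 1)}


def jaccard(a, b, k=5):
    """Shared k-grams over all k-grams: 1.0 for identical fingerprints."""
    fa, fb = fingerprint(a, k), fingerprint(b, k)
    if not (fa or fb):
        return 1.0
    return len(fa & fb) / len(fa | fb)


if __name__ == "__main__":
    a = "def mean(xs):\n    total = 0\n    for x in xs:\n        total += x\n    return total / len(xs)\n"
    b = "def avg(values):\n    s=0\n    for v in values: s+=v\n    return s/len(values)\n"
    print(jaccard(a, b))  # 1.0: names and layout vanish in the token stream
```

Because names and layout disappear from the coarse token stream, a renamed and reformatted copy gets the same fingerprint. A high score is only a cue for a manual look, which matters most for assignments like the Runge-Kutta integrator where honest solutions also end up looking alike.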