
Neat, love new approaches to old problems. At first sight I thought it would be an awesome implementation of an idea I had a few years ago (as probably tons of other people did before and after :) to find cheating among C programming assignments: convert programs to PostScript "drawings". ifs, fors and certain other functions gave rise to different "movements" of the cursor, eventually drawing paths for different expressions. Since it didn't take into account variable names, malloc-calloc-free order or any other extraneous detail, it could pinpoint cheating with relatively good accuracy (and indeed, we caught 3 cheaters among ~90 submissions, and one of them would have been pretty hard to spot without the tool). If anyone's interested, it was created with lex and basically parsed only the expressions I was interested in. I think I wrote about it once on my blog, but I'm not sure if I ever published the code (it was hacky as hell!)
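For anyone who wants to play with the idea, here is a minimal sketch of the keyword-to-movement fingerprint in Python. It is not the original lex/PostScript tool: the MOVES table, the fingerprint function and the two sample snippets are made up for illustration, and a plain regex stands in for a real lexer, so keywords inside comments or string literals would be counted too.

```python
import re

# Hypothetical mapping from C keywords to (dx, dy) pen movements.
# The original tool's actual movements are not known; these are placeholders.
MOVES = {
    "if": (1, 0),
    "else": (0, -1),
    "for": (0, 1),
    "while": (-1, 0),
    "return": (1, 1),
}

KEYWORD_RE = re.compile(r"\b(" + "|".join(MOVES) + r")\b")


def fingerprint(source):
    """Return the list of points visited while walking the keywords in order."""
    x, y = 0, 0
    path = [(x, y)]
    for match in KEYWORD_RE.finditer(source):
        dx, dy = MOVES[match.group(1)]
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


original = """
for (i = 0; i < n; i++) {
    if (a[i] > max) max = a[i];
}
return max;
"""

# Same structure, different variable names.
renamed = """
for (k = 0; k < size; k++) {
    if (values[k] > best) best = values[k];
}
return best;
"""

print(fingerprint(original) == fingerprint(renamed))
```

Because only the keywords move the pen, renaming variables leaves the path untouched, while reordering or replacing control structures changes it.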



I've seen cheating-detection programs that use tree diff scores to compare code from different students.

The theory goes that students who share solutions will probably change the variable names, do some reformatting etc, which would fool a text diff. But the actual structure of the AST will be the same or similar.

So if you find close matches, you inspect them more closely.

Some quick Googling reveals that plagiarism detection using tree comparisons is a common idea.
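A rough sketch of the structural comparison, under some loud assumptions: it uses Python source rather than C, only because Python's standard library ships a parser (the ast module), and it flattens each tree into a sequence of node type names and compares those with difflib instead of computing a real tree edit distance. The node_types and similarity helpers and the sample programs are hypothetical.

```python
import ast
from difflib import SequenceMatcher


def node_types(source):
    """Flatten the AST into a list of node type names, ignoring identifiers."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]


def similarity(a, b):
    """Ratio in [0, 1]; 1.0 means the node-type sequences are identical."""
    matcher = SequenceMatcher(None, node_types(a), node_types(b), autojunk=False)
    return matcher.ratio()


submission_a = """
def total(xs):
    s = 0
    for x in xs:
        s += x
    return s
"""

# Same structure as submission_a with every name changed.
submission_b = """
def add_up(values):
    acc = 0
    for v in values:
        acc += v
    return acc
"""

# A structurally different solution to the same task.
submission_c = """
def total(xs):
    return sum(xs)
"""

print(similarity(submission_a, submission_b))
print(similarity(submission_a, submission_c))
```

Renamed copies score 1.0 because names never enter the comparison; pairs scoring close to that are the ones worth inspecting by hand.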


Yup, when I thought "this looks neat!" I didn't think it was very novel; the idea is very clear once you are actually looking at cheated code (IIRC my brother-in-law told me a few years ago, after I wrote my code, which was written before meeting him, that he had used a similar method years earlier at another university). But I didn't bother much with looking for existing solutions: this was just so fun! I love reinventing wheels for fun/learning :)


Am I correct in guessing, though, that it would only work for relatively sophisticated assignments that can be done in lots of different ways? Or is there always enough variety in different people's structure to tell when things are being copied?


It was more like a quick way to check without having to remember 90 assignments. Visual inspection was far, far better (almost like a game of match 2). I taught numerical analysis, and usually there was more than one way to structure the code; in fact, most assignments were clearly very different. You'd be surprised how many ways our students found to write the same Gauss inversion algorithm, for example (my personal experience is that I always write the exact same code with very minor variations, but of course I always refer to Golub-Van Loan for the pseudocode...) Other assignments were harder to check: there are not that many ways to write a Runge-Kutta integrator with variable stepsizes (it's basically a set of for statements in a precise order, except for the variable step integrator part), but the "fingerprints" pinpointed some odd stuff that could be checked manually.

Basically it was a fun project that got me a little into Lex, the kind of odd stuff I do on Saturday afternoons.



