Hacker News
Bayes nets by example with Python and Khan Academy data (derandomized.com)
139 points by kohlmeier on March 27, 2012 | 9 comments



After taking multiple courses that touch on Bayesian nets and other AI techniques, I feel I have a good general sense of how things work and of the mathematical underpinnings. However, I still stumble when trying to implement what I've learned in code, so thank you for this!

My school uses AIMA for every introductory AI course, and if you're looking for pointers on how to implement some of the algorithms and methods described there in Python, have a look at http://code.google.com/p/aima-python/ . Since it's Norvig himself coding, the Python is practically flawless.


A little correction: aima-python is primarily by Norvig, but it's unfinished and less polished than the essays on his website. I took it up last fall to get it into better shape for the free AI class, and about 2/3 of the code is still his, unchanged from before then. He is keeping an eye on things, but, you know, busy guy.

That said, Bayes nets are fully implemented (as presented in the book, i.e. for boolean variables), and I think they're worth reading and trying out if you're learning the subject. Also worth checking out is aima-java, which is in a much more complete state, though of course a lot wordier, as Java tends to be.
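If you want a feel for what the boolean-variable implementation does, here's a tiny self-contained sketch of inference by enumeration on the book's burglary network. This is my own toy code for illustration, not aima-python's actual source; the real code is organized differently, but the enumeration recursion is the same idea:

    T, F = True, False

    # Each node: (parents, CPT); the CPT maps a tuple of parent values
    # to P(node = True | parents).
    net = {
        'Burglary':   ((), {(): 0.001}),
        'Earthquake': ((), {(): 0.002}),
        'Alarm':      (('Burglary', 'Earthquake'),
                       {(T, T): 0.95, (T, F): 0.94,
                        (F, T): 0.29, (F, F): 0.001}),
        'JohnCalls':  (('Alarm',), {(T,): 0.90, (F,): 0.05}),
        'MaryCalls':  (('Alarm',), {(T,): 0.70, (F,): 0.01}),
    }
    order = ['Burglary', 'Earthquake', 'Alarm', 'JohnCalls', 'MaryCalls']

    def prob(var, value, assignment):
        """P(var = value | parents(var)), with parents already assigned."""
        parents, cpt = net[var]
        p_true = cpt[tuple(assignment[par] for par in parents)]
        return p_true if value else 1.0 - p_true

    def enumerate_all(variables, assignment):
        """Sum out the remaining variables, in topological order."""
        if not variables:
            return 1.0
        first, rest = variables[0], variables[1:]
        if first in assignment:
            return (prob(first, assignment[first], assignment) *
                    enumerate_all(rest, assignment))
        return sum(prob(first, v, assignment) *
                   enumerate_all(rest, {**assignment, first: v})
                   for v in (T, F))

    def enumeration_ask(query, evidence):
        """Posterior P(query | evidence), normalized over True/False."""
        dist = {v: enumerate_all(order, {**evidence, query: v})
                for v in (T, F)}
        z = sum(dist.values())
        return {v: p / z for v, p in dist.items()}

    # The book's classic query: P(Burglary | JohnCalls=T, MaryCalls=T) ~ 0.284
    print(enumeration_ask('Burglary', {'JohnCalls': T, 'MaryCalls': T}))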

If you'd like to contribute, there's plenty to do. (For example, the EM algorithm used in this post to learn the Bayes net parameters.) I set up a mirror on github the other day to make it easier: https://github.com/darius/aima-python-mirror

(I don't have the spare energy right now to do a lot myself, including editing too many incoming patches, though it would be a surprise to have that problem.)
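If EM is unfamiliar, here's the idea in a toy setting, not the post's code: two coins with unknown biases, where the coin used for each batch of flips is the missing data. EM alternates between inferring the hidden coin assignments and re-estimating the biases:

    import math

    flips_per_batch = 10
    heads = [5, 9, 8, 4, 7]      # heads observed in each batch of 10 flips
    theta = [0.6, 0.5]           # initial P(heads) guesses for coins A and B

    def binom_pmf(k, n, p):
        return math.comb(n, k) * p**k * (1 - p)**(n - k)

    for _ in range(50):
        exp_heads = [0.0, 0.0]
        exp_flips = [0.0, 0.0]
        for h in heads:
            # E-step: posterior probability this batch came from coin A,
            # assuming the coin was picked uniformly at random.
            like_a = binom_pmf(h, flips_per_batch, theta[0])
            like_b = binom_pmf(h, flips_per_batch, theta[1])
            w_a = like_a / (like_a + like_b)
            for i, w in ((0, w_a), (1, 1.0 - w_a)):
                exp_heads[i] += w * h
                exp_flips[i] += w * flips_per_batch
        # M-step: re-estimate each coin's bias from its expected counts.
        theta = [exp_heads[i] / exp_flips[i] for i in (0, 1)]

    print(theta)   # converges to roughly [0.80, 0.52] on this data

Learning a Bayes net's CPT parameters with unobserved nodes works the same way: expected counts in the E-step, relative-frequency estimates in the M-step.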


Ah, you're correct. I can only speak for the code in there that I've used myself for self-study, which was great, but I see now that a lot is missing. I wouldn't mind contributing to such a project myself, but I'm afraid my Python is not as Pythonic as Norvig's...


Understood. :) I'm a big fan of his work, too.


Isn't the coefficient on 'T', the predicted 'mastery' variable, pointing the wrong way in the final logistic regression? And 'E', which is just getting 85% of the exercises right, completely dominates all the other features (exponential moving average, etc.) they put in the model.

So while all this is really cool, and I like the idea of doing EM on missing data (I've done it myself), it doesn't seem like it actually adds much, if I'm reading that right.


The 'E' in the regression is the inferred/predicted value of the E variable for that exercise, using no problem history from that exercise, only what's pulled in through the Bayes net. (Sorry that wasn't clear.)

The 'T' variable is likely just a case of multicollinearity with the 'E' variable and should go away on a full-scale data set. If not, it can easily be removed from the model. The 'E' variable dominates because it additionally captures cross-sectional effects across the various exercises in the regression.
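To illustrate the multicollinearity point with a toy simulation (made-up data, not ours): when a second feature is nearly a copy of the first, its fitted weight is unstable and can land on either side of zero:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    signs = []
    for _ in range(20):
        x1 = rng.normal(size=500)
        x2 = x1 + 0.05 * rng.normal(size=500)          # near-copy of x1
        # The outcome depends on x1 only.
        y = (rng.random(500) < 1.0 / (1.0 + np.exp(-2.0 * x1))).astype(int)
        model = LogisticRegression(C=1e6)              # ~unregularized
        model.fit(np.column_stack([x1, x2]), y)
        signs.append(np.sign(model.coef_[0, 1]))       # sign of x2's weight

    print(signs.count(1.0), "positive vs", signs.count(-1.0), "negative")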


Ah, OK. So is this a sort of Markov model, where you are predicting the probability of getting an exercise right after observing (some subset of) the previous exercises? And E is not 1 or 0, but the expected probability of getting it right? I'm still confused about where all the different E_i's fit in.

That would explain the magnitude, and I agree the negative weight on T would just be due to the direct correlation between E and T.

Edit: I just realized that an exercise consists of multiple problems, so you're predicting whether or not the student will get >= 85% of the problems right on an exercise.
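For concreteness, under the (simplifying) assumption that problems within an exercise are conditionally independent given the student's state, that prediction is a binomial tail:

    import math

    def p_pass(p, n, frac=0.85):
        """P(at least frac of n problems correct), with problems i.i.d.
        at per-problem success probability p."""
        k_min = math.ceil(frac * n)
        return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
                   for k in range(k_min, n + 1))

    print(p_pass(0.9, 20))   # ~0.87: a strong student usually clears 85%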


Very helpful article. Note that this is an implementation of a naive Bayes model, sometimes called 'idiot Bayes'. It assumes independent observations and can therefore be overconfident. More complex Bayes net models are way harder to implement. Here's a good overview of general networks: http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html
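For anyone unfamiliar with the distinction, naive Bayes factors the likelihood into one independent term per observation. A minimal sketch with made-up numbers:

    prior = {'mastered': 0.5, 'not_mastered': 0.5}
    # One independent P(correct | class) term per observed problem.
    p_correct = {'mastered': [0.9, 0.8, 0.95],
                 'not_mastered': [0.4, 0.5, 0.3]}
    observed = [True, True, False]        # right, right, wrong

    score = {}
    for c in prior:
        s = prior[c]
        for p, right in zip(p_correct[c], observed):
            s *= p if right else 1.0 - p  # the "naive" independence step
        score[c] = s

    z = sum(score.values())
    print({c: s / z for c, s in score.items()})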


This is not naive Bayes, and it does not assume independent observations on the exercises. The point of using a network is to model the joint distribution with its dependencies.



