As a long-time pandas user, I'd say this is one of the better write-ups I've seen that illustrates the versatility and functionality of the Series and DataFrame objects without being too long-winded.
Just one thing to point out regarding the final example: read_csv will actually fetch a URL if it's given as input, so there is no need to use urllib2 and StringIO. Instead, you can just do:

from_url = pd.read_csv('http://www.example.com/data.tsv', sep='\t')
Thank you for a great writeup! Your examples are excellent, I can see a direct reason why using IPython Notebooks would improve my workflow, AND I am already convinced that pandas could make some of my tasks (e.g. writing Excel reports) easier.
One thing I do have an issue with in pandas is the type conversion on sparse data, i.e. a column with missing values.
It's a pity you can only convert that to a float, for example.
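What I mean is something like this (a toy example, not from the tutorial's data):

import numpy as np
import pandas as pd

# An integer column that contains a missing value gets upcast to float64,
# because NaN is a floating-point value and the default int dtype can't hold it.
s = pd.Series([1, 2, np.nan])
print(s.dtype)  # float64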
Great write-up, thanks - wish I'd had this when I got started with pandas, because the thing I found tricky was converting the SQL I knew into its pandas equivalent.
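For example, a GROUP BY aggregation maps over roughly like this (table and column names made up just for illustration):

import pandas as pd

employees = pd.DataFrame({
    'department': ['Fire', 'Police', 'Fire'],
    'salary': [90000.0, 85000.0, 95000.0],
})

# SQL: SELECT department, AVG(salary) FROM employees GROUP BY department
print(employees.groupby('department')['salary'].mean())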
One thing I would point out for new users is the .loc and .iloc indexers, which I think make selecting data more intuitive because they are a bit more explicit.
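A quick made-up sketch of the difference:

import pandas as pd

df = pd.DataFrame({'salary': [50000, 60000, 70000]},
                  index=['anna', 'bob', 'carol'])

# .loc selects by label
print(df.loc['bob', 'salary'])   # 60000

# .iloc selects by integer position
print(df.iloc[1, 0])             # 60000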
I'm a little ashamed to say that although I've heard plenty about Pandas, for some reason I've remained stubborn about the data processing for my PhD, hacking together ugly solutions using numpy arrays of arrays, complicated structured arrays, etc. instead of just reading through a Pandas tutorial.
Great tutorial. I was able to get through all of it this evening. It was fun having a data set to play around with the whole time. Everything was well explained and it was easy to read.
And it looks like working for the City of Chicago is the thing to do... not bad salaries out there :D