Intro to Pandas data structures

zissou · on Oct 27, 2013

As a longtime pandas user, I'd say this is one of the better write-ups I've seen that illustrates the versatility and functionality of the Series and DataFrame objects without being too long-winded.

Just one thing to point out regarding the final example: read_csv will actually fetch a URL if it's given as input, so there is no need to use urllib2 and StringIO. Instead, you can just do:

    import pandas as pd
    from_url = pd.read_csv('http://www.example.com/data.tsv', sep='\t')
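For comparison, here is a minimal sketch of both routes in Python 3, where `urllib2` became `urllib.request` and `StringIO` lives in the `io` module. The two function names are hypothetical, and the manual version assumes the file is UTF-8 encoded.

```python
import io
import urllib.request

import pandas as pd


def read_tsv_manually(url):
    # The roundabout route: download the raw bytes, decode them
    # (assumed UTF-8), wrap the text in a file-like object, then parse.
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")
    return pd.read_csv(io.StringIO(text), sep="\t")


def read_tsv_directly(url):
    # read_csv recognizes the URL string and performs the fetch itself.
    return pd.read_csv(url, sep="\t")
```

Both should return the same DataFrame for a URL such as `'http://www.example.com/data.tsv'`. Since `read_csv` also accepts `file://` URLs, the two are easy to compare offline against a local file.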

gjreda · on Oct 27, 2013

Thanks! My concern was that it actually _was_ too long-winded, so I'm glad to hear otherwise.

Thanks for pointing this out too - had no idea it'd handle a URL.

0003 · on Oct 27, 2013

Nice job. I really find the best way to grasp pandas is through these IPython Notebook walkthroughs. I noticed that the split-apply diagram at http://inundata.org/R_talks/meetup/images/splitapply.png has row 'a' as 2.5, but it should be 3.
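For anyone who hasn't seen the diagram: split-apply-combine means splitting rows into groups by a key, applying a function to each group, and combining the per-group results. A minimal pandas sketch, using made-up values (not the numbers from that image) and the mean as the applied function:

```python
import pandas as pd

# Hypothetical data: a key column to split on and a value column to aggregate.
df = pd.DataFrame({"key": ["a", "b", "a", "b"], "value": [2, 6, 4, 10]})

# Split by "key", apply mean() to each group's values, combine into one Series.
means = df.groupby("key")["value"].mean()
print(means)
```

With these values, group `a` averages (2 + 4) / 2 and group `b` averages (6 + 10) / 2, which makes a diagram like that one easy to check by hand.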

gjreda · on Oct 27, 2013

If anyone is interested in reading it as one long Notebook, you can use NBViewer: http://nbviewer.ipython.org/urls/raw.github.com/gjreda/gregr...

gknoy · on Oct 27, 2013

Thank you for a great write-up! Your examples are excellent: I can see a direct reason why using IPython Notebooks would improve my workflow, AND I'm already convinced that pandas could make some of my tasks (e.g. writing Excel reports) easier.

grej · on Oct 27, 2013

I wish I could take credit! Alas I didn't do the write-up, just found it and thought it was a great resource.

macarthy12 · on Oct 27, 2013

Pandas is great, nice write up.

One thing I do have an issue with in pandas is the type conversion on sparse data, i.e. a column with missing values. It's a pity you can't convert that to a float, for example.
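A small illustration of one common form of this, assuming a column that should hold integers: NumPy integer arrays have no missing-value marker, so pandas stores the gap as NaN and upcasts the whole column to float64. Newer pandas versions (0.24 and later) also offer a nullable `"Int64"` extension dtype that keeps integers alongside missing entries.

```python
import pandas as pd

# A hypothetical integer column with one missing entry.
plain = pd.Series([1, None, 3])
print(plain.dtype)     # float64: the missing value forced an upcast

# The nullable extension dtype keeps integer semantics instead.
nullable = pd.Series([1, None, 3], dtype="Int64")
print(nullable.dtype)  # Int64
```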

stdbrouw · on Oct 27, 2013

This should do the trick.

    myframe['myfield'] = myframe['myfield'].dropna().map(float)
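Why this works: `dropna()` keeps the original index labels, and assigning a Series back to a column aligns on those labels, so the dropped rows simply come back as NaN. A minimal sketch with a hypothetical `myframe` whose `myfield` column holds numeric strings and one missing value:

```python
import pandas as pd

# Hypothetical object column: numbers stored as strings, plus a missing entry.
myframe = pd.DataFrame({"myfield": ["1.5", None, "3"]})

# Convert only the non-missing values; index labels 0 and 2 survive dropna().
converted = myframe["myfield"].dropna().map(float)

# Assignment aligns on the index, so label 1 is filled with NaN.
myframe["myfield"] = converted
print(myframe["myfield"].tolist())  # [1.5, nan, 3.0]
```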

gjreda · on Oct 27, 2013

Alternatively, I think you could also do:

    myframe.myfield.dropna().astype(float)

I'd need to test that out though - type conversion in pandas/NumPy has always been a little confusing for me.
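One quick way to test it, using the same kind of hypothetical column: both spellings convert the non-missing values to floats and keep their original index labels. Neither changes `myframe` unless the result is assigned back, as in the previous comment.

```python
import pandas as pd

# Hypothetical column of numeric strings with a missing entry.
myframe = pd.DataFrame({"myfield": ["1.5", None, "3"]})

via_map = myframe["myfield"].dropna().map(float)
via_astype = myframe.myfield.dropna().astype(float)

# Same values, same index labels (0 and 2), same float64 dtype.
print(via_map.equals(via_astype))  # True

# The frame itself is unchanged: the column is still object dtype.
print(myframe["myfield"].dtype)    # object
```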

easytiger · on Oct 27, 2013

I thought one of the things it did well was dealing with data exactly as you describe.

bp123 · on Oct 27, 2013

    df.fillna(0.0)
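`fillna` takes the opposite approach: rather than dropping missing entries, it substitutes a value for them. Filling with 0.0 is only appropriate when zero is a sensible stand-in for "missing". It returns a new frame, so assign the result if you want to keep it. A sketch with a hypothetical frame:

```python
import pandas as pd

# Hypothetical frame with a missing value in one column.
df = pd.DataFrame({"myfield": [1.0, None, 3.0]})

filled = df.fillna(0.0)            # new DataFrame; df itself is untouched
print(filled["myfield"].tolist())  # [1.0, 0.0, 3.0]

# With no NaN left, converting to integers no longer raises.
print(filled["myfield"].astype(int).tolist())  # [1, 0, 3]
```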

RobinL · on Oct 27, 2013

Great write-up, thanks. I wish I'd had this when I got started with pandas, because what I found tricky was translating the SQL I knew into its pandas equivalent.
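As a rough illustration of that translation, here are two SQL queries with a pandas equivalent under each. The table, its columns, and its values are all hypothetical.

```python
import pandas as pd

# Hypothetical table: employees(name, dept, salary)
employees = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy", "Di"],
    "dept": ["ops", "sales", "ops", "sales"],
    "salary": [90, 80, 70, 60],
})

# SELECT name, salary FROM employees WHERE salary > 65
# ORDER BY salary DESC LIMIT 2;
top = (
    employees.loc[employees["salary"] > 65, ["name", "salary"]]
    .sort_values("salary", ascending=False)
    .head(2)
)

# SELECT dept, AVG(salary) FROM employees GROUP BY dept;
avg_by_dept = employees.groupby("dept")["salary"].mean()

print(top)
print(avg_by_dept)
```

The WHERE clause becomes a boolean mask, the column list becomes the second argument to `.loc`, and GROUP BY maps onto `groupby` followed by an aggregation.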

One thing I would point out for new users is the .loc and .iloc indexers, which I think make selecting data more intuitive because they are more explicit: .loc selects by label and .iloc by integer position.
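To make the distinction concrete, a small sketch with a hypothetical frame whose row labels are strings, so label-based and position-based selection differ visibly:

```python
import pandas as pd

# Hypothetical frame with string row labels.
df = pd.DataFrame({"salary": [100, 200, 300]}, index=["x", "y", "z"])

# .loc selects by label; label slices include the end point.
print(df.loc["y", "salary"])  # 200
print(len(df.loc["x":"y"]))   # 2

# .iloc selects by integer position; slices exclude the end point.
print(df.iloc[1, 0])          # 200
print(len(df.iloc[0:1]))      # 1
```

The inclusive label slice versus the exclusive positional slice is the detail that most often trips people up.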

grej · on Oct 27, 2013

All credit on the writeup goes to Greg Reda. I just found this while working with Pandas and thought it was a great intro resource!

cdavid · on Oct 27, 2013

You may want to look at pandasql, which lets you do something like:

    from pandasql import sqldf, load_meat, load_births
    pysqldf = lambda q: sqldf(q, globals())
    meat = load_meat()
    births = load_births()
    print(pysqldf("SELECT * FROM meat LIMIT 10;").head())
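If you'd rather avoid the extra dependency, a similar effect is possible with the standard library's `sqlite3` module plus pandas' own `to_sql` and `read_sql_query`: copy the frames into an in-memory SQLite database, then query them there. This is a swapped-in alternative, not pandasql itself; the `run_sql` helper is hypothetical, and the `meat` frame below is made up rather than pandasql's bundled dataset.

```python
import sqlite3
from contextlib import closing

import pandas as pd


def run_sql(query, **frames):
    # Copy each named DataFrame into a fresh in-memory SQLite database,
    # run the query there, and return the result as a new DataFrame.
    with closing(sqlite3.connect(":memory:")) as conn:
        for name, frame in frames.items():
            frame.to_sql(name, conn, index=False)
        return pd.read_sql_query(query, conn)


# Hypothetical data standing in for a real table.
meat = pd.DataFrame({"item": ["beef", "pork", "veal"],
                     "pounds": [10.0, 7.5, 2.0]})

result = run_sql("SELECT * FROM meat WHERE pounds > 5 ORDER BY pounds DESC;",
                 meat=meat)
print(result)
```

Passing frames as keyword arguments keeps the table names explicit, instead of exposing everything in `globals()` as the pandasql example does.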

kartikkumar · on Oct 28, 2013

I'm a little ashamed to say that, although I've heard plenty about Pandas, for some reason I've stubbornly kept hacking together ugly solutions for my PhD research data processing using NumPy arrays of arrays, complicated structured arrays, etc., instead of just reading through a Pandas tutorial.

Damn.

agumonkey · on Oct 27, 2013

I love the pragmatism of it: the clipboard and Excel bridges are major wins in most offices.

rebootthebox · on Oct 27, 2013

Excellent post, I just started using Pandas last week and this will be helpful.

priyadarshy · on Oct 28, 2013

Great tutorial. I was able to get through all of it this evening. It was fun having a data set to play around with the whole time. Everything was well explained and it was easy to read.

And it looks like working for the City of Chicago is the thing to do... not bad salaries out there :D