Intro to Pandas data structures (gregreda.com)
128 points by grej on Oct 27, 2013 | 18 comments



As a long-time pandas user, I'd say this is one of the better write-ups I've seen that illustrates the versatility and functions of the Series and DataFrame objects without being too long-winded.

Just one thing to point out regarding the final example: read_csv will actually fetch a URL if it's given as input, so there is no need to use urllib2 and StringIO. Instead, you can just do:

    from_url = pd.read_csv('http://www.example.com/data.tsv', sep='\t')
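
For anyone who wants to try this without hitting the network, here is a minimal sketch. It assumes a throwaway TSV file (the name `data.tsv` and its contents are made up for illustration) and reads it back through a `file://` URL, which, as far as I know, read_csv treats as a URL in the same way it would an `http://` address.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data, standing in for the remote TSV mentioned above.
sample = "name\tsalary\nalice\t50000\nbob\t62000\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.tsv"
    path.write_text(sample)
    # The URL string goes straight to read_csv; no urllib2/StringIO step.
    from_url = pd.read_csv(path.as_uri(), sep="\t")

print(from_url)
```

Swapping `path.as_uri()` for a real `http://` address should be the only change needed for remote data.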


Thanks! My concern was that it actually _was_ too long-winded, so I'm glad to hear otherwise.

Thanks for pointing this out too - had no idea it'd handle a URL.


Nice job. I really find the best way to grasp pandas is through these ipython notebook walkthroughs. I noticed that http://inundata.org/R_talks/meetup/images/splitapply.png has row 'a' as 2.5, but it should be 3.


If anyone is interested in reading it as one long Notebook, you can use NBViewer: http://nbviewer.ipython.org/urls/raw.github.com/gjreda/gregr...


Thank you for a great writeup! Your examples are excellent, I can see a direct reason why using IPython Notebooks would improve my workflow, AND I am already convinced that pandas could make some of my tasks (e.g. writing Excel reports) easier.


I wish I could take credit! Alas I didn't do the write-up, just found it and thought it was a great resource.


Pandas is great, nice write up.

One thing I do have an issue with in pandas is type conversion on sparse data, i.e. a column with missing values. It's a pity you can't convert that to a float, for example.


This should do the trick.

    myframe['myfield'] = myframe['myfield'].dropna().map(float)


Alternatively, I think you could also do:

    myframe.myfield.dropna().astype(float)

I'd need to test that out though - type conversion in pandas/NumPy has always been a little confusing for me.
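
Here is a small sketch for checking the `dropna().astype(float)` idea on a made-up frame (the values in `myfield` are invented for illustration). It also shows `pd.to_numeric` with `errors="coerce"` as one alternative; that is a swapped-in suggestion, not something from the write-up.

```python
import numpy as np
import pandas as pd

# Made-up frame: numbers stored as strings, with one missing value.
myframe = pd.DataFrame({"myfield": ["1.5", None, "3"]})

# dropna() removes the missing row, astype(float) converts what is left.
converted = myframe["myfield"].dropna().astype(float)

# Assigning back aligns on the index, so the dropped row returns as NaN.
myframe["myfield"] = converted

# Alternative: to_numeric turns anything it cannot parse into NaN.
also_float = pd.to_numeric(pd.Series(["1.5", None, "3"]), errors="coerce")

print(myframe["myfield"].dtype, also_float.dtype)
```

Either way the column ends up as float64, with NaN marking the missing entry rather than the row disappearing.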


I thought one of the things it did well was dealing with data exactly as you describe


df.fillna(0.0)


Great write up, thanks - wish I'd had this when I got started with pandas because the thing I found tricky was converting the SQL I knew into its pandas equivalent.

One thing I would point out for new users is the .loc and .iloc indexers, which I think make selecting data more intuitive because they are a bit more explicit.
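
A minimal sketch of the difference, using a hypothetical frame (the names, salaries, and index labels are all made up): `.loc` selects by label, `.iloc` by integer position, and a boolean mask inside `.loc` reads a lot like a SQL WHERE clause.

```python
import pandas as pd

# Hypothetical data, indexed by made-up employee IDs.
df = pd.DataFrame(
    {"name": ["alice", "bob", "carol"], "salary": [50000, 62000, 71000]},
    index=["e1", "e2", "e3"],
)

# .loc selects by label; .iloc selects by integer position.
by_label = df.loc["e2", "salary"]
by_position = df.iloc[1, 1]

# Label slices include the end label; positional slices exclude the end.
label_slice = df.loc["e1":"e2"]
position_slice = df.iloc[0:1]

# Roughly: SELECT name FROM df WHERE salary > 60000
well_paid = df.loc[df["salary"] > 60000, "name"]

print(by_label, by_position, len(label_slice), len(position_slice))
print(well_paid)
```

The slicing difference is the part that tends to surprise people coming from plain Python lists.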


All credit on the writeup goes to Greg Reda. I just found this while working with Pandas and thought it was a great intro resource!


you may want to look at pandasql, where you can do something like

    from pandasql import sqldf, load_meat, load_births
    pysqldf = lambda q: sqldf(q, globals())
    meat = load_meat()
    births = load_births()
    print(pysqldf("SELECT * FROM meat LIMIT 10;").head())


I'm a little ashamed to say that although I've heard plenty about Pandas, for some reason I've stubbornly kept hacking together ugly solutions for my PhD data processing using NumPy arrays of arrays, complicated structured arrays, etc., instead of just reading through a Pandas tutorial.

Damn.


I love the pragmatism of it, clipboard and excel bridges are major wins in most offices.


Excellent post, I just started using Pandas last week and this will be helpful.


Great tutorial. I was able to get through all of it this evening. It was fun having a data set to play around with the whole time. Everything was well explained and it was easy to read.

And it looks like working for the city of Chicago is the thing to do... not bad salaries out there :D



