Python For The Web (gun.io)
213 points by Mizza on Oct 18, 2011 | hide | past | favorite | 98 comments



> Python is the best language in the world for interacting with the web, and I'm going to show you why.

I disagree. I didn't need to read it, but I did, and I still feel that way.

The statement says language, but the actual points made are about the libraries and the community (i.e. the ecosystem).

I think the ruby ecosystem is actually better, for a few reasons:

* Deep integration with JavaScript (ExecJS and TheRubyRacer are way ahead of what is available in Python, Perl, and PHP)

* Many different excellent choices for talking to HTTP APIs, including the highly innovative Faraday

* A top-notch JVM implementation (JRuby has had more traction than Jython over the last five years)

* Lots of good auth libraries: OAuth (for authenticating with a plethora of outside services), Devise (pre-built auth with confirmation emails, password resets), and OAuth2 provider (server) implementations

* Ruby has evented programming too. Goliath is pretty good.

Python is great but I take issue with saying that it's the best web dev language out there.


My cranky old age is going to show here, but... Python or Perl, all the way.

Both have a long history of "standard" modules for common cases and well-maintained (if not "standard") modules for alternative approaches to the common cases, or for the uncommon cases.

More importantly, both languages have a well-established culture of boringness. This is a quality I actively look for, because it tells me someone wrote code to solve a problem, not because it was the hip thing to do.


There's still room for improvement in both of them, too. If you want to have a lot of connections open, you have to choose between lots of threads with blocking I/O, or using event-driven I/O, or using some kind of magical light-weight cooperatively multitasked greenlet/fiber thing.
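For what it's worth, here's a minimal sketch of the greenlet option using gevent (library choice, URL and counts are mine, purely for illustration):

    import gevent
    from gevent import monkey
    monkey.patch_all()  # patch blocking stdlib sockets to yield to the event loop

    import urllib2

    def fetch(url):
        return urllib2.urlopen(url).read()

    # ten concurrent fetches, all multiplexed on one OS thread
    jobs = [gevent.spawn(fetch, 'http://gun.io/') for _ in range(10)]
    gevent.joinall(jobs)
    print [len(job.value) for job in jobs]

It reads like blocking code, but under the hood it's exactly the kind of magic being described above.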

As Erlang and Haskell show, it's totally possible to combine an event-driven I/O subsystem with lightweight user-space preemptive multithreading that supports a huge number of threads, takes advantage of multiple cores, and doesn't need to worry about blocking-vs-nonblocking magic. I would love to see that in a more mainstream language.


> Deep integration with JavaScript (ExecJS and TheRubyRacer are way ahead of what is available in Python, Perl, and PHP)

There are a lot of libraries to check out in each language before you can make that claim!

Here, for example, are just some of the JavaScript libraries on CPAN:

* JavaScript (SpiderMonkey) - https://metacpan.org/module/JavaScript

* JavaScript::SpiderMonkey - https://metacpan.org/module/JavaScript::SpiderMonkey

* JavaScript::V8 - https://metacpan.org/module/JavaScript::V8

* JE (pure-Perl JS engine) - https://metacpan.org/module/JE

* JSPL (SpiderMonkey) - https://metacpan.org/module/JSPL


It also depends on what you need to do with the web. Perl is currently the best language for interacting with Unicode, for example. If your web needs include Unicode, you might have trouble with Ruby or Python, unless you're willing to accept half-ass, partial, or just plain wrong Unicode implementations.


I agree, I love Python but I've spent untold hours dealing with its unicode issues.


Everything unencoded is Unicode in Python 3, isn't it?


Strings are unicode. Bytes are encoded data.

With Python 2 strings you have to be proactive about it:

* always use u"" unless you know what you're doing

* segregate non-unicode data to the boundaries by decode()ing early and encode()ing late

or you're guaranteed to shoot yourself in the foot sooner or later. But if you do it, it's quite smooth sailing.
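A tiny sketch of that decode-early/encode-late discipline (the file names are just placeholders):

    raw = open('names.txt').read()        # bytes arrive at the boundary
    text = raw.decode('utf-8')            # decode early: now a unicode object
    shouted = text.upper() + u"!"         # all internal work stays in unicode
    open('out.txt', 'w').write(shouted.encode('utf-8'))  # encode late, on the way out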

Also,

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
helps a lot.

All that just brings you closer to Python 3 anyway and really helps when using 2to3.


Agreed, however I've run into compatibility issues with 3 a few times and so resolved to stick w/ 2, deal with the text parsing issues myself, keep the libraries I like and leave the ops team alone.


As someone building websites using ruby & dealing with Japanese, I have to agree..

    incompatible character encodings: UTF-8 and ASCII-8BIT 
gets pretty tiresome! Using 1.9.2 at least you have .force_encoding ... feels pretty wrong but sometimes it's just needed to get the job done..


Ruby's character encoding and Unicode support is pretty strong. I'm intrigued how you think it's half-ass, partial or just plain wrong (Really! If there's something really borked with it, it's in my interest to know :-)). Every string has full encoding support and it's baked right in to the language.


tchrist's OSCON Unicode talks:

http://98.245.80.27/tcpc/OSCON2011/index.html

Specifically the third talk:

http://98.245.80.27/tcpc/OSCON2011/gbu.html http://98.245.80.27/tcpc/OSCON2011/gbu.pdf

Excerpts:

  Its String functions like upcase or capitalize won’t even look at
  anything but ASCII. 

  It’s completely missing a whole lot of critical Unicode
  functionality:

    casemapping & -folding
    grapheme support
    normalization
    collation
    text segmentation, &c &c &c. 


  Every Ruby string carries around its encoding, instead of sanely
  unifying into Unicode internally like nearly everything else does.
Also:

  > baked right in to the language
is not synonymous with "intelligently implemented"

Note that I wasn't implying that "half-ass," "partial," and "just plain wrong" necessarily all apply to Python and/or Ruby's implementations. Some may apply to some areas while others may not, and really this extends outside of just Python and Ruby, but I'm trying to stay in context here.


This is interesting stuff - thanks for sharing, I'll be checking it out. The upcase/downcase stuff definitely checks out so far :-)


require 'net/http'

Net::HTTP.get_print 'www.gun.io', '/'

That's it! In only two lines of ruby, you can grab a whole webpage and print it to the screen. Awesome! - and it's in the standard lib ;)

I agree, when I left PHPland I chose Ruby over Python because the (web) community was larger, there were just more books, blogs, screencasts, etc for Ruby/Rails.

There seem to be more ideas generated out of the Ruby community. I'd add Sinatra and HAML to the list above; like Rails, they've both been ported to other languages.

Of course, Heroku was Ruby first and it's so good that my PHP/Java/Python friends jumped for joy when their language became supported.


Arguably, in PHP

    echo file_get_contents('http://www.gun.io')
you lose even more of the line noise of the other languages.

For a language that claims to be "batteries included" it amazes me how many Python tutorials begin with "go get and install this or that package..." How much of the Python standard library is no longer the "Pythonic" way of doing standard things? Why aren't APIs with more refined design, like Requests, pushed back into the standard library, and made the real one way to do it?

Well, OK, you'd have the problem of redundant libraries starting to stack up--it's already looking tiresome. For instance, Requests would have to be urllib3, since there is already a urllib2 and urllib. This article wants me to install simplejson, but there's already a Python json module: oh I see, it was merged in from simplejson, and now they are developed in parallel, so using simplejson may still be better, wtf? Why is there an htmllib in the standard library if I'm always going to use lxml instead? Let's not even discuss eyesores like the heritage of the subprocess and threading libraries...


To all your questions:

Because it takes time. It takes time to add things to the stdlib, especially if they are replacing "an old reliable." In the case of Requests, it is just a wrapper on the stdlib. There is nothing in Requests you can't do in the stdlib, ditto pretty much everything.
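For instance, the GET that Requests makes a one-liner is only a couple of lines with the stdlib (a rough sketch; the custom header is just an example):

    import urllib2

    req = urllib2.Request('http://gun.io/', headers={'User-Agent': 'my-script/0.1'})
    print urllib2.urlopen(req).read()

Requests mostly saves you boilerplate and hands you back a nicer object.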

As far as lxml goes, there are a ton of XML parsing/query libraries out there. In lxml's case it is built on libxml2. I believe there is another library that is also built on libxml2, however lxml has better syntax for the most part. Should it be in the standard library? Probably not; it depends on a pretty heavyweight C library and most people don't really need it. When you do, you use it.

I have no idea what is going on with the simplejson business. I just use the json lib.

I don't see anything wrong with subprocess or the threading libraries. They do what they need to do. I use subprocess quite often, it works fine. It is way better than the actual legacy version (in the os library).


It seems performance issues with the stdlib's json version are not in the past: http://news.ycombinator.com/item?id=3128433


Good to know. When I said I don't know what is going on, I meant why the simplejson people are not focusing on the stdlib version. It seems strange to basically have a fork...


> arguably, in PHP

> echo file_get_contents('http://www.gun.io')

I thought it had to be "too good..." - the links are broken. And since the src links are broken too, images do not display.


Of course, these are all examples of piping the body of an HTTP GET response. There is no parsing going on. You'd need to parse the HTML to translate the links, if this is being served from your own webserver. If you're trying to mirror actual content, wget -r might be a better tool (and can translate the URLs).


What? Both the Python and PHP samples output the same thing (bar one line) - images don't show because the pages contain relative paths (which only work if you have the images stored locally).

    (development)ross@debian:~/hntest$ python download.py > download.py.html
    (development)ross@debian:~/hntest$ php download.php > download.php.html
    (development)ross@debian:~/hntest$ diff download.py.html download.php.html
    242c242
    <             <div style='display:none'><input type='hidden' name='csrfmiddlewaretoken' value='b0c35970dfd374f2b138ed89a4f83a76' /></div>
    ---
    >             <div style='display:none'><input type='hidden' name='csrfmiddlewaretoken' value='018fe81570d710d88ca3f46d1db4c8b7' /></div>
    303d302
    <


urllib is in Python's standard library:

  import urllib
  the_page = urllib.urlopen("http://www.gun.io/")
  print ''.join( the_page.readlines() )
I could jam it all into one line, but in Python we prefer readable code ;) ;)

The Python community also tends to be quieter than those raving Ruby fanbois - if you'd said appeared to be larger, then I would have agreed with you...


Why don't you just use:

  print the_page.read()


Pythonistas should look at the "Requests"[1] lib for Python mentioned here on HN[2] a few days ago, too.

[1]: http://docs.python-requests.org/en/latest/

[2]: http://news.ycombinator.com/item?id=3094695


Explicit is better than implicit. I think keeping a bit of the verbosity helps people know what's going on. That knowledge is infinitely valuable when debugging certain problems or creating solutions that work well.


Have you heard of something called Java? ;)


A good language, as a good anything, is a reasonable language. Giving extremes as arguments against sensible middle positions is the way of madness, paralysis, or extremism.


That's what the post linked to talks about.


Sure, but good to repeat it in this case - Python's urllib/urllib2/etc. libraries are famously hard(er than necessary) to use.


I wouldn't count the need for a good JVM implementation as a plus mark in a Ruby vs Python comparison. Jython is used when you need to interact with Java libs, not because the C implementation of the Python language is slower than the JVM one. The same does not apply to Ruby: you use JRuby because it's better than CRuby. If you want to compare speed you'd be up against PyPy.


Your statement sounds like sour grapes to me.

When a language has multiple viable implementations, it means the language has a good specification. It also means that it doesn't depend too much on platform-specific characteristics. It means it is more portable. Python is both blessed and cursed with a couple of problems ... its behavior is sometimes related to the CPython implementation (e.g. reference counting, __del__), and also some libraries are too big and important to live without them.

One such library is NumPy. Currently you cannot talk about an alternative Python implementation if you don't have NumPy running on it, and that's a fact.

I'm a Python developer in my day job and I've never used PyPy for anything. I only played with it and became frustrated that libraries I relied upon didn't work on it.

Btw, if you're a fan of PyPy, check out http://rubini.us/


Web developer here; I live without NumPy all the time. :)


Flask is also awesome for python-based web development. I found it very easy to jump in as the docs are great: http://flask.pocoo.org/

Here's the post's Django example written using Flask: https://gist.github.com/1296926


This is really, really cool. Thanks for making this.

I really dig the Flask project (love the website and the docs, and everybody who uses it raves about it) and I hope to play with it more in the future, but I don't think I'd recommend it to somebody who is a first timer, largely because of one issue: Data. Flask leaves you to sort it out on your own, which is great if you're capable of that, but Django holds your hand, which is more appropriate for a beginner.


Data? Can you explain that? Not sure what you mean.


I'm pretty sure he means that there's no database support built in, i.e. you need to go download something like SQLAlchemy, SQLObject, Storm, etc. If you're already going to do that anyway it's not so much of an issue, but for people who are just getting their feet wet, it's yet another obstacle to overcome.


Ah, of course.

And it makes sense -- I think Flask is a much easier introduction to config and the VC of MVC development... but once you get to the M, well... SQLAlchemy is great, but the learning curve is steeper. And even though Flask has a well-documented extension for that, it's still another package and a separate piece of documentation.
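To be fair, once you do pull the extension in, the model definitions stay pretty small. A sketch from memory (the import path and model are illustrative and vary between versions):

    from flask import Flask
    from flask_sqlalchemy import SQLAlchemy  # the Flask extension wrapping SQLAlchemy

    app = Flask(__name__)
    app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:////tmp/test.db'
    db = SQLAlchemy(app)

    class User(db.Model):
        id = db.Column(db.Integer, primary_key=True)
        name = db.Column(db.String(80))

But the point stands: it's still an extra dependency and a separate set of docs to read.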


I think Mazzo is referring to a lack of Django's ORM.


As a first-timer, I really appreciated having Flask around. It might not have a database layer, but it did have really excellent documentation. I felt like I had a much firmer understanding of what was happening with Flask than I did with Django.


Nice. It does seem like one of the lighter-weight Python web frameworks might fit in better with the overall theme of this intro, and Flask is a great choice.

I thought about doing the same thing as I read through it.


I wrote this for a friend who is just getting started out in python/web stuff. It might be a little n00bly for lots of people here, but it could be interesting for people who aren't pythonic who are thinking about making the switch!


Great that you promote requests, lxml, json (just json, not simplejson since Python 2.6), and Django for use by a beginner in Python. Pages from http://wiki.python.org/moin/WebProgramming might provide too much choice for a beginner.

You might add a tiny example of a microframework such as Bottle, Flask or an example from Pyramid written in the microframework-ish style.

Recommending `sudo` with `pip install` to a novice is not a good idea. The system package manager or `pip install --user` could be used as alternatives. `virtualenv` might be off-topic for a short tutorial.


About JSON: Python 2.6 added a "json" module to the standard library that was forked from simplejson, but since then simplejson has improved; the interface is exactly the same as the built-in json module, but simplejson tends to be quite a bit faster.

On my informal benchmark of JSON parsing speed, simplejson was 27x faster than json.

The idiom I've been using is:

    try:
      import simplejson as json
    except ImportError:
      import json


Documentation for simplejson mentions that it should be faster, though it references Python 2.5 and 2.6, so it is unclear whether the claim is outdated:

  It is the externally maintained version of the json library contained in Python 2.6, but maintains compatibility with Python 2.5 and (currently) has significant performance advantages, even without using the optional C extension for speedups.

http://simplejson.readthedocs.org/en/latest/index.html

I see there are performance patches to stdlib's json since 2.6: http://hg.python.org/cpython/search/?rev=_json.c&revcoun...

An order of magnitude difference might be an issue with your python installation.

On my machine the version from 2.6 (1.9) is 15 times slower than the one from 2.7 (2.0.9). simplejson (2.2.1) is still faster than both, 40 and 2 times respectively, for .loads() of timeline.js from the example, as measured by:

  % timeit json.loads(timeline_text)
it seems that C speedups are available in both versions:

  python -v -c'import json' 2>&1 | grep _json.so
  python -c'import json.decoder as d; print d.scanstring'
Is your Python version old? Otherwise it is worth opening issues at http://bugs.python.org to merge the speed improvements, and at https://github.com/simplejson/simplejson/issues to mention in the docs and on http://pypi.python.org/pypi/simplejson more clearly that the Python 2.7+ stdlib version is also slower.


I'm stuck on 2.6 for now, so that's probably the main issue.


Thank you for using a large font. Makes it much more readable.


Nicely done, very clear and concise.

A question: is lxml what we would use today instead of Beautiful Soup?


lxml is definitely faster, but I've found BSoup to be more forgiving with poorly formatted DOMs


BeautifulSoup is poorly maintained — you have to be very specific with which version you're using.

Note: lxml has a number of repair modes that allow it to parse virtually anything. CPU cycles and memory go up quite a bit when they're activated, but it's still better than BeautifulSoup.


Thankfully lxml has a slower-but-more-forgiving mode that you can use when interacting with poorly formatted HTML, which takes advantage of BeautifulSoup http://lxml.de/elementsoup.html
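Roughly like this (just a sketch, assuming BeautifulSoup is installed; the markup is made up):

    from lxml.html import soupparser

    broken = "<div id='content'><p>unclosed <b>tags"
    root = soupparser.fromstring(broken)
    print [el.tag for el in root.iter()]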


I’ve found the exact opposite. BSoup will choke on invalid tags in the DOM, such as: <div id="content"><content>…</content></div>

If I try to return the innerHTML of #content, I get '<div id="content"><content>' as a string, nothing else.

While I know that’s inexcusable markup, it’s nothing I have control over.

lxml (if it builds on the target system) has been much better for my scripts.


lxml's default parsers are good for xml, atom and xhtml; for html5 and html tag soup, lxml.html.html5parser (which depends on html5lib) is the way to go. For feed tag soup, feedparser still uses BeautifulSoup internally.


I'm a big fan of pyquery[1] - it uses lxml under the hood, but is much nicer to interact with, especially if you're familiar with jQuery already.

[1] http://packages.python.org/pyquery/api.html
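Something like this, for example (just a sketch; the URL and selectors are only for illustration):

    from pyquery import PyQuery as pq

    d = pq(url='http://gun.io/')  # fetches and parses the page (lxml underneath)
    print d('title').text()
    print [a.attrib.get('href') for a in d('a')]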


Very nice article and thanks for the intro to Requests. Though, I'm not sure we would agree about Django being the best thing to learn in the long run.

I undertook to move to Python for web development about 8 months ago and embarked on a thorough quest for the one true framework. Obviously Django was the first and most recurrent recommendation. Having briefly toyed with it in the past, I could really see myself committing to it for the long run. The various raving reviews also made the choice seem all the simpler. I was all ready to give it my seal of approval when I encountered my first complaint, which was so compelling that it raised an eyebrow.

It was an experienced Python programmer explaining that as time went on and he progressed with Django, he found himself increasingly swapping parts of it out for external libraries: SQLAlchemy, WTForms, Jinja2. In the end he had only the routing module and the admin left, which wasn't that big a deal for him. He was asking what the point was of using a full-stack framework not necessarily designed with interchangeability in mind, if you end up just using it like a glue mini-framework.

As I dug deeper, I found more similar complaints, all from similarly experienced developers, who all ended up adopting something else with a light, pluggable base approach. I heard of repoze.bfg, Pyramid, Werkzeug and a slew of other ones that allow you to get down and dirty fast, while still allowing you to get big in the long run.

Just like you, I was recently asked by a friend wanting to get into web application development to recommend a platform to work from. I also did point to Django, but explained that it wasn't because it's necessarily the best, but rather because it's the gentler introduction. It comes with batteries, crutches, a first aid kit and a nice box of goodies, perfect for someone who has no clue what they're doing.

Note that I'm not dissing Django or relegating it to being an amateurish framework. I agonized over my decision and still sometimes experience some Django envy (FYI I adopted Flask and don't regret it one bit). Nonetheless, it's hard to deny that it does a particularly good job of introducing newbies to good concepts fast, while at the same time being notorious for getting in the way of more experienced developers more than some other frameworks do.


The issue that I have with that attitude (as an experienced Python and Django developer) is that there's a lot more to it than just "swapping out Django's DB and replacing it with SQLAlchemy". You miss out on most of the infrastructure around Django's database stuff that makes it so easy and worthwhile to use.

For example, fixtures. In Django, you can include them in your tests and have it All Just Work(TM):

https://docs.djangoproject.com/en/dev/topics/testing/#fixtur...
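i.e. something along these lines (a minimal sketch; the app, fixture file and count are hypothetical):

    from django.test import TestCase
    from polls.models import Poll  # hypothetical app

    class PollTests(TestCase):
        fixtures = ['polls_testdata.json']  # loaded into the test DB before each test

        def test_fixture_loaded(self):
            self.assertEqual(Poll.objects.count(), 3)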

You can also use them outside of testing, too:

https://docs.djangoproject.com/en/dev/howto/initial-data/#pr...

Fixtures in Flask/SQLAlchemy? Not so much - you have to roll your own (crappy) implementation:

http://farmdev.com/projects/fixture/using-fixture-with-pylon...

http://flask.pocoo.org/mailinglist/archive/2011/3/6/flask-sq...

Also note that, while I'm picking on fixtures here, Django also has a bunch of other database related features, like introspecting a pre-existing database, and generating a bunch of Model classes from it. Combine that with South and you're 90% of the way there when migrating the data from a legacy system.

This has turned into a bit of a rant, but I've seen a lot of half-baked reimplementations in "pluggable" architectures of stuff which Django just gets right.


It's not a matter of attitude, it's about having tools, evaluating them for what they're doing for you and making a choice. There's always a conscious trade-off.

You can't focus on what's good about Django, as if someone picking SQLAlchemy or Werkzeug is a fool at a loss. The same goes for the other libraries that I listed.

In my own case, I'm not particularly a fan of opinionated frameworks. I've had my share of grief with them. Django looked nice, but I could easily identify with the pains those developers went through when dealing with its lack of flexibility in certain corners. I was new to Python, not to web development; I did not need the hand holding, no matter how nice and clever the code was.

This was not a one evening process, I read blog posts, forums, perused StackOverflow and even HN. It took weeks. Feel free to take the journey, then tell me it didn't give you pause.

Flask is a small framework that is far from pretending to be what Django is. It's not even at version 1, but even with its flaws, I'm quite happy with what it's allowing me to do.

I don't mind you ranting about my post, just don't assume that I would use Django the way you do. Stuff that you relish might be what turns me off and the beauty of it all is that it's all acceptable.


I really don't understand the "lack of flexibility" thing myself. I've made Django and Django's ORM and admin do all sorts of weird stuff, and it's much easier to build on top of that than to reroll everything, which is what you have to do with Pylons/Flask/Bottle/...

And database/fixture set up is something that you will have to do, assuming that you test your app, and that your app is more sophisticated than "I have some strings which I want to upper case". Ditto for working with legacy databases, migrating data, bla bla blah.

The default option from what I've seen tends to be to set your DB up once and hope for the best, or else repopulate it after every test. Both of these options work great(ish) when you first start your project, but then grind you down six months later when your test suite takes an hour to run. A decent ORM, along with in-memory SQLite and fixture setup is generally what's settled on for most integration suites that I've seen, and Django does that out of the box.
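(The in-memory SQLite part is just a settings tweak in Django; a sketch:)

    # settings.py (sketch): run the test suite against in-memory SQLite
    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.sqlite3',
            'NAME': ':memory:',
        }
    }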

Also - a minor nit, but WTForms and Jinja2 are based on Django's form and template libraries. Switching to them doesn't get you that much - they certainly don't replace any of Django's core infrastructure. And I've found SQLAlchemy to be basically unusable unless you use the new 'declarative base' stuff, which looks suspiciously similar to... Django's ORM :)


> And I've found SQLAlchemy to be basically unusable unless you use the new 'declarative base' stuff, which looks suspiciously similar to... Django's ORM :)

I can relieve your suspicions as I never looked at Django's ORM at all when designing declarative. It's merely poking normal SQLAlchemy attributes, all of which existed before Django was ever released, onto a class. I can assure you active-record style class mapping is not an idea Django invented.
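For anyone following along, declarative mapping looks roughly like this (a sketch; the table and columns are made up):

    from sqlalchemy import Column, Integer, String
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class User(Base):
        __tablename__ = 'users'
        id = Column(Integer, primary_key=True)
        name = Column(String(50))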


The same design pressure (to make the simple stuff easier) is at work in both cases. Much like, say, Bottle and Flask looking very similar.

The "basically unusable" is probably a little harsh, but I've never really understood the motivation behind some of SQLAlchemy's design decisions. eg. separating one class/table definition into interacting table, model and mapping classes makes no sense at least 99% of the time - that seems like something which should be hidden internally (but accessible if you really need it).


Well, if you read our docs you'll see that they're entirely in agreement with that. Separate mapping/table design is called "classical mapping" and was superseded years ago by the declarative style. Update your SQLAlchemy knowledge before commenting on it.

Interestingly enough, a ton of users still prefer the mappers/tables to be separate.


Your comment, in particular the part about SQLAlchemy, reads a lot like the blub programmer.


Ad-hominem much?


Django is where the jobs are. If you want to profit from your skills, the smart bet is in learning technologies that are in demand.


I'd say the smart bet is in learning technologies that are relevant and to be comfortable with your tools. I don't recall the last time a client asked/cared what technology stack we're using. They usually just need stuff built.


Note: In Python 2.6+, you can just `import json` instead of needing to install and import simplejson.


I believe that the Python 2.6 json module (basically forked from simplejson) is much slower than the current simplejson version. The speedups (simplejson rewrite) didn't get included until 2.7, as I recall (my memory is vague on this point).


According to my un-reproducible benchmark, simplejson was 27x faster than built-in json. It's not a small difference.


Yeah, I remembered it being significant, but wow. That is larger than I recalled it being. Probably due to a very unscientific test at the time. ;)


This is an article professing love for a specific library in a given language. Move along, nothing to see here.


First, why is the author recommending SimpleJSON? Just use json, it's built into the standard lib! (Also if I'm not mistaken the json implementation included in Python may even be based on simplejson, the APIs are very similar even if not.)

Second, I disagree that Django is the best web framework. It might be the best for some projects, but it depends on what you're doing. I've come to prefer Flask for its simplicity and the way it overall feels more Pythonic.

That said, requests cannot be recommended enough! It is an awesome package that should not be missed if you're doing web programming in Python.


requests comes close to Perl's LWP::UserAgent in terms of usability, but LWP::UserAgent has been around for years. I don't know how that makes Python the 'best language for interacting with the web.'


And also LWP::UserAgent has two younger whipper snappers biting at its heels!

* HTTP::Tiny - lightweight useragent that comes with 5.14

  use HTTP::Tiny;
  print HTTP::Tiny->new->get('http://gun.io')->{content};
* And the all singing and dancing Mojo::UserAgent

  use Mojo::UserAgent;
  print Mojo::UserAgent->new
          ->get('https://github.com/timeline.json')
          ->res->json->{repository}->{name};


Simplejson is completely API-compatible with standard library json, but an order of magnitude faster.


  import requests

  r = requests.get('http://gun.io')
  print r.content
That's it! In only three lines of python, you can grab a whole webpage and print it to the screen. Awesome!

Well, that's a poor selling point that unduly diminishes the credibility of the article. Your audience is likely to know snappy alternatives such as

  curl gun.io


Or even just urllib?

     import urllib

     foo = urllib.urlopen("http://gun.io").read()
     print foo
I mean...maybe urllib (or urllib2) are old and unhip at this point, but they're heavily documented.

I use urllib all over my projects and have never encountered any problems.


In fact, it's even better than what the author said:

  import requests
  print requests.get('http://gun.io').content
Only two lines!


Well...urllib is the same

    import urllib
    print urllib.urlopen('http://gun.io').read()
And to be clear, you'd really never want to just print the content like that. You'd be doing something like:

     import urllib
     import simplejson

     req = urllib.urlopen("https://graph.facebook.com/some_id")
     response = req.read()

     json_data = simplejson.loads(response)
(Python is syntactically pretty, imho)


    import requests; print requests.get('http://gun.io').content
my god, python is cloud scale.


In one statement:

    print __import__('requests').get('http://gun.io').content


+1


I was a little surprised that lxml is used in this example over Beautiful Soup. Any reason?


It's much faster than Beautiful Soup, better maintained, and does an excellent job of handling crazy broken HTML.


Why on earth didn't I hear about Requests before? How does it compare against urllib2?


Where have you been indeed? It's been posted on HN at least twice and made page 1 both times:

http://news.ycombinator.com/item?id=2882301

http://news.ycombinator.com/item?id=3094695


Requests is really good. urllib2 is overly complicated, with a loathsome API. The world would be better off if more people used Requests.


In my experience I have to reference the local path to a python file when using manage.py

`python manage.py runserver` becomes `python ./manage.py runserver`

But your mileage may vary.


Are you doing anything funky with the file or the file system? Windows, Linux or Mac OS? There should be no need to do what you are saying.

Perhaps you are trying to run "./manage.py runserver" without calling python? In this case you'd be right that you need to have "./" as part of the command call, but you still would have to add the shebang to manage.py and make it executable.


I'm not sure why the Python standard library is being avoided.

Why import an external json library, when there's one built in?


But how do I get Python in my browser? Python on the back and front end is my dream.



I don't understand - you can generate HTML easily with Python. Can you elaborate?


I guess he means as a replacement for JavaScript.


Also, you have to write "from lxml import html" to import the HTML parser.
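i.e. something like (the URL is only an example):

    import requests
    from lxml import html

    doc = html.fromstring(requests.get('http://gun.io/').content)
    print doc.xpath('//a/@href')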


Noted!


The whole article is a bit heavy on the keyword stuffing, isn't it?


To parse HTML and XML, BeautifulSoup is awesome.


great post, I enjoyed reading it and following along!!!


You got downvoted for a simple compliment? Many people here are really smart... but many of them are not nice at all.



