Python URL manipulation made simple. (github.com/gruns)
96 points by grun on Dec 12, 2011 | 23 comments



Neat! One problem I can see is that query strings aren't dictionaries, so valid parameters can vanish:

    >>> furl('http://www.example.com?foo=1&foo=2')
    furl('http://www.example.com?foo=1')
See Werkzeug's MultiDict for one way to handle this correctly.
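
For reference, a minimal sketch of the MultiDict approach (werkzeug.urls.url_decode and MultiDict are real Werkzeug names; this is the 0.8-era, Python 2 API):

    >>> from werkzeug.urls import url_decode
    >>> d = url_decode('foo=1&foo=2')  # returns a MultiDict
    >>> d['foo']                       # dict-style access: first value only
    u'1'
    >>> d.getlist('foo')               # ...but nothing is lost
    [u'1', u'2']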


Representing query parameters as a dictionary was an ease-of-use tradeoff against the rarely utilized flexibility of repeated query parameters.

  https://github.com/gruns/furl/blob/master/furl.py#L142
Werkzeug's MultiDict looks like a good combination for the best of both worlds, ease of use and flexibility. Thanks for the reference.
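
For comparison, the stdlib's parse_qs errs the other way: it preserves every value, at the cost of wrapping even single values in lists (Python 2's urlparse module shown, matching the thread):

    >>> import urlparse
    >>> urlparse.parse_qs('foo=1&foo=2')
    {'foo': ['1', '2']}
    >>> urlparse.parse_qs('foo=1')
    {'foo': ['1']}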


Repeated query parameters aren't "rarely utilized" -- look at how any sort of checkbox or other multi-select works.


I have to second the MultiDict suggestion -- I was very happy to see Werkzeug had gone that route when I had to throw some Python in front of Solr.


Interesting - would it make sense for this to be included in Requests [http://docs.python-requests.org/en/latest/index.html] in some way?


I've been contemplating the same thing, actually. I'm not sure if this exact library belongs (maybe), but some of the functionality could definitely be useful.


Most of these tools don't like IRIs like http://müller.de/ or the most famous, http://☃.net (http://xn--n3h.net/), and that's a shame! The same goes for many URL shorteners…

For example, Django gets it wrong (even though I tried: https://code.djangoproject.com/ticket/11522):

    >>> print(django.VERSION)
    (1, 3, 1, 'final', 0)
    >>> print(django.http.HttpResponseRedirect(u"http://müller.de"))
    Content-Type: text/html; charset=utf-8
    Location: http://m%C3%BCller.de
Whereas werkzeug gets it right:

    >>> print(werkzeug.__version__)
    0.8.1
    >>> werkzeug.redirect(u"http://müller.de").headers
    Headers([('Content-Type', 'text/html; charset=utf-8'),
             ('Content-Length', '247'),
             ('Location', 'http://xn--mller-kva.de')])
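
For what it's worth, the stdlib can already do the host conversion via the built-in idna codec (Python 2 shown, matching the thread); the hard part these tools get wrong is applying it to only the host component of a full IRI:

    >>> u"müller.de".encode("idna")
    'xn--mller-kva.de'
    >>> "xn--mller-kva.de".decode("idna")
    u'm\xfcller.de'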


Similar project: https://github.com/zacharyvoase/urlobject

I built a template library for Django on top of it: https://github.com/j4mie/django-spurl


Looks awesome! I don't have a use for it at the moment, but I'll keep it in mind for future projects.

Edit: Does anyone have more information on the license? I do plan to build amazing things, but it would be nice to know under what conditions I can release those amazing things to the public. I assume it means no restrictions, but I'd hate to go stepping on someone's toes.


No restrictions. Use however you deem fit.


You should include an explicit license on GitHub if you want others to use it.


Can you share some pointers?

E.g. a page giving an overview of BSD, GPLv2, GPLv3, LGPL, and the Creative Commons bunch (no idea if those are usable for source code). Basically I'm wondering if there's an "Intro to licensing your crap on GitHub" post somewhere on the net...


Woo! The best kind of license!


The best kind of license would be something like that, but in an attorney-referenceable print copy.


Interesting, but, to be fair, I don't find urlparse or urllib to be all that frustrating or tedious to use.


Yeah, this just seems like rearranging the tokens to achieve the same result. Though the same could be said about path.py, which I absolutely love using.
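
(For context, a rough sketch of the token-rearranging in question, assuming the classic path.py API, where path('/tmp').files('*.py') replaces an os.listdir/fnmatch dance:)

    >>> import os, fnmatch
    >>> [f for f in os.listdir('/tmp') if fnmatch.fnmatch(f, '*.py')]  # stdlib spelling
    >>> from path import path
    >>> path('/tmp').files('*.py')     # path.py spelling; returns full paths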


I wrote a similar Python library (named yuri) a couple of months back: https://github.com/jparise/yuri

I was motivated to write it in response to one of Quora's Programming Challenges (http://www.quora.com/about/challenges). It seemed like a fun problem, and I had never had a good reason to read through the applicable RFCs. I wonder if furl was similarly motivated?

I don't find the standard library modules very difficult to use, however, so I haven't spent much more time on it since then.


This looks like a nice library. So is python-requests. Every time I see this sort of library, though, I can't help but wonder: why are the standard Python libraries so bad that people need to create these helpers?


They aren't "so bad"; you can't expect the standard library to meet everyone's requirements. Most of what this library achieves can be done with urlparse; it just adds some "nice to haves". Also, see the comments above about not using a MultiDict as a good indicator of why this problem isn't solved once and for all in the stdlib.
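
To illustrate, here's a rough sketch of the stdlib dance that furl hides (Python 2 names; the final parameter order may vary because parse_qs returns a plain dict):

    >>> import urlparse, urllib
    >>> parts = urlparse.urlparse('http://www.example.com/path?foo=1&foo=2')
    >>> query = urlparse.parse_qs(parts.query)   # {'foo': ['1', '2']}
    >>> query['bar'] = 'baz'
    >>> urlparse.urlunparse(parts._replace(query=urllib.urlencode(query, doseq=True)))
    'http://www.example.com/path?foo=1&foo=2&bar=baz'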


Much of the stdlib was written before niceties like keyword params and generators existed; other parts just don't have as nice a design as they could. Hence the helpers.


I attempted something similar, but got caught up trying to make it span all URIs and to create an RFC-compliant implementation:

  https://github.com/bsandrow/urilib


I love anything that adds developer productivity and reduces developer pain, so I definitely love this. Here's hoping it catches on within the Python community.


Very interesting. I ran into urlparse's shortcomings while prototyping a web crawler just last week! I created something similar to identify the domain, subdomain, directories, pages, and FQDN. Note that if you run socket.getfqdn against most cloud servers, you get some weird string with IP numbers, e.g. 182.43.210.102.static.cloud-server.com.
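
(For reference, socket.getfqdn is the stdlib call in question; the outputs below are hypothetical and depend entirely on the host's DNS PTR records:)

    >>> import socket
    >>> socket.getfqdn()               # the local machine's FQDN
    'myhost.example.com'
    >>> socket.getfqdn('203.0.113.7')  # reverse lookup of a remote address
    '203-0-113-7.static.cloud-server.example'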

Let's merge some code!



