Improving your (Python) code with modern idioms (python3porting.com)
160 points by rbanffy on May 8, 2012 | 48 comments



Why change the excellent printf-style string formatting syntax to the vastly inferior slf4j-style? If I want simple concatenation, I use concatenation. If I want really complex formatting, I write explicit code to do it. Printf style hits a sweet spot: it formats common datatypes to common presentation formats. Python 3 adds a formatting mini-language which looks more convoluted for simple tasks and unreadable for complex ones. Why change the syntax?


I'm in the same boat. It's an old joke that Java tried to kill off printf style formatting but had to bring it back because it's a really great idea:

1980: C

printf("%10.2f", x);

1988: C++

cout << setw(10) << setprecision(2) << showpoint << x;

1996: Java

java.text.NumberFormat formatter = java.text.NumberFormat.getNumberInstance();
formatter.setMinimumFractionDigits(2);
formatter.setMaximumFractionDigits(2);
String s = formatter.format(x);
for (int i = s.length(); i < 10; i++) System.out.print(' ');
System.out.print(s);

2004: Java

System.out.printf("%10.2f", x);

2008: Scala and Groovy

printf("%10.2f", x)

(from http://www.horstmann.com)

To complete the circle I'd add:

2009: Go

fmt.Printf("%10.2f", x)


I wouldn't argue against using a formatting string in general, but the printf style order-based formatting strings are really bad once you need to localize your app and have to support languages with different word order. In this sense, .NET or Python3-like formatting strings with position-based formatting is orders of magnitude better and I was surprised that Sun decided to use %* instead.

Take a simple example along the lines of "I've seen {0} {1} {2}".format("John", "eat", "an apple") ...and try localizing this message into e.g. German.

I have a feeling that printf-style is one of the reasons why the texts in localized versions of some programs are as bad as they are.
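
A minimal sketch of the word-order point, assuming a hypothetical two-language catalogue (the `catalogue` dict and the German wording are illustrative only, not taken from any real translation file). Because the placeholders are indexed, a translator can reorder them while the calling code keeps passing arguments in the same order:

```python
# Hypothetical message catalogue: same argument order, different word order.
catalogue = {
    "en": "I've seen {0} {1} {2}",
    "de": "Ich habe {0} {2} {1} sehen",  # object before verb in German
}

# The arguments themselves would also come from a translation lookup;
# they are hard-coded here to keep the sketch self-contained.
english = catalogue["en"].format("John", "eat", "an apple")
german = catalogue["de"].format("John", "essen", "einen Apfel")

print(english)
print(german)
```

With plain `%s %s %s` the German template could not swap the last two values without a change in the calling code.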


glibc sprintf supports that as something like %2$s %1$s which Python could potentially support.

I don't buy the internationalization argument however. You have to explicitly select strings for i18n anyway and run them through something that retrieves the data from a catalogue based on the current language. It's not like you will override str.__mod__ to start i18n.

And if you want to do it right, there are more complex rules that need a special formatting language. For example if you want to display {count} error/errors, then the text of error/errors will vary based on language not just based on count 0/1/more but even more wildly based on language. E.g. some have a special noun declension for 2. For example in Polish, that might be when there are 5 errors: http://blogs.transparent.com/polish/cardinal-numbers/
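
To make the plural-form point concrete, here is a rough sketch of the three-way rule that gettext catalogues conventionally use for Polish. The names `polish_plural_index`, `FORMS` and `describe` are made up for illustration; a real application would normally leave this to gettext's plural handling rather than hand-roll it:

```python
# Forms of "error" in Polish: singular, "few" (2-4), "many" (everything else).
FORMS = ("błąd", "błędy", "błędów")


def polish_plural_index(n):
    """Pick the plural form index for a non-negative integer n."""
    if n == 1:
        return 0
    # 2-4, 22-24, 32-34, ... but not 12-14
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return 1
    return 2


def describe(n):
    return "{count} {noun}".format(count=n, noun=FORMS[polish_plural_index(n)])


for n in (1, 2, 5, 12, 22):
    print(describe(n))
```

No format-string syntax, old or new, expresses this rule on its own, which is the argument being made above.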


Python supports `"%(name)s"`, which means you're not dependent on order at all -- in my opinion, much nicer than indexing into the parameter list:

  >>> "%(name)s has %(count)d %(thing)s" % {'thing': 'bananas', 'count': 10, 'name': "Phil"}
  'Phil has 10 bananas'


The new-style formatting allows that as well:

    >>> "{name} has {count} {thing}".format(
          thing='bananas', count=10, name='Phil')
    'Phil has 10 bananas'
It also allows for more complex expressions inside of format specifiers, so if you need to grab data out of an object, you can do something like:

    >>> from collections import namedtuple
    >>> Sentence = namedtuple('Sentence', ['name', 'thing', 'count'])
    >>> s = Sentence(name='Phil', thing='bananas', count=10)
    >>> '{dat[0]} has {dat.count} {dat.thing}'.format(dat=s)
    'Phil has 10 bananas'
which is verbose and contrived here, but could be immensely useful in cases like

    >>> '... {numeral[5]} {item.plural}'.format(
          numeral=fr_numerals, item=animal)
    '... cinq animaux'
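
For anyone who wants to run that last example: `fr_numerals` and `animal` were left undefined above, so the definitions below are assumptions made up to fit it (a dict keyed by integer and a namedtuple called `Noun`):

```python
from collections import namedtuple

# Hypothetical data structures, invented to make the example runnable.
Noun = namedtuple("Noun", ["singular", "plural"])
fr_numerals = {1: "un", 5: "cinq"}
animal = Noun(singular="animal", plural="animaux")

# Inside a replacement field, [5] looks up key 5 and .plural reads an attribute.
phrase = "{numeral[5]} {item.plural}".format(numeral=fr_numerals, item=animal)
print(phrase)
```

Only indexing and attribute access are allowed inside the braces, not arbitrary expressions.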


How do you handle it when you need to convert, e.g., "American woman" to "femme américaine"?


You use something else. It's still just a format string, not a whole localization solution. Named arguments means it might be easier to hack together something like

    templates = {'en': '{adj} {noun}', 'fr': '{noun} {adj}'}
    print(templates['fr'].format(noun='fromage', adj='délicieux'))
but that's still a hack, and doesn't even come close to addressing cases like Chinese's "{adj} {counter_word[noun]} {noun}" or gender concord or any of the myriad other things you come across in practice.

Edit: used 'positional' instead of 'named'.


I found that I often (~1 time in 10) make a mistake with that syntax, and omit the trailing 's'. I think it's because I'm not used to having anything after a closing ')'.


Same problem here. Maybe the braces approach using {name} is an improvement upon %(name)s in that respect.


Do you really want to keep the same format strings when localizing an application? Isn't it much cleaner to just define multiple format strings? Translations in general don't work that way, you can't just move around words and translate them individually, usually the whole expression changes.


That's the point. You can't change the format strings to a different word order if the selection of values used to fill the fields is entirely dependent on word order (i.e., how the basic printf syntax works), unless you have a separate line/block of code that executes when the word order of the current locale differs.


I see your point now. Yes, I can see Python 3's syntax facilitating some of that work. I still think you will need branches and/or polymorphism to do more complex localizations, especially when you need to support East Asian languages.


We just have a database of sentences and a translation is a lookup in the DB by some unique identifier. Trying to semi-translate a string just doesn't work very well in the long run (for us).


2012: Python

    "{:10.2f}".format(x)


Yes and no. You still need to print it.


Fine. This is the equivalent of sprintf.
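
A quick check of the sprintf comparison, using an arbitrary value for `x`: both styles return a string rather than printing it, and for this format spec they produce the same result.

```python
x = 3.14159  # arbitrary sample value

new_style = "{:10.2f}".format(x)  # PEP 3101 formatting
old_style = "%10.2f" % x          # printf-style formatting

# Both give a 10-character, right-aligned field with two decimals.
print(repr(new_style))
print(repr(old_style))
print(new_style == old_style)
```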


Mhhh, don't know when: Haskell

  printf "%10.2f" x
(Note the lack of parentheses or commas)


As of Python 3.3.0a3, both printf and PEP 3101 [1] string formatting are still available, despite references to a possible deprecation of the '%' operator in the documentation [2]. I hope this is one of the areas where the CPython devs can turn a blind eye to the whole "there should be preferably only one way to do it" way of thought.

A few other PEP 3101 examples worth mentioning (and their printf equivalents):

    '{1} {0} and {2} {0}'.format('Wiggin', 'Andrew', 'Valentine')
    '%s and %s' % tuple('%s Wiggin' % i for i in ('Andrew', 'Valentine'))
    >>> 'Andrew Wiggin and Valentine Wiggin'

    '{:<{w}} , {:<{w}} ,'.format('one', 'two', w=10)
    '%s , %s ,' % ('one'.ljust(10), 'two'.ljust(10))
    >>> 'one        , two        ,'

    '{:^{w}} , {:^{w}} ,'.format('one', 'two', w=10)
    '%s , %s ,' % ('one'.center(10), 'two'.center(10))
    >>> '   one     ,    two     ,'

    # see object.__format__() [4]
    'today is: {:%a %b %d}'.format(datetime.now())
    'today is: %s' % datetime.now().strftime('%a %b %d') 
    >>> 'today is: Tue May 08'
Imho, they really are quite flexible and allow for more elegant constructs when needed. Calling them 'vastly inferior' is unwarranted, especially since using them is not mandatory in any way.

[1] http://www.python.org/dev/peps/pep-3101/

[2] http://docs.python.org/release/3.0.1/whatsnew/3.0.html#chang...

[3] http://docs.python.org/release/3.1.5/library/string.html#for...

[4] http://docs.python.org/release/3.1.5/reference/datamodel.htm...
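
As a rough illustration of the `__format__` hook referenced in [4], here is a sketch with a made-up `Celsius` class and a made-up one-letter spec `F`. Neither comes from the standard library; the class simply decides for itself what the text after the colon means, much as datetime does with strftime codes:

```python
class Celsius:
    """Hypothetical value type that interprets its own format spec."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, spec):
        # 'F' is an invented spec meaning "show as Fahrenheit".
        if spec == "F":
            return "{:.1f} F".format(self.degrees * 9 / 5 + 32)
        return "{:.1f} C".format(self.degrees)


boiling = Celsius(100)
default_text = "{}".format(boiling)
fahrenheit_text = "{:F}".format(boiling)
print(default_text, "/", fahrenheit_text)
```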


The printf-style is really only deprecated in the sense that most of the core devs would prefer str.format. There's no formal deprecation policy in place to remove the printf-style. The standard library still uses printf-style, the documentation still uses printf-style, and printf-style isn't going away any time soon, if ever.

Many core contributors have sided with printf-style and the community has spoken up a few times about wanting printf-style to stay around. I don't think we'll ever reach a majority siding with removal of it.


Printf format is great, but there's an issue

You're repeating yourself. Why? Because C can't know what types you are passing to it.

So you have to explicitly do '%d' or '%s' to let printf know what this is.

But yes, I think it's easier to type than {0}


In Python's old style you can use %s everywhere, and Python will figure out the types.
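
For instance (the values are arbitrary), `%s` calls `str()` on whatever it is given, so one conversion covers strings, integers, floats and lists:

```python
# %s works for any type because it formats via str().
line = "%s has %s bananas costing %s each: %s" % ("Phil", 10, 2.5, [1, 2])
print(line)
```

The type-specific codes such as `%d` or `%.2f` are only needed when you want type-specific presentation.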


This article contains excellent recommendations except for one (the numbers.py module is nearly useless).


Could you elaborate on that?


Raymond, what's your take on the ``redirect_stdout`` class not inheriting? I thought with new-style classes inheriting from ``object`` was the standard.


Python 3 removes old-style classes and thus removes the need to explicitly inherit from object.

(Also it appears it's just an example to illustrate the use of context managers, not of the perfect Python program. You'd never want to use the code in practice, if only because it relies on changing global state and thus will mess up when multiple threads use it at once)
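
A sketch of both points, assuming a stand-in class named `RedirectStdout` (this is not the article's code, just something of the same shape). It declares no base class, yet in Python 3 it still inherits from `object`, and it works by swapping the process-wide `sys.stdout`, which is exactly the global state that makes it unsafe across threads:

```python
import io
import sys


class RedirectStdout:  # no explicit (object) needed in Python 3
    """Illustrative context manager; it mutates global state."""

    def __init__(self, target):
        self.target = target

    def __enter__(self):
        self.saved = sys.stdout
        sys.stdout = self.target  # every thread sees this change
        return self.target

    def __exit__(self, exc_type, exc_value, traceback):
        sys.stdout = self.saved   # restore even if the body raised
        return False              # do not swallow exceptions


buffer = io.StringIO()
with RedirectStdout(buffer):
    print("captured")

captured = buffer.getvalue()
print(RedirectStdout.__mro__)
```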


The bit on yield is a little thin. The other day I noticed that generators have `send` and `throw` methods, which provide interesting ways of controlling flow.

http://www.python.org/dev/peps/pep-0342/
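
A small sketch of those two methods, using a made-up `running_total` generator: `send` passes a value in as the result of the paused `yield`, and `throw` raises an exception at that same point, which the generator here treats as a reset signal.

```python
def running_total():
    """Hypothetical accumulator driven from outside via send()/throw()."""
    total = 0
    while True:
        try:
            value = yield total   # send() makes this expression return a value
        except ValueError:
            total = 0             # throw(ValueError(...)) lands here
        else:
            total += value


gen = running_total()
primed = next(gen)                              # run to the first yield
after_five = gen.send(5)
after_ten = gen.send(10)
after_reset = gen.throw(ValueError("reset"))    # handled inside the generator
after_three = gen.send(3)
print(primed, after_five, after_ten, after_reset, after_three)
```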


Before you decide to go with advanced string formatting, beware the encoding bug: http://bugs.python.org/issue7300


The Python vs Ruby thing is tired by now but reading this reminds me again of how many problems Ruby managed to solve with just one construct (blocks).

Python was my first "scripting" language but modern idiomatic Python is almost unrecognizable compared to the Python I started out with in 1997.

The fact that the Ruby I was writing in 2002 looks almost modern is a testament to Matz' foresight.


You are right - it's very tired. Also, most Python users couldn't care less about language wars.

Maybe with the exception of Java.

I'm also sure the idiomatic FORTRAN or C I wrote in college would be odd to someone who just came in contact with current versions of both languages, but all it means is that they have been around literally for ages.


Here's code I wrote in early 1998, and posted to comp.lang.python: http://lwn.net/1998/1105/a/crosscopy.html .

The code still looks stylistically modern, except for its use of getopt, but that's a library issue, not a syntax one. It's far from "almost unrecognizable."

Can you therefore elaborate with an example from that era?


Python from that era didn't have generators, new classes, list comprehensions, annotations etc. If you're not using any of these in your code now you're not writing idiomatic modern Python.

But the point of my post wasn't to bash Python but rather to note that Ruby got an unusually large number of things right on the first try.


Or that it hasn't fixed the warts it has. I use Ruby rarely enough to really, really miss explicit imports. I guess it's not a problem if you use it all the time and magically know what's in the global namespace at any given time but it's a pain in the ass when you have to touch a legacy rails app twice a year.


When I read http://www.rubyinside.com/ruby-2-0-implementation-work-begin... I can't help but interpret it as Ruby having a similar track to Python. Future Ruby 2.0 code will support keyword arguments and refinements.

Ruby 1.9 had various changes over 1.8, like those listed at http://blog.grayproductions.net/articles/getting_code_ready_... .

Some of the new things in 1.8 include allocate, respond_to, fully qualified names, Array#zip, and "In Ruby 1.8.0, both break and return exit the scope of the proc, but do not exit the scope of the caller."

That reads like Ruby of 1.6 (about 10 years ago) would be stylistically different from modern Ruby, in about the same way that Python is stylistically different.

Going back further, early Ruby had a lot more influence from Perl, like magic variable names and the "discouraged" flip-flop operator. The plan is to remove these in some still hypothetical Ruby 3.


Well...

List comprehensions have been around in Python since 2000, and generators became a standard feature in 2.3 (you could already import them from __future__ in 2.2); you could also build iterators by hand without them.

Every future version of Python that tweaks the class system will always be able to claim every previous version didn't have "new classes".

Decorators (we don't call them annotations) have had syntactic support since 2.4.

I'm not an expert in Ruby's evolution but it certainly looks like it got a lot of stuff right. I can't find much about its evolution, however. I'd like to see how it changed over time.


Amusingly, Lisps had these things before Python, Perl, or Ruby.

=)


Maybe you both have a point. Code you wrote for Python 1.6 (1998) could still look like normal Python.

But, code I write now for Python 2.7 looks very different. I use:

- list comprehensions
- sets
- with...as
- yield
- new "collections" classes like Counter
- looping constructs like enumerate and zip

It's true that you could implement the last two in Python 1.6. But the first four are new language constructs. And list comprehensions, in particular, can be sprinkled all around, not just localized like with/as and yield.

In short, the language has grown. Old stuff looks OK, new stuff looks new.
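
To show what that mix looks like in practice, here is a toy word count that touches each of those constructs. The data and the helper name `read_words` are invented for the example, and an in-memory `io.StringIO` stands in for a real file:

```python
import io
from collections import Counter


def read_words(stream):
    """Generator: yields one word at a time instead of building a list."""
    for line in stream:
        for word in line.split():
            yield word


# with...as closes the stream when the block ends.
with io.StringIO("spam eggs\nspam ham\n") as stream:
    words = list(read_words(stream))

lengths = [len(w) for w in words]    # list comprehension
unique = set(words)                  # set
counts = Counter(words)              # collections.Counter
numbered = list(enumerate(words))    # enumerate
pairs = list(zip(words, lengths))    # zip

print(counts.most_common(1), sorted(unique))
```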


You could probably make similar arguments about Perl 5.


The difference is that Perl 5 had a ton of warts that really needed fixing but the Perl 6 cavalry never showed up.

Ruby isn't perfect but it isn't 1/10th the mess that Perl 5 is.


I remember a lot more Perlisms in early Ruby.


They're still there but people generally avoid them. Matz has admitted he probably borrowed too much from Perl.


There is however no benefit in replacing a mylist.sort() with mylist = sorted(mylist), in fact it will use more memory.

Isn't that kind of a deal-breaker?


It depends on your case.

If you are doing:

   slist = list(mylist) #create new copy of the list
   slist.sort() #sort the list
   use_sorted_list(slist)
The use of sorted is more convenient, and really doesn't use more memory (since the memory from that statement you quoted is for the list copy).

More usefully: sorted will take anything that is iterable, allowing you to write simpler code, rather than testing for various types and dealing with them in type specific ways.
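
A short sketch of the "anything iterable" point, with a throwaway helper called `normalized` and arbitrary inputs. The same call handles a list, a tuple, a set, a generator expression and a dict (which iterates over its keys), with no type checks:

```python
def normalized(items):
    """Return a new sorted list from any iterable; the input is not modified."""
    return sorted(items)


source = [3, 1, 2]
from_list = normalized(source)
from_tuple = normalized((3, 1, 2))
from_set = normalized({3, 1, 2})
from_generator = normalized(n * n for n in source)
from_dict_keys = normalized({"b": 1, "a": 2})

print(from_list, from_generator, from_dict_keys, source)
```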


The benefit is only lost if you want to store the sorted mylist.

However, sorted() is useful if you want to create a list, sort it, use it, and throw it away... all in a single line.

The blog gives an example where a file is read, sorted and transformed. If he loaded the file to a list and sorted that list before transforming it, there would be an unnecessary copy of the list sitting around consuming memory.


He doesn't do a clean comparison.

If you have a list of data and need this data in a specific order and coincidentally iterate over this list every once in a while, then sorted will be less efficient, because you can sort in-place once.

If you have a list of data and you want to iterate over it in a specific order just this single time, you will need to spend a linear amount of memory to sort the list and then iterate over it.

In the first case, sorted() is not useful, because .sort() sorts in-place without additional memory overhead. In the second case, copy/.sort is required and just requires more code for pretty much the same effect. In this case, sorted() is just as bad as the alternative, but shorter.


And sorted() and reversed() behave differently (they return different object classes) while mylist.sort() and mylist.reverse() do not. In that way it's also confusing.

The modify-in-place method does seem more natural when you do want the destructive updating case.
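
The asymmetry is easy to see in an interpreter session; the snippet below uses an arbitrary three-element list. `sorted()` hands back a list, `reversed()` hands back a lazy iterator, and both in-place methods return None:

```python
data = [2, 3, 1]

sorted_result = sorted(data)      # a new list
reversed_result = reversed(data)  # an iterator, not a list

sorted_is_list = isinstance(sorted_result, list)
reversed_is_list = isinstance(reversed_result, list)
reversed_items = list(reversed_result)  # materialize the iterator

# The in-place methods both mutate the list and return None.
sort_return = data.sort()
reverse_return = data.reverse()

print(sorted_is_list, reversed_is_list, reversed_items, data)
```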


Not at all. With .sort() you're suddenly injecting state into your code, with all the complexity that implies.
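
A sketch of what "injecting state" can mean here, using two made-up helpers. The in-place version silently reorders the caller's list as a side effect, while the `sorted()` version leaves it alone:

```python
def top_two_inplace(items):
    """Hypothetical helper that sorts the caller's list as a side effect."""
    items.sort(reverse=True)
    return items[:2]


def top_two_pure(items):
    """Same result, but the argument is left untouched."""
    return sorted(items, reverse=True)[:2]


scores = [3, 9, 5]
pure_result = top_two_pure(scores)
after_pure = list(scores)        # snapshot: still in original order

inplace_result = top_two_inplace(scores)
after_inplace = list(scores)     # snapshot: caller's list was reordered

print(pure_result, after_pure, inplace_result, after_inplace)
```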


sometimes clarity and elegance win over performance



