String interpolation is one of those features where I'm really confused about the direction the Python team wants to take. It's implicit and magic, and yet another non-obvious way to do string formatting.
What's more, it's being hailed as a plus for localization, which it isn't. Localizers should never, ever deal with string interpolation - anything past what .format() does is essentially untranslatable.
Why is it implicit and magic? Looking at this post, and the PEP, it seems like interpolation is actually pretty close to .format in semantics, but syntactically simpler. There are a few bits that I didn't quite follow in the PEP, so I may just be missing the issue, however.
The important thing for localization is to allow the strings to be swapped at runtime by loading them from some kind of database, in a way where the arguments are at least numbered, if not named, so they can appear in different orders for different languages. The interpolation syntax does not seem, to me, to allow for this: it is an "implicit" syntax applied during parsing to take a constant string and immediately build it out of parts, as opposed to the "explicit" % operator or .format methods; with those two models it is extremely clear how one would load a template that has the strings in a different order, or which uses a different subset of the strings.
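A rough sketch of what I mean (the German string is just illustrative): the named arguments appear in a different order per language, and the template itself is swapped at runtime, which a literal-only f-string cannot express.

templates = {
    'en': '{count} new messages for {user}',
    'de': '{user} hat {count} neue Nachrichten',  # illustrative translation
}
print(templates['de'].format(user='tim', count=3))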
The PEP that is looking at this problem has been deferred: at the end of the document they kind of indicate that they didn't really think about the i18n problem correctly and so don't actually have a good solution to present, and are going back to think about it more.
If you need to localize, you still have format(). You just happen to have a shorthand for the 80% case, just like you have @decorator as a shortcut for decorated = decorator(decorated), or [x for x in z] as an alternative to a classic loop.
Yes, having so many solutions is not ideal, especially when it comes to teaching the language. However, it would be foolish to avoid improving the usability of Python just to avoid having "one more way to do it".
I do wish they'd deprecate Template though. It's more than useless.
I suppose they need to add an API so that f-strings can be automatically presented to gettext or another translation database layer before substitution is performed.
Being a die-hard daily Python user, I concur. I think the core devs should not keep inventing new syntax and new ways to solve problems that don't need to be added to the language.
I am not sure why i18n is a big deal, let another library deal with it.
Whatever, I have the choice of not using this feature after it is accepted and implemented. Their PEP discussions on the mailing list always ended up going off on tangents. The problem I always have with string manipulation is dealing with long strings, which for coding style I'd split across multiple concatenated pieces, and using format there gets pretty ugly.
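For example, this is the kind of thing I mean (a sketch; the query and names are made up):

query = ('SELECT name, email '
         'FROM users '
         'WHERE id = {user_id}')
print(query.format(user_id=42))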
I agree with you. It is already possible to use quotes and plus signs in much the same way as the braces in the new syntax, with the added advantage that the original syntax is more orthogonal.
Is it really worth updating all the Python syntax formatting and analysis code out there just to save one character on an operator? I don't think it's a good tradeoff.
> (Here we can fix up the silly US date formatting, too, although really you shouldn't be localising dates in your format strings).
ISO 8601 (similar to Japanese format) is the most logical way to format dates: YYYY-MM-DD, easily sortable, not the silly way US and Europeans format their dates. ;)
Date formatting preferences are done at the desktop level and apps should query for it. All frameworks I can think of have a way of doing so, and you can just tell them "I want a long-form date & time", "I want a short date", "I want a short non-numeric date".
If you are a fan of ISO8601 (which I am), you can then set this globally on your desktop, rather than expect the app developers / translators to choose for you.
Someone made a good argument that in order to support lots of languages, you actually need general functions, not just strings, to be put into .format() or interpolation.
That's because some languages have complicated changes in the text depending on eg the number of things: not just singular/plural, but more complicated. Russian is one example.
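For what it's worth, a hedged sketch of how that looks with gettext's ngettext (the 'myapp' domain and locale directory are made up; the actual Russian plural rules live in the .po catalog, not in the code):

import gettext

t = gettext.translation('myapp', localedir='locale', languages=['ru'], fallback=True)

def files_removed(n):
    # ngettext picks the catalog's plural form for n; with fallback=True and no
    # catalog installed it degrades to the English singular/plural rule.
    return t.ngettext('{n} file removed', '{n} files removed', n).format(n=n)

print(files_removed(1))
print(files_removed(5))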
In practice, what you need is a notion of a "context" for your formatting (often something that is aware of both the current i18n and l10n factors), and that context needs access to the fundamental objects that would be rendered in to a string.
Effectively, the "format string" becomes an identifier for the particular message you want to render, though often the objects themselves are definitive enough.
In that paradigm, interpolation/formatting/whatever might be the default mechanism employed by the context, but you don't want to have that as your explicit mechanism. You want one more level of indirection before you get to it.
Some platforms make the mistake of having that context be tied to the thread, or worse still, a global, and that falls apart once you have any multiplexing logic (and the way Python currently works with POSIX locales fully captures how terribly you can do this). Either way, I'd argue it really is a different problem from interpolation, and if you are trying to solve it using interpolation, you don't understand the problem.
Can you clarify the last point? I do not get it. Can you make an example of where an extra indirect mechanism (besides interpolation) is _required_ for translation?
I generally localize for western languages and write in many programming languages. The combination of gettext + python format strings has been working really great for me, and generally much better than other systems I've seen and put to use. In fact, the simplicity of gettext provides a very fast turn-around, and with translators experienced with the tool I never had problems. Python format strings also work great in this context, as I can supply an arbitrary dictionary of elements that the translator might need. The only real problem has been plural forms in complex text strings, where ngettext is not always sufficient.
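To be concrete, a minimal sketch of that workflow (the domain and values are made up):

import gettext

t = gettext.translation('myapp', localedir='locale', fallback=True)
_ = t.gettext

# The translator only ever sees the named placeholders; the code supplies a
# dictionary of pre-extracted values.
values = {'user': 'tim', 'dish': 'kung pao'}
print(_('{user} likes {dish}').format(**values))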
What method (name a project I can inspect) do you recommend as a good localization architecture?
'file_not_found' is just an identifier to lookup the actual format template (likely a format string) that will be combined with the file object to render the error message.
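Something along these lines, say (MESSAGES and render() are illustrative, not an existing API):

MESSAGES = {
    'file_not_found': 'The file {path} could not be found.',
}

def render(message_id, **context):
    # Look the identifier up in a (possibly per-locale) catalog, then combine
    # it with the supplied context to produce the final message.
    return MESSAGES[message_id].format(**context)

print(render('file_not_found', path='/tmp/missing.txt'))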
There's not much difference. In fact, you cannot expect translators to know how the underlying object is handled, or to write a data extractor for the file object.
If translators are cooperating with you, it's very easy to provide the needed elements directly in the format's dictionary (that is: you extract the translatable pieces for them). It also means you don't have to worry that they're going to fiddle with mutable state.
I generally write everything in English, and do back-translation to my own locale (I also cooperate in translating external projects into my locale), so I eat my own dogfood here.
I know I do not want to deal with extra lower-level subtleties here. Translation is hard enough by itself. It's impressive how a good translation of a simple UI can take so much time. If I had to inspect the object to know what I can get out of it, I would go crazy.
I'd take a pre-baked dictionary any time.
I've also already used the string-catalog approach in the past (heh, XUL), and I'd personally take gettext any day.
I do, yes. But the translator needs to be aware of the extra indirection to produce the text he needs, and between a custom layer and a standard formatting syntax, the second is definitely friendlier for anyone approaching translation, even when technically less powerful.
At some point the translator will have to format some string himself.
OK. How is that different from using functions? Conceptually, there's a function that does arbitrary computation under the hood.
(I think we are already agreeing, only that you express your point differently in Python specific terms. I don't care whether you stuff your code into a context object or something else. I was talking about having the full power of the programming language available, vs using a limited language like format strings.)
I'm a bit confused about all the doubt on such a common feature industry wide. No, it's not magic, it's a simple transformation, takes about 30 seconds to learn, and is becoming an industry standard among modern languages. See C#, Scala, JS, Swift, etc, not to mention bash and perl.
It also has little in common with i18n, the use cases differ too much. Perhaps in the future someone can figure out how to bring them together, but not today.
Well, Rust formatting is magical, unlike that of Go: it has to be because it has type-safe format strings, which can't be expressed in the normal language. But the magic is encapsulated into a macro, so the complexity doesn't leak into the language per se. Someone could write a similar string interpolation macro and replicate Python's feature without modifying the compiler if they wanted--though the fact that nobody has so far indicates to me that it probably isn't needed, as the {} syntax is awfully lightweight already.
Python was and still is to some diminishing degree influenced by C, C++, and Java - none of which have language syntax for interpolating strings based on variables in scope.
I wouldn't say it was influenced by Java, seeing as all the core syntax (functions, classes, modules, etc) was already present when Java was first released publicly. The stuff added later doesn't come from it either (list comprehensions, generators, etc).
Sorry, that's about 1000% better. This should have been the one way to do it, originally. It isn't magic either, rather a simple compile-time transformation to existing format syntax. There's nothing new to remember, just a large reduction in noise.
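Conceptually it boils down to this (a sketch; the compiler emits dedicated bytecode rather than literally calling str.format):

a, b = 'spam', 'ham'
assert f'{a} {b}' == '{} {}'.format(a, b)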
I'm in the "wish it was more explicit" camp. Would a fmt function be that terrible?
fmt("{a} {b} {c}", a=a, b=b, c=c)
If you want to save typing, maybe use :a instead of {a}. Or ?a would have made plain old ? a nice positional variant:
fmt("?a ? ?", a=a, b, c)
The main benefit of the fmt function is that it requires no syntax changes to the language and is trivially provided by a third party library for all past versions of Python.
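Something like this would do, say (fmt here is hypothetical, just a thin wrapper over str.format):

def fmt(template, *args, **kwargs):
    return template.format(*args, **kwargs)

a, b, c = 1, 2, 3
print(fmt("{a} {b} {c}", a=a, b=b, c=c))  # 1 2 3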
That being said this ship has sailed. I guess I just take a more conservative approach to syntax changes than most.
Update: a bit sad to see my votes fluctuating wildly on this post. Please don't use votes to support or disagree with me: that's not what they're for. Please vote only based on whether you find this relevant.
There's a fantastic "more explicit" version of it: str.format. This really is just an implicit version of str.format(). I just don't understand why such a significant syntax change would make it into python when it deviates so much from their usual mantra.
Guido is not a purity robot, and has been looking for a way to have this feature since perl and python were duking it out in the early nineties. Practicality beats purity.
It isn't a large syntax change, string prefixes have existed since the beginning.
It is explicit, in that one must use the f'' prefix. There are no syntax changes as u'', b'', and r'' already exist. Positionals make bugs more likely.
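A small illustration of the positional-swap risk:

first, last = 'Ada', 'Lovelace'
print('{} {}'.format(last, first))   # silently wrong: "Lovelace Ada"
print(f'{first} {last}')             # the mistake is visible at a glance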
It is implicit in that the interpolated values are specified anywhere in scope, not inline as with every other formatting option.
I consider specifying the values or variables alongside the formatting string a requirement to be considered explicit, but I can see how it's a matter of opinion.
The switch is implicit too. It's syntactic sugar to gain a practical and elegant syntax for a common use case.
It's not going to introduce vulnerabilities.
It's going to make your code easier to write and read.
It's going to make bugs easier to spot in formatting.
It's going to make shell sessions easier.
The manipulation (@dec) is still right next to the thing being manipulated (def dec).
I'm not trying to be pedantic. It really does affect readability when syntactic sugar's effect spans an entire scope. Whether or not that effect on readability is greater or less than the gain from the syntactic sugar is always a matter of opinion. Obviously my opinion is out of line with Python's core devs.
> ... though similar to adding a function that didn't exist before.
This is not true and exactly the distinction I'm trying to make:
Adding new packages, functions, objects, etc. can all be backported to older versions and alternative implementations. They also require no updates to ASTs, linters, syntax highlighters, static analysis tools etc.
Adding new syntax is backward incompatible (unless it's gated behind a from __future__ import) and requires changes to all tools that parse Python syntax (the interpreter, ASTs, linters, transpilers, etc).
Ok, true, though I would argue that it isn't entirely new syntax but rather a variation of an existing one.
It is a shame that linters will have to add a letter to their grammar also, but I argue that the everyday usability and readability for millions will outweigh this drawback.
I suppose technically true, though similar to adding a function that didn't exist before. The syntax extension part of the feature (adding a letter to the grammar) is minuscule.
Some people deride string interpolation as not explicit enough, but I think it's extremely readable and adds clarity. Plus, with the `f` prefix, it's plenty explicit IMO.
String interpolation is only being added for string literals, not for generic strings. This means that you still wouldn't be able to read a template from a file, then format text in.
with open('template.txt') as f:
template = f.read()
formatted = template.format(**values)
Arbitrary string interpolation was proposed, in a way sure to result in security problems:
"PEP 498 proposes new syntactic support for string interpolation that is transparent to the compiler, allow name references from the interpolation operation full access to containing namespaces (as with any other expression), rather than being limited to explicit name references. These are referred to in the PEP as "f-strings" (a mnemonic for "formatted strings")."
"Full access to containing namespaces?" From strings? Bad, bad idea. This is currently marked as "deferred", but should be marked "rejected with extreme prejudice".
Plenty of things can go wrong, since you have control over how many and which variables to interpolate, e.g.:
i"SELECT {settings.SECRET_KEY};"
PS: (I don't know why but HN is not updating the page with the reply links needed, so I'll just edit this)
Yes, I agree. I think that the misunderstanding happened when mixmastamyk wrote
> That's a good thing due to security reasons as arbitrary expressions are allowed. There are plenty of templating solutions available
The point is that even if you don't allow arbitrary expressions (which imho are a mistake, and of which I haven't seen a single use case yet), having this kind of interpolation from strings that are not literals (i.e. are not in the trusted source code) would still be a security issue
Since the PEPs apparently don't propose to extend this to non-literals, we're safe. But it's better to be wary and attentively review such proposals...
In fact, I just realized right now that Animats might have misunderstood PEP 501, since
sql(i"SELECT {column} FROM {table};")
should be perfectly safe from sqli vulns
PPS: Unless Animats is pointing out how switching i'' for f'' is a terribly simple mistake to do and hard to spot during a code review... I agree with that
I agree. I was pointing out that the f'{a} {b}' syntax can't accomplish everything. Since it isn't intended as a complete replacement, the way `.format` is intended as a complete replacement for `%`, it will just add another string formatting technique without ever deprecating the old way.
The motivation for PEP-0498 given in the article was the difference in verbosity between these two lines:
"{} {}".format(a, b)
"%s %s" % (a, b,)
That's not very convincing. I was hoping this article would make a good case for interpolated strings, since it's starting to feel like Python is having an identity crisis. Type annotations especially took me by surprise, but string interpolation is another good example of an addition that doesn't feel like Python (imho, anyway).
>>> # Explicit but tedious and doesn't help readability:
>>> print('{very_long_var_name_1}: {very_long_var_name_2}'.format(
... very_long_var_name_1=very_long_var_name_1,
... very_long_var_name_2=very_long_var_name_2))
spam: ham
with
>>> # Explicit but somehow feels dirty:
>>> print('{very_long_var_name_1}: {very_long_var_name_2}'.format(
... **locals()))
spam: ham
and
>>> # Still fits on one line. I think f prefix makes intent clear.
>>> print(f'{very_long_var_name_1}: {very_long_var_name_2}')
spam: ham
Thing is, **locals() "feels dirty" but does exactly the same thing as f''. Except f'' does it implicitly, hiding dirt under the carpet. One day, that dirt will turn something into a bug; which is exactly why the Zen says "explicit is better than implicit".
The best way imho would be:
vars = {'short1': very_long_var_name1,
        'short2': very_long_var_name2}
print('{short1} {short2}'.format(**vars))
Easy and extremely unlikely to ever include the wrong variable.
>>> from string import Template
>>> s = Template('$who likes $what')
>>> s.substitute(who='tim', what='kung pao')
'tim likes kung pao'
This will probably be TIL for many people. It is a surprisingly hidden feature.
I, for one, still like "%s" % x instead of "{0}".format(x). It is simply shorter and I already know or use printf for C in other parts of the project. But with the new f"..." interpolation, I can see liking that more. I am all for being as concise as possible while still being explicit (if that makes any sense at all ;-) ).
I've always found templates way overkill for most of my needs, but they cover a specific and legitimate use-case (large and complex blocks of text).
In the same way, .format() solved the problem of inconsistency and bugs happening all the time with %. It was a restriction and formalization effort, trying to root out bad practices and hence slightly more explicit, but it made sense.
The new f'' IMHO does not solve anything beyond pandering to developer laziness. It will likely introduce bugs in places where developers are not clear about the context ("oh, I thought we didn't have a 'x' var at this point, turns out we do!"), because (from what I understand) it takes away the ability to define which variables should be considered.
Luckily it will take a while before it percolates in any significant library, but I sincerely hope it just doesn't gain much traction.
I don't know why Python re match objects don't support indexing; a __getitem__ method would allow m[1] and m[2] instead (or m['name'] for named groups).
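A hedged sketch of the kind of thing I mean (IndexableMatch is made up; the stdlib match object only offers .group()):

import re

class IndexableMatch:
    def __init__(self, match):
        self._match = match
    def __getitem__(self, key):
        # Delegate both numeric and named lookups to group().
        return self._match.group(key)

m = IndexableMatch(re.match(r'(?P<name>\w+) (\w+)', 'kung pao'))
print(m[1], m[2], m['name'])  # kung pao kung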
That aside, though, I don't consider brevity alone the most critical criterion for a programming language; that way lies APL. Expressiveness, yes, but not at the expense of clarity.
At the very least, it sure looks a hell of a lot better. And using symbols and single-character abbreviations for types (in a dynamically typed language, mind you), and an overloaded % with the extra syntactical rule that it only takes a single argument, so that multiple replacements have to be done by throwing them all into a tuple, is FAR more intuitive and readable than the way most languages do string formatting. Plus, alternatives that require you to bounce back and forth between the string and the variable it is replacing, rather than reading left to right the way humans are meant to read text, are a terrible design. Something as simple as string formatting should not rely on arcane and nuanced rules which are more or less arbitrary.
You've all been programming in C-like languages for far too long to realize what a horrible design string formatting is. You can argue over "explicitness" all you want; the new way is easier to learn, easier to read, makes more intuitive sense, requires learning fewer rules, and is close enough to the format string method that they work well together.
As an outsider to the python community and full time ruby dev, this "controversy" baffles me every time it comes up. Lightweight string interpolation is obviously better! I think you all have string Stockholm Syndrome or something.
The one counter argument that makes sense to me is that in general we shouldn't be doing easy string interpolation, since that way lies SQL injection, XSS, etc, and should instead rely on a stronger type system with binary text blobs, HtmlStrings, SqlStrings, etc, with automatic escaping into and out of the data type.
But then that's not the case with Python now. If you're only trying to stick this string inside that string in a quick and dirty manner, I totally don't understand the reticence folks have to something the way ruby does it: "Name: #{first_name}".
If it's obviously better, would you care to share some better arguments than "people who disagree have stockholm syndrome"?
Don't get me wrong, I'd like Python to have better, more obvious, more concise string formatting. However the last time we had this discussion, it was about str.format() and how it was going to be awesome and don't worry modulo-formatting will go away.
Turns out it did not; modulo formatting is still there because why would it be removed. This is history repeating itself - are you actually baffled that some people learn from past mistakes?
"Try it!" is not an objective argument (I have tried it extensively eg. in Bash and I neither like or hate it), and positional mismatch is not a concern when you use name-based syntax, which interpolation forces you into due to its nature.
I've made numerous posts in this discussion, so not going to copy them again here. The previous named-based syntax comes with significant redundancy and noise.
As a ruby dev posted here, it is obviously better in most respects in most common cases.
Not a fan. We already have two ways of formatting strings in Python; we really should not bring in a third one.
It may be true that
"{} {}".format(a, b)
is a bit verbose, but it is crystal clear and clean. Just remember the Python Zen: "There should be one-- and preferably only one --obvious way to do it." _and_ "Explicit is better than implicit."
If they're adding features purely for the convenience of Python shell users, how about, I don't know, removing the need for significant whitespace (by adding an "end block" statement)? Because the Python shell is a major pain to use when you try to copy-paste some code into it and it doesn't like the indentation levels. On the other hand I can copy-paste whatever into my Lisp REPL and it will run just fine. Or allow several statements per lambda. Saving a few keystrokes writing ".format" doesn't even register on the same scale of annoyance.
Approximately 2.5 times since then has whitespace been a problem, and I fixed it in under 10 seconds each time.
Yet the readability gains from the removal of block delimiters in that time frame are uncountable.
> Saving a few keystrokes writing ".format" doesn't even register on the same scale of annoyance
Am I crazy or has PHP had this feature since forever? String templates can significantly improve programming speed, especially for those of us who are keyboard-impaired. Not having to remember how many %s you need, or whether one of them should be %d and so on, may seem like nitpicking to most, but personally it makes a night-and-day difference.
PHP has had it forever. Given PHP's reputation, I'm not sure if that is a good argument for this feature. Rather than '%s' and '%d', why not just use '{}' and '.format'? That way, each argument is converted to a string appropriately. If it becomes confusing with many arguments, it is easy to name the parameters '{myparam}'
If you are in a context where you are concerned about "privileged vs. unprivileged" code, you have to sandbox way more than that, and Lua provides you many mechanisms to control that (more than any other mainstream scripting language). Disabling the "debug" library which I used there for traversing block scopes is the very first thing one does when sandboxing in Lua.
Though I think the communication has been pretty clear that developers should stop using % formatting in favor of .format when moving to Python 3. Not so much a deviation, just a migration. Now that there's another alternative, there definitely needs to be some clear communication about what's going to be idiomatic going forward.
Except that they went and made .format() useless for bytes... which made everyone have to hang on to % both for 2/3 compatibility and for all the cases where bytes templates were actually needed (all kinds of low-level wire protocols and file storage formats).
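For reference, a sketch of the bytes case (%-formatting on bytes only came back in Python 3.5 via PEP 461, and .format() never got a bytes counterpart):

header = b'Content-Length: %d\r\n' % (42,)
print(header)  # b'Content-Length: 42\r\n'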
The rift between Python 3 and Python 2 seems to be a fallout of the one-true-way philosophy. In fact, it almost stands to reason that Python 3.6 ought to be Python 4 if one-true-way needs to be upheld.
If Python 3.6 is going to introduce multiple ways to do the same thing, there is no good reason to not merge Python 2 and 3 together and have both set of behavior co-exist with each other (__future__ or __past__).
Apparently you haven't been using Python 3 much. It reduces the number of ways to do things a lot.
- class stuff(object) vs class stuff;
- range vs xrange;
- itertools.izip vs zip;
- itertools.imap vs map;
- itertools.ifilter vs filter;
- dict.items vs dict.iteritems vs dict.viewitems;
- dict.values vs dict.itervalues vs dict.viewvalues;
- dict.keys vs dict.iterkeys vs dict.viewkeys;
- __cmp__ vs __eq__ + __gt__;
- sorted(cmp) vs sorted(key);
They didn't fall out of the philosophy. They have been pragmatic and tried to balance the language design: gaining modern features vs making a robust base vs pleasing the legacy crowd. It. Is. Very. Hard.
And yes, we would all prefer to have fewer ways to format. Does that mean I would prefer NOT to have f-strings? Certainly not, it's a great feature. We can't live in the past just because moving forward means falling a bit short of our ideals.
Real life is not ideal.
But merging Python 2 and 3 ?
With the string model completely reworked, that would be apocalyptic. Most people don't realize how deep the unicode change has been.
I have been developing in and teaching Python 2 and 3 for years. The amount of problems linked to UnicodeDecodeError dropped by 90% after the switch.
Not because Python 2 model didn't work.
Because nobody understands text.
Most devs don't understand what text is. They just want to format strings. That's what Python 3 helps them do, and it does it well.
Mixing both would be like mixing olive oil and vanilla ice cream. Great in their own way, but use them together and all you'll get is a terrible meal.
At no point am I arguing whether Python 3 is better or worse than Python 2. That is an uninteresting conversation, because the fact remains that less than 30% of all software is in Python 3. Even when Google released Tensorflow, it was in Python 2 (and they employ core developers of Python).
Do you seriously think anybody is thinking of dropping Python 2 support by 2020? That will only create a fork. None of the core frameworks have upgraded in a decade. Look at Flask for example. So yes, I haven't been using Python 3 much - there has been no reason to.
The point I'm trying to make is how to get everyone on the same page. The reason Python 3 was API incompatible was the core tenet of one-true-way. All the functions you mentioned may be superior, but unless you give people a way to mix and match both in the same source, you will not have adoption.
Or do you think Python 3 adoption has been successful ?
"PEP-0498 tries to improve this situation by offering something that has been common to other languages like Ruby, Scala and Perl for quite some time: Interpolated strings."
P3 .format is fine; the only problem I have is forgetting the last ')', and vim picks this up. Is interpolation good enough to justify introducing another way of doing things?
I suppose you mean to take the variable names of the arguments without using keyword arguments? That's not possible because .format() is just a simple method call on str/unicode objects.
That statement is against my religious beliefs. Apparently we can stick an f in front of the string, but we can't have default values that work with locals()?
That would mean the formattee would have to dig through its surrounding namespace. This solution is more elegant: the string is transformed into the equivalent format call at compile time, making the implementation tiny.
Python provides many ways to 'dig through' the surrounding namespace. I find it ironic that doing so is effortless in the language while at the same time being considered unpythonic. Which is it? Is the language itself unpythonic?