While it's far from a functional language, I find that it's possible to build functional style abstractions without too much headache. The iterator protocol is so well supported by the language and the standard libraries that using map, filter, itertools, generator expressions & comprehensions gets you a long way. You get the rest of the way by writing any specialized combinators you need and then imperative code to tie things together. You can build up a lazy pipeline of data transformations that works pretty well.
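Roughly the kind of lazy pipeline meant here (a small sketch with made-up record fields):

import itertools

records = [{"name": "a", "size": 3}, {"name": "b", "size": 12}, {"name": "c", "size": 7}]

# Every stage is lazy; nothing actually runs until the final consumer iterates.
big = filter(lambda r: r["size"] > 5, records)
names = map(lambda r: r["name"], big)
first_two = itertools.islice(names, 2)

print(list(first_two))  # ['b', 'c']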
There are some gotchas where you need to explicitly copy (itertools.tee) iterators when you have multiple consumers, and be careful with mutable values, but it's manageable.
It would be nice to be able to use curried functions, since functools.partial gets verbose, but at that point you're pretty far from Pythonic code.
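For illustration, a sketch of what "functools.partial gets verbose" means in practice (scale/double are made-up names):

from functools import partial

def scale(factor, x):
    return factor * x

# With partial you spell out the wrapping explicitly...
double = partial(scale, 2)
print(list(map(double, [1, 2, 3])))  # [2, 4, 6]

# ...whereas a curried scale would just be scale(2), no wrapper call needed.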
The state of a generator is the thing that trips me up the most when I try to do FP in Python. In general a function shouldn't care whether its input was a list, a set or a generator, yet while iterating over it (or yielding from it) we always mutate some internal state of the iterator.
One solution, I think, would be to always pass tee'd copies of iterators and never(!) pass a raw iterator as-is (because you don't know whether next() has side effects or not). The other solution would be to never do anything lazy at all. One could also always consume iterators fully; that way any generator re-use would become apparent immediately.
A function that moves a generator forward by just one or two items (something with the semantics of findFirst, for example) can subtly introduce bugs, as in the sketch below.
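A sketch of the kind of bug meant (find_first here is a made-up helper, not a stdlib function):

def find_first(predicate, iterable):
    # Advances the underlying iterator up to and including the first match.
    return next(x for x in iterable if predicate(x))

numbers = (n for n in range(1, 11))                        # generator yielding 1..10
first_even = find_first(lambda n: n % 2 == 0, numbers)     # 2, after consuming 1 and 2

# The generator has already been advanced past 2, so this sums 3..10,
# not 1..10 as it would if numbers were a list.
print(first_even, sum(numbers))  # 2 52   (with a list: 2 55)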
Personally, I go by a rule that I must consume a generator on the same line it is created; otherwise I use something that isn't lazy.
If you have some huge collection to scan, or want to synchronize parts of your code by sharing an iterator's internal state, you may want to break that rule. But those cases are just too troublesome for my taste.
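Concretely, "consume it on the same line" looks something like this (a sketch):

data = [3, 1, 4, 1, 5, 9, 2, 6]

# The generator expressions are created and fully consumed on the same line,
# so their state can never leak into later code.
total_of_squares = sum(x * x for x in data)
first_big = next((x for x in data if x > 4), None)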
That is a very bad idea in general and will lead to enormous memory consumption when large tee'd iterators get out of sync with one another.
See here:
This itertool may require significant auxiliary storage (depending on how much temporary data needs to be stored). In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee().
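A sketch of the failure mode being warned about (the numbers are arbitrary):

import itertools

big = (n for n in range(1_000_000))
a, b = itertools.tee(big)

# Draining `a` first forces tee() to buffer every item internally so that
# `b` can still see them - effectively the whole stream sits in memory.
total = sum(a)
count = sum(1 for _ in b)

# If one consumer is going to run well ahead of the other anyway, the docs'
# advice applies: materialize once with list() instead.
items = list(range(1_000_000))
total, count = sum(items), len(items)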
Are there any public code bases out there that use this? Would be cool to see what sort of pipelines like this people are building and how they get used.
I never understood his desire to remove 'reduce' from Python. I would love to hear a cogent reason as to why we should have to write a for-loop for everything 'reduce' like. In other languages, 'reduce' (akin to 'fold' for non-Python people) is one of the things you reach for daily.
The main issue is syntactic: Python's syntax is such that reduce() isn't terribly useful. If Python had an equivalent of blocks in Ruby or at least the ability to write non-trivial lambdas, I can see reduce() being useful, but as things stand, it's just kind of useless.
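For illustration, a sketch of the kind of reduce() that gets awkward precisely because the combining function has to be a single expression (the data is made up):

from functools import reduce

orders = [{"total": 30, "rush": True}, {"total": 20, "rush": False}]

# With reduce, any non-trivial combining logic has to squeeze into a lambda
# (one expression only) or live in a separately defined named function:
rush_total = reduce(
    lambda acc, o: acc + (o["total"] * 2 if o["rush"] else o["total"]),
    orders,
    0,
)

# The plain for-loop reads about as well and allows full statements:
rush_total = 0
for o in orders:
    rush_total += o["total"] * 2 if o["rush"] else o["total"]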
I dunno. I love the functional parts of Groovy. But with Python I generally only break out the functional stuff when it's a choice between one unreadable line of code with a comment or else six unreadable lines of code. (E.g. when parsing an API response with weird and arbitrary nesting.)
The one line would be functional. Here is a good example:
list(filter(lambda x : ('widgets' in x), mixed_widgets))[0]['widgets']
I almost always go for more lines of readable code rather than fewer lines of unreadable code. But in cases like this, you can either write it as one line of garbage unreadable code or six lines of garbage unreadable code. So I'd rather keep the overall codebase more dense to make it easier to understand the program flow, and leave a comment explaining what that line is doing.
Implementing a find function makes this read a lot nicer. You could also call it find_first or find_next.
def find(predicate, iterable, default=None):
    """Returns the first value that matches predicate, otherwise default=None"""
    return next(
        (x for x in iterable if predicate(x)),
        default
    )
Which turns the expression into:
find(lambda x: 'widgets' in x, mixed_widgets)['widgets']
First, if you're only interested in the first value, you should probably use a generator expression instead of a list comprehension, otherwise the loop will run over all of mixed_widgets even though the rest won't be used:
(x for x in mixed_widgets if 'widgets' in x)[0]['widgets']
but this doesn't work, since you can’t do indexing [0] on a generator expression. No matter, next() returns the first value of any iterator:
next(x for x in mixed_widgets if 'widgets' in x)['widgets']
Also, I think the repetition of x in the 'x for x in' part is a bit ugly; we can fix that by moving the ['widgets'] lookup inside the generator expression:
next(x['widgets'] for x in mixed_widgets if 'widgets' in x)
Definitely more readable - it describes in plain English words what it's doing, and it's basically the same as mathematical set notation. The main question to me is the performance implication, though. Are generator expressions doing the iteration at the C level like map, and does that give performance parity? What about branching - am I correct in assuming that filter is always faster than comprehensions or generator expressions with if-parts?
I believe that the list comprehension will be faster than filter, but as always, any time you replace readable code with unreadable code for performance reasons, you damn well better time it.
In [11]: timeit.timeit('''list(filter(lambda x : ('widgets' in x), mixed_widgets))[0]['widgets']''', '''nw={'abc': 'def'};w={'widgets':'s'};mixed_widgets=[nw]*100+[w]+[nw]*100''', number=100000)
Out[11]: 2.6956532129988773
In [12]: timeit.timeit('''[x for x in mixed_widgets if 'widgets' in x][0]['widgets']''', '''nw={'abc': 'def'};w={'widgets':'s'};mixed_widgets=[nw]*100+[w]+[nw]*100''', number=100000)
Out[12]: 0.5911771030077944
But not generating the list at all is still going to be faster (with bigger gains for bigger data)
In [13]: timeit.timeit('''next(x for x in mixed_widgets if 'widgets' in x)['widgets']''', '''nw={'abc': 'def'};w={'widgets':'s'};mixed_widgets=[nw]*100+[w]+[nw]*100''', number=100000)
Out[13]: 0.3324074839911191
I guess my next question is then - why is anyone using filter/map instead of comprehensions or even generator expressions? Familiarity when coming from FP?
I can see it together with partial, yes, that's when it can become a bit cleaner. Another reason why I use map is when I want to use multiprocessing or multithreading (with IO-heavy functions). But at a fine-grained level of code I find it really hurts readability compared to comprehensions.
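A small sketch of that second use of map, assuming IO-bound fetches (the URLs and fetch function are made up):

from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

urls = ["https://example.com/a", "https://example.com/b"]  # placeholder URLs

def fetch(url):
    with urlopen(url) as resp:
        return resp.read()

# Executor.map has the same shape as the builtin map, which is why the map
# habit carries over naturally to IO-heavy parallel code.
with ThreadPoolExecutor(max_workers=4) as pool:
    bodies = list(pool.map(fetch, urls))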
Generally, when the operation I'm applying to each element happens to already be a named function, I find "map(f, seq)" preferable to "(f(x) for x in seq)".
Coming from Ruby it's easier to use map, because it's what Ruby's standard library offers.
However, comprehensions are not that much harder once one finally decides to understand how they work. Not as readable as map(), IMHO. Example:
Ruby
[1, 2, 3].map {|x| x*x} # object.method(args)
vs Python
[x*x for x in [1, 2, 3]]
where we have the function first, then the definition of the variable, then the data. This is the opposite of the object.method OO notation, and using a variable before defining it is not what we usually do. But it's almost the usual mathematical set-builder notation "{ f(i) : i ∈ S }", with the function at the beginning.
In my experience filter/map are used by people who just don't know about comprehensions, or are not used to having them available. It takes some time to start using them where appropriate.
That's actually the talk I was thinking about, but I guess I forgot how exactly Raymond described generator expressions there. Thanks for linking it again!
I have sometimes wondered why python used 'lambda' for anonymous functions rather than just... well literally anonymous function - normal function syntax without the name. Could have been something like
`def (x,y): x+y`
This makes it sound like that was just how it happened to be implemented. Nothing fundamentally wrong with lambda, I guess.
The problem is that `def f(x,y): x+y` is valid one-line syntax for a regular function definition (one that in this case computes `x+y` and returns `None`, because Python does not have implicit returns). Maybe the parser could distinguish between them by saying "if the function definition is all on one line and the function is missing its name, it's a lambda":
def f(x,y): return x+y
g = def (x,y): x+y
assert f(3,4) == g(3,4)
assert f != g
That said I have no
Nothing gives me the willies in mathematical code like writing out Greek letters. I don't know why, it just does. The fact that the otherwise excellent Stan (https://mc-stan.org) only accepts ASCII variable names is depressing. I would love to be able to actually write `map(λ x: x+7, range(5))`. I think a big reason why people don't use `map`/`reduce`/etc. more is that writing anonymous functions is still verbose (`lambda` is just too long a word) and relatively hard to read compared to list comprehension. Better still a syntax like `x -> x+7` would be ideal, since I'm pretty sure `->` would be a unique syntax element.
It's because lambda technically isn't an anonymous function. It's more like an anonymous expression. It probably helps to avoid confusion for people trying to do things that can't be expressions in it. E.g.:
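Presumably something along these lines (a sketch - the lambda body is limited to a single expression, so anything statement-shaped needs a full def):

# Fine: a lambda wraps exactly one expression.
square = lambda x: x * x

# Not allowed: statements such as assignment, loops or try/except cannot
# appear in a lambda body - the next line would be a SyntaxError.
#   checked = lambda x: try: x * x except TypeError: None
# You fall back to a regular def instead:
def checked_square(x):
    try:
        return x * x
    except TypeError:
        return None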