Writing C in Cython (honnibal.wordpress.com)
195 points by syllogism on Oct 20, 2014 | 54 comments



Python + Numpy: when the core loop is computationally intensive and benefits from vector ops.

Cython: when repeated invocations overpower any core-loop advantage.

In the case of NLP, which is the author's field, you'll find that not only do you have an intensive core task, but the number of times you'll call it is insane.

To give an idea: the time complexity for parsing is O(n^3) [n = words in a sentence]. For Combinatory Categorial Grammar (CCG), the formalism that the author Matthew and I commonly use, it's O(n^6). At that point the inner loop almost doesn't matter -- even if it were just a no-op, the function-call overheads and everything around it are performance killers. Both of those are what Cython works incredibly well for, especially when you get full control of the data structures.
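
To make that concrete, here's a minimal sketch (the scoring function is a made-up stand-in, not anyone's real model) of why the chart loops dominate: with cdef types, the triple loop and the inner call both compile down to plain C, so there is no per-iteration Python overhead.

    cdef float score(int start, int split, int end) nogil:
        # Hypothetical stand-in for a real scoring model
        return 0.0

    cdef void fill_chart(float* chart, int n) nogil:
        # Classic O(n^3) chart loop: every split of every span
        cdef int span, start, split
        for span in range(2, n + 1):
            for start in range(n - span + 1):
                for split in range(start + 1, start + span):
                    chart[start * n + (start + span - 1)] += \
                        score(start, split, start + span)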

This is one of those situations where the notion that premature optimization is evil is really quite far off! =]


I don't see where this is premature optimization. You, and the author, clearly know a lot about the field and the nature of the problem, so you're already past the 'premature' part. If I know from the start that I've got a function that will be called 10,000 times per second, there's no reason I should write the unoptimized version first. That's like starting every sort off with a bubble sort; I don't need to begin with a quicksort that switches to insertion sort as appropriate, but I know I can do better than bubble sort from the get-go.

Defining "premature optimization" as the less hyperbolic "optimization before we know we need it," I think it's clear from the problem you've described that you already know you need it!



That's amazing and makes me really sad that Tcl has not won the DSL world over.


Back in the late '90s I worked for a startup that used Tcl as its core language.

Quite a good experience.


Can't tell if serious.


Speaking of "really is just C with some syntactic sugar" -- is there really a point in putting effort into learning (and using) Cython over C? Reading the code in the article, I wasn't completely sure whether I should find it "cool" or "confusing". Not that there was anything I actually disliked, but it's somewhat awkward to see C written like Python, so I can't "rate" it myself right away.


Well, here's a less trivial example:

https://github.com/syllog1sm/redshift/blob/develop/redshift/...

The task/algorithm is roughly (but not precisely) the one described here:

http://honnibal.wordpress.com/2013/12/18/a-simple-fast-algor...

If you just scroll through parser.pyx, you might get the flavour of why this is better than straight C. The "core", or "skeleton", of the code is all C-typed, but I "fall back" to Python for workaday stuff like reading and writing config files, collecting the set of labels, etc.

There are also places in the algorithmic code where I use Python data structures. In Parser._count_feats, I use a dictionary of dictionaries. This would be quite annoying in C, and efficiency is quite irrelevant here.
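
Schematically, the mixed style looks something like this (names invented for illustration, not the actual redshift code):

    cdef class Parser:
        cdef int* weights  # C-level state for the hot path

        cdef int score(self, int* words, int n):
            # C-typed core: compiles to a plain C loop over raw arrays
            cdef int i, total = 0
            for i in range(n):
                total += self.weights[words[i]]
            return total

        def _count_feats(self, sentences):
            # Plain Python where speed is irrelevant:
            # a dictionary of dictionaries
            counts = {}
            for sent in sentences:
                for label, feat in sent:
                    counts.setdefault(label, {})
                    counts[label][feat] = counts[label].get(feat, 0) + 1
            return counts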

I think using Cython in this way has given me a competitive edge in my research. This library has by far the best published results on speech parsing[1], and its speed/accuracy trade-off on written text is as good as any other library's out there.

[1] http://www.transacl.org/wp-content/uploads/2014/04/46.pdf


Oh, thanks, that gives some insights indeed.

> I think using Cython in this way has given me a competitive edge in my research.

May I ask, have you ever considered using something like OCaml? From what I've heard, it's quite convenient to use ML-family languages for writing parsers, and OCaml generates pretty well-performing code while being more "Python-expressive" than "C-expressive".

> This library has by far the best published results on speech parsing

Ah, yes, I remember seeing it a while ago, when you had just released it. Never looked at the code, though. Didn't recognize it was you until this post. =)


Which profiling tool do you use to target your optimizations?


I use cProfile; for the Cython code you need to set profile=True as a compilation directive.
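
Concretely, the directive goes at the top of the .pyx file, and then cProfile works as usual from a normal Python driver script (module and function names here are hypothetical):

    # At the top of the .pyx file, the directive is itself a comment:
    # cython: profile=True

    # Then, in an ordinary Python script:
    import cProfile, pstats
    import my_module  # the compiled Cython module (hypothetical)

    cProfile.run('my_module.main()', 'out.prof')
    pstats.Stats('out.prof').sort_stats('cumulative').print_stats(20)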


Your heart is in the same place as mine. Mileage may vary, but I found that unless I am speeding up a legacy application that has already invested in Python and the Python ecosystem, Cython is less valuable. That may be because I can churn out C or C++ code productively enough. If I do need to add some scripting, I find adding Lua (now) or Tcl (earlier) lighter.

With Cython I have to remember which subset of Python syntax the version I am using supports, and the real time-saver that is Numpy is no longer a time-saver. The reason is that Cython has to call back into Numpy's C API for Numpy expressions; if you do that in a loop, it is considerable overhead. The alternative is to write the array-indexing code at a low level in Cython. This is significant to me, because avoiding that error-prone and tedious part is one of the reasons I use Python/Numpy in the first place.

Then there is the question of debugging C code that a code-generation tool has generated. Usually there are gobs and gobs of it. Cython does try to help you with debugging that, but I find myself better off debugging C code that I wrote myself. Not to mention the mature tooling that exists around C or C++.

For me, usually what happens is that I Cythonize a small part, then I have to Cythonize the loop containing it for performance reasons, and my Cython keeps swallowing the Python application bit by bit, mostly from the inside out. So there usually comes a time when I say screw it, I will just write this in C or C++.

As I said, this is a personal take; your experience may vary. I still happily use Cython if it's Python code that I have to speed up, or if I have to collaborate with someone in Python. I consider Cython the best tool for that. If, however, I were to write something from scratch, I typically just go ahead with C++ nowadays, and have lately been poking and prodding at Julia. I like what I see there, a lot.


Another example that I worked on last week is for the UI framework Kivy. Most of the core libraries are in Cython, and particularly for the graphics stuff I think it's a nice balance of execution speed and ease of development. In particular, it's trivial to use the Cython objects from pure Python, without having to mess with the CPython API on either side.

Here's an example of the experimental SVG parser/renderer we've been working on over the past few weeks, if you want to see some more "real world" Cython: https://github.com/kivy/kivy/blob/master/kivy/graphics/svg.p...


Cython is used to write extension modules for Python. If that is something you want to do, it is better than plain C: the code is more compact and less error-prone, because some things are taken care of behind the scenes. The point of using Cython is to combine the best of both worlds (Python and C).


> some things are taken care of behind the scenes

Like what?


Reference counting, type checking, converting iteration over lists or ranges to more efficient code. In general a lot of error checking.
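
A small illustration of the difference, with the same loop untyped and typed (hypothetical functions):

    def total_untyped(items):
        # Cython emits the reference counting and type checks
        # for you -- semantics identical to the interpreter
        s = 0
        for x in items:
            s += x
        return s

    def total_typed(int n):
        # With C types, this range() loop compiles to a plain
        # C for-loop: no refcounting or dispatch per iteration
        cdef int i, s = 0
        for i in range(n):
            s += i
        return s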


This is similar to how people are using the LuaJIT FFI, as a higher-level, garbage-collected C that lets you control data layout in memory.


That's an interesting comparison, because under the hood Cython and LuaJIT FFI could hardly be more different.

Cython is an optimizing Python (+ extensions) -> C compiler.

LuaJIT FFI is a library for Lua. It doesn't change the language at all (there are none of Cython's optional annotations), there is no build-time tool or need to run a C compiler.

But I can see how they could serve some of the same use cases.


"LuaJIT FFI is a library for Lua. It doesn't change the language at all ... "

This is literally true, of course, but in my ~5 yrs using it now I can say that writing LuaJIT+FFI-data-type code is, for me, both aesthetically and mechanically different from writing plain Lua code. The post captures it quite well:

  This is basically the inverse of the old promise that languages like Python came 
  with: that you would write your whole application in Python, optimise the “hot 
  spots” with C, and voila! C speed, Python convenience, and money in the bank.

  This was always much nicer in theory than practice. In practice, your data 
  structures have a huge influence on both the efficiency of your code, and how 
  annoying it is to write.
I often write my LuaJIT programs from the ground up, designed around C data types and a C-style memory-allocation discipline. But I can always ditch that in areas where I know I don't care (e.g., configuration loading etc., where 10 lines of C become a 1-liner in Lua).


In both cases a tool is being adapted for something it wasn't intended for: to write C-style code—specifically, code that works with memory the way C does—in a high-level scripting language.

This way of combining high and low levels within the same program seems qualitatively unlike, say, C++, traditional FFIs, and other answers to this problem. It's fascinating to see it emerge from more than one existing technology.


"Python with C data types" was somewhat of a design goal for Pyrex, the language that Cython followed/built on:

http://www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/ver...


That's fascinating because it is very much about the same problem.


I guess I'm not sure which tool you meant was adapted; Cython (and Pyrex) certainly adapt Python to work more directly with memory, but I would say the style of code in the article is an intended use case of them.

(I also might have undersold the connection between the two; Cython is arguably just a friendly fork of Pyrex -- I guess mostly they wanted to move faster.)


I just meant that the OP is adapting Python (via Cython) similarly to how FFI-driven LuaJIT adapts Lua. The resulting programming style is different enough from classic Python or Lua that it approaches being a different language—something like an embedded C DSL.


When I've tried to use Cython I've not enjoyed it. I feel like I'm invoking a load of barely comprehensible copy and paste magic if I need to use numpy arrays. I imagine it would be possible to understand this stuff, but I need it so rarely that I can't justify learning it. C or C++ feels so much simpler.


Can you give an example of the copy/paste you're worried about?

I agree that working with numpy arrays in Cython introduces some semantic overhead; then again, in newer versions of Cython they've introduced the more generic concept of typed memoryviews [0]. I had a lot of success recently using these to generate some very efficient multithreaded code with a minimum of hassle.

[0] http://docs.cython.org/src/userguide/memoryviews.html
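
For a flavour, a minimal memoryview routine might look like this (hypothetical, not taken from the linked docs):

    import numpy as np

    def row_sums(double[:, :] a):
        # Typed memoryview: indexing compiles to raw pointer
        # arithmetic -- no numpy C API calls inside the loop
        cdef int i, j
        cdef double[:] out = np.zeros(a.shape[0])
        for i in range(a.shape[0]):
            for j in range(a.shape[1]):
                out[i] += a[i, j]
        return np.asarray(out)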


It's things like the square brackets and triangular brackets, e.g. def naive_convolve(np.ndarray[DTYPE_t, ndim=2] f, np.ndarray[DTYPE_t, ndim=2] g)

I know what this is for, but I have no idea how this is implemented - is it some sort of library, some sort of macro, something inside the compiler, or what? It feels very opaque. I don't understand the lifetime of these objects or what could go wrong.

Even with C++ templates, I can usually work out how something is happening, but not here without research. The Cython-specific syntax also feels pretty unpythonic. The feeling I get from using it is that it's an ugly Python/C/C++ mutant with a load of hidden rules.

What I really would like is a set of templates or even macros for C++ which would allow easy wrappers to be written for different Python implementations, handling the reference counting, etc. Does such a thing exist?


The example you give is the old way to specify the type of a numpy array. With typed memoryviews it's more straightforward, IMHO:

    cdef convolve(double [:, :] f, double [:, :] g)
I don't know if there are templates or macros for C++ which do the same thing, but if you look at the output of Cython and the complexity and amount of work it's doing, I don't think it's likely something like that could work.

The generated code is of course clunky and difficult to follow, but I found that when trying to understand specific things it's easy to follow what parts of the Python C API are being invoked and consider the overhead involved. The annotated HTML output is useful for this (cython -a).


And usually you want to avoid invoking the Python C API from Cython unless you have to use it for some reason. The whole point of typed memoryviews is that you get direct access to the raw memory buffer and can access it like a numpy array or even like a pointer array.

This blog post [0] incrementally shows how to get micro-optimizations out of your Cython code.

In the end, the autogenerated code is usually not too bad to follow, since it's a one-to-one transpilation from Cython to C. cython -a helps a ton, or, if you can use it, the cythonmagic [1] for the IPython notebook embeds the output of cython -a inline in a notebook, making the iteration process much quicker.

[0] https://jakevdp.github.io/blog/2012/08/08/memoryview-benchma...

[1] http://ipython.org/ipython-doc/2/config/extensions/cythonmag...


In the specific case you pasted, the [] are just syntactic sugar to specify information about the arrays that you are passing in to Cython.

I find that far more legible than complicated template expressions - but that's likely a matter of habit.

About object lifetimes: there is a lot of good information on the Cython webpage (although it is a bit spread out). Usually, unless I am writing a Cython-only application, I tend to write tight Cython numerical routines that take a pre-allocated input/output vector and just update the output vector.
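
A routine in that style might look like this (hypothetical sketch):

    def scale(double[:] x, double a, double[:] out):
        # The caller pre-allocates 'out'; this routine only
        # fills it in, so nothing is allocated inside the loop
        cdef int i
        for i in range(x.shape[0]):
            out[i] = a * x[i]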



We developed biicode (a C and C++ deps manager) in Python using Cython:

http://blog.biicode.com/bii-internals-compiling-your-python-...


So how would you do multithreaded apps with Cython while avoiding stuff like the GIL?

I work with Objective-C a lot, and Python was my first language, so seeing this kind of unification just blows my mind.


As the sibling comment suggests, multi-threading through Cython isn't as smooth as it could be. But, it doesn't seem too bad. I've used it in a rather rudimentary way to accelerate key computations for some NLP models that I've been playing with recently.

As an example in [1], you can check out the function at lines 293-314, which calls the function at 224-256 (or lines 259-291 depending on the machine architecture...). The actual dispatcher that initializes the threads is in [2]. I guess the key points are the use of "nogil" in the function declaration on lines 224-229, and the use of "with nogil" when that function is called on line 312. Note that the code is complicated a bit by my use of BLAS functions accessed via function pointers provided by SciPy.

[1]: http://github.com/Philip-Bachman/NN-Python/blob/master/nlp/C...

[2]: http://github.com/Philip-Bachman/NN-Python/blob/master/nlp/C...
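
Stripped of the BLAS details, the basic shape of that pattern is roughly this (hypothetical sketch, not the linked code):

    cdef double sum_squares(double* xs, int n) nogil:
        # No Python objects touched here, so the GIL can be
        # released while this runs
        cdef int i
        cdef double s = 0.0
        for i in range(n):
            s += xs[i] * xs[i]
        return s

    def run(double[:] xs):
        cdef double s
        with nogil:
            s = sum_squares(&xs[0], <int>xs.shape[0])
        return s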


In theory this is easy and good, in practice it's not.

http://docs.cython.org/src/userguide/parallelism.html

The problem is that it's really nasty to declare all your functions nogil, because that means you don't get exception handling. There may be a trick to this that I've missed --- I find the GIL handling stuff in Cython sort of confusing.

I think my applications are genuinely hard to multi-thread, so maybe it's usually easier than I've found it. It doesn't help that I use a MacBook Air, and OpenMP isn't available for Clang. Developing on OS X sucks sometimes.


I've had a lot of success with Cython's parallelism. The key is to release the GIL in tight inner functions of big loops. You can then parallelize the loop by iterating over calls to the nogil function or a "with nogil" block.

The OpenMP-only support kind of sucks, but I'm working on a Linux cluster, so it doesn't bother me except when I'm on my laptop.
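
A minimal sketch of that pattern (hypothetical; needs OpenMP flags when compiling):

    from cython.parallel import prange

    cdef double work(double x) nogil:
        # Tight inner function: no Python objects, so no GIL needed
        return x * x

    def parallel_sum(double[:] xs):
        cdef int i
        cdef double total = 0.0
        # prange releases the GIL and spreads iterations over
        # OpenMP threads; += on 'total' becomes a reduction
        for i in prange(xs.shape[0], nogil=True):
            total += work(xs[i])
        return total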


Sometimes it is possible to do long-running computations in pure C code, and then the GIL can be released. Mostly, though, you use Cython because it lets you combine Python code with C code, and any code that touches Python data structures must hold the GIL. Therefore the same advice as for Python in general applies: use multiprocessing instead of threading.
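
That is, something like this (module and function names hypothetical):

    from multiprocessing import Pool
    import my_ext  # compiled Cython extension (hypothetical)

    if __name__ == '__main__':
        # Separate processes, separate GILs: workers run the
        # extension code in parallel with no GIL contention
        pool = Pool(4)
        results = pool.map(my_ext.crunch, range(100))
        pool.close()
        pool.join()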


In C, you need header files so that variables are declared before they're used. How does Cython deal with this? I find headers painful, and would appreciate a memory foundry language more if it didn't suffer from this.


Unfortunately, you're dead right --- this sucks in C and it sucks in Cython!

I don't understand why .pxd files are necessary --- it should be possible to generate them from the .pyx files. At some point I'll have a go at this...

But, as of now, you need to maintain this separate file, which you must keep synchronised with changes in the original. And, yes --- this sucks!

The only mercy is that writing code in the headers is illegal.


It is not illegal to write code in .pxd files. For cdef functions to be inline-able across modules, they need to be defined in a .pxd file. But that is the exception to the rule.
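
For example (hypothetical module): ordinary cdef functions get a declaration in the .pxd and a body in the .pyx; inline ones are the exception and carry their body in the .pxd so other modules can inline them after cimport:

    # mymath.pxd -- what other modules see via cimport
    cdef int clip(int x, int lo, int hi)   # declaration only

    cdef inline int square(int x):         # the exception:
        return x * x                       # inline body lives here

    # mymath.pyx -- the implementation
    cdef int clip(int x, int lo, int hi):
        if x < lo: return lo
        if x > hi: return hi
        return x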


In Cython you can use Python modules/functions/variables as you normally would, dynamically and without declaring them. To integrate with C code you do need headers and type declarations. Other tools (ctypes, cffi) can load libraries dynamically, but I think that has its own tradeoffs.


It is a bit hard to read code with one-letter variable names.


I took them from the original post. I imagine they're the standard forms of the equations.


I'm not sure why position is "r", but F for force, m for mass, w for world (in a short function), N for, well, N - doesn't seem too bad. Math looks much worse if you spell everything out.


It's pretty standard to use r as a position vector in physics.


He's following the code in the original article: https://www.crumpington.com/blog/2014/10-19-high-performance...

Makes side by side comparisons easier.


I took a peek at Cymem just to see what was going on, as it was described as a "small" wrapper. A 3600-line .c file was... more than I expected.


https://github.com/syllog1sm/cymem/blob/master/cymem/cymem.p...

The .c file really shouldn't be in the repo...


That is the C code generated by Cython. The actual source is in the .pyx files.


While it is interesting to see all these efforts improving the Python ecosystem, I go straight to an ML language with a native compiler.


I wonder how much time is needed to learn C well enough to make proper use of Cython.


It depends what you mean by 'use Cython properly'. To speed up some numerical routines, you hardly need to know anything about pointers, malloc, etc.; you just move your Python function into its own file, rename it from .py to .pyx, add type annotations, and enjoy the free speed-up.

To do more complex things (low-level wrappers, etc.) you do need to know more about C.
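
As a minimal illustration of the first case (hypothetical function):

    # fib.py -- plain Python
    def fib(n):
        a, b = 0, 1
        for i in range(n):
            a, b = b, a + b
        return a

    # fib.pyx -- the same function with C types added
    def fib(int n):
        cdef int i
        cdef long a = 0, b = 1, tmp
        for i in range(n):
            tmp = b
            b = a + b
            a = tmp
        return a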


It's good for job security.


Cylons write in Cython.



