Why is BIND 10 written in C++ and Python? (isc.org)
103 points by AndrewDucker on Feb 27, 2013 | 131 comments



There's a comment in the post:

"As of right now, it ends up that about 75% of our code is C++ and 17% is Python (link) since it turns out that a lot of BIND 10 is performance-critical."

Which could easily be taken the wrong way. I believe the right way to think about it is "How much _more_ C++ code would there be, if there wasn't that 17% in Python?".


Hmm, I don't think reading this comment as "Python is not adequate for performance-critical applications" is a misinterpretation. The author makes it very clear in another part of the text:

> [Python] has all of the features that we were looking for… except performance.

Now your interpretation is also valid since they could have written everything in C++ but they didn't.


There is a python DNS server running on pypy that seems to get a pretty big speedup over CPython.

Here is the benchmark result: http://speed.pypy.org/timeline/#/?exe=3,6,1,5&base=2+472...

0.002 for PyPy vs. 0.015 for CPython, which is about 7.5 times faster.

So, for the last couple of years they have been wrong - since PyPy has shown pretty good performance in a DNS benchmark, so Python can have pretty good performance ;)


As much as I respect and appreciate Pypy, BIND is one of the core foundational pieces of the internet. It needs to be bulletproof and robust. I don't think Pypy itself qualifies as a mainstream enough language for them to build BIND on. This isn't a slight on Pypy, because the core problem is simply that it hasn't been around for long enough and subjected to enough stress to earn entry into the highest of the high tiers of reliable software. That's not a bad thing per se, and it can be fixed, but in the meantime, ISC has to take the situation as it is now, not how it might be in five years.


> I don't think Pypy itself qualifies as a mainstream enough language for them to build BIND on.

Minor pedantry: PyPy is not a language. It's an alternative compiler/runtime for Python.


I was echoing the terms used in the post. I thought it was clearer than going into a long comment about how Python is mature, at least in its CPython implementation, but the PyPy implementation is not, and when the blog post said "language" they really meant "implementation" because language is not implementation, yada yada yada.


Performance isn't a quality that is either present or not present; it's a quantity you can measure with a number, as you did. It's true that PyPy is much faster than CPython, but it's also true that C++ is much faster than PyPy. CPython is normally about 2 orders of magnitude slower than C++ on computationally intensive workloads, and if PyPy gets that down to a single order of magnitude, that's great... but not enough here.


[citation needed] You're mostly talking nonsense. PyPy is usually 20-100x faster on the numeric workloads I measured than CPython, while the difference is significantly smaller on non-numeric workloads. The difference between C++ and CPython is also smaller.


I think the issue is that the percentages may be measured in lines of code.

If one language is more verbose, counting lines of code will throw off conclusions about how much functionality is implemented using each language.


C/C++ and Python is a nice combination and, from what I hear, even a best practice when optimizing Python. I distinctly recall Python developers promoting the idea of first analyzing performance and then rewriting the biggest resource hogs in C, thus getting the best of both worlds. I assume C++ is just as useful here.

It would be very interesting to see which code paths run in Python vs. C++. I suspect that the 17% Python code actually covers around 80% of all possible code paths, while the 75% C++ code covers just 5-15% of them. One way to check would be to look at the test suites and see which is bigger, the Python tree or the C++ tree.


A good overview of BIND 10's architecture: http://jpmens.net/2012/12/21/completely-different-bind-10/

It seems that everything apart from the performance-critical parts is written in Python 3.1.


Wow! Someone at the ISC has finally learned what Unix means. I've not used BIND for years after being burned by its security issues back in the '90s. The 'one big monolithic' program was a terrible idea. I used djbdns for years; even with its shortcomings in some areas, the modular design kept a bug or security issue in one area from killing the entire system. Hopefully with BIND 10 we'll be able to use user separation (or at least enough SELinux) to keep each program in its own security 'domain'.


BIND 10 is not the Unix way. In BIND 10, the use of separate processes is just an implementation detail. The Unix way is about presenting small programs to the user that can be composed. To the user, BIND 10 is actually more monolithic than BIND 9 because it needlessly includes a DHCP server. The Unix way is also about human-editable text files - SQLite zone files and icky JSON config files are the antithesis of that.

I'm not saying the separate-process design of BIND 10 is bad (to the contrary it's good for security), but using multiple processes internally is only Unix-y in a superficial way.


I think it's interesting that they did consider C and then decided it was too much of a risk. There's a constant theme of 'why use C++? C is faster and doesn't suck as much!' especially due to Linus' statements on the language, but often C is a technical risk as there's so much more that can go wrong. I'm not saying C++ is a better language than C, but it's definitely different and given the language is structured to be less of a headache I'm surprised it's often shot down over C anyway.


It's a tangent, but: has Linus ever given a presentation that involved actually discussing code? Perhaps code projected on a screen, even part of a slideshow?

It's odd that he's such an accomplished engineer but every presentation I've seen him give involves saying outrageous things while starry-eyed geeks stare at him adoringly. If it weren't for the fact that he's arguing from a position of (very great) authority, he would persuade very few people of anything.


Ah, but he has earned this position of authority by actually delivering. He did not get that position of authority randomly, nor by an act of nepotism nor by making promises during an election.

First, by delivering working code that ended up running the majority of phones and smart devices out there. Also, by revolutionizing source code control. No, he didn't invent (almost) any of the concepts behind git, and by now the majority of the code was not written by Linus. However, he was able to strike a balance between features, usability, speed and working model that DID revolutionize version control. Monotone pioneered a lot, but was too slow and cumbersome; so was Bazaar without pioneering much. BitKeeper had a lot of things going for it, but freedom and price working against it. is more or less on par with git, but it's git that brought the revolution.

Second, by being able to successfully manage more than one huge project with hundreds of contributors, all of whom he can fire at, but which he didn't actually hire (nor can he, if he needs more work).


Well, I've not seen any, though that doesn't mean he hasn't given them. I'm fairly sure he has said in interviews that he mostly does email and patch wrangling rather than actual coding these days.

He's loud and opinionated and one of the tech folk heroes, so his opinion is going to be listened to and often parroted, even if it's just his opinion rather than a technical fact.


Linus does not code much these days, at least not in the kernel. Writing emails and git-merging is most of his job.

If Linus is coding, it is probably more on his scuba diving tool than the Linux kernel.

https://github.com/torvalds/subsurface


> It's odd that he's such an accomplished engineer but every presentation I've seen him give involves saying outrageous things while starry-eyed geeks stare at him adoringly.

Why is this? Because the man delivers.


Presentations are not his job. I remember when, in 1992, he tried to compile the kernel in g++ mode. Compiling C in C++ mode was a way to detect more errors and to support migration toward C++. A lot of nasty bugs in g++ were discovered thanks to Linus. His main concern was that the code generated by g++ was a lot slower than that generated for C. Many optimisations were not possible due to the language.


> as there's so much more that can go wrong

Uh. Everything that can go wrong in C can go wrong in C++ by definition.

C++ also adds a lot of extra things that can go wrong which don't exist in C. There are plenty of arguments to be made for C++ over C but "so much more that can go wrong in C" is not one of them.

The C++ FQA is strongly recommended reading: http://yosefk.com/c++fqa/


>but often C is a technical risk as there's so much more that can go wrong.

That is objectively incorrect. There is much less that can go wrong with C, as there is much less, period. Everything that you can do incorrectly with C, you can do incorrectly with C++, plus 10 times more things that C++ introduced. The idea that C++ is safer because "we just won't do dangerous stuff" is silly, as any language is safe if you "don't do dangerous stuff", including C.


Alright, I probably phrased it wrong. C++ was designed with specific safeguards against a number of common issues that can occur in C code. It's still more than possible to go horrifyingly wrong in C++, but the language has been designed (well, 'ish', it's still C++) to help avoid simple issues. The problems then come from the fact that it's still a fairly complex language and developers are often going to make mistakes.

All languages are safe if you've written perfect code, but no one is perfect. C++ does try to catch some of the lower-hanging-fruit problems, but if you're writing hoary code you're going to blow your foot off eventually.


And with C you can use static analysis tools to avoid simple issues as well, plus you get a simple language so you aren't making the mistakes that you get with a huge complex language. OpenBSD is entirely C, and they have a far better track record than most C++ software.


C++ allows you to

- use safe arrays with bounds checking if you feel like it

- use automatic memory management

- pass arguments by reference and be sure they point to valid data

- use proper strings without caring if the null character is missing

C++ is only dangerous thanks to the C compatibility legacy.

Don't use C'isms and the application will be a lot safer than doing pure C coding.
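A minimal sketch of the facilities in that list, using only the standard library. The function names are made up for illustration, and note that a reference only rules out a null argument; it can still dangle if the referenced object dies first.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Returns true if reading index i from v is rejected by the bounds check.
bool read_is_rejected(const std::vector<int>& v, std::size_t i) {
    try {
        (void)v.at(i);   // at() throws std::out_of_range on a bad index
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}

// Takes the vector by reference: no null check needed, no copy made.
int sum(const std::vector<int>& values) {
    int total = 0;
    for (int v : values) total += v;
    return total;
}

// std::string stores its length, so an embedded '\0' does not truncate it.
std::size_t length_with_embedded_nul() {
    std::string s("ab\0cd", 5);
    return s.size();
}

// Usage: the vector frees its own storage when it goes out of scope.
void checked_access_demo() {
    std::vector<int> v{1, 2, 3};
    assert(sum(v) == 6);
    assert(!read_is_rejected(v, 2));   // last valid index
    assert(read_is_rejected(v, 3));    // one past the end
}
```

Plain `v[i]` remains unchecked, so the safety here is opt-in, which is the "if you feel like it" part of the list.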


I am not sure about the reference passing, but the rest you can do in pure C, too.

You can bound-check your arrays at run-time, by wrapping them in a struct and only accessing them with functions.

There's even a proper (although conservative) garbage collector for C, while C++ usually boils down to reference counting.

Even in C you are not restricted to the standard library to handle strings. C doesn't have to mean null-terminated strings. (And in C++, too, string literals give you the old-time bad strings.)


> You can bound-check your arrays at run-time, by wrapping them in a struct and only accessing them with functions.

This is no longer a data type seen as a language type, but an Abstract Data Type as known in Computer Science.

You are no longer using arrays, but a data structure made by yourself.

> There's even a proper (although conservative) garbage collector for C, while C++ usually boils down to reference counting.

If you are referring to the Boehm-Demers-Weiser GC, it also works in C++.

C++11 also has a GC API.

> Even in C you are not restricted to the standard library to handle strings. C doesn't have to mean null-terminated strings.

The moment you do this, you are the odd one in town, as all libraries expect C-style strings as input. So it is a conversion party any time you need to call those functions.

> (And in C++, too, string literals give you the old-time bad strings.)

Yeah, this is a consequence of C's compatibility that infected C++.


> This is no longer a data type seen as a language type, but an Abstract Data Type as known in Computer Science.

Yes, it's no longer a built-in data type. But non-builtin-types are perfectly fine, too. By the way, how do you get bounds checking in C++? I guess you use the same technique, but C++'s overloading makes it syntactically easier to hide that you are not using a built-in?


Yes, but that minor difference makes a huge difference in usability.

Thankfully C++ is powerful enough that user types have the same rights as built-in ones.
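To make the point about overloading concrete, here is a rough sketch of a user-defined array whose `operator[]` checks the index. `CheckedArray` is a hypothetical name, not a standard type; `std::array::at` or `std::vector::at` would be the off-the-shelf route.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical fixed-size array whose operator[] checks the index.
template <typename T, std::size_t N>
class CheckedArray {
public:
    T& operator[](std::size_t i) {
        if (i >= N) throw std::out_of_range("CheckedArray index");
        return data_[i];
    }
    const T& operator[](std::size_t i) const {
        if (i >= N) throw std::out_of_range("CheckedArray index");
        return data_[i];
    }
    constexpr std::size_t size() const { return N; }
private:
    T data_[N] = {};   // elements start out value-initialized
};

// Usage reads like a built-in array, but an out-of-range index throws.
bool checked_array_demo() {
    CheckedArray<int, 4> a;
    a[0] = 10;
    a[3] = 40;
    assert(a[0] + a[3] == 50);
    try {
        a[4] = 1;          // one past the end
    } catch (const std::out_of_range&) {
        return true;       // the bad write was caught
    }
    return false;
}
```

The same check in C would live in accessor functions such as a hypothetical `array_get(a, i)`; the difference is purely in how the call site looks.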


C allows those things too, that's the point. Saying "I choose to code unsafely in C, and safely in C++, therefore C++ is safe and C is dangerous" is dishonest. And C++ has plenty of its own dangerous aspects, not just C compatibility.


> And C++ has plenty of its own dangerous aspects, not just C compatibility.

All of them would go away if C++ wasn't made to be C compatible.


I guess you might as well use D, then?


> Everything that you can do incorrectly with C, you can do incorrectly with C++

This isn't true. C++ doesn't allow certain implicit casts that C allows (e.g., from void* to T*).
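As a small illustration of that difference: in C, assigning the result of `malloc` to an `int*` compiles without a cast, while a C++ compiler rejects it. The helper name `make_int` below is invented for the example.

```cpp
#include <cassert>
#include <cstdlib>

// Allocates one int with malloc and stores value in it.
// Returns nullptr if the allocation fails; the caller frees the result.
int* make_int(int value) {
    void* raw = std::malloc(sizeof(int));
    if (raw == nullptr) return nullptr;
    // int* p = raw;                   // fine in C, a compile error in C++
    int* p = static_cast<int*>(raw);   // C++ wants the conversion spelled out
    *p = value;
    return p;
}

// Usage: allocate, read back, release.
void void_cast_demo() {
    int* p = make_int(42);
    assert(p != nullptr && *p == 42);
    std::free(p);
}
```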


For core services like BIND I don't see any good reason to write them in more than one language (or, actually, in anything other than C/C++) and introduce cumbersome dependencies because of that. I wouldn't complain if it were just another DIY name server toy created for whatever reason (how much time would it take..., I can do it!, etc.), but it's software that will almost certainly become widely deployed in the coming years.

How much more work would it take to (re)write that 17% of Python in C++? Would it make the code that much worse in terms of quality and manageability? Having a coherent, one-language codebase is a good feature in its own right too, and often improves the above-mentioned factors.


You can look at the source yourself.

I don't understand it either. Randomly opening parts of both the Python and the C++, the code is certainly not complicated, and they write the Python in a C style anyway.

I'm not a C++ coder and I only play with Python now and then, but can't say the code impressed me much. It's actually pretty hard to skim the code because it's massively over-commented and there are far too many tiny 2-line private functions that are called by exactly one other 2-line private function that is called by exactly one other 2-line function. Or Holographic code as John D. Cook called it[1].

And the copyright notice in every source file is extremely irritating.

Far too high noise to code ratio for my tastes, so I got bored before I could really 'see' how the Python had helped.

But I can't judge that well, as I'm not sure whether that's all just a by-product of using C++ and of being an open source project that comes out of a committee.

[1] http://www.johndcook.com/blog/2012/01/09/holographic-source-...


In general, abstracting code out into functions can be useful, even when the new function is only called once.

That is because functions are a well-understood abstraction. Having code separated into functions makes it easier for the reader to deduce the coupling points: two blocks of code, one after another, can have all kinds of weird dependencies, e.g. the first block might set some local variables that the second one relies on. But functions just have arguments and return values.


Obviously.

In my experience working with other people's code and maintaining large code bases, overly nested functions cause a lot of problems when you're trying to read or debug code.

You often also see problems where essentially dependent functions drift apart in the code as people accidentally add new functions between them.

The article I linked is good, John describes it well. It's a nightmare to work with when you get triple or quadruple nesting of tiny functions, like in the code of this program. It's totally unnecessary.


I guess it also depends a bit on the language you are working in. Some languages have an easier time dealing with functions. In, say Haskell, having quadruple nesting of tiny functions isn't too much of a problem---and if you are pedantic, is the only way to create a function with four arguments.


Go seems like it would have been perfect here, but of course, the language wasn't even a glimmer in Robert Griesemer, Rob Pike, and Ken Thompson's collective eye yet.


Obviously I can't speak for these guys, but if I were designing it the fact that Go uses a garbage collector is a pretty strong minus. I like GC'd languages for a lot of tasks, but if I care about performance to the extent that they need to with BIND, I'm not going near it.


Interestingly, they actually wanted a garbage collector:

"The language had to address most of the problems with C. Ideally this meant something with good string handling, garbage collection, exceptions, and that was object oriented."


Yeah, I saw that, and my WTF sensor lit up. DNS isn't trivial, but it seems like a sufficiently well-explored problem that memory lifecycles should be pretty well-understood.


Quoting from the article, with my replies:

> String manipulation in C is a tedious chore.

Use a dynamic strings library, as Postfix does, for instance, and pretty much everything else.

> C lacks good memory management.

It's strange, then, that you went for C++ for most of your code, which is not immune to problems from this point of view. I could understand that point if you were opting for a language with GC support. With C you can easily get better (that is, safer) than C++ native MM just by building a reference counting system on top of your C "objects". This is trivial and it is what Redis, Tcl, CPython, and many others are doing.

With Redis memory leaks or memory management never was a big issue.
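For readers who have not seen the pattern, a reference count layered on a C-style object might look roughly like the sketch below. It is written in the C subset of C++ so it compiles alongside the other examples; `Buffer`, `buffer_retain` and `buffer_release` are hypothetical names, not the API of Redis, Tcl or CPython.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical C-style object with an embedded reference count.
struct Buffer {
    int refcount;
    std::size_t len;
    char* data;
};

// Creates a buffer holding a copy of text, or returns nullptr on failure.
Buffer* buffer_new(const char* text) {
    Buffer* b = static_cast<Buffer*>(std::malloc(sizeof(Buffer)));
    if (b == nullptr) return nullptr;
    b->len = std::strlen(text);
    b->data = static_cast<char*>(std::malloc(b->len + 1));
    if (b->data == nullptr) { std::free(b); return nullptr; }
    std::memcpy(b->data, text, b->len + 1);
    b->refcount = 1;   // the creator holds the first reference
    return b;
}

// Registers one more owner.
void buffer_retain(Buffer* b) { ++b->refcount; }

// Drops one reference; returns true if the object was freed.
bool buffer_release(Buffer* b) {
    assert(b->refcount > 0);
    if (--b->refcount > 0) return false;
    std::free(b->data);
    std::free(b);
    return true;
}
```

Nothing in the language pairs each retain with a release, so correctness rests on convention and review.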

> Error handling is optional and cumbersome.

Exceptions mostly suck, and in system software the only sane way to deal with errors is C-alike IMHO, that is, check the return value / error returned by every function and act accordingly.

> Encapsulation and other object-oriented features must be emulated.

This is not an objective point, since many think that these features are actually a problem.

Weak points IMHO, and C++ plus Python, with Python in the minority, looks like a design error.


So I used to agree with you, but I've done extensive performance critical C development at Facebook (memcached, a new thing we are about to talk about) and I have also done extensive performance critical c++11 development (our layer 7 load balancer for http) and I'd have to say that c++11 is the far superior option if you really, truly understand what the compiler is doing to your code (a huge caveat).

Unique_ptrs are a total game changer, and the ability to use closures and lambdas when you want to set a callback function, instead of a function pointer with a context pointer you have to cast and decode, is absolutely huge for readability. Maybe we aren't getting every last bit of performance out of it that we could with C, but it works at our pretty ridiculous scale, so I think C might have been a premature optimization for us.
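A hedged side-by-side of the two callback styles being compared, plus a `std::unique_ptr` handing over ownership. `Request`, `finish_raw` and `finish` are invented for this sketch and have nothing to do with the actual Facebook code.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>

// C style: a function pointer plus an untyped context the callee must cast.
using RawCallback = void (*)(void* ctx, int status);

void on_done_raw(void* ctx, int status) {
    *static_cast<int*>(ctx) = status;   // both sides must agree on the type
}

void finish_raw(RawCallback cb, void* ctx) { cb(ctx, 200); }

// C++11 style: the lambda captures typed state directly.
void finish(const std::function<void(int)>& cb) { cb(200); }

// Hypothetical request type owned through a unique_ptr.
struct Request {
    std::string path;
    int status;
};

int callback_demo() {
    int raw_status = 0;
    finish_raw(on_done_raw, &raw_status);

    std::unique_ptr<Request> req(new Request{"/index", 0});
    finish([&req](int status) { req->status = status; });
    assert(raw_status == req->status);

    // Ownership moves explicitly; the old handle is left empty.
    std::unique_ptr<Request> owner = std::move(req);
    assert(req == nullptr);
    return owner->status;   // the Request is freed when owner goes out of scope
}
```

`std::function` can add an indirection and sometimes an allocation, which may be part of the small CPU cost mentioned above; a template parameter for the callback avoids that at the price of more code in headers.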


First you are saying c++11 is the far superior option (with that one caveat) and in the last sentence you imply that C would still be faster. Can you clarify? I'll add my opinion too: I think all the readability you can get out of c++, will be wasted in layers upon layers of object oriented design and C compiles much faster, so there is that.


C++11 is the superior option because it is easier to write correct code with it, plain and simple. And we aren't talking about orders of magnitude difference in throughput or latency, we're talking about a slight increase in CPU idle.

And layers upon layers of object oriented crap is also a problem in C (I've seen it). At the end of the day I've just been burned more by the complexities of building large things in C (particularly when people do reference counting in C) than I have been by the complexity of c++ in general.


"Unique_ptrs are a total game changer"

You meant that a linear type system is a total game changer. Unique_ptrs are an ugly hack to emulate a linear type system in an inadequate language.


> With C you can easily get better (that is, safer) than C++ native MM just building a reference counting system on top of your C "objects".

Why build your own when C++ has std::shared_ptr?
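For comparison with a hand-rolled count, a short sketch of what `std::shared_ptr` does on its own: copies bump the count and destructors drop it. The function name is made up for the example.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Returns the use count observed while a second owner is alive.
long shared_owner_count() {
    std::shared_ptr<std::string> a = std::make_shared<std::string>("zone");
    assert(a.use_count() == 1);
    long during = 0;
    {
        std::shared_ptr<std::string> b = a;   // copying bumps the count
        during = a.use_count();
    }                                         // b is destroyed, count drops
    assert(a.use_count() == 1);
    return during;                            // the string is freed when a dies
}
```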


I was referring to how to do it with C.


Hi antirez

C doesn't have a destructor that gets called when something goes out of scope. That's taken advantage of in C++ (RAII) to implement various things (like scoped_ptr that helps avoid leaks).

Refcounting is also not perfect for every case.

But I like and prefer C. ;) Just pointing out one thing different in C++.
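The destructor point can be sketched in a few lines. `OpenGuard` is a hypothetical class that stands in for anything that must be undone on scope exit (a lock, a file handle, an allocation).

```cpp
#include <cassert>

// Hypothetical guard: bumps a counter on construction, undoes it on destruction.
class OpenGuard {
public:
    explicit OpenGuard(int& open_count) : open_count_(open_count) { ++open_count_; }
    ~OpenGuard() { --open_count_; }
    OpenGuard(const OpenGuard&) = delete;
    OpenGuard& operator=(const OpenGuard&) = delete;
private:
    int& open_count_;
};

// Every exit path, including the early return, runs the destructor.
int process(int& open_count, bool bail_early) {
    OpenGuard guard(open_count);
    assert(open_count > 0);
    if (bail_early) return -1;   // no explicit cleanup needed here
    return open_count;           // value is read before guard is destroyed
}
```

In C the equivalent cleanup has to be written on each exit path, commonly with a `goto cleanup` label.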


How is a C implementation of std::shared_ptr safer?


Does anyone know of large projects where C++ handles memory allocation failures gracefully? For services that need to stay up and never crash under memory pressure, I would think C is the way to go. It's too hard to reason about control flow with exceptions.


You don't have to use exceptions in C++. You can do stuff the same way as in C by checking the return pointer.

I've worked on C++ apps that had built-in garbage collection (basically asset (geometry) paging) for huge amounts of data that would allocate/free/page on demand based on what was going on.
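Checking the returned pointer, as described in this comment, is possible with the standard `std::nothrow` form of `new`. The sketch below uses invented function names and only covers code you write yourself; standard containers still report failure by throwing `std::bad_alloc`, which is the concern raised in the reply.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Allocates n zeroed ints without throwing on allocation failure;
// returns nullptr instead, much as malloc would.
int* try_alloc_ints(std::size_t n) {
    return new (std::nothrow) int[n]();
}

// The caller checks the pointer and picks its own recovery path.
bool fill_if_possible(std::size_t n, int value) {
    int* p = try_alloc_ints(n);
    if (p == nullptr) return false;   // degrade gracefully instead of unwinding
    for (std::size_t i = 0; i < n; ++i) p[i] = value;
    assert(n == 0 || p[n - 1] == value);
    delete[] p;
    return true;
}
```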


Sure, but then you can't use the standard library or any other libraries. Even a single exception sneaking into the codebase breaks everything.


We did - loads of Qt and std:: stuff - admittedly we used a custom memory allocator for the std:: stuff, but it worked fine.


Too performance critical for a GC! Reference counted all the way.


I'm finding that Cython[1] can be a very convenient way to speed up the critical parts of your Python. Anyone who considers porting parts of their app to C++ should give it a look.

[1] http://www.cython.org



Did they consider C with a pool allocator library like talloc? It's a good fit for servers (cf. Samba using talloc, and Apache which uses another pool allocator).


Nice to see more projects moving into more expressive and safer languages.


I'm not sure that I'd call dynamic languages like Python safer, it depends on context. Compared to C/C++ you lose static typechecking but gain a safer memory model.


Anything is better than C.


You haven't used PHP much? Or MUMPS?


PHP is not a compiled language for systems programming.

As for MUMPS, as far as I know it is only used in the US.


Oh, I wasn't aware that you only talked about those. MUMPS is also not for systems programming, as far as I know.

To nitpick a bit, compiled or not is more a property of the implementation than of the language itself. Of course, some languages are more commonly compiled than others. But, don't Facebook have a PHP compiler?


> But, don't Facebook have a PHP compiler?

Yes, but I don't see PHP as a possible systems programming language, even with a compiled implementation.

Uhm, dreams of device drivers written in PHP...


> "C lacks good memory management"

C lets you manage memory without possibly inefficient or even broken magical black box automated processes or garbage collectors. Maybe what the author meant to say was "C lacks easy memory management."


Funny you mention "black box automated process" because, what do you think malloc is?

There are two ways of allocating memory directly from the kernel (IIRC), brk and mmap.

There's some management malloc does, and whether it's good or not depends on your application.

For example, size, number and behaviour of your allocations. Depending on your situation you may want to do your own memory management.


When I worked on embedded systems, we didn't use malloc() at all, just some heap functions. Even after that, there have been times where I worked on projects where there was a single malloc() and that was turned into a memory heap.

However, a call to malloc() is pretty straightforward: you expect it to return a pointer to the allotted memory, or not. Using a GC or the Boost libraries is not quite as straightforward, and the black box is a lot bigger.


You're kind of reinforcing the OP's point though. C does lack easy memory management, unless you're ready to rely on external GC libraries.


Consider realloc. Seems like a sensible function, with its ability to extend memory blocks in-place. Except, you can't try to extend a memory block in-place, but not move it if it can't be extended.

malloc and friends really are quite a poor way of managing a memory space if you want to do even slightly clever things, I find, unfortunately.


There are plenty of malloc alternatives (tcmalloc, jemalloc, etc.), and yet I'm not aware of any that has bothered with a realloc_only_if_you_dont_need_to_move(). I'm not aware of any higher-level language construct that reduces to that, where your program flow changes depending on details of the system's memory management. If this were an interesting case, I would have thought someone would have implemented it somewhere.

Seems to me that you're making a weird strawman argument there.


This isn't a detail of the system's memory management, this is "all my pointers just moved to a different place". You can't really get something more major!

You can't use realloc on C++ types (which typically need their constructors/destructors run without their memory address moving underneath them). I've written C types which had similar behaviour, and were not happy about being moved. Of course you can (and people do) write code which will go through and do fix-ups after the move, but it is often more pleasant to do the move yourself, if an in-place extension isn't going to work.
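A rough sketch of the hazard under discussion: standard `realloc` either extends the block or moves it, and the caller cannot ask for "extend in place or fail". `IntBuf` and its helpers are hypothetical, and the approach is only valid for trivially copyable element types such as `int`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical growable int buffer built on realloc.
struct IntBuf {
    int* data;
    std::size_t len;
    std::size_t cap;
};

// Appends a value, growing the block if needed. Returns false on failure.
// After a successful grow, buf->data may point somewhere new.
bool intbuf_push(IntBuf* buf, int value) {
    if (buf->len == buf->cap) {
        std::size_t new_cap = buf->cap ? buf->cap * 2 : 4;
        void* grown = std::realloc(buf->data, new_cap * sizeof(int));
        if (grown == nullptr) return false;   // old block is still valid
        buf->data = static_cast<int*>(grown);
        buf->cap = new_cap;
    }
    buf->data[buf->len++] = value;
    return true;
}

// Keeping an index rather than a pointer survives any number of moves.
int first_after_growth(std::size_t pushes) {
    IntBuf buf = {nullptr, 0, 0};
    std::size_t first = 0;                 // index, not &buf.data[0]
    for (std::size_t i = 0; i < pushes; ++i) {
        bool ok = intbuf_push(&buf, static_cast<int>(i) + 100);
        assert(ok);
    }
    int value = pushes ? buf.data[first] : -1;
    std::free(buf.data);
    return value;
}
```

Any raw pointer into `buf.data` taken before a push has to be treated as invalid afterwards, which is the fix-up burden the comment describes.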


The magic of C is that no one is making you use realloc() if you don't want to. realloc() is also written in C. You can write your own, and people do, all the time.


Unless you're the most complete full stack engineer ever, at some point you're going to be relying on a magic black box. The trick is to know what is acceptable as a black box and what is not. At this point in time, I'd say most memory management systems are competent enough to warrant usage.


Obj-C anyone? Seems to meet all their requirements - OO, exception-handling, memory management via either ref counting or GC, and it's a C-superset so C-based optimizations would be easy. Maybe the run-time is too large and/or complicated? Although how it could be more complicated than a C++ runtime I don't know. Would Apple being the primary driving force behind its development be a problem?


Sad. They didn't do their homework. They basically had the exact same requirements of the commercial video game industry which uses C/C++ and Lua to solve this problem. Lua had already long asserted its dominance by 2006 so it wasn't a secret. And they should have done a lot better than only 17% of the code in Python.


Based on the given criteria, I'm wondering why Java wasn't considered.


Probably because of memory usage and garbage collection, and the fact that Java needs to be installed on most systems, whereas C++ needs no dependencies and Python comes preinstalled on the usual Linux server (probably not Python 3, but I assume it will be some years before BIND 10 sees adoption for such critical infrastructure). Also, he mentions specialized data structures and memory management.

At least I wouldn't sleep well if I knew that the heart of the internet was running Java :P


> Probably because of memory usage and garbage collection and the fact that java needs to be installed on most systems whereas C++ needs no dependencies

Well, GCC (as in the compiler collection) has had an ahead-of-time compiler for ages:

http://gcc.gnu.org/java/

There is also work in the LLVM camp on AOT Java compilation:

http://vmkit.llvm.org/

> Atleast i wouldn't sleep well if i know that the heart of the internet was running Java :P

Well lucky you, only many other vital body parts are running on the JVM via Java, JRuby, and increasingly Scala ;).


Those implementations have quite weak garbage collection implementations (Boehm conservative, I think, but I'm not sure), which would just kill the performance. The HotSpot JVM has a very sophisticated incremental generational garbage collector, which has _very_ good performance. I'm pretty sure they had their reasons for not using Java (I actually have mine too), but garbage collection is not one of them.


Indeed. I was arguing against the 'you need the JVM as a dependency'. But then Go is all the hype these days, and people also use Go for network applications, which also has a weak GC ;).


Go generates a _LOT_ less garbage than Java, since you can control the layout of your structures. That's why its GC has less impact on performance than the JVM's.


There are many more commercial AOT solutions for Java

- IBM J9

- Aonix PERC

- Oracle Squawk VM

- Oracle Embedded Java

- Excelsior JET

- Avian

- RoboVM

- ...


What parts are running on the JVM that are as vital as BIND?


Which ones? I'd say DNS is probably the most important part. I also don't believe that core routing equipment is running on the JVM.

Apparently you are talking about the "Web"; I was talking about the internet or networks in general. I'm aware that Tomcat may have a very large installation base. There is no use for your Tomcat if the DNS is down, though :P



Very cool, but not really related to core internet functionality.

The truth is, Java is viewed as "not Unix friendly" by many people, whether deserved or not. I suspect the ISC folks fall squarely into this camp. My personal and thus purely anecdotal experience is near-universal hostility from network admins.

Despite this, there is a fair amount of network management software written in Java. OpenNMS and Cisco Prime Networks (an absolute beast) both spring to mind.


Yeah, cultural issues are very hard to fight and usually require a generation change.

I'm convinced that unless a big OS vendor forces a change in its default systems programming language, nothing will change.

This is why I like Microsoft's decision to drop C on Windows.

What I would like is to see a proper systems programming language along the lines of Modula-3/Active Oberon/C#/D/Rust or similar being used.

Time will tell when this happens, but it will require a few generations of developers for the mentality change to take place.


Neither is a particularly viable option: GCJ is nearly a decade behind mainline Java in terms of feature support. VMKit seems cool but I'm not sure about the maturity of the project.


> wondering why Java wasn't considered and 70% of the internet's backbone not risked to vulnerabilities...


Okay. Maybe Ada is a better choice.


Here's what I take from this: "C++ is by no means an easy language to work with, so the idea is that we will avoid its complexity when possible."


Avoiding its complexity does not mean avoiding it. E.g. you could avoid template metaprogramming due to its complexity whilst keeping to simpler areas of the language.

C++ is a large, multi-paradigm language. It gives you choices and one is free to abuse those choices. But having more options gives you more power to express.

I think with modern C++ style, boost and C++11 its "difficult-to-work-with" reputation is massively overstated. It is possible to write succinct code with good design and get massive performance benefits.


I hope that OpenBSD team will fork it (BIND 9).


I think the plan is to drop it entirely and replace it with nsd.


This doesn't look good for Python:

"Whenever possible, we use Python"

"When necessary, we use C++"

"As of right now, it ends up that about 75% of our code is C++ and 17% is Python (link) since it turns out that a lot of BIND 10 is performance-critical."


Why doesn't it look good for python? As far as I'm aware, Python has never been advertised as a top-performance language, and part of the advantage has always been that you can rewrite performance-critical paths in C.

Add to this the fact that BIND is pretty performance-intensive, and I don't think this makes it look bad at all.


Because the stated goal was to use Python "whenever possible" and judging by the python2 bashing at the end of the article, the author is a true believer yet the best he could do was 17% of the total code.

This also clashes with the belief in the Python community that performance is not an issue because you just rewrite those few critical sections in C/C++. How's 75% as one possible definition of "few"?


What stupid comparison is this? Of course there will be more lines of code in C++, but I bet you that 10 lines of Python do more than 10 lines of C++.

As another comment puts it: How much more C++ would there be without python?


>Because the stated goal was to use Python "whenever possible" and judging by the python2 bashing at the end of the article, the author is a true believer yet the best he could do was 17% of the total code.

Of the total code of a very specific application, with very specific performance needs.

For other applications, including servers, the ratio could vary widely.


If they're comparing by lines of code, the 17% that's in python may well contain most of the logic.


Bashing might be a bit harsh. Looks like the author is expressing legitimate concern about the proliferation of Python 2 over 3 and, let's face it, there are a lot of differences while 3 "is the future" as it says.

The percentages may or may not be simply a case of C++ verbosity, but without actually browsing the source, we shouldn't jump the gun on exactly what percentage of "heavy lifting" C++ is doing vs. Python.


>Why doesn't it look good for python?

Because it directly contradicts the "only 20% of your code is performance sensitive and the other 80% can be scripting language X" nonsense that scripting language apologists constantly parrot with no evidence. This is a good example of how scripting languages are in fact not well suited to application development, and should instead be used for scripting.


The question isn't how much Python there is relative to C++, the question is how much C++ there would be if there were no Python.

To put it more concisely: "only 20% of your logic is performance sensitive and the other 80% can be encoded in scripting language X leading to a significant reduction of your total code."


It doesn't lead to a significant reduction though, it is hardly any reduction at all. Look at their code, the python could be replaced with C++ at an almost 1:1 ratio.


I would argue that BIND isn't "merely" an application: it's almost kernel-level in what it needs to do and how it needs to respond. At a minimum, it's system-level—but definitely not application-level.


There is nothing almost kernel-level about it. It is an application. You seem to be trying to draw a very arbitrary distinction between applications and applications that you want to consider special for no particular reason.


Or you should use a fast Python interpreter instead of the slow one. The 20/80 (or 10/90 or so) is mostly nonsense in my experience too.


How many lines of C++ would be necessary to replace each Python line? Let's make a table:

  Expansion   C++/Expanded     C++   Expanded
  1 to  1:    75/ 17   -->     75%   17%
  1 to  3:    75/ 51   -->     56%   38%
  1 to  5:    75/ 85   -->     45%   51%
  1 to 10:    75/170   -->     30%   67%

The LOC count doesn't measure how many features are implemented in each language. If a language is more verbose or needs more detailed management, the same features will appear to be longer.
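
The table above can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes the remaining 8% of the codebase (neither C++ nor Python) stays fixed, which is what the numbers in the table imply, and the names `Shares` and `sharesAfterExpansion` are made up for illustration.

```cpp
#include <cmath>

// Percentage shares of the codebase after a hypothetical rewrite.
struct Shares {
    int cpp;       // existing C++ share, rounded to a whole percent
    int expanded;  // share of the C++ that replaced the Python
};

// cppPct, pyPct: current shares in percent (75 and 17 in the article).
// factor: assumed number of C++ lines needed per Python line.
// Assumption: the rest of the codebase (100 - cppPct - pyPct) is unchanged.
Shares sharesAfterExpansion(double cppPct, double pyPct, double factor) {
    const double other = 100.0 - cppPct - pyPct;
    const double expanded = pyPct * factor;
    const double total = cppPct + expanded + other;
    Shares s;
    s.cpp = static_cast<int>(std::lround(100.0 * cppPct / total));
    s.expanded = static_cast<int>(std::lround(100.0 * expanded / total));
    return s;
}
```

Calling `sharesAfterExpansion(75, 17, 3)` gives the "1 to 3" row: the existing C++ drops to 56% and the rewritten Python grows to 38% of the total.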


I can see what you're saying but it's worth contemplating what percentage of the total codebase that Python code would represent if it were replaced by C++ code.


I often convert python code to C++ and you'd be surprised how often the difference in loc is not that much. Certainly no hassle to do so. Not that I'm suggesting that one always should.

Several Boost libraries are inspired by Python, and C++11 features all help to write C++ code that is surprisingly similar to Python, with a bit of extra type specification. If you think otherwise, I'd suggest you're probably thinking of the C++ of the 90s, more of a C with classes, and not modern C++.
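
As a rough illustration of that similarity (a made-up example, not taken from any of the projects discussed here), a word count in C++11 reads much like the Python version quoted in the comment. The function names `countWords` and `mostCommon` are hypothetical.

```cpp
#include <map>
#include <sstream>
#include <string>

// Roughly the counterpart of this Python:
//   counts = {}
//   for word in text.split():
//       counts[word] = counts.get(word, 0) + 1
std::map<std::string, int> countWords(const std::string& text) {
    std::map<std::string, int> counts;
    std::istringstream words(text);
    std::string word;
    while (words >> word)  // extraction splits on whitespace
        ++counts[word];    // operator[] value-initialises missing keys to 0
    return counts;
}

// Returns the word with the highest count. On a tie, the word that sorts
// first wins, because std::map iterates in key order and the comparison
// below is strict.
std::string mostCommon(const std::map<std::string, int>& counts) {
    std::string best;
    int bestCount = 0;
    for (const auto& entry : counts) {  // C++11 range-based for with auto
        if (entry.second > bestCount) {
            best = entry.first;
            bestCount = entry.second;
        }
    }
    return best;
}
```

The extra lines compared with Python are mostly the type declarations and the braces.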

I often find it useful to develop and prototype in python and convert to C++. Most often I'm doing this because my python simulations can take days and the C++ versions hours. Often I find it is not just the performance critical areas, but it is easy enough just to wholesale convert the lot.


> I often convert python code to C++ and you'd be surprised how often the difference in loc is not that much.

I agree, especially with C++11 and (besides Boost) Qt. It's often the header/code separation that makes things a bit tedious, having to keep function and method signatures in sync. Of course, if you are in template-land, that is not that much of a problem.


Thanks, this is interesting. I'm thinking of getting back into C++ after being forced to work on (of all things) a C project. It's hard to know where to dig in, since actually working with existing code means not playing with C++11.


I'm working on a project right now where we're replacing a Qt interface written in Python (PyQt) with C++ Qt as it's way too slow (interface is really laggy), and other than things like having to occasionally delete objects manually (often, if you just add a widget to a layout, Qt does the deletion for you), it's pretty much a 1-1 mapping.

We're also replacing core op-graph and geometry processing from Python to C++, and again it's close to 1-1 - there's a bit of overhead in the loops - in our coding style we're caching begin iterators on the line before the loop, but other than that it's very close.

And the speed of the app is so much faster it's not even funny.


>in our coding style we're caching begin iterators on the line before the loop

Can you give an example of what this looks like?


  Python:

  for face in mesh.faces():
      faceCentre = Point()
      for v in face.vertices():
          faceCentre.add(mesh.getPoint(v))
      faceCentre.div(len(face.vertices()))

  C++:

  std::vector<Face>::const_iterator itFace = mesh.getFaces().begin();
  for (; itFace != mesh.getFaces().end(); ++itFace)
  {
      const Face& face = *itFace;
      Point faceCentre;
      std::vector<unsigned int>::const_iterator itVertex = face.vertices().begin();
      for (; itVertex != face.vertices().end(); ++itVertex)
      {
          const unsigned int& pointIndex = *itVertex;
          faceCentre += mesh.getPoint(pointIndex);
      }
      faceCentre /= (float)face.vertices().size();
  }

So the C++ is longer, but you've got braces, and the references to the Face and the unsigned int pointIndex are placed in local variables, which makes debugging much easier; they could be inlined.

It's possible to get that down even more using more modern C++: using auto variables and not declaring the start iterator on its own line.

So yes, counting braces, it can be a lot more lines, but if you don't count braces, it's generally not that much more.


I guess what I'm asking is, why are you declaring the start iterator on its own line? What benefit does it have? Or does it just make the for loop line shorter?

Also, as of C++11 you can do this:

  for(const Face& face : mesh.getFaces()) {
  	Point faceCentre;
  	for(size_t pointIndex : face.vertices())
  		faceCentre += mesh.getPoint(pointIndex);
  	faceCentre /= static_cast<float>(face.vertices().size());
  }


Yeah, it makes the for line shorter.

We sometimes hoist the end iterator as well, as often g++ can't optimise out the call to .end() each iteration - it can if there's a ref to a const item and you call end() on that const ref, but otherwise, it generally doesn't as it can't guarantee the item hasn't been modified.

We're still stuck with CentOS 5.4, so g++ 4.1 for us as that's what we've got to deploy to (although we use ICC for production builds, building off the g++ 4.1 standard headers)...

Basically, we want top possible speed - if that means the code's a bit more verbose than it can be, so be it.
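
A minimal sketch of what hoisting both iterators looks like, written in C++98 style to match the old g++ mentioned above. The function name `sumHoisted` and the use of a plain `std::vector<float>` are assumptions for illustration, not the actual project code.

```cpp
#include <vector>

// Sums a vector with both the begin and end iterators hoisted out of the
// loop header, so end() is evaluated exactly once.
// Only safe if the loop body does not add or remove elements, because
// that would invalidate the cached end iterator.
float sumHoisted(std::vector<float>& values) {
    float total = 0.0f;
    std::vector<float>::const_iterator it = values.begin();
    const std::vector<float>::const_iterator itEnd = values.end();
    for (; it != itEnd; ++it)
        total += *it;
    return total;
}
```

Whether this is measurably faster than calling `end()` in the loop condition depends on the compiler and on what the loop body does, so it is worth profiling before adopting it as a rule.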


That's 163 vs 486 characters, so, as I expected you'd need about three times the amount of code to do the same thing in C++.

And just look at the difference in readability/simplicity. I can explain the Python code to my 12 year-old cousin, the C++ version though...


Point taken, but if by this metric the percentage replaced by C++ code is still bigger my original criticism still stands.


It could mean that you need more C++ to do stuff (or less Python code to do stuff, whichever you like).

What I'm saying is that "amount of code" is not a valid measure when comparing different languages.

In my experience converting some unmaintained Perl tools to Python, the Python version was almost as long as the Perl one, but it included more functionality, comments and error handling (I'm not criticising Perl here but the unmaintained code that I had to convert).


Generally you start with Python and 1) get the code correct (the hardest part) and then 2) port the performance-critical parts to C/C++. So even if they ended up with 0% in Python, it wouldn't "look bad". You got the benefit in part 1.

(And as others have said, 17% of LOC in Python might still mean majority of the functionality in Python).


I don't think you are interpreting that correctly. Python is never pushed as "fast". That is exactly how I would use it. Python wherever I could and C++ (or whatever) when performance is crucial.


That is (the official/BDFL) Guido's stance on the matter too - whenever possible, use Python, but switch to C for computationally expensive tasks.

(https://plus.google.com/115212051037621986145/posts/HajXHPGN...)


Why isn't it written in Node.js?


After careful consideration, as the article mentions, ISC decided that a combination of C++ and Python was more appropriate to the goals of the project: safety, stability, established familiarity, speed and, of course, a level of guaranteed future-proof platform availability.

Read: "This is one of the cornerstones of the internet that we didn't want to piss away on novelty".

That sounded rude, and I'm sorry for that, but I don't know how else to make that point cogent.


I think it was a joke.


Because the performance would be so totally awesome that it would melt the servers. And we don't want the internet to melt.


Poe's law?


obvious troll is obvious



