Raphters, a web framework for C (github.com/danielwaterworth)
145 points by DanWaterworth on April 4, 2011 | 64 comments



Interesting, but very minimal. It's basically a wrapper around FCGI (meaning it's doing the old-school "pull request data out of environment variables" thing) with a linked list of regexes matching handlers. That's not nothing; "registered list of regexes matching handlers" is a proven good model for web frameworks and it's handy to have.
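
For anyone who hasn't seen the "linked list of regexes matching handlers" model, here is a rough sketch of the idea in C using POSIX regex.h. This is not Raphters' actual code; the names route, route_add, route_dispatch, and the two placeholder handlers are made up for illustration.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>

/* Hypothetical handler signature: takes the request path, returns an HTTP status. */
typedef int (*handler_fn)(const char *path);

typedef struct route {
    regex_t pattern;        /* compiled POSIX extended regex */
    handler_fn handler;
    struct route *next;
} route;

/* Placeholder handlers, only here so the sketch has something to dispatch to. */
int handle_hello(const char *path) { (void)path; return 200; }
int handle_user(const char *path)  { (void)path; return 201; }

/* Append a route to the end of the list; returns 0 on success, -1 on failure. */
int route_add(route **head, const char *pattern, handler_fn handler)
{
    assert(head != NULL && pattern != NULL && handler != NULL);
    route *r = malloc(sizeof *r);
    if (r == NULL)
        return -1;
    if (regcomp(&r->pattern, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
        free(r);
        return -1;
    }
    r->handler = handler;
    r->next = NULL;
    while (*head != NULL)           /* walk to the tail so order is preserved */
        head = &(*head)->next;
    *head = r;
    return 0;
}

/* Call the first handler whose pattern matches the path; 404 if none does. */
int route_dispatch(const route *head, const char *path)
{
    for (const route *r = head; r != NULL; r = r->next)
        if (regexec(&r->pattern, path, 0, NULL, 0) == 0)
            return r->handler(path);
    return 404;
}

/* Release every compiled pattern and list node. */
void route_free(route *head)
{
    while (head != NULL) {
        route *next = head->next;
        regfree(&head->pattern);
        free(head);
        head = next;
    }
}
```

Registering "^/hello$" and "^/users/[0-9]+$" and then dispatching "/users/42" would run the second handler; first match wins, so registration order matters.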

On the other hand, if I was going to release a web framework, I might do more to abstract request variables (provide a params hash like Rails, or a Rack-style request hash). Also: I'd probably build it on libevent's evhttp instead of demanding that users plug my C code into FCGI.


A little off-topic, but you sound familiar with libevent.

I've never used either but I'm looking at gevent for a project, and I noticed that they switched to libev: http://software.schmorp.de/pkg/libev.html ... do you know what "limitations and bugs" in libevent they speak of? I poked through the mailing list for libev and couldn't find anything specific.


I've never paid much attention to libev. I've been using libevent since 2002 (it was the standard event loop at Arbor Networks, which was in Ann Arbor, where Niels Provos lived at the time). Before libevent, I used ACE_reactor to accomplish the same tasks. I prefer libevent (strongly) to ACE. I might prefer libev to libevent, but... why bother? Libevent works fine.

My guess about "limitations" of libevent is that it's primarily that libevent is anchored to global state and needs to own the event loop for the program. So, for instance, to get it working in Cocoa apps, I have to spawn off a libevent thread. And I can only have one of them.

But this isn't really a big deal for me; once I got the libevent thread working, everything was peachy for me.

In no other context do I ever, ever have any need to do anything funky with how I include libevent; 90% of programs that use libevent are designed from the outset to use libevent and don't benefit from flexibility about the event library.


I have minimal experience with libevent and much more with libev, but I do think that libev is very, very well-engineered. The author is on steroids, you can really notice it just by reading the documentation. The documentation and the code indicate that he knows what he's doing and is very serious about performance. There are also no obvious design flaws in libev that would result in limitations. Want fork? No problem, but be aware of clearly documented issues A B and C. Want threads? No problems, it even documents how, as well as potential issues. Want signals? No problem, and again potential issues are clearly documented.


Niels (of libevent) is also pretty ridiculously talented; it's also hard to dispute that libevent has had more testing.

Libevent vs. libev really seems like a Linux vs. BSD kind of debate. I'd always choose libevent over libev, but you can apparently get a performance improvement (which is probably going to be marginal compared to other simple things you can do to speed up an evented program) by going with libev. And it looks like if you're doing clientside dev, like building a new browser or file transfer client, that libev is easier to embed.


There's also picoev to look into if you're interested in seeking alternatives to libevent. I don't know a huge amount about libevent and it's a bit niche, though. The meinheld WSGI server uses it, and I got nearly double requests/second compared to gevent (libevent). Not that that means much, but it's interesting.


Something like evhttp also has the benefit that you can have more than one request "in flight". FastCGI can theoretically do this, but almost no implementation supports it - and it makes e.g. writing a simple chat server much easier (one process, one thread, tons of connections, data in memory.)


The downside to doing evhttp is that he'll have to write his own request parsing (I think he's relying on Apache/CGI to do that for him now).

But if you're going to do a C web framework (and... really? you sure about that?), that's probably how you should do it.


> The downside to doing evhttp is that he'll have to write his own request parsing...

Which, these days, isn't a huge deal. Both of these are decent:

https://github.com/ry/http-parser

https://github.com/mongrel/mongrel/blob/master/ext/http11/ht...


Request parsing is no big deal, I've already written a json parser/generator for this project (though it's in a separate project atm).


Nothing is set in stone. I do admit that a libevent based asynchronous API is tempting.


This is neat.

Honestly, in addition, I'd like to see something like Rack for C. The "gateway interfaces" for C are too implementation-specific (CGI, FastCGI, SCGI, web server extensions/modules, etc). It would be something more abstract that would run on top of a web server interface.

     void application_main(web_request *request, web_response *response)
     {
          char body[1024];

          snprintf(body, sizeof(body), "Hello, %s", request->param("username"));

          response->status(200);
          response->header("Content-Type", "text/html");
          response->write(body);
          response->end();
     }


As an illustration of why C kind of sucks for web apps, that code is obviously insecure† (it's reflected XSS). To get around that while preserving natural syntax, you want:

  char *hsafe(const char *input);

But where does hsafe get the memory for the string from? It can't use input (the filtered result is larger than the input). Does it malloc? Now you have to free the result. Does it do the inet_ntoa() thing with the static variable? Now you can't chain it (or use threads, but you wouldn't want to do that anyways).

Maybe you can do an arena for each connection, so it's:

  char *hsafe(request_t *r, char *str)

But that's still sort of painful.

In an admittedly artificial way.
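
To make the per-request variant concrete, here is one possible sketch. It assumes a hypothetical request_t that simply keeps a linked list of every string it hands out and frees them all at once in a made-up request_end(); a real arena would carve allocations out of larger blocks, but the ownership story is the same.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One tracked allocation; the string lives in the flexible array member. */
typedef struct alloc_node {
    struct alloc_node *next;
    char data[];
} alloc_node;

/* Hypothetical request type: owns every string handed out during the request. */
typedef struct request_t {
    alloc_node *allocs;
} request_t;

/* Allocate n bytes that stay valid until request_end(). */
static char *request_alloc(request_t *r, size_t n)
{
    alloc_node *node = malloc(sizeof *node + n);
    if (node == NULL)
        return NULL;
    node->next = r->allocs;
    r->allocs = node;
    return node->data;
}

/* HTML-escape str into request-owned memory; NULL on allocation failure. */
char *hsafe(request_t *r, const char *str)
{
    assert(r != NULL && str != NULL);
    /* Worst case: every byte becomes "&quot;" (6 bytes), plus the terminator. */
    char *out = request_alloc(r, strlen(str) * 6 + 1);
    if (out == NULL)
        return NULL;
    char *p = out;
    for (const char *s = str; *s != '\0'; s++) {
        const char *rep = NULL;
        switch (*s) {
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '&':  rep = "&amp;";  break;
        case '"':  rep = "&quot;"; break;
        case '\'': rep = "&#39;";  break;
        }
        if (rep != NULL) {
            size_t n = strlen(rep);
            memcpy(p, rep, n);
            p += n;
        } else {
            *p++ = *s;
        }
    }
    *p = '\0';
    return out;
}

/* Free everything handed out for this request in one sweep. */
void request_end(request_t *r)
{
    while (r->allocs != NULL) {
        alloc_node *next = r->allocs->next;
        free(r->allocs);
        r->allocs = next;
    }
}
```

Calls can be chained freely because nothing is returned from a static buffer, and the caller never frees an individual string. The cost is that the request pointer has to be threaded through every call.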


I think part (far from all!) of the issue is that you (and parent) are not using the right abstraction: The response body shouldn't be a string; it should be a stream:

   write_quot(response,"this < should be safe >");
   
Not perfect (you still need to deal with allocating temporaries if you want to inspect the contents before sending to the client), but: it (a) matches what's actually going on under the hood, (b) makes the simple cases safe, (c) provides a decent interface for safely extending the available formatters. (They would write their output to the stream, then free all temporary resources themselves before returning.)
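
A minimal sketch of what such a write_quot could look like, assuming the response stream is just a stdio FILE * (a real framework would presumably have its own stream type):

```c
#include <assert.h>
#include <stdio.h>

/* Write text to the stream with HTML metacharacters escaped.
 * Nothing is allocated: each byte is escaped as it is written.
 * Returns 0 on success, -1 on a write error. */
int write_quot(FILE *out, const char *text)
{
    assert(out != NULL && text != NULL);
    for (const char *s = text; *s != '\0'; s++) {
        int rc;
        switch (*s) {
        case '<':  rc = fputs("&lt;", out);   break;
        case '>':  rc = fputs("&gt;", out);   break;
        case '&':  rc = fputs("&amp;", out);  break;
        case '"':  rc = fputs("&quot;", out); break;
        case '\'': rc = fputs("&#39;", out);  break;
        default:   rc = fputc((unsigned char)*s, out); break;
        }
        if (rc == EOF)
            return -1;
    }
    return 0;
}
```

Because the escaped form never exists as a whole string in memory, the "who frees the result" question does not come up for the simple case.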

(Further, I've had enough influence from statically-typed-land that I'd personally want to create tainted wrapper structs so that the compiler helps prevent user data from being passed to an unquoted write... but that's just me.)
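
Roughly what the tainted-wrapper idea might look like in plain C. The type and function names (tainted_str, trusted_str, write_raw, write_escaped) are invented for this sketch, and the stream is again assumed to be a FILE *.

```c
#include <assert.h>
#include <stdio.h>

/* Two distinct struct types, so the compiler rejects passing one for the other. */
typedef struct { const char *s; } tainted_str;   /* raw user input */
typedef struct { const char *s; } trusted_str;   /* developer-written markup */

static inline tainted_str taint(const char *s) { return (tainted_str){ s }; }
static inline trusted_str trust(const char *s) { return (trusted_str){ s }; }

/* Only trusted markup may be written verbatim. */
int write_raw(FILE *out, trusted_str markup)
{
    assert(out != NULL && markup.s != NULL);
    return fputs(markup.s, out) == EOF ? -1 : 0;
}

/* Tainted data can only reach the stream through the escaping path. */
int write_escaped(FILE *out, tainted_str input)
{
    assert(out != NULL && input.s != NULL);
    for (const char *p = input.s; *p != '\0'; p++) {
        int rc;
        switch (*p) {
        case '<': rc = fputs("&lt;", out);   break;
        case '>': rc = fputs("&gt;", out);   break;
        case '&': rc = fputs("&amp;", out);  break;
        case '"': rc = fputs("&quot;", out); break;
        default:  rc = fputc((unsigned char)*p, out); break;
        }
        if (rc == EOF)
            return -1;
    }
    return 0;
}
```

With this, write_raw(out, taint(user_input)) is a compile-time type error rather than an XSS hole. It only helps as long as the framework hands request parameters out as tainted_str and nobody unwraps the .s member by hand.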


I'm not seeing this as a problem with C, but perhaps I misunderstand. Your goal is to avoid including anything executable in the page. The filtered result is only larger than the input if you need the ability to faithfully quote the potentially malicious code. In this particular contrived example, you could "just" strip it out and make it shorter, or even error out if you see anything suspicious.

Which is to say it's only a problem if you are doing it by hand on a per field basis. Which leads to your next point:

  >  Maybe you can do an arena for each connection, so it's:
  >  char *hsafe(request_t *r, char *str)
  >  But that's still sort of painful.

This is only painful if you are doing it by hand every time you use the parameter. If the framework handles it for you, so that you always get the sanitized result, it strikes me as no different than any other language.

  r->param("input"); // presanitized, lazily created, pooled in r

  r->param_raw("input"); // if you want to live dangerously

Why is this worse in straight C than in something like Python that is doing the same thing but with an interpreted layer between you and the C?


You need to convert < into &lt;. That's the price of entry. You can't redefine the problem to "web framework that simply strips < out of inputs". Your framework would then be immediately inferior to every other framework which does output quoting.


Yes, if it was done in the framework, you'd want to have the sanitizing and allocation happen seamlessly, and you'd want the memory management to be simple as well. My confusion was why this strikes you as a significant difficulty for a framework author to set up.

Certainly it's harder if you decide that you are going to work from the ground up in straight ANSI C, but there are lots of good pool memory allocators out there. I'd either use one I had lying around, or just link in the one from the Apache Portable Runtime: http://apr.apache.org/docs/apr/1.4/group__apr__pools.html

The framework author could easily hide this behind the scenes, so that the user would find the string creation and destruction just as seamless as in Perl or Python. Use it and forget it, and the pool would be freed along with the Request. Thus my question, and my confusion, was why you finished with "But that's still sort of painful."


Hm good point. Built-in C string handling is much too painful for me to even consider doing a web application in C, and you have to worry about buffer overflows and memory leaks all the time. I'm so happy I don't have to do that in Python.

Then again, if you have a sane string handling library in C instead of messing around with strcpy/strcat, malloc/free and fixed-sized buffers, I guess it could be made more convenient.

> so that the user would find the string creation and destruction just as seamless as in Perl or Python

If you manage to do that, please send me the link :)


You could just have a linked list of memory allocations that get freed at the end of the request.


If you're going for clean syntax, you could store an arena pointer in a global variable/thread local storage (depending on the programming model). Just make sure the scaffolding switches/creates/destroys the arenas. I think you'd still want a hierarchical allocator, though.


> The "gateway interfaces" for C are too implementation-specific (CGI, FastCGI, SCGI, web server extensions/modules, etc). It would be something more abstract that would run on-top of a web server interface.

Yes. My preference in web apps (esp in C) is also to deal directly with http requests and responses. Although I wouldn't necessarily call that the "more abstract" approach. To an extent, each of these gateway interfaces is a different leaky abstraction for http.


Rather:

    response_set_status(200);
    response_set_header("Content-Type", "text/html");
    response_write(body);
    response_end();

The awkwardness of which, to me, demonstrates the superiority of C++ in this regard.


That implies some global state somewhere being updated, and now you've given up on threading. This may be OK, but it's still dangerous and leaky even if you give up on threading.

There are various solutions that still allow you to pass a set of closures without being particularly more ugly than anything else is in C, for instance, see http://library.gnome.org/devel/gobject/stable/chapter-signal... . C++ still ends up nicer, my point is just that you can do good things in C and there are libraries that implement the stuff for you.

That said, if I were really going this route I probably would try to go with some sort of minimal C++. Then again, my opinion is suspect, as I would never go this route anyhow. Web developers writing in languages that don't permit buffer overflows write software that has all the security of a colander; adding the ability to write good, solid buffer overflows as well hardly seems like a step up. You can write secure C code, but you can write secure PHP code, too. Existence proofs of secure code aren't very interesting.


You're right, web developers already write software with all the security of a colander. You can go one of two ways, 1.) you can devise a language that can't express an insecure application or 2.) you can teach people about security. I know which option I prefer.


Overly reductive. There is an actual security benefit to using environments that minimize memory corruption. Educating devs helps, but educated devs do not reliably avoid memory corruption.


I assume you mean #2? Because here we are in year, oh, 17 or 18 or so of web development give or take a couple of years and that's worked out so well, hasn't it?

It's a beautiful sentiment. It hasn't worked. Handing them new vectors to screw up in is not going to solve the problem.

It also doesn't matter. Hand a developer a tool that requires them to continuously pay attention to an issue and no matter how good they are, at some point they will slip, because they're human. We've tried "teach them about security" a couple hundred times, maybe it's time we try "devise a language that can't trivially express an insecure application". C will not be the language that implements that, it completely lacks any of the necessary primitives, by design.


Web applications vulnerable to shell-code injection... this sounds like a security nightmare.


They don't have to be global state - they can be updating thread local state, for example, in what is essentially dynamic scope. Depending on the OS, the efficiency of such TLS accesses can be quite good - on Linux, for example, it's just an extra pointer indirect or two with a segment selector (assuming the code is statically linked; it's very slightly more expensive in a solib).
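
A small sketch of the thread-local variant, assuming a C11 compiler (for _Thread_local) and a made-up response struct. The framework scaffolding would call response_begin/response_finish around each handler, so the handler can use the short response_set_status(200) form without a process-wide global.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical response type for the sketch. */
typedef struct response {
    int status;
} response;

/* One "current response" per thread rather than one per process. */
static _Thread_local response *current_response = NULL;

/* Called by the scaffolding before a handler runs; returns the previous
 * binding so nested scopes can restore it (a crude form of dynamic scope). */
response *response_begin(response *r)
{
    response *prev = current_response;
    current_response = r;
    return prev;
}

/* Called by the scaffolding after the handler returns. */
void response_finish(response *prev)
{
    current_response = prev;
}

/* What handler code calls: no explicit handle argument. */
void response_set_status(int status)
{
    assert(current_response != NULL);   /* must be inside begin/finish */
    current_response->status = status;
}
```

Each thread sees its own current_response, so two threads serving different requests do not step on each other. Calling response_set_status outside a begin/finish pair is still a bug, which is the kind of API fragility raised elsewhere in the thread.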


Touché. It's still an API design that's going to prove problematic, though.


You're only a couple function pointers away from the syntactic sugar of the -> operator.
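
For illustration, a sketch of the function-pointer approach in plain C. The struct layout, the fixed 256-byte body buffer, and web_response_init are assumptions made for the example; the catch compared with C++ is that the object has to be passed explicitly as the first argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct web_response web_response;

struct web_response {
    int code;
    char body[256];          /* fixed-size buffer, purely to keep the sketch short */
    size_t body_len;
    /* "Methods": plain function pointers that take the object explicitly. */
    void (*status)(web_response *self, int code);
    int  (*write)(web_response *self, const char *text);
};

static void response_status(web_response *self, int code)
{
    self->code = code;
}

/* Append text to the body; returns -1 instead of overflowing the buffer. */
static int response_write(web_response *self, const char *text)
{
    size_t n = strlen(text);
    if (n >= sizeof self->body - self->body_len)
        return -1;
    memcpy(self->body + self->body_len, text, n + 1);   /* copies the NUL too */
    self->body_len += n;
    return 0;
}

/* Wire up the function pointers; plays the role of a constructor. */
void web_response_init(web_response *r)
{
    assert(r != NULL);
    r->code = 200;
    r->body[0] = '\0';
    r->body_len = 0;
    r->status = response_status;
    r->write = response_write;
}
```

Through a pointer this reads as response->status(response, 404); and response->write(response, "Not found");, which is close to the C++-looking example but with the receiver repeated in every call.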


Yeah that's what I get for being in the OOP world for the daily grind. I think a C API is still definitely for the best. Perhaps a nice shortcut API for this would be something like:

    response_complete(200, body, response_header("Content-Type", "text/html"));


Well yes it should have been

    response_set_status(handle, 200);

My point was that the syntax in the example was C++.


in correct functional style shouldn't it rather be:

    response_set_status(response, 200);

no issues with global state, threading (if correctly mutexed), etc...


All the pieces to do the whole framework are already in POCO and other libraries, with ssl and cookies etc.

http://pocoproject.org/docs/Poco.Net.HTTPResponse.html

http://pocoproject.org/docs/Poco.Net.html

Doing this in C seems silly to me.


You might look here before starting from scratch in C: http://okws.org (C++)


It would be possible to support other gateway interfaces in raphters. The abstractions already exist.


It would be neat to see how you could possibly hook Lua into this to give you some of the flexibility of a dynamic language with the speed and small footprint of a (mostly) C app. I've been thinking of working on something like that for a while but unfortunately my C skills need a lot more work before I'd be comfortable tackling it.


Came across this the other day:

Embed the Power of Lua into NginX https://github.com/chaoslawful/lua-nginx-module


There's also AbsoLUAtion for Lighttpd. Lua's a very nice, convenient language for embedding dynamic/user-configurable bits in otherwise static applications. PowerDNS and nmap use it, too. It seems to have displaced Tcl as the de facto language for embedding.


Luajit2 approaches C on speed [1] - and with the newly released FFI library the C is not really needed that much - so, pretty much all the work has been already done. Just use the results of it :)

[1] http://news.ycombinator.com/item?id=641313


LuaJIT is awesome but it doesn't run on as many architectures as the original Lua VM. The nice thing about a framework in C is that you can install it anywhere it compiles, and if you stick to ANSI C like Lua does then it compiles pretty much anywhere, like Lua itself.


Agree, just x86 and x86_64 - though realistically, is there a lot of usage of other platforms for high-performance servers?

And if the server is not high-performing, mongrel2 + $something can be a very good choice. (And has a more liberal license).

But of course, tastes differ - so the more, the merrier, as long as folks do not get pwned.


I'm thinking of small devices like wireless routers, phones and maybe even wristwatches. I like the idea of having a full stack framework for web apps, served locally on a tiny device. You could reuse a lot of code that way and at least for phones, potentially avoid some of the cost of having a native app with totally different code from your web app.

But yeah, LuaJIT recently added support for PPC and soon will add ARM, so point well taken - to do what I'm thinking of, it may not really be necessary to have anything in C.


Have you looked at Tir?

http://tir.mongrel2.org/home


Yeah, I wrote the test framework that Zed uses for Mongrel2 and Tir's unit tests, and also wrote the WSAPI handler for Mongrel2 (a separate project from Tir).

What I'd really like to see is not so much a web framework in Lua (there are a few, I'm working on another one, and it's really not hard to make one either), but rather one that's mostly C but lets you use Lua, say, for templates or controller functions. That could be pretty interesting for very small devices, or for squeezing lots of performance out of a larger one, all while still letting you program the bulk of your code in a higher level language.


Pretty cool, but if I really had to write some kind of C stuff to output web pages I suppose I'd just build it as an apache module rather than go the CGI route, there's lots of useful functions and nifty stuff within apache making it a pretty nice environment for serving stuff.

(Or more probably since it's 2011, I'd try an nginx module (or lighttpd), although I have never looked at any of those projects' code yet).

Building response stuff in C is pretty high maintenance stuff though.


Funnily enough this was going to be my starting point into writing a C webapp. I felt that by writing a C module directly embedded in an existing web server it would allow me to leverage the request handling. Having already done bugfixes for apache C modules at work it felt the most relevant path to take. The problem I see with it though is you are bound to the webserver you write the module for. Portability would be nice.


There were some issues with the code. Mostly memleaks. I think I've fixed most of them, although it was tough to do so elegantly. You may want to rethink using the START_HANDLER macro (or at least restructure it) as it makes it difficult to correctly handle errors with writing a response. I've addressed this issue in RAPHT and offered a fix. However, I think a complete rework of that subsystem is probably better.

I'm at work so I can't set up a server to test, however after summoning my inner compiler my changes seem sane. But you should look them over (check the pull on github https://github.com/DanielWaterworth/Raphters/pull/2).

I'll be around if you want to discuss a proper fix for some of the issues. Albeit it will take a pretty major rewrite.


Can't we just have a minimal webserver for ZeroMQ, and from there all the languages with ZeroMQ bindings?


You mean mongrel2?


Yeah, that could work too.


You mean mongrel2?


I thought at first that this was named for Raph Levien (http://en.wikipedia.org/wiki/Raph_Levien) who wrote Advogato in C, some 10+ years ago. But no mention of that in the README or RAPHT. Too bad :)


Until you mentioned him I never even knew he existed.


A couple of years ago, I also embarked on a mad project to create a web framework in C. I wrote a scandalous and regrettable blog post about it which got on Hacker News and I have since taken down. You can find what remains of the unfinished project here: https://github.com/joelmichael/memereap


And still, be blocked by IO.


This is awesome, reminds me of the days of writing CGI apps in bash.


Very refreshing, a time tunnel straight to 1994.


The title is like saying, 'running shoes for grandma'.


This framework does almost nothing. I don't see the point of it.


The point is that with sufficient pathology you can do high-level things in a low-level language, but not the inverse.


Too bad the point isn't true?

I write code that looks nearly identical to C in Haskell all the time. (Except, of course, that `alloca` is faster than malloc and is garbage-collected.)


That's not exactly true though. Here's an example:

http://www.arduino.cc/playground/Interfacing/Python


Reinventing the wheel for lulz...



