
The syntax highlighter isn't really a parser, per se. It's just a lexer / tokenizer / scanner / what-have-you, and those tend to be fairly compact state machines.

As an aside, I agree with your edit. As someone who started programming back in the DOS days, whenever I do Web development, I'm always floored by how much it feels like one step forward and two steps back. So many simple things just aren't simple when it comes to the browser, and even certain protocols on the backend like FastCGI seem way more complicated than they need to be (SCGI seems pretty nice, though I feel it's missing a way to signal out-of-band information to the Web server, the way you'd use stderr in CGI, for example (though perhaps I'm missing something there)).

As much as I love the idea of sending semantic markup over the wire for document transfer, I can't help but wonder if a more terminal-like protocol would be better for application "delivery", the way we used to do with telnet and BBSs in the 90s, with an upgrade for multimedia. But I digress...




FastCGI/SCGI were obsoleted by having app servers simply speak HTTP; the web gateway just needs to function as a simple reverse proxy. HTTP is about as simple as it gets to parse - you can write a passable (not quite production quality, but works) parser in about 10 minutes.

A lot of the difficulty in webapps is because every webapp is inherently a distributed system, and distributed systems are always hard. Single-page apps with no connection to a server are actually quite simple, but they're also about as commercially viable as DOS programming (i.e. not at all).


A while back I'm sure I saw an HTTP state diagram posted here that showed writing an HTTP parser is anything but simple. I guess if you only have to be a (reverse) proxy you might get away with it.



That's the state diagram for processing the full complement of HTTP requests. Nothing to do with parsing. The parsing bit is trivial.


While you're correct that it's a state diagram for processing HTTP semantics and not parsing, parsing a text-based protocol is far from trivial. In fact, the HTTP/2 FAQ explicitly mentions [1] that reducing parsing complexity was a motivation for going binary with HTTP/2.

[1] https://http2.github.io/faq/#why-is-http2-binary


I've done perfectly adequate HTTP parsing with this Python 4-liner:

  headerText, body = text.split('\r\n\r\n', 1)
  headerLines = headerText.split('\r\n')
  method, path, protocol = headerLines[0].split(' ')
  headers = dict(map(str.strip, line.split(':', 1)) for line in headerLines[1:])

For production use you'd probably want something a bit faster & more robust like Mongrel's HTTP parser (itself only 166 lines of Ragel), which powers several million websites out there:

https://github.com/mongrel/mongrel/blob/master/ext/http11/ht...


That's a perfectly adequate 4-line HTTP/1.0 parser :) But for HTTP/1.1, which must support chunked transfer coding [1], this won't work.

[1] https://tools.ietf.org/html/rfc7230#section-4.1
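To make the chunked-coding point concrete, here's a rough Python sketch of what decoding a chunked body involves. The function name `decode_chunked` is made up for illustration, and it assumes the whole body is already in memory as bytes. It ignores chunk extensions and skips trailer fields, so treat it as a sketch of the format in RFC 7230 section 4.1 rather than a compliant implementation:

```python
def decode_chunked(body: bytes) -> bytes:
    """Decode a chunked HTTP/1.1 message body held fully in memory.

    Sketch only: chunk extensions are discarded and anything after the
    terminating zero-size chunk (trailer fields) is not parsed.
    """
    decoded = bytearray()
    pos = 0
    while True:
        # Each chunk starts with "<hex size>[;extensions]\r\n".
        line_end = body.index(b"\r\n", pos)
        size_field = body[pos:line_end].split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:
            # A zero-size chunk marks the end of the body.
            break
        decoded += body[pos:pos + size]
        # The chunk data must be followed by its own CRLF.
        if body[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("chunk data not terminated by CRLF")
        pos += size + 2
    return bytes(decoded)
```

Feeding it `b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"` gives back `b"Wikipedia"`. A real server also has to cope with chunks arriving split across socket reads, which is where most of the extra complexity comes from.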


I'm going to convert this into a poster and get it printed.


Maybe you were thinking of and-httpd's rant about HTTP/1.1:

http://www.and.org/texts/server-http


Speaking HTTP is worse in almost every way than using CGI: it's harder to implement, loses meta information (suppose you put an application at /app/ on your server... CGI handles it well, HTTP doesn't unless you do some extension, since your server will see GET / anyway) and is harder to centrally log too.

I think app servers speak HTTP more because it is kinda convenient for developers to run the local server without setting up the HTTP server... convenience rather than superiority.


Nah, it's not worse: as nostrademons mentioned, you have your gateway server be a reverse proxy server, e.g. HAProxy, and then you can do whatever you want in there. Serve an application under /app/? No problem. Centrally log? Of course. Harder to implement? Not really, every major language has well-supported HTTP libs. And since you only need to know how HTTP works, rather than needing to understand HTTP and CGI, it's conceptually simpler too, assuming that you're running more than one backend server and thus needed a load balancer anyway.


It is interesting to note that most of those HTTP libraries end up looking like CGI to the programmer anyway, even sometimes using X-Whatever headers for additional information, because that's the relevant information to an app server.
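Python's WSGI is one example of this: the app gets an `environ` dict whose keys are the CGI variable names (`REQUEST_METHOD`, `SCRIPT_NAME`, `PATH_INFO`), and request headers show up CGI-style as `HTTP_*` entries. A minimal sketch, where the app itself and the use of `X-Forwarded-For` as the "X-Whatever" header are just illustrative assumptions:

```python
def app(environ, start_response):
    """Tiny WSGI app that reports the CGI-style request metadata it was given."""
    method = environ["REQUEST_METHOD"]   # same name as the CGI variable
    mount = environ["SCRIPT_NAME"]       # where the app is mounted, e.g. "/app"
    path = environ["PATH_INFO"]          # the rest of the path below the mount point
    # Request headers arrive as HTTP_* keys, exactly as CGI would expose them.
    client = environ.get("HTTP_X_FORWARDED_FOR", "unknown")

    body = f"{method} mount={mount!r} path={path!r} client={client}\n".encode()
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Whether `SCRIPT_NAME` is actually populated depends on the server or proxy in front of the app, which is the meta-information concern raised upthread.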


I started programming in the same era, and feel the same way about "web apps" being a step backwards. The problem is we're building apps on top of a platform originally intended as a document viewer, with kludge upon kludge piled on...


If only we could take out HTTP and just use JSON (or something even simpler) end-to-end.



