Ask YC: if Haskell is the hammer, what should be the nail?
18 points by dhbradshaw on Sept 16, 2008 | 28 comments
Hi, guys. I've been thinking of learning Haskell, and so have been casting about for a good project to start in it.

I found a top ten list of Haskell projects (http://haskell-news.blogspot.com/2008/01/top-10-most-popular-haskell-programs.html) and it looks like things are sparser than I might have hoped.

Anyway, if one wanted to break into that top-ten list by using Haskell's strengths, what are some kinds of projects that one would take on?

Also, if anyone would like to join me, that would only add to the fun.




I've just started watching the Simon Peyton Jones videos on the subject (from OSCon last year), and I've been kinda thinking I might tackle some systems monitoring problems (I usually use Perl for that sort of thing...and most every other sort of thing, lately). I, too, have been struck a bit by how low-level most of the libraries and such are, at this point (then again, compared to CPAN almost every language has a dearth of high level libraries). And I'm still far too much of an amateur at the language to do much about it. But it is an interesting language, and probably useful to tinker with even if no useful code ever comes of it.


You may find systems monitoring to be a bit challenging in Haskell if you're accustomed to Perl. By design, the normal POSIX abstractions that are so tightly integrated into Perl are held at arm's length in Haskell, if available at all; something as simple as forking a child process or switching effective UIDs will require far more code in Haskell than in Perl.

That being said, if you want to be able to reason about system configuration states, and write concise deterministic rules that dictate the triggers for and transitions between such states, Haskell may be worth some additional attention.


I don't think that's necessarily true. Take a look at all the System.Posix modules: http://haskell.org/ghc/docs/latest/html/libraries/unix/Syste...


That's exactly my point, though -- there's a standard module for POSIX integration, but it's not part of the language syntax itself. It's a subtle thing if you don't do systems code often, but it becomes apparent after working with Perl for a while how much of it really is a domain-specific language for UNIX systems operations.

Haskell, as a general-purpose functional language, is always going to require just a few more keystrokes/lines of code to accomplish the equivalent system operations.


Since I don't have any experience with doing system code in Perl, a small example would be nice.


I just talked about this in a JavaScript thread a few days ago as a reason JavaScript isn't quite ready to take the place of Ruby, Perl, and (kinda) Python on the server (though it probably will in the next year or two).

The most obvious example in Perl is its file processing abilities (which are scary at times, and astonishingly beautiful in their conciseness--thus Perl is still the master of one-liners). Like Haskell, JavaScript also has some file-related libraries...but they're also off in the ghetto of a clunky library, as it has been (historically) in Smalltalk and Lisp. The difference between making a function call and using a core language feature can be subtle...but it's definitely a friction point (I hate using Python, despite its many positive aspects, for many system related tasks because regexes are in the ghetto, for example).

So, here's a simple example:

    while (<>) {
        for $chunk (split) { # split on white space
            # do something with words
        }
    }
This is the entirety of code for processing a file word by word (<> is a magic filehandle that slurps in files on the command line, and not really recommended in code used by untrusted folks...so in real world software there would be a bit more boilerplate, but not much...but it's a good example). Add in that regexes are first class citizens and a file handle in Perl can be a pipe or a network socket or stdin/stdout, and you have an exceedingly low friction environment for building system-related tools.

That's not to say that Haskell can't overcome the fact that systems-level stuff isn't in the core language (most Perl additions in the past several years have also been in the form of libraries rather than more keywords and new syntax--at some point it makes sense to put things into libs rather than making the language bigger). I don't know enough about Haskell to say.

But, I can say that I've been quite intimidated by the amount of code I need to write to do things that are one-liners or a handful of lines in Perl. I'm sure some of this is my lack of knowledge, and some of it is the lack of CPAN (in ten years, if Haskell is extremely lucky and extremely successful, it'll have a selection of libraries on par with CPAN of today). But I'm having fun tinkering, regardless. Worst case, I'll learn something new.

I had a lot of fun with mjd's Higher Order Perl (which exhibits most of the major functional techniques, like currying, recursion, infinite iterators, memoization, etc. using Perl), so it'll be cool to "go native" for a while, and what better way to learn than by doing tasks I'm already familiar with in a new language...I may get some new perspective on how things can be done in Perl (since it has lots of functional features) and maybe I'll even find some new features that can be added to our products uniquely easily by introducing some Haskell code.

An interesting source of perspective on what makes Perl magical for systems-related tasks would be a perusal of the perlvar manpage. Perl has a bunch of "magic" special variables, many of which are related to how Perl behaves when given files to munch on. Folks find this intimidating, but it's a source of great power, particularly for one-liners and pipes (Perl is very much of the UNIX culture, and Perl fits into a long line of pipes as well as grep or awk or sed). Of course, for many classes of problem you would never use most of those special variables. But, for systems related code, it's hard to beat. (I've tried. I spent a few years in a Python shop, and was constantly amazed by how verbose my code had to be in Python vs. doing the same task in Perl...it was also generally a lot slower. I like Python for lots of stuff, but systems tools ain't exactly its strong suit. I may find the same is true of Haskell.)


Take a look at Don's reimplementation of some classic Unix tools as Haskell oneliners: http://www.cse.unsw.edu.au/~dons/data/Basics.html.

The mere fact that interact lets you make a pure function into a Unix-style string -> string utility should show that you are exaggerating a bit.

  interact (unwords . something . words)
handles basically your whole simple example.

Another simple example: capitalize every word of the input:

  import qualified Data.Char as Char
  main = interact (unwords . map (\(x:xs) -> Char.toUpper x : xs) . words)


> show that you are exaggerating a bit

No, it merely shows that I don't know Haskell--I didn't claim Haskell couldn't be as concise as Perl for this kind of thing...just that I don't know how to make it as concise. The link helps very much, thanks. (And, just so no one is misled, the same example could be done as a Perl one-liner, as well, using the magic variables I mentioned in the prior comment. I just figured I'd make it readable, since there are some anti-Perl bigots around just waiting for the chance to say, "I knew it! It's nothing but line noise!".)

Also worth noting...I love that I can write code like the example you've shown in Perl (with some minor syntactic differences, but it does have first class functions and expressions as arguments), and the fact that Haskell seems to be entirely made up of code like that made it seem very appealing on first glance. That slurping a file can be written in Haskell in one line and some lib imports makes me...umm, I'm embarrassed to say...a little giddy. And it makes me wonder why I didn't start using Haskell sooner.


Right now, the area where Haskell absolutely thrashes everything else is speedy and lightweight concurrency (it even stomps Erlang in the Debian Shootout). So if you can find a project that needs fluid responsiveness, multi-connection or multi-CPU scaling, Haskell is the ideal tool.
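For anyone who hasn't seen what "lightweight concurrency" looks like in practice, here is a minimal sketch using only `forkIO` and `MVar` from base. The function name `parallelSum` is made up for illustration, and this is a toy, not a benchmark: each chunk of work gets its own green thread and hands its result back through an `MVar`.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Sum each chunk in its own lightweight thread, then combine the
-- partial sums. (parallelSum is a hypothetical name for this sketch.)
parallelSum :: [[Int]] -> IO Int
parallelSum chunks = do
  boxes    <- mapM spawn chunks    -- start one thread per chunk
  partials <- mapM takeMVar boxes  -- block until each thread reports back
  pure (sum partials)
  where
    spawn chunk = do
      box <- newEmptyMVar
      -- ($!) forces the sum inside the worker thread rather than
      -- handing an unevaluated thunk back to the caller.
      _ <- forkIO (putMVar box $! sum chunk)
      pure box
```

In GHCi, `parallelSum [[1,2,3],[4,5]]` hands back the combined total. To actually spread the threads across cores, the program has to be built with `ghc -threaded` and run with `+RTS -N`.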


Okay, here's what comes to mind based on those criteria: operating systems, databases, large numerical simulations, intensive graphical manipulations, mind-modeling. Do these sound right?


It's not really much use for OSes unless you're a research project, because the compilers target an OS and not bare metal.

Also, it's slower than C on actual processing (about the speed of Java). So numerical stuff is out, unless the benefits of being parallel outweigh the benefits of being fast. (This may improve as the back-end optimization project shows results.)

I think the perfect target would be any sort of massively multi-user network server.


Personally, I think there are some very interesting possibilities for using Haskell in the highly-secure web application space. The most trivial example would be simply using the type checker to protect against SQL injection and cross-site scripting attacks by representing user input with a different type than query parameters or HTML output.
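A rough sketch of that "trivial example", assuming nothing beyond base. All of the names here (`UserInput`, `Query`, `bind`, `findUser`) are invented for illustration and don't come from any real database library. The idea is simply that user input is not a `String`, so it cannot be concatenated into the SQL text; the only function that accepts it puts it in the list of bound parameters.

```haskell
-- Raw text from the outside world. Because this is not a String, it
-- cannot be appended to SQL text without an explicit unwrap.
newtype UserInput = UserInput String

-- A query is a fixed template plus separately bound parameters.
-- (In a real module the Query constructor would stay unexported.)
data Query = Query
  { template :: String    -- SQL with ? placeholders, written by the programmer
  , params   :: [String]  -- values for the driver to bind, never spliced in
  } deriving (Eq, Show)

-- The only path from user input into a query: as a bound parameter.
bind :: Query -> UserInput -> Query
bind (Query t ps) (UserInput s) = Query t (ps ++ [s])

-- Hypothetical lookup; the table and column names are made up.
findUser :: UserInput -> Query
findUser = bind (Query "SELECT id FROM users WHERE name = ?" [])
```

With this in place, something like `template q ++ name` where `name :: UserInput` is rejected by the compiler, while `findUser (UserInput "O'Neill")` leaves the template untouched and carries the name alongside it.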

I've also thought for a while that Haskell would be a great environment in which to implement a static analysis tool to check the security of existing application code. In particular, I think that the PHP community could really use an information-flow and type-checking tool external to the core runtime which could be used to run a quick "sanity check" over source code. The simplicity of the language (relative to, say, Ruby or Perl) makes it a prime target for parsing and analysis, and the large body of existing code in the wild makes for an interesting set of test cases.


There's this meme about type checking defeating SQL Injection that I don't really understand.

There are basically three situations I see injection problems recurring in modern web code:

* The stupid cases where people are interpolating input strings directly into query strings, so that a query for "O'Neill" will accidentally break your SQL.

* The not-so-stupid cases where column sorts and query builders pass limits, sort orders, and groupings directly from user inputs.

* The cases where stored procedures resort to dynamic SQL.

The first problem is solved not by better type checking, but by switching to parameterized queries, where the query string is parsed prior to argument binding.

The second problem is solved by not passing SQL literals in and out of input.

The third problem isn't even happening in the application's programming language.

Which of these problems is handled well by application type checking?

As for XSS attacks, I'm again skeptical. If the problem was as easy as type checking, you'd solve it trivially by output filtering everything from the database, neutralizing HTML metacharacters. It's not that easy: there are lots of times when you really do need to honor HTML in input.


The main problem is that both SQL and HTML are often simply represented by strings. The programmer has to keep track herself to make sure everything is escaped and unescaped at the right moment.

The point is that you can use Haskell's type system to get guarantees about and keep track of escaping. It's really light-weight to add new types. Also, dynamic typing would probably mess this up.

Still, all this mainly comes down on the shoulders of the library designer. But, looking at some of the available libraries (such as Text.XHtml) this works really well.

In the case of Text.XHtml the easy/default case when using a string in HTML is that it's escaped. When you want to 'parse' a string to HTML you'll have to be explicit. That makes it really hard to 'accidentally' forget to escape HTML.
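To make the "escaped by default, explicit when raw" design concrete, here is a small self-contained sketch. It deliberately does not use Text.XHtml; the names `Html`, `text`, `unsafeRaw`, `paragraph` and `render` are hypothetical stand-ins for the pattern, not that library's API.

```haskell
-- Markup that is already safe to emit. In a real module the
-- constructor would be hidden so callers must go through the
-- functions below.
newtype Html = Html String deriving (Eq, Show)

-- The easy path: plain strings are escaped on the way in.
text :: String -> Html
text = Html . concatMap escapeChar
  where
    escapeChar '<' = "&lt;"
    escapeChar '>' = "&gt;"
    escapeChar '&' = "&amp;"
    escapeChar '"' = "&quot;"
    escapeChar c   = [c]

-- The explicit path: the caller vouches for the markup, and the
-- name makes that easy to spot in review.
unsafeRaw :: String -> Html
unsafeRaw = Html

-- Combinators only accept Html, never bare strings.
paragraph :: Html -> Html
paragraph (Html body) = Html ("<p>" ++ body ++ "</p>")

render :: Html -> String
render (Html s) = s
```

`render (paragraph (text "<b>hi</b>"))` comes out with the angle brackets escaped, whereas `paragraph "<b>hi</b>"` does not type-check at all. Honoring HTML from input is still possible, but only by writing `unsafeRaw` at the exact spot where the decision is made.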

The way I look at Haskell's type system is that it's a great tool for easily enabling 'safe' programming. It won't work automatically, but it gives you the opportunity to let the type checker take care of guaranteeing that everything will work as expected ;)


Yeah, I think this is a bit naive. Of the three SQLi cases I mentioned, only the first is due to the app language's handling of string input and query strings, and that case is just as easily handled by parameterized queries.

The second case is not due to the fact that the same type is used for input and query strings; when a web app passes DESC or ASC or LIMIT 100 in via POST arguments, that's a design problem type systems don't solve.

Likewise, type systems might fix the simplest XSS problems, but the nasty ones occur in code that is explicitly trying to handle input that has been laundered through the database and must include HTML characters.


I still don't see why "type systems don't solve" the problem of keeping data domains separate. If user input is of a different type than SQL query components, you simply can't allow GET or POST arguments to hit the database un-sanitized. Yes, you can perform the check by hand (i.e., fix the "design problem"), but as we've all seen, programmers don't do that consistently, which leaves us patching the same class of vulnerability time and time again.
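As a sketch of what that could mean for the sort-order case: the request value is parsed into a closed type, and only values of that type can be turned into SQL. The wire values "up" and "down", the `created_at` column and every function name here are assumptions made for the example.

```haskell
-- The complete set of sort orders the application is willing to issue.
data SortOrder = Ascending | Descending
  deriving (Eq, Show)

-- Parse the request parameter; anything unexpected is rejected
-- instead of being passed through to the query.
parseSortOrder :: String -> Maybe SortOrder
parseSortOrder "up"   = Just Ascending
parseSortOrder "down" = Just Descending
parseSortOrder _      = Nothing

-- SQL keywords come from the program text, never from the request.
sortSql :: SortOrder -> String
sortSql Ascending  = "ASC"
sortSql Descending = "DESC"

-- Hypothetical clause builder; the column name is fixed in code.
orderBy :: SortOrder -> String
orderBy order = " ORDER BY created_at " ++ sortSql order
```

Since `orderBy` takes a `SortOrder` and not a `String`, a handler that forgets the parsing step simply fails to compile, and a parameter like `"DESC; DROP TABLE users"` never gets further than `Nothing`.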

Type systems also don't have to be the algebraic types of Haskell; SELinux DTE and FlowCaml/Jif information flow analysis both fit loosely under the umbrella of "type checking," and yet allow for very fine-grained and interesting security properties of complex, real-world systems to be asserted and enforced.


You're right that #1 is indeed just a matter of using the right API. (I actually blame the poor state of many database client libraries and tutorial docs for the proliferation of interpolated query params, but the point remains that it's basically a mechanical fix.)

For #2, the problem is a bit trickier, but again, it just requires attention on the part of the developer.

Given the limited use of stored procedures by most LAMP developers, I think that #3 may be the least critical, even if it is the most difficult to tackle with a simple code analysis tool.

As you suggest, none of these issues are particularly hard to avoid via careful programming. The problem is that programmers often aren't careful, and when working in languages which offer "one-size-fits-all" string, array, and file types, they tend to just shove everything into and out of those containers without thinking about type or flow checking.

Type checking isn't a panacea, but it can flag issues of either type 1 or 2 early, which gives developers a chance to catch themselves before they commit (or even worse, deploy) a potential security hole.

Part of my motivation for this comes from reading a lot of unit tests and specs for webapp code. If you step back and think a bit, it becomes obvious that a lot of the test harness is simply devoted to type checking, and that the verbosity (and easy satisfaction of coverage metrics such tests provide) often masks otherwise glaring gaps in the actual test coverage.


I guess I'm asking, how does Haskell reduce the amount of effort required to write secure SQL?

* It helps in the first case, but then, so do parameterized queries, which developers already have and are familiar with.

* It doesn't help --- at least not elegantly --- in the second case.

* It can't help in the third case, because that error is happening in the database itself. (By the way, not sure why injections in dynamic SQL in stored procedures would be least critical).

I get you loud and clear that "even though it's easy to avoid these problems in PHP, developers don't". You're right. "Use good code" is not a solution for the problem of "bad code". But is Haskell?

I like the idea of static analysis in Haskell.


Where a static analysis tool can help is in flagging those cases where a developer might have made a bad assumption, and forcing them to reconsider that code point. Basically, it's a way to enforce some basic code review practices without requiring one of your senior developers to be able + willing to read over your junior devs' code.

The whole point of programming is to automate those tasks that can be automated, right? So, let's automate the first-pass stage of a basic security audit as much as we can.

To your specific points:

* Parameterized queries are available in some cases, as you said, but don't work when you have to build some portion of the query (sort order, grouping, etc.) at runtime. Static tools can help flag such dynamic values as being tainted or clean, as well as help ensure that resulting queries will be well-formed.

* All that's necessary to extend your type checking to the database is to enforce the same level of discipline you do for built-in API functions. Just as static analysis tools need a signature for primitive functions provided by the runtime platform (syscalls for C, primitive library functions for PHP, etc.), your static checker could ensure that queries which used stored procedures checked at least the asserted type signature of those procedures. (SQL is pretty type-savvy, and most stored proc authors should be able to trivially provide type sigs for their code.)

The advantage to using Haskell for all of this is twofold:

* There are excellent existing parsing and graph algorithm libraries maintained by the Haskell community

* Working in a strictly typed, pure functional language makes you think in terms of mathematical theories, rather than stateful effects, which in turn tends to result in better, more deterministic code

That being said, I think you could use OCaml, Lisp, or any number of other high-level languages to similar effect. The uniquely math-oriented world view of Haskell (and its implementers and users) does tend to lend itself to just the kind of defensible reasoning that you would want from a security-focused static analysis tool, though.


Well, just so we're clear: I buy the idea of using Haskell or OCaml for static code analysis (in fact, one of the better known static analyzer tools, which operates on binary control flow graphs, is written in OCaml).

What I don't buy is the idea that the Haskell runtime actually improves SQL or HTML security. By all means, write a Haskell source analyzer. It will help. But a web app stack written in Haskell will presumably have the same problems (or lack thereof) that Rails does.

Mathematical reasoning, for what it's worth, doesn't have a great track record in systems security. ;)


I'm not sure how you can assert that math hasn't informed real security. Even disregarding crypto (which is pretty much the purest math you'll ever see in CS), the bulk of classic security design is based around systems like BLP and Biba, which are not only very strongly rooted in algebraic theory, but rely on formal proofs for assurance of their correctness.

At the end of the day, Haskell is just another programming language, with all the equivalent strength and limitations that provides. However, I do think that having a cultural expectation of strong type checking and reasoning about systems can only help security.

(To be fair, I should disclose that I do most of my day-to-day coding in Ruby. However, I'm trying to impose the discipline of at least mocking out security-sensitive systems in Haskell before implementing them in Ruby to force me to make my assumptions about safety and information flow explicit.)


I didn't say math hasn't informed real security; I said mathematical reasoning hasn't had a great track record in systems security. What doesn't seem to work is proving a "kernel" and then building application code on top of it; you have to analyze everything, formally, which is too expensive. So in the real world, a B-level secure Solaris kernel like Argus Systems sold still falls to code execution vulnerabilities in its after-market web server.

I'm not repudiating math. I'm just saying, systems security is messy.


You're right, of course, that systems security encompasses more than a formally-verified system can encompass. I say this having worked for a group that pitched the DoD and private vendors on "provably correct" systems that relied on commodity operating systems and compilers, so I really do get it, and know just how horribly limited formal methods are in the real world.

My point is, as I originally suggested, that any static analysis would likely be an improvement over what currently happens in the LAMP world. Most compiled languages have at least some tools available which developers can use to perform basic sanity checks -- the compiler itself being of course the first and most oft-overlooked of these. Dynamic language users lack even that basic safety net, which is the reason for my primary argument about the benefits of type safety and static analysis.

Basically, we're in furious agreement, and if you had caught me at a different time of day or mood, I could just as easily be standing on your side of the fence, arguing against someone who was trying to tell me that theory could solve all my security problems.


For protection against SQL injection, I'd recommend taking a look at Meredith Patterson's Libdejector: http://sourceforge.net/project/showfiles.php?group_id=145075

A partial presentation on it is here: http://www.blackhat.com/presentations/bh-usa-05/bh-us-05-han...

It works by having the user offer an exemplar of a dynamic query and then verifying that the actual query is limited to the syntactic forms seen in the exemplar.
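A much cruder stand-in for that idea, just to show its shape: reduce both the exemplar and the actual query to a sequence of token "shapes" with literal values erased, and require the two sequences to match. This is not Libdejector's algorithm or API. The lexer below is an assumption-laden toy (no comments, no quoted identifiers, only `''` as an escape inside strings) and every name in it is hypothetical.

```haskell
import Data.Char (isAlphaNum, isDigit, isSpace, toUpper)

-- Token shapes: literal values are forgotten, structure is kept.
data Shape = Word String | NumLit | StrLit | Sym Char
  deriving (Eq, Show)

-- A deliberately simple lexer for illustration only.
shapes :: String -> [Shape]
shapes [] = []
shapes (c:cs)
  | isSpace c = shapes cs
  | c == '\'' = StrLit : shapes (skipString cs)
  | isDigit c = NumLit : shapes (dropWhile isDigit cs)
  | isWordChar c =
      let (w, rest) = span isWordChar (c:cs)
      in Word (map toUpper w) : shapes rest
  | otherwise = Sym c : shapes cs
  where
    isWordChar x = isAlphaNum x || x == '_'

-- Drop everything up to and including the closing quote; a doubled
-- quote inside the literal counts as an escaped quote.
skipString :: String -> String
skipString ('\'':'\'':rest) = skipString rest
skipString ('\'':rest)      = rest
skipString (_:rest)         = skipString rest
skipString []               = []

-- The actual query passes only if it has the exemplar's structure.
matchesExemplar :: String -> String -> Bool
matchesExemplar exemplar actual = shapes exemplar == shapes actual
```

Given the exemplar `SELECT * FROM users WHERE name = 'x'`, a query that differs only in the quoted value (including one with an escaped `''` inside it) has the same shape, while one with `OR '1'='1'` tacked on produces extra tokens and is rejected.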


This is the kind of thing you deploy when you didn't write the app. You can implement it easily for Rails apps by proxying MySQL network connections and parsing the queries. If you can write a lexer, you can catch 99% of attacks just by counting terminals.

It's not the right idea, at all, for development teams. The reason is, parameterized queries solve this problem. If you're parsing queries before you bind parameters, you're not injectable.

Audit your code to ensure you're using stored procedures. Audit your inputs to make sure they aren't passing things like "DESC" and "ASC" when they should be passing "down" and "up", which you can't fuck up. Then stop thinking about that problem.

Move on to a more important problem, which everyone ignores: locking down your database connection. Why does the (one) database connection in WordPress have rights to insert into "wp_users"? A much better area to spend your time in.


For MySQL you can use the excellent http://forge.mysql.com/wiki/MySQL_Proxy in just this way.

However, didn't the Remote Agent bug story posted here recently reveal the weakness in relying on auditing? A bug identified by a formal verifier was accidentally re-introduced later in development after they "stopped thinking about the problem".


I agree 100% with the assertion that modern webapps should divide privileges across multiple database connections/security principals. A large portion of SQL injection and session-hijacking issues could be rendered useless overnight if the basic tenet of "minimal privilege" were taken to heart by LAMP coders.


what isn't haskell for?

i've been throwing it at everything from fastcgi programs to scrapers to scripting to systems stuff

maybe the one place it isn't well suited is in the realm of throw-away quickies. haskell code takes longer to write. that doesn't mean you are writing a lot of code. i still see a role for perl/python in throwaway scripts

haskell has practically every meaningful cool concept in CS built in. it is the functional language (for now). it is very fast and as the number of cores rises, haskell's performance will start to leave traditional tools in the dust. with the new GC it will destroy erlang on its own turf.

haskell takes time to learn and time to code. if you are in a rush, it is not for you. otherwise, haskell can handle almost any problem

in any case, you can come to haskell or it can come to you. the next ten years will see functional concepts get woven into every language.



