Curl vs Wget (haxx.se)
149 points by helwr on April 5, 2010 | 59 comments



>Wget requires no extra options to simply download a remote URL to a local file, while curl requires -o or -O. However trivial, this fact is often mentioned to me when people explain why they prefer downloading with wget.

Funny, because that's exactly the reason I installed MacPorts and then wget. It's stupidly trivial, but there you have it.


Another stupidly trivial reason:

I can type 'wget' and then do command-V entirely with my left hand, keeping my right hand on the mouse. Having to type 'curl' would require me to move my hand back and forth, slowing me down.


I love reasons like this, they make non-geeks look at us like we're insane. I've only used curl personally, so this wouldn't be enough to switch, but it's an excellent point.


Whereas the true geeks look at you like you're crazy for still using crufty old QWERTY. =)


I'm guessing QWERTY? Dvorak and Colemak are not too good for single-hand typing.


Bash fu? alias qwer='curl -O'


stupid unix tricks:

curl http://somesite.com/download/latest.tar.gz | tar zx

(saves having temporary files lying around everywhere)


Wget seems to be better at naming saved files:

`curl -O http://host/my%20file.txt` annoyingly saves to `my%20file.txt` while `wget http://host/my%20file.txt` will save to `my file.txt`.


I don't think it's all that trivial; whatever a program does when run without options should be a big clue towards the primary use case that it is designed to handle or fulfill.

In the case of curl, it writes to stdout; wget writes to a file. While the mechanics of each are the same, curl lends itself towards use in scripts while wget (at least in my experience) is better if you're kicking it off interactively.

Also, wget has a very handy batch mode, where you can feed it a file containing nothing but a list of URLs and it will fetch each one. I am not familiar with a similar one for curl -- although admittedly you could do the same thing with a minimal wrapper script, that's a lot more keystrokes than "wget -i <file>".
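
For what it's worth, a rough equivalent with xargs (assuming a urls.txt containing one URL per line) would be:

  # feed each URL from urls.txt to curl, one invocation per URL
  xargs -n 1 curl -O < urls.txt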


I'm not criticizing, but wouldn't a simple alias be easier?

    alias co='curl -O'


True, much simpler. I guess a larger reason would be that 7+ years of using wget has wired me to its commandline. I'm so used to typing `wget -c "<middle-click>"` or something on the command line, that I just never considered creating aliases for alternatives.


'co' is already taken though :-)


Alright,

alias wget='curl -O'


With what? I've got no such alias here.


co and ci are the old RCS commands, and they're why you can do "cvs co" and "cvs ci" with CVS.


Maybe he aliases that to a vcs checkout command, like 'svn co'?


OS X probably ships with FreeBSD's fetch(1), which should do for most <command> <url> needs.


  mini[~ 111] uname -a
  Darwin mini.local 10.3.0 Darwin Kernel Version 10.3.0: Fri Feb 26 11:58:09 PST 2010; root:xnu-1504.3.12~1/RELEASE_I386 i386
  mini[~ 112] which fetch
  fetch: Command not found.
Unfortunately, it does not. I use fetch all of the time on FreeBSD, so it is a little obnoxious to have to go and install wget from MacPorts.


My main complaint about curl is that it prints a bunch of crap (well, statistics technically) to stderr by default. Especially for something claimed to be (at least in this article) "traditional unix style", that seems undesirable.

It's annoying when I do 'curl $URL | less' and end up with garbage in my pager that I have to scroll around a bit to make disappear.

It seems like default behavior should be more like 'curl -s -S' -- silent, but print errors if they occur, just like...well, every other sensible command-line tool in the world (so that's what I have curl aliased to).
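
i.e. an alias along the lines of:

  # -s hides the progress meter, -S still reports errors despite -s
  alias curl='curl -sS'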

wget of course prints statistics too, but it's not such a "pipe-oriented" tool, as the author points out, so this isn't as big of a deal.


My version of curl only prints the progress junk to stderr if you pipe stdout somewhere. I guess I've never had a problem since I don't use a pager usually.

BTW, how do you detect whether your program's stdout is being piped to something other than the shell?


> My version of curl only prints the progress junk to stderr if you pipe stdout somewhere.

Hmm, good point -- I actually hadn't noticed that, and it seems to be the case on mine as well. (Though I rarely use it without piping or stdout-redirecting to something, so it still strikes me as inappropriate.)

> BTW, how do you detect whether your program's stdout is being piped to something other than the shell?

In the three languages I've used recently enough to remember:

C: int isatty(int fd); /* from unistd.h */

Python: os.isatty

Shell (bourne): [ -t 1 ] # exits 0 if stdout is a tty, 1 if not
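
A quick shell demonstration of the difference:

  $ [ -t 1 ] && echo "stdout is a terminal" || echo "stdout is redirected"
  stdout is a terminal
  $ { [ -t 1 ] && echo "stdout is a terminal" || echo "stdout is redirected"; } | cat
  stdout is redirected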


Instead of os.isatty(sys.stdout.fileno()) (ew) you should just use sys.stdout.isatty(). In addition to being shorter, it will work everywhere [that Python does], not just Unix.


Actually, it is very much unix style. With any modern shell you can simply turn off stderr, or only put stdout in the pipe, or put 2>/dev/null and so on. If both went to stdout then it would be a reasonable complaint.


> With any modern shell you can simply turn off stderr, or only put stdout in the pipe, or put 2>/dev/null and so on.

Or I could type fewer extra characters by using the '-s' flag, as I described in my comment. My point was about its default mode of operation: "noisy" (which is not, as I see it, a characteristic of "traditional unix" tools).

Edit: Additionally, if both went to stdout the tool would be completely unusable -- I'd simply file that under "data corruption".


That is not really a good assertion. Look at grep: default output is filename, line number, relevant line. This is far from minimalist output. Or du, which by default lists all the files and subdirectories and their sizes. Minimal, quiet output doesn't define "traditional unix" style -- the ability to tailor the output to the situation, and further to make the output as machine-readable as possible, is what makes it "unixy".


This is probably going on longer than it merits, but...

Those aren't really good comparisons.

Perhaps you have a different version of grep than I do, but my grep only prints filenames when more than one file has been specified for searching (in which case showing which file the match is from is necessary, I'd say), and certainly doesn't print line numbers if I don't give it '-n':

  [me@host: ~]% grep PATH /etc/bashrc 
  		if ! echo $PATH | /bin/egrep -q "(^|:)$1($|:)" ; then
  				PATH=$PATH:$1
  				PATH=$1:$PATH
  [me@host: ~]% grep PATH /etc/bashrc /etc/profile
  /etc/bashrc:		if ! echo $PATH | /bin/egrep -q "(^|:)$1($|:)" ; then
  /etc/bashrc:				PATH=$PATH:$1
  /etc/bashrc:				PATH=$1:$PATH
  /etc/profile:	if ! echo $PATH | /bin/egrep -q "(^|:)$1($|:)" ; then
  /etc/profile:	      PATH=$PATH:$1
  /etc/profile:	      PATH=$1:$PATH
  /etc/profile:export PATH USER LOGNAME MAIL HOSTNAME HISTSIZE
  /etc/profile:export PATH
  [me@host: ~]% grep -V
  grep (GNU grep) 2.5.1
As for du...well, frankly I'd prefer if du's output were less verbose -- a default more like 'du --max-depth=1' would be more to my liking. Compare to 'ls' vs. 'ls -R' (though I realize du does need to operate recursively regardless of whether it prints a line for everything it finds, so perhaps that's not an ideal comparison).

But the main difference here is the nature of what they're printing. In both of the above cases, the extra output text is still much more immediately relevant to the program's real task than what curl prints to stderr. For a more direct analogy, it would be like grep printing to stderr '50MB searched, 974MB remaining...' as it searches through a 1GB file.


What I do:

  * curl for any bash scripts
  * wget for downloading via command line
The exit codes that curl returns make it very powerful when creating a script -- something that gets run from cron, where you want to know why it failed if it did.
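
A minimal sketch of that pattern (URL and paths are hypothetical):

  #!/bin/sh
  # -f: fail on HTTP errors (exit code 22), -sS: no progress meter but still report errors
  curl -fsS -o /tmp/report.csv "https://example.com/report.csv"
  status=$?
  if [ "$status" -ne 0 ]; then
      echo "curl failed with exit code $status" >&2
      exit "$status"
  fi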

Wget is simple, and at the shell it's what I always turn to. The only time I would use wget in a script is if I were making some sort of simple web-crawler script.


same here. I use wget to do stuff, and I use curl to do complicated/fancy stuff. One is simple, one is powerful. I'll use simple whenever I can get away with it.


Probably wget's major reason for being is mirroring websites. It does this well. Curl is a more general HTTP library with a command line front end.

And of course there are GUI frontends for wget, but it's not really intended to be used as a library. I don't see a problem here.


I use curl for almost everything. Especially for debugging caching and whatnot, using curl -D - -o /dev/null http://example.net/ makes it extremely simple to see the headers the server is sending for a certain file.

Not only that, but curl is installed by default on Mac OS X whereas wget is not. Also, most attacks that target PHP tend to use wget to download further files to the compromised system, which means that when wget is not available no extra files are downloaded; that generally stops the exploit cold in its tracks, as the attacker does not try again.


> most attacks that attack PHP tend to use wget to download further files

Nowadays, neither Wget nor Curl is required for downloading in PHP, because functions like file_get_contents() are able to handle URL arguments! So a missing Wget protects only against script kiddies who don't know PHP very well and copied over some outdated code snippets.

Also, not installing Wget "protects" you only from the kind of primitive attacks to which your system shouldn't be vulnerable in the first place.

Security is achieved by sealing entry doors, not by removing tools.


file_get_contents on most hosts does not allow external URLs, including the one where I am an administrator.

http://www.php.net/manual/en/filesystem.configuration.php#in...

Also, not installing wget is just one more way to defend a server that is hosting multiple sites with multiple scripts, all of which cannot be trusted all the time. I don't know if tomorrow a new bug will be found in phpBB that allows a remote exploit running as the user PHP runs under; if not having wget stops one such exploit, it is a win in my book.

The other thing I have in my firewall is that all outgoing access is blocked for the user that PHP runs under, so that in the event something does happen and code is executed as that user, it won't be able to make an outbound connection or accept inbound ones.
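
On a Linux box that kind of per-user egress block might look something like this with iptables (the username and firewall are assumptions; FreeBSD would do the same with pf or ipfw):

  # reject any outbound traffic originating from the web-server user (assumed to be www-data)
  iptables -A OUTPUT -m owner --uid-owner www-data -j REJECT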

Yes, security is achieved by sealing entry doors, but removing an entry door in the first place may be even better, and wget is one such door. Unfortunately, the front door is a huge gaping mess you could drive a truck through, but that is required for the business I am in: administering a server with multiple websites running various PHP scripts.


> file_get_contents on most hosts does not allow external URL's

This is still not really a gain in security, but ...

> The other thing I have in my firewall is all outgoing access is blocked for the user that PHP runs under

... this is! And since you do that, there is no need for strange measures like providing Curl but not Wget.

Wget, Curl and allow_url_fopen are just tools to access the network API. So you need to block that API, rather than blocking some tools which merely use that API.

Although I fully agree that it improves security if you remove all services that aren't needed, I don't think the same holds for tools. (except if they are SUID binaries, but such a program is in fact more a service than a tool)


Good points all round, X-Istence is suggesting defense in depth, and you (vog) suggest making sure the outer perimeter is secure. Both good ideas - even if you fill the hall with barbed wire, you should make sure to lock the front door. Beyond that, it's a trade off.


Having outbound access from the PHP user blocked is not always possible, because clients need to interact with remote services and APIs (Flickr, Google, Twitter). Generally those run under a different user (yay FastCGI) but can still leave holes exposed that may not be easy to fix.

If having no wget installed removes one or two possibilities for an exploit it is a win in my book.


> Not only that, but curl is installed standard on Mac OS X whereas wget is not

On a vanilla Debian, Wget is installed but Curl is not.


> Especially for debugging caching and whatnot, using curl -D - -o /dev/null http://example.net/ makes it extremely simple to see the headers the server is sending for a certain file.

Working on web stuff, I've needed to see server headers many times, and I have always used the HEAD command, which is part of the libwww-perl package on Ubuntu. It's just: HEAD http://example.com.

The libwww-perl package also comes with GET (which is similar to curl in that it sends the retrieved content to standard output) and POST (which sends a POST request to a server) commands which are very handy too.


For that, I usually use 'curl -I http://example.com/', in case you don't want to install a separate package (or, for some reason, dislike Perl).


The -D option was really neat, thanks for this useful tip!


You are welcome. It has made life so much easier for me in the past when I wanted to know what the server was sending exactly.


"libcurl does not put any restrictions on the program that uses the library."

I wonder how much of the activity could be attributed to such things.


That's a nice breakdown... I've been a curl user, and never really knew what the major differences were. And it doesn't seem overly biased to me. Many thanks to the author!


Does anyone know how to mirror a site using wget or curl and get all the assets (especially images linked from stylesheets)?


Some time ago I filed a bug report because of the stylesheet images. So it still isn't fixed? It's simply a bug.

I have tried and liked pavuk in the past, don't know if it supports stylesheet images.
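
For the record, the usual wget mirroring incantation is something like the one below (hypothetical site); it fetches page requisites, but images referenced only from CSS hit exactly that bug:

  # -m: mirror, -k: convert links for local viewing, -p: fetch page requisites (images, scripts, stylesheets)
  wget -m -k -p http://example.com/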


I use HTTrack, usually handles the job just fine.


I find wget is almost always what you want, unless you're interested in development details like headers. If I want output to a pipe, I use wget -O -.

In scripts, I use wget -o <log-file> (or -a) so that I can tail -f to keep track of it. I don't see off hand what curl's log file support is, so I can't comment too deeply on it.
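
That said, since curl writes its progress meter and errors to stderr, a plain redirect gets you something roughly equivalent (file names here are only illustrative):

  wget -o fetch.log http://example.com/big.tar.gz &
  curl -o big.tar.gz http://example.com/big.tar.gz 2>fetch.log &
  tail -f fetch.log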


lwp-download should be in this comparison. I've found lots of random cases where it does stuff right and the others don't.


As should aria2. But that would make it a completely different type of article, more like a Wikipedia sub-section. We already have Wikipedia.


When downloading via HTTPS, curl seems to barf error codes often, so I try wget first and use curl only if wget doesn't work or curl is easier (which isn't often).


wget has a better-designed command line API and documentation, although its inability to PUT or POST is rather unhelpful. For example, curl uses --include to output full headers (wget uses --save-headers) and, more inexplicably, --request to change the request method, even though --method is available.


Use --post-data or --post-file to POST with wget.


--post-* is pretty sensible for POST. Unfortunately wget doesn't do PUT or DELETE.
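
For example (the endpoint and payload file are hypothetical):

  wget --post-data 'name=foo&id=42' -O - http://example.com/api    # POST works fine
  curl -X PUT -d @payload.json http://example.com/api/item/1       # PUT needs curl
  curl -X DELETE http://example.com/api/item/1                     # so does DELETE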


kinda banal if you ask me. it's two simple tools that do approximately the same thing. even grep vs ack would be more interesting.

no offense to the authors of course, they did a great job in both cases.


As someone who has always used curl and wondered why I often see wget quoted in READMEs (as the way to obtain some remote file), I quite enjoyed reading the write-up.

My take-away was that a) I am not missing out by staying with curl, and b) the use of wget is likely due to its being part of GNU.


You did notice this was written by curl's author? I only noticed after I read it, and without that knowledge I thought it was a rather pathetically biased article.

But you can't expect developers to be impartial about their own code, and it was my fault for not noticing the disclaimer the first time. It reads completely differently with that framing in mind. Still, if I were looking for the definitive reason that one tool is recommended over another, I'd consider the developers' point of view a starting point rather than an open-and-shut case.


I use wget for one-shot downloads because it is less to type. I use libcurl when developing.

This is the difference between "wget http://www.google.ca/" and "curl -O http://www.google.ca/".

A small but significant difference.


Well, plus the recursive download facilities. This is the reason why I mostly use wget on the command line, although I'd use curl in scripts.


I'm pretty sure awk is Turing Complete while grep isn't.


He said ack, not awk:

"ack is a tool like grep, designed for programmers with large trees of heterogeneous source code."

http://betterthangrep.com/



