Hacker News
The HTTP status code for a web server's default “hello” front page (utcc.utoronto.ca)
77 points by ingve on July 9, 2023 | 89 comments



If this is talking about a "placeholder" index.htm that comes with the server, I think the answer is obvious: If you request that page, it will successfully find and serve it in the same way as any other page. Thus 200 is the expected response.

On the other hand, if there's no page to be served, not even a default one, then it should 404. (Unless the default config is to list the (empty) directory, and it exists, in which case 200.)

There's no need to complicate things beyond that. One thing that I've learned over many years of experience with software is that if at all possible you should never add additional conditions or edge-cases, because they will tend to create more problems than they solve. The server is behaving most consistently if it treats the placeholder page the same as any other.

Would you want a compiler that specifically detects "hello world" programs and compiles them to always return failure, under the similar argument that it's not a "real" program? Because that's the logical conclusion of this sort of inane overthinking.
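
For illustration, the rule this comment argues for can be sketched in a few lines of Python. This is a minimal sketch, not any real server's logic: the function name `status_for_root`, the `autoindex` flag, and the list of index file names are assumptions made up for the example. Note that some real servers answer 403 rather than 404 when a directory has no index and listing is disabled.

```python
import os

INDEX_NAMES = ("index.html", "index.htm")  # assumed index file names


def status_for_root(docroot: str, autoindex: bool = False) -> int:
    """Status for a request to "/" when the placeholder gets no special case."""
    if not os.path.isdir(docroot):
        return 404
    # Any index file, placeholder or real, is served like any other page.
    for name in INDEX_NAMES:
        if os.path.isfile(os.path.join(docroot, name)):
            return 200
    # No index file: list the (possibly empty) directory if configured to,
    # otherwise there is nothing to serve.
    return 200 if autoindex else 404
```

The point of the sketch is that no branch asks "is this the placeholder?", which is exactly the special case the comment wants to avoid.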


This is about the fallback placeholder, which is not the same as a default placeholder: it indicates that the user did not configure their website at all. I'd say that 5xx is appropriate here. For example, you don't want Google to remember your website as "Apache installed" if its crawler happens to come by.


That, and "Coming soon" shouldn't be remembered either. So why not have, e.g.,

    205 Temporary Placeholder
or something along those lines?


Ironically, I sought to make a reference to the GeoCities "Under Construction" animated GIFs, and upon visiting knowyourmeme.com, they served me a 500 error on their landing page. Justice!


Is that ultimately not the same as 404 – The resource you requested was not found, so here is a temporary placeholder?


Browsers, proxies and crawlers might remember 404 longer than any status code that has "temporary" in it. See also the various permanent and temporary redirect status codes.


404 is the temporary code. 410 is permanent.


Have you ever seen a 410 actually used IRL? This has to be one of those that's pretty high up in the code rarity list.


I've used it for resources that have been permanently deleted, rather than the 404 that you'll usually see. I think it makes sense for that sort of stuff.

From a user perspective, getting a 404 after following a link that previously worked can indicate a couple of things. Like maybe the resource still exists in some other place, but they didn't set up redirects. Maybe it's been "privated" in some way, and I no longer have access to it.

A 410 makes it explicitly clear to me, that the resource has been permanently deleted. It'd also be nice if the response included some metadata as to when the resource was deleted.
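
A minimal sketch of that distinction in Python, assuming the application keeps track of which paths are live and which were deliberately deleted. The function name `status_for_path` and the two sets are hypothetical, chosen only for the example.

```python
from typing import Set


def status_for_path(path: str, live: Set[str], deleted: Set[str]) -> int:
    """Pick 200, 410, or 404 for a requested path."""
    if path in live:
        return 200
    if path in deleted:
        # Known to have existed and to be gone for good.
        return 410
    # Unknown: may never have existed, may have moved, may come back.
    return 404
```

The 410 branch is only possible if the server remembers what it deleted, which may be one reason the code is rare in practice.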


It is still a 404.

You can serve a body with a 404, user will see the appropriate message and robot can safely ignore this page until further notice. Search engine will often retry 404 later and slowly reduce crawling if it stays not found.


A fallback placeholder is a 404…

5xx is a failure to serve the page, which is not the case if a placeholder is served


I fail to see the difference.

5xx still returns content. 5xx can return HTML content. Browser will display it properly.

4xx codes mean client error. 5xx codes mean server error. Server misconfiguration is a server error. Client did nothing wrong.


The server has been configured without content.


Definitely overcomplicating things there. robots.txt already exists.


If they haven't set up their landing page I doubt they configured a robots.txt


This is clearly talking about default behaviours, so it'd already be there as standard. It's no different than expecting a different status code for the purpose of requesting crawlers not to index.


robots.txt deindexing a site by default is not necessarily the right choice. Lots of people are clueless about it and would never even think to change it when going live.


You’re not wrong, but I also believe that a lot of these problems are inherently complicated. Trying to encapsulate all the cases for a given problem will inherently have edge cases. Yes they suck and yes we should try to prevent them when possible, but I feel like ignoring them is also a huge foot-gun.


That's one thing I keep reminding myself -

No code is faster than no code


Agreed. Anything other than 200 is going to require special case handling by the server. How else would it know whether it's serving up the default index? You could add a .htaccess (or equivalent) rule but that increases the likelihood of somebody forgetting to remove it.


otoh, if you somehow monitor if your web servers are working, you wouldn’t want your server to answer with 200 in the case the configuration is reset by error.

I mean, the real point is this: nobody cares what your server answers just after being set up. However, I can see how it’s a problem if, for any reason, a server loses its configuration and still acts like everything is fine.

(ofc you can bypass this by monitoring a more specific url but it’s not always possible if you are not the one deciding what the server serves)

For your analogy, it’s more like you’d want your compiler to fail if you fed it with no source code.


The default should be a 404. The overwhelming majority of such pageviews are mistakes.


I run Nginx Proxy Manager as a reverse proxy on my home server that has NextCloud, Mastodon, and a few other things on it.

I kept hitting an issue where random things on the internet would stop working: USPS's login page, my WiFi garage door opener, etc.

I finally tracked the problem down to BrightCloud flagging my home IP as a "proxy". We went back and forth for a while, and the one thing they eventually showed me was a number of subdomains for things I no longer had online (such as a GitLab instance) that now got the default Nginx Proxy Manager "success" page. (I have a wildcard subdomain set up, so it continues to resolve even after I take something down.)

It turns out that BrightCloud's crawler just flags any page with the word "proxy" on it - it doesn't distinguish between a reverse proxy and the open forward proxies that their customers actually care about.

I switched the configuration to serve up a 404 for unrecognized domains/subdomains and haven't had a problem since.
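
The idea of answering 404 for any hostname that is not explicitly configured can be sketched like this in Python. This is not Nginx Proxy Manager's configuration, only an illustration of the decision; the function name `status_for_host` and the host names in `KNOWN_HOSTS` are made up for the example.

```python
from typing import Optional, Set

KNOWN_HOSTS = {"cloud.example.org", "social.example.org"}  # hypothetical names


def status_for_host(host_header: Optional[str],
                    known_hosts: Set[str] = KNOWN_HOSTS) -> int:
    """Return 200 only for configured hostnames, 404 for everything else."""
    if not host_header:
        return 404
    # Drop an optional ":port" suffix and compare case-insensitively.
    # (Simplification: IPv6 literals such as "[::1]:80" are not handled
    # and simply fall through to 404.)
    hostname = host_header.split(":", 1)[0].strip().lower()
    return 200 if hostname in known_hosts else 404
```

With a wildcard DNS record, every retired subdomain still reaches the proxy, so the default branch is the one that matters.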


Depending on how public you want your home server to be, I'd recommend either blocking IPs you don't want touching it (yes, this includes those "security scanner" services) or allowing only the ones you do.


I usually set up HTTP basic authentication for these types of things. It also prevents exploitation by bots when a zero day is out and you haven't patched yet. The username/password pair can be trivial; even something like `foo/bar` stops pretty much all automated scanning.
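
As a rough sketch of what the server-side check amounts to, here is a small Python function that validates a Basic `Authorization` header against a single username/password pair. The function name `basic_auth_ok` is hypothetical; in practice this is normally a few lines of web server or proxy configuration rather than application code.

```python
import base64
import hmac
from typing import Optional


def basic_auth_ok(authorization: Optional[str], user: str, password: str) -> bool:
    """Check an Authorization header value against one user/password pair."""
    if not authorization:
        return False
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(token.strip(), validate=True).decode("utf-8")
    except ValueError:  # covers malformed base64 and invalid UTF-8
        return False
    expected = f"{user}:{password}"
    # Constant-time comparison, to avoid leaking how much of the value matched.
    return hmac.compare_digest(decoded.encode("utf-8"), expected.encode("utf-8"))
```

When the check fails, the server would typically answer 401 with a `WWW-Authenticate: Basic` header, which is enough to turn away scanners that never send credentials.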


> On the other hand, the HTTP status code does matter (sometimes a lot) to programs that hit the URL, including status monitoring programs; these will probably consider their checks to fail if the web server returns a 404 and succeed if it returns a 200. If you're pointing status checking programs at the front page of your just set up web server to make sure it's up, probably you want a HTTP 200 code (although not if the real thing you're checking is whether or not the web server and the site have been fully set up).

This is a subtle but important distinction…

There are so many layers now between a user and the application code. What if due to some misconfiguration or new image push or ___ the web server or load balancer or PaaS router or CDN or Cloudflare or whatever starts serving some default placeholder, or error message, or its own content up on my URL?

That’s why I’d argue for a non-200 status code for the default “hello” page.

And in production monitoring I’d use something like https://heiioncall.com/blog/enhanced-api-monitoring-with-exp... to verify the presence of some special header set only by your application, so you know that your desired code is actually being called. (In addition to asserting the HTTP status code.)
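
A minimal Python sketch of that kind of check, assuming the application sets a marker header of its own. The header name `X-App-Name`, the value `my-app`, and the function name `looks_like_my_app` are all hypothetical; this is not the API of the monitoring service linked above.

```python
from typing import Mapping


def looks_like_my_app(status: int,
                      headers: Mapping[str, str],
                      marker_header: str = "X-App-Name",  # hypothetical header
                      marker_value: str = "my-app") -> bool:
    """True only if the status is 200 AND the application's marker is present."""
    if status != 200:
        return False
    # Header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    return normalized.get(marker_header.lower()) == marker_value
```

A default placeholder page served with 200 by some intermediate layer fails this check, because that layer has no reason to send the application's header.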


But that is also an argument for 200. Because if you want to test your load balancer against your new web server you will want it to serve a 200 or else you will just see an error from the load balancer.


You should set up a health endpoint for that, rather than just serving the default page.
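
A dedicated health endpoint can be as small as the following sketch, written with Python's standard `http.server` module. The path `/healthz` and the names `route` and `HealthHandler` are assumptions for the example, not a convention any particular load balancer requires.

```python
from http.server import BaseHTTPRequestHandler


def route(path: str):
    """Map a request path to (status, body)."""
    if path == "/healthz":  # assumed health-check path
        return 200, b"ok\n"
    return 404, b"not found\n"


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = route(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it, something like `HTTPServer(("127.0.0.1", 8080), HealthHandler).serve_forever()` (with `HTTPServer` imported from `http.server`) would serve it locally. The load balancer then probes `/healthz`, and the status of the front page no longer matters for health checking.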


> As with other HTTP error codes, the real answer is that one should probably use whatever status code is most convenient.

That's all you need to know. HTTP, like the other components of the web stack, is an organically grown monstrosity that resembles what you would get if a thousand random people shat on a pile. Any attempt to extract philosophical purity and/or rigorous discussion from it is a massive waste of time. Just use what you feel makes sense at the current moment, and move on.


HTTP 418 I'm a teapot


Unironically, this feels right. It's not an error, it's not a success. It's just announcing what it is, a teapot - er, web server.


But 4xx codes are client errors. 418 - I'm a teapot is specified as an error ("I can't brew you a coffee, silly: I'm a teapot")[0].

[0] https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/418


The error seems appropriate: the client expects erroneously something from the server that it can't provide (otherwise there would be something other than the default page to return)


I mean sure, but it feels like a stretch (albeit a whimsical one) to say it's more appropriate than 404 Not Found, which could also be described as "client expects erroneously something from the server that it can't provide".


Maybe 405 or 406 then? I mean, sure it's not right either, but better than 418? (I'd say go with 404.)


The client is mistaken in requesting coffee.


HTTP 420 Enhance your calm


This one actually gets use. I bumped up against an error in our internal services and it was returning 420's... Not fun to diagnose


Don’t give Elon any ideas


I think Twitter is where it was first used, actually. They stopped after a while, but from what I read, it was a code you got before hitting your API limit. (iirc, you got a 417/429 error when you ran out)


HTTP 527 Railgun Listener to Origin


That one's not standard, and probably shouldn't ever have been used in the first place IMO


Neither is 420


[flagged]


>2. reeks of anglo-centrism.

Explain this, please. What about it is Anglo-centric?


Lighten up chap


No, just like engineers, doctors, etc. take their work seriously so should programmers.

There are places for jokes. This is not one of them.


The only joke I see is the cosmic irony of some pompous, self-appointed gatekeeper bitterly complaining about humour whilst being factually wrong in every particular of their complaint:

1. The 418 response code is not standard. It is described in RFC 2324, which is an informational RFC, not a standards-track document. This rookie blunder illustrates how understanding the standards process can be as crucial as understanding the documents themselves.

2. Tea, and teapots, originate in China. Coffee, incidentally, has African and Islamic origins.

3. Doctors are very funny. I should know, I married one.

We take humour seriously here on the Hacker News forum. Do try to keep up


The BMJ’s yearly Christmas issue is a great example of doctors having a bit of fun with the serious format of medical journals, very much in the spirit of the teapot RFC.

(I married one too, cheers to doctors’ partners)


> 2. Tea, and teapots, originate in China. Coffee, incidentally, has African and Islamic origins.

"I'm a little teapot" is a British nursery rhyme... are you being deliberately daft? Of course I am not referring to the origins of the plant



[flagged]


You did not specify—or even mention the term—“anglosphere” previously in this thread, actually.


"2. reeks of anglo-centrism."


In a sensational burst of further irony, that remark is the most parochially anglo-centric to be found in the entire subthread, for implying that knowledge of colonial nursery rhymes is some kind of fundamental prerequisite to participation; even more egregiously, it depends on misconstruing the technical document: if a pedagogical verse in respect of vessel morphology was normative for the RFC, it would be a) correctly quoted, and b) referenced in section 10


418 gives no details as to the size of the teapot.


You're wasting your time. Despite the fact that computers literally run the world, the vast majority of software engineers have collectively decided that their work is only deserving of ridicule and stupid memes. Software engineering is the only profession in human history where self-hate is the cultural norm, and any attempt to imbue the work with seriousness and respect is invariably met with contempt from the professional masses.


As always, relevant xkcd

https://xkcd.com/2030/


Both the "joke" you linked to, and the fact that you linked to it, are part of the problem I described. If you can't respect what you do, I promise you that others won't either.


If it wasn't obvious (it probably wasn't), I like HTTP 418 and I applaud movements to make sure it stays. Having a good sense of humor is akin to oil that keeps the world working properly.


I'm just trying to make some money so I can live in the city and eat food. Not really interested in earning people's respect.

Why the hell would anyone care if someone respects their profession or not anyways? Seems like a weird thing to get bent out of shape about.

Lighten up a little man, life is supposed to be about having fun and enjoying it.


You are a clown


OP suggests 404 Not Found, but there’s also an argument to be made for a 5xx server error. After all, this “hello” front page pops up not because of client error, but because the server has not been configured properly.


I like this idea.

What about "518 - teapot not configured" ?


You should not publicly admit you are unconfigured, attackers might take advantage.


On the flip side, default configurations should disable as much as possible to reduce the attack surface.


Now you'd have to argue the semantics of what it means to be misconfigured. Is it misconfigured just because the placeholder index hasn't yet been replaced? How does the server know?


The probable reason they used an error handler for that "welcome" page is so that it would keep /var/www/html empty and any upgrades wouldn't try to replace an index.html or whatever you put there yourself. So it's a "hack" to serve a welcome page from outside the default DocumentRoot, not to force some kind of status code. That status code is just a downside of this hack and not really of importance, because whoever made it also knew it was going to be the first thing you remove when you want to use the webserver.


I build my own Apache container images (long story, Nginx and Caddy are okay too for most purposes) and I need to do health checks, so I also had to think about this.

When I launch a container that's supposed to sit in front of multiple other containers as a reverse proxy or just serve static files, I need to know whether the Apache process in it is working and actually serving files. This is regardless of the rest of the configuration and whether every site is up: for example, if 19 out of 20 sites are configured correctly and are served, the failing one can be addressed separately later.

In my case, that's as easy as the following:

  healthcheck:
    test: "curl -s http://127.0.0.1:80 | grep -q 'Apache2 is up and running' || exit 1"
    interval: 10s
    timeout: 10s
    retries: 6
    start_period: 5s
(there are also separate external uptime checks for the actual sites with domains and HTTPS, too)

I serve some HTML files by default in every container with specific contents, in addition to any domains that the web server has configured. If I can access these default files, that means that the web server is up and I can then think about testing the rest of the configuration myself. In this case, I decided to check the file contents instead of the status code.

Frankly, you can get the same result with just status codes (probably a 200 IMO because of how simple it is), or maybe some specific HTML contents in the default page to identify whether you're talking to the correct web server instance and have deployed it in the right place, maybe even page contents that have been generated by something like PHP-FPM to check whether scripts get executed correctly (also like testing OpenResty with Lua) if you need that sort of thing.

Whatever works for you.


I propose using 418 simply because it is a weird enough behavior that the developer will consult docs to fix the placeholder page.


Is there no 2xx code that indicates "this server is healthy and could be configured to serve real web pages, so we're telling you it's a sort of success"? Doesn't look like it.

Proposal: 209 Healthy With Minimum Configuration


204 No Content

Seems to fit that bill well. Though I'm not sure exactly what your browser would show, as it's meant not to cause navigation to a new page.


204 No Content is not supposed to actually include any content

https://github.com/httpwg/http-core/issues/26


Which is what we should be doing on a lot of these 'unconfigured' services and devices that are not set up beyond defaults.

Give no information about what it is.

Give a status to a monitor that it is alive.


A pity that Not Implemented must not be returned by GET. Makes sense here.


That's a quirk of how people think about HTTP response codes, which are defined confusingly enough already.

They started out as server response codes but once people started making web applications and APIs they started overloading the server response codes to also have meaning as application response codes, making things even more confusing.

Not Implemented is supposed to mean that the server does not implement the request verb (PUT, PATCH, DELETE, POST, etc.), not really relevant to anything that might exist or be running on the server.

Maybe there could've been a separate standard header for application status, but that might not be great either since everything would have to handle all combinations of server and app status.


these are such weird philosophical stances. just do the thing that makes sense to you or makes sense for your use and move on. there is not a strictly correct choice.


There's indeed no need to overthink it.


I think it should be 404, as in "you reached a functioning web server, but there's nothing here." Body can be whatever.


Even better are APIs or API gateways that respond with 200s but have a completely different HTTP code within the payload!


I like "204 No Content" for this.


Webservers should have a maintenance mode setting. If this setting is enabled, an HTTP 503 is returned for any request. The setting should be enabled by default.
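
A sketch of what such a setting could look like, in Python. The `ServerConfig` class, its field names, the `respond` function, and the 300-second `Retry-After` value are all assumptions for illustration; no existing web server is being described.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class ServerConfig:
    maintenance: bool = True         # on by default, as the comment proposes
    retry_after_seconds: int = 300   # assumed hint for clients and crawlers


def respond(config: ServerConfig, path: str) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a request under the given configuration."""
    if config.maintenance:
        # Every path gets 503 until the operator turns maintenance mode off.
        return 503, {"Retry-After": str(config.retry_after_seconds)}
    return 200, {}
```

Under this scheme a freshly installed server answers 503 everywhere, and going live is an explicit configuration change rather than the act of replacing a placeholder file.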


I think an error is the correct behaviour: errors are for machines, and no machine ever accesses a URL in hopes of finding the default placeholder page.


I'd argue for 200, but if you want to express that it is not the final page, why not a 302 to the temporary page?


A 301 to itself would be more in line with tradition


But to where?



You can tell it's Sunday because this is no. 1 on the front page.


if you are new to having a server, and use password access, make sure to install fail2ban. once you have it running, you'll understand why it's important.


If you are new to having a server, don't use password access. Same goes if you're old to having a server. Just never use password access.


Maybe the author arrived at this topic in an idly curious manner and I wish him no ill will, but man, it sure is fucking obnoxious whenever this sort of inconsequential lawyering flares up among my real world colleagues. I think teams should be forbidden from having more than one Hacker News reader in their ranks.



