If this is talking about a "placeholder" index.htm that comes with the server, I think the answer is obvious: If you request that page, it will successfully find and serve it in the same way as any other page. Thus 200 is the expected response.
On the other hand, if there's no page to be served, not even a default one, then it should 404. (Unless the default config is to list the (empty) directory, and it exists, in which case 200.)
There's no need to complicate things beyond that. One thing that I've learned over many years of experience with software is that if at all possible you should never add additional conditions or edge-cases, because they will tend to create more problems than they solve. The server is behaving most consistently if it treats the placeholder page the same as any other.
Would you want a compiler that specifically detects "hello world" programs and compiles them to always return failure, under the similar argument that it's not a "real" program? Because that's the logical conclusion of this sort of inane overthinking.
It is about a fallback placeholder, which is not the same as a default placeholder: it indicates that the user did not configure their website at all. I'd say that a 5xx is appropriate here. For example, you don't want Google to remember your website as "Apache installed" if its crawler happens to pass by.
Ironically, I sought to make a reference to the GeoCities "Under Construction" animated GIFs, and upon visiting knowyourmeme.com, they served me a 500 error on their landing page. Justice!
Browsers, proxies and crawlers might remember 404 longer than any status code that has "temporary" in it. See also the various permanent and temporary redirect status codes.
I've used 410 for resources that have been permanently deleted, rather than the 404 that you'll usually see. I think it makes sense for that sort of thing.
From a user perspective, getting a 404 after following a link that previously worked can indicate a couple of things. Like maybe the resource still exists in some other place, but they didn't set up redirects. Maybe it's been "privated" in some way, and I no longer have access to it.
A 410 makes it explicitly clear to me that the resource has been permanently deleted. It'd also be nice if the response included some metadata as to when the resource was deleted.
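For concreteness, a hedged sketch of what that could look like in nginx (the X-Deleted-At header is made up, not any standard):

    # Mark a removed resource as permanently gone, with a hypothetical
    # header recording when it was deleted.
    location = /old-post {
        # "always" so the header is attached to the error response too
        add_header X-Deleted-At "2024-01-15" always;
        return 410;
    }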
You can serve a body with a 404: the user will see the appropriate message, and a robot can safely ignore the page until further notice. Search engines will often retry a 404 later and slowly reduce crawling if it stays not found.
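In nginx terms, serving a human-readable body with a 404 is only a couple of lines (paths are assumptions):

    # Send the 404 status but render a friendly page for humans.
    error_page 404 /404.html;

    location = /404.html {
        internal;              # only reachable via error_page, not directly
        root /var/www/errors;  # assumed location of the custom page
    }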
This is clearly talking about default behaviours, so it'd already be there as standard. It's no different than expecting a different status code for the purpose of requesting crawlers not to index.
robots.txt deindexing a site by default is not necessarily the right choice. Lots of people are clueless about it and would never even think to change it when going live.
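For reference, the deny-everything default being discussed is just two lines of robots.txt:

    # robots.txt: ask all well-behaved crawlers to stay away
    User-agent: *
    Disallow: /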
You’re not wrong, but I also believe that a lot of these problems are inherently complicated. Trying to encapsulate all the cases for a given problem will inherently have edge cases. Yes they suck and yes we should try to prevent them when possible, but I feel like ignoring them is also a huge foot-gun.
Agreed. Anything other than 200 is going to require special case handling by the server. How else would it know whether it's serving up the default index? You could add a .htaccess (or equivalent) rule but that increases the likelihood of somebody forgetting to remove it.
otoh, if you somehow monitor whether your web servers are working, you wouldn't want your server to answer with 200 in the case where the configuration has been reset by mistake.
I mean, the real point here is: nobody cares what your server answers just after being set up. However, I can see how it's a problem if, for any reason, a server loses its configuration and still acts like everything is fine.
(ofc you can get around this by monitoring a more specific URL, but that's not always possible if you are not the one deciding what the server serves)
For your analogy, it's more like you'd want your compiler to fail if you fed it no source code.
I run Nginx Proxy Manager as a reverse proxy on my home server that has NextCloud, Mastodon, and a few other things on it.
I kept hitting an issue where random things on the internet would stop working: USPS's login page, my WiFi garage door opener, etc.
I finally tracked the problem down to Brightcloud flagging my home IP as a "proxy". We went back and forth for a while, and the one thing they eventually showed me was a list of subdomains for things I no longer had online (such as a GitLab instance) that now got the default Nginx Proxy Manager "success" page. (I have a wildcard subdomain set up, so it continues to resolve even after I take something down.)
It turns out that Brightcloud's crawler just flags any page with the word "proxy" on it; it doesn't distinguish between a reverse proxy and the open forward proxies that their customers actually care about.
I switched the configuration to serve up a 404 for unrecognized domains/subdomains and haven't had a problem since.
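In raw nginx terms (Nginx Proxy Manager exposes this as a setting), that catch-all looks roughly like:

    # Default server: any Host header that doesn't match a configured
    # site gets a 404 instead of a branded landing page.
    server {
        listen 80 default_server;
        server_name _;
        return 404;
    }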
Depending on how public you want your home server to be, I'd recommend either blocking IPs you don't want touching it (yes, this includes those "security scanner" services) or allowing only the ones you do.
I usually set up HTTP basic authentication for these types of things. It also prevents exploitation by bots when a zero-day is out and you haven't patched yet. The username/password pair can be trivial; even something like `foo/bar` stops pretty much all automated scanning.
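A hedged nginx sketch of that setup (the upstream name is a placeholder; htpasswd comes with apache2-utils):

    # Create the trivial credentials file first:
    #   htpasswd -cb /etc/nginx/.htpasswd foo bar
    location / {
        auth_basic           "restricted";
        auth_basic_user_file /etc/nginx/.htpasswd;
        proxy_pass           http://upstream-app;  # assumed upstream name
    }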
> On the other hand, the HTTP status code does matter (sometimes a lot) to programs that hit the URL, including status monitoring programs; these will probably consider their checks to fail if the web server returns a 404 and succeed if it returns a 200. If you're pointing status checking programs at the front page of your just set up web server to make sure it's up, probably you want a HTTP 200 code (although not if the real thing you're checking is whether or not the web server and the site have been fully set up).
This is a subtle but important distinction…
There are so many layers now between a user and the application code. What if due to some misconfiguration or new image push or ___ the web server or load balancer or PaaS router or CDN or Cloudflare or whatever starts serving some default placeholder, or error message, or its own content up on my URL?
That’s why I’d argue for a non-200 status code for the default “hello” page.
And in production monitoring I’d use something like https://heiioncall.com/blog/enhanced-api-monitoring-with-exp... to verify the presence of some special header set only by your application, so you know that your desired code is actually being called. (In addition to asserting the HTTP status code.)
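Vendor aside, the same assertion can be sketched with plain curl (X-My-App is a made-up header your application would set):

    # -f: fail on HTTP errors; -D -: dump response headers; -o /dev/null: drop body
    curl -fsS -D - -o /dev/null https://example.com/ \
      | grep -qi '^x-my-app:' \
      || echo 'status was fine, but the app code is not answering'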
But that is also an argument for 200. Because if you want to test your load balancer against your new web server you will want it to serve a 200 or else you will just see an error from the load balancer.
> As with other HTTP error codes, the real answer is that one should probably use whatever status code is most convenient.
That's all you need to know. HTTP, like the other components of the web stack, is an organically grown monstrosity that resembles what you would get if a thousand random people shat on a pile. Any attempt to extract philosophical purity and/or rigorous discussion from it is a massive waste of time. Just use what you feel makes sense at the current moment, and move on.
The error seems appropriate: the client erroneously expects something from the server that it can't provide (otherwise there would be something other than the default page to return).
I mean sure, but it feels like a stretch (albeit a whimsical one) to say it's more appropriate than 404 Not Found, which could also be described as "the client erroneously expects something from the server that it can't provide".
I think Twitter is where it was first used, actually. They stopped after a while, but from what I read, it was a code you got before hitting your API limit. (iirc, you got a 417/429 error when you ran out)
The only joke I see is the cosmic irony of some pompous, self-appointed gatekeeper bitterly complaining about humour whilst being factually wrong in every particular of their complaint:
1. The 418 response code is not standard. It is described in RFC 2324, which is an informational RFC, not a standards-track document. This rookie blunder illustrates how understanding the standards process can be as crucial as understanding the documents themselves.
2. Tea, and teapots, originate in China. Coffee, incidentally, has African and Islamic origins.
3. Doctors are very funny. I should know, I married one.
We take humour seriously here on the Hacker News forum. Do try to keep up.
The BMJ’s yearly Christmas issue is a great example of doctors having a bit of fun with the serious format of medical journals, very much in the spirit of the teapot RFC.
In a sensational burst of further irony, that remark is the most parochially anglo-centric to be found in the entire subthread, for implying that knowledge of colonial nursery rhymes is some kind of fundamental prerequisite to participation. Even more egregiously, it depends on misconstruing the technical document: if a pedagogical verse in respect of vessel morphology were normative for the RFC, it would be a) correctly quoted, and b) referenced in section 10.
You're wasting your time. Despite the fact that computers literally run the world, the vast majority of software engineers have collectively decided that their work is only deserving of ridicule and stupid memes. Software engineering is the only profession in human history where self-hate is the cultural norm, and any attempt to imbue the work with seriousness and respect is invariably met with contempt from the professional masses.
Both the "joke" you linked to, and the fact that you linked to it, are part of the problem I described. If you can't respect what you do, I promise you that others won't either.
If it wasn't obvious (it probably wasn't), I like HTTP 418 and I applaud movements to make sure it stays. Having a good sense of humor is akin to oil that keeps the world working properly.
OP suggests 404 Not Found, but there's also an argument to be made for a 5xx server error. After all, this "hello" front page pops up not because of a client error, but because the server has not been configured properly.
Now you'd have to argue the semantics of what it means to be misconfigured. Is it misconfigured just because the placeholder index hasn't yet been replaced? How does the server know?
The probable reason they used an error handler for that "welcome" page is so that it would keep /var/www/html empty, and upgrades wouldn't try to replace an index.html or whatever you put there yourself. So it's a "hack" to serve a welcome page from outside the default DocumentRoot, not to force some kind of status code. That status code is just a side effect of the hack, and not really of importance, because whoever made it knew it would be the first thing you remove when you actually want to use the web server.
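If memory serves, the RHEL/CentOS packaging does roughly this in /etc/httpd/conf.d/welcome.conf, which is also why the welcome page there arrives as a 403:

    # Only fires when the DocumentRoot has no index: requesting "/"
    # with indexes disabled yields a 403, and the 403 error document
    # is the welcome page living outside the DocumentRoot.
    <LocationMatch "^/+$">
        Options -Indexes
        ErrorDocument 403 /.noindex.html
    </LocationMatch>

    Alias /.noindex.html /usr/share/httpd/noindex/index.html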
I build my own Apache container images (long story, Nginx and Caddy are okay too for most purposes) and I need to do health checks, so I also had to think about this.
When I launch a container that's supposed to sit in front of multiple other containers as a reverse proxy or just serve static files, I need to know whether the Apache process in it is working and actually serving files. This is regardless of the rest of the configuration and whether every site is up: for example, if 19 out of 20 sites are configured correctly and are served, the failing one can be addressed separately later.
In my case, that's as easy as the following:
healthcheck:
  # -sS: silence curl's progress meter but still surface real errors;
  # grep asserts that the default page contents are what we expect
  test: "curl -sS http://127.0.0.1:80 | grep 'Apache2 is up and running' || exit 1"
  interval: 10s
  timeout: 10s
  retries: 6
  start_period: 5s
(there are also separate external uptime checks for the actual sites with domains and HTTPS, too)
I serve some default HTML files with specific contents in every container, in addition to any domains that the web server has configured. If I can access these default files, that means the web server is up, and I can then think about testing the rest of the configuration myself. In this case, I decided to check the file contents instead of the status code.
Frankly, you can get the same result with just status codes (probably a 200 IMO because of how simple it is), or maybe some specific HTML contents in the default page to identify whether you're talking to the correct web server instance and have deployed it in the right place, maybe even page contents that have been generated by something like PHP-FPM to check whether scripts get executed correctly (also like testing OpenResty with Lua) if you need that sort of thing.
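For comparison, a status-code-only variant of the same health check (same compose healthcheck, just a different test line) could be as short as this, since curl's -f flag exits non-zero on any HTTP error:

    test: "curl -fsS -o /dev/null http://127.0.0.1:80 || exit 1"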
Is there no 2xx code that indicates "this server is healthy and could be configured to serve real web pages, so we're telling you it's a sort of success"? Doesn't look like it.
That's a quirk of how people think about HTTP response codes, which are defined confusingly enough already.
They started out as server response codes, but once people began building web applications and APIs, they overloaded the server response codes to also carry application-level meaning, making things even more confusing.
501 Not Implemented is supposed to mean that the server does not implement the request verb (PUT, PATCH, DELETE, POST, etc.); it isn't really about anything that might exist or be running on the server.
Maybe there could've been a separate standard header for application status, but that might not be great either since everything would have to handle all combinations of server and app status.
these are such weird philosophical stances. just do the thing that makes sense to you or makes sense for your use and move on. there is not a strictly correct choice.
Webservers should have a maintenance mode setting. If this setting is enabled, an HTTP 503 is returned for any request. The setting should be enabled by default.
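A minimal sketch of what that default could look like in nginx, until you deliberately flip it off:

    # Maintenance mode: answer every request with 503 Service Unavailable.
    server {
        listen 80 default_server;
        return 503;
    }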
I think an error is the correct behaviour: errors are for machines, and no machine is ever accessing a URL in hopes of finding the default placeholder page.
If you are new to running a server and use password access, make sure to install fail2ban. Once you have it running, you'll understand why it's important.
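A minimal jail.local sketch to get started (the values are just sane-ish assumptions, adjust to taste):

    # /etc/fail2ban/jail.local -- minimal starting point
    [sshd]
    enabled  = true
    # failures allowed before a ban, and ban duration in seconds
    maxretry = 5
    bantime  = 3600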
Maybe the author arrived at this topic in an idly curious manner and I wish him no ill will, but man, it sure is fucking obnoxious whenever this sort of inconsequential lawyering flares up among my real world colleagues. I think teams should be forbidden from having more than one Hacker News reader in their ranks.