There are so many odd edge cases in HTML; a good one I found was with forms. If you open a <form> but don't have a closing tag, the browser will close the form block "visually" at the end of the form's immediate parent, as you would expect. All styles are applied to it, or to its children via selectors, up to that automatically inserted end point. It's how browsers handle most unclosed block tags.
However, the form's "functionality" isn't closed at that point: any inputs further down the page (outside of the form's DOM tree) are included in the POST/GET when the form is submitted. Or at least until another form is found in the DOM. Effectively an unclosed form is two things: a visual block that is closed automatically, and an "overlapping" form capturing inputs indefinitely.
> The form element pointer points to the last form element that was opened and whose end tag has not yet been seen. It is used to make form controls associate with forms in the face of dramatically bad markup, for historical reasons.
And search through the rest of the page for the term to find how it’s implemented—it’s straightforward, just set on a <form> open tag and reset on an (explicit) </form> close tag.
This is somewhat unreliable: browsers support it, but tools using XML pipelines are allowed to ignore it (§13.2.9), and lots of JavaScript code will assume hierarchy rather than using form.elements, and thus not catch such elements, or elements that manually specify a form owner via the form attribute.
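To make the hierarchy-versus-form.elements gap concrete, here is a minimal sketch. The helper name controlsOutsideTree is made up for illustration; it relies only on the standard form.elements collection and Node.contains(), and it returns exactly the controls that a descendant query such as form.querySelectorAll("input") would miss.

```javascript
// Hypothetical helper (not from any library): list the controls a form owns
// that are NOT its descendants in the DOM tree.
function controlsOutsideTree(form) {
  // form.elements is built from form owners, not from ancestry, so it also
  // contains controls captured by an unclosed <form> or by a form="..." attribute.
  return Array.from(form.elements).filter((control) => !form.contains(control));
}

// In a browser console you might try:
//   controlsOutsideTree(document.forms[0]).map((c) => c.name)
// An empty result means a descendant query would have seen the same controls
// (apart from <input type=image>, which form.elements leaves out).
```

Code that walks descendants to serialise or validate a form could use a check like this to see whether it is missing anything.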
③ Look through the DOM interface listed, elements sounds promising. Find the explanation of that IDL attribute below: “The elements IDL attribute must return an HTMLFormControlsCollection rooted at the form element's root, whose filter matches listed elements whose form owner is the form element, with the exception of input elements whose type attribute is in the Image Button state, which must, for historical reasons, be excluded from this particular collection.” Roll your eyes at the bizarre exclusion of <input type=image>, then focus on the term form owner which sounds relevant. That links you to https://html.spec.whatwg.org/multipage/form-control-infrastr....
④ Hmm… null, parser inserted flag, nearest ancestor form element, form attribute. Parser inserted flag sounds relevant (though it’s just a flag, not the actual association link). Also the note “They are also complicated by rules in the HTML parser that, for historical reasons, can result in a form-associated element being associated with a form element that is not its ancestor.”
⑤ This is where having the whole spec open, rather than the multipage version, is handy: you can search the entire document for the term “parser inserted flag” to see where that gets set. You can also guess that it’s going to be in §13.2 Parsing HTML documents (parsing.html). In the end, it’s https://html.spec.whatwg.org/multipage/parsing.html#creating...: “… then associate element with the form element pointed to by the form element pointer and set element's parser inserted flag.” Ah hah!
⑥ You have found the concept in the parser: “form element pointer”. You can then look through where it’s used and quickly see how it’s set on <form> and unset on </form>, thus deliberately handling the missing-</form> case.
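The pointer behaviour can be sketched as a toy model. This is not the spec's tree builder: the token shape ({ type, tag, name }) and the function name associateControls are assumptions for illustration, and only <form> and <input> are handled. The rule that a second <form> start tag is ignored while the pointer is set is how I read the parsing section.

```javascript
// Toy model of the parser's "form element pointer"; not the real algorithm.
// Tokens are hypothetical objects: { type: "start" | "end", tag, name? }.
function associateControls(tokens) {
  let formPointer = null; // last <form> opened whose </form> has not been seen
  let formCount = 0;
  const owners = {}; // control name -> form label, or null when there is no owner

  for (const token of tokens) {
    if (token.type === "start" && token.tag === "form") {
      // Assumption from my reading of the spec: a <form> start tag is ignored
      // while the pointer is already set.
      if (formPointer === null) {
        formCount += 1;
        formPointer = `form${formCount}`;
      }
    } else if (token.type === "end" && token.tag === "form") {
      // Only an explicit </form> resets the pointer.
      formPointer = null;
    } else if (token.type === "start" && token.tag === "input") {
      // Controls are associated with whatever the pointer currently names.
      owners[token.name] = formPointer;
    }
    // Any other end tag (</div>, say) may close the form in the tree,
    // but it leaves the pointer untouched.
  }
  return owners;
}

const owners = associateControls([
  { type: "start", tag: "div" },
  { type: "start", tag: "form" },
  { type: "start", tag: "input", name: "inside" },
  { type: "end", tag: "div" }, // the form is never explicitly closed
  { type: "start", tag: "input", name: "outside" },
]);
console.log(owners); // both "inside" and "outside" map to "form1"
```

Adding an explicit { type: "end", tag: "form" } token before the second input makes its owner null in this model, which is the well-formed case.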
You develop a feeling for this kind of thing over time. I didn’t know about the form element pointer (though I feel I should have known about it), but this is a loose description of what I did, though I was able to speed through some of the steps, and I really should have just started by looking at “An end tag whose tag name is "form"”, but at first I thought the claim was bogus.
I think I got to point 2, found no reference in the form tag section, and gave up.
But what's fascinating is that it describes the HTML parser effectively implementing "overlapping markup", as in the Wikipedia article, for this edge case, for backwards compatibility.
There’s a lot of messy stuff the HTML syntax supports for historical reasons.
https://html.spec.whatwg.org/multipage/parsing.html#an-intro... covers a variety of fun ones, like how inline formatting elements basically support overlapping markup, and how malformed nesting can be achieved in a few different ways. It’s a useful section of the spec for understanding these things because it explains what’s going on, with links to the precise parser details.
It's not in the DOM; from memory, Chrome dev tools even shows a closing form tag where it's been inserted. I have no idea how it's implemented internally.
Confused me for a while when debugging a legacy website. It had actually been done intentionally to work around a rather complex architecture.
There is a "form" attribute for input elements that can be used to associate inputs outside the form hierarchy with a form, so that they are included in its submission.
So the semantics of "form field outside the actual form" are available anyway. When parsing an unclosed <form>, browsers just make use of that.
XHTML was an attempt at such strictness. It failed, though you can still use XML syntax for HTML if you choose to (serve it with MIME type application/xhtml+xml instead of text/html). I leave you to research why it failed if you want; plenty has been written on the topic.
In the mid-2000s, HTML syntax was finally codified. There is now no undefined behaviour: all inputs have a defined output, however surprising it may be, because a lot of it was being relied upon.