
This behaviour is defined and explained in the HTML spec with the form element pointer <https://html.spec.whatwg.org/multipage/parsing.html#form-ele...>:

> The form element pointer points to the last form element that was opened and whose end tag has not yet been seen. It is used to make form controls associate with forms in the face of dramatically bad markup, for historical reasons.

Search through the rest of that page for the term to see how it’s implemented. It’s straightforward: the pointer is set on a <form> start tag and reset on an (explicit) </form> end tag.
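To make the set/reset behaviour concrete, here is a minimal sketch in Python using the standard library's html.parser. It is not the spec's algorithm: it tracks only the pointer, builds no tree, and ignores the form attribute and <template>. The names FormPointerSketch, CONTROL_TAGS and form_owners are made up for illustration.

```python
from html.parser import HTMLParser

# Tags treated as form controls in this sketch (an illustrative subset).
CONTROL_TAGS = {"input", "button", "select", "textarea"}


class FormPointerSketch(HTMLParser):
    """Tracks only a form element pointer; it does not build a DOM tree."""

    def __init__(self):
        super().__init__()
        self.form_pointer = None  # label of the currently "open" form, or None
        self.owners = {}          # control name -> owning form label (or None)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            # A <form> start tag is ignored while the pointer is already set.
            if self.form_pointer is None:
                self.form_pointer = attrs.get("id") or "(no id)"
        elif tag in CONTROL_TAGS:
            # Controls are associated with whatever the pointer holds now.
            self.owners[attrs.get("name")] = self.form_pointer

    def handle_endtag(self, tag):
        # Only an explicit </form> resets the pointer; </div> etc. do not.
        if tag == "form":
            self.form_pointer = None


def form_owners(markup):
    """Return {control name: form label} for a markup string."""
    parser = FormPointerSketch()
    parser.feed(markup)
    parser.close()
    return parser.owners
```

Feeding it `<div><form id="f"><input name="a"></div><input name="b"></form><input name="c">` associates both a and b with f, even though b comes after the </div> that implicitly closed the form, while c (after the explicit </form>) gets no owner.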

This is somewhat unreliable: browsers support it, but tools using XML pipelines are allowed to ignore it (§13.2.9), and lots of JavaScript code will assume hierarchy rather than using form.elements, and thus not catch such elements, or elements that manually specify a form owner via the form attribute.
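The hierarchy-versus-form.elements mismatch can be sketched as follows. This is a toy tree builder in Python on top of html.parser, not a conforming HTML parser: scope rules, implied end tags, the form attribute and <template> are all ignored, and Node, TinyTreeBuilder, inputs_by_ancestry, inputs_by_owner and compare are hypothetical names. Walking descendants stands in for code that assumes hierarchy; filtering by owner stands in for what form.elements reports.

```python
from html.parser import HTMLParser

# Void elements are never pushed onto the open-element stack (subset only).
VOID_TAGS = {"input", "br", "img", "hr", "meta", "link"}


class Node:
    def __init__(self, tag, attrs, parent=None):
        self.tag, self.attrs, self.parent = tag, attrs, parent
        self.children = []
        self.form_owner = None
        if parent is not None:
            parent.children.append(self)


class TinyTreeBuilder(HTMLParser):
    """Toy tree builder that also keeps a form element pointer."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root", {})
        self.open = [self.root]   # simplified stack of open elements
        self.form_pointer = None
        self.forms, self.inputs = [], []

    def handle_starttag(self, tag, attrs):
        if tag == "form" and self.form_pointer is not None:
            return  # <form> start tag ignored while the pointer is set
        node = Node(tag, dict(attrs), self.open[-1])
        if tag == "form":
            self.form_pointer = node
            self.forms.append(node)
        if tag == "input":
            node.form_owner = self.form_pointer  # association via the pointer
            self.inputs.append(node)
        if tag not in VOID_TAGS:
            self.open.append(node)

    def handle_endtag(self, tag):
        if tag == "form":
            # Explicit </form>: reset the pointer, drop the form if still open.
            node, self.form_pointer = self.form_pointer, None
            if node in self.open:
                self.open.remove(node)
            return
        # Close the nearest open element with this tag and everything above it.
        for i in range(len(self.open) - 1, 0, -1):
            if self.open[i].tag == tag:
                del self.open[i:]
                break


def inputs_by_ancestry(form):
    """Names of inputs that are descendants of form, in document order."""
    found, stack = [], list(reversed(form.children))
    while stack:
        node = stack.pop()
        if node.tag == "input":
            found.append(node.attrs.get("name"))
        stack.extend(reversed(node.children))
    return found


def inputs_by_owner(builder, form):
    """Names of inputs whose recorded form owner is form."""
    return [n.attrs.get("name") for n in builder.inputs if n.form_owner is form]


def compare(markup):
    builder = TinyTreeBuilder()
    builder.feed(markup)
    builder.close()
    form = builder.forms[0]
    return inputs_by_ancestry(form), inputs_by_owner(builder, form)
```

With `<div><form id="f"><input name="a"></div><input name="b">`, the descendant walk finds only a, while the owner-based lookup finds a and b. That gap is what hierarchy-assuming code misses.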




Thanks! My 2 minutes of googling back when I found it didn't surface that, and I moved on to the next job.

Somehow, despite coding HTML for 25 years, I had either not seen the input form attribute or forgotten about it. I suspect the latter!


Steps on finding this from the HTML spec:

① Start at https://html.spec.whatwg.org/multipage/. Or https://html.spec.whatwg.org/ if you prefer, with everything in one page, but that’s a big document. You can also build it all locally yourself if you like. I have.

② “The form element” sounds like a good place to look. https://html.spec.whatwg.org/multipage/forms.html#the-form-e...

③ Look through the DOM interface listed, elements sounds promising. Find the explanation of that IDL attribute below: “The elements IDL attribute must return an HTMLFormControlsCollection rooted at the form element's root, whose filter matches listed elements whose form owner is the form element, with the exception of input elements whose type attribute is in the Image Button state, which must, for historical reasons, be excluded from this particular collection.” Roll your eyes at the bizarre exclusion of <input type=image>, then focus on the term form owner which sounds relevant. That links you to https://html.spec.whatwg.org/multipage/form-control-infrastr....

④ Hmm… null, parser inserted flag, nearest ancestor form element, form attribute. Parser inserted flag sounds relevant (though it’s just a flag, not the actual association link). Also the note “They are also complicated by rules in the HTML parser that, for historical reasons, can result in a form-associated element being associated with a form element that is not its ancestor.”

⑤ This is where having the whole spec open, rather than the multipage version, is handy: you can search the entire document for the term “parser inserted flag” to see where that gets set. You can also guess that it’s going to be in §13.2 Parsing HTML documents (parsing.html). In the end, it’s https://html.spec.whatwg.org/multipage/parsing.html#creating...: “… then associate element with the form element pointed to by the form element pointer and set element's parser inserted flag.” Ah hah!

⑥ You have found the concept in the parser: “form element pointer”. You can then look through where it’s used and quickly see how it’s set on <form> and unset on </form>, thus deliberately handling the missing-</form> case.

You develop a feeling for this kind of thing over time. I didn’t know about the form element pointer (though I feel I should have), and this is a loose description of what I did, though I was able to speed through some of the steps. I really should have just started by looking at “An end tag whose tag name is "form"”, but at first I thought the claim was bogus.


I think I got to point 2, found no reference in the form tag section, and gave up.

But what's fascinating is that it describes the HTML parser effectively implementing "overlapping markup", as in the Wikipedia article, in this edge case for backwards compatibility.


There’s a lot of messy stuff the HTML syntax supports for historical reasons.

https://html.spec.whatwg.org/multipage/parsing.html#an-intro... covers a variety of fun ones, like how inline formatting elements basically support overlapping markup, and how malformed nesting can be achieved in a few different ways. It’s a useful section of the spec for understanding these things because it explains what’s going on, with links to the precise parser details.



