Show HN: Single-header C++11 HTML document constructor (github.com/tinfoilboy)
87 points by tinfoilboy on Dec 16, 2018 | 10 comments



It's worth getting the escaping right in a library like this. For example, at https://github.com/tinfoilboy/CTML/blob/master/include/ctml....

  for (const auto& attr : m_attributes)
    output << " " << attr.first + "=\"" << attr.second + "\"";
it'll generate incorrect HTML if an attribute value has a " character. Although early versions of HTML were kind of vague on how escaping was supposed to work, the HTML5 standard explains it in detail.
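As a rough sketch of the fix for double-quoted attribute values: `escape_attribute` and `render_attribute` below are hypothetical helpers, not functions that CTML provides. The sketch assumes attribute values are always written inside double quotes, so `&` and `"` are the characters that must be replaced; `<` and `>` are escaped as well purely as a defensive choice.

```cpp
#include <string>

// Hypothetical helper (not part of CTML): escapes a value so it can be
// placed inside a double-quoted HTML attribute.
std::string escape_attribute(const std::string& value)
{
    std::string out;
    out.reserve(value.size());
    for (char c : value)
    {
        switch (c)
        {
            case '&': out += "&amp;";  break;  // would otherwise start a character reference
            case '"': out += "&quot;"; break;  // would otherwise end the attribute value
            case '<': out += "&lt;";   break;  // defensive, see the note above
            case '>': out += "&gt;";   break;  // defensive, see the note above
            default:  out += c;        break;
        }
    }
    return out;
}

// Hypothetical helper: renders one attribute the way the quoted loop does,
// but with the value escaped. The attribute name is assumed to be valid.
std::string render_attribute(const std::string& name, const std::string& value)
{
    return " " + name + "=\"" + escape_attribute(value) + "\"";
}
```

With this, a value such as `say "hi" & bye` is written as `say &quot;hi&quot; &amp; bye`, so the quote can no longer terminate the attribute early.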


I'll add escaping attributes to the library according to the spec. Thank you for the heads up!


Please look it over once, twice and thrice. Getting escaping correct is the main reason to use a library like this over manual construction with a heap of string concatenations.



I thought about this for a little while, and I think I don't like the API for this library. There are two things in particular I don't like. The first is that I think I would prefer to have helper types for each tag type so you don't have to include them as strings all the time, and the second is that I don't like the AppendChild approach that this library takes.

I would change it so that you pass the document to the constructor of each element and the scoping of each variable effectively determines the relationships. As a 10 second example of what I mean:

    HTML::document d;
    HTML::body b( d );
    HTML::div div( d );
    HTML::p p( d );
    d.text( "Hello world" );

    <html><body><div><p>Hello world</p></div></body></html>
The reason I like this is that it maps very well onto a C++ programmer's natural understanding of the stack frame and RAII, and in addition it can be implemented without needing to store any state inside the node classes. This means that only HTML::document would need to actually allocate any memory, and it would just be a single text stream.

This wouldn't create a node hierarchy in memory, so it's not a DOM like this library creates, but if you are just looking to output HTML quickly, then I think it would be easier to use and more efficient.
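A minimal sketch of how such a scope-based writer could be put together, under some assumptions: every name here (`sketch::document`, `sketch::element`, `hello`) is made up for illustration, text is not escaped, and each element holds only a reference to the document and a pointer to its tag literal, so nothing is allocated per node. The opening tag is written in the constructor and the closing tag in the destructor.

```cpp
#include <sstream>
#include <string>

namespace sketch {

// Owns the only buffer: a single output text stream.
class document
{
public:
    document() { m_out << "<html>"; }
    void open(const char* tag)      { m_out << '<' << tag << '>'; }
    void close(const char* tag)     { m_out << "</" << tag << '>'; }
    void text(const std::string& s) { m_out << s; }  // escaping omitted in this sketch
    // Call only after every element has gone out of scope.
    std::string str() const         { return m_out.str() + "</html>"; }
private:
    std::ostringstream m_out;
};

// Writes the opening tag on construction and the closing tag on destruction.
class element
{
public:
    element(document& d, const char* tag) : m_doc(d), m_tag(tag) { m_doc.open(m_tag); }
    ~element() { m_doc.close(m_tag); }
    element(const element&) = delete;
    element& operator=(const element&) = delete;
private:
    document&   m_doc;
    const char* m_tag;
};

// Helper types so tag names do not have to be spelled as strings at call sites.
struct body : element { explicit body(document& d) : element(d, "body") {} };
struct div  : element { explicit div(document& d)  : element(d, "div")  {} };
struct p    : element { explicit p(document& d)    : element(d, "p")    {} };

std::string hello()
{
    document d;
    {
        body b(d);
        div  v(d);
        p    para(d);
        d.text("Hello world");
    }  // para, v and b are destroyed in reverse order, closing </p></div></body>
    return d.str();
}

}  // namespace sketch
```

The inner braces matter: the closing tags only reach the stream once the elements are destroyed, so the document has to be read after that scope ends.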


I really dislike that.

The document is mutable and goes through a whole sequence of states that aren't what you want?

Constructing a body object with a document argument modifies the document?


While I do agree with the idea of outputting HTML quickly, I think that in the end I'd want to emulate the DOM more closely. For instance, I've been thinking about adding a simple parser to the library so that I could use it in making a web scraper. With a replication of the DOM, I could then easily find nodes that I'd like to grab from the scraper. In addition, I was also thinking of adding actual element grabbing via selectors a la CSS, which would require a DOM representation.

However, the helper types could be useful, and might be able to be implemented as simple aliases to a Node. Also, the reason I take the approach of appending child nodes is that it makes representing actual HTML easier. For instance, with your .text example (at least from my limited glance at it), you can't do something such as <p>Hello, <span>world!</span> Welcome!</p>, which was actually a previous problem that I had with another version of the library.
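To illustrate the mixed-content case with appended children, here is a small sketch of a tree where text runs and elements are both child nodes. It is not CTML's actual API: this `Node`, its `AppendChild` and `AppendText` signatures, and the `P()` and `Span()` helpers are assumptions for the example. The helpers are thin factory functions rather than type aliases, since a plain alias could not carry the tag name. Escaping is left out.

```cpp
#include <string>
#include <utility>
#include <vector>

namespace tree_sketch {

// A node is either a text run (empty tag) or an element with children.
// Note: a std::vector of the enclosing type is only guaranteed to be valid
// from C++17 on, although common C++11 implementations accept it.
struct Node
{
    std::string tag;
    std::string text;
    std::vector<Node> children;

    explicit Node(std::string t = "") : tag(std::move(t)) {}

    static Node Text(std::string s)
    {
        Node n;
        n.text = std::move(s);
        return n;
    }

    Node& AppendChild(Node child)
    {
        children.push_back(std::move(child));
        return *this;
    }

    Node& AppendText(std::string s) { return AppendChild(Text(std::move(s))); }

    std::string ToString() const
    {
        if (tag.empty()) return text;  // text run
        std::string out = "<" + tag + ">";
        for (const Node& c : children) out += c.ToString();
        out += "</" + tag + ">";
        return out;
    }
};

// Hypothetical helper "types" expressed as factory functions.
Node P()    { return Node("p"); }
Node Span() { return Node("span"); }

std::string greeting()
{
    Node p = P();
    p.AppendText("Hello, ");
    p.AppendChild(Span().AppendText("world!"));
    p.AppendText(" Welcome!");
    return p.ToString();
}

}  // namespace tree_sketch
```

Because text is just another kind of child, the order of `AppendText` and `AppendChild` calls is the order of the output, which is what makes interleaved text and elements straightforward.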


You would implement that like this:

    HTML::document d;
    HTML::p p( d );
    d.text( "Hello, " );
    {
        HTML::span s( d );
        d.text( "world!" );
    }
    d.text( " Welcome!" );

Obviously though, if you were trying to do an HTML parser as well, then my ideas are not appropriate.


A reason why I don't like this approach is that I have to know where an element goes when creating it.

I prefer creating an element and then moving it into place, so a module of my application can create some structure in isolation.


I agree. And with your way, I can add the same subtree to multiple parent nodes.
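One way to picture that, as a sketch only: if children are held through `std::shared_ptr<const Node>`, a fragment can be built in isolation and then attached under several parents without copying. All names here (`share_sketch::Node`, `element`, `text`, `render`, `make_footer`) are invented for the example and say nothing about how CTML stores its nodes.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

namespace share_sketch {

struct Node;
using NodePtr = std::shared_ptr<const Node>;

struct Node
{
    std::string tag;                // empty for a text node
    std::string text;
    std::vector<NodePtr> children;  // shared, immutable subtrees
};

NodePtr text(const std::string& s)
{
    auto n = std::make_shared<Node>();
    n->text = s;
    return n;
}

NodePtr element(const std::string& tag, std::vector<NodePtr> children)
{
    auto n = std::make_shared<Node>();
    n->tag = tag;
    n->children = std::move(children);
    return n;
}

std::string render(const Node& n)
{
    if (n.tag.empty()) return n.text;  // escaping omitted in this sketch
    std::string out = "<" + n.tag + ">";
    for (const NodePtr& c : n.children) out += render(*c);
    return out + "</" + n.tag + ">";
}

// A module builds its fragment without knowing where it will end up.
NodePtr make_footer() { return element("footer", { text("shared") }); }

std::string two_parents()
{
    NodePtr footer = make_footer();
    NodePtr a = element("div", { text("A"), footer });  // same subtree...
    NodePtr b = element("div", { text("B"), footer });  // ...under a second parent
    return render(*a) + render(*b);
}

}  // namespace share_sketch
```

The trade-off is that shared subtrees have to be treated as immutable; a design with mutable nodes would need to copy the subtree for each parent instead.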



