Hacker News
Something is wrong with this picture. (w3.org)
192 points by Nemmie on Nov 7, 2011 | 81 comments



Either you have separation of data and presentation or you don't. HTML didn't. HTML5 still doesn't. It is not possible to present an arbitrary block of data in a normalized, optimal form, and have CSS render it any way you want. For that you need script (or XSLT!), but even then, unless you go canvas, you have to deal with a markup language for presentation that is actively trying to not be presentation but be data instead. It's trying and failing. Life would be so much easier (and involve much less cognitive dissonance) if the DOM stuck to being presentation.

As it is, developing web apps is this giant joke, where you encrypt what you really want to happen in the form of pseudo-data (html) and magic rule-based transformation (css), and then the browser goes to a ton of effort to attempt to recreate what you really wanted. The effort the browser has to go through as soon as you introduce a div, or change its class, just to determine which, if any, of the CSS rules now apply to that node and any of its children, is just offensive. Worse: it's something you, the programmer, must be able to model in your head to determine what the fuck is going on. Good luck with that! The result we get is "try it and see" "programming".
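To make the "model it in your head" point concrete, here is a rough Python sketch of how selector specificity decides which rule wins. It assumes a heavily simplified selector grammar (element names, `.class`, `#id`, whitespace and `>` combinators only) and that every listed rule already matches the element. Real CSS also has attribute selectors, pseudo-classes, inline styles, `!important` and origin rules, none of which are modeled here. The function names and the example rules are made up for illustration.

```python
import re

def specificity(selector: str) -> tuple[int, int, int]:
    """Return (ids, classes, types) for a simplified selector.

    Only element names, .classes and #ids are recognized; this is an
    illustrative approximation, not a full CSS implementation.
    """
    ids = len(re.findall(r"#[\w-]+", selector))
    classes = len(re.findall(r"\.[\w-]+", selector))
    # Drop ids and classes, then count the element names that remain.
    stripped = re.sub(r"[#.][\w-]+", " ", selector)
    types = len(re.findall(r"[A-Za-z][\w-]*", stripped))
    return (ids, classes, types)

def winning_rule(rules):
    """Pick the winner: highest specificity, later rule breaks ties."""
    indexed = enumerate(rules)
    best = max(indexed, key=lambda pair: (specificity(pair[1][0]), pair[0]))
    return best[1]

# Hypothetical rules that are all assumed to match the same <span>.
rules = [
    ("table tr td span", "color: black"),
    ("span.note", "color: red"),
    ("#main span", "color: blue"),
    ("div > span.note", "color: green"),
]

for selector, declaration in rules:
    print(selector, specificity(selector), declaration)
print("winner:", winning_rule(rules))
```

Note that the four-element selector loses to a single class, and everything loses to a single id, regardless of source order. That is the part you have to keep in your head.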

HTML1.0 was a local optimum. Somewhere there is another, better optimum. The W3C appears to be willing to travel the Himalayas of suboptimal to find it.


HTML is a structured document format, with CSS for styling and presentation.

The web became so popular that developers want HTML to be an unstructured data format with a separate fully-fledged layout language.[1]

Neither idea is wrong, there's just an impedance mismatch between the two.

There are three options:

1. The status quo. Developers mangle technologies designed for something different so it does what they want, with an increasing use of libraries and abstractions. It's inefficient and messy, but it works and is backwards compatible.

2. Give up on the idea of structured data and try to mangle HTML/CSS to be closer to what developers want. You end up with a crap structured document language and a crap web application language, but the web application language will be slightly less crap than the status quo.

3. Create a new language actually designed for web applications with a proper layout engine and HIGs to go with it. Despite being the most sensible option, this isn't realistically going to happen. A lack of backwards compatibility kills adoption and browser makers will never agree. It took them a decade to agree on a font format.

None of the options are ideal. I suspect somewhere between 1 and 2 will happen in reality.

[1] Layout is also a really hard problem. I don't think it's been acceptably solved in the general case.


Amen.

More disturbing still: the people who are learning to program /now/ - in this decadent era of webapps and mobile devices - learn the brain damage that is modern web development early on. And it turns out (sorry, no evidence here but my own observations) that when you're first learning something, it's easy to mistake a brain-damaged system for an elegant one. And so, yes, there are people learning to program today, holding up CSS as an example of elegance.

To be honest, I wouldn't complain about CSS so much - even if it lacked the much-sought "separation of data and presentation" (isn't there a buzzword for that somewhere?) - if only accomplishing simple tasks didn't require obscure, unreliable hacks. As long as CSS was capable of formatting a reasonably-constructed document (titles before text, left-hand elements before right-hand elements) without hacks, I'd be happy. But it's not. And it doesn't look like it will be any time in the near future.


  > As long as CSS was capable of formatting a 
  > reasonably-constructed document (titles before text,
  > left-hand elements before right-hand elements) without
  > hacks, I'd be happy.
What hacks do you need to do that?


Not sure what is meant by titles before text, but it's impossible with css to reliably change the order of elements on the screen. You can try to float them differently, but this introduces side-effects, and won't work well with more than two elements.


Of course it's impossible, because CSS was never intended to do that in the first place. Seems to me that too many people are trying to use HTML/CSS in ways they were never meant to be used and then complaining about the fact that they don't work as expected.

I'm totally shocked that my car won't drive up the side of my office building.

If you are wanting to move stuff around in an HTML document with CSS you'll have to go with absolutely positioning every element within a container. Even then you'll need javascript to change classes and/or styles of the elements to accomplish that.


That's kind of the whole point. Why even pretend that HTML is data and CSS is presentation when the presentation language can't even reliably order things on the screen?

You claim that "CSS was never intended to do that" but the goal of CSS is explicitly to manage the rendering of a document on a given device/browser. Obviously it was intended to manage layout. Deciding which order to show sections is absolutely something a presentation layer should be able to manage, and CSS can't do it.


I pretend nothing, HTML is for structured data and CSS is for presentation of that data. Just because it doesn't do what you want doesn't mean the definitions are wrong.

CSS was never intended to manage layout, thus it has very few tools to do so. The HTML was intended to manage the layout. CSS changes the presentation of the document as structured by the underlying HTML.

HTML controls the order of elements on the page quite well. What you are wanting is a reliable method to CHANGE the order of elements on the page. That is what I mean when I say CSS was never intended to do that in the first place. HTML/CSS were developed on the idea of structured documents that do not change in real-time.

You are wanting to take methods from a totally different set of standards and force-feed them onto this standard. Web pages were created to be static documents, much like printed pages, not applications.

If you are wanting to control the order of elements on the page in real-time then I would suggest you look into having all your elements absolutely positioned inside a container. Then you can use javascript to move and hide elements all you want. Just keep in mind the pros and cons of doing that.


With 960.gs, it's quite easy to do this with push_ and pull_ directives.

It doesn't change, semantically, the order of the markup obviously, but that's to be expected.


I think you are missing the fact that a good document really needs three things:

1) Data (RDF, NoSQL, RDBMS, etc)

2) Structural presentation (i.e. what HTML or LaTeX provides) and

3) Presentation to the user (i.e. what CSS or macro packages in LaTeX provide)

To get from 1 to 2, you have to have some logic. You could do it with JavaScript acting against RDF and HTML, I suppose. Or you could do it with XSLT. However, there's no inherent guarantee that the data structure will in any way match your document structure, and so these are really separate concerns.

This is why HTML template systems are so important for web programming.
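As a minimal illustration of the step from 1 to 2, here is a Python sketch that turns plain data into structural markup and leaves all styling to a stylesheet. The `invoices` data, the `render_invoice_table` function and the `invoices` class name are invented for the example; a real template system would do more, but the shape is the same.

```python
from html import escape

# Hypothetical data, as it might come out of a database or an RDF store.
invoices = [
    {"customer": "Smith & Sons", "amount": 120.50},
    {"customer": "<Acme>", "amount": 75.00},
]

def render_invoice_table(rows):
    """Map data onto document structure; presentation is left to CSS."""
    body = "\n".join(
        f"  <tr><td>{escape(row['customer'])}</td>"
        f"<td>{row['amount']:.2f}</td></tr>"
        for row in rows
    )
    return (
        '<table class="invoices">\n'
        "  <tr><th>Customer</th><th>Amount</th></tr>\n"
        f"{body}\n"
        "</table>"
    )

html_output = render_invoice_table(invoices)
print(html_output)
```

The logic that decides that a list of records becomes a table with a header row lives in the template function, not in the data and not in the CSS, which is the separate concern described above.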


>Either you have separation of data and presentation or you don't. HTML didn't. HTML5 still doesn't.

Is it really so desirable? I understand the appeal in theory, but in practice is it really worth it in most cases?

Or perhaps I'm so scarred by HTML and CSS that I can't even visualize a web with true separation of data and presentation that actually makes life better.

Edit: some of the other comments discuss this. Of course I see the advantage of abstraction and reusability. What I'm really asking about is the advantage, or even feasibility, of a pure, or strict, separation of data and presentation.


Totally agree.

In my opinion, the problem of separation of data and presentation won't be solved by markup or CSS.

If a Web page is to contain data and a service wants to act on this data, it has to scrape the Web page. Which is even harder with scripted pages. But scraping data isn't a solution. The semantic web may try to have web developers bring sense to data on a web page, but the problem remains. It's just a markup patch. It doesn't define how to act on that data. Web Intents are just another patch to markup to bring verbs.

Direct access to the data sources with well-defined methods to act on that data and interact with it (instead of using forms) is what works today through APIs. What doesn't work is that there aren't a lot of open standard APIs. Most well-used APIs are proprietary, and Facebook's is a good example. I believe standard bodies should put their brains and efforts on defining API standards. Some standard APIs, some of them clunky, are in use in B2B back ends, but there's not much of it on the consumer-facing Web. We must move forward to push separation of presentation, data and verbs for the whole Web, one small step at a time.

If most use cases on the Web used standard APIs, we would have true separation of content and presentation. We would even have the verbs to exchange/create understandable content. Then, you can use HTML/CSS to adapt a UI to any device with true separation.

That's the way we build apps and sites today, and with standards it would pave the way to a more exciting future.

So one day, if I want to have my own customized UI for that new holographic/gesture recognition device to shop with my preferred merchants, I just have to build an app and I'll be able to browse their merchandise, sort it like I want and finalize a transaction without even visiting their Web site.


>If a Web page is to contain data and a service wants to act on this data, it has to scrape the Web page.

This is completely the wrong approach and this isn't how people who know what they're doing work now.

A web page is just a presentation layer. If you have a service that wants data, it needs to work with a model or presenter/controller layer. On the web this can be a REST service, SOAP, something proprietary, etc. Ideally, the web site will be using this same source to get its data.
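A small Python sketch of that arrangement: one model function, with an API response and a web page both built from it. The product data and the function names are placeholders, and there is no real HTTP layer here, just the two representations.

```python
import json
from html import escape

def get_products():
    """Stand-in for the model layer (hypothetical data)."""
    return [
        {"name": "Kettle", "price": 24.99},
        {"name": "Teapot", "price": 18.00},
    ]

def as_json(products):
    """What a service endpoint would hand to other programs."""
    return json.dumps({"products": products})

def as_html(products):
    """What the web site would render from the very same data."""
    items = "".join(
        f"<li>{escape(p['name'])}: {p['price']:.2f}</li>" for p in products
    )
    return f"<ul>{items}</ul>"

api_body = as_json(get_products())
page_body = as_html(get_products())
print(api_body)
print(page_body)
```

A consumer that wants the data reads `api_body` and never has to look at `page_body`, so nobody needs to scrape anything.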

If the web application presents data via a web interface and doesn't offer a presentation/controller layer to allow you to access that same underlying API, then yes, you will have to scrape if you want that data for some reason. But I don't see this as wrong, you're doing something the owners of the data didn't intend for you to do. You'll have similar issues if you want to get data out of any application view (e.g. screen scraping a windows native app).

EDIT: Read the rest of your post and I see that you addressed much of this already. I still maintain that this is already how people are working who want others to use their data.


> I believe standard bodies should put their brains and efforts on defining API standards.

Have you seen SOAP?


He said put brains into it.

Not in the sense of "Throw your brains in for zombies to have a party", but in the sense that one should think and find protocols which are elegant, in that they make reasoning about them and using them easy and simple.

I agree that coming up with SOAP and XML-RPC took quite some brains and effort, too bad that some really good people had to be lobotomized for it.


Yes. I prefer something more lightweight. But SOAP is a protocol. What I propose are standards for common use cases that are not as general purpose.

For example, the Facebook API allows querying and interacting with the social graph, profiles, photos, feeds, events, etc. These are use cases commonly found in photo apps, social networks, eventing, etc. But it's proprietary. Now imagine an open source standard similar to that, but one that can define such building blocks including other scenarios such as contacting a web site owner (about page, contact page), querying/posting articles to a web site, querying/doing transactions with products/services, etc. Once you go through all scenarios, then the problem that remains will be more about agents/authorities/reputation/security of allowing someone to interact with services. With better access for apps to interact directly with content by bypassing the current web presentation layer to avoid spam/fraud.


"HTML1.0 was a local optimum. Somewhere there is another, better optimum. The W3C appears to be willing to travel the Himalayas of suboptimal to find it."

I don't think they're even going in the general direction of better optimum.

Here is what I think that optimum should include:

- Better document model (vs no document model at all, which seems to be where it's all headed right now).

- Separation of content, layout and styling. Yes, into three parts, rather than the two we have right now.

- Partial caching and user-side includes that aren't a blob of ugly, shortsighted hacks. Something that works with the document model, not against it. It's absolutely ridiculous that I have to write custom code to prevent the browser from re-downloading (and the server from re-generating) page headers and so on.

- Significant improvement of forms. New UI elements, support for pure-HTML put/delete requests, different format for sending data that has structure (vs only key-value pairs).
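On the last point, a quick Python sketch of why key-value form encoding is a poor fit for structured data. The `order` payload is invented for the example, and JSON is used only as one possible structured format, not as a claim about what forms should adopt.

```python
import json
from urllib.parse import urlencode, parse_qs

# Hypothetical payload with nesting, as a shopping form might want to send.
order = {
    "customer": "alice",
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
}

# Classic form encoding: the nested list is squashed into one string value.
form_body = urlencode(order)
decoded_form = parse_qs(form_body)

# A structured format survives the round trip unchanged.
json_body = json.dumps(order)
decoded_json = json.loads(json_body)

print(type(decoded_form["items"][0]).__name__)
print(decoded_json == order)
```

After the form round trip, `items` is just a string that the server has to parse with its own ad hoc convention, which is the custom code this kind of wish list is trying to get rid of.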


Generally agreed, but would suggest a slightly different set of concerns separated.

Instead of separation of content, layout, and styling, I would suggest a separation of data, structure, and styling. Data + structure gives you content, structure plus styling gives you layout.

So in this idea you might have RDF as data, an HTML template as structure, and CSS as styling. The browser would generate the HTML from the template and the RDF, and the CSS would then be used to lay it out and style it.


While I certainly agree that there is tons of room for improvement with HTML/CSS/JS, I get confused when people start discussing it in such hyperbolic terms. It's not that bad. Most data on the internet fits really nicely into the document metaphor.


Normally I immediately bridle when someone offers criticism without including proposals for improvement[1] but in this case it really is that bad. It really, really is.

And while I would be willing to agree provisionally that most data (by volume of unique URLs) on the web does fit the document metaphor, if page views is your metric I'm not at all convinced.

For example, is it logical to even attempt to reason about a Facebook wall containing recent updates from $n individuals in terms of authorship? Is this even relevant information given the entire contents of the page will have changed in 12 hours?

The document metaphor made perfect sense 15 years ago but it breaks down quickly in the face of anything dynamic, as is evidenced by the need for any credible web developer to have a minimum of 7 largely unrelated technologies[2] committed to memory to do their job effectively.

Markup 15 layers deep? 2000+ lines of code to tell the browser how to render a website? Vendor-specific dynamic rendering engines to sidestep the limitations of native web languages? Surely this is not what success looks like?

[1] Unfortunately I have no idea how to fix this mess.

[2] HTML, CSS, JavaScript, a JS Framework (typically jQuery), at least one back-end language, SQL or similar and API stuff (SOAP, JSON, etc).


Hmmm. I appreciate the way you laid out your argument, I'm not sure I understand it entirely though. I agree that the concept of authorship is not very logical in the context of your example, but I don't know of any HTML spec which requires defining an author for each document? I think my point was that it's pretty easy to mark up data in a semantic-enough way using the current tools. A wall post doesn't need to be a document of its own, it can be an item in a list of wall posts that make up part of a bigger page. I agree that "document" is sort of a silly metaphor for that use case, but that doesn't make HTML any less useful. We could use XML, but that would basically be the same thing. We could use JSON, but again... it's just another way of drawing the same relationships. I suspect that I've completely missed your point though, in which case I apologize and please be patient with me!

With regard to the rest of your comment (7 unrelated technologies, deep layers of markup, huge numbers of LOC, etc) I agree, but as you said - how could it be fixed? The reality is that the web performs a complicated function. It would be nice to abstract the nuts and bolts behind it away cleanly, and I don't think it's unreasonable to believe that could happen in our lifetime, but it's also not unreasonable to expect that developing for a complicated platform will be complicated.


I think it is this unrelated technologies bit that is driving things like NoSQL.

Interestingly, with LedgerSMB, you have to know: HTML, CSS, JavaScript, Template Toolkit, Perl, SQL (including PL/PGSQL), LaTeX

That's only 7..... When we standardize on an AJAX API framework, I guess that will mean 8. However LaTeX is only required by some specialists (customizing printed check and PDF invoice templates), and a lot of the current approach is to hide the AJAX stuff inside TT widgets, meaning no more than 7 for most developers.

And since the LaTeX stuff is a specialty (customizing higher-end printed templates and printed checks), that leaves only 6 for most developers. And since the SQL stuff can be easily handed off to others in the community (because the db API is defined through SQL, mostly through a procedural interface), it means 5. The perl is thin glue, and probably should hardly count (unless you are engineering the framework). A few master them all. Most work with the framework we provide. So here you have to know 4 well to do basic customizations, but 7 well to do the most advanced.

Works pretty well, actually.


HTML markup is a good example of premature design optimization. It may have been good in theory to separate content from presentation, but if you look at the web today, the way pages are generated is a huge mess.

Even this site, which uses tables for layout when "you're not supposed to use tables for layout", is a good example of why HTML is so bad for creating web pages. There's no reason I should have to jump through all the hoops I do to display two div blocks side by side in a horizontal box. All other XML based layouts I've used (Android and Flex) are a piece of cake compared to messing with HTML, CSS, and JavaScript.


I'm not sure it was particularly premature for the original use case, which was writing hypertext documents. HTML inherited the idea of structure/style separation (though initially implemented in a sort of half-assed way) from previous document markup languages, like SGML and LaTeX, where it had gotten a decent number of years of exercise, and worked fairly well. I mean it still works pretty well in LaTeX, though not perfectly.


You're exactly right... for the original purpose - hypertext documents - HTML and CSS work just fine. The problems arise from the fact that we've hijacked these documents to produce web applications which use the barest stub of a document to get running. And then we try to shoehorn a dynamic site into what was originally designed to be a static document. Dynamic content was originally the exception, not the norm.

Frames were actually a good effort to solve the problem of web-site "chrome" (navigation bars, headers, etc), and were somewhat useful. But they proved to not be flexible enough for designers (rightly so), and thus we have the mess that we have now.


This is a failing of CSS, specifically. CSS has extremely limited tools for building a page layout.

See http://www.w3.org/TR/css3-flexbox/ and http://dev.w3.org/csswg/css3-grid-align/


But it doesn't make sense to me- why does presentation need to be separated from content? On pretty much every single page on the internet, layout and presentation are so tightly coupled that it ends up being more of a hassle to maintain a separate style sheet than it is to declare stuff inline.


content and presentation should be separate so that:

- the browsers can cache presentation instead of loading it every time

- presentation only has to be defined once, instead of every time it's used

- presentation can change based on factors like screen size

- accessibility features, like high-contrast themes, are not possible with tightly coupled presentation logic

if your presentation and content are so tightly coupled that it's easier to inline everything, you're doing it wrong.


If you step back you'll realize these are all failures of HTML and the latency of networks that we have to work around, not requirements.

For caching, why do I have to reload a whole page just to change the article part? The HTML 'it's a document' obsession and terrible DOM model that took so long to be updated.

Why don't I define a high level presentation document and then sub-documents? Network latency and the total bizarreness of CSS selector priority.

Screen size. Why can't you dock elements on a page the way form-based programming languages have allowed for the last 10 to 15 years? HTML doesn't have the ability.

And as for accessibility? If accessibility is taking hints from markup about how to present the page, in reality the mark up is tightly coupled to presentation anyway. Accessibility is tightly coupling the markup to the presentation, what it really means is screen readers can format the content nicely without the need for a style sheet.


I also have a few questions to ask which support separation of content and presentation:

Have you ever worked on a project that used inline styles on almost every paragraph or heading? If yes, you will instantly know how much of a pain it is. If you've ever transferred said content to a totally different design you should be scarred by this experience.

Have you also tried adding a page to a website that uses a table-based layout (the prime example of presentational HTML)? When a modern design is fit onto one of these layouts, adding content becomes a really painful experience.

I can see first hand the benefits of good separation.


I can attest to this. As a client-side developer I can go into detail of how much pain is involved in changing the skin on a site when the original developers did not structure their HTML well, including inline styles all over the place. Plus in many cases it seemed the developers had no idea how HTML actually works creating pages that will never validate causing all kinds of weird side issues from browser to browser.

I have CSS selectors that go five to six levels deep because of tables contained in tables contained in tables with no classes or ids on any elements. Oftentimes those tables in tables are totally unnecessary.

Div > span > table > tr > span > td > span > div

That's not the way to build a web page. My guess? They used Visual Studio for layout as if they were coding a desktop application.


Are you serious?

Don't get me wrong, there is a lot to hate about HTML and CSS. But unless you're working on a 5-page website, separating those two is a blessing. Do you not remember the nightmare of FONT tags?


> Do you not remember the nightmare of FONT tags?

Yes, but that's simply a failure from having no ability to define abstractions.

Separating form/design/layout from content is one possible abstraction boundary, but it's just one arbitrary one, so it's kind of strange that they tried to bake it into the platform instead of making it easier to define whatever abstractions are meaningful for you.


They deliberately don't let you define whatever abstractions are meaningful to you; it's called the Principle of Least Power: http://www.w3.org/DesignIssues/Principles.html#PLP

Just explaining that what you consider a "strange" design choice is actually deliberate and carefully thought out, not defending it--for all I know, it might have been carefully reasoned out but the wrong conclusions reached--but the results stand for themselves: a platform that is now synonymous with "the Internet".


Or mobile scaling. Or changing the design. Or changing a detail on EVERY PAGE of your website (like the font).


> why does presentation need to be separated from content?

You can't think of any instance when you would want to access the content irrespective of the presentation?

Imagine if, instead of HTML and CSS, web pages were delivered as pre-rendered PNGs. Make a list of all the different things that would break. That's why presentation should be separated from content.
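One item from that list, sketched in Python: pulling the text and links out of a page without rendering it, which is roughly what an indexer or screen reader has to start from. The `page` string and the `TextAndLinks` class are made up for the example; with a pre-rendered PNG there would be nothing comparable to walk.

```python
from html.parser import HTMLParser

class TextAndLinks(HTMLParser):
    """Collect visible text and link targets, ignoring all presentation."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())

# Hypothetical page fragment.
page = '<h1>Sale</h1><p>See the <a href="/kettles">kettles</a> page.</p>'

extractor = TextAndLinks()
extractor.feed(page)
extractor.close()

print(" ".join(extractor.text))
print(extractor.links)
```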


This is the exact problem cited in the original post. Why are we designing for hypothetical users when we have actual users to design for?


I'm not really following you. "Make a list of all the different things that would break" - these things are things that are in use today by real people. No hypothetical people involved.


One group that has always been chasing the "holy grail" of separation of content and presentation are big publishers. They want one content source that is controlled by "editors" and then the content can be rendered differently for the different avenues of publication. Maybe a transform of the content is sent to online databases like lexis/nexis or westlaw, or sent to printing press for a book, or sent to the web, or abstracts sent to a bibliography service, etc.

This is why SGML was big in the publishing industry before the web.

It's mostly achievable but there are clearly problem areas such as tables where sometimes the presentation is an integral part of the content.


I agree with you, but the main reason to do this is to keep it DRY (Don't Repeat Yourself).

Templating systems help with this a lot, but they don't completely get you around being able to add the class 'round' to everything you want to have rounded corners. That's really why you keep things separate, so you can maintain a single point of change.


HTML had the opportunity to get better with XHTML and strict enforcement of schemas and XML syntax (which I'm willing to bet Android and Flex require).

The problem was that doing so broke most of the web, or was not internalized by page creators. So we're stuck with the current situation (bad markup, crufty designs, etc.).

Imagine if the first HTML editors forced a schema check before save. I think we'd be in a much better place now if they did...


The myth that the web would be better with strict XML parsing is convincingly debunked here: http://web.archive.org/web/20110514122249/http://diveintomar...

Web pages (unlike Android layout files) are complicated composites of multiple data sources that are generated on-the-fly, and are combined in complicated (but sometimes low-tech) ways like string manipulation. Under such circumstances, it is simply too hard to create perfect XML every time.


That isn't really related to what he said at all. Sure web browsers should be forgiving in what they accept, but dev tools shouldn't be. How would you like the C compiler that tried to interpret any old chicken scratch as a valid C program? Never a compiler error again!


> How would you like the C compiler that tried to interpret any old chicken scratch as a valid C program? Never a compiler error again!

Oh fun, bad analogies time! How would you like it if your word processor refused to save a text document because it detected an incomplete sentence?


Nope, no analogy. If I expect a machine to understand a language, then it had better be able to determine whether or not a document written in that language is valid. It doesn't matter if it is a programming language or a markup language. Permissive modes are great for accepting the work of others, but when learning how to write that language in the first place a strict interpretation is best. That way you can focus on getting it right.


C compilers do this all the time. If you're lucky, they'll warn you about undefined behavior as they do it, instead of just silently making all sorts of optimizations because the standard allows them to.

Now as it happens, C compilers have a syntax validator as part of them. Lots of HTML editors, past and present, have used HTML validators too...


What dev tools? Any text editor is a dev tool that can and is used to create HTML for browsers to (forgivingly) accept.

That's actually a fundamental design aspect of HTML, which was partly responsible for its popularity.


Last I checked IE, Firefox and Chrome all had a dev mode. Sure they might not have had them at first but a strict mode or a parse check or something of that sort would have been helpful all along.


Of course, but there are ways of combining data sources that will produce valid XML.
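For instance, a minimal Python sketch contrasting string concatenation with building a tree and serializing it. The `comment` text and element names are invented, and "valid" here only means well-formed, since no schema is checked.

```python
import xml.etree.ElementTree as ET

# Hypothetical user-supplied text from another data source.
comment = "Tom & Jerry <3"

# String concatenation: the raw '&' and '<' make the result ill-formed.
glued = "<post><body>" + comment + "</body></post>"
try:
    ET.fromstring(glued)
    glued_is_well_formed = True
except ET.ParseError:
    glued_is_well_formed = False

# Building a tree and serializing it escapes the text automatically.
post = ET.Element("post")
ET.SubElement(post, "body").text = comment
built = ET.tostring(post, encoding="unicode")
round_trip = ET.fromstring(built).find("body").text

print(glued_is_well_formed)
print(built)
print(round_trip == comment)
```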


I would argue the contrary, that the very reason why HTML took off so fast was that any fool was able to craft a site, and it would work even if it had a few bugs in it. Failure tolerance is a great feature. Especially if you consider it in the context of document authoring - for which HTML was originally designed - it's better to read a document that has one unclosed <b> tag in it, than to be completely unable to read it because there's a syntax error in markup.
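The unclosed <b> case is easy to try out. This Python sketch feeds the same made-up document to a strict XML parser and to the standard library's forgiving HTML tokenizer; it only shows the difference in failure behavior, not how any particular browser recovers.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical document with one unclosed <b> tag.
document = "<p>This has an <b>unclosed tag.</p>"

# Strict: the XML parser rejects the whole document.
try:
    ET.fromstring(document)
    strict_ok = True
except ET.ParseError:
    strict_ok = False

class TextCollector(HTMLParser):
    """Collect text content without caring whether tags are balanced."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

# Forgiving: the HTML tokenizer still hands over all of the text.
collector = TextCollector()
collector.feed(document)
collector.close()
recovered = "".join(collector.chunks)

print(strict_ok)
print(recovered)
```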


But the problems sskates mentions would not be helped by better schema compliance; they start and end with the fact that CSS is a miserably poor layout engine that is not powerful enough to effectively separate content from presentation.

HTML itself has its warts, sure, but the fact that we end up constantly resorting to HTML and/or Javascript edits to do things that should be happening in CSS alone is, to me, a much worse problem.


Pages made by decent developers are not a mess. Certainly orders of magnitude less messy than handling all your presentation in code, mixed with business logic and controllers.

Separation of presentation and content allows for information to be accessible by everyone, indexed/readable by machines, adaptable to different displays. It's also easier to maintain (content editors can't mess with layouts) and makes many performance improvements possible.

The HN site has no special reason to use tables, it's perfectly possible to render this layout using semantic, clean markup. CSS has come a long way since 1999.


But nowhere near as easy or as cross browser compliant, especially at the time this code was written.


It may be true if HN code was written before 2001.


It's still true now; IE6 still has a decent number of users especially in the corporate world and no CSS solution is as simple and direct as tables even today.


Well, we are allowed to evolve.


Actually, is it so wrong to use tables (like we're not supposed to) in a professional CSS design? They do great in every browser from IE6 to Opera and they don't need clear:both or other tricks.

I miss designing with tables... And I just don't do it anymore because I was told not to, for maybe no good reasons.


There are perfectly good reasons. Read my comment above and the many answers here: http://stackoverflow.com/questions/83073/why-not-use-tables-...


Ceasing to adopt standards, best practices and support users who are disabled because "the old way seems to work better" is a terrible way to advance technology.


Well, I have done things with HTML which make folks cringe who don't understand why I am doing it.

Imagine a web page with a giant table, every other row of which contains another table, and where the entire page contains probably 20000 INPUT tags, half of which are potentially exposed to the user under one set of visibility rules or another.

Now if it wasn't a bulk payment interface for wiring money out to hundreds of clients, paying potentially up to 5000 invoices in a single run, it would be entirely insane. As it is, the insanity is mostly an issue of the fact we have to do a lot to handle the fact that we have to handle concurrency issues over application protocols like HTTP, which leads to fun stuff in the database.


Switching to a new inferior technology simply because it's hip is not a good way to evolve technology.


Using semantic HTML and CSS to design pages is certainly not a "hip" technology. Its use case and benefits, while not perfect[1], are well understood and proven. It's not a "broken" use case, like using tables for layout is.

[1] highly interactive web apps.


I also like how CSS has less layout functionality than tables, specifically <td width="*" valign="middle">.

Sure, you can emulate this with "display: table;" but a) it's not backwards compatible and b) that's the same darn thing as just using a table!


It's not "the same darn thing". The <table> tag is for tabular data; it implies relationships between its elements that are of utter importance for screen readers and machines. It might be ok visually, but structurally it's a mess.

display:table is not an emulation, it's exactly the same behavior as a table, exposed for use at will. You can use the flexbox model for the same effect: http://jsbin.com/ejeraq
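
A minimal sketch of that point, assuming a browser that supports display: table-cell and the later standardized flexbox syntax (display: flex, which differs from the draft syntax available when this thread was written). The class names are made up for illustration. Both rows give a fixed 200px column plus a fill column, vertically centered, without a <table> element:

```html
<!-- Hypothetical class names; neither row uses a <table> element. -->
<style>
  /* Variant 1: table layout behavior applied through CSS */
  .row-table { display: table; width: 100%; height: 120px; }
  .row-table > div { display: table-cell; vertical-align: middle; }
  .row-table > .fixed { width: 200px; }

  /* Variant 2: the same arrangement with flexbox */
  .row-flex { display: flex; align-items: center; height: 120px; }
  .row-flex > .fixed { flex: 0 0 200px; }
  .row-flex > .fill { flex: 1; }
</style>

<div class="row-table">
  <div class="fixed">sidebar</div>
  <div class="fill">content takes the remaining width</div>
</div>

<div class="row-flex">
  <div class="fixed">sidebar</div>
  <div class="fill">content takes the remaining width</div>
</div>
```

In both variants the markup stays a plain sequence of divs, so the two-column arrangement lives entirely in the stylesheet.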

Be very glad that you have good vision and Google engineers spend millions and decades on making some sense of tag soup.


Your passion is commendable. I'll not go into the "a <li> is more semantic and accessible than a <td>" debate.

I'll just say that 1) display:table-cell doesn't work in IE6/7 and 2) CSS's limitations make it annoying or impossible to achieve functionality that was easy with table layouts (eg mixing percentage widths with fixed widths, and vertical alignment).


What you fail to mention is that table layouts make a lot more things annoying or impossible to achieve.


> The <table> tag is for tabular data, it implies relationships between it's elements that are of utter importance for screen readers and machines.

Nope. The <table> element was originally defined as an all-purpose chainsaw for two-dimensional relationships. The browser crackmonkeys locked us into that in the late '90s, and we have to deal with it forever. We are stuck with the legacy of billions of old documents, and thousands of old HTML parsers.

The W3C is pissing into the wind by trying to rewrite history. They cannot simply wish away technical lock-in by putting a "TABLE elements are semantic" clause in a standard. That needs to be repealed, and the we-must-ignore-history children must be given a spanking and sent to their rooms.

What the W3C should do is define a "tabulartext" attribute:

    <!-- A table of data to read -->
    <table tabulartext>
        <tr><th>Title</th> <th>Title</th></tr>
        <tr><td>Data</td> <td>Data</td></tr>
    </table>

    <!-- Structural markup -->
    <table>
      <tr>
        <td>{{left_sidebar}}</td>
        <td>{{content}}</td>
        <td>{{right_sidebar}}</td>
      </tr>
    </table>

If they did this, the screen readers and Alexa top 500 sites would STAMPEDE towards a bright new future of accessibility, machine readability, and backwards compatibility.
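
For comparison, WAI-ARIA already offers something close to this with the default flipped: a table is assumed to carry data unless the author marks it with role="presentation", which asks assistive technology to ignore its table semantics. A minimal sketch reusing the placeholders from the example above (how well any given screen reader honors the role is not something this snippet can show):

```html
<!-- A table of data to read: no extra attribute needed -->
<table>
  <tr><th>Title</th> <th>Title</th></tr>
  <tr><td>Data</td> <td>Data</td></tr>
</table>

<!-- Structural markup: role="presentation" opts out of table semantics -->
<table role="presentation">
  <tr>
    <td>{{left_sidebar}}</td>
    <td>{{content}}</td>
    <td>{{right_sidebar}}</td>
  </tr>
</table>
```

The difference from the tabulartext idea is which case pays the cost of the extra attribute: here the layout tables must be annotated, and old unannotated layout tables are still read as data.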


You're joking right?

"The HTML table model allows authors to arrange data -- text, preformatted text, images, links, forms, form fields, other tables, etc. -- into rows and columns of cells." - HTML 4.01 specification, 1999

Tables were added in the HTML3 spec:

"HTML 3.2 includes a widely deployed subset of the specification given in RFC 1942 and can be used to markup tabular material or for layout purposes. Note that the latter role typically causes problems when rending to speech or to text only user agents." - HTML3.2, 1997

The "for layout purposes" was a big mistake, they even acknowledged that it wasn't adequate. Fixed 2 years later, or 10 years ago. There is no reason to markup things as tables when you can have a much clearer document outline using the proper header and grouping elements.

HTML5 parsers have no problem keeping up with old tag soup, but that doesn't mean we should keep writing crap. Implying that sites built using only tables are accessible and machine readable is just.. asinine.


Many of us are developers on here, so I propose the following: rather than be led around by the nose on this, why don't we change it? We could define our own XML (or JSON? optional?) based data markup format and a separate presentation markup (XML/JSON based though!) format for displaying it on a graphical interface. Then we could create a browser plug-in that used it, or fork a browser to understand how to render it (or both).

We could set up our web servers to detect if the client has the plug-in/browser and serve this new standard if so, or redirect to "legacy" web pages if not (with a little "best experienced with" badge). If the day comes that we aren't getting anymore "legacy" requests we can just drop them (or even better, just program the servers to be able to auto-generate something passable if a legacy client did happen by). If done right, this really could get traction, since most web designers would prefer a nice format to fighting their way through as one has to today, and clients shouldn't be able to tell any difference.

The web is ours. If the "standards bodies" are making crap standards we don't have to stand for it.


"I heard people speak of Web Authors and Web Developers and making various distinctions about them. I heard some folks of arguing that this audience of ours prefers markup over scripts, and when faced with concrete examples of the opposite, retort that those are just some script library folk, not the majority."

Get these people some personas, stat!


I find that personas are nearly useless for this kind of design discussion. Personas are an attempt to homogenize some group of people into a single pseudo-person that has specific attributes. When the lines between personas are hard to draw or there are so many lines that you wind up with lots of personas, you can find yourself spending lots of time managing personas without a ton of direction or solving anyone's problems.

For a heterogeneous audience like "web makers" of various types, an activity-centered approach can be really helpful. Don Norman explains this idea better than I ever could:

http://jnd.org/dn.mss/human-centered_design_considered_harmf...

http://www.jnd.org/dn.mss/hcd_harmful_a_clarification.html


Thanks, those links were well worth the read.


A markup language, by definition, cannot define behaviour. You can do design, layout and animations with markup, but you simply cannot do any kind of processing or behaviour (when I say behaviour, I mean things like authentication, processing, manipulating data, etc). So if your webapp needs to be behaviour-rich (which I don't know how you can call it a webapp if it isn't) then you're going to need a scripting language to define those behaviours.

HTML + CSS + JavaScript are complex tools, it would be great if we could unify them into a single language. I think that jQuery's success is in part due to this unification, but I certainly don't think jQuery is the answer.


Lispers have said a few times that HTML/Javascript would be well replaced by S-expressions and (a) Lisp. There are more than a few reasons why it won't catch on, of course, but even to an old-school imperative programmer like me it sounds a damn sight better than what we have now.


I guess the real question is can you or how would you try to reform the culture that has surrounded the process? Or is the alternative branching out and creating your own like WHATWG?


The thing is, the web was supposed to be about documents but it always wanted to be about programs. A basic markup language together with styling goes with the former, but in the latter it becomes something you need to work around.


For me this just demonstrates W3C's distancing from reality... There were pretty good data-driven and thought-out decisions on HTML5 from the beginning, but things were pretty grim for this year. Parts of the spec are just being pushed around (microdata…), meanwhile there is no public place for developers to be heard (and no, mailing lists and bug trackers are not ideal. This is 2011).

Despite that, the web as a platform is getting better and better. Yes, it's hard to build stuff with HTML/CSS/JS, but it gets easier by the month, and the current state of the technology is really amazing (except for a few legacy browsers still bogging things down).


I like to think the whole problem with web development is that it is popular. Almost all of the exclusive web design/script people I have met are essentially terrible at what they do - I think this happened because the web is so popular and so new that the good engineers and designers are so few and far between that their direction is diluted by the masses. Hopefully it will right itself over time.

Ultimately the tools will mature to the point where the data representation beneath them will mean nothing to these kinds of consumers (e.g. how you save your photos as .jpg, .tga or whatever and usually couldn't care less about the details).


Most sites use an HTML page to launch a JS-driven app because HTML as a presentation medium is not as responsive and lacks the UX elegance that one can achieve with Javascript. It's simply a technology stack that works better at solving the job.

The main problem is that everyone is reinventing the wheel to deliver that kind of experience with custom JS code that augments the DOM in different ways. If browsers and the W3C worked out a way to deliver a similar UX over something standardized that browsers, crawlers and other consumers of the web could rely on, that would get used.


In this age of rich webapps, the markup is just there to launch javascript. There's nothing particularly wrong with that IMHO.


There's everything wrong with that. There are huge benefits to handling the internet as a linked graph of documents and resources. Try writing a site that loads all its content via Javascript and see how well you rank in Google.

I have nothing against the existence of web applications, as long as it's recognized that they have a fundamentally different nature and purpose than a web page. The W3C doesn't always make decisions that are right for everybody, but they've done a pretty decent job of preserving the ideal of a document/resource-centric web. And it's important that it stays that way.



