Different HTML formats

Hi,

I noticed that if I load an HTML file with load_html/3, the result has a different format than the one produced by html//1, or by the Markdown pack, for that matter. Why is that? I did not see any obvious explanation (although I might have overlooked it). I assume the html//1 one is the newer one, although I am kind of partial to the one used by load_html/3 myself…

Also, is there a way to convert between these two formats?

Thanks,

–Hans Nowak

It is a somewhat complicated story. load_html/3 is indeed the oldest part of the family. It produces an unambiguous representation of the HTML document for machine processing (e.g., xpath/3). html//1 works the other way around, generating HTML from a term, but it is intended to deal with human input. That is why it is compact and provides many shorthands. html//1 does, however, also accept the output of load_html/3, so you can inject parsed documents easily. The output of html//1 is a token list that is handed to print_html/1 to produce the final HTML document.

So, you can use them in a circle: html//1 --> print_html/1 --> load_html/3 --> html//1, …
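
A minimal sketch of that circle, in case it helps. The helper names spec_to_html/2, html_to_dom/2 and roundtrip/3 are made up for this example and are not part of any library. It assumes a reasonably recent SWI-Prolog where open_string/2 is available.

```prolog
:- use_module(library(sgml)).             % load_html/3
:- use_module(library(http/html_write)).  % html//1, print_html/1

% spec_to_html(+Spec, -HTML): hypothetical helper.
% Runs html//1 to get the token list, then captures what
% print_html/1 writes to the current output as a string.
spec_to_html(Spec, HTML) :-
    phrase(html(Spec), Tokens),
    with_output_to(string(HTML), print_html(Tokens)).

% html_to_dom(+HTML, -DOM): hypothetical helper.
% Parses HTML text into the element(Name, Attributes, Content)
% terms produced by load_html/3.
html_to_dom(HTML, DOM) :-
    setup_call_cleanup(
        open_string(HTML, In),
        load_html(stream(In), DOM, []),
        close(In)).

% roundtrip(+Spec, -DOM, -HTML2): hypothetical helper.
% html//1 --> print_html/1 --> load_html/3 --> html//1
roundtrip(Spec, DOM, HTML2) :-
    spec_to_html(Spec, HTML1),
    html_to_dom(HTML1, DOM),
    spec_to_html(DOM, HTML2).
```

For example, ?- roundtrip(p(class(x), 'Hello'), DOM, HTML). starts from the compact html//1 shorthand. In DOM, the paragraph shows up as element(p, [class=x], ['Hello']), possibly wrapped in elements that the parser infers for the fragment. Because html//1 accepts that DOM as-is, no separate conversion step is needed in that direction.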