The PHP HTML5 parser is nearly feature complete, and we are close to releasing a beta, at which point the focus shifts from features to stability. I say "nearly" because we want the PHP community to suggest any needed features we have not already thought of.
The current functionality falls into two categories.
The Helper Functions
The library was written so that its low-level functionality can be used independently. Most of the time, though, we just want to parse and write HTML5. For that there is a set of static helper methods.
// Get a DOM (\DOMDocument) from a string containing a full HTML document.
$dom = \HTML5::loadHTML($html);
// Get a DOM (\DOMDocument) from a file name or resource for a
// full HTML document.
$dom = \HTML5::loadHTMLFile($file);
// Get a DOM fragment (\DOMDocumentFragment) from an HTML fragment.
$dom = \HTML5::loadHTMLFragment($htmlFragment);
// Turn a DOM document or fragment into an HTML5 string.
$htmlString = \HTML5::saveHTML($dom);
// Save a DOM to a file as HTML5.
\HTML5::save($dom, $file);
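Because the helpers return standard \DOMDocument and \DOMDocumentFragment objects, the usual PHP DOM API applies to the result. The following is a minimal sketch of that idea, not part of the library: `collectLinks()` is a hypothetical helper, and the document is built by hand as a stand-in so the snippet runs without the library installed.

```php
<?php
// Hypothetical helper (not part of the library): collect the href of every
// <a> element. It accepts any \DOMDocument, so a document returned by
// \HTML5::loadHTML() could be passed in the same way.
function collectLinks(\DOMDocument $dom): array
{
    $links = [];
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        if ($anchor->hasAttribute('href')) {
            $links[] = $anchor->getAttribute('href');
        }
    }
    return $links;
}

// Stand-in document built by hand so the sketch is self-contained.
// With the library loaded this would be: $dom = \HTML5::loadHTML($html);
$dom = new \DOMDocument();
$body = $dom->appendChild($dom->createElement('body'));
$link = $dom->createElement('a', 'Example');
$link->setAttribute('href', 'https://example.com/');
$body->appendChild($link);
// An anchor without an href is skipped by collectLinks().
$body->appendChild($dom->createElement('a', 'No target'));

print_r(collectLinks($dom));
```

The same pattern applies to editing: modify the DOM with the standard methods, then hand it back to \HTML5::saveHTML() or \HTML5::save() to write it out as HTML5.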
The Parser and Serializer (Writer)
The parser and serializer are made up of numerous parts that can be used independently of the helper functions. For example, if you want minified output, the output rules engine can be replaced with your own.
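To illustrate the idea of swappable output rules, here is a rough, self-contained sketch. It does not use the library's classes: every name in it (`SketchOutputRules`, `SketchDefaultRules`, `SketchMinifyRules`, `sketchSerialize()`) is hypothetical, and the real interfaces may look quite different. It only shows the pattern of a walker that visits DOM nodes and delegates the actual output to a replaceable rules object.

```php
<?php
// Hypothetical interface: NOT the library's API, only an illustration of
// the "replaceable output rules" idea.
interface SketchOutputRules
{
    public function openTag(\DOMElement $el): string;
    public function closeTag(\DOMElement $el): string;
    public function text(\DOMText $node): string;
}

// Default rules: write tags and attributes, leave text whitespace untouched.
class SketchDefaultRules implements SketchOutputRules
{
    public function openTag(\DOMElement $el): string
    {
        $out = '<' . $el->tagName;
        foreach ($el->attributes as $attr) {
            $out .= ' ' . $attr->name . '="'
                . htmlspecialchars($attr->value, ENT_QUOTES) . '"';
        }
        return $out . '>';
    }

    public function closeTag(\DOMElement $el): string
    {
        return '</' . $el->tagName . '>';
    }

    public function text(\DOMText $node): string
    {
        return htmlspecialchars($node->data, ENT_NOQUOTES);
    }
}

// Minifying rules: same tags, but runs of whitespace collapse to one space.
class SketchMinifyRules extends SketchDefaultRules
{
    public function text(\DOMText $node): string
    {
        $collapsed = preg_replace('/\s+/', ' ', $node->data);
        return htmlspecialchars($collapsed, ENT_NOQUOTES);
    }
}

// Hypothetical walker: visits nodes and lets the rules decide the output.
// Simplifications: comments and other node types are dropped, and void
// elements such as <br> are not special-cased.
function sketchSerialize(\DOMNode $node, SketchOutputRules $rules): string
{
    if ($node instanceof \DOMText) {
        return $rules->text($node);
    }
    $inner = '';
    if ($node->hasChildNodes()) {
        foreach ($node->childNodes as $child) {
            $inner .= sketchSerialize($child, $rules);
        }
    }
    if ($node instanceof \DOMElement) {
        return $rules->openTag($node) . $inner . $rules->closeTag($node);
    }
    return $inner; // documents and fragments have no tags of their own
}

// Small hand-built document so the sketch runs on its own.
$dom = new \DOMDocument();
$p = $dom->appendChild($dom->createElement('p'));
$p->setAttribute('class', 'intro');
$p->appendChild($dom->createTextNode("Hello,\n    world"));

echo sketchSerialize($dom, new SketchMinifyRules()), "\n";
// <p class="intro">Hello, world</p>
```

Swapping `SketchMinifyRules` for `SketchDefaultRules` changes the output without touching the walker, which is the kind of separation the paragraph above describes.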
If you are interested in the architecture of these systems you can read about the parser architecture or the serializer architecture.
Speak Now
At this point in the development cycle we are looking for feedback on features. Is there something we missed that’s needed for an HTML5 parser and serializer? If you think so, please file an issue.