QueryPath and HTML5

QueryPath makes it easy to work with HTML and XML in PHP. Its API is intentionally similar to jQuery's, so working with it feels much like working with jQuery.

Working with HTML5 can have some hiccups because QueryPath uses the parser built into PHP. This parser, provided by libxml, works with HTML 4 and XHTML but isn’t designed for the new features in HTML5. For HTML5 documents, the HTML5 parser and writer (the masterminds/html5 library) can be used instead.

Get Them With Composer

The easiest way to get both of these projects is to use Composer and include them in your composer.json. Then Composer can get the details from Packagist and install them.

"require": {
    "masterminds/html5": "1.*",
    "querypath/QueryPath": "3.*"
}

Using Them Together

When you want to parse HTML5 and navigate it, the first step is to turn the HTML5 into a DOM.

// Load the HTML5 content into a DOM.
$dom = \HTML5::loadHTMLFile('path/to/file.html');

Then you can pass the DOM into QueryPath.

$qp = htmlqp($dom);

Now QueryPath can be used to query the document just as you’d normally expect.
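As a rough illustration of what such a query might look like, here is a minimal sketch. It assumes both packages were installed with Composer (so `vendor/autoload.php` exists) and that the HTML5 library's `loadHTML()` string loader is available alongside `loadHTMLFile()`. The `first_video_src()` helper and the sample markup are hypothetical and exist only for this example.

```php
<?php
// Assumption: Composer generated this autoloader when installing both packages.
require __DIR__ . '/vendor/autoload.php';

/**
 * Hypothetical helper: return the src attribute of the first <video> element.
 */
function first_video_src($html) {
  // Parse the markup with the HTML5 parser rather than PHP's built-in one.
  $dom = \HTML5::loadHTML($html);

  // Hand the resulting DOM to QueryPath and query it with a CSS selector.
  return htmlqp($dom)->find('video')->attr('src');
}

// Sample document using HTML5-only elements (made up for this sketch).
$html = '<!DOCTYPE html><html><head><title>Demo</title></head>'
  . '<body><section><video src="clip.webm"></video></section></body></html>';

print first_video_src($html) . PHP_EOL;
```

The point of the sketch is only that, once the DOM comes from the HTML5 parser, the usual QueryPath calls such as `find()` and `attr()` are used unchanged.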

When you want to get HTML5 back out, you can again use the HTML5 writer instead of QueryPath, which would otherwise fall back to the HTML 4 functionality built into PHP.

This example shows both querying a document and outputting nodes as HTML5.

$qp->find('track')->each(function($i, $node) {
  print \HTML5::saveHTML($node) . PHP_EOL;
});
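Putting the steps together, the whole round trip might look like the sketch below. It only uses the calls shown in this post (`\HTML5::loadHTMLFile()`, `htmlqp()`, `find()`, `each()` and `\HTML5::saveHTML()`). The `html5_fragments()` helper name is hypothetical, and the autoloader path assumes a standard Composer install.

```php
<?php
// Assumption: Composer generated this autoloader when installing both packages.
require __DIR__ . '/vendor/autoload.php';

/**
 * Hypothetical helper: collect the HTML5 markup of every element in $file
 * that matches the CSS selector $selector.
 */
function html5_fragments($file, $selector) {
  // Step 1: parse the file with the HTML5 parser to get a DOM.
  $dom = \HTML5::loadHTMLFile($file);

  // Step 2: wrap the DOM in QueryPath and run the query.
  $fragments = array();
  htmlqp($dom)->find($selector)->each(function ($i, $node) use (&$fragments) {
    // Step 3: serialize each matched node with the HTML5 writer,
    // not with QueryPath's HTML 4 based output.
    $fragments[] = \HTML5::saveHTML($node);
  });

  return $fragments;
}
```

Calling `html5_fragments('path/to/file.html', 'track')` would then return one string per `track` element, ready to print or store. Collecting the fragments in an array rather than printing inside the callback is just a design choice that makes the result easier to reuse.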

The Future

It would be fantastic if these two libraries worked together out of the box. There’s an open issue and a branch of work that’s been started to make this happen. There hasn’t been movement on it for a little while because of other commitments, but contributions are welcome and it should happen in due time.