I was looking for an HTML parser that could handle XHTML. I tested many parsers, such as TagSoup, Apache Sax, Jericho Parser, etc., but I had the following problems:
- My XHTML file was not well formed.
- Most of the parsers have a somewhat complicated API, because they are oriented toward well-formed documents.
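To make the first problem concrete, here is a minimal sketch that uses only the JDK's built-in XML parser (not any of the libraries mentioned above). The class name StrictParseDemo and the helper isWellFormed are made up for this example. It assumes a snippet with an <img> tag that is never closed, which a strict XML parser should reject outright:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: checks whether markup is well-formed XML.
public class StrictParseDemo {

    // Returns true if the JDK's XML parser accepts the markup, false otherwise.
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // Thrown when the markup is not well formed (e.g. an unclosed tag).
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String unclosed = "<html><body><img><h1>Hello World</h1></body></html>";
        String closed = "<html><body><img/><h1>Hello World</h1></body></html>";
        System.out.println(isWellFormed(unclosed)); // false: <img> is never closed
        System.out.println(isWellFormed(closed));   // true: <img/> is self-closed
    }
}
```

A strict parser stops at the first structural error, so nothing after the broken tag can be read. That is why a more forgiving parser is useful for real-world pages.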
So, I found HTML Parser. This one was very easy to use. Here is an example. Suppose we have an HTML file called test.html like this:
<html> <head></head> <body> <img> <h1>Hello World</h1> </body> </html>
You want to get the <h1> tag that contains “Hello World”. Note that the <img> tag is never closed, so the file is not well-formed XHTML. Let’s code. 🙂
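import org.htmlparser.Parser;
import org.htmlparser.filters.HasChildFilter;
import org.htmlparser.filters.StringFilter;
import org.htmlparser.util.NodeList;

// Load test.html, then keep only the tags whose child text contains "Hello World".
Parser parser = new Parser("test.html");
NodeList list = parser.parse(new HasChildFilter(new StringFilter("Hello World")));
System.out.println("tag found = " + list.elementAt(0).toHtml());
The previous code finds the <h1> tag and prints:
tag found = <h1>Hello World</h1>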
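Here is the explanation:
Parser: this class is in charge of loading the HTML file. Its constructor can throw a ParserException, so the calling code has to catch or declare it.
parser.parse(): this method returns all the nodes, or only the nodes accepted by the filter you pass in. Here, StringFilter matches the text node that contains “Hello World”, and HasChildFilter accepts the tag that has that text node as a direct child, which is <h1>.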
Well, that's it for now :).