§HTML parsing to obtain the document tree
–Java Swing parser, W3C HTML 3.2 DTD
§Extract leaf <table> elements
§Pseudo-rendering to obtain accurate row/column counts
–Considering only <tr>, <td>, <th> is not sufficient
–Other markup to account for: rowspan and colspan attributes, <br> tags
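The pseudo-rendering step can be illustrated with a minimal sketch. It is not the original implementation: the class name TableGridSketch, the Cell type, and the dimensions method are hypothetical names introduced here. It assumes the cells of each <tr> have already been collected (for example from the Swing parser's callbacks) together with their rowspan and colspan values, and that a rowspan reaching past the last row is clipped. <br> handling is not covered.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical helper: lays cells out on a grid to count rows and columns. */
final class TableGridSketch {

    /** One <td> or <th>; spans below 1 are treated as 1 (an assumption). */
    static final class Cell {
        final int colspan;
        final int rowspan;

        Cell(int colspan, int rowspan) {
            this.colspan = Math.max(1, colspan);
            this.rowspan = Math.max(1, rowspan);
        }
    }

    /** Returns {rowCount, columnCount} for rows given in document order. */
    static int[] dimensions(List<List<Cell>> rows) {
        int rowCount = rows.size();

        // occupied.get(r) marks the grid slots of row r that are already taken.
        List<BitSet> occupied = new ArrayList<>();
        for (int r = 0; r < rowCount; r++) {
            occupied.add(new BitSet());
        }

        int columnCount = 0;
        for (int r = 0; r < rowCount; r++) {
            int col = 0;
            for (Cell cell : rows.get(r)) {
                // Skip slots claimed by a rowspan that started in an earlier row.
                col = occupied.get(r).nextClearBit(col);

                // Clip the rowspan at the last row of the table.
                int endRow = Math.min(rowCount, r + cell.rowspan);
                for (int rr = r; rr < endRow; rr++) {
                    occupied.get(rr).set(col, col + cell.colspan);
                }
                col += cell.colspan;
            }
            // Row width is the index of the rightmost occupied slot plus one.
            columnCount = Math.max(columnCount, occupied.get(r).length());
        }
        return new int[] {rowCount, columnCount};
    }
}
```

For a table whose first row holds a cell with rowspan=2 followed by a cell with colspan=2, and whose second row holds two plain cells, counting <td> elements per row suggests two columns, while the grid above yields three. That gap is the reason tag counting alone is not sufficient.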