Interface HTMLParser
public interface HTMLParser
A front end to a DOM parser that can handle HTML.
Since: 1.5.2
Author: Russell Gold, Bernhard Wagner
Method Summary
Modifier and Type, Method: Description
String getCleanedText(String string): Removes any string artifacts placed in the text by the parser.
void parse(URL baseURL, String pageText, DocumentAdapter adapter): Parses the specified text string as a Document, registering it in the HTMLPage.
boolean supportsForceTagCase(): Returns true if this parser supports forcing the upper/lower case of tag and attribute names.
boolean supportsParserWarnings(): Returns true if this parser can display parser warnings.
boolean supportsPreserveTagCase(): Returns true if this parser supports preservation of the case of tag and attribute names.
boolean supportsReturnHTMLDocument(): Returns true if this parser can return an HTMLDocument object.
Method Details
parse
void parse(URL baseURL, String pageText, DocumentAdapter adapter)
Parses the specified text string as a Document, registering it in the HTMLPage. Any error reporting will be annotated with the specified URL.
Throws: IOException, SAXException
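As a rough illustration of the checked exceptions listed for parse, the sketch below shows one way a caller might handle them. It does not use the real library types: `Adapter` and `Parser` are hypothetical stand-ins that only mirror the documented signature, and the toy parser's "reject empty text" rule is an assumption made for the example.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URL;
import org.xml.sax.SAXException;

final class ParseSketch {

    /** Hypothetical stand-in for DocumentAdapter; the real type's methods are not modeled. */
    interface Adapter {}

    /** Hypothetical mirror of the documented parse signature. */
    interface Parser {
        void parse(URL baseURL, String pageText, Adapter adapter) throws IOException, SAXException;
    }

    /** Toy parser (assumption): rejects empty input, otherwise does nothing. */
    static final Parser TOY = (baseURL, pageText, adapter) -> {
        if (pageText.isEmpty()) {
            throw new SAXException("empty document at " + baseURL);
        }
    };

    /** Calls parse and maps the two documented checked exceptions to a status string. */
    static String tryParse(Parser parser, String baseUrl, String pageText) {
        try {
            URL url = URI.create(baseUrl).toURL(); // MalformedURLException is an IOException
            parser.parse(url, pageText, new Adapter() {});
            return "ok";
        } catch (SAXException e) {
            return "parse error: " + e.getMessage();
        } catch (IOException e) {
            return "I/O error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParse(TOY, "http://example.com/", "<html></html>"));
        System.out.println(tryParse(TOY, "http://example.com/", ""));
    }
}
```

Passing the base URL into the error message reflects the note that error reporting is annotated with the specified URL; how a real implementation formats that annotation is not specified here.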
getCleanedText
String getCleanedText(String string)
Removes any string artifacts placed in the text by the parser.
supportsPreserveTagCase
boolean supportsPreserveTagCase()
Returns true if this parser supports preservation of the case of tag and attribute names.
supportsForceTagCase
boolean supportsForceTagCase()
Returns true if this parser supports forcing the upper/lower case of tag and attribute names.
supportsReturnHTMLDocument
boolean supportsReturnHTMLDocument()
Returns true if this parser can return an HTMLDocument object.
supportsParserWarnings
boolean supportsParserWarnings()
Returns true if this parser can display parser warnings.
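The four supports... methods act as capability queries, so a caller can check them before relying on an optional feature. The following is a minimal sketch of that pattern, not library code: `Capabilities` is a hypothetical interface that mirrors only the four documented method names, and the feature labels and `fixed` helper are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

final class CapabilitySketch {

    /** Hypothetical mirror of the four capability queries documented above. */
    interface Capabilities {
        boolean supportsPreserveTagCase();
        boolean supportsForceTagCase();
        boolean supportsReturnHTMLDocument();
        boolean supportsParserWarnings();
    }

    /** Lists labels for the optional features a parser reports as supported. */
    static List<String> supportedFeatures(Capabilities parser) {
        List<String> features = new ArrayList<>();
        if (parser.supportsPreserveTagCase()) features.add("preserveTagCase");
        if (parser.supportsForceTagCase()) features.add("forceTagCase");
        if (parser.supportsReturnHTMLDocument()) features.add("returnHTMLDocument");
        if (parser.supportsParserWarnings()) features.add("parserWarnings");
        return features;
    }

    /** Builds a stand-in with fixed answers, purely for demonstration. */
    static Capabilities fixed(boolean preserve, boolean force, boolean htmlDoc, boolean warnings) {
        return new Capabilities() {
            public boolean supportsPreserveTagCase() { return preserve; }
            public boolean supportsForceTagCase() { return force; }
            public boolean supportsReturnHTMLDocument() { return htmlDoc; }
            public boolean supportsParserWarnings() { return warnings; }
        };
    }

    public static void main(String[] args) {
        // A stand-in that preserves tag case and returns an HTMLDocument, nothing else.
        System.out.println(supportedFeatures(fixed(true, false, true, false)));
    }
}
```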