Package org.apache.tika.parser.html
Class BoilerpipeContentHandler
java.lang.Object
de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
org.apache.tika.parser.html.BoilerpipeContentHandler
All Implemented Interfaces:
ContentHandler
@Deprecated(since="2026-04-30")
public class BoilerpipeContentHandler
extends de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
Deprecated.
This version of the Apache Tika library is deprecated. Use your own version of Apache Tika.
Uses the boilerpipe library to automatically extract the main content from a web page. Use this as a ContentHandler object passed to HtmlParser.parse(java.io.InputStream, ContentHandler, Metadata, org.apache.tika.parser.ParseContext).
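A minimal sketch of the usage described above. It assumes the Tika version documented here and the boilerpipe jar are on the classpath, and it uses Tika's BodyContentHandler as the delegate that collects the extracted text. The class name MainContentExample, the helper extractMainText and the sample HTML are hypothetical. Note that the BoilerpipeContentHandler, not the delegate, is what gets passed to the parser.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.html.BoilerpipeContentHandler;
import org.apache.tika.parser.html.HtmlParser;
import org.apache.tika.sax.BodyContentHandler;

public class MainContentExample {

    // Hypothetical helper: parses an HTML string and returns the text that
    // boilerpipe keeps as main content (DefaultExtractor rules).
    public static String extractMainText(String html) throws Exception {
        // The delegate collects the text that BoilerpipeContentHandler emits.
        BodyContentHandler body = new BodyContentHandler();
        BoilerpipeContentHandler handler = new BoilerpipeContentHandler(body);

        try (InputStream in = new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8))) {
            // Pass the boilerpipe handler, not the delegate, to the parser.
            new HtmlParser().parse(in, handler, new Metadata(), new ParseContext());
        }
        return body.toString();
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><head><title>Demo</title></head><body>"
                + "<div><a href=\"/\">Home</a> | <a href=\"/about\">About</a></div>"
                + "<p>This paragraph stands in for the article text of a real page.</p>"
                + "</body></html>";
        System.out.println(extractMainText(html));
    }
}
```

Which blocks survive depends on boilerpipe's heuristics, so tiny sample pages like this one may not behave like real web pages.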
Constructor Summary
BoilerpipeContentHandler(Writer writer): Deprecated. Creates a content handler that writes XHTML body character events to the given writer.
BoilerpipeContentHandler(ContentHandler delegate): Deprecated. Creates a new boilerpipe-based content extractor, using the DefaultExtractor extraction rules and "delegate" as the content handler.
BoilerpipeContentHandler(ContentHandler delegate, de.l3s.boilerpipe.BoilerpipeExtractor extractor): Deprecated. Creates a new boilerpipe-based content extractor, using the given extraction rules.
Method Summary
void characters(char[] chars, int offset, int length): Deprecated.
void endDocument(): Deprecated.
void endElement(String uri, String localName, String qName): Deprecated.
de.l3s.boilerpipe.document.TextDocument getTextDocument(): Deprecated. Retrieves the built TextDocument.
boolean isIncludeMarkup(): Deprecated.
void setIncludeMarkup(boolean includeMarkup): Deprecated.
void startDocument(): Deprecated.
void startElement(String uri, String localName, String qName, Attributes atts): Deprecated.
void startPrefixMapping(String prefix, String uri): Deprecated.

Methods inherited from class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler:
addWhitespaceIfNecessary, endPrefixMapping, getTitle, ignorableWhitespace, processingInstruction, recycle, setDocumentLocator, setTitle, skippedEntity, toTextDocument

Methods inherited from class java.lang.Object:
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.xml.sax.ContentHandler:
declaration
Constructor Details
BoilerpipeContentHandler
public BoilerpipeContentHandler(ContentHandler delegate)
Deprecated. Creates a new boilerpipe-based content extractor, using the DefaultExtractor extraction rules and "delegate" as the content handler.
Parameters:
delegate - The ContentHandler object
BoilerpipeContentHandler
public BoilerpipeContentHandler(Writer writer)
Deprecated. Creates a content handler that writes XHTML body character events to the given writer.
Parameters:
writer - The Writer that receives the body character events
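The Writer constructor saves building a delegate ContentHandler by hand. A sketch, assuming the Tika version documented here and boilerpipe are on the classpath; WriterExample and extractToString are hypothetical names, and a StringWriter stands in for any java.io.Writer.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.html.BoilerpipeContentHandler;
import org.apache.tika.parser.html.HtmlParser;

public class WriterExample {

    // Hypothetical helper: the extracted body text goes straight to a Writer
    // instead of to an explicitly constructed delegate ContentHandler.
    public static String extractToString(String html) throws Exception {
        StringWriter writer = new StringWriter();
        BoilerpipeContentHandler handler = new BoilerpipeContentHandler(writer);
        try (InputStream in = new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8))) {
            new HtmlParser().parse(in, handler, new Metadata(), new ParseContext());
        }
        // Read the writer only after parse() has returned.
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extractToString("<html><body><p>Article text goes here.</p></body></html>"));
    }
}
```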
BoilerpipeContentHandler
public BoilerpipeContentHandler(ContentHandler delegate, de.l3s.boilerpipe.BoilerpipeExtractor extractor)
Deprecated. Creates a new boilerpipe-based content extractor, using the given extraction rules. The extracted main content will be passed to the delegate content handler.
Parameters:
delegate - The ContentHandler object
extractor - Extraction rules to use, e.g. ArticleExtractor
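To choose the extraction rules explicitly, pass a boilerpipe extractor as the second argument. The sketch below uses boilerpipe's ArticleExtractor.INSTANCE, which boilerpipe describes as tuned towards news articles. It assumes the Tika version documented here and boilerpipe are on the classpath; ArticleExtractorExample and extract are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import de.l3s.boilerpipe.BoilerpipeExtractor;
import de.l3s.boilerpipe.extractors.ArticleExtractor;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.html.BoilerpipeContentHandler;
import org.apache.tika.parser.html.HtmlParser;
import org.apache.tika.sax.BodyContentHandler;

public class ArticleExtractorExample {

    // Hypothetical helper: same flow as the one-argument constructor,
    // but with caller-chosen extraction rules.
    public static String extract(String html, BoilerpipeExtractor rules) throws Exception {
        BodyContentHandler body = new BodyContentHandler();
        BoilerpipeContentHandler handler = new BoilerpipeContentHandler(body, rules);
        try (InputStream in = new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8))) {
            new HtmlParser().parse(in, handler, new Metadata(), new ParseContext());
        }
        return body.toString();
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><body><h1>Headline</h1>"
                + "<p>Body text of a news-style article would go here.</p></body></html>";
        // ArticleExtractor targets news-style article pages.
        System.out.println(extract(html, ArticleExtractor.INSTANCE));
    }
}
```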
Method Details
isIncludeMarkup
public boolean isIncludeMarkup()
Deprecated.
setIncludeMarkup
public void setIncludeMarkup(boolean includeMarkup)
Deprecated.
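This page does not describe what includeMarkup changes. Going by the name, and labeled here as an assumption, setting it to true makes the handler pass HTML element events to the delegate along with the kept text, so the effect is easiest to see with a delegate that serializes markup, such as Tika's ToHTMLContentHandler. The flag is set before parsing starts. IncludeMarkupExample and extract are hypothetical names, and the classpath is assumed to include the Tika version documented here and boilerpipe.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.html.BoilerpipeContentHandler;
import org.apache.tika.parser.html.HtmlParser;
import org.apache.tika.sax.ToHTMLContentHandler;

public class IncludeMarkupExample {

    // Hypothetical helper. Assumption: with includeMarkup=true the delegate
    // also receives element events, which ToHTMLContentHandler serializes.
    public static String extract(String html, boolean includeMarkup) throws Exception {
        ToHTMLContentHandler out = new ToHTMLContentHandler();
        BoilerpipeContentHandler handler = new BoilerpipeContentHandler(out);
        handler.setIncludeMarkup(includeMarkup); // configure before parsing
        try (InputStream in = new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8))) {
            new HtmlParser().parse(in, handler, new Metadata(), new ParseContext());
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><body><p>Some <b>important</b> article text.</p></body></html>";
        System.out.println(extract(html, false));
        System.out.println(extract(html, true));
    }
}
```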
getTextDocument
public de.l3s.boilerpipe.document.TextDocument getTextDocument()
Deprecated. Retrieves the built TextDocument.
Returns:
TextDocument
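The TextDocument is boilerpipe's own model of the page, so it lets a caller inspect individual blocks rather than only the concatenated text. A sketch, assuming getTextDocument() is called after parse() has returned (the page says only that it retrieves the built TextDocument) and using boilerpipe's TextDocument.getTextBlocks(), TextBlock.isContent() and TextBlock.getText(). TextDocumentExample and contentBlocks are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import de.l3s.boilerpipe.document.TextBlock;
import de.l3s.boilerpipe.document.TextDocument;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.html.BoilerpipeContentHandler;
import org.apache.tika.parser.html.HtmlParser;
import org.apache.tika.sax.BodyContentHandler;

public class TextDocumentExample {

    // Hypothetical helper: returns the text of every block that boilerpipe
    // marked as content, read from the TextDocument after parsing.
    public static List<String> contentBlocks(String html) throws Exception {
        BoilerpipeContentHandler handler = new BoilerpipeContentHandler(new BodyContentHandler());
        try (InputStream in = new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8))) {
            new HtmlParser().parse(in, handler, new Metadata(), new ParseContext());
        }
        TextDocument doc = handler.getTextDocument();
        List<String> blocks = new ArrayList<>();
        for (TextBlock block : doc.getTextBlocks()) {
            if (block.isContent()) { // skip blocks classified as boilerplate
                blocks.add(block.getText());
            }
        }
        return blocks;
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><body><p>First paragraph of the article.</p>"
                + "<p>Second paragraph of the article.</p></body></html>";
        contentBlocks(html).forEach(System.out::println);
    }
}
```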
startDocument
public void startDocument() throws SAXException
Deprecated.
Specified by:
startDocument in interface ContentHandler
Overrides:
startDocument in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
Throws:
SAXException
startPrefixMapping
public void startPrefixMapping(String prefix, String uri) throws SAXException
Deprecated.
Specified by:
startPrefixMapping in interface ContentHandler
Overrides:
startPrefixMapping in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
Throws:
SAXException
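startPrefixMapping is the standard SAX callback that announces a namespace declaration before the element that declares it. As a JDK-only illustration of that contract (generic SAX, not Tika or boilerpipe code; PrefixMappingDemo is a hypothetical name), the sketch below records every mapping a namespace-aware parser reports for a small XHTML fragment. The default XHTML namespace arrives with an empty prefix.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class PrefixMappingDemo {

    // Returns "prefix=uri" for every startPrefixMapping event the parser reports.
    public static List<String> prefixMappings(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // prefix-mapping events require this
        SAXParser parser = factory.newSAXParser();

        List<String> mappings = new ArrayList<>();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startPrefixMapping(String prefix, String uri) {
                // A default namespace declaration is reported with prefix "".
                mappings.add(prefix + "=" + uri);
            }
        });
        return mappings;
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body><p>Hi</p></body></html>";
        System.out.println(prefixMappings(xhtml)); // [=http://www.w3.org/1999/xhtml]
    }
}
```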
startElement
public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException
Deprecated.
Specified by:
startElement in interface ContentHandler
Overrides:
startElement in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
Throws:
SAXException
characters
public void characters(char[] chars, int offset, int length) throws SAXException
Deprecated.
Specified by:
characters in interface ContentHandler
Overrides:
characters in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
Throws:
SAXException
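Under the SAX contract, the characters callback receives a buffer in which only chars[offset] through chars[offset + length - 1] belong to the current event, and a parser may split one text node across several calls. A JDK-only sketch of the safe pattern (generic SAX, not Tika code; CharactersDemo is a hypothetical name), which copies exactly that slice into a StringBuilder:

```java
import java.io.StringReader;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class CharactersDemo {

    // Concatenates all character data in the document, in order.
    public static String collectText(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        StringBuilder text = new StringBuilder();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void characters(char[] chars, int offset, int length) {
                // Copy only the valid slice; the parser may reuse the array.
                text.append(chars, offset, length);
            }
        });
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // The entity reference and the <b> element may split the text into several calls.
        System.out.println(collectText("<p>Fish &amp; <b>chips</b></p>")); // Fish & chips
    }
}
```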
endElement
public void endElement(String uri, String localName, String qName) throws SAXException
Deprecated.
Specified by:
endElement in interface ContentHandler
Overrides:
endElement in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
Throws:
SAXException
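The startElement and endElement overrides receive the usual SAX name triple: the namespace URI, the local name, and the qualified name as written in the source. A JDK-only sketch (generic SAX, not Tika code; ElementEventsDemo is a hypothetical name) shows how the three differ for an unprefixed element and a prefixed one when the parser is namespace-aware.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ElementEventsDemo {

    // Records element events as "start {uri}localName (qName)" / "end ...".
    public static List<String> elementEvents(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // fills in uri and localName
        SAXParser parser = factory.newSAXParser();

        List<String> events = new ArrayList<>();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                events.add("start {" + uri + "}" + localName + " (" + qName + ")");
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end {" + uri + "}" + localName + " (" + qName + ")");
            }
        });
        return events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root xmlns:x=\"urn:example\"><x:item/></root>";
        elementEvents(xml).forEach(System.out::println);
        // start {}root (root)
        // start {urn:example}item (x:item)
        // end {urn:example}item (x:item)
        // end {}root (root)
    }
}
```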
endDocument
public void endDocument() throws SAXException
Deprecated.
Specified by:
endDocument in interface ContentHandler
Overrides:
endDocument in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
Throws:
SAXException
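The class overrides both ends of the document lifecycle. Under the standard SAX contract, endDocument is the last event, which is the earliest point at which a handler that classifies a whole page has seen every text block. That this is why boilerpipe needs it is an inference from the SAX contract, not something this page states. The JDK-only sketch below (generic SAX, not Tika code; EventOrderDemo is a hypothetical name) records the order in which a namespace-aware parser delivers the callbacks documented above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EventOrderDemo {

    // Records the names of document, namespace and element events in order.
    public static List<String> eventOrder(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();

        List<String> events = new ArrayList<>();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startDocument() { events.add("startDocument"); }

            @Override
            public void startPrefixMapping(String prefix, String uri) { events.add("startPrefixMapping"); }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                events.add("startElement:" + localName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("endElement:" + localName);
            }

            @Override
            public void endPrefixMapping(String prefix) { events.add("endPrefixMapping"); }

            @Override
            public void endDocument() { events.add("endDocument"); }
        });
        return events;
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body><p>Hi</p></body></html>";
        eventOrder(xhtml).forEach(System.out::println);
    }
}
```

Text events are deliberately not recorded, because the number of characters calls for a text node is up to the parser.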