Class BoilerpipeContentHandler

java.lang.Object
de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
org.apache.tika.parser.html.BoilerpipeContentHandler
All Implemented Interfaces:
ContentHandler

@Deprecated(since="2026-04-30") public class BoilerpipeContentHandler extends de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
Deprecated.
This version of the Apache Tika library is deprecated. Use your own version of Apache Tika.
Uses the boilerpipe library to automatically extract the main content from a web page.

Use this as the ContentHandler passed to HtmlParser.parse(java.io.InputStream, ContentHandler, Metadata, org.apache.tika.parser.ParseContext).
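
The usage described above can be sketched roughly as follows. This is a minimal illustration, assuming the Tika HTML parser module and the boilerpipe library are on the classpath. The class name BoilerpipeUsageSketch and the method extractMainText are hypothetical names introduced for this example only. For very short pages the extractor may classify all text as boilerplate, so the returned string can be empty.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.html.BoilerpipeContentHandler;
import org.apache.tika.parser.html.HtmlParser;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of Tika.
class BoilerpipeUsageSketch {

    // Parses the given HTML and returns the text that boilerpipe keeps as main content.
    static String extractMainText(String html)
            throws IOException, SAXException, TikaException {
        StringWriter out = new StringWriter();
        // Writer-based constructor: extracted character events are written to "out".
        BoilerpipeContentHandler handler = new BoilerpipeContentHandler(out);
        try (InputStream in =
                new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8))) {
            new HtmlParser().parse(in, handler, new Metadata(), new ParseContext());
        }
        return out.toString();
    }
}
```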

  • Constructor Details

    • BoilerpipeContentHandler

      public BoilerpipeContentHandler(ContentHandler delegate)
      Deprecated.
      Creates a new boilerpipe-based content extractor, using the DefaultExtractor extraction rules and "delegate" as the content handler.
      Parameters:
      delegate - The ContentHandler that receives the extracted content
    • BoilerpipeContentHandler

      public BoilerpipeContentHandler(Writer writer)
      Deprecated.
      Creates a content handler that writes XHTML body character events to the given writer.
      Parameters:
      writer - The Writer that receives the XHTML body character events
    • BoilerpipeContentHandler

      public BoilerpipeContentHandler(ContentHandler delegate, de.l3s.boilerpipe.BoilerpipeExtractor extractor)
      Deprecated.
      Creates a new boilerpipe-based content extractor, using the given extraction rules. The extracted main content will be passed to the content handler.
      Parameters:
      delegate - The ContentHandler that receives the extracted main content
      extractor - Extraction rules to use, e.g. ArticleExtractor
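
A hedged sketch of the two-argument constructor, assuming Tika's BodyContentHandler as the delegate and boilerpipe's ArticleExtractor.INSTANCE as the extraction rules. The class name ArticleExtractionSketch and the method extractArticle are hypothetical and exist only for illustration. The result depends on how the extractor classifies the input blocks.

```java
import java.io.IOException;
import java.io.InputStream;

import de.l3s.boilerpipe.extractors.ArticleExtractor;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.html.BoilerpipeContentHandler;
import org.apache.tika.parser.html.HtmlParser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of Tika.
class ArticleExtractionSketch {

    // Runs the HTML parser with article-oriented extraction rules and
    // returns whatever text the delegate collected.
    static String extractArticle(InputStream html)
            throws IOException, SAXException, TikaException {
        // The delegate receives only the content that the extractor keeps.
        BodyContentHandler delegate = new BodyContentHandler();
        BoilerpipeContentHandler handler =
                new BoilerpipeContentHandler(delegate, ArticleExtractor.INSTANCE);
        new HtmlParser().parse(html, handler, new Metadata(), new ParseContext());
        return delegate.toString();
    }
}
```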
  • Method Details

    • isIncludeMarkup

      public boolean isIncludeMarkup()
      Deprecated.
    • setIncludeMarkup

      public void setIncludeMarkup(boolean includeMarkup)
      Deprecated.
    • getTextDocument

      public de.l3s.boilerpipe.document.TextDocument getTextDocument()
      Deprecated.
      Retrieves the built TextDocument.
      Returns:
      The built TextDocument
    • startDocument

      public void startDocument() throws SAXException
      Deprecated.
      Specified by:
      startDocument in interface ContentHandler
      Overrides:
      startDocument in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
      Throws:
      SAXException
    • startPrefixMapping

      public void startPrefixMapping(String prefix, String uri) throws SAXException
      Deprecated.
      Specified by:
      startPrefixMapping in interface ContentHandler
      Overrides:
      startPrefixMapping in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
      Throws:
      SAXException
    • startElement

      public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException
      Deprecated.
      Specified by:
      startElement in interface ContentHandler
      Overrides:
      startElement in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
      Throws:
      SAXException
    • characters

      public void characters(char[] chars, int offset, int length) throws SAXException
      Deprecated.
      Specified by:
      characters in interface ContentHandler
      Overrides:
      characters in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
      Throws:
      SAXException
    • endElement

      public void endElement(String uri, String localName, String qName) throws SAXException
      Deprecated.
      Specified by:
      endElement in interface ContentHandler
      Overrides:
      endElement in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
      Throws:
      SAXException
    • endDocument

      public void endDocument() throws SAXException
      Deprecated.
      Specified by:
      endDocument in interface ContentHandler
      Overrides:
      endDocument in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
      Throws:
      SAXException
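
The overridden methods listed above are the standard SAX ContentHandler callbacks. As a rough illustration of the order in which a parser fires them, the sketch below records events with a plain JDK handler. It does not use Tika or boilerpipe; SaxEventOrder and RecordingHandler are hypothetical names for this example, and the input is a small well-formed XML string rather than real HTML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class using only JDK SAX classes.
class SaxEventOrder {

    // Records the name of each callback as it is invoked.
    static class RecordingHandler extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override public void startDocument() { events.add("startDocument"); }

        @Override public void startPrefixMapping(String prefix, String uri) {
            events.add("startPrefixMapping");
        }

        @Override public void startElement(String uri, String localName,
                                           String qName, Attributes atts) {
            events.add("startElement:" + qName);
        }

        @Override public void characters(char[] chars, int offset, int length) {
            // A parser may split text across several calls, so only the event name is kept.
            events.add("characters");
        }

        @Override public void endElement(String uri, String localName, String qName) {
            events.add("endElement:" + qName);
        }

        @Override public void endDocument() { events.add("endDocument"); }
    }

    // Parses the XML string and returns the recorded callback sequence.
    static List<String> record(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed for startPrefixMapping events
        RecordingHandler handler = new RecordingHandler();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }
}
```

A handler such as BoilerpipeContentHandler sees the same kind of event stream from the HTML parser, which is why it overrides each of these callbacks.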