Class OOXMLWordAndPowerPointTextHandler
java.lang.Object
org.xml.sax.helpers.DefaultHandler
org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler
All Implemented Interfaces:
ContentHandler, DTDHandler, EntityResolver, ErrorHandler
@Deprecated(since="2026-04-30")
public class OOXMLWordAndPowerPointTextHandler
extends DefaultHandler
Deprecated.
This class is intended to handle any part that might contain IBodyElements:
the main document, headers, footers, notes, slides, etc.
It does not generally check namespaces, and it can be applied to both PPTX and DOCX for text extraction.
It can also be used to scrape content from charts. It currently ignores formula (<c:f/>) elements.
It does not work with .xlsx or .vsdx.
TODO: move this into POI?
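The namespace-agnostic matching described above can be illustrated with a minimal sketch that uses only the JDK's SAX API. This is not Tika's implementation: the class name LocalNameTextSketch and its extract helper are hypothetical, and the choice of local names (t for text runs, v for cached chart values, f for formulas) is an assumption made for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration; not part of Tika.
final class LocalNameTextSketch extends DefaultHandler {
    private final List<String> runs = new ArrayList<>();
    private final StringBuilder buffer = new StringBuilder();
    private boolean inText;     // true while inside a t or v element
    private int formulaDepth;   // greater than 0 while inside an f element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Match on the local name only, so w:t (DOCX) and a:t (PPTX) are treated alike.
        if ("f".equals(localName)) {
            formulaDepth++;
        } else if (formulaDepth == 0 && ("t".equals(localName) || "v".equals(localName))) {
            inText = true;
            buffer.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text inside formula elements never sets inText, so it is skipped.
        if (inText) {
            buffer.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("f".equals(localName)) {
            formulaDepth--;
        } else if (inText && ("t".equals(localName) || "v".equals(localName))) {
            runs.add(buffer.toString());
            inText = false;
        }
    }

    static List<String> extract(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // required so that localName is populated
        LocalNameTextSketch handler = new LocalNameTextSketch();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.runs;
    }

    public static void main(String[] args) throws Exception {
        String docx = "<w:p xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">"
                + "<w:r><w:t>Hello</w:t></w:r></w:p>";
        System.out.println(extract(docx));
    }
}
```

Because the sketch never inspects the namespace URI, the same handler accepts WordprocessingML, DrawingML, and chart markup. The trade-off is that any element with a matching local name is collected, whatever vocabulary it belongs to.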
-
Nested Class Summary
Nested Classes
Modifier and Type    Class    Description
static enum    Deprecated.
static interface    OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler    Deprecated.
-
Field Summary
Fields -
Constructor Summary
Constructors
Constructor    Description
OOXMLWordAndPowerPointTextHandler(OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler bodyContentsHandler, Map<String, String> hyperlinks)    Deprecated.
OOXMLWordAndPowerPointTextHandler(OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler bodyContentsHandler, Map<String, String> hyperlinks, boolean includeTextBox, boolean concatenatePhoneticRuns)    Deprecated.
-
Method Summary
Modifier and Type    Method    Description
void    characters(char[] ch, int start, int length)    Deprecated.
void    endDocument()    Deprecated.
void    endElement(String uri, String localName, String qName)    Deprecated.
void    endPrefixMapping(String prefix)    Deprecated.
void    ignorableWhitespace(char[] ch, int start, int length)    Deprecated.
void    startDocument()    Deprecated.
void    startElement(String uri, String localName, String qName, Attributes atts)    Deprecated.
void    startPrefixMapping(String prefix, String uri)    Deprecated.
Methods inherited from class org.xml.sax.helpers.DefaultHandler
error, fatalError, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, unparsedEntityDecl, warning
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.xml.sax.ContentHandler
declaration
-
Field Details
-
W_NS
Deprecated.
-
-
Constructor Details
-
OOXMLWordAndPowerPointTextHandler
public OOXMLWordAndPowerPointTextHandler(OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler bodyContentsHandler, Map<String, String> hyperlinks)
Deprecated.
-
OOXMLWordAndPowerPointTextHandler
public OOXMLWordAndPowerPointTextHandler(OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler bodyContentsHandler, Map<String, String> hyperlinks, boolean includeTextBox, boolean concatenatePhoneticRuns)
Deprecated.
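Both constructors accept a hyperlinks map whose contents are not described on this page. The sketch below assumes, without confirmation from this page, that the map is keyed by relationship id (for example rId5) and holds the link target as the value. It shows how a SAX handler could resolve a hyperlink element's r:id attribute against such a map. HyperlinkLookupSketch and resolve are hypothetical names, and the code uses only JDK classes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration; not part of Tika.
final class HyperlinkLookupSketch extends DefaultHandler {
    // Namespace of the r:id attribute in Office Open XML parts.
    static final String REL_NS =
            "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    private final Map<String, String> hyperlinks; // assumed: relationship id -> target
    private final List<String> targets = new ArrayList<>();

    HyperlinkLookupSketch(Map<String, String> hyperlinks) {
        this.hyperlinks = hyperlinks;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (!"hyperlink".equals(localName)) {
            return;
        }
        String relId = atts.getValue(REL_NS, "id");
        String target = (relId == null) ? null : hyperlinks.get(relId);
        if (target != null) {
            targets.add(target); // ids missing from the map are silently skipped
        }
    }

    static List<String> resolve(String xml, Map<String, String> hyperlinks) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed for the namespace-qualified attribute lookup
        HyperlinkLookupSketch handler = new HyperlinkLookupSketch(hyperlinks);
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.targets;
    }
}
```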
-
-
Method Details
-
startDocument
public void startDocument() throws SAXException
Deprecated.
Specified by:
startDocument in interface ContentHandler
Overrides:
startDocument in class DefaultHandler
Throws:
SAXException
-
endDocument
public void endDocument() throws SAXException
Deprecated.
Specified by:
endDocument in interface ContentHandler
Overrides:
endDocument in class DefaultHandler
Throws:
SAXException
-
startPrefixMapping
public void startPrefixMapping(String prefix, String uri) throws SAXException
Deprecated.
Specified by:
startPrefixMapping in interface ContentHandler
Overrides:
startPrefixMapping in class DefaultHandler
Throws:
SAXException
-
endPrefixMapping
public void endPrefixMapping(String prefix) throws SAXException
Deprecated.
Specified by:
endPrefixMapping in interface ContentHandler
Overrides:
endPrefixMapping in class DefaultHandler
Throws:
SAXException
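This page does not say what the class does in its prefix-mapping callbacks. As general SAX background, the sketch below records each prefix declaration a namespace-aware parser reports through startPrefixMapping. PrefixMappingSketch and collect are hypothetical names, the urn:example URIs are placeholders, and the code uses only JDK classes.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration; not part of Tika.
final class PrefixMappingSketch extends DefaultHandler {
    private final Map<String, String> declared = new TreeMap<>();

    @Override
    public void startPrefixMapping(String prefix, String uri) {
        // Called once per xmlns declaration, before the startElement event
        // of the element that carries the declaration.
        declared.put(prefix, uri);
    }

    static Map<String, String> collect(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // prefix-mapping events need namespace processing
        PrefixMappingSketch handler = new PrefixMappingSketch();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.declared;
    }
}
```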
-
startElement
public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException
Deprecated.
Specified by:
startElement in interface ContentHandler
Overrides:
startElement in class DefaultHandler
Throws:
SAXException
-
endElement
public void endElement(String uri, String localName, String qName) throws SAXException
Deprecated.
Specified by:
endElement in interface ContentHandler
Overrides:
endElement in class DefaultHandler
Throws:
SAXException
-
characters
public void characters(char[] ch, int start, int length) throws SAXException
Deprecated.
Specified by:
charactersin interface ContentHandler
Overrides:
characters in class DefaultHandler
Throws:
SAXException
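The SAX contract lets a parser deliver one run of text through several characters calls, each exposing only the slice ch[start .. start + length - 1] of a shared buffer. The sketch below shows one common way to cope: accumulate the slices and emit the text when the element ends. It is a generic illustration, not a description of this class's internals. BufferedCharactersSketch is a hypothetical name, and handling only elements with local name t is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration; not part of Tika.
final class BufferedCharactersSketch extends DefaultHandler {
    private final StringBuilder current = new StringBuilder();
    private final List<String> texts = new ArrayList<>();
    private boolean inText;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("t".equals(localName)) {
            inText = true;
            current.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Copy only the reported slice; the rest of the array belongs to the parser.
        if (inText) {
            current.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (inText && "t".equals(localName)) {
            texts.add(current.toString()); // emit once, after all chunks have arrived
            inText = false;
        }
    }

    List<String> texts() {
        return texts;
    }

    public static void main(String[] args) {
        BufferedCharactersSketch handler = new BufferedCharactersSketch();
        char[] data = "xxHello, world".toCharArray();
        // Simulate a parser that splits one text node into two chunks.
        handler.startElement("", "t", "w:t", new AttributesImpl());
        handler.characters(data, 2, 5);
        handler.characters(data, 7, 7);
        handler.endElement("", "t", "w:t");
        System.out.println(handler.texts());
    }
}
```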
-
ignorableWhitespace
public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException
Deprecated.
Specified by:
ignorableWhitespace in interface ContentHandler
Overrides:
ignorableWhitespace in class DefaultHandler
Throws:
SAXException
-