Class SafeContentHandler

All Implemented Interfaces:
ContentHandler, DTDHandler, EntityResolver, ErrorHandler
Direct Known Subclasses:
XHTMLContentHandler, XMPContentHandler

@Deprecated(since="2026-04-30") public class SafeContentHandler extends ContentHandlerDecorator
Deprecated.
This version of the Apache Tika library is deprecated. Use your own version of Apache Tika.
Content handler decorator that ensures that the character events (characters(char[], int, int) or ignorableWhitespace(char[], int, int)) passed to the decorated content handler contain only valid XML characters. Every invalid character is replaced with the Unicode replacement character U+FFFD, although a subclass may change this by overriding the writeReplacement(Output) method.
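
The decorating and replacing behaviour described above can be illustrated with a minimal sketch built only on the JDK's org.xml.sax types. This is not the Tika implementation: the class ReplacingHandlerSketch and its methods isInvalid, replacement and sanitize are hypothetical names chosen for illustration, and replacement() merely stands in for the role that writeReplacement(Output) plays in this class. The sketch forwards only the two character events and checks one char at a time.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch of a replacing decorator; not Tika's SafeContentHandler. */
class ReplacingHandlerSketch extends DefaultHandler {

    private final ContentHandler delegate;

    ReplacingHandlerSketch(ContentHandler delegate) {
        this.delegate = delegate;
    }

    /**
     * Per-char check (an assumption of this sketch): control characters other
     * than tab, LF and CR, plus U+FFFE and U+FFFF, are treated as invalid.
     * Surrogate chars are passed through untouched.
     */
    static boolean isInvalid(char c) {
        if (c < 0x20) {
            return c != 0x9 && c != 0xA && c != 0xD;
        }
        return c == 0xFFFE || c == 0xFFFF;
    }

    /** Replacement text; a subclass may override this to substitute something else. */
    protected String replacement() {
        return "\uFFFD";
    }

    /** Returns the given range with every invalid char swapped for the replacement text. */
    String sanitize(char[] ch, int start, int length) {
        StringBuilder out = new StringBuilder(length);
        for (int i = start; i < start + length; i++) {
            if (isInvalid(ch[i])) {
                out.append(replacement());
            } else {
                out.append(ch[i]);
            }
        }
        return out.toString();
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        char[] safe = sanitize(ch, start, length).toCharArray();
        delegate.characters(safe, 0, safe.length);
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
        char[] safe = sanitize(ch, start, length).toCharArray();
        delegate.ignorableWhitespace(safe, 0, safe.length);
    }
}
```

Overriding replacement() in a subclass (for example, to return a single space) changes what is substituted without touching the filtering loop, which is the same kind of extension point that the overridable writeReplacement(Output) method provides here.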

The XML standard defines the following Unicode character ranges as valid XML characters:

  #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
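
The ranges above translate directly into a predicate over Unicode code points. The following is a minimal sketch; the class name XmlCharRanges and the method isValidXmlCodePoint are hypothetical and are not part of this class's API.

```java
/** Hypothetical helper that mirrors the XML character ranges listed above. */
final class XmlCharRanges {

    private XmlCharRanges() {
    }

    /** Returns true if the code point falls in one of the valid XML character ranges. */
    static boolean isValidXmlCodePoint(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }
}
```

Note that the gap between #xD7FF and #xE000 is the UTF-16 surrogate block, so a surrogate value is never a valid XML character on its own, while #xFFFE and #xFFFF are excluded as noncharacters.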
  

Note that this class currently detects only those invalid characters whose UTF-16 representation fits in a single char. It also does not verify that the incoming characters are well-formed UTF-16 (for example, that surrogates are correctly paired).
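
Callers that need a stronger guarantee can check the UTF-16 encoding themselves before passing characters on. The sketch below shows one way to detect unpaired surrogates with the standard java.lang.Character methods; the class Utf16Checks and the method hasUnpairedSurrogate are hypothetical names, not something this class provides.

```java
/** Hypothetical helper for checking that a char range is well-formed UTF-16. */
final class Utf16Checks {

    private Utf16Checks() {
    }

    /**
     * Returns true if the range contains a high surrogate that is not
     * immediately followed by a low surrogate, or a low surrogate that is
     * not preceded by a high surrogate.
     */
    static boolean hasUnpairedSurrogate(char[] ch, int start, int length) {
        int end = start + length;
        for (int i = start; i < end; i++) {
            char c = ch[i];
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < end && Character.isLowSurrogate(ch[i + 1])) {
                    i++; // skip the low half of a well-formed pair
                } else {
                    return true;
                }
            } else if (Character.isLowSurrogate(c)) {
                return true;
            }
        }
        return false;
    }
}
```

A character outside the Basic Multilingual Plane, such as U+1F600, occupies two chars in UTF-16, so a range boundary that falls between the two halves is reported as malformed by this check even though the full array is well-formed.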