Class ExternalParser

java.lang.Object
org.apache.tika.parser.AbstractParser
org.apache.tika.parser.external.ExternalParser
All Implemented Interfaces:
Serializable, Parser
Direct Known Subclasses:
TensorflowImageRecParser

@Deprecated(since="2026-04-30") public class ExternalParser extends AbstractParser
Deprecated.
This version of the Apache Tika library is deprecated. Use your own version of Apache Tika.
Parser that uses an external program (like catdoc or pdf2txt) to extract text content and metadata from a given document.
  • Field Details

    • INPUT_FILE_TOKEN

      public static final String INPUT_FILE_TOKEN
      Deprecated.
      The token which, if present in the command string, is replaced with the input filename. Alternatively, the input data can be streamed over STDIN.
    • OUTPUT_FILE_TOKEN

      public static final String OUTPUT_FILE_TOKEN
      Deprecated.
      The token which, if present in the command string, is replaced with the output filename. Alternatively, the output data can be collected on STDOUT.
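      A rough sketch of how a command template containing these tokens might be resolved before the process is launched. The token strings "${INPUT}" and "${OUTPUT}" are assumptions (the constant values are not shown on this page), the argument layout is illustrative, and TokenSubstitution, resolve and usesToken are hypothetical names, not part of the Tika API. When a token is absent, the data would flow over STDIN or STDOUT instead, as the field descriptions above state.

```java
import java.util.Arrays;

/** Hypothetical sketch of token substitution; not Tika's implementation. */
final class TokenSubstitution {
    // Assumed token values; the real constants are not shown on this page.
    static final String INPUT_FILE_TOKEN = "${INPUT}";
    static final String OUTPUT_FILE_TOKEN = "${OUTPUT}";

    /** Returns a copy of the command with every token occurrence replaced. */
    static String[] resolve(String[] command, String inputPath, String outputPath) {
        String[] resolved = new String[command.length];
        for (int i = 0; i < command.length; i++) {
            // String.replace substitutes all literal occurrences (no regex involved).
            resolved[i] = command[i]
                    .replace(INPUT_FILE_TOKEN, inputPath)
                    .replace(OUTPUT_FILE_TOKEN, outputPath);
        }
        return resolved;
    }

    /** True if any argument contains the token, i.e. a filename is required. */
    static boolean usesToken(String[] command, String token) {
        return Arrays.stream(command).anyMatch(arg -> arg.contains(token));
    }

    public static void main(String[] args) {
        String[] template = {"pdf2txt", "-o", OUTPUT_FILE_TOKEN, INPUT_FILE_TOKEN};
        String[] cmd = resolve(template, "/tmp/in.pdf", "/tmp/out.txt");
        System.out.println(String.join(" ", cmd));
        // Without the input token, the document would be streamed over STDIN.
        System.out.println(usesToken(new String[]{"catdoc"}, INPUT_FILE_TOKEN));
    }
}
```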
  • Constructor Details

    • ExternalParser

      public ExternalParser()
      Deprecated.
  • Method Details

    • getSupportedTypes

      public Set<MediaType> getSupportedTypes(ParseContext context)
      Deprecated.
      Description copied from interface: Parser
      Returns the set of media types supported by this parser when used with the given parse context.
      Parameters:
      context - parse context
      Returns:
      immutable set of media types
    • getSupportedTypes

      public Set<MediaType> getSupportedTypes()
      Deprecated.
    • setSupportedTypes

      public void setSupportedTypes(Set<MediaType> supportedTypes)
      Deprecated.
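      getSupportedTypes is documented as returning an immutable set. One common way to honour that contract in a setter/getter pair is to store an unmodifiable copy, sketched below with plain strings standing in for MediaType. SupportedTypesHolder is a hypothetical class; whether ExternalParser copies the set this way is not stated on this page.

```java
import java.util.Set;

/** Hypothetical holder illustrating an immutable supported-types contract. */
final class SupportedTypesHolder {
    private Set<String> supportedTypes = Set.of();

    /** Stores an unmodifiable snapshot, so later changes to the caller's set have no effect. */
    void setSupportedTypes(Set<String> supportedTypes) {
        // Set.copyOf returns an immutable set and rejects null elements.
        this.supportedTypes = Set.copyOf(supportedTypes);
    }

    /** Returns the immutable set; mutation attempts throw UnsupportedOperationException. */
    Set<String> getSupportedTypes() {
        return supportedTypes;
    }
}
```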
    • getCommand

      public String[] getCommand()
      Deprecated.
    • setCommand

      public void setCommand(String... command)
      Deprecated.
      Sets the command to be run. This can include either of INPUT_FILE_TOKEN or OUTPUT_FILE_TOKEN if the command needs filenames.
    • getIgnoredLineConsumer

      public ExternalParser.LineConsumer getIgnoredLineConsumer()
      Deprecated.
      Gets the consumer for ignored lines.
      Returns:
      consumer instance
    • setIgnoredLineConsumer

      public void setIgnoredLineConsumer(ExternalParser.LineConsumer ignoredLineConsumer)
      Deprecated.
      Sets a consumer for the lines ignored by the parse methods.
      Parameters:
      ignoredLineConsumer - consumer instance
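      The LineConsumer type is not documented on this page. The sketch below assumes it is a single-method callback that receives each ignored line as a String; the local LineConsumer interface, its consume method and the IgnoredLines helper are hypothetical stand-ins, not the Tika declarations. It shows how lines that are otherwise discarded could be forwarded to a consumer, for example to collect diagnostic output.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

final class IgnoredLines {
    /** Hypothetical callback shape; the real ExternalParser.LineConsumer may differ. */
    @FunctionalInterface
    interface LineConsumer {
        void consume(String line);
    }

    /** Reads every line from the reader and hands it to the consumer. */
    static void drain(BufferedReader reader, LineConsumer consumer) {
        // lines() wraps I/O failures in UncheckedIOException.
        reader.lines().forEach(consumer::consume);
    }

    public static void main(String[] args) {
        List<String> collected = new ArrayList<>();
        drain(new BufferedReader(new StringReader("warning: x\nwarning: y")), collected::add);
        System.out.println(collected);
    }
}
```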
    • getMetadataExtractionPatterns

      public Map<Pattern,String> getMetadataExtractionPatterns()
      Deprecated.
    • setMetadataExtractionPatterns

      public void setMetadataExtractionPatterns(Map<Pattern,String> patterns)
      Deprecated.
      Sets the map of regular expression patterns to metadata keys. When a pattern matches, the metadata entry for its associated key is set. Set this to null to disable metadata extraction.
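      A sketch of applying a Map<Pattern,String> of extraction patterns to lines of program output, assuming each pattern's first capturing group holds the value to store under the mapped key. That group convention, the MetadataPatterns class, and the plain Map<String,List<String>> used in place of Tika's Metadata are assumptions for illustration. A null map disables extraction, matching the description above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MetadataPatterns {
    /** Applies each pattern to each line and returns key -> extracted values. */
    static Map<String, List<String>> extract(List<String> lines, Map<Pattern, String> patterns) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        if (patterns == null) {
            return metadata; // null disables metadata extraction
        }
        for (String line : lines) {
            for (Map.Entry<Pattern, String> e : patterns.entrySet()) {
                Matcher m = e.getKey().matcher(line);
                // Assumption: capturing group 1 carries the value for the mapped key.
                if (m.find() && m.groupCount() >= 1) {
                    metadata.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(m.group(1));
                }
            }
        }
        return metadata;
    }

    public static void main(String[] args) {
        Map<Pattern, String> patterns = Map.of(Pattern.compile("^Title:\\s*(.+)$"), "dc:title");
        System.out.println(extract(List.of("Title: Annual Report", "Pages: 3"), patterns));
    }
}
```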
    • parse

      public void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException
      Deprecated.
      Executes the configured external command, passing it the given document stream, and sends the command's output to the given SAX content handler as a simple XHTML document. Metadata is only extracted if setMetadataExtractionPatterns(Map) has been called to set patterns.
      Parameters:
      stream - the document stream (input)
      handler - handler for the XHTML SAX events (output)
      metadata - document metadata (input and output)
      context - parse context
      Throws:
      IOException - if the document stream could not be read
      SAXException - if the SAX events could not be processed
      TikaException - if the document could not be parsed
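      The parse flow described above can be approximated with the standard library: start the command, copy the document into its STDIN, and wrap whatever it prints on STDOUT in a minimal XHTML document. This is a simplified, hypothetical model (ExternalTextSketch, run and toXhtml are illustrative names): it builds a string rather than emitting SAX events, does not substitute filename tokens, and does not extract metadata.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class ExternalTextSketch {
    /** Runs the command, feeding it the document on STDIN, and returns its STDOUT as text. */
    static String run(String[] command, InputStream document) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectError(ProcessBuilder.Redirect.DISCARD) // ignore diagnostics in this sketch
                .start();
        // Write STDIN on a separate thread so a full STDOUT pipe cannot deadlock the caller.
        Thread feeder = new Thread(() -> {
            try (OutputStream stdin = process.getOutputStream()) {
                document.transferTo(stdin);
            } catch (IOException e) {
                // The process may exit before reading all input; ignored here.
            }
        });
        feeder.start();
        String output;
        try (InputStream stdout = process.getInputStream()) {
            output = new String(stdout.readAllBytes(), StandardCharsets.UTF_8);
        }
        feeder.join();
        process.waitFor();
        return output;
    }

    /** Wraps plain text in a minimal XHTML document, escaping markup characters. */
    static String toXhtml(String text) {
        String escaped = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        return "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body><p>"
                + escaped + "</p></body></html>";
    }
}
```

      For example, toXhtml(run(new String[]{"catdoc"}, in)) would wrap the extracted text of a Word document, assuming catdoc is on the path and reads the document from STDIN.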
    • check

      public static boolean check(String checkCmd, int... errorValue)
      Deprecated.
      Checks whether the command can be run. Typically used with something like "myapp --version" to check whether "myapp" is installed and on the path.
      Parameters:
      checkCmd - The check command to run
      errorValue - the exit value(s) that indicate the command cannot be run
    • check

      public static boolean check(String[] checkCmd, int... errorValue)
      Deprecated.
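      A standard-library sketch of the check idea: run the probe command, discard its output, and treat a failure to start, or an exit value listed in errorValue, as "cannot run". The CommandCheck class is hypothetical, and the fallback used when no error values are given (any non-zero exit counts as an error) is an assumption, not Tika's documented default.

```java
import java.io.IOException;

final class CommandCheck {
    /** Returns false if the command cannot be started or exits with a listed error value. */
    static boolean check(String[] checkCmd, int... errorValue) {
        try {
            Process process = new ProcessBuilder(checkCmd)
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .redirectError(ProcessBuilder.Redirect.DISCARD)
                    .start();
            int exit = process.waitFor();
            if (errorValue.length == 0) {
                return exit == 0; // assumption: any non-zero exit is an error
            }
            for (int err : errorValue) {
                if (exit == err) {
                    return false;
                }
            }
            return true;
        } catch (IOException e) {
            return false; // e.g. the executable is not on the path
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        String java = System.getProperty("java.home") + "/bin/java";
        System.out.println(check(new String[]{java, "-version"}));
        System.out.println(check(new String[]{"no-such-program-xyz", "--version"}));
    }
}
```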