Package org.apache.tika.parser.external
Class ExternalParser
java.lang.Object
org.apache.tika.parser.AbstractParser
org.apache.tika.parser.external.ExternalParser
- All Implemented Interfaces:
Serializable, Parser
- Direct Known Subclasses:
TensorflowImageRecParser
Deprecated.
This version of the Apache Tika library is deprecated. Use your own version of Apache Tika.
Parser that uses an external program (like catdoc or pdf2txt) to extract
text content and metadata from a given document.
-
Nested Class Summary
Nested Classes
Modifier and Type, Class: Description
static interface ExternalParser.LineConsumer: Deprecated. This version of the Apache Tika library is deprecated.
-
Field Summary
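Fields
INPUT_FILE_TOKEN: Deprecated. The token which, if present in the command string, is replaced with the input filename.
OUTPUT_FILE_TOKEN: Deprecated. The token which, if present in the command string, is replaced with the output filename.
-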
Constructor Summary
Constructors
ExternalParser(): Deprecated.
-
Method Summary
Modifier and Type, Method: Description
static boolean check(String[] checkCmd, int... errorValue): Deprecated.
static boolean check(String checkCmd, int... errorValue): Deprecated. Checks to see if the command can be run.
String[] getCommand(): Deprecated.
ExternalParser.LineConsumer getIgnoredLineConsumer(): Deprecated. Gets the consumer for ignored lines.
Map<Pattern, String> getMetadataExtractionPatterns(): Deprecated.
Set<MediaType> getSupportedTypes(): Deprecated.
Set<MediaType> getSupportedTypes(ParseContext context): Deprecated. Returns the set of media types supported by this parser when used with the given parse context.
void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context): Deprecated. Executes the configured external command and passes the given document stream as a simple XHTML document to the given SAX content handler.
void setCommand(String... command): Deprecated. Sets the command to be run.
void setIgnoredLineConsumer(ExternalParser.LineConsumer ignoredLineConsumer): Deprecated. Sets a consumer for the lines ignored by the parse functions.
void setMetadataExtractionPatterns(Map<Pattern, String> patterns): Deprecated. Sets the map of regular expression patterns and Metadata keys.
void setSupportedTypes(Set<MediaType> supportedTypes): Deprecated.
Methods inherited from class org.apache.tika.parser.AbstractParser
parse
-
Field Details
-
INPUT_FILE_TOKEN
Deprecated. The token which, if present in the command string, is replaced with the input filename. Alternatively, the input data can be streamed over STDIN.
-
OUTPUT_FILE_TOKEN
Deprecated. The token which, if present in the command string, is replaced with the output filename. Alternatively, the output data can be collected on STDOUT.
-
-
Constructor Details
-
ExternalParser
public ExternalParser()
Deprecated.
-
-
Method Details
-
getSupportedTypes
Deprecated.
Description copied from interface: Parser
Returns the set of media types supported by this parser when used with the given parse context.
- Parameters:
context - parse context
- Returns:
immutable set of media types
-
getSupportedTypes
Deprecated.
-
setSupportedTypes
Deprecated.
-
getCommand
Deprecated.
-
setCommand
Deprecated. Sets the command to be run. This can include either of INPUT_FILE_TOKEN or OUTPUT_FILE_TOKEN if the command needs filenames.
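The token replacement described for setCommand can be illustrated with a minimal, standalone sketch. It does not use or reproduce the library's code: the class name, the substitute helper, and the placeholder strings "${INPUT}" and "${OUTPUT}" are assumptions made for illustration, so check the constant field values of INPUT_FILE_TOKEN and OUTPUT_FILE_TOKEN before relying on them.

```java
import java.util.Arrays;

// Illustrative sketch only; not the ExternalParser implementation.
class CommandTokenSketch {
    // Assumed placeholder strings standing in for INPUT_FILE_TOKEN / OUTPUT_FILE_TOKEN.
    static final String INPUT_TOKEN = "${INPUT}";
    static final String OUTPUT_TOKEN = "${OUTPUT}";

    /** Returns a copy of the command with both tokens replaced by file names. */
    static String[] substitute(String[] command, String inputFile, String outputFile) {
        String[] result = new String[command.length];
        for (int i = 0; i < command.length; i++) {
            // String.replace performs a literal (non-regex) replacement.
            result[i] = command[i]
                    .replace(INPUT_TOKEN, inputFile)
                    .replace(OUTPUT_TOKEN, outputFile);
        }
        return result;
    }

    public static void main(String[] args) {
        // "pdf2txt" is just the example program named in the class description.
        String[] command = {"pdf2txt", INPUT_TOKEN, OUTPUT_TOKEN};
        String[] resolved = substitute(command, "/tmp/in.pdf", "/tmp/out.txt");
        System.out.println(Arrays.toString(resolved));
    }
}
```

A command that contains neither token passes through unchanged, which corresponds to the case where the document is streamed over STDIN and the result is read from STDOUT.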
-
getIgnoredLineConsumer
Deprecated. Gets the consumer for ignored lines.
- Returns:
consumer instance
-
setIgnoredLineConsumer
Deprecated. Sets a consumer for the lines ignored by the parse functions.
- Parameters:
ignoredLineConsumer - consumer instance
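As a rough illustration of what an ignored-line consumer is for, the standalone sketch below routes every output line that matches none of a set of patterns to a callback. It assumes that "ignored" means "matched by no metadata pattern", which this page does not state explicitly. The LineSink interface and the route method are hypothetical stand-ins, not the library's ExternalParser.LineConsumer API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative sketch only; not the ExternalParser implementation.
class IgnoredLineSketch {
    /** Hypothetical stand-in for a line consumer callback. */
    interface LineSink {
        void consume(String line);
    }

    /** Passes each line that matches none of the patterns to the sink. */
    static void route(List<String> lines, List<Pattern> patterns, LineSink ignored) {
        for (String line : lines) {
            boolean matched = false;
            for (Pattern pattern : patterns) {
                if (pattern.matcher(line).find()) {
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                ignored.consume(line);
            }
        }
    }

    public static void main(String[] args) {
        List<String> ignored = new ArrayList<>();
        route(Arrays.asList("Title: Report", "warning: font missing"),
              Collections.singletonList(Pattern.compile("^Title: (.*)$")),
              ignored::add);
        // Only the line that matched no pattern was collected.
        System.out.println(ignored);
    }
}
```

Collecting the unmatched lines this way is mainly useful for logging or debugging the output of the external program.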
-
getMetadataExtractionPatterns
Deprecated.
-
setMetadataExtractionPatterns
Deprecated. Sets the map of regular expression patterns and Metadata keys. For each pattern that matches, the corresponding metadata entry is set. Set this to null to disable metadata extraction.
-
parse
public void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException
Deprecated. Executes the configured external command and passes the given document stream as a simple XHTML document to the given SAX content handler. Metadata is only extracted if setMetadataExtractionPatterns(Map) has been called to set patterns.
- Parameters:
stream - the document stream (input)
handler - handler for the XHTML SAX events (output)
metadata - document metadata (input and output)
context - parse context
- Throws:
IOException - if the document stream could not be read
SAXException - if the SAX events could not be processed
TikaException - if the document could not be parsed
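The pattern-driven metadata extraction mentioned above can be sketched in isolation. The example below is a simplified stand-in, not the library's code: a plain Map<String, String> replaces the Metadata object, the extract method is hypothetical, and using capture group 1 as the metadata value is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; not the ExternalParser implementation.
class MetadataPatternSketch {
    /**
     * Applies every pattern to every line of the program output.
     * Assumption: capture group 1 of a matching pattern holds the value.
     */
    static Map<String, String> extract(String output, Map<Pattern, String> patterns) {
        Map<String, String> metadata = new LinkedHashMap<>();
        if (patterns == null) {
            return metadata; // null patterns: extraction disabled
        }
        for (String line : output.split("\\R")) {
            for (Map.Entry<Pattern, String> entry : patterns.entrySet()) {
                Matcher matcher = entry.getKey().matcher(line);
                if (matcher.find() && matcher.groupCount() >= 1) {
                    metadata.put(entry.getValue(), matcher.group(1));
                }
            }
        }
        return metadata;
    }

    public static void main(String[] args) {
        Map<Pattern, String> patterns = new LinkedHashMap<>();
        patterns.put(Pattern.compile("^Author:\\s*(.*)$"), "author");
        patterns.put(Pattern.compile("^Pages:\\s*(\\d+)$"), "page-count");
        String output = "Author: Jane Doe\nPages: 12\nProducer: unknown";
        System.out.println(extract(output, patterns));
    }
}
```

Lines that match no pattern, such as the "Producer" line in the usage example, contribute nothing to the result.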
-
check
Deprecated. Checks to see if the command can be run. Typically used with something like "myapp --version" to check whether "myapp" is installed and on the path.
- Parameters:
checkCmd - the check command to run
errorValue - the value that is considered an error
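The kind of availability check described above can be approximated with the standard ProcessBuilder API. The sketch below is not the library's implementation: the canRun method is a hypothetical name, and treating a failure to start the process as "cannot be run" is an assumption.

```java
import java.io.IOException;
import java.io.InputStream;

// Illustrative sketch only; not the ExternalParser implementation.
class CommandCheckSketch {
    /**
     * Runs the check command and reports whether it looks usable.
     * Returns false if the process cannot be started or if its exit
     * value equals one of the given error values.
     */
    static boolean canRun(String[] checkCmd, int... errorValues) {
        try {
            Process process = new ProcessBuilder(checkCmd)
                    .redirectErrorStream(true) // merge stderr into stdout
                    .start();
            // Drain the output so the child cannot block on a full pipe.
            try (InputStream in = process.getInputStream()) {
                byte[] buffer = new byte[4096];
                while (in.read(buffer) != -1) {
                    // discard
                }
            }
            int exit = process.waitFor();
            for (int error : errorValues) {
                if (exit == error) {
                    return false;
                }
            }
            return true;
        } catch (IOException e) {
            return false; // e.g. the program is not installed or not on the path
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // "myapp" is the placeholder program name used in the description above.
        System.out.println(canRun(new String[]{"myapp", "--version"}, 127));
    }
}
```

Exit value 127 in the usage example is the conventional shell code for "command not found"; whether it is the right error value depends on how the command is launched.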
-
check
Deprecated.
-