Package org.apache.tika.detect
Class TextDetector
java.lang.Object
org.apache.tika.detect.TextDetector
- All Implemented Interfaces:
Serializable, Detector
Deprecated.
This version of the Apache Tika library is deprecated. Use your own version of Apache Tika.
Content type detection of plain text documents. This detector looks at the
beginning of the document input stream and considers the document to be
a text document if no ASCII (ISO-Latin-1, UTF-8, etc.) control bytes are
found. As a special case some control bytes (up to 2% of all characters)
are also allowed in a text document if it also contains no or just a few
(less than 10%) characters above the 7-bit ASCII range.
Note that text documents with a character encoding like UTF-16 are better
detected with MagicDetector and an appropriate magic byte pattern.
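The thresholds described above can be sketched in plain Java. This is an illustration under stated assumptions, not the library's implementation: the class name `TextHeuristicSketch` and the method `looksLikeText` are hypothetical, tab, line feed, and carriage return are assumed not to count as control bytes, and an empty prefix is treated as "not text" purely as a choice of this sketch.

```java
final class TextHeuristicSketch {

    /**
     * Hypothetical helper: applies the documented thresholds to the first
     * bytesToTest bytes of data and reports whether they look like text.
     */
    static boolean looksLikeText(byte[] data, int bytesToTest) {
        int n = Math.min(data.length, bytesToTest);
        if (n <= 0) {
            return false; // sketch choice: nothing to inspect, so do not claim text
        }
        int control = 0; // control bytes below 0x20, minus the assumed-safe ones
        int high = 0;    // bytes above the 7-bit ASCII range
        for (int i = 0; i < n; i++) {
            int b = data[i] & 0xFF;
            if (b >= 0x80) {
                high++;
            } else if (b < 0x20 && b != '\t' && b != '\n' && b != '\r') {
                control++; // assumption: tab, LF and CR are ordinary text bytes
            }
        }
        if (control == 0) {
            return true; // main rule: no control bytes at all
        }
        // Special case: up to 2% control bytes, and fewer than 10% high bytes.
        return control * 100 <= n * 2 && high * 100 < n * 10;
    }
}
```

Under these assumptions, a call such as `TextHeuristicSketch.looksLikeText(bytes, 512)` accepts ordinary ASCII or UTF-8 prose and rejects input in which control bytes are common. The value 512 is only an example; the default number of bytes tested is not stated here.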
- Since:
- Apache Tika 0.3
Constructor Summary
Constructors
Constructor and Description
TextDetector()
    Deprecated. Constructs a TextDetector which will look at the default number of bytes from the beginning of the document.
TextDetector(int bytesToTest)
    Deprecated. Constructs a TextDetector which will look at a given number of bytes from the beginning of the document.
Method Summary
Modifier and Type, Method, and Description
MediaType detect(InputStream input, Metadata metadata)
    Deprecated. Looks at the beginning of the document input stream to determine whether the document is text or not.
Constructor Details
TextDetector
public TextDetector()
Deprecated.
Constructs a TextDetector which will look at the default number of bytes from the beginning of the document.
TextDetector
public TextDetector(int bytesToTest)
Deprecated.
Constructs a TextDetector which will look at a given number of bytes from the beginning of the document.
Method Details
detect
public MediaType detect(InputStream input, Metadata metadata) throws IOException
Deprecated.
Looks at the beginning of the document input stream to determine whether the document is text or not.
- Specified by:
- detect in interface Detector
- Parameters:
- input - document input stream, or null
- metadata - ignored
- Returns:
- "text/plain" if the input stream suggests a text document, "application/octet-stream" otherwise
- Throws:
- IOException - if the document input stream could not be read
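The contract above (a null-tolerant input, a look at only the beginning of the stream, and one of two media type strings as the result) can be illustrated with a small stand-in written against `java.io` only. This is a sketch, not the Tika class: `DetectContractSketch` and its `detect` method are hypothetical, the check is simplified to "no control bytes other than tab, LF, or CR", and the sketch assumes the stream supports `mark`/`reset` so that the caller can still read the document from the start afterwards.

```java
import java.io.IOException;
import java.io.InputStream;

final class DetectContractSketch {

    private static final String TEXT = "text/plain";
    private static final String BINARY = "application/octet-stream";

    /** Hypothetical stand-in that mirrors the documented inputs and outputs. */
    static String detect(InputStream input, int bytesToTest) throws IOException {
        if (input == null || bytesToTest <= 0) {
            return BINARY; // nothing to inspect
        }
        input.mark(bytesToTest); // assumption: the stream supports mark/reset
        try {
            byte[] buffer = new byte[bytesToTest];
            int total = 0;
            int read;
            // Read until the prefix is full or the stream ends.
            while (total < bytesToTest
                    && (read = input.read(buffer, total, bytesToTest - total)) != -1) {
                total += read;
            }
            if (total == 0) {
                return BINARY; // sketch choice for an empty stream
            }
            for (int i = 0; i < total; i++) {
                int b = buffer[i] & 0xFF;
                if (b < 0x20 && b != '\t' && b != '\n' && b != '\r') {
                    return BINARY; // simplified rule: any control byte rejects
                }
            }
            return TEXT;
        } finally {
            input.reset(); // leave the stream positioned where the caller had it
        }
    }
}
```

A caller would typically wrap a file or network stream in a `java.io.BufferedInputStream` before passing it in, since not every `InputStream` supports `mark`/`reset`.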