org.apache.jackrabbit.extractor
Class XMLTextExtractor
java.lang.Object
org.apache.jackrabbit.extractor.AbstractTextExtractor
org.apache.jackrabbit.extractor.XMLTextExtractor
- All Implemented Interfaces:
- TextExtractor
public class XMLTextExtractor
- extends AbstractTextExtractor
Text extractor for XML documents. This class extracts the text content
and attribute values from XML documents.
This class can handle any XML-based format
(application/something+xml), not just the base XML content
types reported by AbstractTextExtractor.getContentTypes(). However, it often makes
sense to use more specialized extractors that better understand the
specific content type.
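As an illustration of what "text content and attribute values" can mean in practice, here is a minimal sketch built only on the JDK's SAX parser. It is not Jackrabbit's implementation: the class `XmlTextSketch`, its `CollectingHandler`, and the choice to join fragments with single spaces are all assumptions made for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration; not part of Jackrabbit.
final class XmlTextSketch {

    /** Collects attribute values and character data, separated by single spaces. */
    static final class CollectingHandler extends DefaultHandler {
        private final StringBuilder out = new StringBuilder();
        // Character data may arrive in several chunks, so buffer it until a tag boundary.
        private final StringBuilder pending = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            flush();
            for (int i = 0; i < attrs.getLength(); i++) {
                append(attrs.getValue(i));
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            pending.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            flush();
        }

        private void flush() {
            append(pending.toString());
            pending.setLength(0);
        }

        // Assumption of this sketch: trim each fragment and skip whitespace-only ones.
        private void append(String fragment) {
            String trimmed = fragment.trim();
            if (!trimmed.isEmpty()) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(trimmed);
            }
        }

        String text() {
            flush();
            return out.toString();
        }
    }

    /** Parses the stream and returns the collected text; parse errors propagate to the caller. */
    static String extract(InputStream stream)
            throws IOException, SAXException, ParserConfigurationException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        // Ask the parser to apply its secure-processing limits to untrusted input.
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        CollectingHandler handler = new CollectingHandler();
        factory.newSAXParser().parse(stream, handler);
        return handler.text();
    }
}
```

Calling `XmlTextSketch.extract` on a stream containing `<doc title="Hello"><p lang="en">World</p> tail</doc>` yields the attribute values and text in document order. A format-specific extractor could instead skip markup-only attributes or give headings extra weight, which is why a specialized extractor is often the better choice.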
Method Summary
java.io.Reader | extractText(java.io.InputStream stream, java.lang.String type, java.lang.String encoding) | Returns a reader for the text content of the given XML document.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
XMLTextExtractor
public XMLTextExtractor()
- Creates a new XMLTextExtractor instance.
extractText
public java.io.Reader extractText(java.io.InputStream stream,
java.lang.String type,
java.lang.String encoding)
throws java.io.IOException
- Returns a reader for the text content of the given XML document.
Returns an empty reader if the given encoding is not supported or
if the XML document could not be parsed.
- Parameters:
stream - XML document
type - XML content type
encoding - character encoding, or null
- Returns:
- reader for the text content of the given XML document,
or an empty reader if the encoding is not supported or the document could not be parsed
- Throws:
java.io.IOException - if the XML document stream cannot be closed
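The contract described above (an empty reader for an unsupported encoding or an unparsable document, with the stream always closed) can be sketched with JDK classes alone. This is a hedged illustration, not the Jackrabbit source: the class `EmptyReaderSketch` and its `readAll` helper are hypothetical, the `type` argument is accepted only to mirror the documented signature, and this sketch collects character data only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.Charset;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in that mimics the documented contract; not part of Jackrabbit.
final class EmptyReaderSketch {

    static Reader extractText(InputStream stream, String type, String encoding)
            throws IOException {
        try {
            InputSource source = new InputSource(stream);
            if (encoding != null) {
                // Throws an IllegalArgumentException subclass for unknown or illegal names.
                Charset.forName(encoding);
                source.setEncoding(encoding);
            }
            final StringBuilder text = new StringBuilder();
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            factory.newSAXParser().parse(source, new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });
            return new StringReader(text.toString());
        } catch (IllegalArgumentException | SAXException | ParserConfigurationException e) {
            // Unsupported encoding or unparsable document: fall back to an empty reader.
            return new StringReader("");
        } finally {
            // A failure to close surfaces as IOException. In this sketch, read errors
            // raised during parsing also propagate as IOException.
            stream.close();
        }
    }

    /** Drains a reader into a string so callers can test for an empty result. */
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }
}
```

Because failures are reported as an empty reader rather than an exception, a caller that needs to distinguish "no text" from "could not parse" has to validate the document separately.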
Copyright © 2004-2011 The Apache Software Foundation. All Rights Reserved.