public class SingleDocumentExtraction extends Object
| Constructor | Description |
|---|---|
| SingleDocumentExtraction(Configuration configuration, DocumentSource in, ExtractorFactory<?> factory, TripleHandler output) | Builds an extractor from the given document source, extractor factory, and output triple handler. |
| SingleDocumentExtraction(Configuration configuration, DocumentSource in, ExtractorGroup extractors, TripleHandler output) | Builds an extractor from the given document source, list of extractors, and output triple handler. |
| SingleDocumentExtraction(DocumentSource in, ExtractorFactory<?> factory, TripleHandler output) | Builds an extractor from the given document source, extractor factory, and output triple handler, using the DefaultConfiguration. |
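
The third constructor differs from the first only in that no Configuration is passed and the DefaultConfiguration is used instead. The sketch below illustrates that overload pattern in isolation. It uses hypothetical stand-in types (`Config`, `Extraction`) rather than the Any23 classes, and it assumes the shorter constructor simply delegates to the longer one, which the summary suggests but does not state.

```java
public class ConstructorDefaultsSketch {

    // Hypothetical stand-in for a configuration object (not the Any23 Configuration).
    static final class Config {
        final String name;

        Config(String name) {
            this.name = name;
        }

        // Plays the role of the default configuration.
        static Config defaults() {
            return new Config("default");
        }
    }

    // Hypothetical stand-in for the extraction class.
    static final class Extraction {
        final Config config;
        final String source;

        // Full form: the caller supplies the configuration explicitly.
        Extraction(Config config, String source) {
            if (config == null) {
                throw new NullPointerException("config");
            }
            this.config = config;
            this.source = source;
        }

        // Short form: delegates to the full form with the default configuration.
        Extraction(String source) {
            this(Config.defaults(), source);
        }
    }

    public static void main(String[] args) {
        Extraction withDefaults = new Extraction("http://example.org/doc");
        Extraction withCustom = new Extraction(new Config("custom"), "http://example.org/doc");
        System.out.println(withDefaults.config.name); // prints "default"
        System.out.println(withCustom.config.name);   // prints "custom"
    }
}
```

Keeping the defaulting logic in a single delegating constructor means both forms share the same validation and initialization path.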

| Modifier and Type | Method | Description |
|---|---|---|
| String | getDetectedMIMEType() | Returns the detected MIME type for the given DocumentSource. |
| List<Extractor> | getMatchingExtractors() | |
| String | getParserEncoding() | |
| boolean | hasMatchingExtractors() | Checks whether the content of the given DocumentSource activates at least one extractor. |
| SingleDocumentExtractionReport | run() | Triggers the execution of all the Extractor instances registered to this class, using the default extraction parameters. |
| SingleDocumentExtractionReport | run(ExtractionParameters extractionParameters) | Triggers the execution of all the Extractor instances registered to this class, using the specified extraction parameters. |
| void | setLocalCopyFactory(LocalCopyFactory copyFactory) | Sets the internal factory that generates the local copy of the document; if null, the MemCopyFactory is used. |
| void | setMIMETypeDetector(MIMETypeDetector detector) | Sets the internal MIME type detector; if null, MIME type detection is skipped and all extractors are activated. |
| void | setParserEncoding(String encoding) | Sets the document parser encoding. |
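
The null-detector rule described for setMIMETypeDetector can be illustrated with a small, self-contained model. This is only a sketch of the documented behavior, not the Any23 implementation: `SimpleExtractor`, `Detector`, and `matching` are hypothetical names introduced here, and the assumption that an extractor is activated when its declared MIME types contain the detected one is a simplification.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NullDetectorSketch {

    // Hypothetical stand-in for an extractor that declares the MIME types it handles.
    static final class SimpleExtractor {
        final String name;
        final List<String> mimeTypes;

        SimpleExtractor(String name, String... mimeTypes) {
            this.name = name;
            this.mimeTypes = Arrays.asList(mimeTypes);
        }
    }

    // Hypothetical stand-in for a MIME type detector.
    interface Detector {
        String detect(String content);
    }

    // Returns the names of the activated extractors.
    // A null detector skips detection, so every extractor is activated.
    static List<String> matching(List<SimpleExtractor> all, Detector detector, String content) {
        List<String> names = new ArrayList<>();
        if (detector == null) {
            for (SimpleExtractor e : all) {
                names.add(e.name);
            }
            return names;
        }
        String mime = detector.detect(content);
        for (SimpleExtractor e : all) {
            if (e.mimeTypes.contains(mime)) {
                names.add(e.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<SimpleExtractor> all = Arrays.asList(
                new SimpleExtractor("html", "text/html"),
                new SimpleExtractor("csv", "text/csv"));
        // Toy detector: content starting with '<' is treated as HTML, anything else as CSV.
        Detector detector = c -> c.startsWith("<") ? "text/html" : "text/csv";

        System.out.println(matching(all, detector, "<html></html>")); // prints [html]
        System.out.println(matching(all, null, "<html></html>"));     // prints [html, csv]
    }
}
```

In this model, a non-empty result from `matching` corresponds to hasMatchingExtractors() returning true.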

public SingleDocumentExtraction(Configuration configuration, DocumentSource in, ExtractorGroup extractors, TripleHandler output)
Builds an extractor from the given document source, list of extractors, and output triple handler.
Parameters:
configuration - configuration applied during extraction.
in - input document source.
extractors - list of extractors to be applied.
output - output triple handler.

public SingleDocumentExtraction(Configuration configuration, DocumentSource in, ExtractorFactory<?> factory, TripleHandler output)
Builds an extractor from the given document source, extractor factory, and output triple handler.
Parameters:
configuration - configuration applied during extraction.
in - input document source.
factory - the extractors factory.
output - output triple handler.

public SingleDocumentExtraction(DocumentSource in, ExtractorFactory<?> factory, TripleHandler output)
Builds an extractor from the given document source, extractor factory, and output triple handler, using the DefaultConfiguration.
Parameters:
in - input document source.
factory - the extractors factory.
output - output triple handler.

public void setLocalCopyFactory(LocalCopyFactory copyFactory)
Sets the internal factory that generates the local copy of the document; if null, the MemCopyFactory is used.
Parameters:
copyFactory - local copy factory.
See Also:
DocumentSource

public void setMIMETypeDetector(MIMETypeDetector detector)
Sets the internal MIME type detector; if null, MIME type detection is skipped and all extractors are activated.
Parameters:
detector - detector instance.

public SingleDocumentExtractionReport run(ExtractionParameters extractionParameters) throws ExtractionException, IOException
Triggers the execution of all the Extractor instances registered to this class, using the specified extraction parameters.
Parameters:
extractionParameters - the parameters applied to the run execution.
Throws:
ExtractionException - if an error occurred during the data extraction.
IOException - if an error occurred during the data access.

public SingleDocumentExtractionReport run() throws IOException, ExtractionException
Triggers the execution of all the Extractor instances registered to this class, using the default extraction parameters.
Throws:
IOException
ExtractionException

public String getDetectedMIMEType() throws IOException
Returns the detected MIME type for the given DocumentSource.
Throws:
IOException - if an error occurred while accessing the data.

public boolean hasMatchingExtractors() throws IOException
Checks whether the content of the given DocumentSource activates at least one extractor.
Returns:
true if at least one extractor is activated, false otherwise.
Throws:
IOException

public List<Extractor> getMatchingExtractors()
Returns:
the extractors activated for the given DocumentSource.

public String getParserEncoding()

public void setParserEncoding(String encoding)
Sets the document parser encoding.
Parameters:
encoding - parser encoding.

Copyright © 2010-2013 The Apache Software Foundation. All Rights Reserved.