| Package | Description |
|---|---|
| org.apache.any23.extractor | This package contains classes and interfaces modeling the Extractor API. |
| org.apache.any23.extractor.html | This package contains the various Extractor implementations needed to distill RDF from Microformats in HTML pages. |

| Class | Description |
|---|---|
| MicroformatExtractor | The abstract base class for any Microformat specification extractor. |

| Class | Description |
|---|---|
| AdrExtractor | Extractor for the adr microformat. |
| DocumentReport | Represents the validation report generated by the TagSoupParser when a document is retrieved and validated. |
| EntityBasedMicroformatExtractor | Base class for microformat extractors based on entities. |
| GeoExtractor | Extractor for the Geo microformat. |
| HCalendarExtractor | Extractor for the hCalendar microformat. |
| HCardExtractor | Extractor for the hCard microformat. |
| HeadLinkExtractor | This Extractor.TagSoupDOMExtractor implementation retrieves the LINKs declared within the HTML/HEAD page header. |
| HListingExtractor | Extractor for the hListing microformat. |
| HRecipeExtractor | Extractor for the hRecipe microformat. |
| HResumeExtractor | Extractor for the hResume microformat. |
| HReviewAggregateExtractor | Extractor for the hReview-aggregate microformat. |
| HReviewExtractor | Extractor for the hReview microformat. |
| HTMLDocument | A wrapper around the DOM representation of an HTML document. |
| HTMLDocument.TextField | Represents text extracted from the HTML DOM, associated with the node from which that text was retrieved. |
| HTMLMetaExtractor | This extractor represents the HTML META tag values according to the HTML4 specification. |
| ICBMExtractor | Extractor for "ICBM coordinates" provided as META headers in the head of an HTML page. |
| LicenseExtractor | Extractor for the rel-license microformat. |
| MicroformatExtractor | The abstract base class for any Microformat specification extractor. |
| SpeciesExtractor | Extractor able to extract the Species Microformat. |
| TitleExtractor | Extracts the value of the `<title>` element of an HTML or XHTML page. |
| TurtleHTMLExtractor | Extractor for Turtle/N3 format embedded within HTML script tags. |
| XFNExtractor | Extractor for the XFN microformat. |
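
To make the ICBMExtractor entry more concrete, the following is a minimal, self-contained sketch of how the content of an "ICBM" META header (for example `content="50.167958, -97.133185"`) could be turned into a latitude/longitude pair. It is not the Any23 implementation: the class `IcbmSketch`, the nested `Coordinates` type, and the `parse` method are hypothetical names, and accepting either a comma or a semicolon as separator is an assumption made for illustration.

```java
import java.util.Optional;

// Hypothetical illustration only; this class is not part of Any23.
final class IcbmSketch {

    /** Immutable holder for a latitude/longitude pair. */
    static final class Coordinates {
        public final double latitude;
        public final double longitude;

        Coordinates(double latitude, double longitude) {
            this.latitude = latitude;
            this.longitude = longitude;
        }
    }

    /**
     * Parses the content attribute of an ICBM META tag.
     * Assumption: the two values are separated by a comma or a semicolon.
     * Returns an empty Optional when the input is not two valid numbers.
     */
    static Optional<Coordinates> parse(String content) {
        if (content == null) {
            return Optional.empty();
        }
        // Split on a single separator, ignoring surrounding whitespace.
        String[] parts = content.trim().split("\\s*[;,]\\s*");
        if (parts.length != 2) {
            return Optional.empty();
        }
        try {
            double latitude = Double.parseDouble(parts[0]);
            double longitude = Double.parseDouble(parts[1]);
            return Optional.of(new Coordinates(latitude, longitude));
        } catch (NumberFormatException e) {
            // Non-numeric values are treated as "no coordinates found".
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        parse("50.167958, -97.133185").ifPresent(c ->
                System.out.println("lat=" + c.latitude + " lon=" + c.longitude));
    }
}
```

Returning an `Optional` keeps malformed headers from raising exceptions in the caller. A real extractor would go on to emit RDF statements for the parsed values, which this sketch leaves out.
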
Copyright © 2010-2013 The Apache Software Foundation. All Rights Reserved.