com.norconex.importer.parser
Interface IDocumentParser
- All Superinterfaces:
- Serializable
- All Known Implementing Classes:
- AbstractTikaParser, FallbackParser, HTMLParser, PDFParser
public interface IDocumentParser
- extends Serializable
Implementations are responsible for parsing a document (InputStream) to
extract its text and metadata.
- Author:
- Pascal Essiembre
RDF_BASE_URI
static final String RDF_BASE_URI
- See Also:
- Constant Field Values
RDF_SUBJECT_CONTENT
static final String RDF_SUBJECT_CONTENT
- See Also:
- Constant Field Values
parseDocument
void parseDocument(InputStream inputStream,
ContentType contentType,
Writer outputStream,
Properties metadata)
throws DocumentParserException
- Parses a document.
- Parameters:
inputStream - the document to parse
contentType - the content type of the document
outputStream - where to save the extracted text
metadata - where to store the metadata
- Throws:
DocumentParserException
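As an illustration of the parseDocument contract described above, the following is a minimal sketch of a plain-text parser. It does not use the library's own types: SketchDocumentParser, SketchParserException, and PlainTextSketchParser are hypothetical stand-ins so the example compiles on its own. The content type is simplified to a String and the metadata to a Map, whereas the real interface takes ContentType and Properties. The metadata keys "content-type" and "char-count", and the choice of UTF-8, are also assumptions made for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.Serializable;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for DocumentParserException (not the library class).
class SketchParserException extends Exception {
    private static final long serialVersionUID = 1L;

    SketchParserException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical stand-in that mirrors the shape of IDocumentParser.
// Assumption: String and Map replace the library's ContentType and Properties.
interface SketchDocumentParser extends Serializable {
    void parseDocument(InputStream inputStream, String contentType,
            Writer output, Map<String, List<String>> metadata)
            throws SketchParserException;
}

// Copies the input through as UTF-8 text and records two metadata fields.
class PlainTextSketchParser implements SketchDocumentParser {
    private static final long serialVersionUID = 1L;

    @Override
    public void parseDocument(InputStream inputStream, String contentType,
            Writer output, Map<String, List<String>> metadata)
            throws SketchParserException {
        try {
            // The caller owns both streams, so neither is closed here.
            Reader reader = new InputStreamReader(
                    inputStream, StandardCharsets.UTF_8);
            char[] buffer = new char[4096];
            long total = 0;
            int read;
            while ((read = reader.read(buffer)) != -1) {
                output.write(buffer, 0, read);
                total += read;
            }
            output.flush();

            // Extracted text goes to the writer; everything else is metadata.
            metadata.computeIfAbsent("content-type", k -> new ArrayList<>())
                    .add(contentType);
            metadata.computeIfAbsent("char-count", k -> new ArrayList<>())
                    .add(Long.toString(total));
        } catch (IOException e) {
            // I/O failures are wrapped in the parser's own exception type.
            throw new SketchParserException("Could not parse document.", e);
        }
    }
}
```

A caller would pass an open InputStream, a Writer such as a StringWriter, and an empty metadata map, then read the extracted text from the writer and the collected fields from the map once the call returns.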
Copyright © 2009-2013 Norconex Inc. All Rights Reserved.