com.norconex.importer.parser
Interface IDocumentParser

All Superinterfaces:
Serializable
All Known Implementing Classes:
AbstractTikaParser, FallbackParser, HTMLParser, PDFParser

public interface IDocumentParser
extends Serializable

Implementations are responsible for parsing a document, read from an InputStream, to extract its text and metadata: the text is written to a Writer and the metadata is stored in a Properties object.
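To illustrate this contract, here is a minimal sketch of an implementation. It is not taken from the Norconex Importer. To stay self-contained it uses hypothetical stand-ins: SimpleDocumentParser mirrors parseDocument but takes the content type as a plain String and uses java.util.Properties in place of the Norconex metadata class, and DocumentParserException is a local substitute. PlainTextParser is also hypothetical. It assumes UTF-8 input and that the caller owns and closes the stream. The metadata keys it writes are illustrative only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.Serializable;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical stand-in for the real DocumentParserException.
class DocumentParserException extends Exception {
    DocumentParserException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical, simplified mirror of IDocumentParser: the content type is a
// String and metadata is java.util.Properties instead of the Norconex types.
interface SimpleDocumentParser extends Serializable {
    void parseDocument(InputStream inputStream, String contentType,
                       Writer outputStream, Properties metadata)
            throws DocumentParserException;
}

// Hypothetical parser for plain-text documents, assumed to be UTF-8.
class PlainTextParser implements SimpleDocumentParser {
    private static final long serialVersionUID = 1L;

    @Override
    public void parseDocument(InputStream inputStream, String contentType,
                              Writer outputStream, Properties metadata)
            throws DocumentParserException {
        try {
            // The reader is deliberately not closed: closing it would also
            // close inputStream, which this sketch assumes the caller owns.
            Reader reader = new InputStreamReader(inputStream, StandardCharsets.UTF_8);
            char[] buffer = new char[4096];
            long charCount = 0;
            int read;
            while ((read = reader.read(buffer)) != -1) {
                outputStream.write(buffer, 0, read); // extracted text goes to the Writer
                charCount += read;
            }
            outputStream.flush();
            // Metadata is stored in the caller-supplied Properties (illustrative keys).
            metadata.setProperty("Content-Type", contentType);
            metadata.setProperty("x-char-count", Long.toString(charCount));
        } catch (IOException e) {
            // I/O failures are reported through the parser's checked exception.
            throw new DocumentParserException("Could not parse document", e);
        }
    }
}
```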

Author:
Pascal Essiembre

Field Summary
static String RDF_BASE_URI
static String RDF_SUBJECT_CONTENT

Method Summary
void parseDocument(InputStream inputStream, ContentType contentType, Writer outputStream, Properties metadata)
          Parses a document.


Field Detail

RDF_BASE_URI

static final String RDF_BASE_URI
See Also:
Constant Field Values

RDF_SUBJECT_CONTENT

static final String RDF_SUBJECT_CONTENT
See Also:
Constant Field Values

Method Detail

parseDocument

void parseDocument(InputStream inputStream,
                   ContentType contentType,
                   Writer outputStream,
                   Properties metadata)
                   throws DocumentParserException
Parses a document.

Parameters:
inputStream - the document to parse
contentType - the content type of the document
outputStream - the writer to which the extracted text is written
metadata - the properties in which the extracted metadata is stored
Throws:
DocumentParserException - if the document cannot be parsed
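The parameters place the input stream, the Writer receiving the text and the metadata holder on the caller's side. The following sketch shows a caller-side helper under that reading; it is an assumption, not documented behaviour. All names are hypothetical: SimpleDocumentParser mirrors parseDocument with a String content type and java.util.Properties metadata, DocumentParserException is a local stand-in, and ParseResult is an illustrative helper class.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Serializable;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Properties;

// Hypothetical stand-in for the real DocumentParserException.
class DocumentParserException extends Exception {
    DocumentParserException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical, simplified mirror of IDocumentParser.
interface SimpleDocumentParser extends Serializable {
    void parseDocument(InputStream inputStream, String contentType,
                       Writer outputStream, Properties metadata)
            throws DocumentParserException;
}

// Hypothetical caller-side helper: it supplies the Writer and the metadata
// holder, and it owns (opens and closes) the input stream.
class ParseResult {
    final String text;
    final Properties metadata;

    ParseResult(String text, Properties metadata) {
        this.text = text;
        this.metadata = metadata;
    }

    static ParseResult parse(SimpleDocumentParser parser, byte[] document,
                             String contentType) throws DocumentParserException {
        StringWriter text = new StringWriter();
        Properties metadata = new Properties();
        try (InputStream in = new ByteArrayInputStream(document)) {
            // A DocumentParserException from the parser propagates unchanged.
            parser.parseDocument(in, contentType, text, metadata);
        } catch (IOException e) {
            // Only the implicit close() can throw IOException here.
            throw new DocumentParserException("Could not close input", e);
        }
        return new ParseResult(text.toString(), metadata);
    }
}
```

Closing the stream in the caller with try-with-resources is one reasonable way to split responsibilities; the interface itself does not state who closes the stream.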


Copyright © 2009-2013 Norconex Inc. All Rights Reserved.