com.norconex.importer.parser.impl
Class AbstractTikaParser
java.lang.Object
com.norconex.importer.parser.impl.AbstractTikaParser
- All Implemented Interfaces:
- IDocumentParser, Serializable
- Direct Known Subclasses:
- FallbackParser, HTMLParser, PDFParser
public class AbstractTikaParser
- extends Object
- implements IDocumentParser
Base class wrapping an Apache Tika parser for use by the importer.
- Author:
- Pascal Essiembre
- See Also:
- Serialized Form
Constructor Summary
AbstractTikaParser(org.apache.tika.parser.Parser parser,
String format)
Creates a new Tika-based parser.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
AbstractTikaParser
public AbstractTikaParser(org.apache.tika.parser.Parser parser,
String format)
- Creates a new Tika-based parser.
- Parameters:
parser - Tika parser
format - one of the formats supported by the Tika parser
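The constructor contract above (a subclass hands the base class a delegate parser and a format once, at construction time) can be sketched as follows. This is only an illustration under stated assumptions: `DelegateParser`, `BaseParser`, and `HtmlParserSketch` are hypothetical stand-ins, not the real `org.apache.tika.parser.Parser`, `AbstractTikaParser`, or `HTMLParser`, and the format value `"text"` is assumed, since this page does not list the accepted formats.

```java
import java.util.Objects;

// Hypothetical stand-ins only: none of these types come from
// Norconex Importer or Apache Tika.
class TikaWrapperSketch {

    /** Stand-in for a Tika parser; the real interface is much richer. */
    interface DelegateParser {
        String name();
    }

    /** Stand-in base class that stores a delegate parser and a format. */
    abstract static class BaseParser {
        private final DelegateParser parser;
        private final String format;

        protected BaseParser(DelegateParser parser, String format) {
            // Null checks are a choice made for this sketch.
            this.parser = Objects.requireNonNull(parser, "parser");
            this.format = Objects.requireNonNull(format, "format");
        }

        DelegateParser getParser() { return parser; }
        String getFormat() { return format; }
    }

    /** Concrete subclass: fixes the delegate and format in its constructor. */
    static final class HtmlParserSketch extends BaseParser {
        HtmlParserSketch() {
            super(() -> "html-delegate", "text"); // "text" is an assumed value
        }
    }

    public static void main(String[] args) {
        BaseParser p = new HtmlParserSketch();
        System.out.println(p.getParser().name() + " -> " + p.getFormat());
    }
}
```

Keeping the delegate and format in final fields means each subclass only decides which parser to wrap, while shared behaviour stays in the base class.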
parseDocument
public final void parseDocument(InputStream inputStream,
ContentType contentType,
Writer output,
Properties metadata)
throws DocumentParserException
- Description copied from interface: IDocumentParser
- Parses a document.
- Specified by:
parseDocument in interface IDocumentParser
- Parameters:
inputStream - the document to parse
contentType - the content type of the document
output - where to save the extracted text
metadata - where to store the metadata
- Throws:
DocumentParserException
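To illustrate the calling convention described above (a stream goes in, extracted text is written to a `Writer`, and metadata is collected in a separate object), here is a minimal, self-contained sketch. It does not use the real Norconex types: the `DocumentParser` interface and `PlainTextParser` class are hypothetical, `java.util.Properties` stands in for the importer's `Properties`, a plain `String` stands in for `ContentType`, and `IOException` stands in for `DocumentParserException`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical stand-ins only: this mirrors the shape of parseDocument,
// not the real Norconex Importer API.
class ParseDocumentSketch {

    /** Stand-in with the same parameter order as parseDocument. */
    interface DocumentParser {
        void parseDocument(InputStream in, String contentType,
                Writer output, Properties metadata) throws IOException;
    }

    /** Toy parser: copies the stream as UTF-8 text and records the content type. */
    static final class PlainTextParser implements DocumentParser {
        @Override
        public void parseDocument(InputStream in, String contentType,
                Writer output, Properties metadata) throws IOException {
            Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                output.write(buf, 0, n); // extracted text goes to the writer
            }
            output.flush();
            metadata.setProperty("Content-Type", contentType); // metadata is kept separately
        }
    }

    /** Helper: runs a parser over an in-memory string and returns the extracted text. */
    static String extract(DocumentParser parser, String text, Properties metadata)
            throws IOException {
        try (InputStream in = new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
             StringWriter out = new StringWriter()) {
            parser.parseDocument(in, "text/plain", out, metadata);
            return out.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        Properties metadata = new Properties();
        String text = extract(new PlainTextParser(), "Hello, importer", metadata);
        System.out.println(text);
        System.out.println(metadata.getProperty("Content-Type"));
    }
}
```

The caller owns both the output writer and the metadata object, so the same parser instance can be reused across documents without holding per-document state.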
addTikaMetadata
protected void addTikaMetadata(org.apache.tika.metadata.Metadata tikaMeta,
Properties metadata)
Copyright © 2009-2013 Norconex Inc. All Rights Reserved.