com.norconex.importer.parser
Class DefaultDocumentParserFactory

java.lang.Object
  extended by com.norconex.importer.parser.DefaultDocumentParserFactory
All Implemented Interfaces:
IXMLConfigurable, IDocumentParserFactory, Serializable

public class DefaultDocumentParserFactory
extends Object
implements IDocumentParserFactory, IXMLConfigurable

Uses Apache Tika for all its supported content types. For unknown content types, falls back to Tika's generic media detector/parser.

XML configuration usage (optional, since this factory is the default):

  <documentParserFactory class="com.norconex.importer.parser.DefaultDocumentParserFactory" format="text|xml" />
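
To illustrate how the format attribute above could be consumed, here is a minimal sketch using only the JDK's DOM parser. It is not the actual loadFromXML implementation, which this page does not show. The class FormatAttributeSketch and the method readFormat are hypothetical names, and the "text" default is an assumption taken from the no-argument constructor's description rather than from the documented value of DEFAULT_FORMAT.

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the Norconex Importer API.
public class FormatAttributeSketch {

    // Assumption: mirrors the "text" format used by the no-argument constructor.
    static final String ASSUMED_DEFAULT_FORMAT = "text";

    /** Reads the "format" attribute of the root element, or the assumed default. */
    public static String readFormat(Reader in) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(in))
                    .getDocumentElement();
            // getAttribute returns an empty string when the attribute is absent.
            String format = root.getAttribute("format");
            return format.isEmpty() ? ASSUMED_DEFAULT_FORMAT : format;
        } catch (Exception e) {
            // Wrap checked parser exceptions to keep the sketch easy to call.
            throw new IllegalStateException("Could not read configuration", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<documentParserFactory "
                + "class=\"com.norconex.importer.parser.DefaultDocumentParserFactory\" "
                + "format=\"xml\" />";
        System.out.println(readFormat(new StringReader(xml)));
    }
}
```

With the real class, the equivalent programmatic configuration would presumably be the DefaultDocumentParserFactory(String format) constructor or setFormat(String format), both documented on this page.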
 

Author:
Pascal Essiembre
See Also:
Serialized Form

Field Summary
static String DEFAULT_FORMAT
           
 
Constructor Summary
DefaultDocumentParserFactory()
          Creates a new document parser factory of "text" format.
DefaultDocumentParserFactory(String format)
          Creates a new document parser factory of the given format.
 
Method Summary
protected  IDocumentParser getFallbackParser()
           
 String getFormat()
           
 IDocumentParser getParser(String documentReference, ContentType contentType)
          Gets a parser based on content type alone; the document reference is ignored.
 void loadFromXML(Reader in)
           
protected  void registerFallbackParser(IDocumentParser parser)
           
protected  void registerNamedParser(ContentType contentType, IDocumentParser parser)
           
 void saveToXML(Writer out)
           
 void setFormat(String format)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_FORMAT

public static final String DEFAULT_FORMAT
See Also:
Constant Field Values
Constructor Detail

DefaultDocumentParserFactory

public DefaultDocumentParserFactory()
Creates a new document parser factory of "text" format.


DefaultDocumentParserFactory

public DefaultDocumentParserFactory(String format)
Creates a new document parser factory of the given format.

Parameters:
format - depends on parser expectations, but typically one of "text" or "xml"
Method Detail

getParser

public final IDocumentParser getParser(String documentReference,
                                       ContentType contentType)
Gets a parser based on content type alone; the document reference is ignored.

Specified by:
getParser in interface IDocumentParserFactory
Parameters:
documentReference - document reference
contentType - content type
Returns:
document parser
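
The lookup behaviour described for getParser, together with the registerNamedParser and registerFallbackParser methods of this class, can be pictured with the following self-contained sketch. It is an illustration only: the Parser interface is a hypothetical stand-in for IDocumentParser, content types are plain strings instead of ContentType, and the map-based storage is an assumption, since this page does not describe the internals.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the factory; not the Norconex implementation.
public class ParserLookupSketch {

    /** Stand-in for IDocumentParser, reduced to a name for demonstration. */
    public interface Parser {
        String name();
    }

    // Assumption: named parsers are kept in a map keyed by content type.
    private final Map<String, Parser> namedParsers = new HashMap<>();
    private Parser fallbackParser;

    public void registerNamedParser(String contentType, Parser parser) {
        namedParsers.put(contentType, parser);
    }

    public void registerFallbackParser(Parser parser) {
        this.fallbackParser = parser;
    }

    /** Chooses by content type only; documentReference is deliberately unused. */
    public Parser getParser(String documentReference, String contentType) {
        Parser parser = namedParsers.get(contentType);
        return parser != null ? parser : fallbackParser;
    }

    public static void main(String[] args) {
        ParserLookupSketch factory = new ParserLookupSketch();
        factory.registerNamedParser("application/pdf", () -> "pdf");
        factory.registerFallbackParser(() -> "generic");

        // A registered type resolves to its named parser.
        System.out.println(factory.getParser("report.pdf", "application/pdf").name());
        // An unregistered type resolves to the fallback parser.
        System.out.println(factory.getParser("data.bin", "application/x-unknown").name());
    }
}
```

Because the reference plays no part in the choice, two documents with different references but the same content type receive the same parser in this sketch.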

getFormat

public String getFormat()

setFormat

public void setFormat(String format)

registerNamedParser

protected final void registerNamedParser(ContentType contentType,
                                         IDocumentParser parser)

registerFallbackParser

protected final void registerFallbackParser(IDocumentParser parser)

getFallbackParser

protected final IDocumentParser getFallbackParser()

loadFromXML

public void loadFromXML(Reader in)
                 throws IOException
Specified by:
loadFromXML in interface IXMLConfigurable
Throws:
IOException

saveToXML

public void saveToXML(Writer out)
               throws IOException
Specified by:
saveToXML in interface IXMLConfigurable
Throws:
IOException


Copyright © 2009-2013 Norconex Inc. All Rights Reserved.