java.lang.Object
  com.norconex.importer.transformer.AbstractRestrictiveTransformer
    com.norconex.importer.transformer.AbstractCharStreamTransformer
public abstract class AbstractCharStreamTransformer
Base class for transformers dealing with text documents only. Subclasses can safely be used as either pre-parse or post-parse handlers.
For pre-parsing, non-text documents are simply ignored and no transformation occurs. To determine whether a document is text, the value of the Importer.DOC_CONTENT_TYPE metadata field is used. By default, any content type starting with "text/" is considered text. This default behavior can be changed with the setContentTypeRegex(String) method. Make sure the regular expression matches only text documents, to avoid parsing exceptions.
For post-parsing, all documents are assumed to be text.
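The text-detection rule described above can be sketched roughly as follows. This is a standalone illustration, not the library's code: the class TextContentTypeCheck and the method isTextDocument are hypothetical names, and two details are assumptions the description does not state, namely that a custom regex must match the whole content type and that a missing content type counts as non-text.

```java
import java.util.regex.Pattern;

public class TextContentTypeCheck {

    // Default rule from the class description: content types starting
    // with "text/" are considered text.
    static final String DEFAULT_PREFIX = "text/";

    /**
     * Hypothetical helper (not part of the library) mirroring the
     * documented behavior.
     *
     * @param contentType      value of the content-type metadata, may be null
     * @param contentTypeRegex custom regex, or null to use the default rule
     * @param parsed           true when used as a post-parse handler
     */
    public static boolean isTextDocument(
            String contentType, String contentTypeRegex, boolean parsed) {
        if (parsed) {
            // Post-parsing: all documents are assumed to be text.
            return true;
        }
        if (contentType == null) {
            // Assumption: no content type means "not text".
            return false;
        }
        if (contentTypeRegex == null) {
            return contentType.startsWith(DEFAULT_PREFIX);
        }
        // Assumption: the regex must match the entire content type.
        return Pattern.matches(contentTypeRegex, contentType);
    }

    public static void main(String[] args) {
        System.out.println(isTextDocument("text/html", null, false));       // true
        System.out.println(isTextDocument("application/pdf", null, false)); // false
        System.out.println(isTextDocument(
                "application/xml", "(text/.*|application/xml)", false));    // true
        System.out.println(isTextDocument("application/pdf", null, true));  // true
    }
}
```

A regex that is too broad (for example one that also matches "application/pdf") would hand binary content to a text transformer before parsing, which is the situation the warning about parsing exceptions is meant to prevent.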
Subclasses can restrict which documents this transformation applies to, based on document metadata (see AbstractRestrictiveTransformer).
Subclasses implementing IXMLConfigurable should allow this inner configuration:
  <contentTypeRegex>
      (regex to identify text content-types, overriding default)
  </contentTypeRegex>
  <restrictTo caseSensitive="[false|true]"
          property="(name of header/metadata name to match)">
      (regular expression of value to match)
  </restrictTo>
Constructor Summary | |
---|---|
AbstractCharStreamTransformer() |
Method Summary | |
---|---|
boolean | equals(Object obj) |
String | getContentTypeRegex() |
int | hashCode() |
protected void | loadFromXML(XMLConfiguration xml) Convenience method for subclasses to load content type regex. |
protected void | saveToXML(XMLStreamWriter writer) Convenience method for subclasses to save content type regex. |
void | setContentTypeRegex(String contentTypeRegex) |
String | toString() |
protected void | transformRestrictedDocument(String reference, InputStream input, OutputStream output, Properties metadata, boolean parsed) |
protected abstract void | transformTextDocument(String reference, Reader input, Writer output, Properties metadata, boolean parsed) |
Methods inherited from class com.norconex.importer.transformer.AbstractRestrictiveTransformer |
---|
setRestriction, transformDocument |
Methods inherited from class java.lang.Object |
---|
clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
Constructor Detail |
---|
public AbstractCharStreamTransformer()
Method Detail |
---|
public String getContentTypeRegex()

public void setContentTypeRegex(String contentTypeRegex)

protected final void transformRestrictedDocument(String reference, InputStream input, OutputStream output, Properties metadata, boolean parsed) throws IOException
  transformRestrictedDocument in class AbstractRestrictiveTransformer
  Throws: IOException

protected abstract void transformTextDocument(String reference, Reader input, Writer output, Properties metadata, boolean parsed) throws IOException
  Throws: IOException
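The two signatures above suggest that the final transformRestrictedDocument wraps the byte streams as character streams and then delegates to transformTextDocument. The sketch below illustrates that pattern in a standalone form that does not depend on the library. CharStreamBridgeSketch, TextTransform, transformBytes, and upperCase are hypothetical names, and the UTF-8 charset is an assumption, since this page does not say which encoding the library uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class CharStreamBridgeSketch {

    /** Hypothetical stand-in for the abstract transformTextDocument method. */
    public interface TextTransform {
        void transform(Reader input, Writer output) throws IOException;
    }

    /**
     * Wraps byte streams as character streams and delegates to the text
     * transform. Assumption: content is encoded as UTF-8.
     */
    public static void transformBytes(
            InputStream input, OutputStream output, TextTransform transform)
            throws IOException {
        Reader reader = new InputStreamReader(input, StandardCharsets.UTF_8);
        Writer writer = new OutputStreamWriter(output, StandardCharsets.UTF_8);
        transform.transform(reader, writer);
        // Flush so buffered characters reach the underlying byte stream.
        writer.flush();
    }

    /** Example text transform: upper-cases everything it reads. */
    public static String upperCase(String text) {
        ByteArrayInputStream in =
                new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            transformBytes(in, out, (reader, writer) -> {
                char[] buffer = new char[1024];
                int count;
                while ((count = reader.read(buffer)) != -1) {
                    writer.write(
                            new String(buffer, 0, count).toUpperCase(Locale.ROOT));
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(upperCase("hello, world")); // prints "HELLO, WORLD"
    }
}
```

In a real subclass, only the body of the lambda would be written, inside transformTextDocument; the wrapping step is what the base class takes care of.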
protected void loadFromXML(XMLConfiguration xml)
  loadFromXML in class AbstractRestrictiveTransformer
  Parameters: xml - xml configuration

protected void saveToXML(XMLStreamWriter writer) throws XMLStreamException
  saveToXML in class AbstractRestrictiveTransformer
  Parameters: writer - XML writer
  Throws: XMLStreamException - problem saving

public String toString()
  toString in class AbstractRestrictiveTransformer

public int hashCode()
  hashCode in class AbstractRestrictiveTransformer

public boolean equals(Object obj)
  equals in class AbstractRestrictiveTransformer