public class EEBOPreTokenizer extends AbstractPreTokenizer implements PreTokenizer
| Modifier and Type | Field and Description |
|---|---|
| protected static PatternReplacer | doubleBackTicksReplacer: Double back-ticks. |
| protected static java.lang.String | EEBOAlwaysSeparators: EEBO separators do not include the vertical bars. |
| protected static PatternReplacer | singleBackTicksReplacer: Single back-tick followed by a capital letter. |
| protected static PatternReplacer | wordOrSpanGapReplacer: Word or span gap. |
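
The back-tick fields above are described only by what they match. As a rough illustration, the sketch below mimics that kind of replacer with plain `java.util.regex`. Everything in it is an assumption: `SimpleReplacer` is a hypothetical stand-in for `PatternReplacer`, and both the regular expressions and the replacement strings are guesses made for illustration, not the values used by `EEBOPreTokenizer`.

```java
import java.util.regex.Pattern;

class BackTickSketch {
    /** Hypothetical stand-in for PatternReplacer: a compiled regex plus a replacement string. */
    static final class SimpleReplacer {
        private final Pattern pattern;
        private final String replacement;

        SimpleReplacer(String regex, String replacement) {
            this.pattern = Pattern.compile(regex);
            this.replacement = replacement;
        }

        String replace(String s) {
            return pattern.matcher(s).replaceAll(replacement);
        }
    }

    // Assumption: double back-ticks become a double quotation mark set off by spaces.
    static final SimpleReplacer DOUBLE_BACK_TICKS = new SimpleReplacer("``", " \" ");

    // Assumption: a single back-tick before a capital letter is split from that letter.
    static final SimpleReplacer SINGLE_BACK_TICK = new SimpleReplacer("`(\\p{Lu})", "` $1");

    static String apply(String line) {
        // Double back-ticks are replaced first, so the single back-tick rule never sees them.
        return SINGLE_BACK_TICK.replace(DOUBLE_BACK_TICKS.replace(line));
    }

    public static void main(String[] args) {
        System.out.println(apply("``Come hither, `Tom"));
    }
}
```

Keeping each rule as its own precompiled replacer mirrors the one-static-field-per-pattern layout in the table, and makes the order in which the rules run explicit.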

Inherited fields: alwaysSeparators, alwaysSeparatorsReplacer, asterisks, commaSeparator, commaSeparatorReplacer, hyphens, logger, periods

| Constructor and Description |
|---|
| EEBOPreTokenizer(): Create an EEBO pretokenizer. |

| Modifier and Type | Method and Description |
|---|---|
| java.lang.String | pretokenize(java.lang.String line): Prepare text for tokenization. |

Inherited methods: getLogger, setLogger, close

Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

protected static final java.lang.String EEBOAlwaysSeparators
protected static final PatternReplacer wordOrSpanGapReplacer
protected static final PatternReplacer doubleBackTicksReplacer
protected static final PatternReplacer singleBackTicksReplacer
public java.lang.String pretokenize(java.lang.String line)

pretokenize in interface PreTokenizer

pretokenize in class AbstractPreTokenizer

Parameters: line - The text to prepare for tokenization
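
To make the `pretokenize(String line)` contract more concrete, here is a minimal sketch of one way a pretokenizer might prepare a line: each separator character is padded with spaces so that a later tokenizer can split on whitespace. The class name `SeparatorSketch`, the contents of `SEPARATORS`, and the padding strategy are all assumptions chosen for illustration. The only detail taken from this page is that the EEBO separator set leaves out the vertical bar.

```java
class SeparatorSketch {
    // Hypothetical separator set; the vertical bar is deliberately absent.
    static final String SEPARATORS = "()[]{};:!?\"";

    /** Pads every separator character with spaces, then normalizes whitespace. */
    static String pretokenize(String line) {
        StringBuilder sb = new StringBuilder(line.length() + 16);
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (SEPARATORS.indexOf(c) >= 0) {
                // Separator: surround it with spaces so it becomes its own token.
                sb.append(' ').append(c).append(' ');
            } else {
                // Anything else, including '|', is copied through unchanged.
                sb.append(c);
            }
        }
        // Collapse runs of whitespace introduced by the padding.
        return sb.toString().trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println(pretokenize("(a|b);c"));
    }
}
```

Because `|` is not in the set, `a|b` stays together as one unit in this sketch while the parentheses and semicolon are split off.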