public class EEBOPostTokenizer extends AbstractPostTokenizer implements PostTokenizer
This post tokenizer processes tokens extracted from EEBO corpus texts. It removes soft hyphens and regularizes some EEBO-specific tagging. It can be used for either original-format EEBO texts or EEBO texts in TEIAnalytics format.
Inherited fields: logger

| Constructor and Description |
|---|
| EEBOPostTokenizer() Create an EEBO PostTokenizer. |
| Modifier and Type | Method and Description |
|---|---|
| java.lang.String[] | postTokenize(java.lang.String token) Process a token after tokenization. |

Inherited methods: getLogger, setLogger, close
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

public java.lang.String[] postTokenize(java.lang.String token)
Specified by: postTokenize in interface PostTokenizer
Overrides: postTokenize in class AbstractPostTokenizer
Parameters: token - The token to process after tokenization.
The minimally processed token typically results in an original spelling.
The maximally processed token typically results in a partially or completely standardized spelling.
These may be identical.
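
The relationship between the minimally and maximally processed forms can be illustrated with a small stand-alone sketch. This is not the EEBOPostTokenizer source: the class name SoftHyphenSketch is hypothetical, the two-element array layout (minimal form first, maximal form second) is an assumption, and the long-s replacement is only an illustrative stand-in for whatever regularization the real class applies. The soft hyphen removal is the one step taken from the description above.

```java
import java.util.Arrays;

// Hypothetical stand-in for illustration; not the real EEBOPostTokenizer.
final class SoftHyphenSketch {
    // U+00AD SOFT HYPHEN, which the class description says is removed.
    private static final String SOFT_HYPHEN = "\u00AD";

    // Returns {minimal, maximal}. The array layout is an assumption.
    static String[] postTokenize(String token) {
        // Minimal processing: drop soft hyphens, keep the original spelling.
        String minimal = token.replace(SOFT_HYPHEN, "");
        // Maximal processing (illustrative only): map long s (U+017F) to s.
        String maximal = minimal.replace('\u017F', 's');
        return new String[] { minimal, maximal };
    }

    public static void main(String[] args) {
        // Input contains a soft hyphen and a long s.
        String[] forms = postTokenize("pa\u00AD\u017Fsion");
        System.out.println(Arrays.toString(forms));
    }
}
```

For a token with neither a soft hyphen nor a long s, both array elements in this sketch are equal, which mirrors the note that the two forms may be identical.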