public class OrigFixer
extends java.lang.Object
<orig> tags are used in Wright archive documents to mark words split across a page break. A typical page break appears as follows:
<orig reg="larboard" TEIform="orig">lar-</orig>
<pb TEIform="pb"/> board hand,
The "reg=" attribute of the <orig> tag provides the original unsplit spelling of the word which is split across the page boundary. The whitespace between the </orig> and the <pb>, together with the whitespace following the <pb>, causes MorphAdorner to process the split word incorrectly as multiple words instead of a single word.
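OrigFixer modifies the XML for <orig> tags as follows: the trailing hyphen in the <orig> text is replaced by a special substitute hyphen character, and the whitespace between the </orig> and the <pb>, as well as the whitespace following the <pb>, is removed.
For example, OrigFixer modifies the sample text above to read (the substitute hyphen character appears here as "?"):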
<orig reg="larboard" TEIform="orig">lar?</orig><pb TEIform="pb"/>board hand,
These modifications allow MorphAdorner to process the split word correctly. The MorphAdorner tokenizers recognize the special substitute hyphen character, which is restored to a plain hyphen character by the XML output writers.
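The rewrite described above can be sketched at the string level. This is only an illustration under stated assumptions: OrigFixer itself operates on a JDOM document, whereas the hypothetical OrigFixSketch class below uses a regular expression, and its SUBSTITUTE_HYPHEN value is an assumed stand-in, since the actual substitute character is not stated here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** String-level sketch of the rewrite; not the JDOM-based OrigFixer implementation. */
final class OrigFixSketch {
    // Assumption: a stand-in for MorphAdorner's substitute hyphen character.
    static final char SUBSTITUTE_HYPHEN = '\u2011';

    // Group 1: the <orig ...> start tag plus the text before the trailing hyphen.
    // Group 2: the </orig> end tag.
    // Group 3: the self-closing <pb .../> tag.
    // Whitespace before and after <pb .../> is matched but not captured, so it is dropped.
    private static final Pattern SPLIT_WORD = Pattern.compile(
        "(<orig\\b[^>]*>[^<]*)-(</orig>)\\s*(<pb\\b[^>]*/>)\\s*");

    /** Replaces the trailing hyphen and removes the whitespace around the page break. */
    static String fix(String xml) {
        String hyphen = Matcher.quoteReplacement(String.valueOf(SUBSTITUTE_HYPHEN));
        return SPLIT_WORD.matcher(xml).replaceAll("$1" + hyphen + "$2$3");
    }

    public static void main(String[] args) {
        String sample = "<orig reg=\"larboard\" TEIform=\"orig\">lar-</orig>\n"
            + "<pb TEIform=\"pb\"/> board hand,";
        System.out.println(fix(sample));
    }
}
```

Applied to the sample page break, fix returns the <orig> and <pb> tags joined with no whitespace between them, followed directly by "board hand,". Text that contains no split word passes through unchanged.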
| Modifier and Type | Class and Description |
|---|---|
| static class | OrigFixer.OrigProcessor: JDOM element processor which fixes <orig> tags. |
| Modifier | Constructor and Description |
|---|---|
| protected | OrigFixer(): Allow overrides but no instantiation. |
| Modifier and Type | Method and Description |
|---|---|
| static void | fixOrigs(org.jdom2.Document document): Fix <orig> tags. |