public abstract class AbstractPartOfSpeechTagger extends IsCloseableObject implements PartOfSpeechTagger, IsCloseable, UsesLexicon, UsesLogger
Provides default implementations for the PartOfSpeechTagger interface methods. To create a new part of speech tagger, extend this class and override methods as needed. At a minimum you must override the tagSentence method. Note that tagAdornedWordList is declared abstract, so a concrete subclass must implement it as well.
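As a rough illustration of the extension pattern described above, the sketch below uses simplified stand-in types rather than the real class, whose constructor behavior and dependencies are not shown on this page. The names `SketchTagger`, `TaggedWord`, and `NounOnlyTagger` are hypothetical and are not part of the library. The sketch only mirrors the idea that a base class supplies defaults while a subclass overrides the tagging method.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an adorned word: a spelling plus a tag.
final class TaggedWord {
    final String spelling;
    final String tag;

    TaggedWord(String spelling, String tag) {
        this.spelling = spelling;
        this.tag = tag;
    }
}

// Hypothetical stand-in for the abstract base class: it supplies
// defaults for optional features and leaves one tagging method abstract.
abstract class SketchTagger {
    // Default: this tagger does not use context rules.
    boolean usesContextRules() {
        return false;
    }

    // Default: setting rules is a no-op for taggers that do not use them.
    void setContextRules(String[] contextRules) {
    }

    // The one method every concrete tagger in this sketch must supply.
    abstract List<TaggedWord> tagSentence(List<String> sentence);
}

// A trivial concrete tagger that labels every word with the same tag.
final class NounOnlyTagger extends SketchTagger {
    @Override
    List<TaggedWord> tagSentence(List<String> sentence) {
        List<TaggedWord> result = new ArrayList<>();
        for (String word : sentence) {
            result.add(new TaggedWord(word, "NOUN"));
        }
        return result;
    }
}
```

A real subclass would extend AbstractPartOfSpeechTagger itself and return AdornedWord instances; the stand-ins here exist only so the example compiles on its own.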
| Modifier and Type | Field and Description |
|---|---|
| protected java.lang.String[] | contextRules - Context rules. |
| protected ContextualSmoother | contextualSmoother - Contextual smoother. |
| protected Lexicon | dynamicLexicon - Dynamic lexicon built on-the-fly for words not in static lexicon. |
| protected java.lang.String[] | lexicalRules - Lexical rules. |
| protected LexicalSmoother | lexicalSmoother - Lexical smoother. |
| protected Lexicon | lexicon - Static lexicon used by tagger. |
| protected Logger | logger - Logger used for output. |
| protected PartOfSpeechGuesser | partOfSpeechGuesser - Part of speech guesser for words not in lexicon. |
| protected PostTokenizer | postTokenizer - PostTokenizer for mapping raw tokens to initial spellings. |
| protected PartOfSpeechRetagger | retagger - Fixup retagger. |
| protected int | ruleCorrections - Number of corrections applied by rules. |
| protected TransitionMatrix | transitionMatrix - Transition matrix used by tagger. |

| Constructor and Description |
|---|
| AbstractPartOfSpeechTagger() - Create tagger. |

| Modifier and Type | Method and Description |
|---|---|
| void | clearRuleCorrections() - Clear count of successful rule applications. |
| protected void | createPartOfSpeechGuesser() - Create a part of speech guesser. |
| ContextualSmoother | getContextualSmoother() - Get the contextual smoother. |
| Lexicon | getDynamicLexicon() - Get the dynamic word lexicon. |
| LexicalSmoother | getLexicalSmoother() - Get the lexical smoother. |
| Lexicon | getLexicon() - Get the static word lexicon. |
| Lexicon | getLexicon(java.lang.String word) - Get the lexicon associated with a specific word. |
| Logger | getLogger() - Get the logger. |
| java.lang.String | getMostCommonTag(java.lang.String word) - Get the most common tag for a word. |
| PartOfSpeechGuesser | getPartOfSpeechGuesser() - Get part of speech guesser. |
| PostTokenizer | getPostTokenizer() - Get the postTokenizer. |
| PartOfSpeechRetagger | getRetagger() - Get part of speech retagger. |
| int | getRuleCorrections() - Get count of successful rule applications. |
| int | getTagCount(java.lang.String word, java.lang.String tag) - Get count of times a word appears with a given tag. |
| java.util.List<java.lang.String> | getTagsForWord(java.lang.String word) - Get potential part of speech tags for a word. |
| TransitionMatrix | getTransitionMatrix() - Get tag transition probabilities matrix. |
| void | incrementRuleCorrections() - Increment count of successful rule applications. |
| <T extends AdornedWord> java.util.List<T> | retagWords(java.util.List<T> taggedSentence) - Retag words in a tagged sentence. |
| void | setContextRules(java.lang.String[] contextRules) - Set context rules for tagging. |
| void | setContextualSmoother(ContextualSmoother contextualSmoother) - Set the contextual smoother. |
| void | setLexicalRules(java.lang.String[] lexicalRules) - Set lexical rules for tagging. |
| void | setLexicalSmoother(LexicalSmoother lexicalSmoother) - Set the lexical smoother. |
| void | setLexicon(Lexicon lexicon) - Set the lexicon. |
| void | setLogger(Logger logger) - Set the logger. |
| void | setPartOfSpeechGuesser(PartOfSpeechGuesser partOfSpeechGuesser) - Set part of speech guesser. |
| void | setPostTokenizer(PostTokenizer postTokenizer) - Set the postTokenizer. |
| void | setRetagger(PartOfSpeechRetagger retagger) - Set part of speech retagger. |
| void | setTransitionMatrix(TransitionMatrix transitionMatrix) - Set tag transition probabilities matrix. |
| abstract <T extends AdornedWord> java.util.List<T> | tagAdornedWordList(java.util.List<T> sentence) - Tag a list of adorned words. |
| <T extends AdornedWord> java.util.List<T> | tagAdornedWordSentence(java.util.List<T> sentence, java.util.Set<java.lang.String> regIDSet) - Tag a sentence of adorned words. |
| <T extends AdornedWord> java.util.List<java.util.List<T>> | tagAdornedWordSentences(java.util.List<java.util.List<T>> sentences, java.util.Set<java.lang.String> regIDSet) - Tag a list of sentences. |
| java.util.List<AdornedWord> | tagSentence(java.util.List<java.lang.String> sentence) - Tag a sentence. |
| java.util.List<java.util.List<AdornedWord>> | tagSentences(java.util.List<java.util.List<java.lang.String>> sentences) - Tag a list of sentences. |
| boolean | usesContextRules() - See if tagger uses context rules. |
| boolean | usesLexicalRules() - See if tagger uses lexical rules. |
| boolean | usesTransitionProbabilities() - See if tagger uses a probability transition matrix. |

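Methods inherited from class IsCloseableObject: close
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface IsCloseable: close

protected Lexicon lexicon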
protected Lexicon dynamicLexicon
protected TransitionMatrix transitionMatrix
protected java.lang.String[] contextRules
protected java.lang.String[] lexicalRules
protected LexicalSmoother lexicalSmoother
protected ContextualSmoother contextualSmoother
protected PartOfSpeechRetagger retagger
protected PartOfSpeechGuesser partOfSpeechGuesser
protected PostTokenizer postTokenizer
protected int ruleCorrections
protected Logger logger

public Logger getLogger()
Specified by: getLogger in interface UsesLogger

public void setLogger(Logger logger)
Specified by: setLogger in interface UsesLogger
Parameters: logger - The logger.

public boolean usesContextRules()
Specified by: usesContextRules in interface PartOfSpeechTagger

public boolean usesLexicalRules()
Specified by: usesLexicalRules in interface PartOfSpeechTagger

public boolean usesTransitionProbabilities()
Specified by: usesTransitionProbabilities in interface PartOfSpeechTagger

public void setContextRules(java.lang.String[] contextRules) throws InvalidRuleException
Specified by: setContextRules in interface PartOfSpeechTagger
Parameters: contextRules - String array of context rules.
Throws: InvalidRuleException - if a rule is bad.
For taggers which do not use context rules, this is a no-op.

public void setLexicalRules(java.lang.String[] lexicalRules) throws InvalidRuleException
Specified by: setLexicalRules in interface PartOfSpeechTagger
Parameters: lexicalRules - String array of lexical rules.
Throws: InvalidRuleException - if a rule is bad.
For taggers which do not use lexical rules, this is a no-op.

public Lexicon getLexicon()
Specified by: getLexicon in interface UsesLexicon
Specified by: getLexicon in interface PartOfSpeechTagger

public Lexicon getDynamicLexicon()

public Lexicon getLexicon(java.lang.String word)
Specified by: getLexicon in interface PartOfSpeechTagger
Parameters: word - The word whose source lexicon is sought.
Most words do not have a source lexicon defined, in which case they come from the main static word lexicon. Usually only words derived by suffix analysis have a source lexicon defined, and that lexicon is the suffix lexicon.

public void setLexicon(Lexicon lexicon)
Specified by: setLexicon in interface UsesLexicon
Specified by: setLexicon in interface PartOfSpeechTagger
Parameters: lexicon - Lexicon used for tagging.

public TransitionMatrix getTransitionMatrix()
Specified by: getTransitionMatrix in interface PartOfSpeechTagger

public void setTransitionMatrix(TransitionMatrix transitionMatrix)
Specified by: setTransitionMatrix in interface PartOfSpeechTagger
Parameters: transitionMatrix - Tag probabilities transition matrix.
For taggers which do not use transition matrices, this is a no-op.

public PartOfSpeechGuesser getPartOfSpeechGuesser()
Specified by: getPartOfSpeechGuesser in interface PartOfSpeechTagger

public void setPartOfSpeechGuesser(PartOfSpeechGuesser partOfSpeechGuesser)
Specified by: setPartOfSpeechGuesser in interface PartOfSpeechTagger
Parameters: partOfSpeechGuesser - The part of speech guesser.

public PartOfSpeechRetagger getRetagger()
Specified by: getRetagger in interface PartOfSpeechTagger

public void setRetagger(PartOfSpeechRetagger retagger)
Specified by: setRetagger in interface PartOfSpeechTagger
Parameters: retagger - The part of speech retagger.

public PostTokenizer getPostTokenizer()
Specified by: getPostTokenizer in interface PartOfSpeechTagger

public void setPostTokenizer(PostTokenizer postTokenizer)
Specified by: setPostTokenizer in interface PartOfSpeechTagger
Parameters: postTokenizer - The postTokenizer.

public ContextualSmoother getContextualSmoother()
Specified by: getContextualSmoother in interface PartOfSpeechTagger

public void setContextualSmoother(ContextualSmoother contextualSmoother)
Specified by: setContextualSmoother in interface PartOfSpeechTagger
Parameters: contextualSmoother - The contextual smoother.

public LexicalSmoother getLexicalSmoother()
Specified by: getLexicalSmoother in interface PartOfSpeechTagger

public void setLexicalSmoother(LexicalSmoother lexicalSmoother)
Specified by: setLexicalSmoother in interface PartOfSpeechTagger
Parameters: lexicalSmoother - The lexical smoother.

public java.util.List<java.lang.String> getTagsForWord(java.lang.String word)
Specified by: getTagsForWord in interface PartOfSpeechTagger
Parameters: word - The word whose part of speech tags we want.
When the word does not appear in the lexicon, the part of speech guesser is used to determine the tags based upon features of the word (suffix analysis, etc.).

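The lookup-then-guess behavior described for getTagsForWord can be sketched as follows. This is a minimal illustration, assuming a map-backed lexicon and a crude suffix heuristic; `TagLookupSketch`, its tag names, and its `guessTags` helper are hypothetical and do not reflect the actual Lexicon or PartOfSpeechGuesser interfaces.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TagLookupSketch {
    // Hypothetical static lexicon: word -> known tags.
    private final Map<String, List<String>> lexicon = new HashMap<>();

    TagLookupSketch() {
        lexicon.put("run", Arrays.asList("NOUN", "VERB"));
    }

    // Hypothetical guesser: a crude suffix heuristic for unknown words.
    private List<String> guessTags(String word) {
        if (word.endsWith("ly")) {
            return Arrays.asList("ADVERB");
        }
        return Arrays.asList("NOUN");
    }

    // Use the lexicon when the word is known; otherwise fall back to guessing.
    List<String> getTagsForWord(String word) {
        List<String> known = lexicon.get(word);
        return (known != null) ? known : guessTags(word);
    }
}
```

Keeping the fallback inside the lookup method means callers never need to know whether a word came from the lexicon or from the guesser.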
public int getTagCount(java.lang.String word, java.lang.String tag)
Specified by: getTagCount in interface PartOfSpeechTagger
Parameters: word - The word.
Parameters: tag - The part of speech tag.
When the word does not appear in the lexicon, the part of speech guesser is used to compute a count based upon features of the word (suffix analysis, etc.).

public java.lang.String getMostCommonTag(java.lang.String word)
Parameters: word - The word.

public java.util.List<java.util.List<AdornedWord>> tagSentences(java.util.List<java.util.List<java.lang.String>> sentences)
Specified by: tagSentences in interface PartOfSpeechTagger
Parameters: sentences - The list of sentences.
The sentences are a List of Lists of words to be tagged. Each sentence is represented as a list of words. The output is a list of lists of AdornedWords, one list per input sentence.

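The nested-list shape of the input and output described above can be illustrated with a small sketch. It assumes that a batch call simply tags each sentence in turn, which this page does not confirm for the real implementation. `SentenceBatchSketch` and its placeholder "/UNK" tagging are hypothetical, and plain strings stand in for AdornedWord.

```java
import java.util.ArrayList;
import java.util.List;

final class SentenceBatchSketch {
    // Hypothetical per-sentence tagger: appends a placeholder tag to each word.
    static List<String> tagSentence(List<String> sentence) {
        List<String> tagged = new ArrayList<>();
        for (String word : sentence) {
            tagged.add(word + "/UNK");
        }
        return tagged;
    }

    // Tag each sentence independently; the output keeps one inner list
    // per input sentence, in the same order.
    static List<List<String>> tagSentences(List<List<String>> sentences) {
        List<List<String>> result = new ArrayList<>();
        for (List<String> sentence : sentences) {
            result.add(tagSentence(sentence));
        }
        return result;
    }
}
```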
public <T extends AdornedWord> java.util.List<java.util.List<T>> tagAdornedWordSentences(java.util.List<java.util.List<T>> sentences, java.util.Set<java.lang.String> regIDSet)
Specified by: tagAdornedWordSentences in interface PartOfSpeechTagger
Parameters: sentences - The list of sentences.
Parameters: regIDSet - Set of word IDs of words requiring special handling.
The sentences are a List of Lists of adorned words to be tagged. Each sentence is represented as a list of adorned words. The output is a list of lists of AdornedWords, one list per input sentence.

public <T extends AdornedWord> java.util.List<T> retagWords(java.util.List<T> taggedSentence)
Specified by: retagWords in interface PartOfSpeechTagger
Parameters: taggedSentence - The tagged sentence.
This method calls the retagger, if any. If no retagger is defined, the input tagged sentence is returned unchanged. Override this method to add custom retagging without the use of a retagger.

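The optional-delegate behavior described for retagWords might look roughly like the sketch below. `SketchRetagger` and `RetagSketch` are hypothetical stand-ins, with plain tag strings in place of AdornedWord; the real PartOfSpeechRetagger interface is not shown on this page.

```java
import java.util.List;

// Hypothetical stand-in for a retagger that rewrites a tagged sentence.
interface SketchRetagger {
    List<String> retag(List<String> taggedSentence);
}

final class RetagSketch {
    // May be null, meaning no retagger has been configured.
    private SketchRetagger retagger;

    void setRetagger(SketchRetagger retagger) {
        this.retagger = retagger;
    }

    // Delegate to the retagger when one is set; otherwise return the
    // input list itself, unchanged.
    List<String> retagWords(List<String> taggedSentence) {
        if (retagger == null) {
            return taggedSentence;
        }
        return retagger.retag(taggedSentence);
    }
}
```

A subclass that wants custom fixups without a separate retagger object would override `retagWords` directly, as the description above suggests.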
public void clearRuleCorrections()
Specified by: clearRuleCorrections in interface PartOfSpeechTagger

public void incrementRuleCorrections()
Specified by: incrementRuleCorrections in interface PartOfSpeechTagger

public int getRuleCorrections()
Specified by: getRuleCorrections in interface PartOfSpeechTagger

protected void createPartOfSpeechGuesser()

public java.util.List<AdornedWord> tagSentence(java.util.List<java.lang.String> sentence)
Specified by: tagSentence in interface PartOfSpeechTagger
Parameters: sentence - The sentence as a list of string words.
Returns: A list of AdornedWord for the words in the sentence, tagged with parts of speech.
The input sentence is a List of string words to be tagged. The output is a list of AdornedWord for the words, with parts of speech added.

public <T extends AdornedWord> java.util.List<T> tagAdornedWordSentence(java.util.List<T> sentence, java.util.Set<java.lang.String> regIDSet)
Specified by: tagAdornedWordSentence in interface PartOfSpeechTagger
Parameters: sentence - The sentence as a list of adorned words.
Parameters: regIDSet - Set of word IDs of words requiring special handling.
Returns: A list of AdornedWord for the words in the sentence, tagged with parts of speech.
The input sentence is a List of adorned words to be tagged. The output is the same list with spellings, parts of speech, etc. added or modified.

public abstract <T extends AdornedWord> java.util.List<T> tagAdornedWordList(java.util.List<T> sentence)
Specified by: tagAdornedWordList in interface PartOfSpeechTagger
Parameters: sentence - The sentence as a list of AdornedWord.