public class SimpleRuleBasedTagger extends UnigramTagger implements PartOfSpeechTagger, PartOfSpeechRetagger
The simple rule-based part of speech tagger assigns the most commonly occurring part of speech to all words and then applies a small set of contextual rules to "fix up" the tagging. It's kind of a "Brill light."
This simple tagger is useful when very fast tagging matters more than high accuracy, e.g., in sentence splitting.
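The two-pass idea described above can be sketched in a few lines. This is a minimal illustration, not the MorphAdorner implementation: the class name `BrillLightSketch`, the toy lexicon counts, the fallback tag `NN`, and the single contextual rule are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of "most common tag, then fix up"; not the library's code. */
final class BrillLightSketch {
    // Toy lexicon (assumed data): word -> (tag -> occurrence count).
    static final Map<String, Map<String, Integer>> LEXICON = Map.of(
        "the", Map.of("DT", 100),
        "can", Map.of("MD", 80, "NN", 20),
        "rusts", Map.of("VBZ", 5, "NNS", 1));

    /** Pass 1 helper: return the tag with the highest count for a word. */
    static String mostCommonTag(String word) {
        Map<String, Integer> counts = LEXICON.get(word.toLowerCase(Locale.ROOT));
        if (counts == null) {
            return "NN"; // assumed default for unknown words
        }
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }

    /** Tag every word with its most common tag, then apply one contextual rule. */
    static List<String> tag(List<String> words) {
        List<String> tags = new ArrayList<>();
        for (String word : words) {
            tags.add(mostCommonTag(word));
        }
        // Pass 2 (assumed rule): a modal directly after a determiner is retagged as a noun.
        for (int i = 1; i < tags.size(); i++) {
            if (tags.get(i).equals("MD") && tags.get(i - 1).equals("DT")) {
                tags.set(i, "NN");
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(tag(List.of("The", "can", "rusts"))); // prints [DT, NN, VBZ]
    }
}
```

Both passes are linear in the sentence length and need no search over tag sequences, which is why this style of tagger is fast but only roughly accurate.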
Inherited fields: contextRules, contextualSmoother, dynamicLexicon, lexicalRules, lexicalSmoother, lexicon, logger, partOfSpeechGuesser, postTokenizer, retagger, ruleCorrections, transitionMatrix

| Constructor and Description |
|---|
| `SimpleRuleBasedTagger()` Create a simple rule-based tagger. |

| Modifier and Type | Method and Description |
|---|---|
| `boolean` | `getCanAddOrDeleteWords()` Can retagger add or delete words in the original sentence? |
| `<T extends AdornedWord> java.util.List<T>` | `retagSentence(java.util.List<T> sentence)` Retag a sentence. |
| `<T extends AdornedWord> java.util.List<T>` | `retagWords(java.util.List<T> taggedSentence)` Retag words in a tagged sentence. |
| `void` | `setCanAddOrDeleteWords(boolean canAddOrDeleteWords)` Can retagger add or delete words in the original sentence? |
| `java.util.List<AdornedWord>` | `tagSentence(java.util.List<java.lang.String> sentence)` Tag a sentence. |
| `java.lang.String` | `toString()` Return tagger description. |

Inherited methods, grouped by declaring class or interface:
tagAdornedWordList, tagWord, tagWord
clearRuleCorrections, createPartOfSpeechGuesser, getContextualSmoother, getDynamicLexicon, getLexicalSmoother, getLexicon, getLexicon, getLogger, getMostCommonTag, getPartOfSpeechGuesser, getPostTokenizer, getRetagger, getRuleCorrections, getTagCount, getTagsForWord, getTransitionMatrix, incrementRuleCorrections, setContextRules, setContextualSmoother, setLexicalRules, setLexicalSmoother, setLexicon, setLogger, setPartOfSpeechGuesser, setPostTokenizer, setRetagger, setTransitionMatrix, tagAdornedWordSentence, tagAdornedWordSentences, tagSentences, usesContextRules, usesLexicalRules, usesTransitionProbabilities
close
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
clearRuleCorrections, getContextualSmoother, getLexicalSmoother, getLexicon, getLexicon, getPartOfSpeechGuesser, getPostTokenizer, getRetagger, getRuleCorrections, getTagCount, getTagsForWord, getTransitionMatrix, incrementRuleCorrections, setContextRules, setContextualSmoother, setLexicalRules, setLexicalSmoother, setLexicon, setPartOfSpeechGuesser, setPostTokenizer, setRetagger, setTransitionMatrix, tagAdornedWordList, tagAdornedWordSentence, tagAdornedWordSentences, tagSentences, usesContextRules, usesLexicalRules, usesTransitionProbabilities
close

public SimpleRuleBasedTagger()
public java.util.List<AdornedWord> tagSentence(java.util.List<java.lang.String> sentence)
Specified by: tagSentence in interface PartOfSpeechTagger
Overrides: tagSentence in class AbstractPartOfSpeechTagger
Parameters: sentence - The sentence as a list of string words.
Returns: A list of AdornedWord for the words in the sentence, tagged with parts of speech.
The input sentence is a List of string words to be tagged. The output is a list of AdornedWord for the same words with parts of speech added.
public <T extends AdornedWord> java.util.List<T> retagWords(java.util.List<T> taggedSentence)
Specified by: retagWords in interface PartOfSpeechTagger
Overrides: retagWords in class AbstractPartOfSpeechTagger
Parameters: taggedSentence - The tagged sentence.
This method applies the short list of fixup rules. The resultant tagging is crude but good enough for tasks like sentence boundary detection.
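A retagging pass of this kind can be pictured as a small rule table applied to tokens that already carry tags. The following is a sketch under stated assumptions, not the rule set or types used by this class: `TaggedToken`, `Token`, `Rule`, and the example rule are hypothetical stand-ins (the real AdornedWord interface is richer), and "first matching rule wins" is an assumed policy.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of applying a short list of fixup rules to a tagged sentence. */
final class FixupRulesSketch {
    /** Stand-in for a tagged word (assumption; not the AdornedWord interface). */
    interface TaggedToken {
        String getTag();
        void setTag(String tag);
    }

    /** Minimal mutable token used only for this example. */
    static final class Token implements TaggedToken {
        final String text;
        private String tag;

        Token(String text, String tag) {
            this.text = text;
            this.tag = tag;
        }

        @Override public String getTag() { return tag; }
        @Override public void setTag(String tag) { this.tag = tag; }
    }

    /** Rule: a token tagged 'from' that follows a token tagged 'prev' becomes 'to'. */
    static final class Rule {
        final String from;
        final String prev;
        final String to;

        Rule(String from, String prev, String to) {
            this.from = from;
            this.prev = prev;
            this.to = to;
        }
    }

    /** Apply the rules left to right, in place; the list itself is never resized. */
    static <T extends TaggedToken> List<T> retag(List<T> sentence, List<Rule> rules) {
        for (int i = 1; i < sentence.size(); i++) {
            T token = sentence.get(i);
            String previousTag = sentence.get(i - 1).getTag();
            for (Rule rule : rules) {
                if (token.getTag().equals(rule.from) && previousTag.equals(rule.prev)) {
                    token.setTag(rule.to);
                    break; // assumed policy: first matching rule wins
                }
            }
        }
        return sentence;
    }

    public static void main(String[] args) {
        List<Token> sentence = new ArrayList<>();
        sentence.add(new Token("to", "TO"));
        sentence.add(new Token("run", "NN"));
        // Assumed example rule: a noun directly after "to" is retagged as a base-form verb.
        retag(sentence, List.of(new Rule("NN", "TO", "VB")));
        for (Token token : sentence) {
            System.out.println(token.text + "/" + token.getTag());
        }
    }
}
```

Because the sketch only rewrites tags and never adds or removes tokens, the returned list has the same length and element type as the input, mirroring the generic `<T extends AdornedWord>` signature above.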
public <T extends AdornedWord> java.util.List<T> retagSentence(java.util.List<T> sentence)
Specified by: retagSentence in interface PartOfSpeechRetagger
Parameters: sentence - The sentence as an AdornedWord list.
public boolean getCanAddOrDeleteWords()
Specified by: getCanAddOrDeleteWords in interface PartOfSpeechRetagger
public void setCanAddOrDeleteWords(boolean canAddOrDeleteWords)
Specified by: setCanAddOrDeleteWords in interface PartOfSpeechRetagger
Parameters: canAddOrDeleteWords - true if retagger can add or delete words. Ignored here.
public java.lang.String toString()
Overrides: toString in class UnigramTagger