public class BigramTagger extends AbstractPartOfSpeechTagger implements PartOfSpeechTagger
The bigram part-of-speech tagger assigns a tag to each word in a sentence by choosing the most probable sequence of tags under a bigram hidden Markov model, in which each tag is conditioned on the possible tags of the previous word. The Viterbi algorithm is used to reduce the amount of computation required to find the optimal tag assignment.
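The decoding idea described above can be illustrated with a minimal, self-contained sketch. This is not the BigramTagger implementation: the class name `BigramViterbiSketch`, the `"<s>"` start symbol, the nested-map probability tables, and the `FLOOR` value for unseen events are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Illustrative bigram HMM decoder (hypothetical; not the library's code). */
final class BigramViterbiSketch {
    /** Assumed floor for unseen events so Math.log never receives zero. */
    private static final double FLOOR = 1e-9;

    /** Log of table[outer][inner], falling back to FLOOR when missing. */
    private static double logProb(Map<String, Map<String, Double>> table,
                                  String outer, String inner) {
        Map<String, Double> row = table.get(outer);
        double p = (row == null) ? FLOOR : row.getOrDefault(inner, FLOOR);
        return Math.log(Math.max(p, FLOOR));
    }

    /**
     * transition.get(prev).get(tag) is P(tag | prev), with "<s>" as start;
     * emission.get(tag).get(word) is P(word | tag).
     */
    static List<String> tag(List<String> words, List<String> tagSet,
                            Map<String, Map<String, Double>> transition,
                            Map<String, Map<String, Double>> emission) {
        int n = words.size();
        int k = tagSet.size();
        if (n == 0) return new ArrayList<>();
        if (k == 0) throw new IllegalArgumentException("tagSet is empty");

        double[][] score = new double[n][k]; // best log score ending in tag j at word i
        int[][] back = new int[n][k];        // index of the previous tag on that best path

        for (int j = 0; j < k; j++) {
            score[0][j] = logProb(transition, "<s>", tagSet.get(j))
                        + logProb(emission, tagSet.get(j), words.get(0));
            back[0][j] = -1;
        }
        for (int i = 1; i < n; i++) {
            for (int j = 0; j < k; j++) {
                double best = Double.NEGATIVE_INFINITY;
                int bestPrev = 0;
                for (int p = 0; p < k; p++) {
                    double s = score[i - 1][p]
                             + logProb(transition, tagSet.get(p), tagSet.get(j));
                    if (s > best) { best = s; bestPrev = p; }
                }
                score[i][j] = best + logProb(emission, tagSet.get(j), words.get(i));
                back[i][j] = bestPrev;
            }
        }

        // Pick the best final tag, then follow the back pointers.
        int bestLast = 0;
        for (int j = 1; j < k; j++) {
            if (score[n - 1][j] > score[n - 1][bestLast]) bestLast = j;
        }
        String[] result = new String[n];
        for (int i = n - 1, j = bestLast; i >= 0; i--) {
            result[i] = tagSet.get(j);
            j = back[i][j];
        }
        return new ArrayList<>(Arrays.asList(result));
    }

    public static void main(String[] args) {
        // Toy probabilities, invented for this example only.
        Map<String, Map<String, Double>> transition = Map.of(
            "<s>", Map.of("DT", 0.8, "NN", 0.1, "VB", 0.1),
            "DT", Map.of("NN", 0.9),
            "NN", Map.of("VB", 0.7));
        Map<String, Map<String, Double>> emission = Map.of(
            "DT", Map.of("the", 0.9),
            "NN", Map.of("dog", 0.5),
            "VB", Map.of("barks", 0.5));
        System.out.println(tag(List.of("the", "dog", "barks"),
                               List.of("DT", "NN", "VB"), transition, emission));
    }
}
```

Scores are kept as log probabilities so that long sentences do not underflow. For each word the sketch examines every (previous tag, current tag) pair once, which is the saving over enumerating all tag sequences.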
| Modifier and Type | Field and Description |
|---|---|
| protected int | beamSearchRejections: Total number of states rejected by beam search criterion. |
| protected Map2D<java.lang.String,java.lang.String,Probability> | contextualProbabilities: Contextual probabilities for a word in a sentence. |
| protected boolean | debug: True for debug output. |
| protected Viterbi | viterbi: Viterbi trellis for tags and probability scores. |
Fields inherited from class AbstractPartOfSpeechTagger: contextRules, contextualSmoother, dynamicLexicon, lexicalRules, lexicalSmoother, lexicon, logger, partOfSpeechGuesser, postTokenizer, retagger, ruleCorrections, transitionMatrix

| Constructor and Description |
|---|
| BigramTagger(): Create a bigram tagger. |
| Modifier and Type | Method and Description |
|---|---|
| protected java.util.List<java.lang.String> | processWord(int wordIndex, java.lang.String word, java.util.List<java.lang.String> previousTags, java.util.List<java.lang.String> tags): Process a single word. |
| void | setLogger(Logger logger): Set the logger. |
| <T extends AdornedWord> java.util.List<T> | tagAdornedWordList(java.util.List<T> taggedSentence): Tag a sentence. |
| java.util.List<java.util.List<AdornedWord>> | tagSentences(java.util.List<java.util.List<java.lang.String>> sentences): Tag a list of sentences. |
| java.lang.String | toString(): Return tagger description. |
| boolean | usesTransitionProbabilities(): See if tagger uses a probability transition matrix. |
Methods inherited from class AbstractPartOfSpeechTagger: clearRuleCorrections, createPartOfSpeechGuesser, getContextualSmoother, getDynamicLexicon, getLexicalSmoother, getLexicon, getLexicon, getLogger, getMostCommonTag, getPartOfSpeechGuesser, getPostTokenizer, getRetagger, getRuleCorrections, getTagCount, getTagsForWord, getTransitionMatrix, incrementRuleCorrections, retagWords, setContextRules, setContextualSmoother, setLexicalRules, setLexicalSmoother, setLexicon, setPartOfSpeechGuesser, setPostTokenizer, setRetagger, setTransitionMatrix, tagAdornedWordSentence, tagAdornedWordSentences, tagSentence, usesContextRules, usesLexicalRules
Also inherited: close
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface PartOfSpeechTagger: clearRuleCorrections, getContextualSmoother, getLexicalSmoother, getLexicon, getLexicon, getPartOfSpeechGuesser, getPostTokenizer, getRetagger, getRuleCorrections, getTagCount, getTagsForWord, getTransitionMatrix, incrementRuleCorrections, retagWords, setContextRules, setContextualSmoother, setLexicalRules, setLexicalSmoother, setLexicon, setPartOfSpeechGuesser, setPostTokenizer, setRetagger, setTransitionMatrix, tagAdornedWordSentence, tagAdornedWordSentences, tagSentence, usesContextRules, usesLexicalRules
Also inherited: close
protected boolean debug
protected Map2D<java.lang.String,java.lang.String,Probability> contextualProbabilities
protected int beamSearchRejections
protected Viterbi viterbi
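The beamSearchRejections field counts states dropped by a beam search criterion, but this page does not say what that criterion is. The sketch below assumes one common choice, discarding any state whose score falls more than a fixed ratio below the best score for the current word. The class `BeamPruningSketch`, its `prune` method, and the `beamRatio` parameter are hypothetical names for illustration, not part of the library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative beam pruning with a rejection counter (hypothetical). */
final class BeamPruningSketch {
    /** Running count of pruned states, loosely analogous to beamSearchRejections. */
    int rejections = 0;

    /** Log of the assumed beam ratio; states further than this below the best are dropped. */
    private final double logBeamWidth;

    /** beamRatio of 1000.0 keeps states within a factor of 1000 of the best state. */
    BeamPruningSketch(double beamRatio) {
        if (beamRatio < 1.0) {
            throw new IllegalArgumentException("beamRatio must be >= 1");
        }
        this.logBeamWidth = Math.log(beamRatio);
    }

    /** Returns the tags whose log score survives the beam, counting the rest as rejections. */
    Map<String, Double> prune(Map<String, Double> logScores) {
        double best = Double.NEGATIVE_INFINITY;
        for (double s : logScores.values()) {
            best = Math.max(best, s);
        }
        Map<String, Double> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : logScores.entrySet()) {
            if (e.getValue() >= best - logBeamWidth) {
                kept.put(e.getKey(), e.getValue());
            } else {
                rejections++;
            }
        }
        return kept;
    }
}
```

Applied to each column of a Viterbi trellis, pruning of this kind trades a small risk of missing the optimal path for fewer states to extend at the next word. A counter like the one above makes that trade-off measurable.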
public boolean usesTransitionProbabilities()
Specified by: usesTransitionProbabilities in interface PartOfSpeechTagger
Overrides: usesTransitionProbabilities in class AbstractPartOfSpeechTagger

public java.util.List<java.util.List<AdornedWord>> tagSentences(java.util.List<java.util.List<java.lang.String>> sentences)
Specified by: tagSentences in interface PartOfSpeechTagger
Overrides: tagSentences in class AbstractPartOfSpeechTagger
Parameters:
sentences - The list of sentences.
The input is a List of sentences, each represented as a List of the words to be tagged. The output is a List of sentences, each represented as a List of AdornedWords.
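The nested input shape described above can be built as in the following sketch. It only prepares the argument; it does not call the tagger. The class `SentenceInputSketch` and its `toSentences` helper are hypothetical, and the whitespace split stands in for real tokenization, which this page does not describe.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Builds a List of sentences, each a List of words (hypothetical helper). */
final class SentenceInputSketch {
    /** Splits each pre-tokenized, space-separated sentence into its words. */
    static List<List<String>> toSentences(String... rawSentences) {
        List<List<String>> sentences = new ArrayList<>();
        for (String raw : rawSentences) {
            String trimmed = raw.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank input rather than emit an empty sentence
            }
            sentences.add(new ArrayList<>(Arrays.asList(trimmed.split("\\s+"))));
        }
        return sentences;
    }
}
```

Assuming a BigramTagger instance named `tagger` that has already been configured (for example through the inherited setLexicon and setTransitionMatrix methods listed on this page), the result could then be passed as `tagger.tagSentences(sentences)`.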
public <T extends AdornedWord> java.util.List<T> tagAdornedWordList(java.util.List<T> taggedSentence)
Specified by: tagAdornedWordList in interface PartOfSpeechTagger
Overrides: tagAdornedWordList in class AbstractPartOfSpeechTagger
Parameters:
taggedSentence - The sentence as a List of AdornedWord.
Returns:
A List of AdornedWord for the words in the sentence, tagged with parts of speech.
The input sentence is a List of AdornedWord entries to be tagged. The output is a List of AdornedWord entries with parts of speech added.
protected java.util.List<java.lang.String> processWord(int wordIndex,
java.lang.String word,
java.util.List<java.lang.String> previousTags,
java.util.List<java.lang.String> tags)
Parameters:
wordIndex - Index of word in sentence (starts at 0).
word - Word being processed.
previousTags - The previous word's tags.
tags - The current word's tags.

public void setLogger(Logger logger)
Specified by: setLogger in interface UsesLogger
Overrides: setLogger in class AbstractPartOfSpeechTagger
Parameters:
logger - The logger.

public java.lang.String toString()
Overrides: toString in class java.lang.Object