public class RegexpTagger extends UnigramTagger implements PartOfSpeechTagger, CanTagOneWord
The regular expression part of speech tagger uses regular expressions to assign a part of speech tag to a spelling.
| Modifier and Type | Field and Description |
|---|---|
| protected java.util.regex.Matcher[] | regexpMatchers |
| protected java.util.regex.Pattern[] | regexpPatterns: Parts of speech for each lexical rule. |
| protected java.lang.String[] | regexpTags |

Inherited fields: contextRules, contextualSmoother, dynamicLexicon, lexicalRules, lexicalSmoother, lexicon, logger, partOfSpeechGuesser, postTokenizer, retagger, ruleCorrections, transitionMatrix

| Constructor and Description |
|---|
| RegexpTagger(): Create a regular expression tagger. |

| Modifier and Type | Method and Description |
|---|---|
| void | setLexicalRules(java.lang.String[] lexicalRules): Set lexical rules for tagging. |
| java.lang.String | tagWord(java.lang.String word): Tag a single word. |
| java.lang.String | toString(): Return tagger description. |
| boolean | usesLexicalRules(): See if tagger uses lexical rules. |

Methods inherited from class UnigramTagger: tagAdornedWordList, tagWord
Methods inherited from class AbstractPartOfSpeechTagger: clearRuleCorrections, createPartOfSpeechGuesser, getContextualSmoother, getDynamicLexicon, getLexicalSmoother, getLexicon, getLexicon, getLogger, getMostCommonTag, getPartOfSpeechGuesser, getPostTokenizer, getRetagger, getRuleCorrections, getTagCount, getTagsForWord, getTransitionMatrix, incrementRuleCorrections, retagWords, setContextRules, setContextualSmoother, setLexicalSmoother, setLexicon, setLogger, setPartOfSpeechGuesser, setPostTokenizer, setRetagger, setTransitionMatrix, tagAdornedWordSentence, tagAdornedWordSentences, tagSentence, tagSentences, usesContextRules, usesTransitionProbabilities
Methods inherited from another superclass: close
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface PartOfSpeechTagger: clearRuleCorrections, getContextualSmoother, getLexicalSmoother, getLexicon, getLexicon, getPartOfSpeechGuesser, getPostTokenizer, getRetagger, getRuleCorrections, getTagCount, getTagsForWord, getTransitionMatrix, incrementRuleCorrections, retagWords, setContextRules, setContextualSmoother, setLexicalSmoother, setLexicon, setPartOfSpeechGuesser, setPostTokenizer, setRetagger, setTransitionMatrix, tagAdornedWordList, tagAdornedWordSentence, tagAdornedWordSentences, tagSentence, tagSentences, usesContextRules, usesTransitionProbabilities
Methods inherited from interface CanTagOneWord: tagWord
Methods inherited from another interface: close

protected java.util.regex.Pattern[] regexpPatterns
protected java.util.regex.Matcher[] regexpMatchers
protected java.lang.String[] regexpTags
public boolean usesLexicalRules()
Specified by: usesLexicalRules in interface PartOfSpeechTagger
Overrides: usesLexicalRules in class AbstractPartOfSpeechTagger
public void setLexicalRules(java.lang.String[] lexicalRules) throws InvalidRuleException
Specified by: setLexicalRules in interface PartOfSpeechTagger
Overrides: setLexicalRules in class AbstractPartOfSpeechTagger
Parameters: lexicalRules - String array of lexical rules.
Throws: InvalidRuleException - if a rule is bad.
For the regular expression tagger, each rule takes the form:
regular-expression \t part-of-speech-tag
where "regular-expression" is the regular expression and "part-of-speech-tag" is the part of speech tag to assign to a spelling matched by the regular expression. An ASCII tab character (\t) separates the pattern from the tag.
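The rule format described above can be illustrated with a minimal sketch that splits one rule at the tab and compiles the pattern. The class `LexicalRuleSketch`, its `Rule` holder, and the tag `cd` are hypothetical names chosen for illustration, not part of the documented API. Trimming whitespace around the tab and throwing `IllegalArgumentException` (rather than `InvalidRuleException`) are assumptions of this sketch, not confirmed behavior of `setLexicalRules`.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class LexicalRuleSketch {
    /** Hypothetical holder for one parsed rule; not part of the documented API. */
    static final class Rule {
        final Pattern pattern;
        final String tag;

        Rule(Pattern pattern, String tag) {
            this.pattern = pattern;
            this.tag = tag;
        }
    }

    /** Splits "regular-expression \t part-of-speech-tag" at the first tab. */
    static Rule parseRule(String rule) {
        int tab = rule.indexOf('\t');
        if (tab < 0) {
            throw new IllegalArgumentException("Missing tab separator: " + rule);
        }
        // Assumption: whitespace around the tab is not significant.
        String regex = rule.substring(0, tab).trim();
        String tag = rule.substring(tab + 1).trim();
        if (regex.isEmpty() || tag.isEmpty()) {
            throw new IllegalArgumentException("Empty pattern or tag: " + rule);
        }
        try {
            return new Rule(Pattern.compile(regex), tag);
        } catch (PatternSyntaxException e) {
            throw new IllegalArgumentException("Bad regular expression: " + rule, e);
        }
    }

    public static void main(String[] args) {
        Rule r = parseRule("^[0-9]+$\tcd");
        System.out.println(r.pattern.pattern() + " -> " + r.tag); // prints "^[0-9]+$ -> cd"
    }
}
```

Splitting at the first tab only means the pattern itself cannot contain a literal tab character; a pattern that needs one can use the regex escape `\t` instead.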
public java.lang.String tagWord(java.lang.String word)
Specified by: tagWord in interface CanTagOneWord
Overrides: tagWord in class UnigramTagger
Parameters: word - The word.
Applies each of the regular expressions defined by the lexical rules, in order, and returns the tag associated with the first regular expression that matches.
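The first-match behavior can be sketched as a loop over parallel pattern and tag arrays, loosely mirroring the `regexpPatterns` and `regexpTags` fields. This is an illustrative sketch, not the class's actual implementation: `FirstMatchTaggerSketch` and the tags `cd`, `vvg`, and `n1` are hypothetical, and both the whole-word match via `Matcher.matches()` and the `null` result when nothing matches are assumptions.

```java
import java.util.regex.Pattern;

public class FirstMatchTaggerSketch {
    private final Pattern[] patterns; // compiled rule patterns, in rule order
    private final String[] tags;      // tags[i] belongs to patterns[i]

    FirstMatchTaggerSketch(String[] regexes, String[] tags) {
        if (regexes.length != tags.length) {
            throw new IllegalArgumentException("Need exactly one tag per pattern");
        }
        this.patterns = new Pattern[regexes.length];
        for (int i = 0; i < regexes.length; i++) {
            this.patterns[i] = Pattern.compile(regexes[i]);
        }
        this.tags = tags.clone();
    }

    /** Returns the tag of the first matching pattern, or null if none match (assumption). */
    String tagWord(String word) {
        for (int i = 0; i < patterns.length; i++) {
            // Assumption: the whole word must match the pattern.
            if (patterns[i].matcher(word).matches()) {
                return tags[i];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        FirstMatchTaggerSketch tagger = new FirstMatchTaggerSketch(
            new String[] {"^[0-9]+$", ".*ing$", ".*"},
            new String[] {"cd", "vvg", "n1"});
        System.out.println(tagger.tagWord("1984"));    // cd
        System.out.println(tagger.tagWord("running")); // vvg
        System.out.println(tagger.tagWord("cat"));     // n1
    }
}
```

Because the first match wins, rule order matters: "running" also matches the catch-all `.*`, but the earlier `.*ing$` rule decides its tag.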
public java.lang.String toString()
Overrides: toString in class UnigramTagger