public interface SentenceSplitter
| Modifier and Type | Method and Description |
|---|---|
| `java.util.List<java.util.List<java.lang.String>>` | `extractSentences(java.lang.String text)` Break text into sentences and tokens. |
| `java.util.List<java.util.List<java.lang.String>>` | `extractSentences(java.lang.String text, WordTokenizer tokenizer)` Break text into sentences and tokens. |
| `int[]` | `findSentenceOffsets(java.lang.String text, java.util.List<java.util.List<java.lang.String>> sentences)` Find starting offsets of sentences extracted from a text. |
| `void` | `setAbbreviations(Abbreviations abbreviations)` Set abbreviations. |
| `void` | `setPartOfSpeechGuesser(PartOfSpeechGuesser partOfSpeechGuesser)` Set part of speech guesser. |
| `void` | `setSentenceSplitterIterator(SentenceSplitterIterator sentenceSplitterIterator)` Set sentence splitter iterator. |

void setPartOfSpeechGuesser(PartOfSpeechGuesser partOfSpeechGuesser)
partOfSpeechGuesser - Part of speech guesser.
A sentence splitter may use part of speech information to disambiguate end-of-sentence boundary conditions. The part of speech guesser provides access to the lexicons and guessing algorithms for determining the possible parts of speech for a word without performing a full part of speech tagging operation.

void setSentenceSplitterIterator(SentenceSplitterIterator sentenceSplitterIterator)
sentenceSplitterIterator - Sentence splitter iterator.

void setAbbreviations(Abbreviations abbreviations)
abbreviations - Abbreviations.

java.util.List<java.util.List<java.lang.String>> extractSentences(java.lang.String text, WordTokenizer tokenizer)
text - Text to break into sentences and tokens.
tokenizer - Tokenizer to use for breaking sentences into words.
Word tokens may be words, numbers, punctuation, etc.

java.util.List<java.util.List<java.lang.String>> extractSentences(java.lang.String text)
text - Text to break into sentences and tokens.
Word tokens may be words, numbers, punctuation, etc. The default word tokenizer is used.
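
The nested return type is easier to picture with a concrete value. The following is a minimal, self-contained sketch of the shape that extractSentences(java.lang.String) returns: an outer list of sentences, each an inner list of word tokens. The class name NaiveSplitterSketch and its splitting rule (end a sentence at every '.', '!' or '?') are assumptions made for illustration only; this is not the algorithm used by any SentenceSplitter implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in that mimics the return shape of extractSentences(String). */
public class NaiveSplitterSketch {

    public static List<List<String>> extractSentences(String text) {
        List<List<String>> sentences = new ArrayList<>();
        List<String> current = new ArrayList<>();
        StringBuilder word = new StringBuilder();

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                word.append(c);
                continue;
            }
            // Any non-alphanumeric character ends the pending word.
            if (word.length() > 0) {
                current.add(word.toString());
                word.setLength(0);
            }
            if (Character.isWhitespace(c)) {
                continue;
            }
            // Punctuation becomes a token of its own.
            current.add(String.valueOf(c));
            // Assumption: every '.', '!' or '?' closes the sentence.
            if (c == '.' || c == '!' || c == '?') {
                sentences.add(current);
                current = new ArrayList<>();
            }
        }
        // Flush whatever is left when the text has no final terminator.
        if (word.length() > 0) {
            current.add(word.toString());
        }
        if (!current.isEmpty()) {
            sentences.add(current);
        }
        return sentences;
    }

    public static void main(String[] args) {
        System.out.println(extractSentences("It rained. We stayed in, mostly."));
    }
}
```

A rule this simple would also end a sentence after the period in an abbreviation such as "Dr.", which is the kind of boundary ambiguity that the abbreviations and part of speech guesser setters are meant to help resolve.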

int[] findSentenceOffsets(java.lang.String text, java.util.List<java.util.List<java.lang.String>> sentences)
text - Text from which sentences were extracted.
sentences - List of sentences (each a list of words) extracted from text.
N.B. If the sentences aren't from the specified text, the resulting offsets will be meaningless.
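
One way such offsets could be computed is to scan the text left to right, locating each token after the end of the previous one and recording where the first token of each sentence begins. The sketch below illustrates that idea under stated assumptions: the class name SentenceOffsetsSketch is hypothetical, the result holds exactly one offset per sentence, and -1 marks a sentence whose tokens cannot be found. The actual implementation may use a different array length or matching strategy.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of locating sentence starts by scanning for tokens in order. */
public class SentenceOffsetsSketch {

    public static int[] findSentenceOffsets(String text, List<List<String>> sentences) {
        int[] offsets = new int[sentences.size()];
        int cursor = 0; // position in text just past the last matched token

        for (int i = 0; i < sentences.size(); i++) {
            int sentenceStart = -1;
            for (String token : sentences.get(i)) {
                int found = text.indexOf(token, cursor);
                if (found < 0) {
                    continue; // token is not in the remaining text; skip it
                }
                if (sentenceStart < 0) {
                    sentenceStart = found; // first matched token marks the start
                }
                cursor = found + token.length();
            }
            // Assumption: -1 signals that no token of this sentence was found.
            offsets[i] = sentenceStart;
        }
        return offsets;
    }

    public static void main(String[] args) {
        String text = "It rained. We stayed in, mostly.";
        List<List<String>> sentences = Arrays.asList(
            Arrays.asList("It", "rained", "."),
            Arrays.asList("We", "stayed", "in", ",", "mostly", "."));
        System.out.println(Arrays.toString(findSentenceOffsets(text, sentences)));
    }
}
```

Because each token is searched for only after the previous match, sentences that did not come from the given text produce -1 or arbitrary positions, which is consistent with the N.B. above.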