public class C99
extends java.lang.Object
Use of this code is free for academic, education, research and other non-profit making uses only.
| Modifier and Type | Class and Description |
|---|---|
| protected static class | C99.Region: Text segment region. |
| Constructor and Description |
|---|
C99() |
| Modifier and Type | Method and Description |
|---|---|
| protected static int[] | boundaries(double[][] m, int n): Find density-maximizing boundaries for regions in a similarity matrix. |
| protected static ContextVector[] | normalize(java.lang.String[][] document, ContextVector tf, StopWords stopWords, Stemmer stemmer): Produce stem frequency tables for a tokenized document. |
| protected static ContextVector[] | normalize(java.lang.String[][] document, StopWords stopWords, Stemmer stemmer): Produce stem frequency tables for a tokenized document. |
| protected static double[][] | rank(double[][] f, int maskSize): Apply hard ranking to a matrix using a mask. |
| static java.lang.String[][][] | segment(java.lang.String[][] document, int n, int s, StopWords stopWords, Stemmer stemmer): Segment a document into coherent topic segments. |
| static java.lang.String[][][] | segmentW(java.lang.String[][] document, int n, int s, StopWords stopWords, Stemmer stemmer): Segment a document into coherent topic segments. |
| protected static double[][] | similarity(ContextVector[] v): Given context vectors, compute the similarity matrix. |
| protected static double[][] | similarity(ContextVector[] v, EntropyVector entropy): Given context vectors, compute the similarity matrix. |
| protected static java.lang.String[][][] | split(java.lang.String[][] text, int[] boundaries): Split text into segment blocks given topic boundaries. |
| protected static double[][] | sum(double[][] rankMatrix): Compute the sum of the rank matrix. |
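
As a rough illustration of what the similarity methods in the table above compute, the sketch below builds a symmetric matrix of pairwise cosine similarities between text blocks. Cosine similarity is the measure described in Choi's C99 paper; this page does not say whether this class uses exactly that measure, or how the EntropyVector overload weights terms. The class name SimilaritySketch and the use of a Map<String, Integer> of stem counts as a stand-in for ContextVector are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

final class SimilaritySketch {

    /** Euclidean length of a sparse stem-count vector. */
    static double norm(Map<String, Integer> v) {
        double total = 0.0;
        for (int count : v.values()) {
            total += (double) count * count;
        }
        return Math.sqrt(total);
    }

    /** Cosine of the angle between two sparse count vectors; 0 if either is empty. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double denominator = norm(a) * norm(b);
        return denominator == 0.0 ? 0.0 : dot / denominator;
    }

    /** Symmetric matrix whose entry [i][j] is the cosine similarity of blocks i and j. */
    static double[][] similarity(List<Map<String, Integer>> v) {
        int n = v.size();
        double[][] s = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                double value = cosine(v.get(i), v.get(j));
                s[i][j] = value; // fill both halves, since cosine is symmetric
                s[j][i] = value;
            }
        }
        return s;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> blocks = List.of(
                Map.of("cat", 2, "sat", 1),
                Map.of("cat", 1, "mat", 1),
                Map.of("stock", 3, "market", 1));
        for (double[] row : similarity(blocks)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Blocks that share no stems get a similarity of 0.0, so unrelated topics show up as low-valued off-diagonal regions of the matrix.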
protected static int[] boundaries(double[][] m, int n)
m - Similarity matrix.
n - Number of regions to find.
If n = -1, the algorithm will determine the number of regions.
protected static ContextVector[] normalize(java.lang.String[][] document, StopWords stopWords, Stemmer stemmer)
document - Tokenized document.
stopWords - Stop words.
stemmer - Stemmer.
protected static ContextVector[] normalize(java.lang.String[][] document, ContextVector tf, StopWords stopWords, Stemmer stemmer)
document - Tokenized document.
tf - Term frequencies in the document.
stopWords - Stop words.
stemmer - Stemmer.
protected static double[][] rank(double[][] f, int maskSize)
f - Matrix to which to apply hard ranking.
maskSize - Mask size.
Hard ranking treats the matrix as an image and replaces each value (pixel) with the proportion of neighboring values it exceeds, using a maskSize x maskSize mask.
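
The hard-ranking rule described above can be sketched as follows. This is a minimal sketch under stated assumptions: the matrix is rectangular, the mask is clipped at the matrix edges, and the center entry is not counted among its own neighbors. Those edge-handling choices, the odd-size check (borrowed from the documented constraint on the s parameter of segment), and the class name HardRankSketch are assumptions rather than details taken from this class.

```java
import java.util.Arrays;

final class HardRankSketch {

    /**
     * Replaces each entry with the fraction of its neighbors, inside a
     * maskSize x maskSize window centered on it, whose value is strictly lower.
     * The window is clipped at the matrix edges and excludes the entry itself.
     */
    static double[][] rank(double[][] f, int maskSize) {
        if (maskSize < 3 || maskSize % 2 == 0) {
            throw new IllegalArgumentException("maskSize must be an odd number >= 3");
        }
        int radius = maskSize / 2;
        int rows = f.length;
        int cols = rows == 0 ? 0 : f[0].length;
        double[][] r = new double[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                int lower = 0;    // neighbors with a smaller value
                int examined = 0; // neighbors actually inside the matrix
                for (int y = Math.max(0, i - radius); y <= Math.min(rows - 1, i + radius); y++) {
                    for (int x = Math.max(0, j - radius); x <= Math.min(cols - 1, j + radius); x++) {
                        if (y == i && x == j) {
                            continue; // skip the entry itself
                        }
                        examined++;
                        if (f[y][x] < f[i][j]) {
                            lower++;
                        }
                    }
                }
                r[i][j] = examined == 0 ? 0.0 : (double) lower / examined;
            }
        }
        return r;
    }

    public static void main(String[] args) {
        double[][] f = {{1, 2}, {3, 4}};
        System.out.println(Arrays.deepToString(rank(f, 3)));
    }
}
```

For the 2 x 2 matrix {{1, 2}, {3, 4}} with a 3 x 3 mask, every entry sees the other three, so the smallest entry ranks 0.0 and the largest ranks 1.0. Because only the relative order within the mask matters, ranking makes local contrast comparable across regions of the similarity matrix with very different absolute values.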
public static java.lang.String[][][] segment(java.lang.String[][] document, int n, int s, StopWords stopWords, Stemmer stemmer)
document - Document text as a list of elementary text blocks.
n - Number of topic segments desired. Set n = -1 to have the algorithm select the number of topic segments by monitoring the rate of increase in segment density.
s - Size of the ranking mask. Must be an odd number >= 3.
stopWords - Stop words.
stemmer - Stemmer.
public static java.lang.String[][][] segmentW(java.lang.String[][] document, int n, int s, StopWords stopWords, Stemmer stemmer)
document - Document text as a list of elementary text blocks.
n - Number of topic segments desired. Set n = -1 to have the algorithm select the number of topic segments by monitoring the rate of increase in segment density.
s - Size of the ranking mask. Must be an odd number >= 3.
stopWords - Stop words.
stemmer - Stemmer.
protected static double[][] similarity(ContextVector[] v)
v - Context vectors.
protected static double[][] similarity(ContextVector[] v, EntropyVector entropy)
v - Context vectors.
entropy - Entropy vector.
protected static java.lang.String[][][] split(java.lang.String[][] text, int[] boundaries)
text - Source text.
boundaries - Topic boundaries.
protected static double[][] sum(double[][] rankMatrix)
rankMatrix - Rank matrix.
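
The sum and boundaries methods can be illustrated together. The sketch below follows the divisive clustering described in Choi's C99 paper: sum tabulates, for every span of blocks i..j, the total rank value inside the square region it covers, and boundaries repeatedly adds the single split that most increases the overall density D = (sum of region sums) / (sum of region areas). It assumes that boundaries receives the already-ranked matrix and returns the start index of each region after the first; the class name DensityBoundariesSketch is hypothetical, and the automatic selection of the number of regions is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class DensityBoundariesSketch {

    /** s[i][j] (i <= j) is the total of rankMatrix over rows i..j and columns i..j. */
    static double[][] sum(double[][] rankMatrix) {
        int n = rankMatrix.length;
        // prefix[i][j] holds the total of rankMatrix[0..i-1][0..j-1].
        double[][] prefix = new double[n + 1][n + 1];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                prefix[i + 1][j + 1] = rankMatrix[i][j]
                        + prefix[i][j + 1] + prefix[i + 1][j] - prefix[i][j];
            }
        }
        double[][] s = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                // Inclusion-exclusion over the prefix table.
                s[i][j] = prefix[j + 1][j + 1] - prefix[i][j + 1]
                        - prefix[j + 1][i] + prefix[i][i];
            }
        }
        return s;
    }

    /** Area of the square region spanned by blocks a..b inclusive. */
    private static double area(int a, int b) {
        double side = b - a + 1;
        return side * side;
    }

    /**
     * Greedy divisive clustering into n regions. Each step adds the boundary
     * that maximizes D = (sum of region sums) / (sum of region areas).
     * Returns the start index of every region except the first (assumed format).
     */
    static int[] boundaries(double[][] m, int n) {
        double[][] s = sum(m);
        int size = m.length;
        List<Integer> starts = new ArrayList<>();
        starts.add(0);
        double totalSum = size == 0 ? 0.0 : s[0][size - 1];
        double totalArea = size == 0 ? 0.0 : area(0, size - 1);
        // Stop once n regions exist or every block is its own region.
        while (starts.size() < n && starts.size() < size) {
            double bestDensity = Double.NEGATIVE_INFINITY;
            double bestSum = 0.0;
            double bestArea = 0.0;
            int bestSplit = -1;
            for (int r = 0; r < starts.size(); r++) {
                int a = starts.get(r);
                int b = (r + 1 < starts.size()) ? starts.get(r + 1) - 1 : size - 1;
                // Try splitting region a..b into a..k-1 and k..b.
                for (int k = a + 1; k <= b; k++) {
                    double newSum = totalSum - s[a][b] + s[a][k - 1] + s[k][b];
                    double newArea = totalArea - area(a, b) + area(a, k - 1) + area(k, b);
                    double density = newSum / newArea;
                    if (density > bestDensity) {
                        bestDensity = density;
                        bestSum = newSum;
                        bestArea = newArea;
                        bestSplit = k;
                    }
                }
            }
            starts.add(bestSplit);
            Collections.sort(starts);
            totalSum = bestSum;
            totalArea = bestArea;
        }
        int[] result = new int[starts.size() - 1];
        for (int i = 1; i < starts.size(); i++) {
            result[i - 1] = starts.get(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // Two obvious topics: blocks 0-1 and blocks 2-3.
        double[][] ranked = {
            {1, 1, 0, 0},
            {1, 1, 0, 0},
            {0, 0, 1, 1},
            {0, 0, 1, 1},
        };
        System.out.println(Arrays.toString(boundaries(ranked, 2))); // prints [2]
    }
}
```

In the example, splitting at block 2 keeps all of the high rank values inside the two square regions (density 8 / 8 = 1.0), while splitting at block 1 or 3 gives 6 / 10 = 0.6, so the greedy step picks 2. Precomputing the region sums makes each candidate split an O(1) density update.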