java.lang.Object
  org.jasen.core.calculators.ChiSquaredCalculator
Performs all the chi-squared probability calculations required by jASEN.
This is the core calculation class, which ultimately determines the spam score for a message.
Most of the methods herein are direct ports of the Python implementation published by Gary Robinson.
Constructor Summary
ChiSquaredCalculator()

Method Summary
double calculateChi(double[] fws)
    Calculates the chi distribution of the word probabilities.
double calculateH(double[] probs)
    Calculates the probability, as a value between 0.0 and 1.0, that the tokens provided indicate a HAM message.
double calculateInverseChiSquare(double fChi, int n)
    Calculates the inverse chi square for the given chi distribution.
double calculateReverseChi(double[] fws)
    Does the same as calculateChi, but does so on 1 - f(w).
double calculateS(double[] probs)
    Calculates the probability, as a value between 0.0 and 1.0, that the tokens provided indicate a SPAM message.
double[] calculateWordProbabilities(String[] words, JasenMap map)
    Calculates the probability of each word indicating spam.
double confirmHypothesis(String[] words, JasenMap map)
    Confirms or rejects the null hypothesis that the message words indicate spam.
static void main(String[] args)
    Test harness only.

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Detail
public ChiSquaredCalculator()
Method Detail
public double confirmHypothesis(String[] words,
JasenMap map)
Specifically, this is defined as:
I = H / (H + S)
Where H is the value returned by calculateH and S is the value returned by calculateS.
words - The word tokens extracted from the message
map - The token map
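The combination step above can be sketched as follows. This is a minimal illustration of the formula I = H / (H + S) only, not jASEN's actual code: the class name, method name, and the neutral 0.5 returned when both inputs are zero are assumptions made for the example.

```java
// Hypothetical helper illustrating I = H / (H + S); not part of jASEN.
final class CombinedIndicatorSketch {

    private CombinedIndicatorSketch() {
    }

    /**
     * Combines the two probabilities into a single indicator.
     *
     * @param h the value produced by calculateH
     * @param s the value produced by calculateS
     * @return H / (H + S), or 0.5 when H + S is zero (an assumption of this sketch)
     */
    static double indicator(double h, double s) {
        double total = h + s;
        if (total == 0.0) {
            // Assumed neutral result: the formula is undefined when both inputs are zero.
            return 0.5;
        }
        return h / total;
    }
}
```

For example, `CombinedIndicatorSketch.indicator(0.5, 0.5)` evaluates to 0.5, since both inputs carry equal weight.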
public double calculateH(double[] probs)
probs - The word probabilities computed by calculateWordProbabilities
calculateWordProbabilities(String[], JasenMap)
public double calculateS(double[] probs)
probs - The word probabilities computed by calculateWordProbabilities
calculateWordProbabilities(String[], JasenMap)
public double[] calculateWordProbabilities(String[] words,
JasenMap map)
Specifically, this method uses the following approach from Gary Robinson:
b(w) = (the number of spam e-mails containing the word w) / (the total number of spam e-mails)
g(w) = (the number of ham e-mails containing the word w) / (the total number of ham e-mails)
p(w) = b(w) / (b(w) + g(w))
Then we calculate:
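f(w) = ((s * x) + (m * p(w))) / (s + m)
Where, following Robinson's formulation, s is the strength given to the background information, x is the assumed probability for a word with no history, and m is the number of e-mails received that contain the word w.
words - The set of words for which the probabilities will be calculated
map - The map of word probabilities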
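The formulas for b(w), g(w), p(w) and f(w) can be worked through in a few lines. The sketch below is an illustration under stated assumptions rather than the jASEN implementation: it takes raw message counts instead of a JasenMap, treats m as the number of messages containing the word, and the class and parameter names are hypothetical.

```java
// Hypothetical worked example of the f(w) calculation; not part of jASEN.
final class WordProbabilitySketch {

    private WordProbabilitySketch() {
    }

    /**
     * @param spamWithWord number of spam e-mails containing the word
     * @param totalSpam    total number of spam e-mails (must be > 0)
     * @param hamWithWord  number of ham e-mails containing the word
     * @param totalHam     total number of ham e-mails (must be > 0)
     * @param s            strength given to the background information
     * @param x            assumed probability for a word with no history
     */
    static double robinsonF(int spamWithWord, int totalSpam,
                            int hamWithWord, int totalHam,
                            double s, double x) {
        // Assumption: m counts every message, spam or ham, that contains the word.
        int m = spamWithWord + hamWithWord;
        if (m == 0) {
            // p(w) is undefined for an unseen word; f(w) reduces to x.
            return x;
        }
        double b = (double) spamWithWord / totalSpam;
        double g = (double) hamWithWord / totalHam;
        double p = b / (b + g);
        return ((s * x) + (m * p)) / (s + m);
    }
}
```

With s = 1.0 and x = 0.5, a word seen in 5 of 10 spam messages and in no ham messages gives p(w) = 1 and f(w) = 5.5 / 6, so a handful of observations pulls the estimate towards 1 without reaching it.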
public double calculateChi(double[] fws)
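This is defined as:
-2 ln ∏ f(w)
Where f(w) is the probability calculated for each word w.
fws - The f(w) values, as computed by calculateWordProbabilities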
public double calculateReverseChi(double[] fws)
-2 ln ∏(1 - f(w))
fws - The f(w) values, as computed by calculateWordProbabilities
calculateChi(double[])
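Both statistics, -2 ln ∏ f(w) and -2 ln ∏ (1 - f(w)), can be sketched as below. This is an illustration rather than jASEN's code: the class and method names are hypothetical, and summing logarithms instead of multiplying the probabilities first is a choice made here to avoid floating-point underflow on long messages.

```java
// Hypothetical sketch of the two chi statistics; not part of jASEN.
final class ChiStatisticSketch {

    private ChiStatisticSketch() {
    }

    /** Computes -2 ln ∏ f(w), using ln ∏ f(w) = Σ ln f(w). */
    static double chi(double[] fws) {
        double logSum = 0.0;
        for (double fw : fws) {
            logSum += Math.log(fw);
        }
        return -2.0 * logSum;
    }

    /** Computes -2 ln ∏ (1 - f(w)) in the same way. */
    static double reverseChi(double[] fws) {
        double logSum = 0.0;
        for (double fw : fws) {
            logSum += Math.log(1.0 - fw);
        }
        return -2.0 * logSum;
    }
}
```

For two words with f(w) = 0.5 each, both methods return -2 ln 0.25, which equals 4 ln 2.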
public double calculateInverseChiSquare(double fChi,
int n)
Again taken from Gary Robinson's writings, this is defined as:
H = C^-1( -2 ln ∏ f(w), 2n )
Where C^-1 is the inverse chi-square function and 2n is the number of degrees of freedom.
fChi - The chi distribution calculated from calculateChi()
n - The number of tokens
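For an even number of degrees of freedom (2n), the inverse chi-square function has a closed-form series, which is the approach taken by the Python routine commonly circulated alongside Robinson's writings. The sketch below is modelled on that routine; whether jASEN's method matches it line for line is an assumption, and the class and method names are hypothetical.

```java
// Hypothetical sketch of the inverse chi-square series; not part of jASEN.
final class InverseChiSquareSketch {

    private InverseChiSquareSketch() {
    }

    /**
     * Probability that a chi-square variable with 2n degrees of freedom
     * is at least as large as chi.
     *
     * @param chi the statistic, e.g. -2 ln ∏ f(w)
     * @param n   the number of tokens (so 2n degrees of freedom)
     */
    static double inverseChiSquare(double chi, int n) {
        double m = chi / 2.0;
        // First term of the series: e^(-m).
        double term = Math.exp(-m);
        double sum = term;
        // Each further term is e^(-m) * m^i / i!, built incrementally.
        for (int i = 1; i < n; i++) {
            term *= m / i;
            sum += term;
        }
        // Rounding can push the sum fractionally above 1.0.
        return Math.min(sum, 1.0);
    }
}
```

With chi = 2 and n = 2 the series has two terms, e^-1 and e^-1 * 1, so the result is 2 / e. Note that for very large chi values Math.exp(-m) underflows to zero and the result is 0.0.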
public static void main(String[] args)
args -