java.lang.Object
  org.jasen.core.engine.JasenMap
This class represents the core data for the chi probability calculations used by jASEN.
During training, the engine will tokenize each spam or ham email into meaningful String tokens. Each token is then classified and a record added (or updated) in the JasenMap.
Once training is complete, the map holds all the pertinent information about all tokens (words) ever seen by the engine. This information is then accessed by the engine during a scan.
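The training flow described above can be sketched roughly as follows. This is not the jASEN implementation: `SketchTokenMap` is a hypothetical stand-in that mimics only the documented behaviour of addToken and incrementObservations, the HAM/SPAM values are assumed, the `count` helper is invented for illustration, and the whitespace split is a placeholder for the engine's real tokenizer. It also assumes one addToken call per token occurrence and one incrementObservations call per message.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for JasenMap; only the documented behaviour is mimicked. */
class SketchTokenMap {
    // Assumed values; the real constants are JasenMap.HAM and JasenMap.SPAM.
    static final int HAM = 0;
    static final int SPAM = 1;

    // token -> {ham count, spam count}
    private final Map<String, int[]> tokens = new HashMap<>();
    private int hamObservations;
    private int spamObservations;

    /** Adds the token, or increments the relevant counter if it already exists. */
    void addToken(String key, int type) {
        int[] counts = tokens.computeIfAbsent(key, k -> new int[2]);
        counts[type == SPAM ? 1 : 0]++;
    }

    /** Called once per trained message (observation). */
    void incrementObservations(int type) {
        if (type == SPAM) {
            spamObservations++;
        } else {
            hamObservations++;
        }
    }

    /** Illustration-only helper: how often a token was seen for the given type. */
    int count(String key, int type) {
        int[] counts = tokens.get(key);
        return counts == null ? 0 : counts[type == SPAM ? 1 : 0];
    }

    int getTotalObservations() {
        return hamObservations + spamObservations;
    }

    int size() {
        return tokens.size();
    }
}

class TrainingSketch {
    /** Records one message: one observation plus one addToken call per token. */
    static void train(SketchTokenMap map, String message, int type) {
        map.incrementObservations(type);
        // Placeholder tokenizer: lower-case and split on whitespace.
        for (String token : message.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                map.addToken(token, type);
            }
        }
    }

    public static void main(String[] args) {
        SketchTokenMap map = new SketchTokenMap();
        train(map, "cheap pills cheap offer", SketchTokenMap.SPAM);
        train(map, "meeting notes for the offer", SketchTokenMap.HAM);
        System.out.println(map.size() + " tokens, "
                + map.getTotalObservations() + " observations");
    }
}
```

With the real class, the same loop would call addToken(String, int) and incrementObservations(int) on a JasenMap instance, passing JasenMap.HAM or JasenMap.SPAM.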
NOTE:
For performance reasons, the engine loads the data contained in a map into memory. The bigger the data file (token map), the more memory the engine requires to operate.
Accurate message scanning can only be achieved with a well-populated map, ideally generated from a wide variety of spam/ham emails. It is recommended that the training corpus for each type of email contain at least 1000 emails, and preferably at least 5000.
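As a rough illustration of the corpus-size recommendation, the check below rates a pair of observation counts against the 1000 and 5000 thresholds. It assumes that the values returned by getHamObservations() and getSpamObservations() equal the number of emails trained for each type; `CorpusSizeCheck` and its `Rating` enum are hypothetical names and not part of jASEN.

```java
/** Hypothetical helper; not part of the jASEN API. */
class CorpusSizeCheck {
    static final int MINIMUM_PER_TYPE = 1000;   // recommended minimum per email type
    static final int PREFERRED_PER_TYPE = 5000; // preferred minimum per email type

    enum Rating { TOO_SMALL, ACCEPTABLE, PREFERRED }

    /** Rates the corpus by its smaller half, since each type needs enough emails. */
    static Rating rate(int hamObservations, int spamObservations) {
        int smaller = Math.min(hamObservations, spamObservations);
        if (smaller < MINIMUM_PER_TYPE) {
            return Rating.TOO_SMALL;
        }
        if (smaller < PREFERRED_PER_TYPE) {
            return Rating.ACCEPTABLE;
        }
        return Rating.PREFERRED;
    }

    public static void main(String[] args) {
        // With a real map: rate(map.getHamObservations(), map.getSpamObservations())
        System.out.println(rate(1200, 800));
    }
}
```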
Field Summary
static int | HAM
static int | SPAM
Constructor Summary
JasenMap()
Method Summary
void | addToken(String key, int type) | Adds a token to the map.
int | getHamObservations() | Gets the number of times a ham message (observation) has been encountered.
int | getSpamObservations() | Gets the number of spam observations.
JasenToken | getToken(String key) | Gets the token associated with the key.
Map | getTokens() | Gets a reference to the entire map of tokens.
int | getTotalObservations() | Gets the total number of observations in the map, both spam and ham.
void | incrementObservations(int type) | Called each time a new observation is made.
Iterator | iterator() | Returns an iterator for the token keys.
void | setHamObservations(int hamObservations) | Sets the number of ham observations.
void | setSpamObservations(int spamObservations) | Sets the number of spam observations.
void | setTokens(Map tokens) | Sets the token map.
int | size() | Gets the number of tokens in the map.
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
public static final int HAM
public static final int SPAM
Constructor Detail
public JasenMap()
Method Detail
public void addToken(String key, int type)
If the token already exists, the relevant counter is simply incremented.
key - The word token
type - One of JasenMap.HAM or JasenMap.SPAM
public JasenToken getToken(String key)
key - The key, or word, to which this token corresponds
public void incrementObservations(int type)
type - One of JasenMap.HAM or JasenMap.SPAM
public int getTotalObservations()
public int getHamObservations()
public void setHamObservations(int hamObservations)
hamObservations - The number of ham observations in the map
public int getSpamObservations()
public void setSpamObservations(int spamObservations)
spamObservations - The number of spam observations in the map
public Map getTokens()
JasenToken
public void setTokens(Map tokens)
tokens - The token map to set.
public Iterator iterator()
The iterator yields token keys only. Actual tokens must still be obtained via the getToken(String) method.
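The key-then-lookup pattern described for iterator() might look like the sketch below. `KeyIterableMap` is a hypothetical stand-in that exposes only iterator() and getToken(String) in the way this page documents them; its `put` method and the use of Object in place of JasenToken are assumptions made to keep the example self-contained.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for JasenMap; Object stands in for JasenToken. */
class KeyIterableMap {
    private final Map<String, Object> tokens = new LinkedHashMap<>();

    /** Illustration-only helper for filling the stand-in. */
    void put(String key, Object token) {
        tokens.put(key, token);
    }

    /** Returns an iterator over the token keys, not the tokens themselves. */
    Iterator<String> iterator() {
        return tokens.keySet().iterator();
    }

    /** Gets the token associated with the key, or null if there is none. */
    Object getToken(String key) {
        return tokens.get(key);
    }
}

class IterationSketch {
    /** Walks the keys and resolves each token through getToken. */
    static Map<String, Object> collect(KeyIterableMap map) {
        Map<String, Object> resolved = new LinkedHashMap<>();
        Iterator<String> keys = map.iterator();
        while (keys.hasNext()) {
            String key = keys.next();
            resolved.put(key, map.getToken(key)); // second step: look up the token
        }
        return resolved;
    }
}
```

The real iterator() is declared with the raw Iterator type, so a cast such as (String) keys.next() would presumably be needed there, on the assumption that keys are Strings as the getToken(String key) signature suggests.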
public int size()