org.jasen.core.engine
Class JasenMap

java.lang.Object
  extended by org.jasen.core.engine.JasenMap
All Implemented Interfaces:
Serializable

public class JasenMap
extends Object
implements Serializable

This class represents the core data for the chi-square probability calculations used by jASEN.

During training, the engine tokenizes each spam or ham email into meaningful String tokens. Each token is then classified, and a record is added to (or updated in) the JasenMap.
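The training flow described above can be sketched as follows. This is a minimal illustration, not the jASEN implementation: the tokenizer (lower-cased words split on non-letters), the TokenCounts record standing in for JasenToken, the per-occurrence counting, and the values chosen for HAM and SPAM are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for the per-token record (the real class is JasenToken).
class TokenCounts {
    int ham;
    int spam;
}

// Hypothetical sketch of the training flow; not the jASEN source.
class TrainingSketch {
    // Assumed values: the real JasenMap.HAM / JasenMap.SPAM constants are not listed here.
    static final int HAM = 0;
    static final int SPAM = 1;

    final Map<String, TokenCounts> tokens = new HashMap<>();
    int hamObservations;
    int spamObservations;

    // Naive tokenizer (assumption): lower-case words split on any non-letter run.
    static String[] tokenize(String email) {
        return email.toLowerCase(Locale.ROOT).split("[^a-z]+");
    }

    // Adds or updates a record for every token, then counts the message itself.
    void train(String email, int type) {
        for (String token : tokenize(email)) {
            if (token.isEmpty()) continue; // split() can yield a leading empty string
            TokenCounts counts = tokens.computeIfAbsent(token, k -> new TokenCounts());
            if (type == SPAM) counts.spam++; else counts.ham++;
        }
        if (type == SPAM) spamObservations++; else hamObservations++;
    }
}
```

Training on "Buy cheap pills now!" as spam and "Lunch now?" as ham yields five distinct tokens, with "now" recorded once under each type.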

Once training is complete, the map holds all the pertinent information about all tokens (words) ever seen by the engine. This information is then accessed by the engine during a scan.

NOTE:
For performance reasons, the data contained in a map is loaded into memory by the engine. This means that the larger the data file (token map), the more memory the engine requires.
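Because JasenMap implements Serializable, a token map can be written out and read back with standard Java object serialization. The sketch below shows that round trip on a hypothetical stand-in class (SerializableTokenMap) using in-memory byte arrays; it does not claim to reproduce the file format or loading code the jASEN engine actually uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;

// Hypothetical stand-in; the real class is org.jasen.core.engine.JasenMap.
class SerializableTokenMap implements Serializable {
    private static final long serialVersionUID = 1L;
    final HashMap<String, Integer> spamCounts = new HashMap<>();
    int spamObservations;
}

class MapPersistence {
    // Serializes the whole map; the byte count hints at how much data must be loaded.
    static byte[] save(SerializableTokenMap map) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(map);
        }
        return bytes.toByteArray();
    }

    // Deserializes the map back into memory as a new object.
    static SerializableTokenMap load(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (SerializableTokenMap) in.readObject();
        }
    }
}
```

Swapping the byte-array streams for file streams would persist the map to disk; the whole object graph is still materialized in memory on load, which is why map size drives memory use.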

Accurate message scanning can only be achieved with a well-populated map, ideally generated from a wide variety of spam and ham emails. It is recommended that the training corpus for each type of email contain at least 1000 emails, and preferably at least 5000.
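The corpus-size recommendation can be checked mechanically against the ham and spam observation counts (for example, the values returned by getHamObservations() and getSpamObservations()). The helper below is hypothetical; it simply encodes the 1000 and 5000 thresholds stated above and grades the corpus by its smaller class.

```java
// Hypothetical helper: grades a training corpus against the recommended sizes.
class CorpusCheck {
    static final int MINIMUM = 1000;   // "not less than 1000 emails" per type
    static final int PREFERRED = 5000; // "preferably not less than 5000" per type

    // Both types must meet a threshold, so the smaller count decides the grade.
    static String assess(int hamObservations, int spamObservations) {
        int smaller = Math.min(hamObservations, spamObservations);
        if (smaller >= PREFERRED) return "preferred";
        if (smaller >= MINIMUM) return "minimum";
        return "insufficient";
    }
}
```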

Author:
Jason Polites
See Also:
Serialized Form

Field Summary
static int HAM
           
static int SPAM
           
 
Constructor Summary
JasenMap()
           
 
Method Summary
 void addToken(String key, int type)
          Adds a token to the map.
 int getHamObservations()
          Gets the number of times a ham message (observation) has been encountered.
 int getSpamObservations()
          Gets the number of spam observations.
 JasenToken getToken(String key)
          Gets the token associated with the key.
 Map getTokens()
          Gets a reference to the entire map of tokens.
 int getTotalObservations()
          Gets the total number of observations in the map, both spam and ham.
 void incrementObservations(int type)
          Called each time a new observation is made.
 Iterator iterator()
          Returns an iterator for the token keys.
 void setHamObservations(int hamObservations)
          Sets the number of ham observations.
 void setSpamObservations(int spamObservations)
          Sets the number of spam observations.
 void setTokens(Map tokens)
          Sets the token map.
 int size()
          Gets the number of tokens in the map.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

HAM

public static final int HAM
See Also:
Constant Field Values

SPAM

public static final int SPAM
See Also:
Constant Field Values
Constructor Detail

JasenMap

public JasenMap()
Method Detail

addToken

public void addToken(String key,
                     int type)
Adds a token to the map.

If the token already exists, the relevant counter is simply incremented.

Parameters:
key - The word token
type - One of JasenMap.HAM or JasenMap.SPAM
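The add-or-increment behaviour of addToken can be sketched with a HashMap whose values hold one counter per message type. This is a hypothetical illustration: the int[] layout, the assumed values HAM = 0 and SPAM = 1, and the rejection of other type values are choices made for the example, not documented jASEN behaviour.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of add-or-increment; not the jASEN source.
class AddTokenSketch {
    static final int HAM = 0;  // assumed value
    static final int SPAM = 1; // assumed value

    // Each value holds {hamCount, spamCount}.
    final Map<String, int[]> tokens = new HashMap<>();

    void addToken(String key, int type) {
        if (type != HAM && type != SPAM) {
            throw new IllegalArgumentException("type must be HAM or SPAM: " + type);
        }
        // Creates a zeroed record for a new token, otherwise reuses the existing one.
        int[] counts = tokens.computeIfAbsent(key, k -> new int[2]);
        counts[type]++; // relies on HAM == 0 and SPAM == 1 (assumption)
    }
}
```

Adding "free" twice as spam and once as ham leaves a single record with counts {1, 2}.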

getToken

public JasenToken getToken(String key)
Gets the token associated with the key.

Parameters:
key - The key, or word, to which this token corresponds
Returns:
The token matching the key, or null if no such token exists
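Because getToken returns null for a word the map has never seen, callers need an explicit null check. The sketch below uses a hypothetical stand-in map (an int[] of {ham, spam} counts per word instead of a JasenToken) and counts how many words of a message are known to the map, skipping unseen ones.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: handling a null-returning lookup such as getToken(String).
class LookupSketch {
    final Map<String, int[]> tokens = new HashMap<>(); // {ham, spam} per word (assumed layout)

    int[] getToken(String key) {
        return tokens.get(key); // null when the word was never seen in training
    }

    // Counts the words of a message that the map actually has records for.
    int knownWords(String[] words) {
        int known = 0;
        for (String w : words) {
            if (getToken(w) != null) known++;
        }
        return known;
    }
}
```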

incrementObservations

public void incrementObservations(int type)
Called each time a new observation is made.

Parameters:
type - One of JasenMap.HAM or JasenMap.SPAM

getTotalObservations

public int getTotalObservations()
Gets the total number of observations in the map, both spam and ham.

Returns:
The sum of the spam and ham observation counts
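The relationship between incrementObservations and getTotalObservations can be sketched with two counters. This is a hypothetical illustration: the constant values are assumed, and rejecting an unknown type is a choice made here rather than documented behaviour.

```java
// Hypothetical sketch of the observation counters; not the jASEN source.
class ObservationCounter {
    static final int HAM = 0;  // assumed value
    static final int SPAM = 1; // assumed value

    private int hamObservations;
    private int spamObservations;

    // One call per trained message of the given type.
    void incrementObservations(int type) {
        if (type == SPAM) spamObservations++;
        else if (type == HAM) hamObservations++;
        else throw new IllegalArgumentException("type must be HAM or SPAM: " + type);
    }

    int getHamObservations() { return hamObservations; }
    int getSpamObservations() { return spamObservations; }

    // Total observations are simply spam plus ham.
    int getTotalObservations() { return hamObservations + spamObservations; }
}
```

After three ham and two spam observations, getTotalObservations() returns 5.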

getHamObservations

public int getHamObservations()
Gets the number of times a ham message (observation) has been encountered.

Returns:
An integer representing the number of ham messages in the map

setHamObservations

public void setHamObservations(int hamObservations)
Sets the number of ham observations.

Parameters:
hamObservations - The number of ham observations in the map

getSpamObservations

public int getSpamObservations()
Gets the number of spam observations; that is, the number of messages represented in the map that were spam.

Returns:
An integer representing the number of spam observations in the map

setSpamObservations

public void setSpamObservations(int spamObservations)
Sets the number of spam observations.

Parameters:
spamObservations - The number of spam observations in the map

getTokens

public Map getTokens()
Gets a reference to the entire map of tokens. Each entry in the map is keyed on a word and mapped to a JasenToken object.

Returns:
The total token map.
See Also:
JasenToken
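Since getTokens returns a reference to the map rather than a copy, changes made through that reference affect the map itself. The hypothetical TokenHolder below illustrates this with a plain HashMap (Integer values stand in for JasenToken objects) and shows two standard-library alternatives a caller could use instead: a read-only view and an independent copy.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the token-map accessors; not the jASEN source.
class TokenHolder {
    private Map<String, Integer> tokens = new HashMap<>();

    // Returns the backing map itself, so callers can modify it directly.
    Map<String, Integer> getTokens() { return tokens; }

    void setTokens(Map<String, Integer> tokens) { this.tokens = tokens; }

    // A read-only view: reflects later changes but rejects modification.
    Map<String, Integer> readOnlyTokens() { return Collections.unmodifiableMap(tokens); }
}
```

A snapshot that does not track later updates can be taken with new HashMap<>(holder.getTokens()).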

setTokens

public void setTokens(Map tokens)
Sets the token map.

Parameters:
tokens - The token map to set.

iterator

public Iterator iterator()
Returns an iterator for the token keys.

The tokens themselves must still be obtained via the getToken method.

Returns:
An Iterator over the key set of the token map. Each value returned by the Iterator is a String representing the key of a JasenToken object.
See Also:
getToken(String)

size

public int size()
Gets the number of tokens in the map.

Returns:
The number of tokens
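The iterator and size methods work together as sketched below: iterate over the keys, then fetch each token with getToken. This uses a hypothetical stand-in with Integer values in place of JasenToken objects. The documented signature returns a raw Iterator, so code against the real class would cast each element to String; the generic Iterator&lt;String&gt; here avoids that cast.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical stand-in for key iteration; not the jASEN source.
class KeyIterationSketch {
    private final Map<String, Integer> tokens = new HashMap<>();

    void put(String key, int count) { tokens.put(key, count); }

    Integer getToken(String key) { return tokens.get(key); }

    // Iterates over the keys only, as the documentation describes.
    Iterator<String> iterator() { return tokens.keySet().iterator(); }

    // Number of distinct tokens, not the number of observations.
    int size() { return tokens.size(); }

    // Walks the keys and fetches each token via getToken.
    int totalCount() {
        int total = 0;
        for (Iterator<String> it = iterator(); it.hasNext(); ) {
            total += getToken(it.next());
        }
        return total;
    }
}
```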