java.lang.Object
  org.jasen.core.token.SpamTokenizer
This class is used exclusively by the EmailTokenizer.

See Also:
EmailTokenizer
Field Summary
static char[]     DELIMITER_CHARS
                  Characters that should always be treated as delimiters, except when they appear within a URL. This array MUST be sorted to facilitate a binary search.
static int        MAX_TOKEN_LENGTH
static int        MIN_TOKEN_LENGTH
static char[]     STOP_CHARS
                  This list does NOT contain "$", "@", "?" or "!", as we want to retain these characters.
static String[]   STOP_WORDS
static double     TOKEN_RECOGNITION_THRESHOLD
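The DELIMITER_CHARS description requires the array to be sorted because membership is checked with a binary search. As a minimal sketch of why that matters: java.util.Arrays.binarySearch only gives a defined result on a sorted array, so sorting once up front lets every lookup run in O(log n). The class DelimiterCheck and the characters in it are hypothetical placeholders; the actual contents of DELIMITER_CHARS are not listed on this page, and the sketch ignores the "except within a URL" rule.

```java
import java.util.Arrays;

/** Hypothetical illustration of a sorted-delimiter lookup; not jASEN's actual code. */
public class DelimiterCheck {

    // Placeholder delimiter set (assumed; the real DELIMITER_CHARS contents are not documented here).
    static final char[] DELIMITERS = {' ', '\t', '\n', '\r', ',', ';', '(', ')', '<', '>', '"'};

    static {
        // Arrays.binarySearch requires a sorted array; sorting once up front
        // guarantees that precondition even if the literal above is unsorted.
        Arrays.sort(DELIMITERS);
    }

    /** Returns true if c is one of the delimiter characters. */
    static boolean isDelimiter(char c) {
        // binarySearch returns a non-negative index only when the key is present.
        return Arrays.binarySearch(DELIMITERS, c) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(isDelimiter(' '));  // true
        System.out.println(isDelimiter('a'));  // false
        System.out.println(isDelimiter('<'));  // true
    }
}
```

If the array were left unsorted, binarySearch could report a delimiter as absent, which is why the field documentation insists on sorted order.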
Constructor Summary
SpamTokenizer()
Method Summary
int        getMaxTokens()
           Gets the maximum number of tokens to extract before the tokenization process is aborted.
void       setMaxTokens(int i)
String[]   tokenize(Reader reader, boolean onlyUrls, TokenErrorRecorder recorder)
String[]   tokenize(String str, boolean onlyUrls, TokenErrorRecorder recorder)
           Custom implementation which only returns URLs. This is used specifically for mail headers.
String[]   tokenize(String str, TokenErrorRecorder recorder)
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
public static int MIN_TOKEN_LENGTH
public static int MAX_TOKEN_LENGTH
public static double TOKEN_RECOGNITION_THRESHOLD
public static String[] STOP_WORDS
public static char[] STOP_CHARS
public static char[] DELIMITER_CHARS
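This page does not say exactly how STOP_CHARS, STOP_WORDS, MIN_TOKEN_LENGTH and MAX_TOKEN_LENGTH are applied. The sketch below shows one plausible filtering pass under stated assumptions: stop characters are trimmed from the ends of each token, stop words are dropped case-insensitively, and tokens outside the length bounds are discarded. All values and the class name TokenFilterSketch are hypothetical; only the exclusion of "$", "@", "?" and "!" from the stop characters comes from this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical token filter; every constant below is an assumed value, not jASEN's. */
public class TokenFilterSketch {

    static final int MIN_TOKEN_LENGTH = 3;   // assumed
    static final int MAX_TOKEN_LENGTH = 20;  // assumed
    // '$', '@', '?' and '!' are deliberately absent, matching the STOP_CHARS description.
    static final char[] STOP_CHARS = {'"', '\'', '(', ')', ',', '.', ':', ';'};
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("and", "the", "for"));

    static {
        Arrays.sort(STOP_CHARS); // required for the binary search in isStopChar
    }

    static boolean isStopChar(char c) {
        return Arrays.binarySearch(STOP_CHARS, c) >= 0;
    }

    /** Trims stop characters from both ends of a raw token (an assumed policy). */
    static String trimStopChars(String token) {
        int start = 0;
        int end = token.length();
        while (start < end && isStopChar(token.charAt(start))) start++;
        while (end > start && isStopChar(token.charAt(end - 1))) end--;
        return token.substring(start, end);
    }

    /** Keeps tokens that survive trimming, are not stop words, and fall within the length bounds. */
    static List<String> filter(String[] rawTokens) {
        List<String> kept = new ArrayList<>();
        for (String raw : rawTokens) {
            String token = trimStopChars(raw);
            int len = token.length();
            if (len < MIN_TOKEN_LENGTH || len > MAX_TOKEN_LENGTH) continue;
            if (STOP_WORDS.contains(token.toLowerCase())) continue;
            kept.add(token);
        }
        return kept;
    }

    public static void main(String[] args) {
        String[] raw = {"\"Free!\"", "the", "(offer),", "$100", "a", "supercalifragilisticexpialidocious"};
        System.out.println(filter(raw)); // [Free!, offer, $100]
    }
}
```

With these assumed values, "!" and "$" survive because they are not stop characters, while "the" is dropped as a stop word and the two remaining tokens fall outside the length bounds.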
Constructor Detail
public SpamTokenizer()
Method Detail
public String[] tokenize(String str, boolean onlyUrls, TokenErrorRecorder recorder) throws IOException
Parameters:
str -
onlyUrls -
Throws:
IOException
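The method summary describes this overload as a custom implementation that returns only URLs, for use with mail headers. A minimal sketch of what an onlyUrls switch could look like follows. The URL test (an http://, https:// or www. prefix), the delimiter set, and the class name UrlOnlySketch are assumptions for illustration; TokenErrorRecorder is omitted because its interface is not documented here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of an onlyUrls switch; the URL heuristic is an assumption. */
public class UrlOnlySketch {

    /** Crude URL test (assumed): an http/https scheme prefix or a leading "www.". */
    static boolean looksLikeUrl(String token) {
        String t = token.toLowerCase(Locale.ROOT);
        return t.startsWith("http://") || t.startsWith("https://") || t.startsWith("www.");
    }

    /** Splits on whitespace, angle brackets and commas; if onlyUrls is true, drops non-URL tokens. */
    static String[] tokenize(String str, boolean onlyUrls) {
        List<String> out = new ArrayList<>();
        for (String token : str.split("[\\s<>,]+")) {
            if (token.isEmpty()) continue; // split yields a leading "" if str starts with a delimiter
            if (onlyUrls && !looksLikeUrl(token)) continue;
            out.add(token);
        }
        return out.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String header = "List-Unsubscribe: <http://example.com/unsub>, <mailto:list@example.com>";
        System.out.println(Arrays.toString(tokenize(header, true)));
        // [http://example.com/unsub]
        System.out.println(Arrays.toString(tokenize(header, false)));
        // [List-Unsubscribe:, http://example.com/unsub, mailto:list@example.com]
    }
}
```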
public String[] tokenize(String str, TokenErrorRecorder recorder) throws IOException
Throws:
IOException
public String[] tokenize(Reader reader, boolean onlyUrls, TokenErrorRecorder recorder) throws IOException
Throws:
IOException
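All three tokenize overloads declare IOException, including the ones that take a String. One common Java layout that produces exactly this signature set, although this page does not confirm that SpamTokenizer uses it, is for the String variants to wrap their input in a java.io.StringReader and delegate to the Reader variant. The hypothetical OverloadSketch class shows that layout, with a simplified whitespace split standing in for the real tokenization logic.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical overload layout: the String variant delegates to the Reader variant. */
public class OverloadSketch {

    /** Core variant: reads characters one at a time and splits on whitespace (simplified stand-in). */
    static String[] tokenize(Reader reader) throws IOException {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            if (Character.isWhitespace(c)) {
                if (current.length() > 0) {          // end of a token
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append((char) c);
            }
        }
        if (current.length() > 0) tokens.add(current.toString()); // flush the last token
        return tokens.toArray(new String[0]);
    }

    /** String variant: wraps the input in a StringReader, so it inherits the IOException declaration. */
    static String[] tokenize(String str) throws IOException {
        return tokenize(new StringReader(str));
    }

    public static void main(String[] args) throws IOException {
        System.out.println(String.join("|", tokenize("win  a free\tprize"))); // win|a|free|prize
    }
}
```

Keeping the logic in the Reader variant means large message bodies can be streamed, while short strings such as headers reuse the same code path.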
public int getMaxTokens()
public void setMaxTokens(int i)
Parameters:
i -
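getMaxTokens is described as the maximum number of tokens to extract before the tokenization process is aborted. The hypothetical MaxTokensSketch below shows one way such a cap can work: the loop stops as soon as the limit is reached instead of collecting every token. Treating a non-positive limit as "no limit" and the default of 0 are assumptions, since the actual default is not documented on this page.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a token cap; the field name and default value are assumptions. */
public class MaxTokensSketch {

    // Assumed default: a value of 0 or less means "no limit".
    private int maxTokens = 0;

    public int getMaxTokens() { return maxTokens; }

    public void setMaxTokens(int i) { this.maxTokens = i; }

    /** Splits on whitespace, aborting as soon as maxTokens tokens have been collected. */
    public String[] tokenize(String str) {
        List<String> tokens = new ArrayList<>();
        for (String t : str.trim().split("\\s+")) {
            if (t.isEmpty()) continue; // blank input splits into [""]
            if (maxTokens > 0 && tokens.size() >= maxTokens) break; // cap reached: abort early
            tokens.add(t);
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        MaxTokensSketch tk = new MaxTokensSketch();
        tk.setMaxTokens(3);
        System.out.println(String.join(",", tk.tokenize("one two three four five"))); // one,two,three
    }
}
```

With the cap set to 3, the loop breaks before adding "four", which bounds the work done on very long messages.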