java.lang.Object
  org.jasen.core.linguistics.LinguisticAnalyzer
Singleton linguistic analyzer class used to determine whether a word is valid.
Field Summary
static double   DEFAULT_THRESHOLD
static char[]   EXTENDED_UNICODE_REPLACE
static char[]   EXTENDED_UNICODE_SEARCH
static char[]   STANDARD_UNICODE_REPLACE
static char[]   STANDARD_UNICODE_SEARCH
Method Summary
static String clean(String word)
    Uses the replacement facilities of the analyzer to "estimate" the best character replacements to clean the word.
static char getExtendedReplacement(char chr)
    Gets the most logical standard ASCII replacement for the given extended ASCII character.
static char getFullReplacement(char chr)
    Looks for either a standard or an extended replacement for the given character.
static LinguisticAnalyzer getInstance()
    Returns the current instance, or creates and initialises the internal analyzer.
static char getStandardReplacement(char chr)
    Does a standard replacement of ASCII characters with ASCII characters.
double getWordScore(String word)
    Computes the probability that the given word is a "real" word.
double getWordScore(String word, boolean clean)
    Computes the probability that the given word is a "real" word, optionally cleaning it first.
boolean isWord(String word)
    Returns true if the word is valid according to the default threshold of 0.1.
boolean isWord(String word, boolean clean)
    Returns true if the word is valid according to the default threshold of 0.1, optionally cleaning it first.
boolean isWord(String word, double threshold, boolean clean)
    Returns true if the word is valid according to the given threshold.
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
public static final double DEFAULT_THRESHOLD
public static char[] STANDARD_UNICODE_SEARCH
public static char[] STANDARD_UNICODE_REPLACE
public static char[] EXTENDED_UNICODE_SEARCH
public static char[] EXTENDED_UNICODE_REPLACE
Method Detail
public static final LinguisticAnalyzer getInstance()
Throws:
IOException
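getInstance() follows the lazily initialised singleton pattern, and its initialisation can fail with an IOException. The following is a minimal sketch of that pattern, assuming initialisation involves loading some data. AnalyzerSingletonSketch, loadModel() and initCount are hypothetical names; what jasen actually loads during initialisation is not documented here.

```java
import java.io.IOException;

// Hypothetical sketch of a lazily initialised singleton whose setup may throw
// IOException. This is not the jasen implementation.
final class AnalyzerSingletonSketch {
    // The single shared instance; null until the first successful getInstance() call.
    private static AnalyzerSingletonSketch instance;

    // Counts how often initialisation actually ran (for illustration only).
    static int initCount = 0;

    private AnalyzerSingletonSketch() throws IOException {
        loadModel();
        initCount++;
    }

    // Hypothetical placeholder for reading analyzer data; a real loader could throw IOException.
    private void loadModel() throws IOException {
        // no-op in this sketch
    }

    // synchronized so that two threads cannot both see null and build two instances.
    static synchronized AnalyzerSingletonSketch getInstance() throws IOException {
        if (instance == null) {
            instance = new AnalyzerSingletonSketch();
        }
        return instance;
    }
}
```

Because the exception is checked, callers must either handle IOException or declare it, just as with the documented getInstance().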
public double getWordScore(String word, boolean clean)
Parameters:
word - The word to test
clean - If true, aberrant (non-alphabetical) characters are removed
public double getWordScore(String word)
Parameters:
word - The word to test
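This page does not say how getWordScore computes its probability. One common technique for scoring how "word-like" a string is uses a character n-gram model. The following is a minimal sketch of a character-bigram scorer trained on a small word list, offered as an illustration of that technique rather than as jasen's algorithm. BigramWordScorer and its training list are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical character-bigram scorer: one common way to estimate how
// "word-like" a string is. This is NOT jasen's documented algorithm.
final class BigramWordScorer {
    private static final char BOUNDARY = '^'; // marks the start and end of a word
    private static final int ALPHABET = 27;   // 26 letters + boundary, for add-one smoothing

    private final Map<String, Integer> pairCounts = new HashMap<>();
    private final Map<Character, Integer> prevCounts = new HashMap<>();

    BigramWordScorer(List<String> trainingWords) {
        for (String w : trainingWords) {
            String padded = BOUNDARY + w.toLowerCase() + BOUNDARY;
            for (int i = 0; i + 1 < padded.length(); i++) {
                char prev = padded.charAt(i);
                char next = padded.charAt(i + 1);
                pairCounts.merge("" + prev + next, 1, Integer::sum);
                prevCounts.merge(prev, 1, Integer::sum);
            }
        }
    }

    // Geometric mean of smoothed transition probabilities P(next | prev).
    // The result lies in (0, 1) and is not penalised merely for word length.
    double score(String word) {
        String padded = BOUNDARY + word.toLowerCase() + BOUNDARY;
        int transitions = padded.length() - 1;
        double logSum = 0.0;
        for (int i = 0; i < transitions; i++) {
            char prev = padded.charAt(i);
            String pair = "" + prev + padded.charAt(i + 1);
            int pairCount = pairCounts.getOrDefault(pair, 0);
            int prevCount = prevCounts.getOrDefault(prev, 0);
            logSum += Math.log((pairCount + 1.0) / (prevCount + ALPHABET));
        }
        return Math.exp(logSum / transitions);
    }
}
```

Trained on a handful of English words, "hello" scores noticeably higher than a string such as "xqzvk", whose letter transitions never occur in the training data. The sketch does not model the clean flag; a caller could run the word through a cleaning step before calling score().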
public boolean isWord(String word, double threshold, boolean clean)
Parameters:
word - The word to test
threshold - Should be a value between 0.0 and 1.0
public boolean isWord(String word, boolean clean)
Parameters:
word - The word to test
clean - If true, the word has extended ASCII characters replaced with ASCII equivalents.
public boolean isWord(String word)
Parameters:
word - The word to test
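The three isWord overloads differ only in which arguments fall back to defaults. The following is a minimal sketch of how they could funnel into a single threshold test on a score. The WordScorer callback, the ">=" comparison and the default of clean = false for isWord(String) are assumptions; the page documents only the default threshold of 0.1.

```java
// Hypothetical scoring callback standing in for getWordScore(String, boolean).
interface WordScorer {
    double score(String word, boolean clean);
}

// Sketch of how the three isWord overloads could delegate to one threshold test.
// The ">=" comparison and the default clean = false are assumptions.
final class WordThresholdSketch {
    // Matches the "default threshold of 0.1" described in the method summary.
    static final double DEFAULT_THRESHOLD = 0.1;

    private final WordScorer scorer;

    WordThresholdSketch(WordScorer scorer) {
        this.scorer = scorer;
    }

    // threshold is expected to lie between 0.0 and 1.0, like the score itself.
    boolean isWord(String word, double threshold, boolean clean) {
        return scorer.score(word, clean) >= threshold;
    }

    boolean isWord(String word, boolean clean) {
        return isWord(word, DEFAULT_THRESHOLD, clean);
    }

    boolean isWord(String word) {
        return isWord(word, DEFAULT_THRESHOLD, false); // assumed default for clean
    }
}
```

Raising the threshold makes the check stricter: fewer strings reach the required score, so fewer are accepted as words.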
public static String clean(String word)
Parameters:
word - The word to investigate.
public static char getExtendedReplacement(char chr)
Parameters:
chr - The character to replace; usually non-ASCII
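The EXTENDED_UNICODE_SEARCH and EXTENDED_UNICODE_REPLACE fields suggest that jasen uses a lookup table for this method, but their contents are not documented here. As an alternative technique, not jasen's method, many accented letters can be mapped to an ASCII base letter with Unicode canonical decomposition (NFD) from java.text.Normalizer. ExtendedReplacementSketch is a hypothetical name.

```java
import java.text.Normalizer;

// Hypothetical alternative to a table-driven extended replacement: decompose the
// character with NFD and keep the base letter if it is plain ASCII.
final class ExtendedReplacementSketch {
    static char extendedReplacement(char chr) {
        if (chr < 128) {
            return chr; // already ASCII, nothing to replace
        }
        // NFD places the base letter first, followed by any combining accents.
        String decomposed = Normalizer.normalize(String.valueOf(chr), Normalizer.Form.NFD);
        char base = decomposed.charAt(0);
        return base < 128 ? base : chr; // no ASCII base letter: return unchanged
    }
}
```

Characters without a canonical decomposition, such as 'ß' or 'ø', come back unchanged, which is one reason a hand-built replacement table can still be useful.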
public static char getStandardReplacement(char chr)
This is used when a word has been deliberately obfuscated by substituting similar-looking characters for the intended ones.
For example, "he||0 w0r|d" should be interpreted as "hello world".
Parameters:
chr - The standard ASCII character to replace
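The paired STANDARD_UNICODE_SEARCH and STANDARD_UNICODE_REPLACE fields suggest a table lookup in which a character found at index i of the search array is replaced by the character at index i of the replace array. The following is a minimal sketch of that idea. The table contents are assumptions chosen so that the "he||0 w0r|d" example decodes correctly; StandardReplacementSketch and decode() are hypothetical names.

```java
// Hypothetical table-driven look-alike replacement. The real contents of
// STANDARD_UNICODE_SEARCH / STANDARD_UNICODE_REPLACE are not documented here.
final class StandardReplacementSketch {
    // Parallel arrays: SEARCH[i] is replaced by REPLACE[i].
    static final char[] SEARCH  = {'|', '0', '1', '3', '4', '5', '@', '$'};
    static final char[] REPLACE = {'l', 'o', 'i', 'e', 'a', 's', 'a', 's'};

    static char standardReplacement(char chr) {
        for (int i = 0; i < SEARCH.length; i++) {
            if (SEARCH[i] == chr) {
                return REPLACE[i];
            }
        }
        return chr; // no known replacement: keep the character as-is
    }

    // Applies the replacement to every character of the input.
    static String decode(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            sb.append(standardReplacement(c));
        }
        return sb.toString();
    }
}
```

With these tables, decode("he||0 w0r|d") returns "hello world". Some look-alikes are ambiguous ('1' could stand for 'i' or 'l'), so any fixed table is only an estimate.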
public static char getFullReplacement(char chr)
Parameters:
chr - The character to replace
See Also:
getExtendedReplacement(char), getStandardReplacement(char)
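getFullReplacement combines the standard and extended lookups, and clean(String), documented above, applies the analyzer's replacement facilities to a whole word. The following is a minimal sketch of both ideas, assuming the standard (look-alike) table is consulted first and the extended replacement is the fallback. The lookup order, the small tables and the names FullReplacementSketch and cleanSketch() are assumptions, not jasen's documented behaviour.

```java
import java.text.Normalizer;

// Hypothetical "full" replacement: try the standard look-alike table first,
// then fall back to an accent-stripping extended replacement.
final class FullReplacementSketch {
    // Parallel strings: STANDARD_SEARCH.charAt(i) maps to STANDARD_REPLACE.charAt(i).
    private static final String STANDARD_SEARCH  = "|0@$";
    private static final String STANDARD_REPLACE = "loas";

    static char standardReplacement(char chr) {
        int i = STANDARD_SEARCH.indexOf(chr);
        return i >= 0 ? STANDARD_REPLACE.charAt(i) : chr;
    }

    static char extendedReplacement(char chr) {
        if (chr < 128) {
            return chr; // already ASCII
        }
        char base = Normalizer.normalize(String.valueOf(chr), Normalizer.Form.NFD).charAt(0);
        return base < 128 ? base : chr;
    }

    static char fullReplacement(char chr) {
        char standard = standardReplacement(chr);
        return standard != chr ? standard : extendedReplacement(chr);
    }

    // A clean()-style pass: map every character through fullReplacement.
    static String cleanSketch(String word) {
        StringBuilder sb = new StringBuilder(word.length());
        for (int k = 0; k < word.length(); k++) {
            sb.append(fullReplacement(word.charAt(k)));
        }
        return sb.toString();
    }
}
```

For example, a word mixing an accented letter with look-alike digits and bars, such as "h\u00e9||0", cleans to "hello".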