java.lang.Object
  javax.swing.text.html.HTMLEditorKit.ParserCallback
    org.jasen.core.parsers.StandardHTMLParser
      org.jasen.core.parsers.SpamHTMLParser
Extracts plain text elements from an HTML document.
This implementation is specific to parsing the text out of spam emails.
Field Summary | |
static String | BGCOLOR_NAME | The CSS name for background colors (background-color) |
static String | COLOR_NAME | The CSS name for foreground colors (color) |
static float | COLOR_THRESHOLD | Deprecated. Use getContrastThreshold() |
static int | DEFAULT_BGCOLOR | The default numerical background color (white) |
static int | DEFAULT_COLOR | The default numerical foreground color (black) |
static String | DEFAULT_STR_BGCOLOR | String (hex) value for the default background color (white) |
static String | DEFAULT_STR_COLOR | String (hex) value for the default foreground color (black) |
static int | ELEMENT_THRESHOLD | Deprecated. Use getMicroElementSize() |
static int | FONTSIZE_THRESHOLD | Deprecated. Use getMicroFontSize() |
static String[] | HTML_COLOR_NAMES | |
static String[] | HTML_COLOR_VALUES | |
static double | TOKEN_RECOGNITION_THRESHOLD | Deprecated. Not used |
static String | URL_REGEX | Deprecated. Not used |
Fields inherited from class javax.swing.text.html.HTMLEditorKit.ParserCallback |
IMPLIED |
Constructor Summary | |
SpamHTMLParser()
|
Method Summary | |
int | getConcealedHtmlCount() | Gets the number of times concealed HTML was found |
float | getContrastThreshold() | Gets the threshold for contrast between foreground and background content elements. |
int | getFalseAnchorCount() | Gets the number of occurrences of "false" anchor tags. |
int | getImageCount() | Gets the number of times images were found |
int | getMicroElementSize() | Gets the size (in pixels) of the minimum allowable element dimension (usually height). |
int | getMicroFontSize() | Gets the size (in points) of the minimum allowable font size. |
int | getSrcCgiCount() | Gets the number of times the source attribute of a tag referenced a remote CGI script |
int | getSrcPortCount() | Gets the number of times a URL port was found in tags with a src attribute |
List | getUrlPorts() | Gets the list of URL ports found in anchor tags in the message HTML part |
void | handleEndTag(HTML.Tag t, int pos) | |
void | handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) | |
void | handleText(char[] text, int pos) | |
ParserData | parse(javax.mail.internet.MimeMessage mm, JasenMessage message, MimeMessageTokenizer tokenizer) | Parses the given JasenMessage and returns the results of the parse as a ParserData object. |
void | setContrastThreshold(float contrastThreshold) | Sets the threshold for contrast between foreground and background content elements. |
void | setMicroElementSize(int microElementSize) | Sets the size (in pixels) of the minimum allowable element dimension (usually height). |
void | setMicroFontSize(int microFontSize) | Sets the size (in points) of the minimum allowable font size. |
Methods inherited from class org.jasen.core.parsers.StandardHTMLParser |
extractText, extractText, extractText, handleComment, handleSimpleTag, setEncoding |
Methods inherited from class javax.swing.text.html.HTMLEditorKit.ParserCallback |
flush, handleEndOfLineString, handleError |
Methods inherited from class java.lang.Object |
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
public static final int DEFAULT_BGCOLOR
public static final int DEFAULT_COLOR
public static final String DEFAULT_STR_BGCOLOR
public static final String DEFAULT_STR_COLOR
public static final float COLOR_THRESHOLD
public static final int FONTSIZE_THRESHOLD
public static final int ELEMENT_THRESHOLD
public static final double TOKEN_RECOGNITION_THRESHOLD
public static final String BGCOLOR_NAME
public static final String COLOR_NAME
public static final String URL_REGEX
public static String[] HTML_COLOR_NAMES
public static String[] HTML_COLOR_VALUES
Constructor Detail |
public SpamHTMLParser()
Method Detail |
public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos)
Overrides: handleStartTag in class StandardHTMLParser
public void handleText(char[] text, int pos)
Overrides: handleText in class StandardHTMLParser
public void handleEndTag(HTML.Tag t, int pos)
Overrides: handleEndTag in class StandardHTMLParser
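The three handlers above are the standard HTMLEditorKit.ParserCallback hooks that the JDK's HTML parser invokes while it walks a document. The following is a minimal sketch of that mechanism using only JDK classes. The class TextCollectingCallback and its methods are hypothetical and are not part of Jasen; the sketch does not reproduce what SpamHTMLParser does inside its handlers.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical callback (not a Jasen class): collects text runs and counts anchor start tags.
class TextCollectingCallback extends HTMLEditorKit.ParserCallback {
    private final StringBuilder text = new StringBuilder();
    private int anchorCount;

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        // Called for every opening tag that has a matching end tag, such as <a> or <p>.
        if (t == HTML.Tag.A) {
            anchorCount++;
        }
    }

    @Override
    public void handleText(char[] data, int pos) {
        // Called for each run of character data between tags.
        text.append(data).append(' ');
    }

    @Override
    public void handleEndTag(HTML.Tag t, int pos) {
        // Nothing to do in this sketch; a real parser could pop per-element state here.
    }

    String getText() {
        return text.toString().trim();
    }

    int getAnchorCount() {
        return anchorCount;
    }

    // Drives the JDK parser over an HTML string and returns the populated callback.
    static TextCollectingCallback parse(String html) {
        TextCollectingCallback callback = new TextCollectingCallback();
        try (Reader reader = new StringReader(html)) {
            // The final argument tells the parser to ignore any charset declaration in the document.
            new ParserDelegator().parse(reader, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback;
    }
}
```

Calling TextCollectingCallback.parse(html) on a message body returns a callback whose getText() and getAnchorCount() reflect what the handlers saw.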
public int getConcealedHtmlCount()
public int getImageCount()
public int getSrcCgiCount()
public int getSrcPortCount()
public List getUrlPorts()
public int getFalseAnchorCount()
A "false" anchor is an anchor tag whose visible text is a URL that does not match the actual URL in its href attribute.
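To illustrate the idea of a "false" anchor, here is a minimal sketch that compares the host shown in the anchor text with the host of the href. Comparing by host is an assumption made for this example; this page does not say how SpamHTMLParser decides that two URLs match. FalseAnchorCheck, hostOf and isFalseAnchor are hypothetical names, not Jasen API.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical helper (not a Jasen class) sketching a "false" anchor test.
class FalseAnchorCheck {

    // Returns the lower-cased host of a URL string, or null if it has no host or cannot be parsed.
    static String hostOf(String url) {
        try {
            String host = new URI(url.trim()).getHost();
            return host == null ? null : host.toLowerCase(Locale.ROOT);
        } catch (URISyntaxException e) {
            return null;
        }
    }

    // True when the anchor text is itself a URL with a host and that host differs from the href's host.
    // Assumption: text without a scheme (for example "www.example.com") is not treated as a URL here.
    static boolean isFalseAnchor(String href, String anchorText) {
        String shownHost = hostOf(anchorText);
        if (shownHost == null) {
            return false; // ordinary link text such as "Click here"
        }
        return !shownHost.equals(hostOf(href));
    }
}
```

Under these assumptions, an anchor that displays http://www.mybank.example/login but links to http://203.0.113.7/login is flagged, while ordinary link text such as "Click here" is not.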
public float getContrastThreshold()
public void setContrastThreshold(float contrastThreshold)
Parameters: contrastThreshold - A value between 0.0 and 1.0
See Also: getContrastThreshold()
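A threshold between 0.0 and 1.0 implies a normalized contrast measure between the foreground and background colors. The sketch below shows one way such a measure could be computed, using normalized Euclidean distance in RGB space. Both the formula and the rule that a contrast below the threshold counts as concealed text are assumptions made for illustration; this page does not document the formula SpamHTMLParser uses. ContrastSketch and its methods are hypothetical names.

```java
// Hypothetical helper (not a Jasen class) sketching a 0.0 to 1.0 contrast measure.
class ContrastSketch {

    // Largest possible RGB distance: black (0x000000) to white (0xFFFFFF).
    private static final double MAX_DISTANCE = Math.sqrt(3 * 255.0 * 255.0);

    // Normalized Euclidean distance between two 0xRRGGBB colors: 0.0 for identical, 1.0 for black vs white.
    static double contrast(int foreground, int background) {
        int dr = ((foreground >> 16) & 0xFF) - ((background >> 16) & 0xFF);
        int dg = ((foreground >> 8) & 0xFF) - ((background >> 8) & 0xFF);
        int db = (foreground & 0xFF) - (background & 0xFF);
        return Math.sqrt(dr * dr + dg * dg + db * db) / MAX_DISTANCE;
    }

    // Assumption: text counts as concealed when its contrast falls below the threshold.
    static boolean isConcealed(int foreground, int background, float threshold) {
        return contrast(foreground, background) < threshold;
    }
}
```

With an illustrative threshold of 0.1f (not a documented default), near-white text (0xFEFEFE) on a white background would be treated as concealed, while black on white would not.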
public int getMicroElementSize()
public void setMicroElementSize(int microElementSize)
Parameters: microElementSize - The size in pixels. It is recommended that this be less than 10. Default is 5.
public int getMicroFontSize()
public void setMicroFontSize(int microFontSize)
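Parameters: microFontSize - A size in points. Default is 1.
public ParserData parse(javax.mail.internet.MimeMessage mm, JasenMessage message, MimeMessageTokenizer tokenizer) throws JasenException
Description copied from interface: HTMLParser
Parses the given JasenMessage and returns the results of the parse as a ParserData object.
Specified by: parse in interface HTMLParser
Overrides: parse in class StandardHTMLParser
Throws: JasenException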