org.jasen.core
Class StandardParserData

java.lang.Object
  extended by org.jasen.core.StandardParserData
All Implemented Interfaces:
ParserData

public class StandardParserData
extends Object
implements ParserData

Holds the information obtained from parsing and tokenizing the message.

Author:
Jason Polites

Constructor Summary
StandardParserData()
           
 
Method Summary
 int getConcealedHtmlCount()
          Gets the number of occurrences of concealed HTML.
 int getFalseAnchorCount()
          Gets the count of anchor tags whose text was URL text (e.g. http://...) but did not match the href attribute.
 String getHtmlAsText()
          Gets the HTML part of the message as plain text
 int getImageCount()
          Gets the number of images in the email body.
 String[] getMessageTokens()
          Gets the single (word) tokens extracted from the message
 int getObfuscatedCharacterCount()
          Gets the number of character obfuscation observations.
 List getPorts()
          Gets the list of TCP ports found appended to URLs in the HTML body of the message.
 int getSrcCgiCount()
          Gets the number of occurrences of SRC cgi references.
 int getSrcPortCount()
          Gets the number of anchor or image src (or href) attributes which had alternate TCP ports appended.
 String getTextParsed()
          Gets the text part of the message as "clean" text
 TokenErrorRecorder getTokenErrorRecorder()
          Gets the recorder used to record errors or anomalies found while tokenizing during a parse
 void setConcealedHtmlCount(int concealedHtmlCount)
          Sets the number of occurrences of concealed HTML.
 void setFalseAnchorCount(int falseAnchorCount)
          Sets the count of anchor tags whose text was URL text (e.g. http://...) but did not match the href attribute.
 void setHtmlAsText(String parsedHtml)
          Sets the parsed HTML.
 void setImageCount(int imageCount)
          Sets the number of images in the email body.
 void setMessageTokens(String[] htmlTokens)
          Sets the message tokens obtained from tokenization.
 void setPorts(List ports)
          Sets the list of TCP ports found appended to URLs in the HTML body of the message.
 void setSrcCgiCount(int srcCgiCount)
          Sets the number of occurrences of SRC cgi references.
 void setSrcPortCount(int srcPortCount)
          Sets the number of anchor or image src (or href) attributes which had alternate TCP ports appended.
 void setTextParsed(String textParsed)
          Sets the parsed (cleaned) text resulting from the message parse.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

StandardParserData

public StandardParserData()
Method Detail

getMessageTokens

public String[] getMessageTokens()
Description copied from interface: ParserData
Gets the single (word) tokens extracted from the message

Specified by:
getMessageTokens in interface ParserData
Returns:
The tokens in the message as a String array

setMessageTokens

public void setMessageTokens(String[] htmlTokens)
Sets the message tokens obtained from tokenization.

Parameters:
htmlTokens - The message tokens obtained from tokenization.
See Also:
EmailTokenizer

getHtmlAsText

public String getHtmlAsText()
Description copied from interface: ParserData
Gets the HTML part of the message as plain text

Specified by:
getHtmlAsText in interface ParserData
Returns:
The HTML part of the message as plain text

setHtmlAsText

public void setHtmlAsText(String parsedHtml)
Sets the parsed HTML, that is, the plain text components of the HTML in the message.

Parameters:
parsedHtml - The plain text components of the HTML in the message.

getConcealedHtmlCount

public int getConcealedHtmlCount()
Gets the number of occurrences of concealed HTML.

Returns:
Returns the concealedHtmlCount.

setConcealedHtmlCount

public void setConcealedHtmlCount(int concealedHtmlCount)
Sets the number of occurrences of concealed HTML.

Parameters:
concealedHtmlCount - The concealedHtmlCount to set.

getImageCount

public int getImageCount()
Gets the number of images in the email body.

Returns:
Returns the imageCount.

setImageCount

public void setImageCount(int imageCount)
Sets the number of images in the email body.

Parameters:
imageCount - The imageCount to set.

getSrcCgiCount

public int getSrcCgiCount()
Gets the number of occurrences of SRC cgi references.

That is, occurrences of HTML tags in which a SRC (or other remote reference) that would normally be expected to point to a flat file (e.g. the IMG tag) was found to reference a cgi script or similar. This often indicates the presence of mail bugs (remote references used to track whether a message was opened).

Returns:
Returns the srcCgiCount.
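
The following is a minimal sketch of how a count like srcCgiCount could be derived; it is not the library's implementation. The class name SrcCgiSketch, the method countSrcCgi, and the hint pattern (a query string, a /cgi-bin/ path, or a script-like extension) are all assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of org.jasen.core.
final class SrcCgiSketch {

    // Captures the value of the src attribute of each IMG tag.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*\\bsrc\\s*=\\s*[\"']?([^\"'\\s>]+)",
            Pattern.CASE_INSENSITIVE);

    // Assumed hints that a reference points at a script rather than a flat file.
    private static final Pattern CGI_HINT = Pattern.compile(
            "\\?|/cgi-bin/|\\.(cgi|php|pl|asp)\\b",
            Pattern.CASE_INSENSITIVE);

    // Counts IMG tags whose src looks like a script reference.
    static int countSrcCgi(String html) {
        int count = 0;
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            if (CGI_HINT.matcher(m.group(1)).find()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String html = "<img src=\"http://example.com/logo.gif\">"
                + "<img src=\"http://example.com/cgi-bin/track.cgi?id=42\">";
        System.out.println(countSrcCgi(html)); // prints 1
    }
}
```

A regular expression is used here only to keep the sketch short; a real parser would more likely inspect attributes while walking the HTML tags.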

setSrcCgiCount

public void setSrcCgiCount(int srcCgiCount)
Sets the number of occurrences of SRC cgi references.

Parameters:
srcCgiCount - The srcCgiCount to set.
See Also:
getSrcCgiCount()

getTextParsed

public String getTextParsed()
Description copied from interface: ParserData
Gets the text part of the message as "clean" text

Specified by:
getTextParsed in interface ParserData
Returns:
The text part of the message as "clean" text

setTextParsed

public void setTextParsed(String textParsed)
Sets the parsed (cleaned) text resulting from the message parse.

Parameters:
textParsed - The parsed (cleaned) text resulting from the message parse.

getPorts

public List getPorts()
Gets the list of TCP ports found appended to URLs in the HTML body of the message.

Returns:
A list of String objects
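
As an illustration of what such a list might contain, the sketch below collects explicitly specified ports from a set of URLs using java.net.URI. The class UrlPortSketch and the method explicitPorts are hypothetical, and how the library itself extracts ports is not stated here.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of org.jasen.core.
final class UrlPortSketch {

    // Returns the explicitly specified ports, as Strings, in encounter order.
    static List<String> explicitPorts(List<String> urls) {
        List<String> ports = new ArrayList<>();
        for (String url : urls) {
            try {
                int port = new URI(url).getPort(); // -1 when no port is given
                if (port != -1) {
                    ports.add(String.valueOf(port));
                }
            } catch (URISyntaxException e) {
                // Malformed URLs are skipped in this sketch.
            }
        }
        return ports;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(
                "http://example.com:8080/a.gif",
                "http://example.com/b.gif");
        System.out.println(explicitPorts(urls)); // prints [8080]
    }
}
```

The ports are stored as String values to match the "list of String objects" described above.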

setPorts

public void setPorts(List ports)
Sets the list of TCP ports found appended to URLs in the HTML body of the message.

Parameters:
ports - A list of String objects

getSrcPortCount

public int getSrcPortCount()
Gets the number of anchor or image src (or href) attributes which had alternate TCP ports appended.

Returns:
The number of occurrences

setSrcPortCount

public void setSrcPortCount(int srcPortCount)
Sets the number of anchor or image src (or href) attributes which had alternate TCP ports appended.

Parameters:
srcPortCount - The number of occurrences.

getFalseAnchorCount

public int getFalseAnchorCount()
Gets the count of anchor tags whose text was URL text (e.g. http://...) but did not match the href attribute.

Returns:
The number of occurrences.
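
The check described above can be sketched as follows. This is an assumption-laden illustration rather than the library's own logic: the class FalseAnchorSketch, the method countFalseAnchors, and the rule "text starts with http:// or https:// and differs from the href, ignoring case" are all hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of org.jasen.core.
final class FalseAnchorSketch {

    // Group 1 is the href value, group 2 is the anchor text.
    private static final Pattern ANCHOR = Pattern.compile(
            "<a\\b[^>]*\\bhref\\s*=\\s*[\"']?([^\"'\\s>]+)[^>]*>(.*?)</a>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Counts anchors whose visible text is a URL that differs from the href.
    static int countFalseAnchors(String html) {
        int count = 0;
        Matcher m = ANCHOR.matcher(html);
        while (m.find()) {
            String href = m.group(1).trim();
            String text = m.group(2).trim();
            boolean textLooksLikeUrl =
                    text.regionMatches(true, 0, "http://", 0, 7)
                    || text.regionMatches(true, 0, "https://", 0, 8);
            if (textLooksLikeUrl && !text.equalsIgnoreCase(href)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://evil.example/x\">http://bank.example/login</a>"
                + "<a href=\"http://a.example/\">http://a.example/</a>"
                + "<a href=\"http://a.example/\">Click here</a>";
        System.out.println(countFalseAnchors(html)); // prints 1
    }
}
```

Only the first anchor is counted: its text reads as one URL while the href points to another.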

setFalseAnchorCount

public void setFalseAnchorCount(int falseAnchorCount)
Sets the count of anchor tags whose text was URL text (e.g. http://...) but did not match the href attribute.

Parameters:
falseAnchorCount - The number of occurrences.

getObfuscatedCharacterCount

public int getObfuscatedCharacterCount()
Gets the number of character obfuscation observations.
These are instances where non-ASCII characters are used to obscure normal words.

Returns:
The number of occurrences.
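
One simple way to approximate such an observation is to flag tokens that mix ASCII letters with non-ASCII characters. The sketch below assumes that rule; the class ObfuscationSketch and its methods are hypothetical, and the criteria the library actually applies are not documented on this page.

```java
// Hypothetical helper, not part of org.jasen.core.
final class ObfuscationSketch {

    // True when the token contains both an ASCII letter and a non-ASCII character.
    static boolean looksObfuscated(String token) {
        boolean asciiLetter = false;
        boolean nonAscii = false;
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (c > 127) {
                nonAscii = true;
            } else if (Character.isLetter(c)) {
                asciiLetter = true;
            }
        }
        return asciiLetter && nonAscii;
    }

    // Counts the tokens flagged by looksObfuscated.
    static int countObfuscated(String[] tokens) {
        int count = 0;
        for (String token : tokens) {
            if (looksObfuscated(token)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // "V\u00EDagra" uses an accented i in place of a plain ASCII letter.
        String[] tokens = {"V\u00EDagra", "hello", "12345"};
        System.out.println(countObfuscated(tokens)); // prints 1
    }
}
```

This heuristic also flags legitimate accented words, so a practical check would need additional context such as the message language or a dictionary lookup.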

getTokenErrorRecorder

public TokenErrorRecorder getTokenErrorRecorder()
Description copied from interface: ParserData
Gets the recorder used to record errors or anomalies found while tokenizing during a parse

Specified by:
getTokenErrorRecorder in interface ParserData
Returns:
The current token recorder