org.jasen.core.parsers
Class URLParser

java.lang.Object
  extended by javax.swing.text.html.HTMLEditorKit.ParserCallback
      extended by org.jasen.core.parsers.URLParser

public class URLParser
extends HTMLEditorKit.ParserCallback

Looks specifically for URL sequences in email content, both text and HTML.

The rationale here is that two spam emails with different content may in fact be referencing the same URL.

This also provides for future enhancements based on blocking content associated with black-listed domains.
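
The callback approach described above can be illustrated with a minimal sketch built only on JDK classes. It is not the Jasen source: the class name LinkCollector, its extract helper, and the choice to read only the href and src attributes are assumptions made for illustration, and it covers HTML content only, not plain text.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical illustration only; this is not the Jasen URLParser source.
class LinkCollector extends HTMLEditorKit.ParserCallback {

    // URLs in the order the parser encounters them.
    private final List<String> urls = new ArrayList<>();

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        collect(a); // tags with content, e.g. <a href="...">
    }

    @Override
    public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        collect(a); // empty tags, e.g. <img src="...">
    }

    // Assumption: only href and src attributes are treated as URL carriers.
    private void collect(MutableAttributeSet a) {
        addIfPresent(a.getAttribute(HTML.Attribute.HREF));
        addIfPresent(a.getAttribute(HTML.Attribute.SRC));
    }

    private void addIfPresent(Object value) {
        if (value != null) {
            urls.add(value.toString());
        }
    }

    // Parses an HTML string with the JDK's ParserDelegator and returns the URLs found.
    static List<String> extract(String html) {
        LinkCollector collector = new LinkCollector();
        try (StringReader reader = new StringReader(html)) {
            new ParserDelegator().parse(reader, collector, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return collector.urls;
    }
}
```

Under these assumptions, two messages whose visible text differs but which link to the same address produce the same list from LinkCollector.extract, which is the property the rationale above relies on.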

Author:
Jason Polites

Field Summary
static String URL_PREFIX
           
static String[] URL_WORDS
          This array MUST be sorted to facilitate a binary search
 
Fields inherited from class javax.swing.text.html.HTMLEditorKit.ParserCallback
IMPLIED
 
Constructor Summary
URLParser()
           
URLParser(String prefix)
           
 
Method Summary
 String[] getUrlArray()
          Returns the contents of the parser as an array of String objects
 List getUrls()
          Returns the URLs found by the parser as a List of String objects
 void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos)
           
 void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos)
           
 void parse(InputStream in)
           
 void parse(Reader in)
           
 void parse(String str)
           
 
Methods inherited from class javax.swing.text.html.HTMLEditorKit.ParserCallback
flush, handleComment, handleEndOfLineString, handleEndTag, handleError, handleText
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

URL_PREFIX

public static final String URL_PREFIX
See Also:
Constant Field Values

URL_WORDS

public static String[] URL_WORDS
This array MUST be sorted to facilitate a binary search
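
The sorting requirement can be illustrated with java.util.Arrays.binarySearch, whose result is undefined on an unsorted array. The sketch below is an assumption about how such a lookup might work, not the Jasen implementation; the class name SortedWordLookup and the word list are hypothetical, since the actual contents of URL_WORDS are not shown here.

```java
import java.util.Arrays;

// Hypothetical illustration; the real contents of URL_WORDS are not documented here.
class SortedWordLookup {

    // Assumed example words, deliberately declared out of order.
    static final String[] WORDS = {"www", "http", "ftp", "https"};

    static {
        // Arrays.binarySearch requires ascending order, so sort once up front.
        Arrays.sort(WORDS);
    }

    // Returns true if the word is present; binarySearch returns a negative value on a miss.
    static boolean contains(String word) {
        return Arrays.binarySearch(WORDS, word) >= 0;
    }
}
```

Because URL_WORDS is declared public static and not final, code that replaces or modifies the array would presumably need to re-sort it before any lookup.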

Constructor Detail

URLParser

public URLParser()

URLParser

public URLParser(String prefix)
Method Detail

handleStartTag

public void handleStartTag(HTML.Tag t,
                           MutableAttributeSet a,
                           int pos)

handleSimpleTag

public void handleSimpleTag(HTML.Tag t,
                            MutableAttributeSet a,
                            int pos)

parse

public void parse(String str)
           throws IOException
Throws:
IOException

parse

public void parse(InputStream in)
           throws IOException
Throws:
IOException

parse

public void parse(Reader in)
           throws IOException
Throws:
IOException

getUrlArray

public String[] getUrlArray()
Returns the contents of the parser as an array of String objects

Returns:
the URLs found by the parser, as a String array

getUrls

public List getUrls()
Returns the URLs found by the parser as a List of String objects

Returns:
the URLs found by the parser, as a List of Strings