org.jasen.core.parsers
Class URLParser
java.lang.Object
javax.swing.text.html.HTMLEditorKit.ParserCallback
org.jasen.core.parsers.URLParser
- public class URLParser
- extends HTMLEditorKit.ParserCallback
Looks specifically for URL sequences in email content, both text and HTML.
The rationale here is that two spam emails with different content may in fact be referencing the same URL.
This also provides for future enhancements based on blocking content associated with black-listed domains.
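The rationale above can be illustrated with a minimal, self-contained sketch. It does not use URLParser itself: the class name SharedUrlCheck, its methods, and the deliberately simple regular expression are all assumptions made for illustration, and the real parser may recognise URLs quite differently.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of org.jasen: shows how two messages with
// different wording can still be linked by a URL they have in common.
final class SharedUrlCheck {

    // Assumed, simplified pattern: "http://" or "https://" followed by any run
    // of characters that are not whitespace, quotes, or angle brackets.
    private static final Pattern URL = Pattern.compile("https?://[^\\s\"'<>]+");

    // Collects every match of the pattern in the given content.
    static Set<String> urlsIn(String content) {
        Set<String> found = new LinkedHashSet<>();
        Matcher m = URL.matcher(content);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Returns the URLs that appear in both messages.
    static Set<String> shared(String first, String second) {
        Set<String> common = urlsIn(first);
        common.retainAll(urlsIn(second));
        return common;
    }
}
```

A plain-text message and an HTML message that both point at the same address would yield a non-empty result from shared(...), even though the surrounding content has nothing else in common.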
- Author:
- Jason Polites
URL_PREFIX
public static final String URL_PREFIX
- See Also:
- Constant Field Values
URL_WORDS
public static String[] URL_WORDS
- This array MUST be sorted to facilitate a binary search.
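The sorting requirement matters because a binary search over an unsorted array gives undefined results. Since URL_WORDS is declared public and non-final, any code that replaces it presumably has to keep it sorted as well. The sketch below shows the general technique with java.util.Arrays; the class SortedWordLookup and the sample words are hypothetical and are not the library's actual contents.

```java
import java.util.Arrays;

// Hypothetical stand-in for a sorted keyword table; the words passed in by
// the caller are illustrative, not the real URL_WORDS values.
final class SortedWordLookup {

    private final String[] words;

    SortedWordLookup(String[] words) {
        // Copy and sort once up front so every later lookup can rely on ordering.
        this.words = words.clone();
        Arrays.sort(this.words);
    }

    // Arrays.binarySearch returns a non-negative index only when the key is present.
    boolean contains(String candidate) {
        return Arrays.binarySearch(words, candidate) >= 0;
    }
}
```

Sorting in the constructor is one way to enforce the invariant; a table maintained by hand, as the field comment suggests, has to be kept in String natural order instead.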
URLParser
public URLParser()
URLParser
public URLParser(String prefix)
handleStartTag
public void handleStartTag(HTML.Tag t,
MutableAttributeSet a,
int pos)
handleSimpleTag
public void handleSimpleTag(HTML.Tag t,
MutableAttributeSet a,
int pos)
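The two callbacks above are overridden from HTMLEditorKit.ParserCallback. How URLParser implements them is not documented here, so the following is only a sketch of the general pattern using JDK classes: a callback that records href and src attribute values as the Swing HTML parser reports tags. The class LinkCollector, its methods, and the choice of attributes are assumptions for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical callback, not the URLParser implementation.
final class LinkCollector extends HTMLEditorKit.ParserCallback {

    private final List<String> urls = new ArrayList<>();

    // Called for tags with content, such as <a href="...">.
    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        collect(a);
    }

    // Called for empty tags, such as <img src="...">.
    @Override
    public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        collect(a);
    }

    // Assumption: only href and src are treated as URL-bearing attributes.
    private void collect(MutableAttributeSet a) {
        HTML.Attribute[] keys = {HTML.Attribute.HREF, HTML.Attribute.SRC};
        for (HTML.Attribute key : keys) {
            Object value = a.getAttribute(key);
            if (value != null) {
                urls.add(value.toString());
            }
        }
    }

    // Parses the HTML with the JDK's ParserDelegator and returns the values found.
    static List<String> extract(String html) {
        LinkCollector collector = new LinkCollector();
        try (Reader reader = new StringReader(html)) {
            new ParserDelegator().parse(reader, collector, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return collector.urls;
    }
}
```

Handling both callbacks is the important design point: links usually arrive through handleStartTag, while images and other empty elements arrive through handleSimpleTag, so overriding only one of them would miss URLs.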
parse
public void parse(String str)
throws IOException
- Throws:
IOException
parse
public void parse(InputStream in)
throws IOException
- Throws:
IOException
parse
public void parse(Reader in)
throws IOException
- Throws:
IOException
getUrlArray
public String[] getUrlArray()
- Returns the URLs collected by the parser as an array of String objects.
- Returns:
the collected URLs as a String array
getUrls
public List getUrls()
- Returns the URLs collected by the parser as a List of String objects.
- Returns:
the collected URLs as a List of Strings