org.jasen.core.parsers
Class StandardHTMLParser

java.lang.Object
  extended by javax.swing.text.html.HTMLEditorKit.ParserCallback
      extended by org.jasen.core.parsers.StandardHTMLParser
All Implemented Interfaces:
HTMLParser
Direct Known Subclasses:
SpamHTMLParser

public class StandardHTMLParser
extends HTMLEditorKit.ParserCallback
implements HTMLParser

Parses the HTML part of an email for two main purposes.


Field Summary
 
Fields inherited from class javax.swing.text.html.HTMLEditorKit.ParserCallback
IMPLIED
 
Constructor Summary
StandardHTMLParser()
           
 
Method Summary
 String extractText(InputStream in)
          Extracts plain text from the HTML read from the input stream and returns it as a String.
 void extractText(InputStream in, OutputStream out)
          Extracts the plain text components of the HTML read from the input stream and writes them to the given output stream.
 String extractText(String html)
          Extracts plain text from the given HTML String and returns it as a String.
 void handleComment(char[] text, int pos)
           
 void handleEndTag(HTML.Tag t, int pos)
           
 void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos)
           
 void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos)
           
 void handleText(char[] text, int pos)
           
 ParserData parse(javax.mail.internet.MimeMessage mm, JasenMessage message, MimeMessageTokenizer tokenizer)
          Parses the given JasenMessage and returns the results of the parse as a ParserData object.
 void setEncoding(String encoding)
          Sets the encoding to use on the output stream (optional)
 
Methods inherited from class javax.swing.text.html.HTMLEditorKit.ParserCallback
flush, handleEndOfLineString, handleError
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

StandardHTMLParser

public StandardHTMLParser()
Method Detail

extractText

public void extractText(InputStream in,
                        OutputStream out)
                 throws JasenException
Extracts the plain text components of the HTML read from the input stream and writes them to the given output stream.

Parameters:
in - The input stream from which the HTML is read
out - The output stream to which the plain text is written
Throws:
JasenException

setEncoding

public void setEncoding(String encoding)
Sets the encoding to use on the output stream (optional)

Parameters:
encoding - The encoding to use on the output stream
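
The stream-to-stream extractText overload and setEncoding can be pictured with a minimal sketch that uses only JDK classes. It is not the Jasen implementation: the class name StreamTextSketch, the Charset parameter (standing in for setEncoding), the assumption that the input bytes are UTF-8, and the use of UncheckedIOException instead of JasenException are all choices made for this illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical stand-in for illustration; not the Jasen implementation.
final class StreamTextSketch extends HTMLEditorKit.ParserCallback {
    private final Writer out;
    private IOException failure; // first write error, reported after parsing

    private StreamTextSketch(Writer out) {
        this.out = out;
    }

    // Called by the parser for each run of text it reports.
    @Override
    public void handleText(char[] text, int pos) {
        if (failure != null) {
            return;
        }
        try {
            out.write(text);
            out.write(' '); // keep adjacent text runs apart
        } catch (IOException e) {
            failure = e;
        }
    }

    // Assumption: input bytes are UTF-8; outputEncoding plays the role of setEncoding.
    static void extractText(InputStream in, OutputStream out, Charset outputEncoding) {
        Writer writer = new OutputStreamWriter(out, outputEncoding);
        StreamTextSketch callback = new StreamTextSketch(writer);
        try {
            // true = ignore any charset declared inside the HTML itself
            new ParserDelegator().parse(
                    new InputStreamReader(in, StandardCharsets.UTF_8), callback, true);
            writer.flush();
            if (callback.failure != null) {
                throw callback.failure;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Passing the encoding explicitly keeps the sketch stateless. The documented class instead stores it through setEncoding before extractText is called.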

extractText

public String extractText(InputStream in)
                   throws JasenException
Extracts plain text from the HTML read from the input stream and returns it as a String.

Parameters:
in - The input stream from which the HTML is read
Returns:
A String containing the plain text of the HTML
Throws:
JasenException

extractText

public String extractText(String html)
                   throws JasenException
Extracts plain text from the given HTML String and returns it as a String.

Parameters:
html - The String containing the HTML
Returns:
The String containing the plain text
Throws:
JasenException
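
As a rough illustration of the String overload, the sketch below turns an HTML string into plain text with the JDK parser and starts a new line wherever a tag breaks the text flow. The class StringTextSketch and its line-break policy are assumptions made for this example; the page does not say how StandardHTMLParser separates text runs.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical stand-in for illustration; not the Jasen implementation.
final class StringTextSketch extends HTMLEditorKit.ParserCallback {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void handleText(char[] data, int pos) {
        text.append(data);
    }

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        breakLineIfNeeded(t);
    }

    @Override
    public void handleEndTag(HTML.Tag t, int pos) {
        breakLineIfNeeded(t);
    }

    @Override
    public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        breakLineIfNeeded(t);
    }

    // Adds one newline for flow-breaking tags such as p, div or br,
    // and nothing for inline tags such as b or i.
    private void breakLineIfNeeded(HTML.Tag t) {
        int length = text.length();
        if (t.breaksFlow() && length > 0 && text.charAt(length - 1) != '\n') {
            text.append('\n');
        }
    }

    static String extractText(String html) {
        StringTextSketch callback = new StringTextSketch();
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            // Not expected when reading from a String.
            throw new UncheckedIOException(e);
        }
        return callback.text.toString().trim();
    }
}
```

A call such as StringTextSketch.extractText("<p>First</p><p>Second <b>bold</b></p>") returns the two paragraphs on separate lines with the markup removed.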

handleComment

public void handleComment(char[] text,
                          int pos)

handleEndTag

public void handleEndTag(HTML.Tag t,
                         int pos)

handleSimpleTag

public void handleSimpleTag(HTML.Tag t,
                            MutableAttributeSet a,
                            int pos)

handleStartTag

public void handleStartTag(HTML.Tag t,
                           MutableAttributeSet a,
                           int pos)

handleText

public void handleText(char[] text,
                       int pos)
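
The handle* methods above are overrides of HTMLEditorKit.ParserCallback and carry no description on this page. The sketch below shows when the JDK parser invokes each of them, and how the inherited IMPLIED attribute marks tags the parser synthesized itself. CallbackTraceSketch and its event strings are invented for this example and say nothing about what StandardHTMLParser does inside these callbacks.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical tracer for illustration; records every callback as a string.
final class CallbackTraceSketch extends HTMLEditorKit.ParserCallback {
    final List<String> events = new ArrayList<>();

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        // Tags the parser inserted on its own carry the IMPLIED attribute.
        events.add("start:" + t + (a.isDefined(IMPLIED) ? " (implied)" : ""));
    }

    @Override
    public void handleEndTag(HTML.Tag t, int pos) {
        events.add("end:" + t);
    }

    @Override
    public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        // Empty elements such as br arrive here instead of as a start/end pair.
        events.add("simple:" + t);
    }

    @Override
    public void handleText(char[] text, int pos) {
        events.add("text:" + new String(text));
    }

    @Override
    public void handleComment(char[] text, int pos) {
        events.add("comment:" + new String(text).trim());
    }

    static List<String> trace(String html) {
        CallbackTraceSketch callback = new CallbackTraceSketch();
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback.events;
    }
}
```

Printing the result of CallbackTraceSketch.trace("<p>Hi<!-- note --><br>there</p>") lists the callbacks in the order the parser fired them, which is a quick way to check how a given piece of markup is reported.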

parse

public ParserData parse(javax.mail.internet.MimeMessage mm,
                        JasenMessage message,
                        MimeMessageTokenizer tokenizer)
                 throws JasenException
Description copied from interface: HTMLParser
Parses the given JasenMessage and returns the results of the parse as a ParserData object.

Specified by:
parse in interface HTMLParser
Parameters:
message - The JasenMessage to parse
Returns:
The parsed data containing the results of the parse
Throws:
JasenException