org.sandev.basics.util
Class XMLTextProcessing

java.lang.Object
  extended by org.sandev.basics.util.XMLTextProcessing

public class XMLTextProcessing
extends java.lang.Object

Provides raw text translation services for XML.

This class uses StringCharacterIterator together with Character.isWhitespace to do its work. It does not use StringTokenizer or StreamTokenizer (not that those two have anything in common either).
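
The iteration style described above might look roughly like the following sketch. It is not the class's actual code: the class name RunSplitterSketch and the method splitRuns are hypothetical, and the sketch only shows how StringCharacterIterator and Character.isWhitespace can be combined to separate tokens from the whitespace between them.

```java
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of XMLTextProcessing.
class RunSplitterSketch {

    /** Split text into alternating runs of whitespace and non-whitespace. */
    static List<String> splitRuns(String text) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        CharacterIterator it = new StringCharacterIterator(text);
        for (char c = it.first(); c != CharacterIterator.DONE; c = it.next()) {
            boolean ws = Character.isWhitespace(c);
            // Close the current run whenever we cross a whitespace boundary.
            if (current.length() > 0
                    && Character.isWhitespace(current.charAt(0)) != ws) {
                runs.add(current.toString());
                current.setLength(0);
            }
            current.append(c);
        }
        if (current.length() > 0) {
            runs.add(current.toString());
        }
        return runs;
    }
}
```

Keeping the whitespace runs, rather than discarding them as StringTokenizer does by default, lets later steps decide how spaces, tabs and newlines should be rendered.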


Constructor Summary
XMLTextProcessing()
           
 
Method Summary
static java.lang.String convertFromXML(java.lang.String text)
          Performs the inverse of the convertToXML character escapes.
static java.lang.String convertToHTML(java.lang.String text, boolean linkHref, boolean linkEmail, boolean translateFormat)
          Like convertToXML, except less stringent about things like apostrophes, quotes and ampersands.
static java.lang.String convertToXML(java.lang.String text, boolean linkHref, boolean linkEmail, boolean translateFormat)
          Convert the given text to valid XML plaintext.
static void escapeCharacter(java.lang.StringBuffer buf, char currChar, boolean stringentEscape)
          Append the character or the equivalent XML escape string to the given buffer.
static java.lang.String getPrefix(java.lang.String token)
          Return the open parenthesis or other prefix this token starts with, or the empty string if it is unprefixed.
static java.lang.String getSuffix(java.lang.String token)
          Return the close parenthesis or other suffix this token ends with, or the empty string if it is unsuffixed.
static java.lang.String processToXML(java.lang.String text, boolean linkHref, boolean linkEmail, boolean translateFormat, boolean stringentEscape)
          Workhorse for the convertToXML and convertToHTML methods.
static java.lang.String translateToken(java.lang.String token, boolean linkHref, boolean linkEmail)
          If the given token looks like an email address or a hyperlink then make it into one.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

XMLTextProcessing

public XMLTextProcessing()

Method Detail

convertToXML

public static java.lang.String convertToXML(java.lang.String text,
                                            boolean linkHref,
                                            boolean linkEmail,
                                            boolean translateFormat)
Convert the given text to valid XML plaintext. This method has three main functions:
  1. Escape any problematic characters that an XML parser would otherwise try to process during subsequent parsing.
  2. Translate newlines into HTML breaks so they don't get lost.
  3. Trap things like email addresses or URLs and translate them into hyperlinks for display.


convertToHTML

public static java.lang.String convertToHTML(java.lang.String text,
                                             boolean linkHref,
                                             boolean linkEmail,
                                             boolean translateFormat)
Like convertToXML, except less stringent about things like apostrophes, quotes and ampersands.


processToXML

public static java.lang.String processToXML(java.lang.String text,
                                            boolean linkHref,
                                            boolean linkEmail,
                                            boolean translateFormat,
                                            boolean stringentEscape)
Workhorse for the convertToXML and convertToHTML methods. If translateFormat is true, then newlines are converted into breaks. Tab characters are also converted into four non-breaking spaces, but since tabs can't easily be entered in most interfaces (tabbing usually takes you to the next entry field), sequential hard spaces are converted as well: every second space in a run is replaced with an &nbsp; and not echoed.

The tough part about this is that in an HTML display, a space between two characters gets displayed, while a space at the beginning of a line is typically ignored. So "blah &nbsp;blah" displays as two spaces, whereas " &nbsp;blah" at the beginning of a line displays as one. As a result, an indented list in text loses its first space character, so cutting and pasting it into an editor loses one level of indenting. Avoiding this would require tracking whether we are at the beginning of a new line, which doesn't seem worth it. The relative positions look OK.
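
The whitespace rules described above can be sketched as follows. This is an illustration under stated assumptions, not the method's actual code: the class and method names are hypothetical, and the exact break markup ("<br/>" here) is a guess, since the documentation does not say which form is emitted.

```java
// Hypothetical helper for illustration; not part of XMLTextProcessing.
class WhitespaceFormatSketch {

    /** Translate newlines, tabs and runs of spaces for HTML display. */
    static String translateWhitespace(String text) {
        StringBuilder out = new StringBuilder();
        boolean prevPlainSpace = false; // was the last character an echoed space?
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == ' ' && prevPlainSpace) {
                // Every second space in a run becomes an nbsp and is not echoed.
                out.append("&nbsp;");
                prevPlainSpace = false;
                continue;
            }
            prevPlainSpace = (c == ' ');
            if (c == '\n') {
                out.append("<br/>"); // assumed break markup
            } else if (c == '\t') {
                out.append("&nbsp;&nbsp;&nbsp;&nbsp;"); // tab as four nbsp
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With this sketch, "a  b" (two spaces) becomes "a &nbsp;b". A line that starts with two spaces becomes " &nbsp;text", where the leading plain space is the one a browser typically ignores, which is the lost indent level discussed above.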


translateToken

public static java.lang.String translateToken(java.lang.String token,
                                              boolean linkHref,
                                              boolean linkEmail)
If the given token looks like an email address or a hyperlink, then make it into one. Translations are toggled by the parameter flags. A token that starts with http:// is treated as a hyperlink; otherwise, a token that contains both an "@" character and a "." is treated as an email address.
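
The detection rule might be sketched as below. The names TokenLinkSketch and linkToken are hypothetical, and the anchor markup and the mailto: scheme are assumptions, since the documentation does not show the generated output. The sketch also assumes that a token starting with http:// is never treated as an email address, even when linkHref is false.

```java
// Hypothetical helper for illustration; not part of XMLTextProcessing.
class TokenLinkSketch {

    /** Wrap the token in an anchor if it looks like a URL or an email address. */
    static String linkToken(String token, boolean linkHref, boolean linkEmail) {
        if (token.startsWith("http://")) {
            // Assumption: URL-like tokens skip the email check entirely.
            return linkHref ? anchor(token, token) : token;
        }
        if (linkEmail && token.indexOf('@') >= 0 && token.indexOf('.') >= 0) {
            return anchor("mailto:" + token, token);
        }
        return token;
    }

    // Assumed output format; no escaping of the target is attempted here.
    private static String anchor(String href, String label) {
        return "<a href=\"" + href + "\">" + label + "</a>";
    }
}
```

The test is deliberately loose: any token containing both "@" and "." qualifies, so a string such as "3.5@rate" would also be linked under this rule.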


getPrefix

public static java.lang.String getPrefix(java.lang.String token)
Return the open parenthesis or other prefix this token starts with, or the empty string if it is unprefixed.


getSuffix

public static java.lang.String getSuffix(java.lang.String token)
Return the close parenthesis or other suffix this token ends with, or the empty string if it is unsuffixed.
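
One way to picture getPrefix and getSuffix is the sketch below. The documentation does not list which characters count as a prefix or suffix, so the two character sets here are assumptions, as are the names TokenAffixSketch, leadingPrefix and trailingSuffix. The sketch also assumes that the whole run of such characters is returned rather than a single character.

```java
// Hypothetical helper for illustration; not part of XMLTextProcessing.
class TokenAffixSketch {

    // Assumed character sets; the real method may recognize different ones.
    static final String PREFIX_CHARS = "([{\"'";
    static final String SUFFIX_CHARS = ")]}\"'.,;:!?";

    /** Return the run of prefix characters the token starts with, or "". */
    static String leadingPrefix(String token) {
        int end = 0;
        while (end < token.length()
                && PREFIX_CHARS.indexOf(token.charAt(end)) >= 0) {
            end++;
        }
        return token.substring(0, end);
    }

    /** Return the run of suffix characters the token ends with, or "". */
    static String trailingSuffix(String token) {
        int start = token.length();
        while (start > 0
                && SUFFIX_CHARS.indexOf(token.charAt(start - 1)) >= 0) {
            start--;
        }
        return token.substring(start);
    }
}
```

Splitting a token such as "(http://example.org)." this way leaves the bare URL in the middle, so surrounding punctuation can stay outside any generated link.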


escapeCharacter

public static void escapeCharacter(java.lang.StringBuffer buf,
                                   char currChar,
                                   boolean stringentEscape)
Append the character or the equivalent XML escape string to the given buffer. This replaces characters such as apostrophes and ampersands with their equivalent escape strings.
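
A minimal sketch of this kind of escaping follows. The names are hypothetical, and the split between stringent and non-stringent mode is an assumption: here angle brackets are always escaped, while ampersands, apostrophes and quotes are escaped only when stringent is true. The documentation says only that the HTML variant is "less stringent" about those characters.

```java
// Hypothetical helper for illustration; not part of XMLTextProcessing.
class EscapeSketch {

    /** Append c to buf, or its XML escape string if it needs one. */
    static void appendEscaped(StringBuffer buf, char c, boolean stringent) {
        String escape = null;
        if (c == '<') {
            escape = "&lt;";
        } else if (c == '>') {
            escape = "&gt;";
        } else if (stringent) {
            // Assumption: these three are escaped only in stringent mode.
            if (c == '&') {
                escape = "&amp;";
            } else if (c == '\'') {
                escape = "&apos;";
            } else if (c == '"') {
                escape = "&quot;";
            }
        }
        if (escape != null) {
            buf.append(escape);
        } else {
            buf.append(c);
        }
    }

    /** Convenience wrapper that escapes a whole string. */
    static String escapeAll(String text, boolean stringent) {
        StringBuffer buf = new StringBuffer();
        for (int i = 0; i < text.length(); i++) {
            appendEscaped(buf, text.charAt(i), stringent);
        }
        return buf.toString();
    }
}
```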


convertFromXML

public static java.lang.String convertFromXML(java.lang.String text)
Performs the inverse of the convertToXML character escapes.
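
Reversing the escapes could look like the sketch below, which assumes the five predefined XML entities are the ones involved. The class and method names are hypothetical. The ampersand is restored last so that text such as "&amp;lt;" comes back as the literal "&lt;" and is not unescaped a second time.

```java
// Hypothetical helper for illustration; not part of XMLTextProcessing.
class UnescapeSketch {

    /** Replace the predefined XML entities with the characters they stand for. */
    static String unescape(String text) {
        return text.replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&apos;", "'")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&"); // must come last
    }
}
```

This sketch covers the character escapes only; it does not turn breaks or non-breaking spaces back into newlines, tabs or spaces.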