public class StringBean extends NodeVisitor implements java.io.Serializable
Text within <SCRIPT></SCRIPT> tags is removed.
The text within <PRE></PRE> tags is not altered.
The property Strings, which is the output property, is null until a URL is set. So a typical usage is:

    StringBean sb = new StringBean ();
    sb.setLinks (false);
    sb.setReplaceNonBreakingSpaces (true);
    sb.setCollapse (true);
    sb.setURL ("http://www.netbeans.org"); // the HTTP is performed here
    String s = sb.getStrings ();

You can also use the StringBean as a NodeVisitor on your own parser, in which case you have to refetch your page if you change one of the properties, because doing so resets the Strings property:

    StringBean sb = new StringBean ();
    Parser parser = new Parser ("http://cbc.ca");
    parser.visitAllNodesWith (sb);
    String s = sb.getStrings ();
    sb.setLinks (true);
    parser.reset ();
    parser.visitAllNodesWith (sb);
    String sl = sb.getStrings ();

According to Nick Burch, who contributed the patch, this is handy if you don't want StringBean to wander off and get the content itself, either because you already have it or because it's not on a website.
Modifier and Type | Field and Description |
---|---|
static java.lang.String | PROP_COLLAPSE_PROPERTY: Property name in event where the 'collapse whitespace' state changes. |
static java.lang.String | PROP_CONNECTION_PROPERTY: Property name in event where the connection changes. |
static java.lang.String | PROP_LINKS_PROPERTY: Property name in event where the 'embed links' state changes. |
static java.lang.String | PROP_REPLACE_SPACE_PROPERTY: Property name in event where the 'replace non-breaking spaces' state changes. |
static java.lang.String | PROP_STRINGS_PROPERTY: Property name in event where the URL contents change. |
static java.lang.String | PROP_URL_PROPERTY: Property name in event where the URL changes. |
Constructor and Description |
---|
StringBean(): Create a StringBean object. |
Modifier and Type | Method and Description |
---|---|
void | addPropertyChangeListener(java.beans.PropertyChangeListener listener): Add a PropertyChangeListener to the listener list. |
boolean | getCollapse(): Get the current 'collapse whitespace' state. |
java.net.URLConnection | getConnection(): Get the current connection. |
boolean | getLinks(): Get the current 'include links' state. |
boolean | getReplaceNonBreakingSpaces(): Get the current 'replace non breaking spaces' state. |
java.lang.String | getStrings(): Return the textual contents of the URL. |
java.lang.String | getURL(): Get the current URL. |
static void | main(java.lang.String[] args): Unit test. |
void | removePropertyChangeListener(java.beans.PropertyChangeListener listener): Remove a PropertyChangeListener from the listener list. |
void | setCollapse(boolean collapse): Set the current 'collapse whitespace' state. |
void | setConnection(java.net.URLConnection connection): Set the parser's connection. |
void | setLinks(boolean links): Set the 'include links' state. |
void | setReplaceNonBreakingSpaces(boolean replace): Set the 'replace non breaking spaces' state. |
void | setURL(java.lang.String url): Set the URL to extract strings from. |
void | visitEndTag(Tag tag): Resets the state of the PRE and SCRIPT flags. |
void | visitStringNode(Text string): Appends the text to the output. |
void | visitTag(Tag tag): Appends a NEWLINE to the output if the tag breaks flow, and possibly sets the state of the PRE and SCRIPT flags. |
Methods inherited from class NodeVisitor: beginParsing, finishedParsing, shouldRecurseChildren, shouldRecurseSelf, visitRemarkNode
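The visitTag, visitStringNode and visitEndTag entries in the method summary describe a visitor that tracks PRE and SCRIPT state with flags. The following is a minimal, self-contained sketch of that idea only. The class FlagVisitorSketch, its String-based method signatures, and the set of tags treated as breaking flow are all assumptions made for illustration; it does not use the real Tag or Text types and is not the StringBean source.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical, simplified flag-based text collector; not the StringBean implementation. */
final class FlagVisitorSketch {
    // Assumption: only these tags "break flow" in this sketch.
    private static final Set<String> BREAKS_FLOW = Set.of("P", "BR", "DIV", "PRE");

    private final StringBuilder out = new StringBuilder();
    private boolean inScript; // true between <SCRIPT> and </SCRIPT>
    private boolean inPre;    // true between <PRE> and </PRE>

    /** Appends a newline if the tag breaks flow, then sets the PRE/SCRIPT flags. */
    void visitTag(String name) {
        String n = name.toUpperCase(Locale.ROOT);
        if (BREAKS_FLOW.contains(n) && out.length() > 0) {
            out.append('\n');
        }
        if (n.equals("SCRIPT")) inScript = true;
        if (n.equals("PRE")) inPre = true;
    }

    /** Appends text: dropped inside SCRIPT, verbatim inside PRE, collapsed otherwise. */
    void visitStringNode(String text) {
        if (inScript) return;
        out.append(inPre ? text : text.replaceAll("\\s+", " "));
    }

    /** Resets the PRE and SCRIPT flags when the matching end tag is seen. */
    void visitEndTag(String name) {
        String n = name.toUpperCase(Locale.ROOT);
        if (n.equals("SCRIPT")) inScript = false;
        if (n.equals("PRE")) inPre = false;
    }

    String getStrings() {
        return out.toString();
    }

    /** Feeds a small hand-written event sequence through the sketch. */
    static String demo() {
        FlagVisitorSketch v = new FlagVisitorSketch();
        v.visitTag("P");
        v.visitStringNode("Hello   world");
        v.visitEndTag("P");
        v.visitTag("SCRIPT");
        v.visitStringNode("alert(1);");
        v.visitEndTag("SCRIPT");
        v.visitTag("PRE");
        v.visitStringNode("a  b");
        v.visitEndTag("PRE");
        return v.getStrings();
    }
}
```

In this sketch, demo() yields "Hello world", a newline, then "a  b": the script text is dropped and the double space inside PRE is kept.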
public static final java.lang.String PROP_STRINGS_PROPERTY
public static final java.lang.String PROP_LINKS_PROPERTY
public static final java.lang.String PROP_URL_PROPERTY
public static final java.lang.String PROP_REPLACE_SPACE_PROPERTY
public static final java.lang.String PROP_COLLAPSE_PROPERTY
public static final java.lang.String PROP_CONNECTION_PROPERTY
public StringBean()
Links is set false so text appears like a browser would display it, albeit without the colour or underline clues normally associated with a link.
ReplaceNonBreakingSpaces is set true, so that printing the text works, but the extra information regarding these formatting marks is available if you set it false.
Collapse is set true, so text appears compact like a browser would display it.
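To make the effect of the ReplaceNonBreakingSpaces and Collapse defaults concrete, here is a small sketch of the two text transformations, using the character set listed under getCollapse. The class WhitespaceSketch and its method names are hypothetical, and applying the non-breaking space replacement before collapsing is an assumption of this sketch; StringBean's actual implementation may differ in both order and edge cases.

```java
/** Hypothetical illustration of the two whitespace options; not the StringBean source. */
final class WhitespaceSketch {
    private WhitespaceSketch() {
    }

    /** Replaces each non-breaking space (U+00A0) with a normal space (U+0020). */
    static String replaceNonBreakingSpaces(String s) {
        return s.replace('\u00a0', ' ');
    }

    /**
     * Collapses each run of space, tab, form feed, zero-width space (U+200B),
     * carriage return and newline into a single space.
     */
    static String collapse(String s) {
        return s.replaceAll("[ \\t\\f\\u200B\\r\\n]+", " ");
    }

    /** Assumed order for this sketch: replace non-breaking spaces, then collapse. */
    static String normalize(String s) {
        return collapse(replaceNonBreakingSpaces(s));
    }
}
```

For example, WhitespaceSketch.normalize applied to "x", two non-breaking spaces, a space, "y", a CR LF pair, a tab and "z" gives "x y z".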
public void addPropertyChangeListener(java.beans.PropertyChangeListener listener)
listener - The PropertyChangeListener to be added.
public void removePropertyChangeListener(java.beans.PropertyChangeListener listener)
listener - The PropertyChangeListener to be removed.
public java.lang.String getStrings()
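The listener methods above appear to follow the usual JavaBeans bound-property pattern. As a minimal sketch of how such notifications behave, the following uses java.beans.PropertyChangeSupport with a hypothetical DemoBean class and a hypothetical property name "links". It is a stand-in for illustration and does not reproduce StringBean's implementation or the actual values of its PROP_* constants.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical bean with one bound boolean property; not StringBean itself. */
final class DemoBean {
    static final String PROP_LINKS = "links"; // assumed name, for illustration only

    private final PropertyChangeSupport support = new PropertyChangeSupport(this);
    private boolean links;

    void addPropertyChangeListener(PropertyChangeListener listener) {
        support.addPropertyChangeListener(listener);
    }

    void removePropertyChangeListener(PropertyChangeListener listener) {
        support.removePropertyChangeListener(listener);
    }

    boolean getLinks() {
        return links;
    }

    void setLinks(boolean value) {
        boolean old = links;
        links = value;
        // PropertyChangeSupport fires no event when old and new values are equal.
        support.firePropertyChange(PROP_LINKS, old, value);
    }
}

final class ListenerDemo {
    /** Records every event a listener sees while it is registered. */
    static List<String> run() {
        List<String> seen = new ArrayList<>();
        DemoBean bean = new DemoBean();
        PropertyChangeListener listener =
                e -> seen.add(e.getPropertyName() + "=" + e.getNewValue());

        bean.addPropertyChangeListener(listener);
        bean.setLinks(true);  // value changes: one event
        bean.setLinks(true);  // same value: no event
        bean.removePropertyChangeListener(listener);
        bean.setLinks(false); // listener already removed: not recorded
        return seen;
    }
}
```

Here ListenerDemo.run() returns a list holding the single entry "links=true".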
public boolean getLinks()
Returns true if link text is included in the text extracted from the URL, false otherwise.
public void setLinks(boolean links)
links - Use true if link text is to be included in the text extracted from the URL, false otherwise.
public java.lang.String getURL()
Returns null if this property has not been set yet.
public void setURL(java.lang.String url)
url - The URL that text should be fetched from.
public boolean getReplaceNonBreakingSpaces()
Returns true if non-breaking spaces (character '\u00a0', numeric character reference &#160; or character entity reference &nbsp;) are to be replaced with normal spaces (character '\u0020').
public void setReplaceNonBreakingSpaces(boolean replace)
replace - true if non-breaking spaces (character '\u00a0', numeric character reference &#160; or character entity reference &nbsp;) are to be replaced with normal spaces (character '\u0020').
public boolean getCollapse()
If true, this emulates the operation of browsers in interpreting text, where user agents should collapse input white space sequences when producing output inter-word space. See HTML specification section 9.1 White space: http://www.w3.org/TR/html4/struct/text.html#h-9.1
Returns true if sequences of whitespace (space '\u0020', tab '\u0009', form feed '\u000C', zero-width space '\u200B', carriage-return '\r' and NEWLINE '\n') are to be replaced with a single space.
public void setCollapse(boolean collapse)
setCollapse (getCollapse ());
collapse - If true, sequences of whitespace will be reduced to a single space.
public java.net.URLConnection getConnection()
Returns null if it hasn't been set or the parser hasn't been constructed yet.
public void setConnection(java.net.URLConnection connection)
connection - New value of property Connection.
public void visitStringNode(Text string)
Overrides: visitStringNode in class NodeVisitor
string - The text node.
public void visitTag(Tag tag)
Overrides: visitTag in class NodeVisitor
tag - The tag to examine.
public void visitEndTag(Tag tag)
Overrides: visitEndTag in class NodeVisitor
tag - The end tag to process.
public static void main(java.lang.String[] args)
args - Pass args[0] as the URL to process.