public class Lexer extends java.lang.Object implements java.io.Serializable, NodeFactory

Each time `nextNode()` is called, another node is returned until the stream is exhausted, at which point `null` is returned.

Modifier and Type | Field and Description |
---|---|
static boolean | `STRICT_REMARKS`: Process remarks strictly flag. |
static java.lang.String | `VERSION_DATE`: The date of the version ("Jun 10, 2006"). |
static double | `VERSION_NUMBER`: The floating point version number (1.6). |
static java.lang.String | `VERSION_STRING`: The display version ("1.6 (Release Build Jun 10, 2006)"). |
static java.lang.String | `VERSION_TYPE`: The type of version ("Release Build"). |

Constructor and Description |
---|
`Lexer()`: Creates a new instance of a Lexer. |
`Lexer(Page page)`: Creates a new instance of a Lexer. |
`Lexer(java.lang.String text)`: Creates a new instance of a Lexer. |
`Lexer(java.net.URLConnection connection)`: Creates a new instance of a Lexer. |

Modifier and Type | Method and Description |
---|---|
Remark | `createRemarkNode(Page page, int start, int end)`: Create a new remark node. |
Text | `createStringNode(Page page, int start, int end)`: Create a new string node. |
Tag | `createTagNode(Page page, int start, int end, java.util.Vector attributes)`: Create a new tag node. |
java.lang.String | `getCurrentLine()`: Get the current line. |
int | `getCurrentLineNumber()`: Get the current line number. |
Cursor | `getCursor()`: Get the current scanning position. |
NodeFactory | `getNodeFactory()`: Get the current node factory. |
Page | `getPage()`: Get the page this lexer is working on. |
int | `getPosition()`: Get the current cursor position. |
static java.lang.String | `getVersion()`: Return the version string of this parser. |
static void | `main(java.lang.String[] args)`: Mainline for command line operation. |
Node | `nextNode()`: Get the next node from the source. |
Node | `nextNode(boolean quotesmart)`: Get the next node from the source. |
Node | `parseCDATA()`: Return CDATA as a text node. |
Node | `parseCDATA(boolean quotesmart)`: Return CDATA as a text node. |
void | `reset()`: Reset the lexer to start parsing from the beginning again. |
void | `setCursor(Cursor cursor)`: Set the current scanning position. |
void | `setNodeFactory(NodeFactory factory)`: Set the current node factory. |
void | `setPage(Page page)`: Set the page this lexer is working on. |
void | `setPosition(int position)`: Set the current cursor position. |

public static final double VERSION_NUMBER
public static final java.lang.String VERSION_TYPE
public static final java.lang.String VERSION_DATE
public static final java.lang.String VERSION_STRING
public static boolean STRICT_REMARKS
If `true`, remarks are not terminated by `--->` or `--!>`, i.e. more than two dashes. If `false`, a more lax (and closer to typical browser handling) remark parsing is used. Default: `true`.
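The effect of the flag can be pictured with a small stand-alone sketch. This is not the library's scanning code: `RemarkEndSketch` and `findRemarkEnd` are hypothetical names, and the logic is only a literal reading of the sentence above (strict mode skips `--->` and `--!>`, lax mode accepts them), assuming `from` points just past the opening `<!--`.

```java
final class RemarkEndSketch {
    /**
     * Hypothetical helper: returns the index just past the remark terminator,
     * or -1 if the remark is never closed. 'from' is assumed to point just
     * past the opening "<!--".
     */
    static int findRemarkEnd(String text, int from, boolean strict) {
        for (int i = from; i < text.length(); i++) {
            if (text.startsWith("-->", i)) {
                // A dash immediately before "-->" means more than two dashes ("--->").
                boolean extraDash = i > from && text.charAt(i - 1) == '-';
                if (!strict || !extraDash) {
                    return i + 3;
                }
            } else if (!strict && text.startsWith("--!>", i)) {
                // Lax mode also accepts the browser-style "--!>" terminator.
                return i + 4;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String html = "<!-- a ---> b -->";
        System.out.println(findRemarkEnd(html, 4, true));   // strict: skips "--->"
        System.out.println(findRemarkEnd(html, 4, false));  // lax: stops at "--->"
    }
}
```

Under these assumptions the strict scan runs on to the final `-->`, while the lax scan ends the remark at the first `--->`, which is the difference the flag description is pointing at.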

public Lexer()

public Lexer(Page page)
Parameters:
page - The page with HTML text.

public Lexer(java.lang.String text)
Parameters:
text - The text to parse.

public Lexer(java.net.URLConnection connection) throws ParserException
Parameters:
connection - The URL to parse.
Throws:
ParserException - If an error occurs opening the connection.

public static java.lang.String getVersion()
Returns:
A string of the form "[floating point number] ([build-type] [build-date])"
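As a quick check of that format, the sketch below rebuilds the display string from the three constant values quoted in the field summary. The class name `VersionStringSketch` is hypothetical, and plain string concatenation is an assumption about how the pieces fit together, not a statement of how the library assembles `VERSION_STRING`.

```java
final class VersionStringSketch {
    // Values copied from the field summary of this page.
    static final double VERSION_NUMBER = 1.6;
    static final String VERSION_TYPE = "Release Build";
    static final String VERSION_DATE = "Jun 10, 2006";

    /** Builds "[floating point number] ([build-type] [build-date])". */
    static String versionString() {
        return VERSION_NUMBER + " (" + VERSION_TYPE + " " + VERSION_DATE + ")";
    }

    public static void main(String[] args) {
        System.out.println(versionString());
    }
}
```

With these inputs the result matches the documented `VERSION_STRING` value, "1.6 (Release Build Jun 10, 2006)".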

public Page getPage()

public void setPage(Page page)
Parameters:
page - The page that nodes will be read from.

public Cursor getCursor()

public void setCursor(Cursor cursor)
Parameters:
cursor - The lexer's new cursor position.

public NodeFactory getNodeFactory()

public void setNodeFactory(NodeFactory factory)
Parameters:
factory - The node factory to be used by the lexer.

public int getPosition()

public void setPosition(int position)
Parameters:
position - The new character offset into the source.

public int getCurrentLineNumber()

public java.lang.String getCurrentLine()

public void reset()
After a reset, the next call to `nextNode()` will return the first lexeme on the page.

public Node nextNode() throws ParserException
Returns:
The next node, or `null` if no more lexemes are present.
Throws:
ParserException - If there is a problem with the underlying page.

public Node nextNode(boolean quotesmart) throws ParserException
Parameters:
quotesmart - If `true`, strings ignore quoted contents.
Returns:
The next node, or `null` if no more lexemes are present.
Throws:
ParserException - If there is a problem with the underlying page.

public Node parseCDATA() throws ParserException
Returns:
The `TextNode` of the CDATA or `null` if none.
Throws:
ParserException - If a problem occurs reading from the source.

public Node parseCDATA(boolean quotesmart) throws ParserException
Compared with `parseCDATA()`, this method provides for parsing CDATA that may contain quoted strings that have embedded ETAGO ("</") delimiters, and it skips single-line and multiline comments.
Parameters:
quotesmart - If `true`, the strict definition of CDATA is extended to allow for single or double quoted ETAGO ("</") sequences.
Returns:
The `TextNode` of the CDATA or `null` if none.
Throws:
ParserException - If a problem occurs reading from the source.
See Also:
parseCDATA()

public Text createStringNode(Page page, int start, int end)
Specified by:
createStringNode in interface NodeFactory
Parameters:
page - The page the node is on.
start - The beginning position of the string.
end - The ending position of the string.

public Remark createRemarkNode(Page page, int start, int end)
Specified by:
createRemarkNode in interface NodeFactory
Parameters:
page - The page the node is on.
start - The beginning position of the remark.
end - The ending position of the remark.

public Tag createTagNode(Page page, int start, int end, java.util.Vector attributes)
Specified by:
createTagNode in interface NodeFactory
Parameters:
page - The page the node is on.
start - The beginning position of the tag.
end - The ending position of the tag.
attributes - The attributes contained in this tag.

public static void main(java.lang.String[] args) throws java.net.MalformedURLException, ParserException
Parameters:
args - [0] The URL to parse.
Throws:
java.net.MalformedURLException - If the provided URL cannot be resolved.
ParserException - If the parse fails.
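
The pull-style contract documented for `nextNode()` and `reset()` (keep calling until `null`, then optionally start over) can be illustrated without the library on the classpath. In the sketch below, `PullLoopSketch`, `TinyLexer` and `drain` are hypothetical stand-ins that return plain strings rather than `Node` objects; the splitting rule (a `<...>` chunk or a run of text) is a deliberate simplification and not the real lexer's behaviour.

```java
import java.util.ArrayList;
import java.util.List;

final class PullLoopSketch {
    /** Hypothetical stand-in for a lexer: yields "<...>" chunks and runs of text. */
    static final class TinyLexer {
        private final String source;
        private int position;

        TinyLexer(String source) {
            this.source = source;
        }

        /** Returns the next chunk, or null once the source is exhausted. */
        String nextNode() {
            if (position >= source.length()) {
                return null;
            }
            int start = position;
            if (source.charAt(start) == '<') {
                int close = source.indexOf('>', start);
                position = (close < 0) ? source.length() : close + 1;
            } else {
                int open = source.indexOf('<', start);
                position = (open < 0) ? source.length() : open;
            }
            return source.substring(start, position);
        }

        /** Rewinds so that the next call to nextNode() returns the first chunk again. */
        void reset() {
            position = 0;
        }
    }

    /** Drains the lexer with the usual "loop until null" idiom. */
    static List<String> drain(TinyLexer lexer) {
        List<String> nodes = new ArrayList<>();
        for (String node = lexer.nextNode(); node != null; node = lexer.nextNode()) {
            nodes.add(node);
        }
        return nodes;
    }

    public static void main(String[] args) {
        TinyLexer lexer = new TinyLexer("<p>Hi</p>");
        System.out.println(drain(lexer));
        lexer.reset();
        System.out.println(drain(lexer));
    }
}
```

The same loop shape should carry over to the real class, with `Node` in place of `String` and a `ParserException` to handle around the calls to `nextNode()`.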