Lexer

java.lang.Object
- org.htmlparser.lexer.Lexer

All Implemented Interfaces:

java.io.Serializable, NodeFactory
```
public class Lexer
extends java.lang.Object
implements java.io.Serializable, NodeFactory
```
This class parses the HTML stream into nodes. There are three major types of nodes (lexemes):
- Remark
- Text
- Tag
Each call to nextNode() returns the next node. Once the stream is exhausted, nextNode() returns null.
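As a rough illustration of the calling pattern described above, the following self-contained toy lexer splits a string into remark, tag and text lexemes and returns null once the input is exhausted. `ToyLexer` and `lexAll` are hypothetical names invented for this sketch; it does not use or reproduce org.htmlparser.lexer.Lexer, and its remark and tag rules are deliberately simplistic.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy lexer (hypothetical, NOT org.htmlparser.lexer.Lexer): shows the three lexeme kinds. */
public class ToyLexer {
    private final String source;
    private int position = 0;

    public ToyLexer(String source) {
        this.source = source;
    }

    /** Returns the next lexeme prefixed with its kind, or null when the input is exhausted. */
    public String nextNode() {
        if (position >= source.length()) {
            return null;
        }
        int start = position;
        if (source.startsWith("<!--", start)) {
            // Remark: runs to the first "-->" (or to the end of input if unterminated).
            int end = source.indexOf("-->", start + 4);
            position = (end < 0) ? source.length() : end + 3;
            return "Remark:" + source.substring(start, position);
        }
        if (source.charAt(start) == '<') {
            // Tag: runs to the first '>' (attribute quoting is ignored in this sketch).
            int end = source.indexOf('>', start);
            position = (end < 0) ? source.length() : end + 1;
            return "Tag:" + source.substring(start, position);
        }
        // Text: everything up to the next '<'.
        int end = source.indexOf('<', start);
        position = (end < 0) ? source.length() : end;
        return "Text:" + source.substring(start, position);
    }

    /** Drains the lexer with the "call until null" loop. */
    public static List<String> lexAll(String html) {
        ToyLexer lexer = new ToyLexer(html);
        List<String> nodes = new ArrayList<>();
        String node;
        while ((node = lexer.nextNode()) != null) {
            nodes.add(node);
        }
        return nodes;
    }

    public static void main(String[] args) {
        for (String node : lexAll("<!-- hi --><p>Hello</p>")) {
            System.out.println(node);
        }
    }
}
```

With the real class, the same loop shape applies: keep calling nextNode() and stop at the first null.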
See Also:

Serialized Form

Field Summary

Fields
Modifier and Type	Field and Description
`static boolean`	`STRICT_REMARKS` Process remarks strictly flag.
`static java.lang.String`	`VERSION_DATE` The date of the version ("Jun 10, 2006").
`static double`	`VERSION_NUMBER` The floating point version number (1.6).
`static java.lang.String`	`VERSION_STRING` The display version ("1.6 (Release Build Jun 10, 2006)").
`static java.lang.String`	`VERSION_TYPE` The type of version ("Release Build").

Constructor Summary

Constructors
Constructor and Description
`Lexer()` Creates a new instance of a Lexer.
`Lexer(Page page)` Creates a new instance of a Lexer.
`Lexer(java.lang.String text)` Creates a new instance of a Lexer.
`Lexer(java.net.URLConnection connection)` Creates a new instance of a Lexer.

Method Summary

Modifier and Type	Method and Description
`Remark`	`createRemarkNode(Page page, int start, int end)` Create a new remark node.
`Text`	`createStringNode(Page page, int start, int end)` Create a new string node.
`Tag`	`createTagNode(Page page, int start, int end, java.util.Vector attributes)` Create a new tag node.
`java.lang.String`	`getCurrentLine()` Get the current line.
`int`	`getCurrentLineNumber()` Get the current line number.
`Cursor`	`getCursor()` Get the current scanning position.
`NodeFactory`	`getNodeFactory()` Get the current node factory.
`Page`	`getPage()` Get the page this lexer is working on.
`int`	`getPosition()` Get the current cursor position.
`static java.lang.String`	`getVersion()` Return the version string of this parser.
`static void`	`main(java.lang.String[] args)` Mainline for command line operation.
`Node`	`nextNode()` Get the next node from the source.
`Node`	`nextNode(boolean quotesmart)` Get the next node from the source.
`Node`	`parseCDATA()` Return CDATA as a text node.
`Node`	`parseCDATA(boolean quotesmart)` Return CDATA as a text node.
`void`	`reset()` Reset the lexer to start parsing from the beginning again.
`void`	`setCursor(Cursor cursor)` Set the current scanning position.
`void`	`setNodeFactory(NodeFactory factory)` Set the current node factory.
`void`	`setPage(Page page)` Set the page this lexer is working on.
`void`	`setPosition(int position)` Set the current cursor position.

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - VERSION_NUMBER
```
public static final double VERSION_NUMBER
```
    The floating point version number (1.6).
    
    See Also:
    
    Constant Field Values
  - VERSION_TYPE
```
public static final java.lang.String VERSION_TYPE
```
    The type of version ("Release Build").
    
    See Also:
    
    Constant Field Values
  - VERSION_DATE
```
public static final java.lang.String VERSION_DATE
```
    The date of the version ("Jun 10, 2006").
    
    See Also:
    
    Constant Field Values
  - VERSION_STRING
```
public static final java.lang.String VERSION_STRING
```
    The display version ("1.6 (Release Build Jun 10, 2006)").
    
    See Also:
    
    Constant Field Values
  - STRICT_REMARKS
```
public static boolean STRICT_REMARKS
```
    Flag for strict remark processing. If true, remarks are not terminated by `--->` or `--!>`, i.e. more than two dashes. If false, a more lax remark parsing (closer to typical browser handling) is used. Default: true.
- Constructor Detail
  - Lexer
```
public Lexer()
```
    Creates a new instance of a Lexer.
  - Lexer
```
public Lexer(Page page)
```
    Creates a new instance of a Lexer.
    
    Parameters:
    
    page - The page with HTML text.
  - Lexer
```
public Lexer(java.lang.String text)
```
    Creates a new instance of a Lexer.
    
    Parameters:
    
    text - The text to parse.
  - Lexer
```
public Lexer(java.net.URLConnection connection)
      throws ParserException
```
    Creates a new instance of a Lexer.
    
    Parameters:
    
    connection - The URL connection to parse.
    
    Throws:
    
    ParserException - If an error occurs opening the connection.
- Method Detail
  - getVersion
```
public static java.lang.String getVersion()
```
    Return the version string of this parser.
    Returns:
    A string of the form:
    "[floating point number] ([build-type] [build-date])"
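    The documented format can be reproduced from the three version fields. The sketch below is only an illustration: `VersionSketch` and `displayVersion` are hypothetical names, the constant values are copied from the field descriptions on this page, and nothing here claims to be how getVersion() is actually implemented.

```java
/** Hypothetical helper that rebuilds the documented version string format. */
public class VersionSketch {
    // Values copied from the field descriptions on this page (assumed, not read from the library).
    static final double VERSION_NUMBER = 1.6;
    static final String VERSION_TYPE = "Release Build";
    static final String VERSION_DATE = "Jun 10, 2006";

    /** Builds "[floating point number] ([build-type] [build-date])". */
    static String displayVersion() {
        return VERSION_NUMBER + " (" + VERSION_TYPE + " " + VERSION_DATE + ")";
    }

    public static void main(String[] args) {
        System.out.println(displayVersion());
    }
}
```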
  - getPage
```
public Page getPage()
```
    Get the page this lexer is working on.
    
    Returns:
    
    The page that nodes are being read from.
  - setPage
```
public void setPage(Page page)
```
    Set the page this lexer is working on.
    
    Parameters:
    
    page - The page that nodes will be read from.
  - getCursor
```
public Cursor getCursor()
```
    Get the current scanning position.
    
    Returns:
    
    The lexer's cursor position.
  - setCursor
```
public void setCursor(Cursor cursor)
```
    Set the current scanning position.
    
    Parameters:
    
    cursor - The lexer's new cursor position.
  - getNodeFactory
```
public NodeFactory getNodeFactory()
```
    Get the current node factory.
    
    Returns:
    
    The lexer's node factory.
  - setNodeFactory
```
public void setNodeFactory(NodeFactory factory)
```
    Set the current node factory.
    
    Parameters:
    
    factory - The node factory to be used by the lexer.
  - getPosition
```
public int getPosition()
```
    Get the current cursor position.
    
    Returns:
    
    The current character offset into the source.
  - setPosition
```
public void setPosition(int position)
```
    Set the current cursor position.
    
    Parameters:
    
    position - The new character offset into the source.
  - getCurrentLineNumber
```
public int getCurrentLineNumber()
```
    Get the current line number.
    
    Returns:
    
    The line number the lexer is working on.
  - getCurrentLine
```
public java.lang.String getCurrentLine()
```
    Get the current line.
    
    Returns:
    
    The string the lexer is working on.
  - reset
```
public void reset()
```
    Reset the lexer to start parsing from the beginning again. The underlying components are reset such that the next call to nextNode() will return the first lexeme on the page.
  - nextNode
```
public Node nextNode()
              throws ParserException
```
    Get the next node from the source.
    
    Returns:
    
    A Remark, Text or Tag, or null if no more lexemes are present.
    
    Throws:
    
    ParserException - If there is a problem with the underlying page.
  - nextNode
```
public Node nextNode(boolean quotesmart)
              throws ParserException
```
    Get the next node from the source.
    
    Parameters:
    
    quotesmart - If true, strings ignore quoted contents.
    
    Returns:
    
    A Remark, Text or Tag, or null if no more lexemes are present.
    
    Throws:
    
    ParserException - If there is a problem with the underlying page.
  - parseCDATA
```
public Node parseCDATA()
                throws ParserException
```
    Return CDATA as a text node. According to appendix B.3.2 Specifying non-HTML data of the HTML 4.01 Specification:
    Element content
    When script or style data is the content of an element (SCRIPT and STYLE), the data begins immediately after the element start tag and ends at the first ETAGO ("</") delimiter followed by a name start character ([a-zA-Z]); note that this may not be the element's end tag. Authors should therefore escape "</" within the content. Escape mechanisms are specific to each scripting or style sheet language.
    
    Returns:
    
    The TextNode of the CDATA or null if none.
    
    Throws:
    
    ParserException - If a problem occurs reading from the source.
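    The termination rule quoted from the HTML 4.01 Specification can be sketched as a simple scan. `CdataEndSketch` and `findCdataEnd` are hypothetical names used only for illustration; this is a reading of the quoted rule, not the code behind parseCDATA().

```java
/** Hypothetical sketch of the HTML 4.01 B.3.2 rule for where script/style data ends. */
public class CdataEndSketch {
    /**
     * Returns the index of the first ETAGO ("</") that is followed by a name
     * start character ([a-zA-Z]) at or after 'from', or text.length() if none.
     */
    static int findCdataEnd(String text, int from) {
        for (int i = from; i + 2 < text.length(); i++) {
            if (text.charAt(i) == '<'
                    && text.charAt(i + 1) == '/'
                    && isNameStart(text.charAt(i + 2))) {
                return i;
            }
        }
        return text.length();
    }

    /** Name start characters per the quoted rule: ASCII letters only. */
    static boolean isNameStart(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    public static void main(String[] args) {
        // The "</b" inside the string literal ends the data, even though it is not </script>.
        System.out.println(findCdataEnd("var a = '</b>';</script>", 0));
    }
}
```

    Note how an unescaped "</b" inside a script string literal ends the data early, which is why the specification advises authors to escape "</" within the content.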
  - parseCDATA
```
public Node parseCDATA(boolean quotesmart)
                throws ParserException
```
    Return CDATA as a text node. Slightly less rigid than parseCDATA(), this method parses CDATA that may contain quoted strings with embedded ETAGO ("</") delimiters, and skips single-line and multiline comments.
    
    Parameters:
    
    quotesmart - If true, the strict definition of CDATA is extended to allow for single or double quoted ETAGO ("</") sequences.
    
    Returns:
    
    The TextNode of the CDATA or null if none.
    
    Throws:
    
    ParserException - If a problem occurs reading from the source.
    
    See Also:
    
    parseCDATA()
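    One way to picture the effect of the quotesmart flag is a scan that ignores ETAGO sequences while inside a single or double quoted string. The sketch below is an assumption-laden illustration: `QuoteSmartCdataSketch` and `findCdataEnd` are hypothetical names, backslash is assumed to be the escape character inside quotes, and the comment skipping mentioned above is left out for brevity.

```java
/** Hypothetical sketch of a quote-aware search for the end of CDATA content. */
public class QuoteSmartCdataSketch {
    /**
     * Returns the index of the first "</" followed by an ASCII letter, at or
     * after 'from'. When quotesmart is true, matches inside single or double
     * quoted strings are ignored. Returns text.length() if nothing matches.
     */
    static int findCdataEnd(String text, int from, boolean quotesmart) {
        char quote = 0; // 0 means "not inside a quoted string"
        for (int i = from; i < text.length(); i++) {
            char c = text.charAt(i);
            if (quote != 0) {
                if (c == '\\') {
                    i++; // assumed escape rule: skip the escaped character
                } else if (c == quote) {
                    quote = 0; // closing quote
                }
                continue;
            }
            if (quotesmart && (c == '"' || c == '\'')) {
                quote = c; // opening quote
                continue;
            }
            if (c == '<' && i + 2 < text.length()
                    && text.charAt(i + 1) == '/'
                    && isLetter(text.charAt(i + 2))) {
                return i;
            }
        }
        return text.length();
    }

    static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    public static void main(String[] args) {
        String script = "var a = '</b>';</script>";
        System.out.println(findCdataEnd(script, 0, false)); // stops at the quoted "</b"
        System.out.println(findCdataEnd(script, 0, true));  // stops at "</script"
    }
}
```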
  - createStringNode
```
public Text createStringNode(Page page,
                             int start,
                             int end)
```
    Create a new string node.
    
    Specified by:
    
    createStringNode in interface NodeFactory
    
    Parameters:
    
    page - The page the node is on.
    
    start - The beginning position of the string.
    
    end - The ending position of the string.
    
    Returns:
    
    The created Text node.
  - createRemarkNode
```
public Remark createRemarkNode(Page page,
                               int start,
                               int end)
```
    Create a new remark node.
    
    Specified by:
    
    createRemarkNode in interface NodeFactory
    
    Parameters:
    
    page - The page the node is on.
    
    start - The beginning position of the remark.
    
    end - The ending position of the remark.
    
    Returns:
    
    The created Remark node.
  - createTagNode
```
public Tag createTagNode(Page page,
                         int start,
                         int end,
                         java.util.Vector attributes)
```
    Create a new tag node. Note that the attributes vector contains at least one element, which is the tag name (standalone attribute) at position zero. This can be used to decide which type of node to create, or gate other processing that may be appropriate.
    
    Specified by:
    
    createTagNode in interface NodeFactory
    
    Parameters:
    
    page - The page the node is on.
    
    start - The beginning position of the tag.
    
    end - The ending position of the tag.
    
    attributes - The attributes contained in this tag.
    
    Returns:
    
    The created Tag node.
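    The note about the tag name sitting at position zero suggests how a custom factory might choose a node type. The sketch below is a simplified, hypothetical illustration: `TagKindSketch`, `chooseKind` and the returned labels are invented, and the vector holds plain strings here, whereas the real vector holds the library's attribute objects.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Vector;

/** Hypothetical sketch: pick a node kind from the tag name stored at index 0. */
public class TagKindSketch {
    /**
     * Simplification: attribute names are plain strings. Element zero is
     * treated as the tag name, as described for createTagNode.
     */
    static String chooseKind(Vector<String> attributeNames) {
        if (attributeNames.isEmpty()) {
            throw new IllegalArgumentException("expected the tag name at position zero");
        }
        String tagName = attributeNames.get(0).toUpperCase(Locale.ROOT);
        switch (tagName) {
            case "A":
                return "link";    // invented label for anchor tags
            case "IMG":
                return "image";   // invented label for image tags
            default:
                return "generic"; // anything else gets a general-purpose node
        }
    }

    public static void main(String[] args) {
        Vector<String> attributes = new Vector<>(Arrays.asList("a", "href"));
        System.out.println(chooseKind(attributes));
    }
}
```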
  - main
```
public static void main(java.lang.String[] args)
                 throws java.net.MalformedURLException,
                        ParserException
```
    Mainline for command line operation.
    
    Parameters:
    
    args - [0] The URL to parse.
    
    Throws:
    
    java.net.MalformedURLException - If the provided URL cannot be resolved.
    
    ParserException - If the parse fails.
