HTML5_Tokenizer
in package
Table of Contents
- ALPHA = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
- CDATA = 2
- CHARACTER = 4
- COMMENT = 3
- DIGIT = '0123456789'
- DOCTYPE = 0
- ENDTAG = 2
- EOF = 6
- HEX = '0123456789ABCDEFabcdef'
- LOWER_ALPHA = 'abcdefghijklmnopqrstuvwxyz'
- PARSEERROR = 7
- PCDATA = 0
- PLAINTEXT = 3
- RCDATA = 1
- SPACECHARACTER = 5
- STARTTAG = 1
- UPPER_ALPHA = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
- WHITESPACE = "\t\n\x0c "
- $content_model : int
- $stream : HTML5_InputStream
- $token : mixed
- The token currently being built but not yet emitted. Where applicable, this also holds the last token emitted.
- $tree : HTML5_TreeBuilder
- __construct() : mixed
- getTree() : HTML5_TreeBuilder
- parse() : mixed
- Performs the actual parsing of the document.
- parseFragment() : mixed
- save() : DOMDocument|DOMNodeList
- Returns a serialized representation of the tree.
- stream() : HTML5_InputStream
- Returns the input stream.
- emitToken() : mixed
- Emits a token, passing it on to the tree builder.
- characterReferenceInAttributeValue() : mixed
- consumeCharacterReference() : string
Constants
ALPHA
public
mixed
ALPHA
= 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
CDATA
public
mixed
CDATA
= 2
CHARACTER
public
mixed
CHARACTER
= 4
COMMENT
public
mixed
COMMENT
= 3
DIGIT
public
mixed
DIGIT
= '0123456789'
DOCTYPE
public
mixed
DOCTYPE
= 0
ENDTAG
public
mixed
ENDTAG
= 2
EOF
public
mixed
EOF
= 6
HEX
public
mixed
HEX
= '0123456789ABCDEFabcdef'
LOWER_ALPHA
public
mixed
LOWER_ALPHA
= 'abcdefghijklmnopqrstuvwxyz'
PARSEERROR
public
mixed
PARSEERROR
= 7
PCDATA
public
mixed
PCDATA
= 0
PLAINTEXT
public
mixed
PLAINTEXT
= 3
RCDATA
public
mixed
RCDATA
= 1
SPACECHARACTER
public
mixed
SPACECHARACTER
= 5
STARTTAG
public
mixed
STARTTAG
= 1
UPPER_ALPHA
public
mixed
UPPER_ALPHA
= 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
WHITESPACE
public
mixed
WHITESPACE
= "\t\n\x0c "
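The integer constants above appear to fall into two independent groups, which would explain why some values repeat (for example CDATA and ENDTAG are both 2): content models (PCDATA, RCDATA, CDATA, PLAINTEXT) and token types (DOCTYPE through PARSEERROR). The sketch below illustrates that reading. The class TokenizerConstantsSketch and both helper functions are hypothetical stand-ins that only copy the values listed here; they are not part of the library.

```php
<?php
// Hypothetical stand-in that mirrors the integer constants documented above.
// It is NOT the real HTML5_Tokenizer class.
final class TokenizerConstantsSketch
{
    // Assumed group 1: content models (the kind of value $content_model holds).
    const PCDATA = 0;
    const RCDATA = 1;
    const CDATA = 2;
    const PLAINTEXT = 3;

    // Assumed group 2: token types.
    const DOCTYPE = 0;
    const STARTTAG = 1;
    const ENDTAG = 2;
    const COMMENT = 3;
    const CHARACTER = 4;
    const SPACECHARACTER = 5;
    const EOF = 6;
    const PARSEERROR = 7;
}

// Maps a content-model value to its constant name, for debugging output.
function contentModelName(int $model): string
{
    $names = [0 => 'PCDATA', 1 => 'RCDATA', 2 => 'CDATA', 3 => 'PLAINTEXT'];
    return $names[$model] ?? 'unknown';
}

// Maps a token-type value to its constant name, for debugging output.
function tokenTypeName(int $type): string
{
    $names = [
        0 => 'DOCTYPE', 1 => 'STARTTAG', 2 => 'ENDTAG', 3 => 'COMMENT',
        4 => 'CHARACTER', 5 => 'SPACECHARACTER', 6 => 'EOF', 7 => 'PARSEERROR',
    ];
    return $names[$type] ?? 'unknown';
}
```

Because the two groups share values, the same integer has to be interpreted according to where it is used: 2 means CDATA as a content model and ENDTAG as a token type.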
Properties
$content_model
protected
int
$content_model
Current content model we are parsing as.
$stream
protected
HTML5_InputStream
$stream
Points to an InputStream object.
$token
The token currently being built but not yet emitted. Where applicable, this also holds the last token emitted.
protected
mixed
$token
$tree
private
HTML5_TreeBuilder
$tree
Tree builder that the tokenizer emits tokens to.
Methods
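As an overview of how the public methods documented in this section fit together, here is a minimal sketch that constructs a tokenizer, runs parse(), and returns the result of save(). The helper name parseHtmlDocument is hypothetical, and the code assumes the library has already been loaded by the caller; it uses only the signatures listed on this page.

```php
<?php
/**
 * Hypothetical helper: parses a complete HTML document with HTML5_Tokenizer.
 * Assumes the library classes are already loaded (or autoloadable).
 *
 * @return DOMDocument|DOMNodeList whatever save() returns
 */
function parseHtmlDocument(string $html)
{
    if (!class_exists('HTML5_Tokenizer')) {
        // Fail loudly instead of triggering a fatal "class not found" error.
        throw new RuntimeException('HTML5_Tokenizer is not loaded');
    }

    // $builder is optional and defaults to null, so it is omitted here.
    $tokenizer = new HTML5_Tokenizer($html);

    // Runs the tokenizer over the input; tokens are passed to the tree builder.
    $tokenizer->parse();

    // Returns the tree produced by the tree builder.
    return $tokenizer->save();
}
```

For fragments rather than whole documents, parseFragment() would take the place of parse(); its $context parameter is documented below only as defaulting to null.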
__construct()
public
__construct( $data[, HTML5_TreeBuilder|null $builder = null ]) : mixed
Parameters
- $data : Data to parse
- $builder : HTML5_TreeBuilder|null = null
Return values
mixed
getTree()
public
getTree() : HTML5_TreeBuilder
Return values
HTML5_TreeBuilder : The tree
parse()
Performs the actual parsing of the document.
public
parse() : mixed
Return values
mixed
parseFragment()
public
parseFragment([null $context = null ]) : mixed
Parameters
- $context : null = null
Return values
mixed
save()
Returns a serialized representation of the tree.
public
save() : DOMDocument|DOMNodeList
Return values
DOMDocument|DOMNodeList
stream()
Returns the input stream.
public
stream() : HTML5_InputStream
Return values
HTML5_InputStream
emitToken()
Emits a token, passing it on to the tree builder.
protected
emitToken( $token[, bool $checkStream = true ][, bool $dry = false ]) : mixed
Parameters
- $token :
- $checkStream : bool = true
- $dry : bool = false
Return values
mixed
characterReferenceInAttributeValue()
private
characterReferenceInAttributeValue([bool $allowed = false ]) : mixed
Parameters
- $allowed : bool = false
Return values
mixed
consumeCharacterReference()
private
consumeCharacterReference([bool $allowed = false ][, bool $inattr = false ]) : string
Parameters
- $allowed : bool = false
- $inattr : bool = false
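This page does not describe how consumeCharacterReference() works internally. As an illustration of how character-class strings such as DIGIT and HEX can be used, the sketch below decodes a well-formed numeric character reference with strspn(). It is a simplified stand-in and not the library's implementation: the function names and the DIGIT_CHARS and HEX_CHARS constants are hypothetical, and named references, missing semicolons, and parse-error reporting are deliberately left out.

```php
<?php
// Hypothetical copies of the DIGIT and HEX values documented above.
const DIGIT_CHARS = '0123456789';
const HEX_CHARS = '0123456789ABCDEFabcdef';

// Encodes a Unicode code point as UTF-8 bytes (caller validates the range).
function codepointToUtf8(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))
            . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18))
        . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

// Decodes "&#65;" or "&#x41;" style references; returns null for anything else.
function decodeNumericReference(string $ref): ?string
{
    if (strncmp($ref, '&#', 2) !== 0 || substr($ref, -1) !== ';') {
        return null;
    }
    $body = (string) substr($ref, 2, -1);

    // A leading "x" or "X" switches from decimal to hexadecimal digits.
    $isHex = $body !== '' && ($body[0] === 'x' || $body[0] === 'X');
    if ($isHex) {
        $body = (string) substr($body, 1);
    }

    // strspn() counts leading characters drawn from the allowed set, so the
    // body is valid only if that count covers the whole string.
    $allowed = $isHex ? HEX_CHARS : DIGIT_CHARS;
    if ($body === '' || strlen($body) > 8 || strspn($body, $allowed) !== strlen($body)) {
        return null;
    }

    $cp = $isHex ? (int) hexdec($body) : (int) $body;

    // Reject values outside Unicode and the surrogate range.
    if ($cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
        return null;
    }
    return codepointToUtf8($cp);
}
```

The eight-digit length cap is an arbitrary choice made for this sketch so that the integer conversion cannot overflow; it is not taken from the library.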