Package org.jsoup.parser
Class TokenQueue
java.lang.Object
org.jsoup.parser.TokenQueue
- All Implemented Interfaces:
AutoCloseable
A character reader with helpers focused on parsing CSS selectors. Used internally by jsoup. API subject to change.
-
Field Summary
Fields
Modifier and Type, Field
private static final char[] CssIdentifierChars
private static final char[] ElementSelectorChars
private static final char Esc
private static final char Hyphen_Minus
private final CharacterReader reader
private static final char Replacement
private static final char Unicode_Null
-
Constructor Summary
Constructors
TokenQueue(String data) - Create a new TokenQueue.
-
Method Summary
Modifier and Type, Method, Description
void advance() - Drops the next character off the queue.
private static void appendEscaped(StringBuilder out, char c)
private static void appendEscapedCodepoint(StringBuilder out, char c)
String chompBalanced(char open, char close) - Pulls a balanced string off the queue.
void close()
char consume() - Consume one character off queue.
void consume(String seq) - Consumes the supplied sequence off the queue, case-insensitively.
private void consumeCssEscapeSequenceInto(StringBuilder out)
String consumeCssIdentifier() - Consume a CSS identifier (ID or class) off the queue.
String consumeElementSelector() - Consume a CSS element selector (tag name, but | instead of : for namespaces (or *| for wildcard namespace), to not conflict with :pseudo selects).
private String consumeEscapedCssIdentifier(char... matches)
String consumeTo(String seq) - Pulls a string off the queue, up to but exclusive of the match sequence, or to the queue running out.
String consumeToAny(String... seq) - Consumes to the first sequence provided, or to the end of the queue.
boolean consumeWhitespace() - Pulls the next run of whitespace characters off the queue.
(package private) char current()
static String escapeCssIdentifier(String in) - Given a CSS identifier (such as a tag, ID, or class), escape any CSS special characters that would otherwise not be valid in a selector.
boolean isEmpty() - Is the queue empty?
private static boolean isIdent(char c)
private static boolean isIdentStart(char c)
private static boolean isNewline(char c)
private static boolean isNonAscii(char c)
private static boolean isValidCodePoint(int codePoint)
boolean matchChomp(char c) - If the queue matches the supplied (case-sensitive) character, consume it off the queue.
boolean matchChomp(String seq) - If the queue case-insensitively matches the supplied string, consume it off the queue.
boolean matches(char c) - Tests if the next character on the queue matches the character, case-sensitively.
boolean matches(String seq) - Tests if the next characters on the queue match the sequence, case-insensitively.
boolean matchesAny(char... seq) - Tests if the next character matches any of the supplied characters, case-sensitively.
private boolean matchesCssIdentifier(char... matches)
boolean matchesWhitespace() - Tests if queue starts with a whitespace character.
boolean matchesWord() - Test if the queue matches a tag word character (letter or digit).
String remainder() - Consume and return whatever is left on the queue.
String toString()
static String unescape(String in) - Unescape a \ escaped string.
-
Field Details
-
Esc
private static final char Esc
-
Hyphen_Minus
private static final char Hyphen_Minus
-
Unicode_Null
private static final char Unicode_Null
-
Replacement
private static final char Replacement
-
reader
private final CharacterReader reader
-
ElementSelectorChars
private static final char[] ElementSelectorChars
-
CssIdentifierChars
private static final char[] CssIdentifierChars
-
-
Constructor Details
-
TokenQueue
public TokenQueue(String data)
Create a new TokenQueue.
- Parameters:
data - string of data to back queue.
-
-
Method Details
-
isEmpty
public boolean isEmpty()
Is the queue empty?
- Returns:
true if no data left in queue.
-
consume
public char consume()
Consume one character off queue.
- Returns:
first character on queue.
-
advance
public void advance()
Drops the next character off the queue.
-
current
char current()
-
matches
public boolean matches(String seq)
Tests if the next characters on the queue match the sequence, case-insensitively.
- Parameters:
seq - String to check queue for.
- Returns:
true if the next characters match.
-
matches
public boolean matches(char c)
Tests if the next character on the queue matches the character, case-sensitively.
-
matchesAny
public boolean matchesAny(char... seq)
Tests if the next character matches any of the supplied characters, case-sensitively.
- Parameters:
seq - list of chars to case-sensitively check for
- Returns:
true if any matched, false if none did
-
matchChomp
public boolean matchChomp(String seq)
If the queue case-insensitively matches the supplied string, consume it off the queue.
- Parameters:
seq - String to search for, and if found, remove from queue.
- Returns:
true if found and removed, false if not found.
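The match-then-consume behaviour described above can be illustrated with a small standalone sketch. It is not jsoup's implementation: the class MatchChompSketch and its string-plus-position layout are hypothetical stand-ins for the queue, assumed here only for illustration.

```java
/** Hypothetical stand-in for a token queue: a string plus a read position. Not jsoup code. */
final class MatchChompSketch {
    private final String data;
    private int pos = 0;

    MatchChompSketch(String data) {
        this.data = data;
    }

    /** True if the unread input starts with seq, ignoring case. */
    boolean matches(String seq) {
        return data.regionMatches(true, pos, seq, 0, seq.length());
    }

    /** Consumes seq only when it sits at the head of the unread input. */
    boolean matchChomp(String seq) {
        if (!matches(seq)) {
            return false; // nothing consumed, position unchanged
        }
        pos += seq.length();
        return true;
    }

    /** Everything that has not been consumed yet. */
    String remainder() {
        return data.substring(pos);
    }
}
```

A failed matchChomp leaves the position untouched, which is what lets a parser try several alternatives in sequence.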
-
matchChomp
public boolean matchChomp(char c)
If the queue matches the supplied (case-sensitive) character, consume it off the queue.
-
matchesWhitespace
public boolean matchesWhitespace()
Tests if queue starts with a whitespace character.
- Returns:
true if the queue starts with whitespace
-
matchesWord
public boolean matchesWord()
Test if the queue matches a tag word character (letter or digit).
- Returns:
true if the next character is a word character
-
consume
public void consume(String seq)
Consumes the supplied sequence off the queue, case-insensitively. If the queue does not start with the supplied sequence, an IllegalStateException is thrown, so check with matches(String) first.
- Parameters:
seq - sequence to remove from head of queue.
-
consumeTo
public String consumeTo(String seq)
Pulls a string off the queue, up to but exclusive of the match sequence, or to the queue running out.
- Parameters:
seq - String to end on (and not include in return, but leave on queue). Case-sensitive.
- Returns:
The matched data consumed from queue.
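As a rough illustration of "up to but exclusive of the match sequence", the following standalone sketch mimics the documented behaviour. ConsumeToSketch is a hypothetical class written for this example, not part of jsoup.

```java
/** Hypothetical sketch of consume-to-terminator semantics. Not jsoup code. */
final class ConsumeToSketch {
    private final String data;
    private int pos = 0;

    ConsumeToSketch(String data) {
        this.data = data;
    }

    /** Returns the input up to seq (case-sensitive), leaving seq itself unread. */
    String consumeTo(String seq) {
        int idx = data.indexOf(seq, pos);
        int end = idx >= 0 ? idx : data.length(); // no match: take everything that is left
        String consumed = data.substring(pos, end);
        pos = end;
        return consumed;
    }

    /** Everything that has not been consumed yet. */
    String remainder() {
        return data.substring(pos);
    }
}
```

Because the terminator stays on the queue, a caller would typically follow up with a consume or matchChomp of the terminator itself.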
-
consumeToAny
public String consumeToAny(String... seq)
Consumes to the first sequence provided, or to the end of the queue. Leaves the terminator on the queue.
- Parameters:
seq - any number of terminators to consume to. Case-insensitive.
- Returns:
consumed string
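A minimal standalone sketch of consuming up to the earliest of several terminators follows. ConsumeToAnySketch is hypothetical, and the character-by-character scan is an assumption made for clarity rather than a description of how jsoup does it.

```java
/** Hypothetical sketch of consuming up to the first of several terminators. Not jsoup code. */
final class ConsumeToAnySketch {
    private final String data;
    private int pos = 0;

    ConsumeToAnySketch(String data) {
        this.data = data;
    }

    /** Advances until any terminator matches (ignoring case) or the input runs out. */
    String consumeToAny(String... seq) {
        int start = pos;
        while (pos < data.length() && !matchesAny(seq)) {
            pos++;
        }
        return data.substring(start, pos);
    }

    /** True if any of the terminators starts at the current position. */
    private boolean matchesAny(String... seq) {
        for (String s : seq) {
            if (data.regionMatches(true, pos, s, 0, s.length())) {
                return true;
            }
        }
        return false;
    }

    /** Everything that has not been consumed yet. */
    String remainder() {
        return data.substring(pos);
    }
}
```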
-
chompBalanced
public String chompBalanced(char open, char close)
Pulls a balanced string off the queue. E.g. if the queue is "(one (two) three) four", chompBalanced('(', ')') will return "one (two) three" and leave " four" on the queue. Unbalanced openers and closers can be quoted (with ' or ") or escaped (with \). Those escapes are left in the returned string, which makes it suitable for regexes (where the escape must be preserved) but unsuitable for contains text strings; use unescape for those.
- Parameters:
open - opener
close - closer
- Returns:
data matched from the queue
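The balancing rules above (nesting, quoting, and backslash escapes) can be sketched with a depth counter. This is an illustrative standalone version, not jsoup's code: BalancedChompSketch is hypothetical, it assumes open and close are distinct and are not quote characters, and its use of IllegalArgumentException for malformed input is a choice made for the sketch.

```java
/** Hypothetical sketch of pulling a balanced, possibly nested, string. Not jsoup code. */
final class BalancedChompSketch {
    private final String data;
    private int pos = 0;

    BalancedChompSketch(String data) {
        this.data = data;
    }

    /** Returns the text between the first opener and its matching closer; escapes are kept. */
    String chompBalanced(char open, char close) {
        if (pos >= data.length() || data.charAt(pos) != open) {
            throw new IllegalArgumentException("expected opener at head of queue");
        }
        int start = pos + 1;     // content begins after the first opener
        int depth = 0;
        char quote = 0;          // active quote character, or 0 when outside quotes
        boolean escaped = false; // true when the previous character was an unescaped backslash
        for (int i = pos; i < data.length(); i++) {
            char c = data.charAt(i);
            if (escaped) { escaped = false; continue; }   // escaped character is never counted
            if (c == '\\') { escaped = true; continue; }
            if (quote != 0) {                              // inside quotes: only look for the end quote
                if (c == quote) quote = 0;
                continue;
            }
            if (c == '\'' || c == '"') { quote = c; continue; }
            if (c == open) {
                depth++;
            } else if (c == close && --depth == 0) {
                pos = i + 1;                               // consume the closer as well
                return data.substring(start, i);
            }
        }
        throw new IllegalArgumentException("unbalanced input");
    }

    /** Everything that has not been consumed yet. */
    String remainder() {
        return data.substring(pos);
    }
}
```

With the input from the description, the sketch returns "one (two) three" and leaves " four" unread. An escaped closer such as \) is skipped when counting but still appears, backslash included, in the returned string.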
-
unescape
public static String unescape(String in)
Unescape a \ escaped string.
- Parameters:
in - backslash escaped string
- Returns:
unescaped string
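One plausible reading of backslash unescaping is sketched below: each unescaped backslash is dropped and the character after it is kept literally. UnescapeSketch is a hypothetical class, and the handling of edge cases (such as a trailing backslash, which this sketch simply drops) is an assumption rather than documented behaviour.

```java
/** Hypothetical sketch of removing backslash escapes. Not jsoup code. */
final class UnescapeSketch {
    static String unescape(String in) {
        StringBuilder out = new StringBuilder(in.length());
        boolean escaped = false; // true when the previous character was an unescaped backslash
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (c == '\\' && !escaped) {
                escaped = true;  // drop the backslash, keep whatever follows it
                continue;
            }
            out.append(c);
            escaped = false;
        }
        return out.toString();
    }
}
```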
-
escapeCssIdentifier
public static String escapeCssIdentifier(String in)
Given a CSS identifier (such as a tag, ID, or class), escape any CSS special characters that would otherwise not be valid in a selector.
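To make the idea of escaping an identifier concrete, here is a standalone sketch that follows the CSSOM "serialize an identifier" rules. That this method applies exactly those rules is an assumption, and CssEscapeSketch is a hypothetical class, not jsoup code.

```java
/** Hypothetical sketch of CSS identifier escaping, modelled on CSSOM "serialize an identifier". */
final class CssEscapeSketch {
    static String escapeCssIdentifier(String in) {
        StringBuilder out = new StringBuilder();
        int len = in.length();
        for (int i = 0; i < len; i++) {
            char c = in.charAt(i);
            boolean digit = c >= '0' && c <= '9';
            if (c == 0) {
                out.append((char) 0xFFFD);                 // NULL becomes the replacement character
            } else if (c <= 0x1F || c == 0x7F
                    || (digit && (i == 0 || (i == 1 && in.charAt(0) == '-')))) {
                // control characters and leading digits are written as a hex code point plus a space
                out.append('\\').append(Integer.toHexString(c)).append(' ');
            } else if (c == '-' && i == 0 && len == 1) {
                out.append("\\-");                         // a lone hyphen must be escaped
            } else if (c >= 0x80 || c == '-' || c == '_' || digit
                    || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) {
                out.append(c);                             // valid identifier character, kept as is
            } else {
                out.append('\\').append(c);                // any other ASCII character gets a backslash
            }
        }
        return out.toString();
    }
}
```

Under these rules an ID of 1abc becomes \31 abc, and a class name a.b becomes a\.b.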
-
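appendEscaped
private static void appendEscaped(StringBuilder out, char c)
-
appendEscapedCodepoint
private static void appendEscapedCodepoint(StringBuilder out, char c)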
-
consumeWhitespace
public boolean consumeWhitespace()
Pulls the next run of whitespace characters off the queue.
- Returns:
true if any whitespace was consumed
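A short standalone sketch of skipping a whitespace run and reporting whether anything was skipped is shown below. WhitespaceSketch is hypothetical, and using Character.isWhitespace as the whitespace test is an assumption; jsoup may define whitespace differently.

```java
/** Hypothetical sketch of consuming a run of whitespace. Not jsoup code. */
final class WhitespaceSketch {
    private final String data;
    private int pos = 0;

    WhitespaceSketch(String data) {
        this.data = data;
    }

    /** Skips consecutive whitespace and reports whether at least one character was skipped. */
    boolean consumeWhitespace() {
        int start = pos;
        while (pos < data.length() && Character.isWhitespace(data.charAt(pos))) {
            pos++;
        }
        return pos > start;
    }

    /** Everything that has not been consumed yet. */
    String remainder() {
        return data.substring(pos);
    }
}
```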
-
consumeElementSelector
public String consumeElementSelector()
Consumes a CSS element selector: a tag name in which | replaces : as the namespace separator (or *| for a wildcard namespace), so that it does not conflict with :pseudo selectors.
- Returns:
tag name
-
consumeCssIdentifier
public String consumeCssIdentifier()
Consume a CSS identifier (ID or class) off the queue.
Note: For backwards compatibility this method supports improperly formatted CSS identifiers, e.g. 1 instead of \31.
- Returns:
The unescaped identifier.
- Throws:
IllegalArgumentException - if an invalid escape sequence was found. Afterward, the state of the TokenQueue is undefined.
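The relationship between an escaped identifier such as \31 and its unescaped form 1 can be shown with a standalone decoder that follows the CSS Syntax rules for escapes (up to six hex digits, then one optional whitespace character). CssUnescapeSketch is hypothetical, and which inputs count as an invalid escape sequence is an assumption of the sketch, not a statement about jsoup.

```java
/** Hypothetical sketch of decoding CSS escapes in an identifier. Not jsoup code. */
final class CssUnescapeSketch {
    static String unescapeCssIdentifier(String in) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i++);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            // Assumption: a trailing backslash or a backslash before a newline is invalid.
            if (i >= in.length() || in.charAt(i) == '\n') {
                throw new IllegalArgumentException("invalid escape sequence");
            }
            int start = i;
            int cp = 0;
            while (i < in.length() && i - start < 6 && hexValue(in.charAt(i)) >= 0) {
                cp = cp * 16 + hexValue(in.charAt(i));
                i++;
            }
            if (i == start) {
                out.append(in.charAt(i++)); // not hex: the escaped character is taken literally
                continue;
            }
            if (i < in.length() && Character.isWhitespace(in.charAt(i))) {
                i++;                        // one whitespace character terminates a hex escape
            }
            boolean valid = cp != 0 && cp <= Character.MAX_CODE_POINT
                    && !(cp >= 0xD800 && cp <= 0xDFFF);
            out.appendCodePoint(valid ? cp : 0xFFFD);
        }
        return out.toString();
    }

    /** Value of an ASCII hex digit, or -1 if c is not one. */
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }
}
```

Here \31 23 decodes to 123: the hex escape yields 1, the following space is swallowed as the escape terminator, and 23 is copied through.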
-
consumeCssEscapeSequenceInto
private void consumeCssEscapeSequenceInto(StringBuilder out)
-
isNonAscii
private static boolean isNonAscii(char c)
-
isIdentStart
private static boolean isIdentStart(char c)
-
isIdent
private static boolean isIdent(char c)
-
isNewline
private static boolean isNewline(char c)
-
isValidCodePoint
private static boolean isValidCodePoint(int codePoint)
-
consumeEscapedCssIdentifier
private String consumeEscapedCssIdentifier(char... matches)
-
matchesCssIdentifier
private boolean matchesCssIdentifier(char... matches)
-
remainder
public String remainder()
Consume and return whatever is left on the queue.
- Returns:
remainder of queue.
-
toString
public String toString()
-
close
public void close()
- Specified by:
close in interface AutoCloseable
-