Class TokenQueue

java.lang.Object
org.jsoup.parser.TokenQueue
All Implemented Interfaces:
AutoCloseable

public class TokenQueue extends Object implements AutoCloseable
A character reader with helpers focusing on parsing CSS selectors. Used internally by jsoup. API subject to change.
  • Field Details

    • Esc

      private static final char Esc
    • Hyphen_Minus

      private static final char Hyphen_Minus
    • Unicode_Null

      private static final char Unicode_Null
    • Replacement

      private static final char Replacement
    • reader

      private final CharacterReader reader
    • ElementSelectorChars

      private static final char[] ElementSelectorChars
    • CssIdentifierChars

      private static final char[] CssIdentifierChars
  • Constructor Details

    • TokenQueue

      public TokenQueue(String data)
      Create a new TokenQueue.
      Parameters:
      data - the string of data backing the queue.
  • Method Details

    • isEmpty

      public boolean isEmpty()
      Is the queue empty?
      Returns:
      true if no data left in queue.
    • consume

      public char consume()
      Consumes one character off the queue.
      Returns:
      the first character on the queue.
    • advance

      public void advance()
      Drops the next character off the queue.
    • current

      char current()
    • matches

      public boolean matches(String seq)
      Tests if the next characters on the queue match the sequence, case-insensitively.
      Parameters:
      seq - String to check queue for.
      Returns:
      true if the next characters match.
    • matches

      public boolean matches(char c)
      Tests if the next character on the queue matches the character, case-sensitively.
    • matchesAny

      public boolean matchesAny(char... seq)
      Tests if the next character matches any of the supplied characters, case-sensitively.
      Parameters:
      seq - list of chars to case-sensitively check for
      Returns:
      true if any matched, false if none did
    • matchChomp

      public boolean matchChomp(String seq)
      If the queue case-insensitively matches the supplied string, consume it off the queue.
      Parameters:
      seq - String to search for, and if found, remove from queue.
      Returns:
      true if found and removed, false if not found.
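      To illustrate the match-then-chomp pattern that matches(String) and matchChomp(String) support, here is a minimal sketch over a plain String. MiniQueue and its fields are hypothetical stand-ins, not jsoup's implementation; only the case-insensitive matching mirrors the documented behavior.

```java
// Hypothetical MiniQueue: a minimal stand-in, not jsoup's TokenQueue.
class MiniQueue {
    private final String data;
    private int pos = 0; // index of the next unread character

    MiniQueue(String data) {
        this.data = data;
    }

    // Case-insensitive lookahead; consumes nothing.
    boolean matches(String seq) {
        return data.regionMatches(true, pos, seq, 0, seq.length());
    }

    // Consumes seq only when it is next on the queue.
    boolean matchChomp(String seq) {
        if (!matches(seq)) return false;
        pos += seq.length();
        return true;
    }

    // Returns everything left and empties the queue.
    String remainder() {
        String rest = data.substring(pos);
        pos = data.length();
        return rest;
    }
}
```

      With this shape, a parser can write if (q.matchChomp(":not(")) { ... } instead of peeking and consuming in two separate steps.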
    • matchChomp

      public boolean matchChomp(char c)
      If the queue matches the supplied (case-sensitive) character, consume it off the queue.
    • matchesWhitespace

      public boolean matchesWhitespace()
      Tests if the queue starts with a whitespace character.
      Returns:
      true if the queue starts with whitespace
    • matchesWord

      public boolean matchesWord()
      Tests if the queue starts with a tag word character (letter or digit).
      Returns:
      true if the next character is a word character
    • consume

      public void consume(String seq)
      Consumes the supplied sequence off the queue, case-insensitively. If the queue does not start with the supplied sequence, an IllegalStateException is thrown, so callers should check with matches(String) first.
      Parameters:
      seq - sequence to remove from head of queue.
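      The throw-on-mismatch contract can be sketched as follows. StrictConsumeQueue is a hypothetical class, and the exception message is an assumption rather than jsoup's wording.

```java
// Hypothetical StrictConsumeQueue illustrating the documented consume(String) contract.
class StrictConsumeQueue {
    private final String data;
    private int pos = 0;

    StrictConsumeQueue(String data) {
        this.data = data;
    }

    // Case-insensitive lookahead, as for matches(String).
    boolean matches(String seq) {
        return data.regionMatches(true, pos, seq, 0, seq.length());
    }

    // Throws instead of silently skipping when the head does not match.
    void consume(String seq) {
        if (!matches(seq))
            throw new IllegalStateException("Queue did not start with: " + seq); // assumed message
        pos += seq.length();
    }

    int position() {
        return pos;
    }
}
```

      In this sketch, a caller that checks matches(seq) first never sees the exception, so reaching it points to a logic error in the caller.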
    • consumeTo

      public String consumeTo(String seq)
      Pulls a string off the queue, up to but not including the match sequence, or until the queue runs out.
      Parameters:
      seq - String to end on (and not include in return, but leave on queue). Case-sensitive.
      Returns:
      The matched data consumed from queue.
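      A sketch of the consumeTo semantics, assuming the search is a plain case-sensitive String.indexOf from the current position. ConsumeToSketch is hypothetical and not jsoup's source.

```java
// Hypothetical ConsumeToSketch: case-sensitive search from the current position.
class ConsumeToSketch {
    private final String data;
    private int pos = 0;

    ConsumeToSketch(String data) {
        this.data = data;
    }

    String consumeTo(String seq) {
        int offset = data.indexOf(seq, pos);
        if (offset == -1) offset = data.length(); // no match: take everything left
        String consumed = data.substring(pos, offset);
        pos = offset; // the terminator itself stays on the queue
        return consumed;
    }

    String remainder() {
        String rest = data.substring(pos);
        pos = data.length();
        return rest;
    }
}
```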
    • consumeToAny

      public String consumeToAny(String... seq)
      Consumes to the first sequence provided, or to the end of the queue. Leaves the terminator on the queue.
      Parameters:
      seq - any number of terminators to consume to. Case-insensitive.
      Returns:
      consumed string
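      The case-insensitive, multi-terminator variant can be sketched by advancing one character at a time until any terminator matches at the current position. ConsumeToAnySketch is a hypothetical class; the linear scan is an assumption chosen for clarity rather than speed.

```java
// Hypothetical ConsumeToAnySketch: linear scan, case-insensitive terminators.
class ConsumeToAnySketch {
    private final String data;
    private int pos = 0;

    ConsumeToAnySketch(String data) {
        this.data = data;
    }

    // True if any terminator starts at the current position, ignoring case.
    private boolean matchesAny(String... seqs) {
        for (String s : seqs) {
            if (data.regionMatches(true, pos, s, 0, s.length())) return true;
        }
        return false;
    }

    // Consumes up to (not including) the first terminator, or to the end.
    String consumeToAny(String... seqs) {
        int start = pos;
        while (pos < data.length() && !matchesAny(seqs)) pos++;
        return data.substring(start, pos);
    }

    String remainder() {
        String rest = data.substring(pos);
        pos = data.length();
        return rest;
    }
}
```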
    • chompBalanced

      public String chompBalanced(char open, char close)
      Pulls a balanced string off the queue. For example, if the queue is "(one (two) three) four", chompBalanced('(', ')') returns "one (two) three" and leaves " four" on the queue. Unbalanced openers and closers can be quoted (with ' or ") or escaped (with \). Those escapes are left in the returned string, which suits regexes (where the escape must be preserved) but not :contains text strings; use unescape for those.
      Parameters:
      open - opener
      close - closer
      Returns:
      data matched from the queue
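      The balanced-chomp algorithm can be sketched with a depth counter plus quote and escape tracking. BalancedQueue is hypothetical and simplified: characters before the first opener are skipped, and unbalanced input raises IllegalArgumentException, which may differ from jsoup's actual error handling.

```java
// Hypothetical, simplified BalancedQueue; not jsoup's chompBalanced source.
class BalancedQueue {
    private final String data;
    private int pos = 0;

    BalancedQueue(String data) {
        this.data = data;
    }

    // Returns the text between the first opener and its matching closer,
    // excluding that outer pair. Quotes and backslash escapes are kept verbatim.
    String chompBalanced(char open, char close) {
        int depth = 0;
        int start = -1;
        boolean inSingle = false;
        boolean inDouble = false;
        while (pos < data.length()) {
            char c = data.charAt(pos++);
            if (c == '\\') {                    // escape: also skip the next char
                if (pos < data.length()) pos++;
                continue;
            }
            if (c == '\'' && !inDouble) {
                inSingle = !inSingle;
                continue;
            }
            if (c == '"' && !inSingle) {
                inDouble = !inDouble;
                continue;
            }
            if (inSingle || inDouble) continue; // brackets inside quotes don't count
            if (c == open) {
                if (depth++ == 0) start = pos;  // content starts after the opener
            } else if (c == close && depth > 0) {
                if (--depth == 0) return data.substring(start, pos - 1);
            }
        }
        throw new IllegalArgumentException("Unbalanced input: " + data);
    }

    String remainder() {
        String rest = data.substring(pos);
        pos = data.length();
        return rest;
    }
}
```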
    • unescape

      public static String unescape(String in)
      Unescapes a backslash-escaped string.
      Parameters:
      in - backslash escaped string
      Returns:
      unescaped string
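      A minimal sketch of backslash unescaping, assuming a doubled backslash stands for one literal backslash and a trailing lone backslash is dropped. UnescapeSketch is hypothetical.

```java
// Hypothetical UnescapeSketch: removes escaping backslashes.
class UnescapeSketch {
    // Drops each escaping backslash and keeps the character it escapes;
    // a doubled backslash yields one literal backslash.
    static String unescape(String in) {
        StringBuilder out = new StringBuilder(in.length());
        boolean escaped = false;
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (c == '\\' && !escaped) {
                escaped = true; // skip the backslash itself
            } else {
                out.append(c);
                escaped = false;
            }
        }
        return out.toString();
    }
}
```

      Paired with a chompBalanced-style sketch, this turns a raw argument such as a \) b into the plain text a ) b.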
    • escapeCssIdentifier

      public static String escapeCssIdentifier(String in)
      Given a CSS identifier (such as a tag, ID, or class), escape any CSS special characters that would otherwise not be valid in a selector.
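      A sketch of identifier escaping, assuming rules along the lines of the CSSOM "serialize an identifier" algorithm, which the private Esc, Hyphen_Minus, Unicode_Null and Replacement constants hint at. CssEscapeSketch is hypothetical and may not match jsoup's output for every input.

```java
// Hypothetical CssEscapeSketch following the CSSOM "serialize an identifier" rules.
class CssEscapeSketch {
    static String escapeIdent(String in) {
        StringBuilder out = new StringBuilder();
        int len = in.length();
        for (int i = 0; i < len; i++) {
            char c = in.charAt(i);
            boolean digit = c >= '0' && c <= '9';
            if (c == 0) {
                out.append((char) 0xFFFD);       // NULL becomes the replacement character
            } else if ((c >= 0x01 && c <= 0x1F) || c == 0x7F) {
                appendCodepoint(out, c);         // control characters
            } else if (i == 0 && digit) {
                appendCodepoint(out, c);         // identifiers can't start with a digit
            } else if (i == 1 && digit && in.charAt(0) == '-') {
                appendCodepoint(out, c);         // nor with "-" followed by a digit
            } else if (i == 0 && c == '-' && len == 1) {
                out.append("\\-");               // a lone "-" is escaped
            } else if (c >= 0x80 || c == '-' || c == '_' || digit
                    || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) {
                out.append(c);                   // safe as-is
            } else {
                out.append('\\').append(c);      // other ASCII punctuation
            }
        }
        return out.toString();
    }

    // Backslash, lowercase hex code point, and a terminating space.
    private static void appendCodepoint(StringBuilder out, char c) {
        out.append('\\').append(Integer.toHexString(c)).append(' ');
    }
}
```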
    • appendEscaped

      private static void appendEscaped(StringBuilder out, char c)
    • appendEscapedCodepoint

      private static void appendEscapedCodepoint(StringBuilder out, char c)
    • consumeWhitespace

      public boolean consumeWhitespace()
      Pulls the next run of whitespace characters off the queue.
      Returns:
      true if any whitespace was consumed
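      A sketch of consumeWhitespace, assuming the whitespace set is space, tab, line feed, form feed and carriage return. WhitespaceSketch is hypothetical.

```java
// Hypothetical WhitespaceSketch: skips a run of whitespace.
class WhitespaceSketch {
    private final String data;
    private int pos = 0;

    WhitespaceSketch(String data) {
        this.data = data;
    }

    // Skips a run of whitespace and reports whether anything was skipped.
    boolean consumeWhitespace() {
        int start = pos;
        while (pos < data.length() && isWhitespace(data.charAt(pos))) pos++;
        return pos > start;
    }

    // Assumed whitespace set: space, tab, LF, FF, CR.
    private static boolean isWhitespace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r';
    }

    String remainder() {
        String rest = data.substring(pos);
        pos = data.length();
        return rest;
    }
}
```

      Returning a boolean lets a selector parser use the result directly, for example to detect a descendant combinator between two simple selectors.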
    • consumeElementSelector

      public String consumeElementSelector()
      Consumes a CSS element selector: a tag name, using | instead of : as the namespace separator (or *| for the wildcard namespace) so that it does not conflict with :pseudo selectors.
      Returns:
      tag name
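      A sketch of element-selector consumption, assuming the accepted characters are letters and digits (as judged by Character.isLetterOrDigit) plus * | _ -; the actual contents of the private ElementSelectorChars array are not shown here. ElementSelectorSketch is hypothetical. It shows why | is used for namespaces: scanning stops at :, leaving the pseudo-selector on the queue.

```java
// Hypothetical ElementSelectorSketch; the EXTRA set is an assumption.
class ElementSelectorSketch {
    private static final String EXTRA = "*|_-";
    private final String data;
    private int pos = 0;

    ElementSelectorSketch(String data) {
        this.data = data;
    }

    // Consumes letters, digits and the assumed extra characters.
    String consumeElementSelector() {
        int start = pos;
        while (pos < data.length()) {
            char c = data.charAt(pos);
            if (!Character.isLetterOrDigit(c) && EXTRA.indexOf(c) < 0) break;
            pos++;
        }
        return data.substring(start, pos);
    }

    String remainder() {
        String rest = data.substring(pos);
        pos = data.length();
        return rest;
    }
}
```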
    • consumeCssIdentifier

      public String consumeCssIdentifier()
      Consume a CSS identifier (ID or class) off the queue.

      Note: For backwards compatibility this method supports improperly formatted CSS identifiers, e.g. 1 instead of \31.

      Returns:
      The unescaped identifier.
      Throws:
      IllegalArgumentException - if an invalid escape sequence was found. Afterward, the state of the TokenQueue is undefined.
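      A sketch of identifier consumption with CSS escape decoding, assuming the CSS Syntax Level 3 rules: up to six hex digits, one optional trailing whitespace character, and U+FFFD substituted for zero, surrogate, or out-of-range code points. CssIdentSketch is hypothetical; in particular, throwing IllegalArgumentException for a backslash at end of input or before a newline is an assumption about what counts as an invalid escape.

```java
// Hypothetical CssIdentSketch: identifier scan plus CSS escape decoding.
class CssIdentSketch {
    private final String data;
    private int pos = 0;

    CssIdentSketch(String data) {
        this.data = data;
    }

    // Consumes identifier characters and escapes; returns the unescaped text.
    String consumeIdent() {
        StringBuilder out = new StringBuilder();
        while (pos < data.length()) {
            char c = data.charAt(pos);
            if (c == '\\') {
                pos++;
                consumeEscape(out);
            } else if (isIdentChar(c)) {
                out.append(c);
                pos++;
            } else {
                break;
            }
        }
        return out.toString();
    }

    // Called just after a backslash has been consumed.
    private void consumeEscape(StringBuilder out) {
        if (pos >= data.length() || "\n\r\f".indexOf(data.charAt(pos)) >= 0)
            throw new IllegalArgumentException("Invalid escape at offset " + pos);
        int start = pos;
        while (pos < data.length() && pos - start < 6 && isHex(data.charAt(pos))) pos++;
        if (pos == start) {              // non-hex: the escaped char is taken literally
            out.append(data.charAt(pos++));
            return;
        }
        int cp = Integer.parseInt(data.substring(start, pos), 16);
        if (pos < data.length() && isCssWhitespace(data.charAt(pos))) pos++; // optional terminator
        if (cp == 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) cp = 0xFFFD;
        out.appendCodePoint(cp);
    }

    private static boolean isIdentChar(char c) {
        return c >= 0x80 || c == '-' || c == '_' || (c >= '0' && c <= '9')
                || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    private static boolean isCssWhitespace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f';
    }
}
```

      Because the sketch also accepts a bare leading digit, it tolerates the improperly formatted identifiers mentioned in the note above (1 as well as \31).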
    • consumeCssEscapeSequenceInto

      private void consumeCssEscapeSequenceInto(StringBuilder out)
    • isNonAscii

      private static boolean isNonAscii(char c)
    • isIdentStart

      private static boolean isIdentStart(char c)
    • isIdent

      private static boolean isIdent(char c)
    • isNewline

      private static boolean isNewline(char c)
    • isValidCodePoint

      private static boolean isValidCodePoint(int codePoint)
    • consumeEscapedCssIdentifier

      private String consumeEscapedCssIdentifier(char... matches)
    • matchesCssIdentifier

      private boolean matchesCssIdentifier(char... matches)
    • remainder

      public String remainder()
      Consume and return whatever is left on the queue.
      Returns:
      remainder of queue.
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • close

      public void close()
      Specified by:
      close in interface AutoCloseable