Package org.jsoup.parser
Class CharacterReader
java.lang.Object
org.jsoup.parser.CharacterReader
- All Implemented Interfaces:
AutoCloseable
CharacterReader consumes tokens off a string. Used internally by jsoup. API subject to changes.
If the underlying reader throws an IOException during any operation, the CharacterReader will throw an
UncheckedIOException. That won't happen with String / StringReader inputs.
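The wrapping behaviour described above can be illustrated with plain JDK classes. This is a minimal sketch, not jsoup's implementation: the class `UncheckedReadSketch` and its methods `fill` and `failingReader` are hypothetical names introduced only for this example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class UncheckedReadSketch {
    /** Reads up to buf.length chars, rethrowing any IOException as unchecked. */
    static int fill(Reader in, char[] buf) {
        try {
            return in.read(buf, 0, buf.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** A reader that always fails, used to exercise the wrapping path. */
    static Reader failingReader() {
        return new Reader() {
            @Override
            public int read(char[] cbuf, int off, int len) throws IOException {
                throw new IOException("simulated failure");
            }

            @Override
            public void close() {
                // nothing to release
            }
        };
    }

    public static void main(String[] args) {
        char[] buf = new char[8];
        // A StringReader never throws here, so no exception handling is needed.
        System.out.println(fill(new StringReader("jsoup"), buf));
        try {
            fill(failingReader(), buf);
        } catch (UncheckedIOException e) {
            System.out.println("wrapped: " + e.getCause().getMessage());
        }
    }
}
```

Callers that only parse in-memory strings can ignore the exception; callers that pass a stream-backed Reader may want to catch UncheckedIOException and inspect getCause().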
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
(package private) static interface | CharacterReader.CharPredicate
-
Field Summary
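Fields
Modifier and Type | Field | Description
private static final SoftPool<char[]> | BufferPool
(package private) static final int | BufferSize
private int | bufLength
private int | bufMark
private int | bufPos
private char[] | charBuf
private int | consumed
(package private) static final char | EOF
private int | fillPoint
private int | lastIcIndex
private String | lastIcSeq
private int | lineNumberOffset
private static final int | MaxStringCacheLen
private Reader | reader
private boolean | readFully
(package private) static final int | RefillPoint
private static final int | RewindLimit
private String[] | stringCache
private static final int | StringCacheSize
-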
Constructor Summary
Constructors
Constructor | Description
CharacterReader(Reader input)
CharacterReader(Reader input, int sz)
CharacterReader(String input)
-
Method Summary
Modifier and Type | Method | Description
void | advance() | Moves the current position by one.
private void | bufferUp()
private static String | cacheString(char[] charBuf, String[] stringCache, int start, int count) | Caches short strings, as a flyweight pattern, to reduce GC load.
void | close()
int | columnNumber() | Get the current column number (that the reader has consumed to).
(package private) int | columnNumber(int pos)
char | consume() | Consume one character off the queue.
(package private) String | consumeAttributeQuoted(boolean single)
(package private) String | consumeData()
(package private) String | consumeDigitSequence()
(package private) String | consumeHexSequence()
(package private) String | consumeLetterSequence()
(package private) String | consumeLetterThenDigitSequence()
(package private) String | consumeMatching(CharacterReader.CharPredicate func) | Read characters while the input predicate returns true.
(package private) String | consumeMatching(CharacterReader.CharPredicate func, int maxLength) | Read characters while the input predicate returns true, up to a maximum length.
(package private) String | consumeRawData()
(package private) String | consumeTagName()
String | consumeTo(char c) | Reads characters up to the specified char.
String | consumeTo(String seq) | Reads the characters up to (but not including) the specified case-sensitive string.
String | consumeToAny(char... chars) | Read characters until the first of any delimiters is found.
(package private) String | consumeToAnySorted(char... chars)
(package private) String | consumeToEnd()
(package private) boolean | containsIgnoreCase(String seq) | Used to check presence of </title>, </textarea> when we're in RCData and see a <xxx.
char | current() | Get the char at the current position.
private void | doBufferUp() | Reads into the buffer.
boolean | isEmpty() | Tests if all the content has been read.
private boolean | isEmptyNoBufferUp()
boolean | isTrackNewlines() | Check if the tracking of newlines is enabled.
int | lineNumber() | Get the current line number (that the reader has consumed to).
(package private) int | lineNumber(int pos)
private int | lineNumIndex(int pos)
(package private) void | mark()
(package private) boolean | matchConsume(String seq)
(package private) boolean | matchConsumeIgnoreCase(String seq)
(package private) boolean | matches(char c)
(package private) boolean | matches(String seq)
(package private) boolean | matchesAny(char... seq) | Tests if the next character in the queue matches any of the characters in the sequence, case sensitively.
(package private) boolean | matchesAnySorted(char[] seq)
(package private) boolean | matchesAsciiAlpha() | Checks if the current pos matches an ascii alpha (A-Z a-z) per https://infra.spec.whatwg.org/#ascii-alpha
(package private) boolean | matchesDigit()
(package private) boolean | matchesIgnoreCase(String seq)
(package private) int | nextIndexOf(char c) | Returns the number of characters between the current position and the next instance of the input char.
(package private) int | nextIndexOf(CharSequence seq) | Returns the number of characters between the current position and the next instance of the input sequence.
int | pos() | Gets the position currently read to in the content.
(package private) String | posLineCol() | Get a formatted string representing the current line and column positions.
(package private) static boolean | rangeEquals(char[] charBuf, int start, int count, String cached) | Check if the value of the provided range equals the string.
(package private) boolean | rangeEquals(int start, int count, String cached)
(package private) boolean | readFully() | Tests if the buffer has been fully read.
(package private) void | rewindToMark()
private void | scanBufferForNewlines() | Scans the buffer for newline positions, and tracks their location in newlinePositions.
String | toString()
void | trackNewlines(boolean track) | Enables or disables line number tracking.
(package private) void | unconsume() | Unconsume one character (bufPos--).
(package private) void | unmark()
-
Field Details
-
EOF
static final char EOF
-
MaxStringCacheLen
private static final int MaxStringCacheLen
-
StringCacheSize
private static final int StringCacheSize
-
stringCache
private String[] stringCache
-
StringPool
-
BufferSize
static final int BufferSize
-
RefillPoint
static final int RefillPoint
-
RewindLimit
private static final int RewindLimit
-
reader
private Reader reader
-
charBuf
private char[] charBuf
-
bufPos
private int bufPos
-
bufLength
private int bufLength
-
fillPoint
private int fillPoint
-
consumed
private int consumed
-
bufMark
private int bufMark
-
readFully
private boolean readFully
-
BufferPool
private static final SoftPool<char[]> BufferPool
-
newlinePositions
-
lineNumberOffset
private int lineNumberOffset
-
lastIcSeq
private String lastIcSeq
-
lastIcIndex
private int lastIcIndex
-
-
Constructor Details
-
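CharacterReader
public CharacterReader(Reader input)
-
CharacterReader
public CharacterReader(Reader input, int sz)
-
CharacterReader
public CharacterReader(String input)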
-
-
Method Details
-
close
public void close()
- Specified by:
close in interface AutoCloseable
-
bufferUp
private void bufferUp()
-
doBufferUp
private void doBufferUp()
Reads into the buffer. Will throw an UncheckedIOException if the underlying reader throws an IOException.
- Throws:
UncheckedIOException - if the underlying reader throws an IOException
-
mark
void mark()
-
unmark
void unmark()
-
rewindToMark
void rewindToMark()
-
pos
public int pos()
Gets the position currently read to in the content. Starts at 0.
- Returns:
- current position
-
readFully
boolean readFully()
Tests if the buffer has been fully read.
-
trackNewlines
public void trackNewlines(boolean track)
Enables or disables line number tracking. Tracking is off by default.
Tracking line numbers improves the legibility of parser error messages, for example. To be of use, tracking should be enabled before any content is read.
- Parameters:
track - set tracking on|off
- Since:
- 1.14.3
-
isTrackNewlines
public boolean isTrackNewlines()
Check if the tracking of newlines is enabled.
- Returns:
- the current newline tracking state
- Since:
- 1.14.3
-
lineNumber
public int lineNumber()
Get the current line number (that the reader has consumed to). Starts at line #1.
- Returns:
- the current line number, or 1 if line tracking is not enabled.
- Since:
- 1.14.3
-
lineNumber
int lineNumber(int pos)
-
columnNumber
public int columnNumber()
Get the current column number (that the reader has consumed to). Starts at column #1.
- Returns:
- the current column number
- Since:
- 1.14.3
-
columnNumber
int columnNumber(int pos)
-
posLineCol
String posLineCol()
Get a formatted string representing the current line and column positions. E.g. 5:10, indicating line number 5 and column number 10.
- Returns:
- line:col position
- Since:
- 1.14.3
-
lineNumIndex
private int lineNumIndex(int pos)
-
scanBufferForNewlines
private void scanBufferForNewlines()
Scans the buffer for newline positions, and tracks their location in newlinePositions.
-
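As an illustration of how recorded newline positions can yield 1-based line and column numbers, and the line:col string described for posLineCol, here is a minimal sketch over a plain String. It is not jsoup's implementation: the class `LineColumnSketch`, the helper `lineStarts`, and the choice to record the index just after each newline are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class LineColumnSketch {
    /** Records, for each '\n' in the text, the index of the character that follows it. */
    static List<Integer> lineStarts(String text) {
        List<Integer> starts = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n') {
                starts.add(i + 1);
            }
        }
        return starts;
    }

    /** Index of the last recorded line start at or before pos, or -1 if there is none. */
    static int lineIndex(List<Integer> starts, int pos) {
        int i = Collections.binarySearch(starts, pos);
        return i >= 0 ? i : -i - 2; // (insertion point) - 1
    }

    /** 1-based line number of pos. */
    static int lineNumber(List<Integer> starts, int pos) {
        return lineIndex(starts, pos) + 2;
    }

    /** 1-based column number of pos within its line. */
    static int columnNumber(List<Integer> starts, int pos) {
        int i = lineIndex(starts, pos);
        int lineStart = i < 0 ? 0 : starts.get(i);
        return pos - lineStart + 1;
    }

    /** Formats the position as line:col. */
    static String posLineCol(List<Integer> starts, int pos) {
        return lineNumber(starts, pos) + ":" + columnNumber(starts, pos);
    }

    public static void main(String[] args) {
        String text = "ab\ncde\nf";
        List<Integer> starts = lineStarts(text);
        System.out.println(posLineCol(starts, 5)); // 'e' is on line 2, column 3
    }
}
```

Because the positions are recorded in increasing order, a binary search keeps each lookup logarithmic in the number of lines.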
isEmpty
public boolean isEmpty()
Tests if all the content has been read.
- Returns:
- true if nothing left to read.
-
isEmptyNoBufferUp
private boolean isEmptyNoBufferUp()
-
current
public char current()
Get the char at the current position.
- Returns:
- char
-
consume
public char consume()
Consume one character off the queue.
- Returns:
- first character on queue, or EOF if the queue is empty.
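The consume-until-EOF pattern can be sketched with a small stand-in reader over a String. This is not jsoup's class: `ConsumeSketch` is a hypothetical name, and the sentinel value `(char) -1` is an assumption chosen for illustration, since this page lists EOF without stating its value.

```java
final class ConsumeSketch {
    // Assumed sentinel value, chosen for illustration only.
    static final char EOF = (char) -1;

    private final String input;
    private int pos = 0;

    ConsumeSketch(String input) {
        this.input = input;
    }

    int pos() {
        return pos;
    }

    boolean isEmpty() {
        return pos >= input.length();
    }

    /** Returns the next char and moves past it, or EOF once the input is exhausted. */
    char consume() {
        return isEmpty() ? EOF : input.charAt(pos++);
    }

    /** Steps back one char; only meaningful directly after a successful consume(). */
    void unconsume() {
        if (pos > 0) {
            pos--;
        }
    }

    public static void main(String[] args) {
        ConsumeSketch r = new ConsumeSketch("hi");
        StringBuilder seen = new StringBuilder();
        for (char c = r.consume(); c != EOF; c = r.consume()) {
            seen.append(c);
        }
        System.out.println(seen); // prints "hi"
    }
}
```

Returning a sentinel rather than throwing lets a tokenizer loop test a single value on every step instead of calling isEmpty() first.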
-
unconsume
void unconsume()
Unconsume one character (bufPos--). MUST only be called directly after a consume(), with no chance of an intervening bufferUp.
-
advance
public void advance()
Moves the current position by one.
-
nextIndexOf
int nextIndexOf(char c)
Returns the number of characters between the current position and the next instance of the input char.
- Parameters:
c - scan target
- Returns:
- offset between current position and next instance of target. -1 if not found.
-
nextIndexOf
int nextIndexOf(CharSequence seq)
Returns the number of characters between the current position and the next instance of the input sequence.
- Parameters:
seq - scan target
- Returns:
- offset between current position and next instance of target. -1 if not found.
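The key point in both overloads is that the result is an offset relative to the current position, not an absolute index. A minimal sketch of that contract over a plain CharSequence follows; `NextIndexOfSketch` and the explicit `pos` parameter are hypothetical, introduced because this example has no reader state of its own.

```java
final class NextIndexOfSketch {
    /** Offset from pos to the next occurrence of target, or -1 if there is none. */
    static int nextIndexOf(CharSequence content, int pos, char target) {
        for (int i = pos; i < content.length(); i++) {
            if (content.charAt(i) == target) {
                return i - pos;
            }
        }
        return -1;
    }

    /** Offset from pos to the next occurrence of seq, or -1 if there is none. */
    static int nextIndexOf(CharSequence content, int pos, CharSequence seq) {
        int last = content.length() - seq.length();
        for (int i = pos; i <= last; i++) {
            int j = 0;
            while (j < seq.length() && content.charAt(i + j) == seq.charAt(j)) {
                j++;
            }
            if (j == seq.length()) {
                return i - pos;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String html = "<p>Hello</p>";
        // From position 3 (the 'H'), the closing tag starts 5 characters ahead.
        System.out.println(nextIndexOf(html, 3, "</p>")); // prints 5
    }
}
```

An offset of this kind can be used directly as the count of characters to read before the target.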
-
consumeTo
public String consumeTo(char c)
Reads characters up to the specified char.
- Parameters:
c - the delimiter
- Returns:
- the chars read
-
consumeTo
public String consumeTo(String seq)
Reads the characters up to (but not including) the specified case-sensitive string.
If the sequence is not found in the buffer, will return the remainder of the current buffered amount, less the length of the sequence, such that this call may be repeated.
- Parameters:
seq - the delimiter
- Returns:
- the chars read
-
consumeMatching
String consumeMatching(CharacterReader.CharPredicate func)
Read characters while the input predicate returns true.
- Returns:
- characters read
-
consumeMatching
String consumeMatching(CharacterReader.CharPredicate func, int maxLength)
Read characters while the input predicate returns true, up to a maximum length.
- Parameters:
func - predicate to test
maxLength - maximum length to read. -1 indicates no maximum
- Returns:
- characters read
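The predicate-driven read, including the -1 convention for "no maximum", can be sketched as follows. This is a stand-in over a String, not jsoup's code: `ConsumeMatchingSketch` is hypothetical, and its nested `CharPredicate` interface is an assumed shape for the package-private CharacterReader.CharPredicate, whose declaration is not shown on this page.

```java
final class ConsumeMatchingSketch {
    /** Assumed shape of a char predicate; a hypothetical stand-in for illustration. */
    @FunctionalInterface
    interface CharPredicate {
        boolean test(char c);
    }

    private final String input;
    private int pos = 0;

    ConsumeMatchingSketch(String input) {
        this.input = input;
    }

    int pos() {
        return pos;
    }

    /** Reads while func accepts the current char; a maxLength of -1 means no maximum. */
    String consumeMatching(CharPredicate func, int maxLength) {
        int start = pos;
        while (pos < input.length()
                && (maxLength == -1 || pos - start < maxLength)
                && func.test(input.charAt(pos))) {
            pos++;
        }
        return input.substring(start, pos);
    }

    /** Reads while func accepts the current char, with no maximum length. */
    String consumeMatching(CharPredicate func) {
        return consumeMatching(func, -1);
    }

    public static void main(String[] args) {
        ConsumeMatchingSketch r = new ConsumeMatchingSketch("12345px");
        CharPredicate digit = c -> c >= '0' && c <= '9';
        System.out.println(r.consumeMatching(digit, 3)); // prints "123"
        System.out.println(r.consumeMatching(digit));    // prints "45"
    }
}
```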
-
consumeToAny
public String consumeToAny(char... chars)
Read characters until the first of any delimiters is found.
- Parameters:
chars - delimiters to scan for
- Returns:
- characters read up to the matched delimiter.
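A short sketch of reading up to the first of several delimiters, leaving the delimiter itself unread. `ConsumeToAnySketch` is a hypothetical stand-in over a String, and returning the rest of the input when no delimiter is found is an assumption of this sketch rather than documented behaviour.

```java
final class ConsumeToAnySketch {
    private final String input;
    private int pos = 0;

    ConsumeToAnySketch(String input) {
        this.input = input;
    }

    int pos() {
        return pos;
    }

    boolean isEmpty() {
        return pos >= input.length();
    }

    /** Moves past one character, for example a delimiter that was just matched. */
    void advance() {
        pos++;
    }

    private static boolean contains(char[] chars, char c) {
        for (char d : chars) {
            if (d == c) {
                return true;
            }
        }
        return false;
    }

    /** Reads until one of the delimiters (or the end of input) is reached. */
    String consumeToAny(char... chars) {
        int start = pos;
        while (pos < input.length() && !contains(chars, input.charAt(pos))) {
            pos++;
        }
        return input.substring(start, pos);
    }

    public static void main(String[] args) {
        ConsumeToAnySketch r = new ConsumeToAnySketch("key=value&x");
        System.out.println(r.consumeToAny('=', '&')); // prints "key"
        r.advance();                                  // step over '='
        System.out.println(r.consumeToAny('=', '&')); // prints "value"
    }
}
```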
-
consumeToAnySorted
String consumeToAnySorted(char... chars)
-
consumeData
String consumeData()
-
consumeAttributeQuoted
String consumeAttributeQuoted(boolean single)
-
consumeRawData
String consumeRawData()
-
consumeTagName
String consumeTagName()
-
consumeToEnd
String consumeToEnd()
-
consumeLetterSequence
String consumeLetterSequence()
-
consumeLetterThenDigitSequence
String consumeLetterThenDigitSequence()
-
consumeHexSequence
String consumeHexSequence()
-
consumeDigitSequence
String consumeDigitSequence()
-
matches
boolean matches(char c)
-
matches
boolean matches(String seq)
-
matchesIgnoreCase
boolean matchesIgnoreCase(String seq)
-
matchesAny
boolean matchesAny(char... seq)
Tests if the next character in the queue matches any of the characters in the sequence, case sensitively.
- Parameters:
seq - list of characters to check for
- Returns:
- true if any matched, false if none did
-
matchesAnySorted
boolean matchesAnySorted(char[] seq)
-
matchesAsciiAlpha
boolean matchesAsciiAlpha()
Checks if the current pos matches an ASCII alpha (A-Z a-z) per https://infra.spec.whatwg.org/#ascii-alpha
- Returns:
- true if it matches, false if not
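The ASCII alpha test is narrower than a general Unicode letter test. A minimal sketch of the range check follows; `AsciiAlphaSketch` and `isAsciiAlpha` are hypothetical names used only for this example.

```java
final class AsciiAlphaSketch {
    /** True only for A-Z and a-z, per the ASCII alpha definition. */
    static boolean isAsciiAlpha(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    public static void main(String[] args) {
        char accented = '\u00e9'; // e with acute accent
        System.out.println(isAsciiAlpha('q'));            // true
        System.out.println(isAsciiAlpha(accented));       // false: outside A-Z a-z
        System.out.println(Character.isLetter(accented)); // true: a Unicode letter
    }
}
```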
-
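matchesDigit
boolean matchesDigit()
-
matchConsume
boolean matchConsume(String seq)
-
matchConsumeIgnoreCase
boolean matchConsumeIgnoreCase(String seq)
-
containsIgnoreCase
boolean containsIgnoreCase(String seq)
Used to check presence of </title>, </textarea> when we're in RCData and see a <xxx. Only finds consistent case.
-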
toString
public String toString()
-
cacheString
private static String cacheString(char[] charBuf, String[] stringCache, int start, int count)
Caches short strings, as a flyweight pattern, to reduce GC load. Just for this doc, to prevent leaks. Simplistic, and on hash collisions just falls back to creating a new string, vs a full HashMap with Entry list. That saves both having to create objects as hash keys, and running through the entry list, at the expense of some more duplicates.
-
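The flyweight idea described for cacheString can be sketched as a fixed-size array indexed by a hash of the character range. This is an illustration under stated assumptions, not jsoup's code: `StringCacheSketch` is hypothetical, the limits `MAX_CACHE_LEN = 12` and a 512-slot cache are values picked for the example (this page names MaxStringCacheLen and StringCacheSize without giving their values), and the hash function and the choice to overwrite the slot on a miss are also assumptions.

```java
final class StringCacheSketch {
    // Assumed limit for illustration; strings longer than this bypass the cache.
    static final int MAX_CACHE_LEN = 12;

    /**
     * Returns a String for charBuf[start, start + count), reusing a cached
     * instance when the slot already holds an equal string.
     * The cache length must be a power of two so the mask below is a valid index.
     */
    static String cacheString(char[] charBuf, String[] stringCache, int start, int count) {
        if (count > MAX_CACHE_LEN) {
            return new String(charBuf, start, count);
        }
        if (count < 1) {
            return "";
        }
        int hash = 0;
        for (int i = 0; i < count; i++) {
            hash = 31 * hash + charBuf[start + i];
        }
        int index = hash & (stringCache.length - 1);
        String cached = stringCache[index];
        if (cached != null && rangeEquals(charBuf, start, count, cached)) {
            return cached; // hit: reuse the existing instance
        }
        // Miss or collision: create a new string and let it take the slot.
        cached = new String(charBuf, start, count);
        stringCache[index] = cached;
        return cached;
    }

    /** Checks whether charBuf[start, start + count) equals the cached string. */
    static boolean rangeEquals(char[] charBuf, int start, int count, String cached) {
        if (count != cached.length()) {
            return false;
        }
        for (int i = 0; i < count; i++) {
            if (charBuf[start + i] != cached.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] cache = new String[512]; // assumed size, a power of two
        char[] buf = "div div".toCharArray();
        String a = cacheString(buf, cache, 0, 3);
        String b = cacheString(buf, cache, 4, 3);
        System.out.println(a == b); // true: the second lookup reuses the first instance
    }
}
```

Compared with a HashMap, a collision here costs at most one duplicate string, and no key or entry objects are allocated.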
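rangeEquals
static boolean rangeEquals(char[] charBuf, int start, int count, String cached)
Check if the value of the provided range equals the string.
-
rangeEquals
boolean rangeEquals(int start, int count, String cached)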
-