All Classes and Interfaces
Class
Description
Abstract parent class for analysis factories TokenizerFactory, TokenFilterFactory and CharFilterFactory.
SmartChineseAnalyzer abstract dictionary implementation.
Base class for payload encoders.
Base implementation for PagedMutable and PagedGrowableWriter.
This class is the base of QueryConfigHandler and FieldConfig.
This class should be extended by nodes intending to represent range queries.
An object whose RAM usage can be computed.
Helper methods for constructing nested resource descriptions
and debugging RAM usage.
This class acts as the base class for the implementations of the first
normalization of the informative content in the DFR framework.
Model of the information gain based on the ratio of two Bernoulli processes.
Model of the information gain based on Laplace's law of succession.
This collector specializes in collecting the most relevant document (group head) for each
group that matches the query.
Represents a group head.
Specialized implementation for sorting by score.
General implementation using a FieldComparator to select the group head.
A collector that collects all groups that match the query.
This processor verifies if StandardQueryConfigHandler.ConfigurationKeys.ALLOW_LEADING_WILDCARD is defined in the QueryConfigHandler.
This exception is thrown when there is an attempt to access something that has already been closed.
This is the rev 502 of the Snowball SVN trunk,
now located at GitHub,
but modified:
made abstract and introduced abstract method stem to avoid expensive reflection in filter class.
Provides a base class for analysis based offset strategies to extend from.
Wraps an Analyzer and string text that represents multiple values delimited by a specified character.
Helper class for loading named SPIs from classpath (e.g.
An Analyzer builds TokenStreams, which analyze text.
Strategy defining how TokenStreamComponents are reused per call to Analyzer.tokenStream(String, java.io.Reader).
This class encapsulates the outer components of a token stream.
Manages analysis data configuration for SmartChineseAnalyzer
This processor verifies if StandardQueryConfigHandler.ConfigurationKeys.ANALYZER is defined in the QueryConfigHandler.
Extension to Analyzer suitable for Analyzers which wrap other Analyzers.
Analyzes the input text and then suggests matches based on prefix matches to any tokens in the indexed text.
Suggester that first analyzes the surface form, adds the
analyzed form to a weighted FST, and then does the same
thing at lookup time.
Factory for conjunctions
An AndQueryNode represents an AND boolean operation performed on a list of nodes.
An AnyQueryNode represents an ANY operator performed on a list of nodes.
Builds a BooleanQuery of SHOULD clauses, possibly with some minimum number to match.
Strips all characters after an apostrophe (including the apostrophe itself).
Factory for ApostropheFilter.
Analyzer for Arabic.
Atomically loads the DEFAULT_STOP_SET in a lazy fashion once the outer class accesses the static final set the first time.
A TokenFilter that applies ArabicNormalizer to normalize the orthography.
Factory for ArabicNormalizationFilter.
Normalizer for Arabic.
A TokenFilter that applies ArabicStemmer to stem Arabic words.
Factory for ArabicStemFilter.
Stemmer for Arabic.
This class was automatically generated by a Snowball to Java compiler.
It implements the stemming algorithm defined by a snowball script.
Analyzer for Armenian.
Atomically loads the DEFAULT_STOP_SET in a lazy fashion once the outer class accesses the static final set the first time.
This class was automatically generated by a Snowball to Java compiler.
It implements the stemming algorithm defined by a snowball script.
An InPlaceMergeSorter for object arrays.
An IntroSorter for object arrays.
A TimSorter for object arrays.
Methods for manipulating arrays.
This class converts alphabetic, numeric, and symbolic Unicode characters
which are not in the first 127 ASCII characters (the "Basic Latin" Unicode
block) into their ASCII equivalents, if one exists.
Factory for ASCIIFoldingFilter.
Base interface for attributes.
An AttributeFactory creates instances of AttributeImpls.
Expert: AttributeFactory returning an instance of the given clazz for the attributes it implements.
Base class for Attributes that can be added to an AttributeSource.
This interface is used to reflect contents of AttributeSource or AttributeImpl.
An AttributeSource contains a list of different AttributeImpls, and methods to add and get them.
This class holds the state of an AttributeSource.
Construction of basic automata.
Represents an automaton and all its states and transitions.
Records new states and transitions and then Automaton.Builder.finish() creates the Automaton.
Automaton provider for RegExp. RegExp.toAutomaton(AutomatonProvider,int)
A Query that will match terms against a finite-state machine.
A FilteredTermsEnum that enumerates terms based upon what is accepted by a DFA.
Calculate the final score as the average score of all payloads seen.
Axiomatic approaches for IR.
F1EXP is defined as Sum(tf(term_doc_freq)*ln(docLen)*IDF(term))
where IDF(t) = pow((N+1)/df(t), k) N=total num of docs, df=doc freq
F1LOG is defined as Sum(tf(term_doc_freq)*ln(docLen)*IDF(term))
where IDF(t) = ln((N+1)/df(t)) N=total num of docs, df=doc freq
F2EXP is defined as Sum(tfln(term_doc_freq, docLen)*IDF(term))
where IDF(t) = pow((N+1)/df(t), k) N=total num of docs, df=doc freq
F2LOG is defined as Sum(tfln(term_doc_freq, docLen)*IDF(term))
where IDF(t) = ln((N+1)/df(t)) N=total num of docs, df=doc freq
F3EXP is defined as Sum(tf(term_doc_freq)*IDF(term)-gamma(docLen, queryLen))
where IDF(t) = pow((N+1)/df(t), k) N=total num of docs, df=doc freq
gamma(docLen, queryLen) = (docLen-queryLen)*queryLen*s/avdl
NOTE: the gamma function of this similarity creates negative scores
F3LOG is defined as Sum(tf(term_doc_freq)*IDF(term)-gamma(docLen, queryLen))
where IDF(t) = ln((N+1)/df(t)) N=total num of docs, df=doc freq
gamma(docLen, queryLen) = (docLen-queryLen)*queryLen*s/avdl
NOTE: the gamma function of this similarity creates negative scores
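The IDF and gamma terms quoted in the axiomatic entries above can be evaluated directly. The following is a minimal sketch in plain Java, not Lucene's implementation: the class name AxiomaticTermsSketch and its method names are hypothetical, and only the three formulas themselves come from the descriptions above.

```java
/** Hypothetical helper that evaluates the formulas quoted above; not part of Lucene. */
final class AxiomaticTermsSketch {
    private AxiomaticTermsSketch() {}

    /** IDF(t) = ln((N+1)/df(t)), the form quoted for the LOG variants. */
    static double idfLog(long numDocs, long docFreq) {
        return Math.log((numDocs + 1.0) / docFreq);
    }

    /** IDF(t) = pow((N+1)/df(t), k), the form quoted for the EXP variants. */
    static double idfExp(long numDocs, long docFreq, double k) {
        return Math.pow((numDocs + 1.0) / docFreq, k);
    }

    /** gamma(docLen, queryLen) = (docLen-queryLen)*queryLen*s/avdl. */
    static double gamma(double docLen, double queryLen, double s, double avgDocLen) {
        return (docLen - queryLen) * queryLen * s / avgDocLen;
    }

    public static void main(String[] args) {
        // Example inputs are arbitrary illustration values.
        System.out.println(idfLog(99, 10));
        System.out.println(idfExp(99, 10, 0.5));
        System.out.println(gamma(100, 2, 0.25, 50));
    }
}
```

Because gamma is subtracted from the tf*IDF sum and grows with document length, a long document with a weak term match can end up with a negative total, which is what the NOTE above warns about.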
Base utility class for implementing a CharFilter.
Base class for implementing CompositeReaders based on an array of sub-readers.
Base implementation for a concrete Directory that uses a LockFactory for locking.
Attribute for Token.getBaseForm().
Attribute for Token.getBaseForm().
An abstract implementation of FragListBuilder.
Base FragmentsBuilder implementation that supports colored pre/post
tags and multivalued fields.
A base TermsEnum that adds default implementations for BaseTermsEnum.attributes(), BaseTermsEnum.termState(), BaseTermsEnum.seekExact(BytesRef) and BaseTermsEnum.seekExact(BytesRef, TermState). In some cases, the default implementation may be slow and consume huge memory, so subclasses SHOULD have their own implementation if possible.
This class acts as the base class for the specific basic model
implementations in the DFR framework.
Geometric as limiting form of the Bose-Einstein model.
An approximation of the I(ne) model.
The basic tf-idf model of randomness.
Tf-idf model of randomness, based on a mixture of Poisson and inverse
document frequency.
Factory for creating basic term queries
Stores all statistics commonly used by ranking methods.
Analyzer for Basque.
Atomically loads the DEFAULT_STOP_SET in a lazy fashion once the outer class accesses the static final set the first time.
This class was automatically generated by a Snowball to Java compiler.
It implements the stemming algorithm defined by a snowball script.
TokenFilter for Beider-Morse phonetic encoding.
Factory for BeiderMorseFilter.
Analyzer for Bengali.
Atomically loads the DEFAULT_STOP_SET in a lazy fashion once the outer class accesses the static final set the first time.
A TokenFilter that applies BengaliNormalizer to normalize the orthography.
Factory for BengaliNormalizationFilter.
Normalizer for Bengali.
A TokenFilter that applies BengaliStemmer to stem Bengali words.
Factory for BengaliStemFilter.
Stemmer for Bengali.
An indexed 128-bit BigInteger field.
SmartChineseAnalyzer Bigram dictionary.
Base class for a binary-encoded in-memory dictionary.
Base class for a binary-encoded in-memory dictionary.
Used to specify where (dictionary) resources get loaded from.
Used to specify where (dictionary) resources get loaded from.
A per-document numeric value.
Field that stores a per-document BytesRef value.
A DocValuesFieldUpdates which holds updates of documents, of a single BinaryDocValuesField.
Buffers up pending byte[] per doc, then flushes when segment flushes.
An indexed binary field for fast range filters.
Binds variable names in expressions to actual data.
Graph representing possible token pairs (bigrams) at each start offset in the sentence.
Interface for Bitset-like structures.
Bits impl of the specified length with all bits set.
Bits impl of the specified length with no bits set.
Base implementation for a bit set.
A DocIdSetIterator which iterates over set bits in a bit set.
A producer of BitSets per segment.
A producer of Bits per segment.
Exposes a slice of an existing Bits as a new Bits.
Static helper methods for FST.Arc.BitTable.
A variety of high efficiency bit twiddling routines.
Basic parameters for indexing points on the BKD tree.
Offline Radix selector for BKD tree.
Sliced reference to points in a PointWriter.
Handles intersection of a multi-dimensional shape in byte[] space with a block KD-tree previously written with BKDWriter.
Reusable DocIdSetIterator to handle low cardinality leaves.
Used to track all state for a single call to BKDReader.intersect(org.apache.lucene.index.PointValues.IntersectVisitor).
Recursively builds a block KD-tree to assign all incoming points in N-dim space to smaller and smaller N-dim rectangles (cells) until the number of points in a given rectangle is <= config.maxPointsInLeafNode.
Flat representation of a kd-tree.
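The recursive splitting described for the block KD-tree writer can be illustrated in memory. This is only a sketch under stated assumptions and is not how BKDWriter is implemented: the class KdSplitSketch is hypothetical, points are plain double[] arrays rather than encoded byte[] values, and splitting at the median of the widest dimension is an assumed policy chosen for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical in-memory illustration of recursive KD splitting; not Lucene's BKDWriter. */
final class KdSplitSketch {
    private KdSplitSketch() {}

    /** Splits points[from, to) until every cell holds at most maxPointsInLeaf points. */
    static void build(double[][] points, int from, int to,
                      int maxPointsInLeaf, List<double[][]> leaves) {
        if (maxPointsInLeaf < 1) {
            throw new IllegalArgumentException("maxPointsInLeaf must be >= 1");
        }
        int count = to - from;
        if (count <= maxPointsInLeaf) {
            // Small enough: this cell becomes a leaf.
            leaves.add(Arrays.copyOfRange(points, from, to));
            return;
        }
        // Assumed policy: split on the dimension with the largest value range.
        int dim = widestDimension(points, from, to);
        Arrays.sort(points, from, to, Comparator.comparingDouble(p -> p[dim]));
        int mid = from + count / 2;
        build(points, from, mid, maxPointsInLeaf, leaves);
        build(points, mid, to, maxPointsInLeaf, leaves);
    }

    /** Returns the index of the dimension whose values span the widest range. */
    static int widestDimension(double[][] points, int from, int to) {
        int best = 0;
        double bestSpan = -1;
        for (int d = 0; d < points[from].length; d++) {
            double min = Double.POSITIVE_INFINITY;
            double max = Double.NEGATIVE_INFINITY;
            for (int i = from; i < to; i++) {
                min = Math.min(min, points[i][d]);
                max = Math.max(max, points[i][d]);
            }
            if (max - min > bestSpan) {
                bestSpan = max - min;
                best = d;
            }
        }
        return best;
    }
}
```

Calling build(points, 0, points.length, 3, leaves) on ten points splits them 5/5 and then 2/3 on each side, leaving four leaf cells with no more than three points each.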
Extension of the AnalyzingInfixSuggester which transforms the weight
after search to take into account the position of the searched term into
the indexed text.
The different types of blender.
A Query that blends index statistics across multiple terms.
A Builder for BlendedTermQuery.
A BlendedTermQuery.RewriteMethod that creates a DisjunctionMaxQuery out of the sub queries.
A BlendedTermQuery.RewriteMethod defines how queries for individual terms should be merged.
Decodes the raw bytes of a block when the index is read, according to the BlockEncoder used during the writing of the index.
Encodes the raw bytes of a block when the index is written.
Writable byte buffer.
BlockGroupingCollector performs grouping with a single pass collector, as long as you are grouping by a doc block field, i.e. all documents sharing a given group value were indexed as a doc block using the atomic IndexWriter.addDocuments() or IndexWriter.updateDocuments() API.
Block header containing block metadata.
Reads/writes block header.
Select a value from a block of documents.
Type of selection to perform.
One term block line.
Reads/writes block lines with terms encoded incrementally inside a block.
Scorer for conjunctions that checks the maximum scores of each clause in
order to potentially skip over blocks that can't have competitive matches.
DocIdSetIterator that skips non-competitive docs by checking the max score of the provided Scorer for the current block.
Provides random access to a stream written with BlockPackedWriter.
Reader for sequences of longs written with BlockPackedWriter.
A writer for large sequences of longs.
Seeks the block corresponding to a given term, reads the block bytes, and scans the block terms.
Handles a terms dict, but decouples all details of doc/freqs/positions reading to an instance of PostingsReaderBase.
Holds all state required for PostingsReaderBase to produce a PostingsEnum without re-seeking the terms dict.
Writes terms dict, block-encoding (column stride) each term's metadata for each set of terms between two index terms.
Uses OrdsBlockTreeTermsWriter with Lucene84PostingsWriter.
A block-based terms index and dictionary that assigns terms to variable length blocks according to how they share prefixes.
Block-based terms index and dictionary writer.
Writes blocks in the block file.
Class used to create index-time FuzzySet appropriately configured for each field.
A PostingsFormat useful for low doc-frequency fields such as primary keys.
A Query that treats multiple fields as a single stream and scores terms as if you had indexed them as a single term in a single field.
A builder for BM25FQuery.
A classifier approximating naive bayes classifier by using pure queries on BM25.
BM25 Similarity.
Collection statistics for the BM25 model.
Abstract FunctionValues implementation which supports retrieving boolean values.
A clause in a BooleanQuery.
Specifies how clauses are to occur in matching documents.
A BooleanModifierNode has the same behaviour as ModifierQueryNode, it only indicates that this modifier was added by BooleanQuery2ModifierNodeProcessor and not by the user.
This processor is used to apply the correct ModifierQueryNode to BooleanQueryNodes children.
A Query that matches documents matching boolean combinations of other queries, e.g.
A builder for boolean queries.
Thrown when an attempt is made to add more than BooleanQuery.getMaxClauseCount() clauses.
This processor is used to apply the correct ModifierQueryNode to BooleanQueryNodes children.
Builder for BooleanQuery.
A BooleanQueryNode represents a list of elements which do not have an explicit boolean operator defined between them.
Builds a BooleanQuery object from a BooleanQueryNode object.
BulkScorer that is used for pure disjunctions and disjunctions that have low values of BooleanQuery.Builder.setMinimumNumberShouldMatch(int) and dense clauses.
Simple similarity that gives terms a score that is equal to their query boost.
This processor removes every BooleanQueryNode that contains only one child and returns this child.
Expert: the Weight for BooleanQuery, used to normalize, score and explain these queries.
Abstract parent class for those ValueSource implementations which apply boolean logic to their values.
Add this Attribute to a TermsEnum returned by MultiTermQuery.getTermsEnum(Terms,AttributeSource) and update the boost on each returned term.
Implementation class for BoostAttribute.
Builder for PayloadScoreQuery.
A Query wrapper that allows giving a boost to the wrapped query.
A BoostQueryNode boosts the QueryNode tree which is under this node.
This builder basically reads the Query object set on the BoostQueryNode child using QueryTreeBuilder.QUERY_TREE_BUILDER_TAGID and applies the boost value defined in the BoostQueryNode.
This processor iterates the query node tree looking for every FieldableNode that has StandardQueryConfigHandler.ConfigurationKeys.BOOST in its config.
Finds fragment boundaries: pluggable into BaseFragmentsBuilder.
Analyzer for Brazilian Portuguese language.
A TokenFilter that applies BrazilianStemmer.
Factory for BrazilianStemFilter.
A stemmer for Brazilian Portuguese words.
A BoundaryScanner implementation that uses BreakIterator to find boundaries in the text.
Wraps RuleBasedBreakIterator, making object reuse convenient and emitting a rule status for emoji sequences.
Wraps another Checksum with an internal buffer to speed up checksum calculations.
Simple implementation of ChecksumIndexInput that wraps another input and delegates calls.
Base implementation class for buffered IndexInput.
Implementation of an IndexInput that reads from a portion of a file.
This wrapper buffers incoming elements.
Holds buffered deletes and updates, by docID, term or query for a
single segment.
Tracks the stream of FrozenBufferedUpdates.
Tracks the contiguous range of packets that have finished resolving.
Holds all per-segment internal state used while resolving deletions.
This class is a workaround for JDK bug
JDK-8252739.
Builds a minimal FST (maps an IntsRef term to an arbitrary
output) from pre-sorted terms with outputs.
Expert: holds a pending (seen but not yet serialized) arc.
Reusable buffer for building nodes with fixed length arcs (binary search or direct addressing).
Expert: holds a pending (seen but not yet serialized) Node.
Analyzer for Bulgarian.
Atomically loads the DEFAULT_STOP_SET in a lazy fashion once the outer class accesses the static final set the first time.
A TokenFilter that applies BulgarianStemmer to stem Bulgarian words.
Factory for BulgarianStemFilter.
Light Stemmer for Bulgarian.
Efficient sequential read/write of packed integers.
Non-specialized BulkOperation for PackedInts.Format.PACKED.
Efficient sequential read/write of packed integers.
Efficient sequential read/write of packed integers.
Efficient sequential read/write of packed integers.
Efficient sequential read/write of packed integers.
Efficient sequential read/write of packed integers.
Efficient sequential read/write of packed integers.
Efficient sequential read/write of packed integers.
Efficient sequential read/write of packed integers.
Efficient sequential read/write of packed integers.
Efficient sequential read/write of packed integers.
Efficient sequential read/write of packed integers.
Efficient sequential read/write of packed integers.
Efficient sequential read/write of packed integers.
Efficient sequential read/write of packed integers.
Efficient sequential read/write of packed integers.
Efficient sequential read/write of packed integers.
Efficient sequential read/write of packed integers.
Efficient sequential read/write of packed integers.
Efficient sequential read/write of packed integers.
Efficient sequential read/write of packed integers.
Efficient sequential read/write of packed integers.
Efficient sequential read/write of packed integers.
Efficient sequential read/write of packed integers.
Efficient sequential read/write of packed integers.
Non-specialized BulkOperation for PackedInts.Format.PACKED_SINGLE_BLOCK.
This class is used to score a range of documents at once, and is returned by Weight.bulkScorer(org.apache.lucene.index.LeafReaderContext).
DataInput backed by a byte array.
DataOutput backed by a byte array.
Class that Posting and PostingVector use to write byte
streams into shared fixed-size byte[] arrays.
Abstract class for allocating and freeing byte
blocks.
A simple ByteBlockPool.Allocator that never recycles.
A simple ByteBlockPool.Allocator that never recycles, but tracks how much total RAM is in use.
A guard that is created for every ByteBufferIndexInput that tries on best effort to reject any access to the ByteBuffer behind, once it is unmapped.
Pass in an implementation of this interface to cleanup ByteBuffers.
Base IndexInput implementation that uses an array
of ByteBuffers to represent a file.
This class adds offset support to ByteBufferIndexInput, which is needed for slices.
Optimization of ByteBufferIndexInput for when there is only one buffer.
A DataOutput storing data in a list of ByteBuffers.
An implementation of a ByteBuffer allocation and recycling policy.
A ByteBuffer-based Directory implementation that can be used to store index files on the heap.
An IndexOutput writing to a ByteBuffersDataOutput.
Automaton representation for matching UTF-8 byte[].
An FST Outputs implementation where each output is a sequence of bytes.
Class to write byte streams into slices of shared byte[].
Represents byte[], as a slice (offset + length) into an
existing byte[].
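The "slice (offset + length) into an existing byte[]" idea can be shown with a small stand-alone class. This is a minimal sketch and not Lucene's BytesRef: the class ByteSliceSketch and its methods are hypothetical names introduced only for illustration.

```java
import java.util.Arrays;

/** Hypothetical view over part of a shared byte[]; not Lucene's BytesRef. */
final class ByteSliceSketch {
    final byte[] bytes;  // shared backing array, never copied by the constructor
    final int offset;    // index of the first byte of the slice
    final int length;    // number of bytes in the slice

    ByteSliceSketch(byte[] bytes, int offset, int length) {
        if (offset < 0 || length < 0 || length > bytes.length - offset) {
            throw new IllegalArgumentException("slice out of bounds");
        }
        this.bytes = bytes;
        this.offset = offset;
        this.length = length;
    }

    /** Reads a byte relative to the start of the slice. */
    byte byteAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return bytes[offset + index];
    }

    /** Copies the slice out into a fresh array of exactly length bytes. */
    byte[] toArray() {
        return Arrays.copyOfRange(bytes, offset, offset + length);
    }
}
```

Since the slice only references the backing array, writes to that array remain visible through the slice; callers that need a stable snapshot would copy with toArray().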
A simple append only random-access BytesRef array that stores full copies of the appended bytes in a ByteBlockPool.
An extension of BytesRefIterator that allows retrieving the index of the current element.
Used to iterate the elements of an array in a given order.
A builder for BytesRef instances.
Specialized BytesRef comparator that FixedLengthBytesRefArray.iterator(Comparator) has optimizations for.
An implementation for retrieving FunctionValues instances for string based fields.
Enumerates all input (BytesRef) + output pairs in an FST.
Holds a single input (BytesRef) + output pair.
BytesRefHash is a special purpose hash-map like data-structure optimized for BytesRef instances.
Manages allocation of the per-term addresses.
A simple BytesRefHash.BytesStartArray that tracks memory allocation using a private Counter instance.
A simple iterator interface for BytesRef iteration.
Collects BytesRef and then allows one to iterate over their sorted order.
This attribute can be used if you have the raw term bytes to be indexed.
Implementation class for BytesTermAttribute.
This class implements a simple byte vector with access to the underlying array.
Caches all docs, and optionally also scores, coming from
a search, and is then able to replay them to another
collector.
This expression value source shares one value cache when generating ExpressionFunctionValues such that only one value along the whole generation tree corresponds to one name.
A simplistic Lucene based NaiveBayes classifier, with caching feature, see http://en.wikipedia.org/wiki/Naive_Bayes_classifier
This class can be used if the token attributes of a TokenStream are intended to be consumed more than once.
Class used to match candidate queries selected by a Presearcher from a Monitor
query index.
A filter to apply normal capitalization rules to Tokens.
Factory for CapitalizationFilter.
Analyzer for Catalan.
Atomically loads the DEFAULT_STOP_SET in a lazy fashion once the outer class accesses the static final set the first time.
This class was automatically generated by a Snowball to Java compiler.
It implements the stemming algorithm defined by a snowball script.
A Cell is a portion of a trie.
Character category data.
Character category data.
Automaton representation for matching char[].
Utility class to write tokenizers or token filters.
A simple IO buffer to use with CharacterUtils.fill(CharacterBuffer, Reader).
Wraps a char[] as CharacterIterator for processing with a BreakIterator
A CharacterIterator used internally for use with BreakIterator
A simple class that stores key Strings as char[]'s in a hash table.
Empty CharArrayMap.UnmodifiableCharArrayMap optimized for speed.
Matches a character array
A simple class that stores Strings as char[]'s in a
hash table.
Subclasses of CharFilter can be chained to filter a Reader. They can be used as Reader with additional offset correction.
Abstract parent class for analysis factories that create CharFilter instances.
An FST Outputs implementation where each output is a sequence of characters.
Represents char[], as a slice (offset + length) into an existing char[].
Deprecated.
This comparator is only a transition mechanism
A builder for CharsRef instances.
This interface describes a character stream that maintains line and column number positions of the characters.
This interface describes a character stream that maintains line and
column number positions of the characters.
This interface describes a character stream that maintains line and
column number positions of the characters.
The term text of a Token.
Default implementation of CharTermAttribute.
An abstract base class for simple, character-oriented tokenizers.
Internal SmartChineseAnalyzer character type constants.
This class implements a simple char vector with access to the underlying
array.
Basic tool and API to check the health of an index and
write a new segments file that removes reference to
problematic segments.
Run-time configuration options for CheckIndex commands.
Returned from CheckIndex.checkIndex() detailing the health and status of the index.
Status from testing DocValues
Status from testing field infos.
Status from testing field norms.
Status from testing index sort
Status from testing livedocs
Status from testing PointValues
Holds the status of each segment in the index.
Status from testing stored fields.
Status from testing term index.
Status from testing stored fields.
Walks the entire N-dimensional points space, verifying that all points fall within the last cell's boundaries.
Utility class to check a block join index.
Extension of IndexInput, computing checksum as it goes.
Represents a circle on the earth's surface.
2D circle implementation containing spatial logic.
An Analyzer that tokenizes text with StandardTokenizer, normalizes content with CJKWidthFilter, folds case with LowerCaseFilter, forms bigrams of CJK with CJKBigramFilter, and filters stopwords with StopFilter.
Forms bigrams of CJK terms that are generated from StandardTokenizer or ICUTokenizer.
Factory for CJKBigramFilter.
A CharFilter that normalizes CJK width differences:
Folds fullwidth ASCII variants into the equivalent basic latin
Folds halfwidth Katakana variants into the equivalent kana
Factory for CJKWidthCharFilter.
A TokenFilter that normalizes CJK width differences:
Folds fullwidth ASCII variants into the equivalent basic latin
Folds halfwidth Katakana variants into the equivalent kana
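The two width foldings listed above can be approximated with the JDK alone. This sketch uses java.text.Normalizer with NFKC normalization as a stand-in, which is an assumption made for illustration: it is not how the CJK width filters are implemented, and NFKC also rewrites other compatibility characters beyond width variants. The class name WidthFoldingSketch is hypothetical.

```java
import java.text.Normalizer;

/** Hypothetical approximation of width folding via NFKC; not Lucene's filter. */
final class WidthFoldingSketch {
    private WidthFoldingSketch() {}

    /** Maps fullwidth ASCII and halfwidth Katakana to their standard forms. */
    static String foldWidth(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // Fullwidth Latin letters U+FF2C U+FF55 U+FF43 U+FF45 U+FF4E U+FF45.
        System.out.println(foldWidth("\uFF2C\uFF55\uFF43\uFF45\uFF4E\uFF45"));
        // Halfwidth Katakana KA (U+FF76) and NA (U+FF85).
        System.out.println(foldWidth("\uFF76\uFF85"));
    }
}
```

A dedicated width filter is narrower than NFKC, so this approach suits quick experiments rather than reproducing the analyzer's exact output.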
Factory for CJKWidthFilter.
Filters ClassicTokenizer with ClassicFilter, LowerCaseFilter and StopFilter, using a list of English stop words.
Normalizes tokens extracted with ClassicTokenizer.
Factory for ClassicFilter.
Expert: Historical scoring implementation.
A grammar-based tokenizer constructed with JFlex
Factory for ClassicTokenizer.
This class implements the classic Lucene StandardTokenizer up until 3.0
The result of a call to Classifier.assignClass(String) holding an assigned class of type T and a score.
A classifier, see http://en.wikipedia.org/wiki/Classifier_(mathematics), which assigns classes of type T.
Simple ResourceLoader that uses ClassLoader.getResourceAsStream(String) and Class.forName(String,boolean,ClassLoader) to open resources and classes, respectively.
Java's builtin ThreadLocal has a serious flaw: it can take an arbitrarily long amount of time to dereference the things you had stored in it, even once the ThreadLocal instance itself is no longer referenced.
Encodes/decodes an inverted index segment.
This static holder class prevents classloading deadlock by delaying
init of default codecs and available codecs until needed.
LeafReader implemented by codec APIs.
Utility class for reading and writing versioned headers.
Removes words that are too long or too short from the stream.
Factory for CodepointCountFilter.
Extension of CharTermAttributeImpl that encodes the term text as a binary Unicode collation key instead of as UTF-8 bytes.
Converts each token into its CollationKey, and then encodes the bytes as an index term.
Indexes collation keys as a single-valued SortedDocValuesField.
Configures KeywordTokenizer with CollationAttributeFactory.
Expert: representation of a group in FirstPassGroupingCollector, tracking the top doc and FieldComparator slot.
Contains statistics for a collection (field).
Throw this exception in LeafCollector.collect(int) to prematurely terminate collection of the current leaf.
Methods for manipulating (sorting) collections.
Expert: Collectors are primarily meant to be used to
gather raw results from a search, and implement sorting
or custom result filtering, collation, etc.
A manager of collectors.
Default implementation of MemoryTracker that tracks allocations and allows setting a memory limit per collector.
A suggestion generated by combining one or more original query terms
Class containing some useful methods used by command line tools
Construct bigrams for frequently occurring terms while indexing.
Constructs a CommonGramsFilter.
Wrap a CommonGramsFilter optimizing phrase queries by only returning single words when they are not a member of a bigram.
Construct CommonGramsQueryFilter.
Configuration options common across queryparser implementations.
A query that executes high-frequency terms in an optional sub-query to prevent slow queries due to "common" terms like stopwords.
Base class for comparison operators useful within an "if"/conditional.
This class accumulates the (freq, norm) pairs that may produce competitive scores.
The Compile class is used to compile a stemmer table.
Immutable class holding compiled details for a given Automaton.
Automata are compiled into different internal forms for the most efficient execution depending upon the language they accept.
CompletionPostingsFormat for org.apache.lucene.codecs.lucene50.Lucene50PostingsFormat.
Wraps an Analyzer to provide additional completion-only tuning (e.g.
Weighted FSTs for any indexed SuggestField is built on CompletionFieldsConsumer.write(Fields,NormsProducer).
Completion index (.cmp) is opened and read at instantiation to read in SuggestField numbers and their FST offsets in the Completion dictionary (.lkp).
A PostingsFormat which supports document suggestion based on indexed SuggestFields.
An enum that controls whether suggester FSTs are loaded into memory or read off-heap
Abstract Query that matches documents containing terms with a specified prefix filtered by BitsProducer.
Expert: Responsible for executing the query against an appropriate suggester and collecting the results via a collector.
Holder for suggester and field-level info for a suggest field
Wrapped Terms used by SuggestField and ContextSuggestField to access corresponding suggester and their attributes.
A ConcatenateGraphFilter but we can set the payload and provide access to config options.
Expert: the Weight for CompletionQuery, used to score and explain these queries.
QueryParser which permits complex phrase query syntax eg "(john jon jonathan~) peters*".
2D Geometry object that supports spatial relationships with bounding boxes,
triangles and points.
Used by withinTriangle to check the within relationship between a triangle and the query shape
(e.g.
2D multi-component geometry implementation represented as an interval tree of components.
Base class for composite queries (such as AND/OR/NOT)
An internal BreakIterator for multilingual text, following recommendations
from: UAX #29: Unicode Text Segmentation.
Instances of this reader type can only
be used to get stored fields from the underlying LeafReaders,
but it is not possible to directly retrieve postings.
IndexReaderContext for CompositeReader instance.
A read-only Directory that consists of a view over a compound file.
Encodes/decodes compound files
Base class for decomposition token filters.
A StoredFieldsFormat that compresses documents in chunks in order to improve the compression ratio.
A serialized document, you need to decode its input in order to get an actual Document.
A TermVectorsFormat that compresses chunks of documents together in order to improve the compression ratio.
Compression algorithm used for suffixes of a block of terms.
A compression mode.
A data compressor.
Concatenates/Joins every incoming token with a separator into one output token for every path through the
token stream (which is a graph).
Attribute providing access to the term builder and UTF-16 conversion
Implementation of ConcatenateGraphFilter.BytesRefBuilderTermAttribute
Just escapes the ConcatenateGraphFilter.SEP_LABEL byte with an extra.
Factory for ConcatenateGraphFilter.
A TokenStream that takes an array of input TokenStreams as sources, and concatenates them together.
A MergeScheduler that runs each merge using a separate thread.
Utility class for concurrently loading queries into a Monitor.
Allows skipping TokenFilters based on the current set of attributes.
Abstract parent class for analysis factories that create ConditionalTokenFilter instances
An instance of this class represents a key that is used to retrieve a value from AbstractQueryConfig.
Utility class to generate the confusion matrix of a Classifier
a confusion matrix, backed by a Map representing the linearized matrix
A conjunction of DocIdSetIterators.
A conjunction of DocIdSetIterators.
Conjunction between a DocIdSetIterator and one or more BitSetIterators.
TwoPhaseIterator implementing a conjunction.
Scorer for conjunctions, sets of queries, all of which are required.
Common super class for multiple sub spans required in a document.
n-gram connection cost data
n-gram connection cost data
Some useful constants.
A query that wraps another query and simply returns a constant score equal to
1 for every document that matches the query.
We return this as our BulkScorer so that if the CSQ wraps a query with its own optimized top-level scorer (e.g.
Builder for ConstantScoreQuery.
A constant-scoring Scorer.
A Weight that has a constant score equal to the boost of the wrapped query.
ConstNumberSource is the base class for all constant numbers.
ConstValueSource returns a constant for all documents.
A CompletionQuery that matches documents specified by a wrapped CompletionQuery supporting boosting and/or filtering by specified contexts.
Holder for context value meta data
SuggestField which additionally takes in a set of contexts.
The ContextSuggestField.PrefixTokenFilter wraps a TokenStream and adds a set of prefixes ahead.
Utility class that runs a thread to manage periodic reopens of a ReferenceManager, with methods to wait for specific index changes to become visible.
Assembles a QueryBuilder which uses only core Lucene Query objects
Assembles a QueryBuilder which uses Query objects from
Lucene's
sandbox and queries
modules in addition to core queries.Assembles a QueryBuilder which uses Query objects from
Lucene's
queries module in addition to core queries.This exception is thrown when Lucene detects
an inconsistency in the index.
Simple counter class
A Query that allows a configurable number of required matches per document.
A Scorer whose number of matches is per-document.
Utility class for parsing CSV text
Utility class for parsing CSV text
A general-purpose Analyzer that can be created with a builder-style API.
Builder for CustomAnalyzer.
Factory class for a ConditionalTokenFilter
Builds a QueryTree for a query that needs custom treatment. The default query analyzers will use the QueryVisitor API to extract terms from queries.
A BreakIterator that breaks the text whenever a certain separator, provided as a constructor argument, is found.
Analyzer for Czech language.
A TokenFilter that applies CzechStemmer to stem Czech words.
Factory for CzechStemFilter.
Light Stemmer for Czech.
Builds a minimal, deterministic Automaton that accepts a set of strings.
DFSA state with char labels on transitions.
Create tokens for phonetic matches based on Daitch–Mokotoff Soundex.
Factory for DaitchMokotoffSoundexFilter.
Analyzer for Danish.
Atomically loads the DEFAULT_STOP_SET in a lazy fashion once the outer class accesses the static final set the first time.
This class was automatically generated by a Snowball to Java compiler
It implements the stemming algorithm defined by a snowball script.
Abstract base class for performing read operations of Lucene's low-level
data types.
Abstract base class for performing write operations of Lucene's low-level
data types.
Utility class for creating training / test / cross validation indexes from the original index.
Filters all tokens that cannot be parsed to a date, using the provided DateFormat.
Factory for DateRecognizerFilter.
Provides support for converting dates to strings and vice-versa.
Specifies the time granularity.
Folds all Unicode digits in [:General_Category=Decimal_Number:] to Basic Latin digits (0-9).
Factory for DecimalDigitFilter.
A token that was generated from a compound.
A decompressor.
Default policy is to allocate a bitset with 10% saturation given a unique term per document.
Simple Encoder implementation that does not modify the output
Default ICUTokenizerConfig that is generally applicable to many languages.
Default general purpose indexing chain, which handles indexing all types of fields.
Creates a formatted snippet from the top passages.
This processor verifies if StandardQueryConfigHandler.ConfigurationKeys.PHRASE_SLOP is defined in the QueryConfigHandler.
ValueSource implementation which only returns the values from the provided ValueSources which are available for a particular docId.
A compression mode that trades speed for compression ratio.
An analyzer wrapper that does not allow wrapping components or readers.
A DeletedQueryNode represents a node that was deleted from the query node tree.
Characters before the delimiter are the "token", those after are the boost.
Factory for DelimitedBoostTokenFilter.
Characters before the delimiter are the "token", those after are the payload.
Factory for DelimitedPayloadTokenFilter.
Characters before the delimiter are the "token", the textual integer after is the term frequency.
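The "token, delimiter, textual integer" convention in the entry above can be illustrated with a small standalone sketch. It does not use the Lucene API: the class name DelimitedTermFrequencySketch, the choice to split on the first delimiter, and the default frequency of 1 when no delimiter is present are all assumptions made for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical helper, not part of Lucene: splits raw text such as "lucene|3"
// into the token ("lucene") and its term frequency (3).
final class DelimitedTermFrequencySketch {

    /** Returns (token, termFrequency). Assumes a frequency of 1 when no delimiter is found. */
    static Map.Entry<String, Integer> parse(String raw, char delimiter) {
        int idx = raw.indexOf(delimiter); // assumption: split on the first delimiter
        if (idx < 0) {
            return new SimpleEntry<>(raw, 1);
        }
        String token = raw.substring(0, idx);
        // The characters after the delimiter must be a textual integer.
        int freq = Integer.parseInt(raw.substring(idx + 1));
        if (freq < 1) {
            throw new IllegalArgumentException("term frequency must be positive: " + freq);
        }
        return new SimpleEntry<>(token, freq);
    }

    public static void main(String[] args) {
        Map.Entry<String, Integer> parsed = parse("lucene|3", '|');
        System.out.println(parsed.getKey() + " occurs " + parsed.getValue() + " time(s)");
    }
}
```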
Factory for DelimitedTermFrequencyTokenFilter.
TermState serializer which encodes each file pointer as a delta relative to a base file pointer.
Implements the Divergence from Independence (DFI) model based on Chi-square statistics (i.e., standardized Chi-squared distance from independence in term frequency tf).
Implements the divergence from randomness (DFR) framework
introduced in Gianni Amati and Cornelis Joost Van Rijsbergen.
In-memory structure for the dictionary (.dic) and affix (.aff)
data of a hunspell dictionary.
Dictionary interface for retrieving morphological data
by id.
Dictionary interface for retrieving morphological data
by id.
A simple interface representing a Dictionary.
Implementation of Dictionary.FlagParsingStrategy that assumes each flag is encoded as two ASCII characters whose codes must be combined into a single character.
Abstraction of the process of parsing flags taken from the affix and dic files
A morpheme extracted from a compound token.
Implementation of Dictionary.FlagParsingStrategy that assumes each flag is encoded in its numerical form.
Simple implementation of Dictionary.FlagParsingStrategy that treats the chars in each String as individual flags.
Tool to build dictionaries.
Tool to build dictionaries.
Format of the dictionary.
A TokenFilter that decomposes compound words found in many Germanic languages.
Factory for DictionaryCompoundWordTokenFilter.
A token stored in a Dictionary.
The Diff object generates a patch string.
The DiffIt class is a means to generate patch commands from an already prepared stemmer table.
Direct wrapping of 16-bits values to a backing array.
Direct wrapping of 32-bits values to a backing array.
Direct wrapping of 64-bits values to a backing array.
Direct wrapping of 8-bits values to a backing array.
Writer for DirectDocValuesFormat
In-memory docvalues format that does no (or very little) compression.
Reader for DirectDocValuesFormat
Retrieves an instance previously written by DirectMonotonicWriter.
In-memory metadata that needs to be kept around for DirectMonotonicReader to read data from disk.
Write monotonically-increasing sequences of integers.
A Directory provides an abstraction layer for storing a list of files.
DirectoryReader is an implementation of CompositeReader that can read indexes in a Directory.
Wraps Lucene84PostingsFormat format for on-disk storage, but then at read time loads and stores all terms and postings directly in RAM as byte[], int[].
Retrieves an instance previously written by DirectWriter
Simple automaton-based spellchecker.
Holds a spelling correction for internal usage inside DirectSpellChecker.
Class for writing packed integers to be directly read from Directory.
A priority queue of DocIdSetIterators that orders by current doc ID.
A priority queue of DocIdSetIterators that orders by current doc ID.
Wrapper used in DisiPriorityQueue.
A DocIdSetIterator which is a disjunction of the approximations of the provided iterators.
A DocIdSetIterator which is a disjunction of the approximations of the provided iterators.
A MatchesIterator that combines matches from a set of sub-iterators. Matches are sorted by their start positions, and then by their end positions, so that prefixes sort first.
A query that generates the union of documents produced by its subqueries, and that scores each document with the maximum score for that document as produced by any subquery, plus a tie breaking increment for any additional matching subqueries.
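The scoring rule in the entry above (maximum subquery score plus a tie-breaking increment for the other matching subqueries) can be written out as plain arithmetic. The sketch below is standalone and does not call Lucene. The class name DisMaxScoreSketch and the parameter name tieBreaker are illustrative, and treating the increment as a multiplier applied to the sum of the non-maximum scores is an assumption about how the increment is computed.

```java
// Hypothetical helper, not part of Lucene: combines the scores of the
// subqueries that matched one document.
final class DisMaxScoreSketch {

    /**
     * Returns max(subScores) + tieBreaker * (sum of the other sub-scores).
     * With tieBreaker = 0 only the best subquery counts.
     */
    static double combine(double[] subScores, double tieBreaker) {
        if (subScores.length == 0) {
            throw new IllegalArgumentException("at least one matching subquery is required");
        }
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0.0;
        for (double score : subScores) {
            max = Math.max(max, score);
            sum += score;
        }
        return max + tieBreaker * (sum - max);
    }

    public static void main(String[] args) {
        double[] scores = {2.0, 1.0, 0.5};
        System.out.println(combine(scores, 0.0)); // best subquery only
        System.out.println(combine(scores, 0.5)); // other matches contribute half their score
    }
}
```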
Builder for DisjunctionMaxQuery
The Scorer for DisjunctionMaxQuery.
A helper to propagate block boundaries for disjunctions.
Base class for Scorers that score disjunctions.
A Scorer for OR like queries, counterpart of ConjunctionScorer.
Factory for NEAR queries
Interface for queries that can be nested as subqueries
into a span near.
A second pass grouping collector that keeps track of distinct values for a specified field for the top N group.
Returned by DistinctValuesCollector.getGroups(), representing the value and set of distinct values for the group.
The probabilistic distribution used to model term occurrence in information-based models.
Log-logistic distribution.
The smoothed power-law (SPL) distribution for the information-based framework
that is described in the original paper.
A TopDocsCollector that controls diversity in results by ensuring no more than maxHitsPerKey results from a common source are collected in the final results.
An extension to ScoreDoc that includes a key used for grouping purposes
Function to divide "a" by "b"
Comparator that sorts by asc _doc
DocFreqValueSource returns the number of documents containing the term.
Utility class to help merging documents from sub-readers according to either simple concatenated (unsorted) order, or by a specified index-time sort, skipping deleted documents and remapping non-deleted documents.
Represents one sub-reader being merged
A DocIdSet contains a set of doc ids.
A builder of DocIdSets.
Utility class to efficiently add many docs in one go.
This abstract class defines methods to iterate over a set of non-decreasing
doc ids.
Accumulator for documents that have a value for a field.
Serves as base class for FunctionValues based on DocTermsIndex.
Custom Exception to be thrown when the DocTermsIndex for a field cannot be generated
Utility class for converting Lucene Documents to Double vectors.
Documents are the unit of indexing and search.
A classifier, see http://en.wikipedia.org/wiki/Classifier_(mathematics), which assigns classes of type T to Documents
Dictionary with terms, weights, payload (optional) and contexts (optional)
information taken from stored/indexed fields in a Lucene index.
A StoredFieldVisitor that creates a Document from stored fields.
This class accepts multiple added documents and directly writes segment files.
DocumentsWriterDeleteQueue is a non-blocking linked pending deletes queue.
This class controls DocumentsWriterPerThread flushing during indexing.
The IndexingChain must define the DocumentsWriterPerThread.IndexingChain.getChain(int, SegmentInfo, Directory, FieldInfos.Builder, LiveIndexWriterConfig, Consumer) method which returns the DocConsumer that the DocumentsWriter calls to process the documents.
DocumentsWriterPerThreadPool controls DocumentsWriterPerThread instances and their thread assignments during indexing.
Controls the health status of a DocumentsWriter session.
Dictionary with terms and optionally payload and
optionally contexts information
taken from stored fields in a Lucene index.
This class contains utility methods and constants for DocValues
Abstract API that consumes numeric, binary and
sorted docvalues.
Tracks state of one binary sub-reader that we are merging
A merged TermsEnum.
Tracks state of one numeric sub-reader that we are merging
Tracks state of one sorted sub-reader that we are merging
Tracks state of one sorted numeric sub-reader that we are merging
Tracks state of one sorted set sub-reader that we are merging
A Query that matches documents that have a value for a given field as reported by doc values iterators.
Holds updates of a single DocValues field, for a set of documents within one segment.
An iterator over documents and their updated values.
Encodes/decodes per-document values.
This static holder class prevents classloading deadlock by delaying
init of doc values formats until needed.
Like DocValuesTermsQuery, but this query only runs on a long NumericDocValuesField or a SortedNumericDocValuesField, matching all documents whose value in the specified field is contained in the provided set of long values.
Abstract API that produces numeric, binary, sorted, sortedset, and sortednumeric docvalues.
Rewrites MultiTermQueries into a filter, using DocValues for term enumeration.
Holds statistics for a DocValues field.
Holds DocValues statistics for a numeric field storing double values.
Holds DocValues statistics for a numeric field storing long values.
Holds statistics for a numeric DocValues field.
Holds statistics for a sorted DocValues field.
Holds DocValues statistics for a sorted-numeric field storing double values.
Holds DocValues statistics for a sorted-numeric field storing long values.
Holds statistics for a sorted-numeric DocValues field.
Holds statistics for a sorted-set DocValues field.
A Collector which computes statistics for a DocValues field.
A Query that only accepts documents whose term value in the specified field is contained in the provided set of allowed terms.
DocValues types.
An in-place update to a DocValues field.
An in-place update to a binary DocValues field
An in-place update to a numeric DocValues field
Helper methods for parsing XML
Comparator based on Double.compare(double, double) for numHits.
Function that returns a constant double value for every document.
Abstract FunctionValues implementation which supports retrieving double values.
Syntactic sugar for encoding doubles as NumericDocValues via Double.doubleToRawLongBits(double).
Obtains double field values from LeafReader.getNumericDocValues(java.lang.String) and makes those values available as other numeric types, casting as needed.
Filter for DoubleMetaphone (supporting secondary codes)
Factory for DoubleMetaphoneFilter.
An indexed double field for fast range filters.
Builder for multi range queries for DoublePoints
An indexed Double Range field.
Represents a contiguous range of double values, with an inclusive minimum and
exclusive maximum
DocValues field for DoubleRange.
Groups double values into ranges
A GroupSelector implementation that groups documents by double values
Per-segment, per-document double values, which can be calculated at search-time
Base class for producing DoubleValues. To obtain a DoubleValues object for a leaf reader, clients should call DoubleValuesSource.rewrite(IndexSearcher) against the top-level searcher, and then call DoubleValuesSource.getValues(LeafReaderContext, DoubleValues) on the resulting DoubleValuesSource.
Abstract ValueSource implementation which wraps two ValueSources and applies an extendible float function to their values.
This builder does nothing.
Analyzer for Dutch language.
This class was automatically generated by a Snowball to Java compiler
It implements the stemming algorithm defined by a snowball script.
Creates new instances of EdgeNGramTokenFilter.
Tokenizes the given token into n-grams of given size(s).
Tokenizes the input from an edge into n-grams of given size(s).
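As a rough illustration of what "n-grams from an edge" means in the two entries above, the standalone sketch below emits the prefixes of a string between a minimum and maximum length. It does not use Lucene. The names EdgeNGramSketch, minGram and maxGram are illustrative, and working on Java chars rather than Unicode code points is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: front-edge n-grams of a string.
final class EdgeNGramSketch {

    /** Returns the prefixes of text with lengths minGram..maxGram (capped at the text length). */
    static List<String> edgeNGrams(String text, int minGram, int maxGram) {
        if (minGram < 1 || maxGram < minGram) {
            throw new IllegalArgumentException("require 1 <= minGram <= maxGram");
        }
        List<String> grams = new ArrayList<>();
        int limit = Math.min(maxGram, text.length());
        for (int n = minGram; n <= limit; n++) {
            grams.add(text.substring(0, n)); // always anchored at the leading edge
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(edgeNGrams("lucene", 1, 3));
    }
}
```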
Creates new instances of EdgeNGramTokenizer.
Internal tree node: represents geometry edge from [x1, y1] to [x2, y2].
Removes elisions from a TokenStream.
Factory for ElisionFilter.
Abstract base class implementing a DocValuesProducer that has no doc values.
An always exhausted token stream.
Encodes original text.
Analyzer for English.
A TokenFilter that applies EnglishMinimalStemmer to stem English words.
Factory for EnglishMinimalStemFilter.
Minimal plural stemmer for English.
TokenFilter that removes possessives (trailing 's) from words.
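A minimal standalone sketch of removing a trailing 's from a word follows. It does not use Lucene. The class name PossessiveSketch is illustrative, and the set of apostrophe characters it accepts (ASCII apostrophe, right single quotation mark, fullwidth apostrophe) as well as the handling of an uppercase S are assumptions.

```java
// Hypothetical helper, not part of Lucene: strips a trailing possessive marker.
final class PossessiveSketch {

    /** Returns word without a trailing 's (or 'S); other words are returned unchanged. */
    static String stripPossessive(String word) {
        int len = word.length();
        if (len >= 2) {
            char last = word.charAt(len - 1);
            char mark = word.charAt(len - 2);
            boolean endsWithS = last == 's' || last == 'S';
            // Assumed apostrophe variants: ASCII, right single quote, fullwidth.
            boolean apostrophe = mark == '\'' || mark == '\u2019' || mark == '\uFF07';
            if (endsWithS && apostrophe) {
                return word.substring(0, len - 2);
            }
        }
        return word;
    }

    public static void main(String[] args) {
        System.out.println(stripPossessive("dog's"));
        System.out.println(stripPossessive("dogs"));
    }
}
```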
Factory for EnglishPossessiveFilter.
This class was automatically generated by a Snowball to Java compiler
It implements the stemming algorithm defined by a snowball script.
Obtains int field values from LeafReader.getNumericDocValues(java.lang.String) and makes those values available as other numeric types, casting as needed.
A parser needs to implement EscapeQuerySyntax to allow the QueryNode to escape the queries, when the toQueryString method is called.
Type of escaping: String for escaping syntax, NORMAL for escaping reserved words (like AND) in terms
Implementation of EscapeQuerySyntax for the standard lucene syntax.
Analyzer for Estonian.
Atomically loads the DEFAULT_STOP_SET in a lazy fashion once the outer class accesses the static final set the first time.
This class was automatically generated by a Snowball to Java compiler
It implements the stemming algorithm defined by a snowball script.
The ExitableDirectoryReader wraps a real index DirectoryReader and allows for a QueryTimeout implementation object to be checked periodically to see if the thread should exit or not.
Wrapper class for another FilterAtomicReader.
Wrapper class for another PointValues implementation that is used by ExitableFields.
Wrapper class for a SubReaderWrapper that is used by the ExitableDirectoryReader.
Wrapper class for another Terms implementation that is used by ExitableFields.
Wrapper class for TermsEnum that is used by ExitableTerms for implementing an
exitable enumeration of terms.
Exception that is thrown to prematurely terminate a term enumeration.
A query match containing the score explanation of the match
Expert: Describes the score computation for document and query.
Base class that computes the value of an expression for a document.
A DoubleValues which evaluates an expression
A Rescorer that uses an expression to re-score first pass hits.
The ExtendableQueryParser enables arbitrary query parser extension based on a customizable field naming scheme.
Wraps an IntervalIterator and extends the bounds of its intervals. Useful for specifying gaps in an ordered iterator; if you want to match `a b [2 spaces] c`, you can search for phrase(a, extended(b, 0, 2), c). An interval with prefix bounds extended by n will skip over matches that appear in positions lower than n
ExtensionQuery holds all query components extracted from the original query string like the query field and the extension query string.
The Extensions class represents an extension mapping to associate ParserExtension instances with extension keys.
This class represents a generic pair.
Builds and iterates over sequences stored on disk.
Iterate over byte refs in a file.
An efficient implementation of JavaCC's CharStream interface.
An efficient implementation of JavaCC's CharStream interface.
An efficient implementation of JavaCC's CharStream interface.
Another highlighter implementation.
A DoubleValuesSource instance which can be used to read the values of a feature from a FeatureField for documents.
Field that can be used to store static scoring factors into documents.
Sorts using the value of a specified feature name from a FeatureField.
Expert: directly create a field for a document.
Specifies whether and how a field should be stored.
A query node implements FieldableNode interface to indicate that its children and itself are associated to a specific field.
This listener listens for every field configuration request and assigns a StandardQueryConfigHandler.ConfigurationKeys.BOOST to the equivalent FieldConfig based on a defined map: fieldName -> boostValue stored in StandardQueryConfigHandler.ConfigurationKeys.FIELD_BOOST_MAP.
A base class for ValueSource implementations that retrieve values for a single field from DocValues.
Expert: a FieldComparator compares hits so as to determine their sort order when collecting the top results with TopFieldCollector.
Sorts by descending relevance.
Sorts by field's natural Term sort order, using
ordinals.
Sorts by field's natural Term sort order.
Provides a FieldComparator for custom field sorting.
This class represents a field configuration.
This interface should be implemented by classes that want to listen for field configuration requests.
This listener listens for every field configuration request and assigns a StandardQueryConfigHandler.ConfigurationKeys.DATE_RESOLUTION to the equivalent FieldConfig based on a defined map: fieldName -> DateTools.Resolution stored in StandardQueryConfigHandler.ConfigurationKeys.FIELD_DATE_RESOLUTION_MAP.
Expert: A ScoreDoc which also contains information about how to sort the referenced document.
FieldFragList has a list of "frag info" that is used by FragmentsBuilder class
to create fragments (snippets).
List of term offsets + weight for a frag info
Represents the list of term offsets for some text
Internal highlighter abstraction that operates on a per field basis.
Access to the Field Info file that describes document fields and whether or
not they are indexed.
Collection of FieldInfos (accessible by number or by name).
Encodes/decodes FieldInfos
This class tracks the number and position / offset parameters of terms being added to the index.
Wrapper to allow SpanQuery objects to participate in composite single-field SpanQueries by 'lying' about their search field.
Metadata and stats for one field in the index.
Reads/writes field metadata.
Pair of FieldMetadata and BlockTermState for a specific field.
Ultimately returns an OffsetsEnum yielding potentially highlightable words in the text.
FieldPhraseList has a list of WeightedPhraseInfo that is used by FragListBuilder to create a FieldFragList object.
Represents the list of term offsets and boost for some text
Term offsets (start + end)
FieldQuery breaks down query object into terms/phrases and keeps
them in a QueryPhraseMap structure.
Internal structure of a query for highlighting: represents
a nested query structure
A FieldQueryNode represents an element that contains field/text tuple
Builds a TermQuery object from a FieldQueryNode object.
BlockTree's implementation of Terms.
Provides a Terms index for fields that have it, and lists which fields do.
Abstract API that consumes terms, doc, freq, prox, offset and payloads postings.
Efficient index format for block-based Codecs.
Abstract API that produces terms, doc, freq, prox, offset and payloads postings.
Forms an OR query of the provided query across multiple fields.
Iterates over terms across multiple fields.
FieldTermStack is a stack that keeps query terms in the specified field of the document to be highlighted.
Single term with its position/offsets in the document and IDF weight.
Describes the properties of a field.
This class efficiently buffers numeric and binary field updates and stores
terms, values and metadata in a memory efficient way without creating large amounts
of objects.
Struct like class that is used to iterate over all updates in this buffer
Expert: A hit queue for sorting hits by terms in more than one field.
Extension of ScoreDoc to also store the FieldComparator slot.
An implementation of FieldValueHitQueue which is optimized in case there is more than one comparator.
An implementation of FieldValueHitQueue which is optimized in case there is just one comparator.
This interface should be implemented by a QueryNode that holds a field and an arbitrary value.
Dictionary represented by a text file.
Expert: A Directory instance that switches files between
two other Directory instances.
Simple ResourceLoader that opens resource files from the local file system, optionally resolving against a base directory.
Delegates all methods to a wrapped BinaryDocValues.
A codec that forwards all its method calls to another codec.
A FilterCodecReader contains another CodecReader, which it uses as its basic source of data, possibly transforming the data along the way or providing additional functionality.
Collector delegator.
Directory implementation that delegates calls to another directory.
A FilterDirectoryReader wraps another DirectoryReader, allowing implementations
to transform or extend it.
Factory class passed to FilterDirectoryReader constructor that allows
subclasses to wrap the filtered DirectoryReader's subreaders.
Abstract decorator class of a DocIdSetIterator
implementation that provides on-demand filter/validation
mechanism on an underlying DocIdSetIterator.
An IntervalsSource that filters the intervals from another IntervalsSource
Abstract class for enumerating a subset of all terms.
Return value, if term should be accepted or the iteration should END.
Abstract base class for TokenFilters that may remove tokens.
An Iterator implementation that filters elements with a boolean predicate.
LeafCollector delegator.
A FilterLeafReader contains another LeafReader, which it uses as its basic source of data, possibly transforming the data along the way or providing additional functionality.
Base class for filtering Fields implementations.
Base class for filtering PostingsEnum implementations.
Base class for filtering Terms implementations.
Base class for filtering TermsEnum implementations.
A MatchesIterator that delegates all calls to another MatchesIterator
A wrapper for MergePolicy instances.
Delegates all methods to a wrapped NumericDocValues.
Filter a Scorable, intercepting methods and optionally changing their return values. The default implementation simply passes all calls to its delegate, with the exception of Scorable.setMinCompetitiveScore(float) which defaults to a no-op.
A FilterScorer contains another Scorer, which it uses as its basic source of data, possibly transforming the data along the way or providing additional functionality.
Delegates all methods to a wrapped SortedDocValues.
Delegates all methods to a wrapped SortedNumericDocValues.
Delegates all methods to a wrapped SortedSetDocValues.
A Spans implementation wrapping another spans instance, allowing to filter spans matches easily by implementing FilterSpans.accept(org.apache.lucene.search.spans.Spans)
Status returned from FilterSpans.accept(Spans) that indicates whether a candidate match should be accepted, rejected, or rejected and move on to the next document.
A FilterWeight contains another Weight and implements all abstract methods by calling the contained weight's method.
Filter outputs a single token which is a concatenation of the sorted and de-duplicated set of input tokens.
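The "sorted and de-duplicated concatenation" described in the entry above can be sketched in a few lines of plain Java. This is not the Lucene filter itself: the class name FingerprintSketch is illustrative, and the separator parameter and natural String ordering are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, not part of Lucene: builds one fingerprint string
// from a list of tokens.
final class FingerprintSketch {

    /** Sorts the tokens, drops duplicates, and joins what remains with the separator. */
    static String fingerprint(List<String> tokens, String separator) {
        // TreeSet gives both natural-order sorting and de-duplication.
        return String.join(separator, new TreeSet<>(tokens));
    }

    public static void main(String[] args) {
        System.out.println(fingerprint(Arrays.asList("the", "quick", "the", "brown"), " "));
    }
}
```

Because the order and multiplicity of the input tokens do not affect the result, two inputs that contain the same set of tokens produce the same fingerprint.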
Factory for FingerprintFilter.
Iterates all accepted strings.
Nodes for path stack.
Analyzer for Finnish.
Atomically loads the DEFAULT_STOP_SET in a lazy fashion once the outer class accesses the static final set the first time.
A TokenFilter that applies FinnishLightStemmer to stem Finnish words.
Factory for FinnishLightStemFilter.
Light Stemmer for Finnish.
This class was automatically generated by a Snowball to Java compiler
It implements the stemming algorithm defined by a snowball script.
FirstPassGroupingCollector is the first of two passes necessary
to collect grouped hits.
Deprecated.
Fix the token filters that create broken offsets in the first place.
Factory for FixBrokenOffsetsFilter.
Immutable twin of FixedBitSet.
BitSet of fixed length (numBits), backed by accessible (FixedBitSet.getBits()) long[], accessed with an int index, implementing Bits and DocIdSet.
TermsIndexReader for simple every Nth terms indexes.
Selects every Nth term as an index term, and holds term bytes (mostly) fully expanded in memory.
Just like BytesRefArray except all values have the same length.
A FixedShingleFilter constructs shingles (token n-grams) from a token stream.
Factory for FixedShingleFilter. Parameters are:
shingleSize - how many tokens should be combined into each shingle (default: 2)
tokenSeparator - how tokens should be joined together in the shingle (default: space)
fillerToken - what should be added in place of stop words (default: _ )
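To make the shingleSize and tokenSeparator parameters above concrete, here is a standalone sketch that builds fixed-size shingles from a list of tokens. It does not use Lucene. The class name FixedShingleSketch is illustrative, and fillerToken handling (which needs position gaps left by removed stop words) is deliberately left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: fixed-size token n-grams.
final class FixedShingleSketch {

    /** Joins every run of shingleSize consecutive tokens with tokenSeparator. */
    static List<String> shingles(List<String> tokens, int shingleSize, String tokenSeparator) {
        if (shingleSize < 2) {
            throw new IllegalArgumentException("shingleSize must be at least 2");
        }
        List<String> out = new ArrayList<>();
        for (int start = 0; start + shingleSize <= tokens.size(); start++) {
            out.add(String.join(tokenSeparator, tokens.subList(start, start + shingleSize)));
        }
        return out;
    }

    public static void main(String[] args) {
        // Uses the documented defaults: shingleSize 2, separator " ".
        System.out.println(shingles(Arrays.asList("please", "divide", "this", "sentence"), 2, " "));
    }
}
```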
This attribute can be used to pass different flags down the Tokenizer chain, e.g.
Default implementation of FlagsAttribute.
Converts an incoming graph token stream, such as one from SynonymGraphFilter, into a flat form so that all nodes form a single linear chain with no side paths.
Holds all tokens leaving a given input position.
Gathers up merged input positions into a single output position, only for the current "frontier" of nodes we've seen but can't yet output because they are not frozen.
Factory for FlattenGraphFilter.
Comparator based on Float.compare(float, float) for numHits.
Abstract FunctionValues implementation which supports retrieving float values.
Syntactic sugar for encoding floats as NumericDocValues via Float.floatToRawIntBits(float).
Encode a character array Float as a BytesRef.
Obtains float field values from LeafReader.getNumericDocValues(java.lang.String) and makes those values available as other numeric types, casting as needed.
An indexed float field for fast range filters.
Builder for multi range queries for FloatPoints
KNN search on top of N dimensional indexed float points.
An indexed Float Range field.
DocValues field for FloatRange.
Default FlushPolicy implementation that flushes new segments based on RAM used and document count depending on the IndexWriter's IndexWriterConfig.
A FlushInfo provides information required for a FLUSH context.
FlushPolicy controls when segments are flushed from a RAM resident internal data-structure to the IndexWriters Directory.
Query wrapper that forces its wrapped Query to use the default doc-by-doc BulkScorer.
Utility class to encode/decode increasing sequences of 128 integers.
Processes terms found in the original text, typically by applying some form
of mark-up to highlight terms in HTML search results pages.
Encode all values in normal area with fixed bit width,
which is determined by the max value in this block.
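The idea of a fixed bit width chosen from the block's maximum value can be sketched as follows. This is a standalone illustration, not the Lucene encoder: the class name FixedBitWidthSketch, the little-endian layout inside a long[], and the assumption that all values are non-negative and fit in the chosen width are all made up for this example.

```java
// Hypothetical helper, not part of Lucene: packs non-negative values into a
// long[] using the same number of bits for every value.
final class FixedBitWidthSketch {

    /** Smallest width (at least 1 bit) that can represent maxValue. */
    static int bitsRequired(long maxValue) {
        if (maxValue < 0) {
            throw new IllegalArgumentException("maxValue must be non-negative");
        }
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    /** Packs values back to back; assumes every value fits in the given number of bits. */
    static long[] pack(long[] values, int bits) {
        long[] packed = new long[(int) (((long) values.length * bits + 63) / 64)];
        for (int i = 0; i < values.length; i++) {
            long bitPos = (long) i * bits;
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            packed[word] |= values[i] << shift;
            if (shift + bits > 64) { // value straddles two longs
                packed[word + 1] |= values[i] >>> (64 - shift);
            }
        }
        return packed;
    }

    /** Reads back the value stored at the given index. */
    static long get(long[] packed, int bits, int index) {
        long bitPos = (long) index * bits;
        int word = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = packed[word] >>> shift;
        if (shift + bits > 64) {
            value |= packed[word + 1] << (64 - shift);
        }
        long mask = bits == 64 ? -1L : (1L << bits) - 1;
        return value & mask;
    }
}
```

With a block whose largest value is 99, bitsRequired returns 7, so ten values occupy 70 bits (two longs) instead of 640.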
Reads from a single byte[].
FragListBuilder is an interface for FieldFragList builder classes.
Implements the policy for breaking text into multiple fragments for consideration by the Highlighter class.
FragmentsBuilder is an interface for fragments (snippets) builder classes.
Builds an ngram model from the text sent to FreeTextSuggester.build(org.apache.lucene.search.suggest.InputIterator) and predicts based on the last grams-1 tokens in the request sent to FreeTextSuggester.lookup(java.lang.CharSequence, boolean, int).
Analyzer for French language.
A TokenFilter that applies FrenchLightStemmer to stem French words.
Factory for FrenchLightStemFilter.
Light Stemmer for French.
A TokenFilter that applies FrenchMinimalStemmer to stem French words.
Factory for FrenchMinimalStemFilter.
Light Stemmer for French.
This class was automatically generated by a Snowball to Java compiler
It implements the stemming algorithm defined by a snowball script.
Implements limited (iterators only, no stats) Fields interface over the in-RAM buffered fields/terms/postings, to flush postings through the PostingsFormat.
A TimSorter which sorts two parallel arrays of doc IDs and offsets in one go.
A ring buffer that tracks the frequency of the integers that it contains.
A bag of integers.
Holds buffered deletes and updates by term or query, once pushed.
This class helps iterating a term dictionary and consuming all the docs for each term.
Base class for Directory implementations that store index
files in the file system.
Base class for file system based locking implementation.
Represents a finite state machine (FST), using a compact byte[] format.
Represents a single arc.
Helper methods to read the bit-table of a direct addressing node.
Reads bytes stored in an FST.
Specifies allowed range of each int input label for
this FST.
Finite state automata based implementation of "autocomplete" functionality.
A single completion for a given key.
Finite state automata based implementation of "autocomplete" functionality.
An adapter from Lookup API to FSTCompletion.
Immutable stateless FST-based index dictionary kept in memory.
Provides stateful FSTDictionary.Browser to seek in the FSTDictionary.
Builds an immutable FSTDictionary.
Can next() and advance() through the terms in an FST
A custom FST outputs implementation that stores block data
(BytesRef), long ordStart, long numTerms.
FST term dict + Lucene50PBF
Abstraction for reading/writing bytes necessary for FST.
An FST Outputs implementation for FSTTermsWriter.
Represents the metadata for one term.
FST-based terms dictionary reader.
FST-based term dict, using metadata as FST output.
Holds a pair (automaton, fst) of states and accumulated output in the intersected machine.
A query that retrieves all documents with a DoubleValues value matching a predicate. This query works by a linear scan of the index, and is best used in conjunction with other queries that can restrict the number of documents visited
Returns a score for each document based on a ValueSource, often some function of the value of a field.
A Query wrapping a ValueSource that matches docs in which the values in the value source match a configured range.
A query that wraps another query, and uses a DoubleValuesSource to replace or modify the wrapped query's score. If the DoubleValuesSource doesn't return a value for a particular document, then that document will be given a score of 0.
Represents field values as different types.
Abstraction of the logic required to fill the value of a specified doc into
a reusable MutableValue.
Additional methods from Java 9's java.util.Arrays.
Additional methods from Java 9's java.util.Objects.
Builds a set of CompiledAutomaton for fuzzy matching on a given term,
with specified maximum edit distance, fixed prefix and whether or not
to allow transpositions.
A CompletionQuery that matches documents containing terms
within an edit distance of the specified prefix.
Configuration parameters for FuzzyQuerys.
Fuzzifies ALL terms provided as strings and then picks the best n differentiating terms.
Builder for FuzzyLikeThisQuery.
Implements the fuzzy search query.
A FuzzyQueryNode represents an element that contains a
field/text/similarity tuple.
Builds a FuzzyQuery object from a FuzzyQueryNode object.
This processor iterates the query node tree looking for every
FuzzyQueryNode; when this kind of node is found, it checks the
query configuration for
StandardQueryConfigHandler.ConfigurationKeys.FUZZY_CONFIG, gets the
fuzzy prefix length and default similarity from it, and sets them on the fuzzy node.
A class used to represent a set of many, potentially large, values (e.g.
Result from FuzzySet.contains(BytesRef):
can never return definitively YES (always MAYBE),
but can sometimes definitely return NO.
Implements a fuzzy AnalyzingSuggester.
Subclass of TermsEnum for enumerating all terms that are similar
to the specified filter term.
Used for sharing automata between segments
Levenshtein automata are large and expensive to build; we don't want to build
them directly on the query because this can blow up caches that use queries
as keys; we also don't want to rebuild them for every segment.
Thrown to indicate that there was an issue creating a fuzzy query for a given term.
Analyzer for Galician.
Atomically loads the DEFAULT_STOP_SET in a lazy fashion once the outer class
accesses the static final set the first time.
A TokenFilter that applies GalicianMinimalStemmer to stem
Galician words.
Factory for GalicianMinimalStemFilter.
Minimal Stemmer for Galician
A TokenFilter that applies GalicianStemmer to stem
Galician words.
Factory for GalicianStemFilter.
Galician stemmer implementing "Regras do lematizador para o galego".
The Gener object helps in the discarding of nodes which break the reduction
effort and defend the structure against large reductions.
reusable geopoint encoding methods
A predicate that checks whether a given point is within a component2D geometry.
A predicate that checks whether a given point is within a distance of another point.
Basic reusable geo-spatial utility methods
used to define the orientation of 3 points
-1 = Clockwise
0 = Colinear
1 = Counter-clockwise
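The three orientation values listed above can be illustrated with the sign of a 2D cross product. The following is only a minimal sketch: the class and method names are hypothetical, it uses plain double arithmetic, and it is not the library's own (numerically robust) implementation. It assumes a conventional x-right, y-up coordinate system.

```java
// Hypothetical illustration class; not part of the documented API.
final class OrientationSketch {
    /** Returns -1 (clockwise), 0 (colinear) or 1 (counter-clockwise) for points a, b, c. */
    static int orient(double ax, double ay, double bx, double by, double cx, double cy) {
        // z-component of the cross product (b - a) x (c - a)
        double cross = (bx - ax) * (cy - ay) - (by - ay) * (cx - ax);
        if (cross > 0) {
            return 1;
        }
        return cross < 0 ? -1 : 0;
    }

    public static void main(String[] args) {
        // (0,0) -> (1,0) -> (0,1) turns left, i.e. counter-clockwise.
        System.out.println(orient(0, 0, 1, 0, 0, 1));
    }
}
```

Plain floating-point evaluation can misclassify nearly colinear points, which is why production geometry code usually adds an error bound or an exact fallback.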
This class was automatically generated by a Snowball to Java compiler
It implements the stemming algorithm defined by a snowball script.
Analyzer for German language.
A TokenFilter that applies GermanLightStemmer to stem German
words.
Factory for GermanLightStemFilter.
Light Stemmer for German.
A TokenFilter that applies GermanMinimalStemmer to stem German
words.
Factory for GermanMinimalStemFilter.
Minimal Stemmer for German.
Normalizes German characters according to the heuristics
of the German2 snowball algorithm.
Factory for GermanNormalizationFilter.
A TokenFilter that stems German words.
Factory for GermanStemFilter.
A stemmer for German words.
This class was automatically generated by a Snowball to Java compiler
It implements the stemming algorithm defined by a snowball script.
Utility to get document frequency and total number of occurrences (sum of the tf for each doc) of a term.
A collector that collects all ordinals from a specified field matching the query.
Formats text with different color intensity depending on the score of the
term.
An abstract TokenFilter that exposes its input stream as a graph.
Call GraphTokenFilter.incrementBaseToken() to move the root of the graph to the next
position in the TokenStream, GraphTokenFilter.incrementGraphToken() to move along
the current graph, and GraphTokenFilter.incrementGraph() to reset to the next graph
based at the current root.
Consumes a TokenStream and creates an Automaton where the transition labels are terms from
the TermToBytesRefAttribute.
Outputs the dot (graphviz) string for the viterbi lattice.
Outputs the dot (graphviz) string for the viterbi lattice.
Analyzer for the Greek language.
Normalizes token text to lower case, removes some Greek diacritics,
and standardizes final sigma to sigma.
Factory for GreekLowerCaseFilter.
A TokenFilter that applies GreekStemmer to stem Greek
words.
Factory for GreekStemFilter.
A stemmer for Greek words, according to: Development of a Stemmer for the
Greek Language. Georgios Ntais
Represents one group in the results.
Base class for computing grouped facets.
Represents a facet entry with a value and a count.
The grouped facet result.
Contains the local grouped segment counts for a particular segment.
Convenience class to perform grouping in a non distributed environment.
A GroupQueryNode represents a location where the original user typed
real parentheses on the query string.
Builds no object, it only returns the Query object set on the
GroupQueryNode object using a
QueryTreeBuilder.QUERY_TREE_BUILDER_TAGID tag.
Concrete implementations of this class define what to collect for individual
groups during the second-pass of a grouping search.
Defines a group, for use by grouping collectors
A GroupSelector acts as an iterator over documents.
What to do with the current value
A DataOutput that can be used to build a byte[].
Implements PackedInts.Mutable, but grows the
bit count of the underlying packed ints on-demand.
An indexed half-float field for fast range filters.
This directory wrapper overrides
Directory.copyFrom(Directory, String, String, IOContext) in order
to optionally use a hard-link instead of a full byte by byte file copy if applicable.
Base class for hashing functions that can be referred to by name.
Utility class to read buffered points from in-heap arrays.
Reusable implementation for a point value on-heap
Utility class to write new points into in-heap arrays.
Finds the optimal segmentation of a sentence into Chinese words
HighFreqTerms class extracts the top n most frequent terms
(by document frequency) from an existing Lucene index and reports their
document frequency.
Compares terms by docTermFreq
Priority queue for TermStats objects
Compares terms by totalTermFreq
HighFrequencyDictionary: terms taken from the given field
of a Lucene index, which appear in a number of documents
above a given threshold.
Marks up highlighted terms found in the best sections of
text, using configurable Fragmenter, Scorer, Formatter,
Encoder and tokenizers.
QueryMatch object that contains the hit positions of a matching Query
Represents an individual hit
Analyzer for Hindi.
Atomically loads the DEFAULT_STOP_SET in a lazy fashion once the outer class
accesses the static final set the first time.
A TokenFilter that applies HindiNormalizer to normalize the
orthography.
Factory for HindiNormalizationFilter.
Normalizer for Hindi.
A TokenFilter that applies HindiStemmer to stem Hindi words.
Factory for HindiStemFilter.
Light Stemmer for Hindi.
Used for defining custom algorithms that allow searches to terminate early
Implementation of HitsThresholdChecker which allows global hit counting
Default implementation of HitsThresholdChecker to be used for single threaded execution
Tokenizer for Chinese or mixed Chinese-English text.
Factory for HMMChineseTokenizer.
A CharFilter that wraps another Reader and attempts to strip out HTML constructs.
Factory for HTMLStripCharFilter.
Analyzer for Hungarian.
Atomically loads the DEFAULT_STOP_SET in a lazy fashion once the outer class
accesses the static final set the first time.
A TokenFilter that applies HungarianLightStemmer to stem
Hungarian words.
Factory for HungarianLightStemFilter.
Light Stemmer for Hungarian.
This class was automatically generated by a Snowball to Java compiler
It implements the stemming algorithm defined by a snowball script.
TokenFilter that uses hunspell affix rules and words to stem tokens.
TokenFilterFactory that creates instances of HunspellStemFilter.
This class represents a hyphen.
When the plain text is extracted from documents, we will often have many words hyphenated and broken into
two lines.
Factory for HyphenatedWordsFilter.
This class represents a hyphenated word.
A TokenFilter that decomposes compound words found in many Germanic languages.
Factory for HyphenationCompoundWordTokenFilter.
This tree structure stores the hyphenation patterns in an efficient way for
fast lookup.
Provides a framework for the family of information-based models, as described
in Stéphane Clinchant and Eric Gaussier.
Extension of CharTermAttributeImpl that encodes the term
text as a binary Unicode collation key instead of as UTF-8 bytes.
Converts each token into its CollationKey, and
then encodes bytes as an index term.
Indexes collation keys as a single-valued SortedDocValuesField.
Configures KeywordTokenizer with ICUCollationAttributeFactory.
A TokenFilter that applies search term folding to Unicode text,
applying foldings from UTR#30 Character Foldings.
Factory for ICUFoldingFilter.
Normalize token text with ICU's Normalizer2.
Factory for ICUNormalizer2CharFilter.
Normalize token text with ICU's Normalizer2.
Factory for ICUNormalizer2Filter.
Breaks text into words according to UAX #29: Unicode Text Segmentation
(http://www.unicode.org/reports/tr29/)
Class that allows for tailored Unicode Text Segmentation on
a per-writing system basis.
Factory for ICUTokenizer.
A TokenFilter that transforms text with ICU.
Wrap a CharTermAttribute with the Replaceable API.
Factory for ICUTransformFilter.
Does nothing other than convert the char array to a byte array using the specified encoding.
Function that returns #idf(long, long) for every document.
A PostingsFormat optimized for primary-key (ID) fields that also
record a version (long) for each ID, delivered as a payload
created by
IDVersionPostingsFormat.longToBytes(long, org.apache.lucene.util.BytesRef) during indexing.
Iterates through terms in this field; this class is public so users
can cast it to call
IDVersionSegmentTermsEnum.seekExact(BytesRef, long) for
optimistic-concurrency, and also IDVersionSegmentTermsEnum.getVersion() to get the
version of the currently seek'd term.
Depending on the boolean value of the ifSource function,
returns the value of the trueSource or falseSource function.
Per-document scoring factors.
Information about upcoming impacts, ie.
DocIdSetIterator that skips non-competitive docs thanks to the
indexed impacts.
Extension of PostingsEnum which also provides information about
upcoming impacts.
Source of Impacts.
Computes the measure of divergence from independence for DFI
scoring functions.
Normalized chi-squared measure of distance from independence
Saturated measure of distance from independence
Standardized measure of distance from independence
Represents a single field for indexing.
Describes the properties of a field.
Expert: represents a single commit into an index as seen by the
IndexDeletionPolicy or IndexReader.
Expert: policy for deletion of stale index commits.
Immutable stateless index dictionary kept in RAM.
Stateful IndexDictionary.Browser to seek a term in this IndexDictionary
and get its corresponding block file pointer in the block file.
Supplier for a new stateful IndexDictionary.Browser created on the immutable IndexDictionary.
Builds an immutable IndexDictionary.
Disk-based implementation of a DocIdSetIterator which can return
the index of the current document, i.e.
Disk-based implementation of a DocIdSetIterator which can return
the index of the current document, i.e.
Holds details for each commit point.
Tracks the reference count for a single index file:
This class contains useful constants representing filenames and extensions
used by lucene, as well as convenience methods for querying whether a file
name matches an extension (matchesExtension), as well as generating file names from a segment name,
generation and extension (fileNameFromGeneration, segmentFileName).
This exception is thrown when Lucene detects
an index that is newer than this Lucene version.
This exception is thrown when Lucene detects
an index that is too old for this Lucene version
Abstract base class for input from a file in a Directory.
Merges indices specified on the command line into the index
specified as the first command line argument.
Signals that no index was found in the Directory.
Controls how much information is stored in the postings lists.
A query that uses either an index structure (points or terms) or doc values
in order to run a query, depending which one is more efficient.
A DataOutput for appending data to a file in a Directory.
IndexReader is an abstract class, providing an interface for accessing a
point-in-time view of an index.
A utility class that gives hooks in order to help build a cache based on
the data that is contained in this index.
A cache key identifying a resource that is being cached on.
A listener that is called when a resource gets closed.
A struct like class that represents a hierarchical relationship between
IndexReader instances.
Class exposing static helper methods for generating DoubleValuesSource instances
over some IndexReader statistics
Implements search over a single IndexReader.
A class holding a subset of the IndexSearchers leaf contexts to be
executed within a single thread.
Handles how documents should be sorted in an index, both within a segment and between
segments.
Used for sorting documents across segments
A comparator of doc IDs, used for sorting documents within a segment
Sorts documents based on double values from a NumericDocValues instance
Sorts documents based on float values from a NumericDocValues instance
Sorts documents based on integer values from a NumericDocValues instance
Sorts documents based on long values from a NumericDocValues instance
Provide a NumericDocValues instance for a LeafReader
Provide a SortedDocValues instance for a LeafReader
Sorts documents based on terms from a SortedDocValues instance
A range query that can take advantage of the fact that the index is sorted to speed up
execution.
A doc ID set iterator that wraps a delegate iterator and only returns doc IDs in
the range [firstDocInclusive, lastDoc).
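The half-open range in the entry above means the lower bound is included and the upper bound is not. As a rough illustration only, the sketch below wraps a plain `java.util.PrimitiveIterator.OfInt` rather than the library's own iterator type; the class name is hypothetical, and the delegate is assumed to yield doc IDs in ascending order.

```java
import java.util.NoSuchElementException;
import java.util.PrimitiveIterator;
import java.util.stream.IntStream;

// Hypothetical illustration class; it does not use the documented iterator API.
final class BoundedDocIdIteratorSketch implements PrimitiveIterator.OfInt {
    private final PrimitiveIterator.OfInt delegate; // assumed to be ascending
    private final int firstDocInclusive;
    private final int lastDoc; // exclusive upper bound
    private int pending;
    private boolean hasPending;

    BoundedDocIdIteratorSketch(PrimitiveIterator.OfInt delegate, int firstDocInclusive, int lastDoc) {
        this.delegate = delegate;
        this.firstDocInclusive = firstDocInclusive;
        this.lastDoc = lastDoc;
        advance();
    }

    // Moves to the next doc ID inside [firstDocInclusive, lastDoc), if any.
    private void advance() {
        hasPending = false;
        while (delegate.hasNext()) {
            int doc = delegate.nextInt();
            if (doc >= lastDoc) {
                return; // past the range; ascending order means nothing else can match
            }
            if (doc >= firstDocInclusive) {
                pending = doc;
                hasPending = true;
                return;
            }
        }
    }

    @Override
    public boolean hasNext() {
        return hasPending;
    }

    @Override
    public int nextInt() {
        if (!hasPending) {
            throw new NoSuchElementException();
        }
        int doc = pending;
        advance();
        return doc;
    }

    public static void main(String[] args) {
        PrimitiveIterator.OfInt docs = IntStream.of(1, 3, 5, 7, 9).iterator();
        new BoundedDocIdIteratorSketch(docs, 3, 8).forEachRemaining((int d) -> System.out.println(d));
    }
}
```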
Compares the given document's value with a stored reference value.
Command-line tool that enables listing segments in an
index, copying specific segments to another index, and
deleting segments from an index.
This is an easy-to-use tool that upgrades all segments of an index from previous Lucene versions
to the current segment file format.
An IndexWriter creates and maintains an index.
DocStats for this index
Interface for internal atomic events.
If DirectoryReader.open(IndexWriter) has
been called (ie, this writer is in near real-time
mode), then after a merge completes, this class can be
invoked to warm the reader on the newly merged
segment, before the merge commits.
Holds all the configuration that is used to create an IndexWriter.
Specifies the open mode for IndexWriter.
A TokenFilter that applies IndicNormalizer to normalize text
in Indian Languages.
Factory for IndicNormalizationFilter.
Normalizes the Unicode representation of text in Indian languages.
Analyzer for Indonesian (Bahasa)
Atomically loads the DEFAULT_STOP_SET in a lazy fashion once the outer class
accesses the static final set the first time.
A TokenFilter that applies IndonesianStemmer to stem Indonesian words.
Factory for IndonesianStemFilter.
Stemmer for Indonesian.
An indexed 128-bit InetAddress field.
An indexed InetAddress Range Field
Attribute for Kuromoji inflection data.
Attribute for Kuromoji inflection data.
Debugging API for Lucene classes such as IndexWriter
and SegmentInfos.
A BytesRefSorter that keeps all the entries in memory.
Sorter implementation based on the merge-sort algorithm that merges
in place (no extra memory will be allocated).
Interface for enumerating term,weight,payload triples for suggester consumption;
currently only AnalyzingSuggester, FuzzySuggester and AnalyzingInfixSuggester support payloads.
Wraps a BytesRefIterator as a suggester InputIterator, with all weights
set to 1 and carries no payload
A DataInput wrapping a plain InputStream.
A pool for int blocks similar to ByteBlockPool
Abstract class for allocating and freeing int
blocks.
A simple IntBlockPool.Allocator that never recycles.
An IntBlockPool.SliceReader that can read int slices written by an IntBlockPool.SliceWriter
An IntBlockPool.SliceWriter that allows writing multiple integer slices into a given IntBlockPool.
Comparator based on Integer.compare(int, int) for numHits.
Abstract FunctionValues implementation which supports retrieving int values.
Encode a character array Integer as a BytesRef.
The "intersect" TermsEnum response to UniformSplitTerms.intersect(CompiledAutomaton, BytesRef),
intersecting the terms with an automaton.
Block iteration order.
This is used to implement efficient
Terms.intersect(org.apache.lucene.util.automaton.CompiledAutomaton, org.apache.lucene.util.BytesRef) for
block-tree.
Wraps an IntervalIterator and passes through those intervals that match the IntervalFilter.accept() function
A DocIdSetIterator that also allows iteration over matching
intervals in a document.
An extension of MatchesIterator that allows it to be treated as
an IntervalIterator
This is necessary to get access to IntervalIterator.gaps()
and IntervalIterator.width() when constructing matches
A query that retrieves documents containing intervals returned from an
IntervalsSource
Static constructor functions for various different sources can be found in the
Intervals class
Scores for this query are computed as a function of the sloppy frequency of
intervals appearing in a particular document.
Constructor functions for IntervalsSource types
These sources implement minimum-interval algorithms taken from the paper
Efficient Optimally Lazy Algorithms for Minimal-Interval Semantics
By default, sources that are sensitive to internal gaps (e.g.
A helper class for IntervalQuery that provides an IntervalIterator
for a given field and segment
Static constructor functions for various different sources can be found in the
Intervals class
Obtains int field values from
LeafReader.getNumericDocValues(java.lang.String) and makes those
values available as other numeric types, casting as needed.
An indexed int field for fast range filters.
Builder for multi range queries for IntPoints
An indexed Integer Range field.
DocValues field for IntRange.
Implementation of the quick select algorithm.
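For readers unfamiliar with the technique named above, quick select finds the k-th smallest element by repeatedly partitioning around a pivot and recursing into only the side that contains position k. The following is a minimal, self-contained sketch over an `int[]`; the class and method names are hypothetical and it is not the listed class's API, which may differ in pivot choice and in how elements are accessed.

```java
// Hypothetical illustration class; not the listed class's API.
final class QuickSelectSketch {
    /** Rearranges a so that a[k] holds the k-th smallest value (0-based) and returns it. */
    static int select(int[] a, int k) {
        int lo = 0;
        int hi = a.length - 1;
        while (lo < hi) {
            int pivot = a[lo + (hi - lo) / 2];
            int i = lo;
            int j = hi;
            // Hoare-style partition: afterwards a[lo..j] <= pivot <= a[i..hi].
            while (i <= j) {
                while (a[i] < pivot) i++;
                while (a[j] > pivot) j--;
                if (i <= j) {
                    int tmp = a[i];
                    a[i] = a[j];
                    a[j] = tmp;
                    i++;
                    j--;
                }
            }
            if (k <= j) {
                hi = j;       // k lies in the left part
            } else if (k >= i) {
                lo = i;       // k lies in the right part
            } else {
                return a[k];  // k sits between the parts, already in place
            }
        }
        return a[k];
    }

    public static void main(String[] args) {
        System.out.println(select(new int[] {5, 2, 9, 1, 7}, 2)); // median of five values
    }
}
```

Unlike a full sort, this does expected linear work, at the cost of leaving the rest of the array only partially ordered.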
An FST Outputs implementation where each output
is a sequence of ints.
Represents int[], as a slice (offset + length) into an
existing int[].
A builder for IntsRef instances.
Enumerates all input (IntsRef) + output pairs in an
FST.
Holds a single input (IntsRef) + output pair.
Exception thrown if TokenStream Tokens are incompatible with provided text
IOContext holds additional details on the merge/search context.
Context is an enumerator which specifies the context in which the Directory
is being used.
This is a result supplier that is allowed to throw an IOException.
This class emulates the new Java 7 "Try-With-Resources" statement.
An IO operation with a single input.
A Function that may throw an IOException
Analyzer for Irish.
Atomically loads the DEFAULT_STOP_SET in a lazy fashion once the outer class
accesses the static final set the first time.
Normalises token text to lower case, handling t-prothesis
and n-eclipsis (i.e., that 'nAthair' should become 'n-athair')
Factory for IrishLowerCaseFilter.
This class was automatically generated by a Snowball to Java compiler
It implements the stemming algorithm defined by a snowball script.
Analyzer for Italian.
Atomically loads the DEFAULT_STOP_SET in a lazy fashion once the outer class
accesses the static final set the first time.
A TokenFilter that applies ItalianLightStemmer to stem Italian
words.
Factory for ItalianLightStemFilter.
Light Stemmer for Italian.
This class was automatically generated by a Snowball to Java compiler
It implements the stemming algorithm defined by a snowball script.
Analyzer for Japanese that uses morphological analysis.
Atomically loads DEFAULT_STOP_SET, DEFAULT_STOP_TAGS in a lazy fashion once the
outer class accesses the static final set the first time.
Replaces term text with the BaseFormAttribute.
Factory for JapaneseBaseFormFilter.
Normalizes Japanese horizontal iteration marks (odoriji) to their expanded form.
Factory for JapaneseIterationMarkCharFilter.
A TokenFilter that normalizes common katakana spelling variations
ending in a long sound character by removing this character (U+30FC).
Factory for JapaneseKatakanaStemFilter.
A TokenFilter that normalizes Japanese numbers (kansūji) to regular Arabic
decimal numbers in half-width characters.
Buffer that holds a Japanese number string and a position index used as a parsed-to marker
Factory for JapaneseNumberFilter.
Removes tokens that match a set of part-of-speech tags.
Factory for JapanesePartOfSpeechStopFilter.
A TokenFilter that replaces the term
attribute with the reading of a token in either katakana or romaji form.
Factory for JapaneseReadingFormFilter.
Tokenizer for Japanese that uses morphological analysis.
Tokenization mode: this determines how the tokenizer handles
compound and unknown words.
Token type reflecting the original source of this token
Factory for JapaneseTokenizer.
Similarity measure for short strings such as person names.
Deprecated.
Migrate to one of the newer suggesters which are much more RAM efficient.
Deprecated.
Migrate to one of the newer suggesters which are much more RAM efficient.
An inner class of Ternary Search Trie that represents a node in the trie.
This class provides an empty implementation of JavascriptVisitor,
which can be extended to create a visitor which only needs to handle a subset
of the available methods.
An expression compiler for javascript expressions.
Overrides the ANTLR 4 generated JavascriptLexer to allow for proper error handling
Allows for proper error handling in the ANTLR 4 parser
This interface defines a complete generic visitor for a parse tree produced
by JavascriptParser.
Use a field value and find the Document Frequency within another field.
Utility for query time joining.
This IndexDeletionPolicy implementation
keeps only the most recent commit and immediately removes
all prior commits after a new commit is done.
A TokenFilter that only keeps tokens with text contained in the
required words.
Factory for KeepWordFilter.
"Tokenizes" the entire stream as a single token.
This attribute can be used to mark a token as a keyword.
Default implementation of KeywordAttribute.
Marks terms as keywords via the KeywordAttribute.
Factory for KeywordMarkerFilter.
This TokenFilter emits each incoming token twice, once as keyword and once as non-keyword; in other words, once with
KeywordAttribute.setKeyword(boolean) set to true and once set to false.
Factory for KeywordRepeatFilter.
Emits the entire input as a single token.
Factory for KeywordTokenizer.
A k-Nearest Neighbor classifier based on NearestFuzzyQuery.
A k-Nearest Neighbor classifier (see
http://en.wikipedia.org/wiki/K-nearest_neighbors) based
on MoreLikeThis
A k-Nearest Neighbor Document classifier (see
http://en.wikipedia.org/wiki/K-nearest_neighbors) based
on MoreLikeThis.
Analyzer for Korean that uses morphological analysis.
A TokenFilter that normalizes Korean numbers to regular Arabic
decimal numbers in half-width characters.
Buffer that holds a Korean number string and a position index used as a parsed-to marker
Factory for KoreanNumberFilter.
Removes tokens that match a set of part-of-speech tags.
Factory for KoreanPartOfSpeechStopFilter.
Replaces term text with the ReadingAttribute which is
the Hangul transcription of Hanja characters.
Factory for KoreanReadingFormFilter.
Tokenizer for Korean that uses morphological analysis.
Decompound mode: this determines how the tokenizer handles
POS.Type.COMPOUND, POS.Type.INFLECT and POS.Type.PREANALYSIS tokens.
Token type reflecting the original source of this token
Factory for KoreanTokenizer.
This class was automatically generated by a Snowball to Java compiler
It implements the stemming algorithm defined by a snowball script.
A list of words used by Kstem
A list of words used by Kstem
A list of words used by Kstem
A list of words used by Kstem
A list of words used by Kstem
A list of words used by Kstem
A list of words used by Kstem
A list of words used by Kstem
A high-performance kstem filter for English.
Factory for KStemFilter.
This class implements the Kstem algorithm
Associates a label with a CharArrayMatcher to distinguish different sources for terms in highlighting
The lambda (λw) parameter in information-based
models.
Computes lambda as (docFreq+1) / (numberOfDocuments+1).
Computes lambda as (totalTermFreq+1) / (numberOfDocuments+1).
Optimized collector for large number of hits.
An indexed 2-Dimension Bounding Box field for the Geospatial Lat/Lon Coordinate system
Distance query for LatLonDocValuesField.
A per-document location field.
Finds all previously indexed geo points that comply with the given
ShapeField.QueryRelation
with the specified array of LatLonGeometry.
Lat/Lon Geometry object.
An indexed location field.
Compares documents by distance from an origin point
Distance query for LatLonPoint.
Holder class for prototype sandboxed queries
When the query graduates from sandbox, these static calls should be
placed in LatLonPoint
Finds all previously indexed geo points that comply with the given
ShapeField.QueryRelation with the
specified array of LatLonGeometry.
Sorts by distance from an origin location.
A geo shape utility class for indexing and searching gis geometries
whose vertices are latitude, longitude values (in decimal degrees).
Finds all previously indexed geo shapes that intersect the specified bounding box.
Holds spatial logic for a bounding box that works in the encoded space
Finds all previously indexed geo shapes that comply with the given
ShapeField.QueryRelation with the
specified array of LatLonGeometry.
Analyzer for Latvian.
Atomically loads the DEFAULT_STOP_SET in a lazy fashion once the outer class
accesses the static final set the first time.
A TokenFilter that applies LatvianStemmer to stem Latvian
words.
Factory for LatvianStemFilter.
Light stemmer for Latvian.
Defers actually loading a field's value until you ask
for it.
Collector decouples the score from the collected doc:
the score computation is skipped entirely if it's not
needed.
Expert: comparator that gets instantiated on each leaf
from a top-level FieldComparator instance.
Provides read-only metadata about a leaf.
LeafReader is an abstract class, providing an interface for accessing an
index.
IndexReaderContext for LeafReader instances.
Similarity.SimScorer on a specific LeafReader.
Deprecated.
Use BinaryDocValues instead.
Deprecated.
Implement BinaryDocValues directly.
Deprecated.
BM25Similarity should be used instead
Bridge helper methods for legacy codecs to map sorted doc values to iterables.
Random-access reader for FieldsIndexWriter.
Deprecated.
Use NumericDocValues instead.
Deprecated.
Implement NumericDocValues directly.
Deprecated.
Use SortedDocValues instead.
Deprecated.
Implement SortedDocValues directly.
Deprecated.
Use SortedNumericDocValues instead.
Deprecated.
Implement SortedNumericDocValues directly.
Deprecated.
Use SortedSetDocValues instead.
Deprecated.
Implement SortedSetDocValues directly.
Removes words that are too long or too short from the stream.
Factory for LengthFilter.
Wraps another BreakIterator to skip past breaks that would result in passages that are too
short.
A LetterTokenizer is a tokenizer that divides text at non-letters.
Factory for LetterTokenizer.
Parametric description for generating a Levenshtein automaton of degree 1
Parametric description for generating a Levenshtein automaton of degree 1,
with transpositions as primitive edits
Parametric description for generating a Levenshtein automaton of degree 2
Parametric description for generating a Levenshtein automaton of degree 2,
with transpositions as primitive edits
Class to construct DFAs that match a word within some edit distance.
A ParametricDescription describes the structure of a Levenshtein DFA for some degree n.
Levenshtein edit distance class.
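As background for the entry above, the Levenshtein edit distance between two strings is the minimum number of single-character insertions, deletions and substitutions needed to turn one into the other. The sketch below shows the textbook two-row dynamic program; the class and method names are hypothetical, and it returns the raw edit count, which is not necessarily what the listed class returns.

```java
// Hypothetical illustration class; not the listed class's API.
final class EditDistanceSketch {
    /** Returns the number of single-character edits needed to turn s into t. */
    static int distance(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] cur = new int[t.length() + 1];
        // Turning the empty prefix of s into t[0..j) takes j insertions.
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= s.length(); i++) {
            cur[0] = i; // deleting the first i characters of s
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(cur[j - 1] + 1, prev[j] + 1);
                cur[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[t.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting"));
    }
}
```

The Levenshtein automata listed nearby answer a related question (does a word lie within distance n of a fixed term?) without computing this table for every candidate.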
The Lift class is a data structure that is a variation of a Patricia trie.
Builder for MoreLikeThisQuery
FiniteStringsIterator which limits the number of iterated accepted strings.
This Analyzer limits the number of tokens while indexing.
This TokenFilter limits the number of tokens while indexing.
Factory for LimitTokenCountFilter.
Lets all tokens pass through until it sees one with a start offset <= a
configured limit, which won't pass and ends the stream.
This is a simplified version of org.apache.lucene.analysis.miscellaneous.LimitTokenOffsetFilter to prevent
a dependency on analyzers-common.jar.
Factory for LimitTokenOffsetFilter.
This TokenFilter limits its emitted tokens to those with positions that
are not greater than the configured limit.
Factory for LimitTokenPositionFilter.
Represents a line on the earth's surface.
2D geo line implementation represented as a balanced interval tree of edges.
LinearFloatFunction implements a linear function over
another ValueSource.
Wraps another Outputs implementation and encodes one or
more of its output values.
Pass the field value through as a String, no matter the type // Q: doesn't this mean it's a "string"?
Analyzer for Lithuanian.
Atomically loads the DEFAULT_STOP_SET in a lazy fashion once the outer class
accesses the static final set the first time.
This class was automatically generated by a Snowball to Java compiler
It implements the stemming algorithm defined by a snowball script.
Format for live/deleted documents
Tracks live field values across NRT reader reopens.
Holds all the configuration used by IndexWriter with few setters for
settings that can be changed on an IndexWriter instance "live".
Bayesian smoothing using Dirichlet priors.
Language model based on the Jelinek-Mercer smoothing method.
Abstract superclass for language modeling Similarities.
A strategy for computing the collection language model.
Models p(w|C) as the number of occurrences of the term in the
collection, divided by the total number of tokens + 1.
Stores the collection distribution of the current term.
An interprocess mutex lock.
Base class for Locking implementation.
This exception is thrown when the write.lock
could not be acquired.
This exception is thrown when the write.lock
could not be released.
Simple standalone tool that forever acquires and releases a
lock using a specific LockFactory.
This class makes a best-effort check that a provided Lock
is valid before any destructive filesystem operation.
Simple standalone server that must be running when you
use VerifyingLockFactory.
This is a LogMergePolicy that measures size of a
segment as the total byte size of the segment's files.
This is a LogMergePolicy that measures size of a
segment as the number of documents (not taking deletions
into account).
This class implements a MergePolicy that tries
to merge segments into levels of exponentially
increasing size, where each level has fewer segments than
the value of the merge factor.
BitSet of fixed length (numBits), backed by accessible (LongBitSet.getBits())
long[], accessed with a long index.
Comparator based on Long.compare(long, long) for numHits.
Abstract FunctionValues implementation which supports retrieving long values.
Obtains long field values from
LeafReader.getNumericDocValues(java.lang.String) and makes those
values available as other numeric types, casting as needed.
An indexed long field for fast range filters.
Builder for multi range queries for LongPoints
An indexed Long Range field.
Represents a contiguous range of long values, with an inclusive minimum and
exclusive maximum
DocValues field for LongRange.
Groups long values into ranges
A GroupSelector implementation that groups documents by long values
Represents long[], as a slice (offset + length) into an
existing long[].
Per-segment, per-document long values, which can be calculated at search-time
Abstraction over an array of longs.
Base class for producing
LongValues
To obtain a LongValues object for a leaf reader, clients should
call LongValuesSource.rewrite(IndexSearcher) against the top-level searcher, and
then LongValuesSource.getValues(LeafReaderContext, DoubleValues).
Simple Lookup interface for
CharSequence suggestions.
A
PriorityQueue collecting a fixed size of high priority Lookup.LookupResult
Result of a lookup.
This class was automatically generated by a Snowball to Java compiler
It implements the stemming algorithm defined by a snowball script.
Utility class that can efficiently compress arrays that mostly contain
characters in the [0x1F,0x3F) or [0x5F,0x7F) ranges, which notably
include all digits, lowercase characters, '.', '-' and '_'.
Normalizes token text to lower case.
Normalizes token text to lower case.
Factory for
LowerCaseFilter.
A
QueryCache that evicts queries using a LRU (least-recently-used)
eviction policy in order to remain under a given maximum size and number of
bytes used.
A LSB Radix sorter for unsigned int values.
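To make the idea of a least-significant-byte (LSB) radix sort over unsigned int values concrete, here is a minimal sketch. It assumes a fixed radix of 256 and four counting passes; the class name is hypothetical and the code is not taken from the library.

```java
import java.util.Arrays;

// Hypothetical sketch, not Lucene's implementation: LSB radix sort on unsigned ints.
final class LsbRadixSortSketch {

    // Sorts the array in place, treating each int as an unsigned 32-bit value.
    static void sortUnsigned(int[] values) {
        int[] src = values;
        int[] dst = new int[values.length];
        // One stable counting pass per byte, least significant byte first.
        for (int shift = 0; shift < 32; shift += 8) {
            int[] counts = new int[257];
            for (int v : src) {
                counts[((v >>> shift) & 0xFF) + 1]++;
            }
            // Prefix sums turn the histogram into start offsets per bucket.
            for (int i = 0; i < 256; i++) {
                counts[i + 1] += counts[i];
            }
            for (int v : src) {
                dst[counts[(v >>> shift) & 0xFF]++] = v;
            }
            int[] tmp = src;
            src = dst;
            dst = tmp;
        }
        // Four passes mean an even number of swaps, so the result is back in 'values'.
    }

    public static void main(String[] args) {
        int[] data = {3, -1, 0, 256, 2};
        sortUnsigned(data);
        System.out.println(Arrays.toString(data));
    }
}
```

Because the passes are stable and use unsigned shifts, negative ints (which have the high bit set) sort after all non-negative ones.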
Lucene 5.0 compound file format
Class for accessing a compound stream.
Offset/Length for a slice inside of a compound file
Lucene 5.0 Field Infos format.
Lucene 5.0 live docs format
Lucene 5.0 postings format, which encodes postings in packed integer blocks
for fast decode.
Holds all state required for
Lucene50PostingsReader to produce a
PostingsEnum without re-seeking the terms dict.
Concrete class that reads docId(maybe frq,pos,offset,payloads) list
with postings format.
Implements the skip list reader for block postings format
that stores positions and payloads.
Lucene 5.0 stored fields format.
Configuration option for stored fields.
Lucene 5.0
term vectors format.
Lucene 6.0 Field Infos format.
Lucene 6.0 point format, which encodes dimensional values in a block KD-tree structure
for fast 1D range and N dimensional shape intersection filtering.
Reads point values previously written with Lucene60PointsWriter
Implements the Lucene 7.0 index format, with configurable per-field postings
and docvalues formats.
writer for
Lucene70DocValuesFormat
Lucene 7.0 DocValues format.
reader for
Lucene70DocValuesFormat
Writer for
Lucene70NormsFormat
Lucene 7.0 Score normalization format.
Reader for
Lucene70NormsFormat
Lucene 7.0 Segment info format.
Implements the Lucene 8.0 index format.
writer for
Lucene80DocValuesFormat
Lucene 8.0 DocValues format.
Configuration option for doc values.
reader for
Lucene80DocValuesFormat
Writer for
Lucene80NormsFormat
Lucene 8.0 Score normalization format.
Reader for
Lucene80NormsFormat
Implements the Lucene 8.4 index format, with configurable per-field postings
and docvalues formats.
Lucene 5.0 postings format, which encodes postings in packed integer blocks
for fast decode.
Holds all state required for
Lucene84PostingsReader to produce a
PostingsEnum without re-seeking the terms dict.
Concrete class that reads docId(maybe frq,pos,offset,payloads) list
with postings format.
Concrete class that writes docId(maybe frq,pos,offset,payloads) list
with postings format.
Implements the skip list reader for block postings format
that stores positions and payloads.
Write skip lists with multiple levels, and support skip within block ints.
Implements the Lucene 8.6 index format, with configurable per-field postings
and docvalues formats.
Lucene 8.6 point format, which encodes dimensional values in a block KD-tree structure
for fast 1D range and N dimensional shape intersection filtering.
Reads point values previously written with
Lucene86PointsWriter
Writes dimensional values
Lucene 8.6 Segment info format.
Implements the Lucene 8.6 index format, with configurable per-field postings
and docvalues formats.
Configuration option for the codec.
Lucene 8.7 stored fields format.
Configuration option for stored fields.
Lucene Dictionary: terms taken from the given field
of a Lucene index.
Damerau-Levenshtein (optimal string alignment) implemented in a consistent
way as Lucene's FuzzyTermsEnum with the transpositions option enabled.
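For readers unfamiliar with the optimal string alignment variant of Damerau-Levenshtein, the sketch below shows the standard dynamic-programming recurrence. It assumes plain char comparison and returns a raw edit count; the class name is hypothetical, and the library class may differ in both respects.

```java
// Hypothetical sketch of optimal string alignment (OSA) distance, not Lucene's code.
final class OsaDistanceSketch {

    // Counts insertions, deletions, substitutions and adjacent transpositions,
    // where no substring is edited more than once (the OSA restriction).
    static int distance(String s, String t) {
        int n = s.length();
        int m = t.length();
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) {
            d[i][0] = i; // delete all of s[0..i)
        }
        for (int j = 0; j <= m; j++) {
            d[0][j] = j; // insert all of t[0..j)
        }
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                int best = Math.min(
                    Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                    d[i - 1][j - 1] + cost);
                // Adjacent transposition: "ab" -> "ba" costs one edit.
                if (i > 1 && j > 1
                        && s.charAt(i - 1) == t.charAt(j - 2)
                        && s.charAt(i - 2) == t.charAt(j - 1)) {
                    best = Math.min(best, d[i - 2][j - 2] + 1);
                }
                d[i][j] = best;
            }
        }
        return d[n][m];
    }

    public static void main(String[] args) {
        System.out.println(distance("ab", "ba"));
        System.out.println(distance("kitten", "sitting"));
    }
}
```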
Lucene's package information, including version.
LZ4 compression and decompression routines.
Simple lossy
LZ4.HashTable that only stores the last occurrence for
each hash on 2^14 bytes of memory.
A record of previous occurrences of sequences of 4 bytes.
A higher-precision
LZ4.HashTable.
A compression mode that compromises on the compression ratio to provide
fast compression and decompression.
Helper class for keeping Lists of Objects associated with keys.
A
Fields implementation that merges multiple
Fields into one, and maps around deleted documents.
Simplistic
CharFilter that applies the mappings
contained in a NormalizeCharMap to the character
stream, and correcting the resulting changes to the
offsets.
Factory for
MappingCharFilter.
Exposes flex API, merged from flex API of sub-segments,
remapping docIDs (this is used for segment merging).
A query that matches all documents.
Builder for
MatchAllDocsQuery
A
MatchAllDocsQueryNode indicates that a query node tree or subtree
will match all documents if executed in the index.
Builds a
MatchAllDocsQuery object from a
MatchAllDocsQueryNode object.
This processor converts every
WildcardQueryNode that is "*:*" to
MatchAllDocsQueryNode.
Interface for the creation of new CandidateMatcher objects
Reports the positions and optionally offsets of all matching terms in a query
for a single document
To obtain a
MatchesIterator for a particular field, call Matches.getMatches(String).
An iterator over match positions (and optionally offsets) for a single document and field
To iterate over the matches, call
MatchesIterator.next() until it returns false, retrieving
positions and/or offsets after each call.
Contains static functions that aid the implementation of
Matches and
MatchesIterator interfaces.
Class to hold the results of matching a single
Document
against queries held in the Monitor
Computes which segments have identical field name to number mappings,
which allows stored fields and term vectors in this codec to be bulk-merged.
A query that matches no documents.
A
MatchNoDocsQueryNode indicates that a query node tree or subtree
will not match any documents if executed in the index.
Builds a
MatchNoDocsQuery object from a
MatchNoDocsQueryNode object.
Math static utility methods.
Returns the value of
IndexReader.maxDoc()
for every document.
MaxFloatFunction returns the max of its components.
Add this
Attribute to a fresh AttributeSource before calling
MultiTermQuery.getTermsEnum(Terms,AttributeSource).
Implementation class for
MaxNonCompetitiveBoostAttribute.
Returns the maximum payload score seen, else 1 if there are no payloads on the doc.
Maintains the maximum score and its corresponding document id concurrently
Compute maximum scores based on
Impacts and keep them in a cache in
order not to run expensive similarity score computations multiple times on
the same data.
Utility class to propagate scoring information in
BooleanQuery, which
computes the score as the sum of the scores of its matching clauses.
Bitset collector which supports memory tracking
High-performance single-document main memory Apache Lucene fulltext search index.
Uses an
Analyzer on content to get offsets and then populates a MemoryIndex.
Tracks dynamic allocations/deallocations of memory for transient objects
Provides a merged sorted view from several sorted iterators.
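A merged sorted view over several sorted iterators is commonly built as a k-way merge on a priority queue. The sketch below shows that general technique under the assumption that duplicates are kept and the result is materialized into a list; the class and its members are hypothetical and not the library's implementation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of a k-way merge over sorted iterators, not Lucene's code.
final class MergedIteratorSketch {

    // Pairs the current head element of an iterator with the rest of that iterator.
    private static final class Head<T> {
        final T value;
        final Iterator<T> rest;

        Head(T value, Iterator<T> rest) {
            this.value = value;
            this.rest = rest;
        }
    }

    // Merges already-sorted inputs into one sorted list, keeping duplicates.
    static <T extends Comparable<T>> List<T> merge(List<Iterator<T>> sortedInputs) {
        PriorityQueue<Head<T>> queue =
            new PriorityQueue<>((a, b) -> a.value.compareTo(b.value));
        for (Iterator<T> input : sortedInputs) {
            if (input.hasNext()) {
                queue.add(new Head<>(input.next(), input));
            }
        }
        List<T> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            Head<T> smallest = queue.poll();
            merged.add(smallest.value);
            // Refill the queue from the iterator that just yielded the smallest value.
            if (smallest.rest.hasNext()) {
                queue.add(new Head<>(smallest.rest.next(), smallest.rest));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Iterator<Integer>> inputs = new ArrayList<>();
        inputs.add(List.of(1, 4, 7).iterator());
        inputs.add(List.of(2, 4).iterator());
        System.out.println(merge(inputs));
    }
}
```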
A MergeInfo provides information required for a MERGE context.
Expert: a MergePolicy determines the sequence of
primitive merge operations.
Thrown when a merge was explicitly aborted because
IndexWriter.abortMerges() was called.
This interface represents the current context of the merge selection process.
Exception thrown if there are any problems while executing a merge.
A MergeSpecification instance provides the information
necessary to perform multiple merges.
OneMerge provides the information necessary to perform
an individual primitive merge operation, resulting in
a single new segment.
Progress and state for an executing merge.
Reason for pausing the merge thread.
This is the
RateLimiter that IndexWriter assigns to each running merge, to
give MergeSchedulers ionice like control.
This is a hack to make index sorting fast, with a
LeafReader that always returns merge instances when you ask for the codec readers.
Expert:
IndexWriter uses an instance
implementing this interface to execute the merges
selected by a MergePolicy.
Provides access to new merges and executes the actual merge
Holds common state used during segment merging.
A map of doc IDs.
MergeTrigger is passed to
MergePolicy.findMerges(MergeTrigger, SegmentInfos, MergePolicy.MergeContext) to indicate the
event that triggered the merge.
Message Interface for a lazy loading.
Default implementation of Message interface.
Docs iterator that starts iterating from a configurable minimum document
MinFloatFunction returns the min of its components.
Generate min hash tokens from an incoming stream of tokens.
128 bits of state
Operations for minimizing automata.
Calculates the minimum payload seen
File-based
Directory implementation that uses
mmap for reading, and FSDirectory.FSIndexOutput for writing.
A
ModifierQueryNode indicates the modifier value (+,-,?,NONE) for
each term on the query string.
Modifier type: such as required (REQ), prohibited (NOT)
Builds no object, it only returns the
Query object set on the
ModifierQueryNode object using a
QueryTreeBuilder.QUERY_TREE_BUILDER_TAGID tag.
Statistics for the query cache and query index
Encapsulates various configuration settings for a Monitor's query index
Defines a query to be stored in a Monitor
Serializes and deserializes MonitorQuery objects into byte streams
Use this for persistent query indexes
For reporting events on a Monitor's query index
Provides random access to a stream written with
MonotonicBlockPackedWriter.
A writer for large monotonically increasing sequences of positive longs.
Generate "more like this" similarity queries.
PriorityQueue that orders words by score.
Use for frequencies and to avoid renewing Integers.
A simple wrapper for MoreLikeThis for use in scenarios where a Query object is required, e.g.
in custom QueryParser extensions.
Radix sorter for variable-length strings.
Concatenates multiple Bits together, on every lookup.
Abstract
ValueSource implementation which wraps multiple ValueSources
and applies an extendible boolean function to their values.
A
CollectorManager implementation which wraps a set of CollectorManagers,
as MultiCollector does for Collectors.
A wrapper for CompositeIndexReader providing access to DocValues.
Implements SortedDocValues over n subs, using an OrdinalMap
Implements MultiSortedSetDocValues over n subs, using an OrdinalMap
This processor is used to expand terms so the query looks for the same term
in different fields.
A QueryParser which constructs queries to search multiple fields.
Provides a single
Fields term index view over an
IndexReader.
Abstract
ValueSource implementation which wraps multiple ValueSources
and applies an extendible float function to their values.
Abstract parent class for
ValueSource implementations that wrap multiple
ValueSources and apply their own logic.
Utility methods for working with an
IndexReader as if it were a LeafReader.
This abstract class reads skip lists with multiple levels.
used to buffer the top skip levels
This abstract class writes skip lists with multiple levels.
Class to hold the results of matching a batch of
Documents
against queries held in the Monitor
Copy of
LeafSimScorer that sums document's norms from multiple fields.
This tool splits input index into multiple equal parts.
This class emulates deletions on the underlying index.
A TermFilteredPresearcher that indexes queries multiple times, with terms collected
from different routes through a querytree.
A generalized version of
PhraseQuery, with the possibility of
adding more than one term at the same position that are treated as a disjunction (OR).
A builder for multi-phrase queries
Takes the logical union of multiple PostingsEnum iterators.
disjunction of postings ordered by docid.
queue of terms for a single document.
A
MultiPhraseQueryNode indicates that its children should be used to
build a MultiPhraseQuery instead of PhraseQuery.
Builds a
MultiPhraseQuery object from a MultiPhraseQueryNode
object.
Exposes
PostingsEnum, merged from PostingsEnum
API of sub-segments.
Holds a
PostingsEnum along with the
corresponding ReaderSlice.
Abstract class for range queries involving multiple ranges against physical points such as
IntPoints
All ranges are logically ORed together
TODO: Add capability for handling overlapping ranges at rewrite time
A builder for multirange queries.
Representation of a single clause in a MultiRangeQuery
A
CompositeReader which reads multiple indexes, appending
their content.
A
Multiset is a set that allows for duplicate elements.
Implements the CombSUM method for combining evidence from multiple
similarity values described in: Joseph A.
Support for highlighting multi-term queries.
An abstract
Query that matches documents
containing a subset of terms provided by a FilteredTermsEnum enumeration.
Abstract class that defines how the query is rewritten.
A rewrite method that first translates each term into
BooleanClause.Occur.SHOULD clause in a BooleanQuery, but adjusts
the frequencies used for scoring to be blended across the terms, otherwise
the rarest term typically ranks highest (often not useful, e.g. in the set of
expanded terms in a FuzzyQuery).
A rewrite method that first translates each term into
BooleanClause.Occur.SHOULD clause in a BooleanQuery, but the scores
are only computed as the boost.
A rewrite method that first translates each term into
BooleanClause.Occur.SHOULD clause in a BooleanQuery, and keeps the
scores as computed by the query.
This class also provides the functionality behind
MultiTermQuery.CONSTANT_SCORE_REWRITE.
This processor instates the default
MultiTermQuery.RewriteMethod,
MultiTermQuery.CONSTANT_SCORE_REWRITE, for multi-term
query nodes.
Exposes flex API, merged from flex API of
sub-segments.
The MultiTrie is a Trie of Tries.
The MultiTrie is a Trie of Tries.
Obtains double field values from
LeafReader.getSortedNumericDocValues(java.lang.String) and using a
SortedNumericSelector it gives a single-valued ValueSource view of a field.
Obtains float field values from
LeafReader.getSortedNumericDocValues(java.lang.String) and using a
SortedNumericSelector it gives a single-valued ValueSource view of a field.
Obtains int field values from
LeafReader.getSortedNumericDocValues(java.lang.String) and using a
SortedNumericSelector it gives a single-valued ValueSource view of a field.
Obtains long field values from
LeafReader.getSortedNumericDocValues(java.lang.String) and using a
SortedNumericSelector it gives a single-valued ValueSource view of a field.
A
ValueSource that abstractly represents ValueSources for
poly fields, and other things.
This is a very fast, non-cryptographic hash suitable for general hash-based
lookup.
Utility APIs for sorting and partitioning buffered points.
PointValues whose order of points can be changed.
Base class for all mutable values.
MutableValue implementation of type boolean.
MutableValue implementation of type Date.
MutableValue implementation of type double.
MutableValue implementation of type float.
MutableValue implementation of type int.
MutableValue implementation of type long.
MutableValue implementation of type String.
Utility class to help extract the set of sub queries that have matched from
a larger query.
Helper class for loading named SPIs from classpath (e.g.
Interface to support
NamedSPILoader.lookup(String) by name.
A default
ThreadFactory implementation that accepts the name prefix
of the created threads as a constructor argument.
Implements
LockFactory using native OS file
locks.
Provides JNI access to native methods such as madvise() for
NativeUnixDirectory
A
Directory implementation for all Unixes that uses
DIRECT I/O to bypass OS level IO caching during
merging.
Simplification of FuzzyLikeThisQuery, to be used in the context of KNN classification.
KNN search on top of 2D lat/lon indexed points.
A Spans that is formed from the ordered subspans of a SpanNearQuery
where the subspans do not overlap and have a maximum slop between them.
Similar to
NearSpansOrdered, but for the unordered case.
N-Gram version of edit distance based on paper by Grzegorz Kondrak,
"N-gram similarity and distance".
Factory for
NGramTokenFilter.
This is a
PhraseQuery which is optimized for n-gram phrase query.
Tokenizes the input into n-grams of the given size(s).
Tokenizes the input into n-grams of the given size(s).
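As a small illustration of splitting input into n-grams of a given size range, the sketch below emits every substring whose length lies between a minimum and maximum gram size, ordered by start position and then by length. The class, method, and parameter names are assumptions made for this example, and the emission order of the actual tokenizers may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of character n-gram generation, not Lucene's tokenizer.
final class NGramSketch {

    // Returns all n-grams with minGram <= length <= maxGram,
    // ordered by start offset, then by increasing length.
    static List<String> ngrams(String text, int minGram, int maxGram) {
        if (minGram < 1 || minGram > maxGram) {
            throw new IllegalArgumentException("require 1 <= minGram <= maxGram");
        }
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            for (int size = minGram;
                 size <= maxGram && start + size <= text.length();
                 size++) {
                grams.add(text.substring(start, start + size));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("abcd", 2, 3));
    }
}
```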
Factory for
NGramTokenizer.
An
FSDirectory implementation that uses java.nio's FileChannel's
positional read, which allows multiple threads to read from the same file
without synchronizing.
Reads bytes with
FileChannel.read(ByteBuffer, long)
MessageBundles classes extend this class, to implement a bundle.
Interface that exceptions should implement to support lazy loading of messages.
A
NoChildOptimizationQueryNodeProcessor removes every
BooleanQueryNode, BoostQueryNode, TokenizedPhraseQueryNode or
ModifierQueryNode that does not have valid children.
An
IndexDeletionPolicy which keeps all index commits around, never
deleting them.
Use this
LockFactory to disable locking entirely.
A source returning no matches
A
MergePolicy which never returns merges to execute.
A
MergeScheduler which never executes any merges.
Never returns offsets.
A null FST
Outputs implementation; use this if
you just want to build an FSA.
This class acts as the base class for the implementations of the term
frequency normalization methods in the DFR framework.
Implementation used when there is no normalization.
Normalization model that assumes a uniform distribution of the term frequency.
Normalization model in which the term frequency is inversely related to the
length.
Dirichlet Priors normalization
Pareto-Zipf Normalization
Holds a map of String input to String output, to be used
with
MappingCharFilter.
Builds a NormalizeCharMap.
Abstract API that consumes normalization values.
Tracks state of one numeric sub-reader that we are merging
A
Query that matches documents that have a value for a given field
as reported by field norms.
Encodes/decodes per-document score normalization values.
Abstract API that produces field normalization values
Function that returns the decoded norm for every document.
Buffers up pending long per doc, then flushes when
segment flushes.
Analyzer for Norwegian.
Atomically loads the DEFAULT_STOP_SET in a lazy fashion once the outer class
accesses the static final set the first time.
A
TokenFilter that applies NorwegianLightStemmer to stem Norwegian
words.
Factory for
NorwegianLightStemFilter.
Light Stemmer for Norwegian.
A
TokenFilter that applies NorwegianMinimalStemmer to stem Norwegian
words.
Factory for
NorwegianMinimalStemFilter.
Minimal Stemmer for Norwegian Bokmål (no-nb) and Nynorsk (no-nn)
This class was automatically generated by a Snowball to Java compiler
It implements the stemming algorithm defined by a snowball script.
A
NoTokenFoundQueryNode is used if a term is converted into no tokens
by the tokenizer/lemmatizer/analyzer (null).
Factory for prohibited clauses
Wraps a
RAMDirectory
around any provided delegate directory, to
be used during NRT search.
NRTSuggester executes Top N search on a weighted FST specified by a
CompletionScorer
Helper to encode/decode payload (surface + PAYLOAD_SEP + docID) output
Compares partial completion paths using
CompletionScorer.score(float, float),
breaks ties comparing path inputs
Builder for
NRTSuggester
Fragmenter implementation which does not fragment the text.
Returns the value of
IndexReader.numDocs()
for every document.
Abstract numeric comparator for comparing numeric values.
A per-document numeric value.
Field that stores a per-document
long value for scoring,
sorting or value retrieval.
A
DocValuesFieldUpdates which holds updates of documents, of a single
NumericDocValuesField.
Buffers up pending long per doc, then flushes when
segment flushes.
Assigns a payload to a token based on the
TypeAttribute
Factory for
NumericPayloadTokenFilter.
Helper APIs to encode numeric values as sortable bytes and vice-versa.
Provides off heap storage of finite state machine (FST),
using underlying index input instead of byte store on heap
Reads points from disk in a fixed-width format, previously written with
OfflinePointWriter.
Reusable implementation for a point value offline
Writes points to disk in a fixed-width format.
On-disk sorting of byte arrays.
A bit more descriptive unit for constructors.
Utility class to read length-prefixed byte[] entries from an input.
Utility class to emit length-prefixed byte[] entries to an output stream for sorting.
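The two utility entries above describe reading and writing length-prefixed byte[] entries. A minimal sketch of that framing follows, assuming a 4-byte big-endian length prefix and in-memory streams; the prefix width used by the library may differ, and the class here is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of length-prefixed framing, not Lucene's reader/writer.
final class LengthPrefixedSketch {

    // Writes each entry as a 4-byte length (assumed width) followed by its bytes.
    static byte[] writeAll(List<byte[]> entries) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (byte[] entry : entries) {
                out.writeInt(entry.length);
                out.write(entry);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads entries back until the input is exhausted.
    static List<byte[]> readAll(byte[] data) {
        List<byte[]> entries = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (in.available() > 0) {
                byte[] entry = new byte[in.readInt()];
                in.readFully(entry);
                entries.add(entry);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }

    public static void main(String[] args) {
        byte[] framed = writeAll(List.of(new byte[] {1, 2, 3}, new byte[] {9}));
        System.out.println(readAll(framed).size());
    }
}
```

The prefix lets a reader skip or buffer whole entries without scanning for a delimiter, which is what makes variable-length records practical for external sorting.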
Holds one partition of items, either loaded into memory or based on a file.
The start and end character offset of a Token.
Default implementation of
OffsetAttribute.
Tracks a reference intervals source, and produces a pseudo-interval that appears
either one position before or one position after each interval from the reference
This TokenFilter limits the number of tokens while indexing by adding up the
current offset.
An enumeration/iterator of a term and its offsets for use by
FieldHighlighter.
A view over several OffsetsEnum instances, merging them in-place
Based on a
MatchesIterator; does not look at submatches.
Based on a
MatchesIterator with submatches.
Based on a
PostingsEnum -- the typical/standard OE impl.
A wrapping merge policy that wraps the
MergePolicy.OneMerge
objects returned by the wrapped merge policy.
Provides storage of finite state machine (FST),
using byte array or byte store allocated on heap.
An
OpaqueQueryNode is used to specify values that are not supposed to
be parsed by the parser.
Processes
TermRangeQuerys with open ranges.
A StringBuilder that allows one to access the array.
Automata operations.
The Optimizer class is a Trie that will be reduced (have empty rows removed).
The Optimizer class is a Trie that will be reduced (have empty rows removed).
Maps per-segment ordinals to/from global ordinal space, using a compact packed-ints representation.
This is just like
BlockTreeTermsWriter, except it also stores a version per term, and adds a method to its TermsEnum
implementation to seekExact only if the version is >= the specified version.
BlockTree's implementation of
Terms.
Iterates through terms in this field.
Holds a single input (IntsRef) + output pair.
An ordinal based
TermState
Factory for disjunctions
An
OrQueryNode represents an OR boolean operation performed on a list
of nodes.
Represents the outputs for an FST, providing the basic
algebra required for building and traversing the FST.
A
DataOutput wrapping a plain OutputStream.
Implementation class for buffered
IndexOutput that writes to an OutputStream.
Overlays a 2nd LeafReader for the terms of one field, otherwise the primary reader is
consulted.
Packs integers into 3 shorts (48 bits per value).
Space optimized random access capable array of values with a fixed number of
bits/value.
This class is similar to
Packed64 except that it trades space for
speed by ensuring that a single block needs to be read/written in order to
read/write a value.
Packs integers into 3 bytes (24 bits per value).
A
DataInput wrapper to read unaligned, variable-length packed
integers.
A
DataOutput wrapper to write unaligned, variable-length packed
integers.
Simplistic compression for array of unsigned long values.
A decoder for packed integers.
An encoder for packed integers.
A format to write packed ints.
Simple class that holds a format and a number of bits per value.
A packed integer array that can be modified.
A
PackedInts.Reader which has all its values equal to 0 (bitsPerValue = 0).
A read-only random access array of positive integers.
A simple base for Readers that keeps track of valueCount and bitsPerValue.
Run-once iterator interface, to decode previously saved PackedInts.
A write-once Writer.
Utility class to compress integers into a
LongValues instance.
A Builder for a
PackedLongValues instance.
Default implementation of the common attributes used by Lucene:
CharTermAttribute
TypeAttribute
PositionIncrementAttribute
PositionLengthAttribute
OffsetAttribute
TermFrequencyAttribute
Represents a logical byte[] as a series of pages.
Provides methods to read BytesRefs from a frozen
PagedBytes.
A
PagedMutable.
An FST
Outputs implementation, holding two other outputs.
Holds a single pair of two outputs.
A
CompositeReader which reads multiple, parallel indexes.
A
LeafReader which reads multiple, parallel indexes.
Matcher class that runs matching queries in parallel.
A query that returns all the matching child documents for a specific parent document
indexed together in the same block.
This exception is thrown when parse errors are encountered.
This exception is thrown when parse errors are encountered.
This exception is thrown when parse errors are encountered.
Thrown when the xml queryparser encounters
invalid syntax/configuration.
This class represents an extension base class to the Lucene standard
QueryParser.
A multi-threaded matcher that collects all possible matches in one pass, and
then partitions them amongst a number of worker threads to perform the actual
matching.
Attribute for
Token.getPartOfSpeech().
Part of Speech attributes for Korean.
Attribute for
Token.getPartOfSpeech().
Part of Speech attributes for Korean.
Represents a passage (typically a sentence of the document).
Creates a formatted snippet from the top passages.
Ranks passages found by
UnifiedHighlighter.
Tokenizer for path-like hierarchies.
Factory for
PathHierarchyTokenizer.
SmartChineseAnalyzer internal node representation
A
PathQueryNode is used to store queries like
/company/USA/California /product/shoes/brown.
Term text with a beginning and end position
Factory for
PatternCaptureGroupTokenFilter.
CaptureGroup uses Java regexes to emit multiple tokens - one for each capture
group in one or more patterns.
This interface is used to connect the XML pattern file parser to the
hyphenation tree.
Marks terms as keywords via the
KeywordAttribute.
A SAX document handler to read and parse hyphenation patterns from an XML
file.
CharFilter that uses a regular expression for the target of replace string.
Factory for
PatternReplaceCharFilter.
A TokenFilter which applies a Pattern to each token in the stream,
replacing match occurrences with the specified replacement string.
Factory for
PatternReplaceFilter.
This tokenizer uses regex pattern matching to construct distinct tokens
for the input stream.
Factory for
PatternTokenizer.
The payload of a Token.
Default implementation of
PayloadAttribute.
Defines a way of converting payloads to float values, for use by
PayloadScoreQuery
Mainly for use with the DelimitedPayloadTokenFilter, converts char buffers to
BytesRef.
An abstract class that defines a way for PayloadScoreQuery instances to transform
the cumulative effects of payload scores for a document.
Utility methods for encoding payloads.
A Query class that uses a
PayloadFunction to modify the score of a wrapped SpanQuery
SpanCollector for collecting payloads
Experimental class to get set of payloads for most standard Lucene queries.
This class handles accounting and applying pending deletes for live segment readers
This analyzer is used to facilitate scenarios where different
fields require different analysis techniques.
Enables per field docvalues support.
Utility class to update the
MergeState instance to be restricted to a set of fields.
Enables per field postings support.
Group of fields written by one PostingsFormat
Provides the ability to use a different
Similarity for different fields.
Analyzer for Persian.
Atomically loads the DEFAULT_STOP_SET in a lazy fashion once the outer class
accesses the static final set the first time.
CharFilter that replaces instances of Zero-width non-joiner with an
ordinary space.
Factory for
PersianCharFilter.
A
TokenFilter that applies PersianNormalizer to normalize the
orthography.
Factory for
PersianNormalizationFilter.
Normalizer for Persian.
A
SnapshotDeletionPolicy which adds a persistence layer so that
snapshots can be maintained across the life of an application.
Utility class to encode sequences of 128 small positive integers.
Create tokens for phonetic matches.
Factory for
PhoneticFilter.
Helps the
FieldOffsetStrategy with position sensitive queries (e.g.
Needed to support the ability to highlight a query irrespective of the field a query refers to
(aka requireFieldMatch=false).
Base class for exact and sloppy phrase matching
To find matches on a document, first advance
PhraseMatcher.approximation() to the
relevant document, then call PhraseMatcher.reset().
Position of a term in a document that takes into account the term offset within the phrase.
A Query that matches documents containing a particular sequence of terms.
A builder for phrase queries.
Builds a
PhraseQuery object from a TokenizedPhraseQueryNode
object.
Query node for
PhraseQuery's slop factor.
This processor removes invalid
SlopQueryNode objects in the query
node tree.
A generalized version of
PhraseQuery, built with one or more MultiTermQuery
that provides term expansions for multi-terms (one of the expanded terms must match).
Builds a
PhraseWildcardQuery.
Phrase term with expansions.
All
PhraseWildcardQuery.PhraseTerm are light and immutable.
Phrase term with no expansion.
Holds a pair of term bytes - term state.
Holds the
TermState and TermStatistics for all the matched
and collected Term, for all phrase terms, for all segments.
Accumulates the doc freq and total term freq.
Test counters incremented when assertions are enabled.
Split an index based on a
Query.
Remove this file when adding back compat codecs
Dictionary represented by a text file.
Represents a point on the earth's surface.
2D point implementation containing geo spatial logic.
Abstract query class to find all documents whose single or multi-dimensional point values, previously indexed with e.g.
Iterator of encoded point values.
This query node represents a field query that holds a point value.
This processor is used to convert
FieldQueryNodes to
PointRangeQueryNodes.
Abstract class for range queries against single or multidimensional points such as
IntPoint.
Creates a range query across 1D
PointValues.
This query node represents a range query composed by
PointQueryNode
bounds, which means the bound values are Numbers.
Builds
PointValues range queries out of PointRangeQueryNodes.
This processor is used to convert
TermRangeQueryNodes to
PointRangeQueryNodes.
One pass iterator through all points previously written with a
PointWriter, abstracting away whether points are read
from (offline) disk or simple arrays in heap.
This class holds the configuration used to parse numeric queries and create
PointValues queries.
This listener is used to listen to
FieldConfig requests in
QueryConfigHandler and add StandardQueryConfigHandler.ConfigurationKeys.POINTS_CONFIG
based on the StandardQueryConfigHandler.ConfigurationKeys.POINTS_CONFIG_MAP set in the
QueryConfigHandler.
Encodes/decodes indexed points.
Abstract API to visit point values.
Abstract API to write points
Represents a dimensional point value written in the BKD tree.
Access to indexed numeric values.
We recurse the BKD tree, using a provided instance of this to guide the recursion.
Used by
PointValues.intersect(org.apache.lucene.index.PointValues.IntersectVisitor) to check how each recursive cell corresponds to the query.
Buffers up pending byte[][] value(s) per doc, then flushes when segment flushes.
Appends many points, and then at the end provides a
PointReader to iterate
those points.
Analyzer for Polish.
Atomically loads the DEFAULT_STOP_SET in a lazy fashion once the outer class
accesses the static final set the first time.
Represents a closed polygon on the earth's surface.
2D polygon implementation represented as a balanced interval tree of edges.
Transforms the token stream as per the Porter stemming algorithm.
Factory for PorterStemFilter.
Stemmer, implementing the Porter Stemming Algorithm
The Stemmer class transforms a word into its root form.
This class was automatically generated by a Snowball to Java compiler
It implements the stemming algorithm defined by a snowball script.
Analyzer for Portuguese.
Atomically loads the DEFAULT_STOP_SET in a lazy fashion once the outer class accesses the static final set the first time.
A TokenFilter that applies PortugueseLightStemmer to stem Portuguese words.
Factory for PortugueseLightStemFilter.
Light Stemmer for Portuguese
A TokenFilter that applies PortugueseMinimalStemmer to stem Portuguese words.
Factory for PortugueseMinimalStemFilter.
Minimal Stemmer for Portuguese
A TokenFilter that applies PortugueseStemmer to stem Portuguese words.
Factory for PortugueseStemFilter.
Portuguese stemmer implementing the RSLP (Removedor de Sufixos da Lingua Portuguesa) algorithm.
This class was automatically generated by a Snowball to Java compiler
It implements the stemming algorithm defined by a snowball script.
Part of speech classification for Korean based on Sejong corpus classification.
Part of speech tag for Korean based on Sejong corpus classification.
The type of the token.
Determines the position of this token
relative to the previous Token in a TokenStream, used in phrase
searching.
Default implementation of PositionIncrementAttribute.
Determines how many positions this token spans.
Default implementation of PositionLengthAttribute.
Utility class to record Positions Spans
An FST Outputs implementation where each output is a non-negative long value.
Iterates through the postings.
Encodes/decodes terms, postings, and proximity data.
This static holder class prevents classloading deadlock by delaying
init of postings formats until needed.
Uses offsets in postings -- IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS.
The core terms dictionaries (BlockTermsReader, BlockTreeTermsReader) interact with a single instance of this class to manage creation of PostingsEnum and PostingsEnum instances.
Like PostingsOffsetStrategy but also uses term vectors (only terms needed) for multi-term queries.
Class that plugs into term dictionaries, such as BlockTreeTermsWriter, and handles writing postings.
Function to raise the base "a" to the power "b"
This processor pipeline extends StandardQueryNodeProcessorPipeline and enables boolean precedence on it.
This query parser works exactly as the standard query parser (StandardQueryParser), except that it respects the boolean precedence, so <a AND b OR c AND d> is parsed to <(+a +b) (+c +d)> instead of <+a +b +c +d>.
Prefix codes term instances (prefixes are shared).
Builds a PrefixCodedTerms: call add repeatedly, then finish.
An iterator over the list of terms stored in a PrefixCodedTerms.
A CompletionQuery which takes an Analyzer to analyze the prefix of the query term.
A Query that matches documents containing terms with a specified prefix.
A PrefixWildcardQueryNode represents a wildcard query that matches abc* or *.
Builds a PrefixQuery object from a PrefixWildcardQueryNode object.
A Presearcher is used by the Monitor to reduce the number of queries actually
run against a Document.
Wraps a QueryMatch with information about which queries were selected by the presearcher
Wraps a MultiMatchingQueries with information on which presearcher queries were selected
InfoStream implementation over a PrintStream such as System.out.
A PriorityQueue maintains a partial ordering of its elements such that the least element can always be found in constant time.
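The constant-time access to the least element described for PriorityQueue can be illustrated with a binary min-heap. The sketch below is a hypothetical, JDK-only class (`MinHeapSketch` and its method names are assumptions chosen for illustration) and is not Lucene's implementation; it only shows why keeping a partial ordering is enough to read the minimum in O(1).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical min-heap of ints for illustration; not Lucene's PriorityQueue. */
final class MinHeapSketch {
    // heap.get(0) is always the least element; children of i are 2i+1 and 2i+2.
    private final List<Integer> heap = new ArrayList<>();

    /** Adds a value and restores the heap property by sifting it up. */
    void add(int value) {
        heap.add(value);
        int i = heap.size() - 1;
        while (i > 0) {
            int parent = (i - 1) / 2;
            if (heap.get(parent) <= heap.get(i)) break;
            swap(parent, i);
            i = parent;
        }
    }

    /** Returns the least element in constant time: it is always at the root. */
    int top() {
        if (heap.isEmpty()) throw new IllegalStateException("empty heap");
        return heap.get(0);
    }

    /** Removes and returns the least element, then sifts the moved element down. */
    int pop() {
        int least = top();
        int last = heap.remove(heap.size() - 1);
        if (!heap.isEmpty()) {
            heap.set(0, last);
            int i = 0;
            while (true) {
                int left = 2 * i + 1;
                int right = left + 1;
                int smallest = i;
                if (left < heap.size() && heap.get(left) < heap.get(smallest)) smallest = left;
                if (right < heap.size() && heap.get(right) < heap.get(smallest)) smallest = right;
                if (smallest == i) break;
                swap(i, smallest);
                i = smallest;
            }
        }
        return least;
    }

    int size() {
        return heap.size();
    }

    private void swap(int a, int b) {
        Integer tmp = heap.get(a);
        heap.set(a, heap.get(b));
        heap.set(b, tmp);
    }

    public static void main(String[] args) {
        MinHeapSketch h = new MinHeapSketch();
        h.add(5);
        h.add(1);
        h.add(3);
        System.out.println(h.top());
    }
}
```

Only the root is guaranteed to be the minimum; the rest of the array is partially ordered, which is why `add` and `pop` cost O(log n) while `top` is O(1).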
ProductFloatFunction returns the product of its components.
A ConditionalTokenFilter that only applies its wrapped filters to tokens that are not contained in a protected set.
Factory for a ProtectedTermFilter
A ProximityQueryNode represents a query where the terms should meet specific distance conditions.
Utility class containing the distance condition and number
Distance condition: PARAGRAPH, SENTENCE, or NUMBER
Extension of PostingsWriterBase, adding a push API for writing each element of the postings.
The abstract base class for queries.
Class to analyze and extract terms from a lucene query, to be used by a Presearcher in indexing.
An Analyzer used primarily at query time to wrap another analyzer and provide a layer of protection which prevents very common words from being passed into queries.
A BitSetProducer that wraps a query and caches matching BitSets per segment.
This interface is used by implementor classes that build some kind of
object from a query tree.
Implemented by objects that produce Lucene Query objects from XML streams.
Creates queries from the Analyzer chain.
Wraps a term and boost
Factory for QueryBuilder
A cache for queries.
A policy defining which filters should be cached.
This class can be used to hold any query configuration and no field
configuration.
Split a disjunction query into its constituent parts, so that they can be indexed
and run separately in the Monitor.
A Collector that decodes the stored query for each document hit.
Represents a match for a specific query and document
A QueryNode is an interface implemented by all nodes on a QueryNode tree.
Error class with NLS support
This exception should be thrown if something wrong happens when dealing with QueryNodes.
A QueryNodeImpl is the default implementation of the interface QueryNode
Allow joining 2 QueryNode Trees, into one.
This should be thrown when an exception happens during the query parsing from
string to the query node tree.
A QueryNodeProcessor is an interface for classes that process a QueryNode tree.
This is a default implementation for the QueryNodeProcessor interface, it's an abstract class, so it should be extended by classes that want to process a QueryNode tree.
A QueryNodeProcessorPipeline class should be used to build a query node processor pipeline.
This class is generated by JavaCC.
This class is generated by JavaCC.
The default operator for parsing queries.
This class is overridden by QueryParser in QueryParser.jj
and acts to separate the majority of the Java code from the .jj grammar file.
Token literal values and constants.
Token literal values and constants.
This class is a helper for the query parser framework, it does all the three
query parser phases at once: text parsing, query processing and query
building.
Flexible Query Parser message bundle class
Token Manager.
Token Manager.
This class defines utility methods to (help) parse query strings into Query objects.
A Rescorer that uses a provided Query to assign scores to the first-pass hits.
Scorer implementation which scores text fragments by the number of unique query terms found.
Utility class used to extract the terms used in a query, plus any weights.
Scorer implementation which scores text fragments by the number of unique query terms found.
Notified of the time it takes to run individual queries against a set of documents
Base for query timeout implementations, which will provide a shouldExit() method, used with ExitableDirectoryReader.
An implementation of QueryTimeout that can be used by the ExitableDirectoryReader class to time out and exit out when a query takes a long time to rewrite.
A representation of a node in a query tree
Queries are analyzed and converted into an abstract tree, consisting
of conjunction and disjunction nodes, and leaf nodes containing terms.
This class should be used when there is a builder for each type of node.
QueryValueSource returns the relevance score of the query
Allows recursion through a query tree
A QuotedFieldQueryNode represents phrase query.
Radix selector.
A straightforward implementation of FSDirectory using java.io.RandomAccessFile.
Reads bytes with RandomAccessFile.seek(long) followed by RandomAccessFile.read(byte[], int, int).
Deprecated. This class uses inefficient synchronization and is discouraged in favor of MMapDirectory.
Deprecated. This class uses inefficient synchronization and is discouraged in favor of MMapDirectory.
Deprecated. This class uses inefficient synchronization and is discouraged in favor of MMapDirectory.
Deprecated. This class uses inefficient synchronization and is discouraged in favor of MMapDirectory.
Estimates the size (memory representation) of Java objects.
Utility methods to estimate the RAM usage of objects.
Random Access Index API.
Query class for searching RangeField types by a defined PointValues.Relation.
Used by RangeFieldQuery to check how each internal or leaf node relates to the query.
RangeMapFloatFunction implements a map function over another ValueSource whose values fall within min and max inclusive to target.
Builder for TermRangeQuery
This interface should be implemented by a QueryNode that represents some kind of range query.
Abstract base class to rate limit IO.
Simple class to rate limit IO.
Utility class to safely share DirectoryReader instances across multiple threads, while periodically reopening.
Holds shared SegmentReader instances.
This class merges the current on-disk DV with an incoming update DV instance and merges the two instances
giving the incoming update precedence in terms of values, in other words the values of the update always
win over the on-disk version.
Subreader slice from a parent composite reader.
Common util methods for dealing with IndexReaders and IndexReaderContexts.
Attribute for Kuromoji reading data
Attribute for Korean reading data
Attribute for Kuromoji reading data
Attribute for Korean reading data
ReciprocalFloatFunction implements a reciprocal function f(x) = a/(mx+b), based on the float value of a field or function as exported by ValueSource.
Represents a lat/lon rectangle.
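As a worked illustration of the formula f(x) = a/(mx+b) given for ReciprocalFloatFunction, here is a minimal JDK-only sketch. The class name `ReciprocalSketch`, the method name, and the parameter order are assumptions made for this example; it evaluates the formula on a plain float rather than on a ValueSource.

```java
/** Hypothetical helper that evaluates f(x) = a / (m*x + b); not Lucene's class. */
final class ReciprocalSketch {
    private ReciprocalSketch() {}

    /**
     * @param x the input value (in Lucene this would come from a ValueSource)
     * @param m slope applied to x in the denominator
     * @param a numerator constant
     * @param b offset added in the denominator
     */
    static float reciprocal(float x, float m, float a, float b) {
        return a / (m * x + b);
    }

    public static void main(String[] args) {
        // With m = 1, a = 1, b = 1 the value falls from 1 toward 0 as x grows.
        for (float x = 0f; x <= 3f; x += 1f) {
            System.out.println("f(" + x + ") = " + reciprocal(x, 1f, 1f, 1f));
        }
    }
}
```

For example, with m = 1, a = 1 and b = 1, f(0) = 1 and f(1) = 0.5, so larger inputs produce smaller values.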
2D rectangle implementation containing cartesian spatial logic.
A ByteBlockPool.Allocator implementation that recycles unused byte blocks in a buffer and reuses them in subsequent calls to RecyclingByteBlockAllocator.getByteBlock().
An IntBlockPool.Allocator implementation that recycles unused int blocks in a buffer and reuses them in subsequent calls to RecyclingIntBlockAllocator.getIntBlock().
The Reduce object is used to remove gaps in a Trie which stores a dictionary.
Manages reference counting for a given object.
Utility class to safely share instances of a certain type across multiple
threads, while periodically refreshing them.
Use to receive notification when a refresh has
finished.
A CompletionQuery which takes a regular expression as the prefix of the query term.
Regular Expression extension to Automaton.
The type of expression represented by a RegExp node.
A fast regular expression query based on the org.apache.lucene.util.automaton package.
A query handler implementation that matches Regexp queries by indexing regex
terms by their longest static substring, and generates ngrams from Document
tokens to match them.
A RegexpQueryNode represents a RegexpQuery query. Examples: /[a-z]|[0-9]/
Builds a RegexpQuery object from a RegexpQueryNode object.
Processor for Regexp queries.
A QueryNodeProcessorPipeline class removes every instance of DeletedQueryNode from a query node tree.
A TokenFilter which filters out Tokens at the same position and Term text as the previous token in the stream.
Factory for RemoveDuplicatesTokenFilter.
This processor removes every QueryNode that is not a leaf and has no children.
Generates an iterator that spans repeating instances of a sub-iterator,
avoiding minimization.
A Scorer for queries with a required subscorer and an excluding (prohibited) sub Scorer.
A Scorer for queries with a required part and an optional part.
Re-scores the topN results (TopDocs) from an original query.
Abstraction for loading resources (streams, files, and classes).
Interface for a component that needs to be initialized by an implementation of ResourceLoader.
Internal class to enable reuse of the string reader by Analyzer.tokenStream(String,String)
Reads in reverse from a single byte[].
Tokenizer for domain-like hierarchies.
Implements reverse read from a RandomAccessInput.
Reverse token string, for example "country" => "yrtnuoc".
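The reversal described for the reverse-string filter can be sketched with the JDK alone. `ReverseSketch` is a hypothetical helper operating on a plain String, not the Lucene filter, which works on token streams.

```java
/** Hypothetical helper that reverses a single token; not Lucene's ReverseStringFilter. */
final class ReverseSketch {
    private ReverseSketch() {}

    /** Reverses the characters of a token; StringBuilder.reverse keeps surrogate pairs intact. */
    static String reverse(String token) {
        return new StringBuilder(token).reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(reverse("country")); // prints "yrtnuoc"
    }
}
```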
Factory for ReverseStringFilter.
DocIdSet implementation inspired from http://roaringbitmap.org/
The space is divided into blocks of 2^16 bits and each block is encoded independently.
A builder of RoaringDocIdSets.
DocIdSet implementation that can store documents up to 2^16-1 in a short[].
Acts like forever growing T[], but internally uses a circular buffer to reuse instances of T.
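The idea of a logically ever-growing array backed by a circular buffer that reuses instances can be sketched as follows. `RingSketch` and its methods are hypothetical names for illustration only. Unlike the class described above, this sketch has a fixed capacity and does not reset reused instances; the caller is assumed to do that.

```java
import java.util.function.Supplier;

/** Hypothetical fixed-capacity ring that reuses instances; not Lucene's RollingBuffer. */
final class RingSketch<T> {
    private final Object[] slots;       // physical storage, reused in a circle
    private final Supplier<T> factory;  // creates an instance the first time a slot is used
    private int nextPos = 0;            // next logical position to hand out
    private int firstLive = 0;          // oldest logical position not yet freed

    RingSketch(int capacity, Supplier<T> factory) {
        if (capacity < 1) throw new IllegalArgumentException("capacity must be >= 1");
        this.slots = new Object[capacity];
        this.factory = factory;
    }

    /** Returns the instance for the next logical position, reusing a freed slot if one exists. */
    T next() {
        if (nextPos - firstLive == slots.length) {
            throw new IllegalStateException("ring full; call freeBefore first");
        }
        int slot = nextPos % slots.length;
        if (slots[slot] == null) slots[slot] = factory.get();
        nextPos++;
        return cast(slots[slot]);
    }

    /** Returns the instance at a logical position that is still live. */
    T get(int pos) {
        if (pos < firstLive || pos >= nextPos) {
            throw new IllegalArgumentException("position " + pos + " is not live");
        }
        return cast(slots[pos % slots.length]);
    }

    /** Marks every logical position before pos as reusable. */
    void freeBefore(int pos) {
        if (pos < firstLive || pos > nextPos) {
            throw new IllegalArgumentException("cannot free before " + pos);
        }
        firstLive = pos;
    }

    @SuppressWarnings("unchecked")
    private T cast(Object o) {
        return (T) o;
    }
}
```

Logical positions keep increasing while the physical slot is `pos % capacity`, so an instance freed at position 0 is handed out again at position `capacity`.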
Implement to reset an instance
Acts like a forever growing char[] as you read
characters into it from the provided reader, but
internally it uses a circular buffer to only hold the
characters that haven't been freed yet.
Analyzer for Romanian.
Atomically loads the DEFAULT_STOP_SET in a lazy fashion once the outer class accesses the static final set the first time.
This class was automatically generated by a Snowball to Java compiler
It implements the stemming algorithm defined by a snowball script.
The Row class represents a row in a matrix representation of a trie.
Base class for stemmers that use a set of RSLP-like stemming steps.
A basic rule, with no exceptions.
A rule with a set of whole-word exceptions.
A rule with a set of exceptional suffixes.
A step containing a list of rules.
Finite-state automaton with fast run operation.
Analyzer for Russian language.
A TokenFilter that applies RussianLightStemmer to stem Russian words.
Factory for RussianLightStemFilter.
Light Stemmer for Russian.
This class was automatically generated by a Snowball to Java compiler
It implements the stemming algorithm defined by a snowball script.
An ExecutorService that executes tasks immediately in the calling thread during submit.
Scales values to be between min and max.
This filter folds Scandinavian characters åÅäæÄÆ->a and öÖøØ->o.
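The single-character folding listed above can be sketched as a plain character mapping. `ScandinavianFoldSketch` is a hypothetical helper that takes the one-line description literally (every listed character maps to lowercase `a` or `o`); whether the actual filter preserves case or handles further variants is not stated here and is not modeled. Unicode escapes are used so the source compiles regardless of file encoding.

```java
/** Hypothetical character folding based only on the description above; not Lucene's filter. */
final class ScandinavianFoldSketch {
    private ScandinavianFoldSketch() {}

    static String fold(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\u00e5': // a with ring above
                case '\u00c5': // A with ring above
                case '\u00e4': // a with diaeresis
                case '\u00e6': // ae ligature
                case '\u00c4': // A with diaeresis
                case '\u00c6': // AE ligature
                    out.append('a');
                    break;
                case '\u00f6': // o with diaeresis
                case '\u00d6': // O with diaeresis
                case '\u00f8': // o with stroke
                case '\u00d8': // O with stroke
                    out.append('o');
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "bl\u00e5b\u00e6rsyltet\u00f8j" is the Danish word for blueberry jam.
        System.out.println(fold("bl\u00e5b\u00e6rsyltet\u00f8j"));
    }
}
```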
Factory for ScandinavianFoldingFilter.
This filter normalizes use of the interchangeable Scandinavian characters æÆäÄöÖøØ and folded variants (aa, ao, ae, oe and oo) by transforming them to åÅæÆøØ.
Factory for ScandinavianNormalizationFilter.
Allows access to the score of a Query
A child Scorer and its relationship to its parent.
Used by BulkScorers that need to pass a Scorable to LeafCollector.setScorer(org.apache.lucene.search.Scorable).
A Scorer which wraps another scorer and caches the score of the current document.
Holds one hit in TopDocs.
How to aggregate multiple child hit scores into a single parent score.
Different modes of search.
An implementation of FragmentsBuilder that outputs score-order fragments.
Comparator for FieldFragList.WeightedFragInfo by boost, breaking ties by offset.
A Scorer is responsible for scoring a stream of tokens.
Expert: Common scoring functionality for different types of queries.
A supplier of Scorer.
Util class for Scorer related methods
A QueryMatch that reports scores for each match
Base rewrite method that translates each term into a query, and keeps
the scores as computed by the query.
Special implementation of BytesStartArray that keeps parallel arrays for boost and docFreq
This attribute stores the UTR #24 script value for a token of text.
Implementation of ScriptAttribute that stores the script as an integer.
An iterator that locates ISO 15924 script boundaries in text.
Factory class used by SearcherManager to create new IndexSearchers.
Keeps track of current plus old IndexSearchers, closing the old ones once they have timed out.
Simple pruner that drops any searcher older by more than the specified seconds, than the newest searcher.
Utility class to safely share IndexSearcher instances across multiple threads, while periodically reopening.
Represents a group that is found during the first pass search.
SecondPassGroupingCollector runs over an already collected set of groups, further applying a GroupReducer to each group
A filtered TermsEnum that uses a BytesRefHash as a filter
Graph representing possible tokens at each start offset in the sentence.
Interface defining whether or not an object can be cached against a LeafReader. Objects that depend only on segment-immutable structures such as Points or postings lists can just return true from SegmentCacheable.isCacheable(LeafReaderContext). Objects that depend on doc values should return DocValues.isCacheable(LeafReaderContext, String...), which will check to see if the doc values fields have been updated.
Embeds a [read-only] SegmentInfo and adds per-commit fields.
Holds core readers that are shared (unchanged) when SegmentReader is cloned or reopened
Manages the DocValuesProducer held by SegmentReader and keeps track of their reference counting.
Encapsulates multiple producers when there are docvalues updates as one producer
Information about a segment such as its name, directory, and files related
to the segment.
Expert: Controls the format of the SegmentInfo (segment metadata file).
A collection of segmentInfo objects with methods for operating on those segments in relation to the file system.
Utility class for executing code that needs to do
something with the current segments file.
Breaks text into sentences with a BreakIterator and allows subclasses to decompose these sentences into words.
The SegmentMerger class combines two or more Segments, represented by an IndexReader, into a single Segment.
IndexReader implementation over a single segment.
Holder class for common parameters used during read.
Iterates through terms in this field.
Holder class for common parameters used during write.
SmartChineseAnalyzer internal token
Filters a SegToken by converting full-width latin to half-width, then lowercasing latin.
A pair of tokens in SegGraph
An implementation of a selection algorithm, ie.
A native int hash-based set where one value is reserved to mean "EMPTY" internally.
Normalizes Serbian Cyrillic and Latin characters to "bald" Latin.
Factory for SerbianNormalizationFilter.
Normalizes Serbian Cyrillic to Latin.
A MergeScheduler that simply does each merge sequentially, using the current thread.
Marks terms as keywords via the KeywordAttribute.
A convenient class which offers a semi-immutable object wrapper implementation which allows one to set the value of an object exactly once, and retrieve it many times.
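The set-exactly-once wrapper described above can be sketched with an atomic compare-and-set. `SetOnceSketch` is a hypothetical JDK-only class, not Lucene's SetOnce: it assumes null is never a valid value and signals a second `set` with a plain IllegalStateException rather than a dedicated exception type.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical write-once holder; not Lucene's SetOnce. */
final class SetOnceSketch<T> {
    // null means "not set yet"; this sketch therefore rejects null values.
    private final AtomicReference<T> ref = new AtomicReference<>();

    /** Stores the value the first time; any later call fails. */
    void set(T value) {
        if (value == null) throw new NullPointerException("value must not be null");
        // compareAndSet succeeds for exactly one caller, even under contention.
        if (!ref.compareAndSet(null, value)) {
            throw new IllegalStateException("value already set");
        }
    }

    /** Returns the stored value, or null if set has not been called yet. */
    T get() {
        return ref.get();
    }

    public static void main(String[] args) {
        SetOnceSketch<String> once = new SetOnceSketch<>();
        once.set("first");
        System.out.println(once.get()); // prints "first"
    }
}
```

Using compare-and-set rather than a check followed by a write avoids a race in which two threads both observe the holder as unset.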
Thrown when SetOnce.set(Object) is called more than once.
Holding object and marking that it was already set
A base shape utility class used for both LatLon (spherical) and XY (cartesian) shape fields.
Represents an encoded triangle using ShapeField.decodeTriangle(byte[], DecodedTriangle).
type of triangle
Query Relation Types
Polygons are decomposed into tessellated triangles using Tessellator; these triangles are encoded and inserted as separate indexed POINT fields
A ShingleAnalyzerWrapper wraps a ShingleFilter around another Analyzer.
A ShingleFilter constructs shingles (token n-grams) from a token stream.
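To make "shingles (token n-grams)" concrete, here is a minimal JDK-only sketch that builds fixed-size shingles from a token list. `ShingleSketch`, the single `size` parameter, and the space separator are assumptions for illustration; the actual filter operates on a token stream and its configuration options are not modeled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical shingle builder over a token list; not Lucene's ShingleFilter. */
final class ShingleSketch {
    private ShingleSketch() {}

    /** Returns every run of `size` consecutive tokens, joined by a single space. */
    static List<String> shingles(List<String> tokens, int size) {
        if (size < 1) throw new IllegalArgumentException("size must be >= 1");
        List<String> out = new ArrayList<>();
        for (int i = 0; i + size <= tokens.size(); i++) {
            out.add(String.join(" ", tokens.subList(i, i + size)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("please", "divide", "this", "sentence");
        // prints [please divide, divide this, this sentence]
        System.out.println(shingles(tokens, 2));
    }
}
```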
Factory for ShingleFilter.
Similarity defines the components of Lucene scoring.
Stores the weight for a query across the indexed collection.
A subclass of Similarity that provides a simplified API for its descendants.
Simple class that binds expression variable names to DoubleValuesSources or other Expressions.
BoolFunction implementation which applies an extendible boolean function to the values of a single wrapped ValueSource.
Simple boundary scanner implementation that divides fragments based on a set of separator characters.
Base Collector implementation that is used to collect all contexts.
Base FieldComparator implementation that is used for all contexts.
A simple implementation of FieldFragList.
A simple float function with a single argument
A simple implementation of FragListBuilder.
Fragmenter implementation which breaks text up into same-size fragments with no concerns over spotting sentence boundaries.
A simple implementation of FragmentsBuilder.
Deprecated. This class is a less efficient implementation of what's available in NIOFSDirectory, and will be removed in future versions of Lucene.
Reads bytes with SeekableByteChannel.read(ByteBuffer)
Does minimal parsing of a GeoJSON object, to extract either Polygon or MultiPolygon, either directly as the top-level type, or if the top-level type is Feature, as the geometry of that feature.
Simple Encoder implementation to escape text for HTML output
Simple Formatter implementation to highlight terms with a pre and post tag.
A very simple merged segment warmer that just ensures data structures are initialized.
A simplistic Lucene based NaiveBayes classifier, see http://en.wikipedia.org/wiki/Naive_Bayes_classifier
A simplistic Lucene based NaiveBayes classifier, see http://en.wikipedia.org/wiki/Naive_Bayes_classifier
Factory for SimplePatternSplitTokenizer, for producing tokens by splitting according to the provided regexp.
Factory for SimplePatternTokenizer, for matching tokens based on the provided regexp.
SimpleQueryParser is used to parse human readable query syntax.
Fragmenter implementation which breaks text up into same-size fragments but does not split up Spans.
Base class for queries that expand to sets of simple terms.
Callback to visit each matching term during "rewrite" in SimpleTerm.MatchingTermVisitor.visitMatchingTerm(Term)
Forked from BKDReader and simplified/specialized for SimpleText's usage
Used to track all state for a single call to SimpleTextBKDReader.intersect(org.apache.lucene.index.PointValues.IntersectVisitor).
Forked from BKDWriter and simplified/specialized for SimpleText's usage
plain text index format.
plain text compound format.
plain text doc values format.
plaintext field infos format
reads/writes plaintext live docs
plain-text norms format.
Writes plain-text norms.
Reads plain-text norms.
For debugging, curiosity, transparency only!! Do not
use this codec in production.
For debugging, curiosity, transparency only!! Do not
use this codec in production.
plain text segments file format.
plain text stored fields format.
reads plaintext stored fields
Writes plain-text stored fields.
plain text term vectors format.
Reads plain-text term vectors.
Writes plain-text term vectors.
Parses shape geometry represented in WKT format
complies with OGC® document: 12-063r5 and ISO/IEC 13249-3:2016 standard
located at http://docs.opengeospatial.org/is/12-063r5/12-063r5.html
Enumerated type for Shapes
An implementation class of FragListBuilder that generates one FieldFragList.WeightedFragInfo object.
A function with a single argument
Implements LockFactory for a single in-process instance, meaning all locking will take place through this one instance.
Subclass of FilteredTermsEnum for enumerating a single term.
Exposes multi-valued view over a single-valued instance.
Exposes multi-valued iterator view over a single-valued iterator.
Directory that wraps another, and that sleeps and retries
if obtaining the lock fails.
Math functions that trade off accuracy for speed.
Find all slop-valid position-combinations (matches)
encountered while traversing/hopping the PhrasePositions.
A SlopQueryNode represents phrase query with a slop.
This builder basically reads the Query object set on the SlopQueryNode child using QueryTreeBuilder.QUERY_TREE_BUILDER_TAGID and applies the slop value defined in the SlopQueryNode.
Wraps arbitrary readers for merging.
ImpactsEnum that doesn't index impacts but implements the API in a legal way.
Reports on slow queries in a given match run
An individual entry in the slow log
Floating point numbers smaller than 32 bits.
SmartChineseAnalyzer is an analyzer for Chinese or mixed Chinese-English text.
Atomically loads the DEFAULT_STOP_SET in a lazy fashion once the outer class accesses the static final set the first time.
An IndexDeletionPolicy that wraps any other IndexDeletionPolicy and adds the ability to hold and later release snapshots of an index.
A filter that stems words using a Snowball-generated stemmer.
Factory for SnowballFilter, with configurable language
This is the rev 502 of the Snowball SVN trunk, now located at GitHub, but modified: made abstract and introduced abstract method stem to avoid expensive reflection in filter class.
This reader filters out documents that have a doc values value in the given field and treats these
documents as soft deleted.
This MergePolicy allows soft deleted documents to be carried over across merges.
Parser for the Solr synonyms format.
Analyzer for Sorani Kurdish.
Atomically loads the DEFAULT_STOP_SET in a lazy fashion once the outer class accesses the static final set the first time.
A TokenFilter that applies SoraniNormalizer to normalize the orthography.
Factory for SoraniNormalizationFilter.
Normalizes the Unicode representation of Sorani text.
A TokenFilter that applies SoraniStemmer to stem Sorani words.
Factory for SoraniStemFilter.
Light stemmer for Sorani
Encapsulates sort criteria for returned hits.
A per-document byte[] with presorted values.
Field that stores a per-document BytesRef value, indexed for sorting.
Implements a TermsEnum wrapping a provided SortedDocValues.
Buffers up pending byte[] per doc, deref and sorting via int ord, then flushes when segment flushes.
This wrapper buffers incoming elements and makes sure they are sorted based on given comparator.
A list of per-document numeric values, sorted according to Long.compare(long, long).
Field that stores per-document long values for scoring, sorting or value retrieval.
Buffers up pending long[] per doc, sorts, then flushes when segment flushes.
Selects a value from the document's list to use as the representative value
Wraps a SortedNumericDocValues and returns the last value (max)
Wraps a SortedNumericDocValues and returns the first value (min)
Type of selection to perform.
SortField for SortedNumericDocValues.
A SortFieldProvider for this sort field
A multi-valued version of SortedDocValues.
Field that stores a set of per-document BytesRef values, indexed for faceting, grouping, joining.
Implements a TermsEnum wrapping a provided SortedSetDocValues.
Buffers up pending byte[]s per doc, deref and sorting via int ord, then flushes when segment flushes.
Retrieves FunctionValues instances for multi-valued string based fields.
Selects a value from the document's set to use as the representative value
Wraps a SortedSetDocValues and returns the last ordinal (max)
Wraps a SortedSetDocValues and returns the middle ordinal (or max of the two)
Wraps a SortedSetDocValues and returns the middle ordinal (or min of the two)
Wraps a SortedSetDocValues and returns the first ordinal (min)
Type of selection to perform.
SortField for SortedSetDocValues.
A SortFieldProvider for this sort
Sorts documents of a given index by returning a permutation on the document
IDs.
Base class for sorting algorithms implementations.
A permutation of doc IDs.
Stores information about how to sort documents by terms in an individual
field.
A SortFieldProvider for field sorts
Specifies the type of the terms to be sorted, or special types such as CUSTOM
Reads/Writes a named SortField from a segment info file, used to record index sorts
A CodecReader which supports sorting documents by a given Sort.
A visitor that copies every field it sees in the provided StoredFieldsWriter.
A Rescorer that re-sorts according to a provided Sort.
Counterpart of BoostQuery for spans.
Base class for building SpanQuerys
An interface defining the collection of postings information from the leaves of a Spans
Keep matches that contain another SpanScorer.
Builder for SpanFirstQuery
Matches spans near the beginning of a field.
Formats text with different color intensity depending on the score of the
term using the span tag.
Analyzer for Spanish.
Atomically loads the DEFAULT_STOP_SET in a lazy fashion once the outer class accesses the static final set the first time.
A TokenFilter that applies SpanishLightStemmer to stem Spanish words.
Factory for SpanishLightStemFilter.
Light Stemmer for Spanish
A TokenFilter that applies SpanishMinimalStemmer to stem Spanish words.
Factory for SpanishMinimalStemFilter.
Minimal plural stemmer for Spanish.
This class was automatically generated by a Snowball to Java compiler
It implements the stemming algorithm defined by a snowball script.
Wraps any MultiTermQuery as a SpanQuery, so it can be nested within other SpanQuery classes.
Abstract class that defines how the query is rewritten.
A rewrite method that first translates each term into a SpanTermQuery in a BooleanClause.Occur.SHOULD clause in a BooleanQuery, and keeps the scores as computed by the query.
Builder for SpanNearQuery
Factory for SpanOrQuery
Matches spans which are near one another.
A builder for SpanNearQueries
Builder for SpanNotQuery
Removes matches which overlap with another SpanQuery or which are within x tokens before or y tokens after another SpanQuery.
Builder for SpanOrQuery
Matches the union of its clauses.
Builder that analyzes the text into a SpanOrQuery
Only return those matches that have a specific payload at the given position.
Base class for filtering a SpanQuery based on the position of a match.
Builder for SpanPositionRangeQuery
Checks to see if the SpanPositionCheckQuery.getMatch() lies between a start and end position. See SpanFirstQuery for a derivation that is optimized for the case where start position is 0.
Base class for span-based queries.
Interface for retrieving a SpanQuery.
Factory for SpanQueryBuilders
Iterates through combinations of start/end positions per-doc.
Builder for SpanTermQuery
Matches spans containing a term.
Expert-only.
Enumeration defining what postings information should be retrieved from the
index for a given Spans
Keep matches that are contained within another Spans.
A bit set that only stores longs that have at least one bit which is set.
utility class for implementing constant score logic specific to INTERSECT, WITHIN, and DISJOINT
Visitor used for walking the BKD tree.
Spell Checker class (Main class).
(initially inspired by the David Spencer code).
(initially inspired by the David Spencer code).
Helper class for loading SPI classes from classpath (META-INF files).
Virtually slices the text on both sides of every occurrence of the specified character.
Query that matches String prefixes
Lowest level base class for surround queries
Simple single-term clause
Query that matches wildcards
Filters StandardTokenizer with LowerCaseFilter and StopFilter, using a configurable list of stop words.
Default implementation of DirectoryReader.
This query configuration handler is used for almost every processor defined in the StandardQueryNodeProcessorPipeline processor pipeline.
Class holding keys for StandardQueryNodeProcessorPipeline options.
Boolean Operator: AND or OR
This pipeline has all the processors needed to process a query node tree, generated by StandardSyntaxParser, already assembled.
This class is a helper that enables users to easily use the Lucene query parser.
This query tree builder only defines the necessary map to build a Query tree object.
Parser for the standard Lucene syntax
Token literal values and constants.
Token Manager.
A grammar-based tokenizer constructed with JFlex.
Factory for StandardTokenizer.
This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29.
Pair of states.
BlockTree statistics for a single field returned by FieldReader.getStats().
Represents a term and its details stored in the BlockTermState.
Reads block lines encoded incrementally, with all fields corresponding to the term of the line.
Reads terms blocks with the Shared Terms format.
Writes terms blocks with the Shared Terms format.
Stemmer uses the affix rules declared in the Dictionary to generate one or more stems for a word.
Provides the ability to override any KeywordAttribute aware stemmer with custom dictionary-based stemming.
This builder builds an FST for the StemmerOverrideFilter
A read-only 4-byte FST backed map that allows fast case-insensitive key value lookups for StemmerOverrideFilter
Factory for StemmerOverrideFilter.
Some commonly-used stemming functions
Transforms the token stream as per the stemming algorithm.
Factory for
StempelFilter using a Polish stemming table.
Stemmer class is a convenient facade for other stemmer-related classes.
The "intersect"
TermsEnum response to
STUniformSplitTerms.intersect(CompiledAutomaton, BytesRef),
intersecting the terms with an automaton.
Combines
PostingsEnum for the same term for a given field from
multiple segments.
Removes stop words from a token stream.
Removes stop words from a token stream.
Factory for
StopFilter.
Base class for Analyzers that need to make use of stopword sets.
A field whose value is stored so that
IndexSearcher.doc(int) and IndexReader.document() will
return the field and its value.
Controls the format of stored fields
Codec API for reading stored fields.
Codec API for writing stored fields:
For every document,
StoredFieldsWriter.startDocument() is called,
informing the Codec that a new document has started.
Expert: provides a low-level means of accessing the stored field
values in an index.
Enumeration of possible return values for
StoredFieldVisitor.needsField(org.apache.lucene.index.FieldInfo).
Abstract
FunctionValues implementation which supports retrieving String values.
Used for parsing Version strings so we don't have to
use overkill String.split nor StringTokenizer (which silently
skips empty tokens).
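The remark about StringTokenizer silently skipping empty tokens can be checked with the JDK alone. The sketch below is only an illustration: the class name EmptyTokenDemo and both helper methods are hypothetical and are not part of Lucene, and the hand-rolled splitter is one plausible way to keep empty tokens, not the parser this entry describes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class; not part of Lucene.
class EmptyTokenDemo {

    // StringTokenizer treats runs of delimiters as one separator,
    // so the empty token between two adjacent dots is dropped.
    static List<String> withStringTokenizer(String s, String delim) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(s, delim);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    // Single-character split that keeps every token, including empty ones.
    static List<String> splitKeepingEmpties(String s, char delim) {
        List<String> out = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == delim) {
                out.add(s.substring(start, i));
                start = i + 1;
            }
        }
        out.add(s.substring(start)); // trailing token (possibly empty)
        return out;
    }

    public static void main(String[] args) {
        System.out.println(withStringTokenizer("8..1", ".")); // [8, 1]
        System.out.println(splitKeepingEmpties("8..1", '.')); // [8, , 1]
    }
}
```

A malformed version string such as "8..1" therefore looks like a valid two-part version to StringTokenizer, while a splitter that keeps empty tokens exposes the problem.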
Interface for string distances.
A field that is indexed but not tokenized: the entire
String value is indexed as a single token.
Methods for manipulating strings.
String manipulation routines
PostingsFormat based on the Uniform Split technique and supporting
Shared Terms.
Extends
UniformSplitTerms for a shared-terms dictionary, with
all the fields of a term in the same block line.
A block-based terms index and dictionary based on the Uniform Split technique,
and sharing all the fields terms in the same dictionary, with all the fields
of a term in the same block line.
Extends
UniformSplitTermsWriter by sharing all the fields terms
in the same dictionary and by writing all the fields of a term in the same
block line.
Field that indexes a string value and a weight as a weighted completion
against a named suggester.
Adds document suggest capabilities to IndexSearcher.
Set of strategies for suggesting related terms
Bounded priority queue for
TopSuggestDocs.SuggestScoreDocs.
Like
StopFilter except it will not remove the
last token if that token was not followed by some token
separator.
Factory for
SuggestStopFilter.
SuggestWord, used in suggestSimilar method in SpellChecker class.
Frequency first, then score.
Sorts SuggestWord instances
Score first, then frequency
SumFloatFunction returns the sum of its components.
Calculate the final score as the sum of scores of all payloads seen.
SumTotalTermFreqValueSource returns the number of tokens.
Annotation to suppress forbidden-apis errors inside a whole class, a method, or a field.
Analyzer for Swedish.
Atomically loads the DEFAULT_STOP_SET in a lazy fashion once the outer class
accesses the static final set the first time.
A
TokenFilter that applies SwedishLightStemmer to stem Swedish
words.
Factory for
SwedishLightStemFilter.
Light Stemmer for Swedish.
This class was automatically generated by a Snowball to Java compiler
It implements the stemming algorithm defined by a snowball script.
A similarity with a lengthNorm that provides for a "plateau" of
equally good lengths, and tf helper functions.
Deprecated.
Use
SynonymGraphFilter instead, but be sure to also
use FlattenGraphFilter at index time (not at search time) as well.
Deprecated.
Use
SynonymGraphFilterFactory instead, but be sure to also
use FlattenGraphFilterFactory at index time (not at search time) as well.
Applies single- or multi-token synonyms from a
SynonymMap
to an incoming TokenStream, producing a fully correct graph
output.
Factory for
SynonymGraphFilter.
A map of synonyms, keys and values are phrases.
Builds an FSTSynonymMap.
Abstraction for parsing synonym files.
A query that treats multiple terms as synonyms.
A builder for
SynonymQuery.
QueryNode for clauses that are synonyms of each other.
Builder for
SynonymQueryNode.
A parser needs to implement
SyntaxParser interface
This TokenFilter provides the ability to set aside attribute states that have already been analyzed.
TokenStream output from a tee.
A convenience wrapper for storing the cached states as well the final state of the stream.
A Term represents a word from text.
A proximity query that lets you express an automaton, whose
transitions are terms, to match documents.
Sorts by docID so we can quickly pull out all scorers that are on
the same (lowest) docID.
Sorts by position so we can visit all scorers on one doc, by
position.
Term of a block line.
Presearcher implementation that uses terms extracted from queries to index
them in the Monitor, and builds a disjunction from terms in a document to match
them.
Constructs a document disjunction from a set of terms
Sets the custom term frequency of a term within one document.
Default implementation of
TermFrequencyAttribute.
Function that returns
PostingsEnum.freq() for the
supplied term in every document.
An implementation of
GroupFacetCollector that computes grouped facets based on the indexed terms
from DocValues.
A GroupSelector implementation that groups via SortedDocValues
Specialization for a disjunction over many terms that behaves like a
ConstantScoreQuery over a BooleanQuery containing only
BooleanClause.Occur.SHOULD clauses.
A
MatchesIterator over a single term's postings list
A Query that matches documents containing a term.
Builder for
TermQuery
A Query that matches documents within a range of terms.
This query node represents a range query composed by
FieldQueryNode
bounds, which means the bound values are strings.
Builds a
TermRangeQuery object from a TermRangeQueryNode
object.
This processor processes
TermRangeQueryNodes.
Access to the terms in a specific field.
A collector that collects all terms from a specified field matching the query.
Expert: A
Scorer for documents matching a Term.
Iterator to seek (
TermsEnum.seekCeil(BytesRef), TermsEnum.seekExact(BytesRef)) or step through (BytesRefIterator.next()) terms to obtain frequency information (TermsEnum.docFreq()) or PostingsEnum for the current term (TermsEnum.postings(org.apache.lucene.index.PostingsEnum)).
Represents returned result from
TermsEnum.seekCeil(org.apache.lucene.util.BytesRef).
A TokenStream created from a
TermsEnum
This class is passed each token produced by the analyzer
on each field during indexing, and it stores these
tokens in a hash table, and allocates separate byte
streams per token.
This class stores streams of information per term without knowing
the size of the stream ahead of time.
BlockTermsReader interacts with an instance of this class
to manage its terms index.
Similar to TermsEnum, except, the only "metadata" it
reports for a given indexed term is the long fileOffset
into the main terms dictionary file.
Base class for terms index implementations to plug
into
BlockTermsWriter.
Expert:
Public for extension only.
A query that has an array of terms from a specific field.
Builds a BooleanQuery from all of the terms found in the XML element using the choice of analyzer
Encapsulates all required internal state to position the associated
TermsEnum without re-seeking.
Contains statistics for a specific term
Holder for per-term statistics.
Holder for a term along with its statistics
(
TermStats.docFreq and TermStats.totalTermFreq).
This attribute is requested by TermsHashPerField to index the contents.
A filtered LeafReader that only includes the terms that are also in a provided set of terms.
Wraps a Terms with a
LeafReader, typically from term vectors.
Uses term vectors that contain offsets.
Controls the format of term vectors
Codec API for reading term vectors:
Codec API for writing term vectors:
For every document,
TermVectorsWriter.startDocument(int) is called,
informing the Codec how many fields will be written.
Calculates the weight of a
Term
Ternary Search Tree.
The class creates a TST node.
Computes a triangular mesh tessellation for a given polygon.
Circular Doubly-linked list used for polygon coordinates
state of the tessellated split - avoids recursion
Triangle in the tessellated mesh
Interface for a node that has text as a
CharSequence
A field that is indexed and tokenized, without term
vectors.
Low-level class used to record information about a section of a document
with a score.
Implementation of
Similarity with the Vector Space Model.
Function that returns
TFIDFSimilarity.tf(float)
for every document.
Analyzer for Thai language.
Atomically loads the DEFAULT_STOP_SET in a lazy fashion once the outer class
accesses the static final set the first time.
Tokenizer that uses
BreakIterator to tokenize Thai text.
Factory for
ThaiTokenizer.
Thrown by Lucene on detecting that Thread.interrupt() had
been called.
Merges segments of approximately equal size, subject to
an allowed number of segments per tier.
Holds score and explanation for a single candidate
merge.
The
TimeLimitingCollector is used to timeout search requests that
take longer than the maximum allowed search time limit.
Thrown when elapsed search time exceeds allowed search time.
Thread used to timeout search requests.
Just like
ToParentBlockJoinQuery, except this
query joins in reverse: you provide a Query matching
parent documents and it joins down to child
documents.
Analyzed token with morphological data from its dictionary.
Analyzed token with morphological data.
Describes the input token stream.
Describes the input token stream.
Describes the input token stream.
A TokenFilter is a TokenStream whose input is another TokenStream.
Abstract parent class for analysis factories that create
TokenFilter
instances.
One, or several overlapping tokens, along with the score(s) and the scope of
the original text.
Binary dictionary implementation for a known-word dictionary model:
Words are encoded into an FST mapping to a list of wordIDs.
Binary dictionary implementation for a known-word dictionary model:
Words are encoded into an FST mapping to a list of wordIDs.
Thin wrapper around an FST with root-arc caching for Japanese.
Thin wrapper around an FST with root-arc caching for Hangul syllables (11,172 arcs).
A
TokenizedPhraseQueryNode represents a node created by a code that
tokenizes/lemmatizes/analyzes.
A Tokenizer is a TokenStream whose input is a Reader.
Abstract parent class for analysis factories that create
Tokenizer
instances.
Token Manager Error.
Token Manager Error.
Token Manager Error.
Adds the
OffsetAttribute.startOffset()
and OffsetAttribute.endOffset()
First 4 bytes are the start
Factory for
TokenOffsetPayloadTokenFilter.
Convenience methods for obtaining a
TokenStream for use with the Highlighter - can obtain from
term vectors with offsets and positions or from an Analyzer re-parsing the stored content.
TokenStream created from a term vector field.
Analyzes the text, producing a single
OffsetsEnum wrapping the TokenStream filtered to terms
in the query, including wildcards.
Consumes a TokenStream and creates an
Automaton
where the transition labels are UTF8 bytes (or Unicode
code points if unicodeArcs is true) from the TermToBytesRefAttribute.
Consumes a TokenStream and creates a
TermAutomatonQuery
where the transition labels are tokens from the TermToBytesRefAttribute.
This exception is thrown when determinizing an automaton would result in one
which has too many states.
Exception thrown when
BasicQueryFactory would exceed the limit
of query clauses.
This query requires that you index
children and parent docs as a single block, using the
IndexWriter.addDocuments() or IndexWriter.updateDocuments() API.
A special sort field that allows sorting parent docs based on nested / child level fields.
Represents hits returned by
IndexSearcher.search(Query,int).
A base class for all collectors that return a
TopDocs output.
Represents hits returned by
IndexSearcher.search(Query,int,Sort).
Represents result returned by a grouping search.
How the GroupDocs score (if any) should be merged.
A second-pass collector that collects the TopDocs for each group, and
returns them as a
TopGroups object
ScoreDoc with an
additional CharSequence key
Collector that collects completion and
score, along with document id
Base rewrite method for collecting only the top terms
via a priority queue.
Utility class for english translations of morphological data,
used only for debugging.
Helper methods to ease implementing
Object.toString().
Just counts the total number of hits.
Description of the total number of hits of a query.
How the
TotalHits.value should be interpreted.
TotalTermFreqValueSource returns the total term freq
(sum of term freqs across all documents).
A delegating Directory that records which files were
written to and deleted.
Holds one transition from an
Automaton.
A Trie is used to store a dictionary of words and their stems.
Trims leading and trailing whitespace from Tokens in the stream.
Factory for
TrimFilter.
A token filter for truncating the terms into a specific length.
Factory for
TruncateTokenFilter.
Ternary Search Trie implementation.
Suggest implementation based on a
Ternary Search Tree
Analyzer for Turkish.
Atomically loads the DEFAULT_STOP_SET in a lazy fashion once the outer class
accesses the static final set the first time.
Normalizes Turkish token text to lower case.
Factory for
TurkishLowerCaseFilter.
This class was automatically generated by a Snowball to Java compiler
It implements the stemming algorithm defined by a snowball script.
An interface for implementations that support 2-phase commit.
A utility for executing 2-phase commit on several objects.
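The idea behind the two entries above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Lucene's TwoPhaseCommit or TwoPhaseCommitTool: the Participant interface, the execute method, and the Recording helper are all hypothetical names, and the sketch reports failure with a boolean instead of the dedicated exceptions listed in this index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a two-phase commit coordinator; not Lucene's API.
class TwoPhaseSketch {

    // Assumed participant contract for this illustration.
    interface Participant {
        void prepare() throws Exception; // phase 1: make the change durable but not visible
        void commit() throws Exception;  // phase 2: make the change visible
        void rollback();                 // undo whatever was prepared
    }

    // Prepares every participant first, then commits every participant.
    // If any step throws, all participants are rolled back and false is returned.
    static boolean execute(List<Participant> participants) {
        try {
            for (Participant p : participants) {
                p.prepare();
            }
            for (Participant p : participants) {
                p.commit();
            }
            return true;
        } catch (Exception e) {
            for (Participant p : participants) {
                p.rollback();
            }
            return false;
        }
    }

    // Helper that records the calls it receives and can be told to fail in prepare().
    static final class Recording implements Participant {
        final List<String> log = new ArrayList<>();
        final boolean failPrepare;

        Recording(boolean failPrepare) {
            this.failPrepare = failPrepare;
        }

        public void prepare() throws Exception {
            log.add("prepare");
            if (failPrepare) {
                throw new Exception("prepare failed");
            }
        }

        public void commit() {
            log.add("commit");
        }

        public void rollback() {
            log.add("rollback");
        }
    }

    public static void main(String[] args) {
        Recording a = new Recording(false);
        Recording b = new Recording(true);
        System.out.println(execute(List.of(a, b))); // false
        System.out.println(a.log);                  // [prepare, rollback]
        System.out.println(b.log);                  // [prepare, rollback]
    }
}
```

Because no participant commits until every participant has prepared successfully, a failure in the first phase leaves nothing visible to undo beyond the prepared state.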
Thrown by
TwoPhaseCommitTool.execute(TwoPhaseCommit...) when an
object fails to commit().
Thrown by
TwoPhaseCommitTool.execute(TwoPhaseCommit...) when an
object fails to prepareCommit().
Returned by
Scorer.twoPhaseIterator()
to expose an approximation of a DocIdSetIterator.
Makes the
TypeAttribute a payload.
Factory for
TypeAsPayloadTokenFilter.
Adds the
TypeAttribute.type() as a synonym,
i.e.
Factory for
TypeAsSynonymFilter.
A Token's lexical type.
Default implementation of
TypeAttribute.
Removes tokens whose types appear in a set of blocked types from a token stream.
Factory class for
TypeTokenFilter.
Filters
UAX29URLEmailTokenizer
with LowerCaseFilter and
StopFilter, using a list of
English stop words.
This class implements Word Break rules from the Unicode Text Segmentation
algorithm, as specified in
Unicode Standard Annex #29
URLs and email addresses are also tokenized according to the relevant RFCs.
Factory for
UAX29URLEmailTokenizer.
This class implements Word Break rules from the Unicode Text Segmentation
algorithm, as specified in
Unicode Standard Annex #29
URLs and email addresses are also tokenized according to the relevant RFCs.
A parameter object to hold the components a
FieldOffsetStrategy needs.
CharSequence with escaped chars information.
This file contains unicode properties used by various
CharTokenizers.
Class to encode Java's UTF16 char[] into UTF8 byte[]
without always allocating a new byte[] as
String.getBytes(StandardCharsets.UTF_8) does.
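The allocation-avoidance idea in the entry above can be approximated with the JDK's CharsetEncoder and a reused ByteBuffer. This is a sketch under stated assumptions: ReusableUtf8Encoder and its methods are hypothetical names, and it does not reproduce how the Lucene class is actually implemented.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that encodes UTF-16 chars into a reused byte buffer.
class ReusableUtf8Encoder {
    private final CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
    private ByteBuffer buffer = ByteBuffer.allocate(16);

    // Encodes chars[offset .. offset+length) and returns the number of bytes written.
    // The bytes are available in bytes()[0 .. returned count).
    int encode(char[] chars, int offset, int length) {
        // A UTF-16 code unit needs at most 3 UTF-8 bytes
        // (a surrogate pair is 2 code units and 4 bytes).
        int needed = length * 3;
        if (buffer.capacity() < needed) {
            buffer = ByteBuffer.allocate(needed); // grow only when required
        }
        buffer.clear();
        encoder.reset();
        CoderResult result = encoder.encode(CharBuffer.wrap(chars, offset, length), buffer, true);
        if (result.isError()) {
            throw new IllegalArgumentException("input is not valid UTF-16");
        }
        encoder.flush(buffer);
        return buffer.position();
    }

    // Backing array of the reused buffer; only the first encode() result bytes are valid.
    byte[] bytes() {
        return buffer.array();
    }

    public static void main(String[] args) {
        ReusableUtf8Encoder enc = new ReusableUtf8Encoder();
        char[] text = "h\u00e9llo".toCharArray();
        System.out.println(enc.encode(text, 0, text.length)); // 6
    }
}
```

Repeated calls reuse the same backing array as long as the input fits, whereas String.getBytes(StandardCharsets.UTF_8) returns a fresh array every time.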
An Analyzer that uses
UnicodeWhitespaceTokenizer.
A UnicodeWhitespaceTokenizer is a tokenizer that divides text at whitespace.
A Highlighter that can get offsets from either
postings (
IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS),
term vectors (FieldType.setStoreTermVectorOffsets(boolean)),
or via re-analyzing text.
Flags for controlling highlighting behavior.
Fetches stored fields for highlighting.
Source of term offsets; essential for highlighting.
Wraps an IndexReader that remembers/caches the last call to
IndexReader.getTermVectors(int) so that
if the next call has the same ID, then it is reused.
PostingsFormat based on the Uniform Split technique.
Terms based on the Uniform Split technique.
A block-based terms index and dictionary based on the Uniform Split technique.
A block-based terms index and dictionary that assigns terms to nearly
uniform length blocks.
Builds a
FieldMetadata that is the union of multiple FieldMetadata.
Dictionary for unknown-word handling.
Dictionary for unknown-word handling.
This wrapper buffers the incoming elements and makes sure they are in
random order.
This
MergePolicy is used for upgrading all existing segments of
an index when calling IndexWriter.forceMerge(int).
Normalizes token text to UPPER CASE.
Factory for
UpperCaseFilter.
An FST
Outputs implementation where each output
is one or two non-negative long values.
Holds two long outputs.
A
QueryCachingPolicy that tracks usage statistics of recently-used
filters in order to decide on which filters are worth caching.
Class for building a User Dictionary.
Class for building a User Dictionary.
UserInputQueryBuilder uses 1 of 2 strategies for thread-safe parsing:
1) Synchronizing access to "parse" calls on a previously supplied QueryParser
or..
Converts UTF-32 automata to the equivalent UTF-8 representation.
Static helper methods.
Represents a path in TopNSearcher.
Holds a single input (IntsRef) + output, returned by
shortestPaths().
Compares first by the provided comparator, and then
tie breaks by path.input.
Utility class to find top N shortest paths from start
point(s).
Holds the results for a top N search using
Util.TopNSearcher
SmartChineseAnalyzer utility constants and methods
This interface should be implemented by
QueryNode that holds an
arbitrary value.
Instantiates
FunctionValues for a particular reader.
A GroupSelector that groups via a ValueSource
Scorer which returns the result of FunctionValues.floatVal(int) as
the score for a document, and which filters out documents that don't match ValueSourceScorer.matches(int).
A helper to parse the context of a variable name, which is the base variable, followed by the
sequence of array (integer or string indexed) and member accesses.
Represents what a piece of a variable does.
Selects index terms according to provided pluggable
VariableGapTermsIndexWriter.IndexTermSelector, and stores them in a prefix trie that's
loaded entirely in RAM stored as an FST.
Sets an index term when docFreq >= docFreqThresh, or
every interval terms.
Same policy as
FixedGapTermsIndexWriter
Hook for selecting which terms should be placed in the terms index.
Converts individual ValueSource instances to leverage the FunctionValues *Val functions that work with multiple values,
i.e.
A
LockFactory that wraps another LockFactory and verifies that each lock obtain/release
is "correct" (never results in two processes holding the
lock at the same time).
Used by certain classes to match version compatibility
across releases of Lucene.
This is just like
BlockTreeTermsWriter, except it also stores a version per term, and adds a method to its TermsEnum
implementation to seekExact only if the version is >= the specified version.
BlockTree's implementation of
Terms.
A utility for keeping backwards compatibility on previously abstract methods
(or similar replacements).
This implements the WAND (Weak AND) algorithm for dynamic pruning
described in "Efficient Query Evaluation using a Two-Level Retrieval
Process" by Broder, Carmel, Herscovici, Soffer and Zien.
Implements a combination of
WeakHashMap and
IdentityHashMap.
Expert: Calculate query weights and build query scorers.
Just wraps a Scorer and performs top scoring using it.
Wraps an internal docIdSetIterator for it to start with docID = -1
A weighted implementation of
FieldFragList.
A weighted implementation of
FragListBuilder.
Lightweight class to hold term, weight, and positions used for scoring this
term.
Class used to extract
WeightedSpanTerms from a Query based on whether
Terms from the Query are contained in a supplied TokenStream.
This class makes sure that if both position sensitive and insensitive
versions of the same term are added, the position insensitive one wins.
Lightweight class to hold term and a weight value used for scoring this term
Suggester based on a weighted FST: it first traverses the prefix,
then walks the n shortest paths to retrieve top-ranked
suggestions.
An Analyzer that uses
WhitespaceTokenizer.
A tokenizer that divides text at whitespace characters as defined by
Character.isWhitespace(int).
Factory for
WhitespaceTokenizer.
Just produces one single fragment for the entire text
Extension of StandardTokenizer that is aware of Wikipedia syntax.
Factory for
WikipediaTokenizer.
JFlex-generated tokenizer that is aware of Wikipedia syntax.
Implements the wildcard search query.
A
WildcardQueryNode represents a wildcard query. This does not apply to
phrases.
Builds a
WildcardQuery object from a WildcardQueryNode
object.
The
StandardSyntaxParser creates PrefixWildcardQueryNode nodes which
have values containing the prefixed wildcard.
Native
Directory implementation for Microsoft Windows.
A spell checker whose sole function is to offer suggestions by combining
multiple terms into one word and/or breaking terms into multiple words.
Determines the order to list word break suggestions
Deprecated.
Use
WordDelimiterGraphFilter instead: it produces a correct
token graph so that e.g.
Deprecated.
Use
WordDelimiterGraphFilterFactory instead: it produces a correct
token graph so that e.g.
Splits words into subwords and performs optional transformations on subword
groups, producing a correct token graph so that e.g.
Factory for
WordDelimiterGraphFilter.
A BreakIterator-like API for iterating over subwords in text, according to WordDelimiterGraphFilter rules.
SmartChineseAnalyzer Word Dictionary
Loader for text files that represent a list of stopwords.
Parser for wordnet prolog format
Segment a sentence of Chinese text into words.
Internal SmartChineseAnalyzer token type constants
Represents a circle on the XY plane.
A per-document location field.
XYGeometry query for
XYDocValuesField.
Reusable cartesian geometry encoding methods
Cartesian Geometry object.
Represents a line in cartesian space.
Represents a point in cartesian space.
Compares documents by distance from an origin point
An indexed XY position field.
Finds all previously indexed points that fall within the specified XY geometries.
Sorts by distance from an origin location.
Represents a polygon in cartesian space.
Represents a x/y cartesian rectangle.
A cartesian shape utility class for indexing and searching geometries whose vertices are unitless x, y values.
Finds all previously indexed cartesian shapes that comply with the given
ShapeField.QueryRelation with
the specified array of XYGeometry.