Package org.apache.lucene.index
Class DocumentsWriter
java.lang.Object
org.apache.lucene.index.DocumentsWriter
All Implemented Interfaces:
Closeable, AutoCloseable, Accountable
This class accepts multiple added documents and directly
writes segment files.
Each added document is passed to the indexing chain,
which in turn processes the document into the different
codec formats. Some formats write bytes to files
immediately, e.g. stored fields and term vectors, while
others are buffered by the indexing chain and written
only on flush.
Once we have used our allowed RAM buffer, or the number
of added docs is large enough (in the case we are
flushing by doc count instead of RAM usage), we create a
real segment and flush it to the Directory.
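The flush trigger described above can be modeled in a few lines. This is a minimal, hypothetical sketch: FlushTrigger and its methods are illustrative names, not Lucene API. It accumulates per-document byte estimates and reports when either the RAM limit or the document-count limit has been reached. In the public API these limits are set through IndexWriterConfig.setRAMBufferSizeMB and IndexWriterConfig.setMaxBufferedDocs; the sketch only borrows the convention that -1 (IndexWriterConfig.DISABLE_AUTO_FLUSH) turns a limit off.

```java
// Hypothetical sketch (not Lucene's flush policy): decides when buffered docs should become a segment.
final class FlushTrigger {
    // Borrowed convention: -1 disables a limit (like IndexWriterConfig.DISABLE_AUTO_FLUSH).
    static final int DISABLED = -1;

    private final long ramBufferBytes;  // flush once this many bytes are buffered, or DISABLED
    private final int maxBufferedDocs;  // flush once this many docs are buffered, or DISABLED
    private long bytesInRam;
    private int docsInRam;

    FlushTrigger(long ramBufferBytes, int maxBufferedDocs) {
        if (ramBufferBytes == DISABLED && maxBufferedDocs == DISABLED) {
            throw new IllegalArgumentException("at least one flush trigger must be enabled");
        }
        this.ramBufferBytes = ramBufferBytes;
        this.maxBufferedDocs = maxBufferedDocs;
    }

    /** Records one buffered document and returns true if a segment should be flushed now. */
    boolean onDocAdded(long docBytes) {
        bytesInRam += docBytes;
        docsInRam++;
        boolean ramFull = ramBufferBytes != DISABLED && bytesInRam >= ramBufferBytes;
        boolean docCountFull = maxBufferedDocs != DISABLED && docsInRam >= maxBufferedDocs;
        return ramFull || docCountFull;
    }

    /** Resets the counters after the buffered docs were written out as a segment. */
    void onFlushed() {
        bytesInRam = 0;
        docsInRam = 0;
    }
}
```

With new FlushTrigger(100, FlushTrigger.DISABLED), three 40-byte documents trigger a flush on the third call (120 bytes is at least 100), while a trigger built with (FlushTrigger.DISABLED, 2) ignores document size and fires on every second document.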
Threads:
Multiple threads are allowed into addDocument at once.
There is an initial synchronized call to
DocumentsWriterFlushControl.obtainAndLock(),
which allocates a DWPT (DocumentsWriterPerThread) for this
indexing thread. The same thread will not necessarily get
the same DWPT over time. Then updateDocuments is called on
that DWPT without synchronization (most of the "heavy
lifting" is in this call). Once a DWPT uses enough RAM or
holds enough documents in memory, it is checked out for
flush and all its changes are written to the directory.
Each DWPT corresponds to one segment being written.
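The obtain/work/release cycle described above can be modeled with a small, hypothetical pool. BufferPool and PerThreadBuffer are illustrative stand-ins for the per-thread pool and a DWPT, not Lucene classes. Only handing a buffer out and taking it back are synchronized; the add itself runs while the calling thread exclusively owns its buffer. A buffer that reaches the per-segment limit is checked out for flush instead of being returned, so each flushed buffer corresponds to one segment.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a DWPT: buffers the docs of one future segment.
final class PerThreadBuffer {
    final List<String> docs = new ArrayList<>();
}

// Hypothetical stand-in for the per-thread pool plus flush control.
final class BufferPool {
    private final ArrayDeque<PerThreadBuffer> idle = new ArrayDeque<>();
    private final List<PerThreadBuffer> flushQueue = new ArrayList<>();
    private final int docsPerSegment;

    BufferPool(int docsPerSegment) {
        this.docsPerSegment = docsPerSegment;
    }

    // Short synchronized step: reuse any idle buffer (not necessarily this thread's last one).
    private synchronized PerThreadBuffer obtain() {
        PerThreadBuffer b = idle.poll();
        return b != null ? b : new PerThreadBuffer();
    }

    // Short synchronized step: check a full buffer out for flush, otherwise make it idle again.
    private synchronized void release(PerThreadBuffer b) {
        if (b.docs.size() >= docsPerSegment) {
            flushQueue.add(b);
        } else {
            idle.push(b);
        }
    }

    void addDocument(String doc) {
        PerThreadBuffer b = obtain();
        try {
            b.docs.add(doc); // the "heavy lifting": runs without holding the pool lock
        } finally {
            release(b);
        }
    }

    synchronized List<PerThreadBuffer> pendingFlushes() {
        return new ArrayList<>(flushQueue);
    }

    synchronized int totalDocs() {
        int n = 0;
        for (PerThreadBuffer b : idle) n += b.docs.size();
        for (PerThreadBuffer b : flushQueue) n += b.docs.size();
        return n;
    }
}
```

Because a buffer leaves the idle deque while a thread is using it, no two threads ever write to the same buffer, which is what lets the expensive part run unsynchronized.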
When IndexWriter calls flush, we check out of the
DocumentsWriterPerThreadPool all DWPTs that are associated
with the current DocumentsWriterDeleteQueue and write
them to disk. The flush process can piggy-back on incoming
indexing threads, or even block them from adding documents
if flushing can't keep up with new documents being added.
Unless the stall control kicks in to block indexing threads,
flushes happen concurrently with actual index requests.
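A hypothetical sketch of the stall idea follows. StallGate and its methods are illustrative names, not the stall control class Lucene actually uses. Flushes report how many bytes are queued for writing, and indexing threads call waitIfStalled() before adding a document: they block while the backlog exceeds a limit and resume once flushes finish.

```java
// Hypothetical sketch of stall control; not Lucene's API.
final class StallGate {
    private final long maxPendingFlushBytes; // backlog above which indexing threads must wait
    private long pendingFlushBytes;          // guarded by this

    StallGate(long maxPendingFlushBytes) {
        this.maxPendingFlushBytes = maxPendingFlushBytes;
    }

    // A writer was checked out for flush; its bytes join the backlog.
    synchronized void flushQueued(long bytes) {
        pendingFlushBytes += bytes;
    }

    // A flush completed: shrink the backlog and wake any stalled indexing threads.
    synchronized void flushFinished(long bytes) {
        pendingFlushBytes -= bytes;
        notifyAll();
    }

    synchronized boolean isStalled() {
        return pendingFlushBytes > maxPendingFlushBytes;
    }

    // Called by an indexing thread before adding a document; blocks while flushing lags behind.
    synchronized void waitIfStalled() throws InterruptedException {
        while (isStalled()) {
            wait(); // releases the monitor until flushFinished() calls notifyAll()
        }
    }
}
```

The while loop (rather than a single if) re-checks the condition after every wake-up, which guards against spurious wake-ups and against another flush being queued in the meantime.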
Exceptions:
Because this class directly updates in-memory posting
lists, and flushes stored fields and term vectors
directly to files in the directory, there are certain
limited times when an exception can corrupt this state.
For example, a full disk while flushing stored fields
leaves the stored fields file in a corrupt state. Similarly,
an OutOfMemoryError while appending to the in-memory posting
lists can corrupt that posting list. We call such exceptions
"aborting exceptions". In these cases we must call
abort() to discard all docs added since the last flush.
All other exceptions ("non-aborting exceptions") can
still partially update the index structures. These
updates are consistent, but they represent only the part
of the document that was processed before the exception
was hit. When this happens, we immediately mark the
document as deleted so that the document is always
atomically ("all or none") added to the index.
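The two failure modes can be illustrated with a hypothetical in-memory buffer. SegmentBuffer and AbortingException are illustrative names, not Lucene types, and the sketch assumes the caller already knows which exceptions are aborting. A non-aborting failure leaves the partially indexed document buffered but marks its doc ID as deleted; an aborting failure discards everything buffered since the last flush.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical marker for an "aborting" failure (e.g. running out of memory mid-append).
final class AbortingException extends RuntimeException {
    AbortingException(String message) {
        super(message);
    }
}

// Hypothetical in-memory segment buffer illustrating aborting vs non-aborting failures.
final class SegmentBuffer {
    private final List<String> docs = new ArrayList<>(); // docs buffered since the last flush
    private final BitSet deleted = new BitSet();          // doc IDs hidden from readers

    void addDocument(String doc, Runnable indexer) {
        int docId = docs.size();
        docs.add(doc); // the doc ID is assigned before any field is processed
        try {
            indexer.run(); // processes fields; may fail part-way through
        } catch (AbortingException e) {
            abort(); // in-memory state may be corrupt: discard everything since the last flush
            throw e;
        } catch (RuntimeException e) {
            deleted.set(docId); // consistent but partial: keep it buffered, hide it as deleted
            throw e;
        }
    }

    void abort() {
        docs.clear();
        deleted.clear();
    }

    int bufferedDocs() {
        return docs.size();
    }

    int liveDocs() {
        return docs.size() - deleted.cardinality();
    }
}
```

Marking the partial document deleted, instead of trying to roll back each structure it touched, is what makes the add appear atomic to readers.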
Nested Class Summary
Nested Classes (Modifier and Type, Class)
(package private) static interface DocumentsWriter.FlushNotifications
Field Summary
Fields (Modifier and Type, Field)
private boolean closed
private final LiveIndexWriterConfig config
private DocumentsWriterDeleteQueue currentFullFlushDelQueue
(package private) DocumentsWriterDeleteQueue deleteQueue
(package private) final DocumentsWriterFlushControl flushControl
private final DocumentsWriter.FlushNotifications flushNotifications
private final InfoStream infoStream
private final AtomicInteger numDocsInRAM
private boolean pendingChangesInCurrentFullFlush
private final AtomicLong pendingNumDocs
(package private) final DocumentsWriterPerThreadPool perThreadPool
private final DocumentsWriterFlushQueue ticketQueue
Fields inherited from interface org.apache.lucene.util.Accountable
NULL_ACCOUNTABLE
Constructor Summary
Constructors
DocumentsWriter(DocumentsWriter.FlushNotifications flushNotifications, int indexCreatedVersionMajor, AtomicLong pendingNumDocs, boolean enableTestPoints, Supplier<String> segmentNameSupplier, LiveIndexWriterConfig config, Directory directoryOrig, Directory directory, FieldInfos.FieldNumbers globalFieldNumberMap)
Method Summary
Methods (Modifier and Type, Method: Description)
(package private) void abort(): Called if we hit an exception at a bad time (when updating the index files) and must discard all currently buffered docs.
private void abortDocumentsWriterPerThread: Returns how many documents were aborted.
(package private) boolean anyChanges()
(package private) boolean anyDeletions()
private boolean applyAllDeletes: If buffered deletes are using too much heap, resolve them, write them to disk, and return true.
private long applyDeleteOrUpdate(ToLongFunction<DocumentsWriterDeleteQueue> function)
private boolean assertTicketQueueModification(DocumentsWriterDeleteQueue deleteQueue)
void close()
(package private) long deleteQueries(Query... queries)
(package private) long deleteTerms(Term... terms)
private boolean doFlush(DocumentsWriterPerThread flushingDWPT)
private void ensureOpen
(package private) void finishFullFlush(boolean success)
(package private) long flushAllThreads
(package private) final boolean flushOneDWPT
(package private) int getBufferedDeleteTermsSize()
(package private) long getFlushingBytes(): Returns the number of bytes currently being flushed. This is a subset of the value returned by ramBytesUsed().
(package private) long getMaxCompletedSequenceNumber(): Returns the maximum sequence number for all previously completed operations.
(package private) long getNextSequenceNumber()
(package private) int getNumBufferedDeleteTerms()
(package private) int getNumDocs(): Returns how many docs are currently buffered in RAM.
(package private) Closeable lockAndAbortAll: Locks all currently active DWPTs and aborts them.
private boolean postUpdate(DocumentsWriterPerThread flushingDWPT, boolean hasEvents)
private boolean preUpdate
(package private) void purgeFlushTickets(boolean forced, IOUtils.IOConsumer<DocumentsWriterFlushQueue.FlushTicket> consumer)
long ramBytesUsed(): Return the memory usage of this object in bytes.
(package private) void resetDeleteQueue(DocumentsWriterDeleteQueue newQueue)
private boolean setFlushingDeleteQueue
(package private) void subtractFlushedNumDocs(int numFlushed)
(package private) long updateDocuments(Iterable<? extends Iterable<? extends IndexableField>> docs, DocumentsWriterDeleteQueue.Node<?> delNode)
(package private) long updateDocValues(DocValuesUpdate... updates)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.lucene.util.Accountable
getChildResources
Field Details
pendingNumDocs
flushNotifications
closed
private volatile boolean closed
infoStream
config
numDocsInRAM
deleteQueue
ticketQueue
pendingChangesInCurrentFullFlush
private volatile boolean pendingChangesInCurrentFullFlush
perThreadPool
flushControl
currentFullFlushDelQueue
Constructor Details
DocumentsWriter
DocumentsWriter(DocumentsWriter.FlushNotifications flushNotifications, int indexCreatedVersionMajor, AtomicLong pendingNumDocs, boolean enableTestPoints, Supplier<String> segmentNameSupplier, LiveIndexWriterConfig config, Directory directoryOrig, Directory directory, FieldInfos.FieldNumbers globalFieldNumberMap)
Method Details
deleteQueries
Throws:
IOException
deleteTerms
Throws:
IOException
updateDocValues
Throws:
IOException
applyDeleteOrUpdate
private long applyDeleteOrUpdate(ToLongFunction<DocumentsWriterDeleteQueue> function) throws IOException
Throws:
IOException
applyAllDeletes
If buffered deletes are using too much heap, resolve them, write them to disk, and return true.
Throws:
IOException
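A hypothetical sketch of the check described for applyAllDeletes: deletes are buffered together with a rough heap estimate, and once the estimate exceeds a budget they are resolved and the buffer is cleared. DeleteBuffer, its per-entry byte estimate, and applyAllDeletesIfNeeded are assumptions for illustration, not Lucene's implementation; resolving terms to doc IDs and writing to disk are reduced to a comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical heap-bounded delete buffer; not Lucene's implementation.
final class DeleteBuffer {
    private final long maxBytes; // heap budget for buffered deletes
    private final List<String> pendingTerms = new ArrayList<>();
    private long bytesUsed;

    DeleteBuffer(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    // Assumed rough cost: 2 bytes per char plus a fixed per-entry overhead of 16 bytes.
    private static long estimate(String term) {
        return 2L * term.length() + 16;
    }

    void bufferDelete(String term) {
        pendingTerms.add(term);
        bytesUsed += estimate(term);
    }

    long bytesUsed() {
        return bytesUsed;
    }

    /** If buffered deletes use too much heap, resolve them and return true. */
    boolean applyAllDeletesIfNeeded() {
        if (bytesUsed <= maxBytes) {
            return false; // still within budget: keep buffering
        }
        // A real writer would resolve each term to doc IDs and write the result to disk here.
        pendingTerms.clear();
        bytesUsed = 0;
        return true;
    }
}
```

With a 50-byte budget, each four-character term such as "id:1" costs 24 bytes under this estimate, so the third buffered delete pushes usage to 72 bytes and the next check resolves and clears the buffer.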
purgeFlushTickets
void purgeFlushTickets(boolean forced, IOUtils.IOConsumer<DocumentsWriterFlushQueue.FlushTicket> consumer) throws IOException
Throws:
IOException
getNumDocs
int getNumDocs()
Returns how many docs are currently buffered in RAM.
ensureOpen
Throws:
AlreadyClosedException
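The volatile closed field and ensureOpen() follow a common guard pattern, sketched here with a hypothetical component. GuardedWriter is an illustrative name, and the local ClosedException stands in for Lucene's AlreadyClosedException so that the example compiles on its own.

```java
import java.io.Closeable;

// Local stand-in for Lucene's AlreadyClosedException, so the sketch is self-contained.
final class ClosedException extends IllegalStateException {
    ClosedException(String message) {
        super(message);
    }
}

// Hypothetical component using a volatile closed flag plus an ensureOpen() guard.
final class GuardedWriter implements Closeable {
    private volatile boolean closed; // volatile: a close() on one thread is seen by all others

    private void ensureOpen() {
        if (closed) {
            throw new ClosedException("this writer is closed");
        }
    }

    int addDocument(String doc) {
        ensureOpen(); // fail fast instead of touching released resources
        return doc.length();
    }

    @Override
    public void close() {
        closed = true; // idempotent: closing twice is harmless
    }
}
```

Every public entry point calls the guard first, so a caller that races with close() gets a clear exception instead of undefined behavior on half-released state.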
abort
Called if we hit an exception at a bad time (when updating the index files) and must discard all currently buffered docs. This resets our state, discarding any docs added since the last flush.
Throws:
IOException
flushOneDWPT
Throws:
IOException
lockAndAbortAll
Locks all currently active DWPTs and aborts them. The returned Closeable should be closed once the locks for the aborted DWPTs can be released.
Throws:
IOException
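The contract of lockAndAbortAll (abort now, release the locks when the returned Closeable is closed) fits naturally with try-with-resources. In this hypothetical sketch, Worker and WorkerPool are illustrative stand-ins for DWPTs and their pool, not Lucene classes: each worker is locked, its buffered docs are dropped, and the returned handle unlocks all of them in one place.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a DWPT: a lock plus a count of buffered docs.
final class Worker {
    final ReentrantLock lock = new ReentrantLock();
    int bufferedDocs;
}

final class WorkerPool {
    final List<Worker> workers = new ArrayList<>();

    /** Locks every worker and discards its buffered docs; closing the result unlocks them. */
    Closeable lockAndAbortAll() {
        List<Worker> locked = new ArrayList<>();
        for (Worker w : workers) {
            w.lock.lock();
            locked.add(w);
            w.bufferedDocs = 0; // abort: drop everything buffered since the last flush
        }
        // Must be closed by the thread that locked, since ReentrantLock requires the owner to unlock.
        return () -> {
            for (Worker w : locked) {
                w.lock.unlock();
            }
        };
    }
}
```

Returning a Closeable keeps the workers locked for as long as the caller needs (for example, while it resets shared state) and guarantees release even if that work throws.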
abortDocumentsWriterPerThread
Returns how many documents were aborted.
Throws:
IOException
getMaxCompletedSequenceNumber
long getMaxCompletedSequenceNumber()
Returns the maximum sequence number for all previously completed operations.
anyChanges
boolean anyChanges()
getBufferedDeleteTermsSize
int getBufferedDeleteTermsSize()
getNumBufferedDeleteTerms
int getNumBufferedDeleteTerms()
anyDeletions
boolean anyDeletions()
close
Specified by:
close in interface AutoCloseable
Specified by:
close in interface Closeable
Throws:
IOException
preUpdate
Throws:
IOException
postUpdate
private boolean postUpdate(DocumentsWriterPerThread flushingDWPT, boolean hasEvents) throws IOException
Throws:
IOException
updateDocuments
long updateDocuments(Iterable<? extends Iterable<? extends IndexableField>> docs, DocumentsWriterDeleteQueue.Node<?> delNode) throws IOException
Throws:
IOException
doFlush
Throws:
IOException
getNextSequenceNumber
long getNextSequenceNumber()
resetDeleteQueue
subtractFlushedNumDocs
void subtractFlushedNumDocs(int numFlushed)
setFlushingDeleteQueue
assertTicketQueueModification
flushAllThreads
Throws:
IOException
finishFullFlush
Throws:
IOException
ramBytesUsed
public long ramBytesUsed()
Description copied from interface: Accountable
Return the memory usage of this object in bytes. Negative values are illegal.
Specified by:
ramBytesUsed in interface Accountable
getFlushingBytes
long getFlushingBytes()
Returns the number of bytes currently being flushed. This is a subset of the value returned by ramBytesUsed().
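The relationship between getFlushingBytes() and ramBytesUsed() can be sketched with hypothetical counters. RamAccounting and its method names are illustrative assumptions, not Lucene API. Bytes move from an active bucket to a flushing bucket when a writer is checked out for flush and disappear when the flush completes, so the flushing total can never exceed the overall total.

```java
// Hypothetical RAM accounting: flushing bytes are always a subset of total bytes.
final class RamAccounting {
    private long activeBytes;   // buffered by writers still accepting documents
    private long flushingBytes; // held by writers checked out for flush

    synchronized void onBuffered(long bytes) {
        activeBytes += bytes;
    }

    // A writer holding `bytes` is checked out for flush: move its bytes to the flushing bucket.
    synchronized void onCheckedOutForFlush(long bytes) {
        activeBytes -= bytes;
        flushingBytes += bytes;
    }

    // The segment was written; its memory is released.
    synchronized void onFlushDone(long bytes) {
        flushingBytes -= bytes;
    }

    synchronized long ramBytesUsed() {
        return activeBytes + flushingBytes;
    }

    synchronized long getFlushingBytes() {
        return flushingBytes;
    }
}
```

After buffering 100 and 50 bytes and checking the 100-byte writer out for flush, this model reports 100 flushing bytes out of 150 in total; once that flush finishes, only the remaining 50 active bytes are counted.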