Package org.apache.lucene.analysis.core
Class WhitespaceTokenizerFactory
java.lang.Object
org.apache.lucene.analysis.util.AbstractAnalysisFactory
org.apache.lucene.analysis.util.TokenizerFactory
org.apache.lucene.analysis.core.WhitespaceTokenizerFactory
Factory for WhitespaceTokenizer.
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory" rule="unicode" maxTokenLen="256"/>
</analyzer>
</fieldType>
Options:
- rule: either "java" for WhitespaceTokenizer or "unicode" for UnicodeWhitespaceTokenizer
- maxTokenLen: max token length, should be greater than 0 and less than MAX_TOKEN_LENGTH_LIMIT (1024*1024). It is rarely necessary to change this; otherwise CharTokenizer::DEFAULT_MAX_TOKEN_LEN is used.
Since: 3.1
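The rule and maxTokenLen constraints listed under Options can be sketched as a small standalone check. This is an illustration only: the class WhitespaceOptionsSketch and its methods are hypothetical and not part of Lucene, and the strict upper bound simply follows the "less than" wording of the option description, so the real boundary handling may differ.

```java
import java.util.Arrays;
import java.util.List;

final class WhitespaceOptionsSketch {
    // Both values are quoted from the option descriptions above.
    static final List<String> RULE_NAMES = Arrays.asList("java", "unicode");
    static final int MAX_TOKEN_LENGTH_LIMIT = 1024 * 1024;

    /** Hypothetical helper, not part of Lucene: throws if either option is out of range. */
    static void check(String rule, int maxTokenLen) {
        if (!RULE_NAMES.contains(rule)) {
            throw new IllegalArgumentException(
                "rule must be one of " + RULE_NAMES + ", got: " + rule);
        }
        // Assumption: strict upper bound, following the "less than" wording.
        if (maxTokenLen <= 0 || maxTokenLen >= MAX_TOKEN_LENGTH_LIMIT) {
            throw new IllegalArgumentException(
                "maxTokenLen must be greater than 0 and less than "
                    + MAX_TOKEN_LENGTH_LIMIT + ", got: " + maxTokenLen);
        }
    }

    /** Convenience wrapper that reports validity as a boolean instead of throwing. */
    static boolean isValid(String rule, int maxTokenLen) {
        try {
            check(rule, maxTokenLen);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("unicode", 256)); // values from the fieldType example: true
        System.out.println(isValid("ascii", 256));   // unknown rule name: false
        System.out.println(isValid("java", 0));      // length must be greater than 0: false
    }
}
```

The values "unicode" and 256 in the first call are the ones used in the fieldType example above.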
Field Summary
Fields
Modifier and Type                         Field          Description
private final int                         maxTokenLen
static final String                       NAME           SPI name
private final String                      rule
static final String                       RULE_JAVA
private static final Collection<String>   RULE_NAMES
static final String                       RULE_UNICODE
Fields inherited from class org.apache.lucene.analysis.util.AbstractAnalysisFactory
LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion
Constructor Summary
Constructors
Constructor                  Description
WhitespaceTokenizerFactory   Creates a new WhitespaceTokenizerFactory
Method Summary
Modifier and Type   Method                             Description
Tokenizer           create(AttributeFactory factory)   Creates a TokenStream of the specified input using the given AttributeFactory
Methods inherited from class org.apache.lucene.analysis.util.TokenizerFactory
availableTokenizers, create, findSPIName, forName, lookupClass, reloadTokenizers
Methods inherited from class org.apache.lucene.analysis.util.AbstractAnalysisFactory
get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames
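Field Details

NAME
public static final String NAME
SPI name

RULE_JAVA
public static final String RULE_JAVA

RULE_UNICODE
public static final String RULE_UNICODE

RULE_NAMES
private static final Collection<String> RULE_NAMES

rule
private final String rule

maxTokenLen
private final int maxTokenLen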
Constructor Details
WhitespaceTokenizerFactory
Creates a new WhitespaceTokenizerFactory
Method Details
create
Description copied from class: TokenizerFactory
Creates a TokenStream of the specified input using the given AttributeFactory
Specified by: create in class TokenizerFactory
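As a rough illustration of how the rule and maxTokenLen options affect the tokens that come out of a tokenizer built by create, the following standalone sketch splits a string without using Lucene. The class WhitespaceRuleSketch and its methods are hypothetical. The sketch assumes that the "java" rule corresponds to Character.isWhitespace(int), that the "unicode" rule corresponds to the Unicode White_Space property (approximated here with the regex \p{IsWhite_Space}), and that a run of non-whitespace characters longer than maxTokenLen is emitted in pieces of at most maxTokenLen characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class WhitespaceRuleSketch {
    // Unicode White_Space binary property as exposed by java.util.regex.
    private static final Pattern UNICODE_WS = Pattern.compile("\\p{IsWhite_Space}");

    /** Hypothetical stand-in for the per-rule whitespace test. */
    static boolean isSeparator(int codePoint, String rule) {
        switch (rule) {
            case "java":
                return Character.isWhitespace(codePoint);
            case "unicode":
                return UNICODE_WS.matcher(new String(Character.toChars(codePoint))).matches();
            default:
                throw new IllegalArgumentException("unknown rule: " + rule);
        }
    }

    /** Splits text at separators; assumes over-long runs are cut at maxTokenLen chars. */
    static List<String> tokenize(String text, String rule, int maxTokenLen) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (isSeparator(cp, rule)) {
                flush(current, tokens);
            } else {
                current.appendCodePoint(cp);
                if (current.length() >= maxTokenLen) {
                    flush(current, tokens); // length limit reached: emit what we have
                }
            }
        }
        flush(current, tokens);
        return tokens;
    }

    /** Emits the pending token, if any, and resets the buffer. */
    private static void flush(StringBuilder current, List<String> tokens) {
        if (current.length() > 0) {
            tokens.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        String text = "foo\u00A0bar baz"; // contains a no-break space (U+00A0)
        System.out.println(tokenize(text, "java", 256).size());    // 2
        System.out.println(tokenize(text, "unicode", 256).size()); // 3
        System.out.println(tokenize("abcdefg", "java", 3));        // [abc, def, g]
    }
}
```

Under these assumptions, a no-break space (U+00A0) separates tokens only with the "unicode" rule, because Character.isWhitespace explicitly excludes it.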