Class HMMChineseTokenizerFactory
java.lang.Object
org.apache.lucene.analysis.util.AbstractAnalysisFactory
org.apache.lucene.analysis.util.TokenizerFactory
org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory
Factory for HMMChineseTokenizer.
Note: this class currently emits tokens for punctuation. To drop them, either add a WordDelimiterFilter after this tokenizer (with concatenation turned off), or use the SmartChinese stoplist with a StopFilterFactory via:
words="org/apache/lucene/analysis/cn/smart/stopwords.txt"
Since: 4.10.0
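The stoplist route from the note can be sketched with Lucene's CustomAnalyzer builder. This is a minimal sketch, not a prescribed setup. It assumes the Lucene 8.x analysis modules (analyzers-common and analyzers-smartcn) are on the classpath, so that CustomAnalyzer's default classpath resource loader can find the stoplist at the path given above. The class name SmartcnStopExample, the field name "body", and the sample sentence are illustrative only.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory;
import org.apache.lucene.analysis.core.StopFilterFactory;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical example class; assumes Lucene 8.x common + smartcn jars on the classpath.
public class SmartcnStopExample {

  // HMMChineseTokenizer followed by a StopFilter loaded with the SmartChinese stoplist,
  // which lists punctuation tokens so they are removed from the stream.
  static Analyzer buildAnalyzer() throws IOException {
    return CustomAnalyzer.builder()
        .withTokenizer(HMMChineseTokenizerFactory.class)
        .addTokenFilter(StopFilterFactory.class,
            "words", "org/apache/lucene/analysis/cn/smart/stopwords.txt")
        .build();
  }

  // Runs the standard TokenStream workflow (reset, incrementToken, end, close)
  // and collects the term text of every emitted token.
  static List<String> tokenize(Analyzer analyzer, String text) throws IOException {
    List<String> terms = new ArrayList<>();
    try (TokenStream ts = analyzer.tokenStream("body", text)) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        terms.add(term.toString());
      }
      ts.end();
    }
    return terms;
  }

  public static void main(String[] args) throws IOException {
    try (Analyzer analyzer = buildAnalyzer()) {
      System.out.println(tokenize(analyzer, "我是中国人，我爱北京。"));
    }
  }
}
```

Putting the stop filter in the analyzer chain keeps the tokenizer untouched. In a Solr schema the same idea is expressed declaratively with the words attribute shown above.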
Field Summary
Fields inherited from class org.apache.lucene.analysis.util.AbstractAnalysisFactory:
LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion
Constructor Summary
HMMChineseTokenizerFactory: Creates a new HMMChineseTokenizerFactory.
Method Summary
create(AttributeFactory factory): Creates a TokenStream of the specified input using the given AttributeFactory.
Methods inherited from class org.apache.lucene.analysis.util.TokenizerFactory:
availableTokenizers, create, findSPIName, forName, lookupClass, reloadTokenizers
Methods inherited from class org.apache.lucene.analysis.util.AbstractAnalysisFactory:
get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames
Field Details
NAME
SPI name.
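Because NAME is the factory's SPI name, the inherited static methods forName and lookupClass can resolve the factory without referencing its class directly. This is a sketch that assumes the smartcn jar (and its META-INF/services registration) is on the classpath. It uses the NAME constant rather than a hard-coded string, so it does not depend on the exact spelling of the name. HmmSpiLookupExample is a hypothetical class name.

```java
import java.util.HashMap;

import org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory;
import org.apache.lucene.analysis.util.TokenizerFactory;

// Hypothetical example class; assumes the smartcn jar is visible to the SPI loader.
public class HmmSpiLookupExample {

  // Resolves the factory class registered under HMMChineseTokenizerFactory.NAME.
  static Class<? extends TokenizerFactory> lookup() {
    return TokenizerFactory.lookupClass(HMMChineseTokenizerFactory.NAME);
  }

  // Instantiates the factory through SPI with an empty, mutable parameter map.
  static TokenizerFactory newInstance() {
    return TokenizerFactory.forName(HMMChineseTokenizerFactory.NAME, new HashMap<>());
  }

  public static void main(String[] args) {
    System.out.println(HMMChineseTokenizerFactory.NAME + " -> " + lookup().getName());
    System.out.println(newInstance().getClass().getSimpleName());
  }
}
```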
Constructor Details
HMMChineseTokenizerFactory
Creates a new HMMChineseTokenizerFactory.
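The constructor's parameter list is not shown above. This sketch assumes the Lucene 8.x convention that analysis factories take a mutable Map<String,String> of parameters. Under that convention, the base class consumes keys it recognizes (such as luceneMatchVersion), and this factory rejects anything left over with an IllegalArgumentException. The class name HmmFactoryConstructExample and the key someUnknownParam are hypothetical and exist only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory;

// Hypothetical example class; assumes the Lucene 8.x Map<String,String> factory constructor.
public class HmmFactoryConstructExample {

  // Copies the params first: the factory mutates the map it receives
  // (recognized keys are removed), so the caller's map stays untouched.
  static HMMChineseTokenizerFactory newFactory(Map<String, String> params) {
    return new HMMChineseTokenizerFactory(new HashMap<>(params));
  }

  // Returns true if construction fails because of unrecognized parameters.
  static boolean rejects(Map<String, String> params) {
    try {
      newFactory(params);
      return false;
    } catch (IllegalArgumentException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    Map<String, String> none = new HashMap<>();
    Map<String, String> unknown = new HashMap<>();
    unknown.put("someUnknownParam", "true"); // hypothetical key, not a real option
    System.out.println("empty params rejected: " + rejects(none));
    System.out.println("unknown param rejected: " + rejects(unknown));
  }
}
```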
Method Details
create
Description copied from class: TokenizerFactory
Creates a TokenStream of the specified input using the given AttributeFactory.
Specified by:
create in class TokenizerFactory
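A sketch of calling create(AttributeFactory) directly and consuming the resulting tokenizer. It assumes the same Lucene 8.x Map-based constructor as above and uses TokenStream.DEFAULT_TOKEN_ATTRIBUTE_FACTORY as the AttributeFactory. It records each term with its character offsets. The class name HmmCreateExample and the "term[start,end)" output format are illustrative choices.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.util.AttributeFactory;

// Hypothetical example class; assumes Lucene 8.x core + smartcn jars on the classpath.
public class HmmCreateExample {

  // Creates a tokenizer with an explicit AttributeFactory, feeds it text,
  // and returns entries of the form "term[start,end)".
  static List<String> termsWithOffsets(String text) throws IOException {
    HMMChineseTokenizerFactory factory = new HMMChineseTokenizerFactory(new HashMap<>());
    AttributeFactory attributes = TokenStream.DEFAULT_TOKEN_ATTRIBUTE_FACTORY;
    List<String> out = new ArrayList<>();
    try (Tokenizer tokenizer = factory.create(attributes)) {
      CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
      OffsetAttribute offset = tokenizer.addAttribute(OffsetAttribute.class);
      tokenizer.setReader(new StringReader(text)); // input must be set before reset()
      tokenizer.reset();
      while (tokenizer.incrementToken()) {
        out.add(term + "[" + offset.startOffset() + "," + offset.endOffset() + ")");
      }
      tokenizer.end();
    }
    return out;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(termsWithOffsets("我爱北京。"));
  }
}
```

Without a stop filter, this raw output is where the punctuation tokens mentioned in the class note appear.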