Class Stemmer
java.lang.Object
org.apache.lucene.analysis.hunspell.Stemmer
Stemmer uses the affix rules declared in the Dictionary to generate one or more stems for a word. It
follows the original hunspell algorithm, including recursive suffix stripping.
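To make "recursive suffix stripping" concrete, here is a minimal, self-contained sketch. It is not the Lucene implementation: the class name `ToySuffixStripper`, the word list, the suffix rules, and the depth limit are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy recursive suffix stripper; an illustration only, not Lucene's Stemmer. */
final class ToySuffixStripper {
    // Hypothetical word list standing in for the Dictionary.
    private static final Set<String> WORDS = Set.of("walk", "care", "pony");

    // Hypothetical suffix rules: appended text -> text restored after removal.
    private static final Map<String, String> SUFFIXES =
        Map.of("s", "", "ing", "", "less", "", "ness", "", "ies", "y");

    // Assumed limit on how many suffixes may be removed in a row.
    private static final int MAX_DEPTH = 2;

    static List<String> stem(String word) {
        List<String> stems = new ArrayList<>();
        collect(word, 0, stems);
        return stems;
    }

    private static void collect(String word, int depth, List<String> stems) {
        // A candidate counts as a stem only if the word list contains it.
        if (WORDS.contains(word) && !stems.contains(word)) {
            stems.add(word);
        }
        if (depth >= MAX_DEPTH) {
            return; // stop recursing once the limit is reached
        }
        for (Map.Entry<String, String> rule : SUFFIXES.entrySet()) {
            String suffix = rule.getKey();
            if (word.length() > suffix.length() && word.endsWith(suffix)) {
                // Remove the suffix, put the stripped text back, and recurse.
                String candidate =
                    word.substring(0, word.length() - suffix.length()) + rule.getValue();
                collect(candidate, depth + 1, stems);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(stem("carelessness"));
        System.out.println(stem("ponies"));
    }
}
```

In this sketch, "carelessness" reaches "care" only after two removals ("ness", then "less"), which is the case that a single, non-recursive pass would miss.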
-
Field Summary
Fields
private final ByteArrayDataInput affixReader
private final Dictionary dictionary
private static final int EXACT_CASE
private final int formStep
private char[] lowerBuffer
(package private) final FST.BytesReader[] prefixReaders
private final BytesRef scratch
private char[] scratchBuffer
private final StringBuilder scratchSegment
private final StringBuilder segment
(package private) final FST.BytesReader[] suffixReaders
private static final int TITLE_CASE
private char[] titleBuffer
private static final int UPPER_CASE
-
Constructor Summary
Constructors
Stemmer(Dictionary dictionary)
Constructs a new Stemmer which will use the provided Dictionary to create its stems.
-
Method Summary
(package private) List<CharsRef> applyAffix(char[] strippedWord, int length, int affix, int prefixFlag, int recursionDepth, boolean prefix, boolean circumfix, boolean caseVariant)
Applies the affix rule to the given word, producing a list of stems if any are found.
private void caseFoldLower(char[] word, int length)
Folds the lowercase variant of a (title-cased) word into lowerBuffer.
private void caseFoldTitle(char[] word, int length)
Folds the titlecase variant of the word into titleBuffer.
private int caseOf(char[] word, int length)
Returns the EXACT_CASE, TITLE_CASE, or UPPER_CASE type for the word.
private boolean checkCondition(int condition, char[] c1, int c1off, int c1len, char[] c2, int c2off, int c2len)
Checks the condition against the concatenation of two strings.
doStem(char[] word, int length, boolean caseVariant)
private boolean hasCrossCheckedFlag(char flag, char[] flags, boolean matchEmpty)
Checks if the given flag cross checks with the given array of flags.
private CharsRef newStem
stem(char[] word, int length)
Find the stem(s) of the provided word.
private List<CharsRef> stem(char[] word, int length, int previous, int prevFlag, int prefixFlag, int recursionDepth, boolean doPrefix, boolean doSuffix, boolean previousWasPrefix, boolean circumfix, boolean caseVariant)
Generates a list of stems for the provided word.
stem
Find the stem(s) of the provided word.
uniqueStems(char[] word, int length)
Find the unique stem(s) of the provided word.
-
Field Details
-
dictionary
-
scratch
-
segment
-
affixReader
-
scratchSegment
-
scratchBuffer
private char[] scratchBuffer
-
formStep
private final int formStep
-
lowerBuffer
private char[] lowerBuffer
-
titleBuffer
private char[] titleBuffer
-
EXACT_CASE
private static final int EXACT_CASE
-
TITLE_CASE
private static final int TITLE_CASE
-
UPPER_CASE
private static final int UPPER_CASE
-
prefixReaders
-
prefixArcs
-
suffixReaders
-
suffixArcs
-
-
Constructor Details
-
Stemmer
Constructs a new Stemmer which will use the provided Dictionary to create its stems.
Parameters:
dictionary - Dictionary that will be used to create the stems
-
-
Method Details
-
stem
Find the stem(s) of the provided word.
Parameters:
word - Word to find the stems for
Returns:
List of stems for the word
-
stem
Find the stem(s) of the provided word.
Parameters:
word - Word to find the stems for
Returns:
List of stems for the word
-
caseOf
private int caseOf(char[] word, int length)
Returns the EXACT_CASE, TITLE_CASE, or UPPER_CASE type for the word.
-
caseFoldTitle
private void caseFoldTitle(char[] word, int length)
Folds the titlecase variant of the word into titleBuffer.
-
caseFoldLower
private void caseFoldLower(char[] word, int length)
Folds the lowercase variant of a (title-cased) word into lowerBuffer.
-
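The three case helpers above can be pictured with the following sketch. It rests on assumptions: the numeric constant values are invented (this page does not list them), the classification rule is one plausible reading of the descriptions, and the fold methods return new strings rather than filling titleBuffer or lowerBuffer.

```java
/** Sketch of case classification and folding; constant values are assumed. */
final class CaseSketch {
    static final int EXACT_CASE = 0; // assumed value
    static final int TITLE_CASE = 1; // assumed value
    static final int UPPER_CASE = 2; // assumed value

    /** Classifies the first {@code length} chars of {@code word}. */
    static int caseOf(char[] word, int length) {
        if (length == 0 || !Character.isUpperCase(word[0])) {
            return EXACT_CASE; // no leading capital: nothing to fold
        }
        boolean seenUpper = false;
        boolean seenLower = false;
        for (int i = 1; i < length; i++) {
            boolean upper = Character.isUpperCase(word[i]);
            seenUpper |= upper;
            seenLower |= !upper;
        }
        if (!seenLower) {
            return UPPER_CASE; // every char after the first is upper case
        }
        if (!seenUpper) {
            return TITLE_CASE; // leading capital followed by lower case only
        }
        return EXACT_CASE;     // mixed case such as "McDonald"
    }

    /** Keeps the first char and lowercases the rest ("HELLO" becomes "Hello"). */
    static String foldTitle(char[] word, int length) {
        char[] out = new char[length];
        for (int i = 0; i < length; i++) {
            out[i] = (i == 0) ? word[i] : Character.toLowerCase(word[i]);
        }
        return new String(out);
    }

    /** Lowercases every char ("Hello" becomes "hello"). */
    static String foldLower(char[] word, int length) {
        char[] out = new char[length];
        for (int i = 0; i < length; i++) {
            out[i] = Character.toLowerCase(word[i]);
        }
        return new String(out);
    }
}
```

A stemmer built this way could retry an all-caps word first in its title-cased form and then in its lowercased form when the exact form yields no stems.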
doStem
-
uniqueStems
Find the unique stem(s) of the provided word.
Parameters:
word - Word to find the stems for
Returns:
List of stems for the word
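As a rough picture of what "unique" means here, the sketch below removes duplicate stems while preserving first-seen order. The class name `UniqueStemsSketch` and the `ignoreCase` switch are hypothetical; this page does not say how the real method compares stems.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Sketch of stem de-duplication; not the Lucene implementation. */
final class UniqueStemsSketch {
    /**
     * Returns the stems with duplicates removed, keeping the first occurrence.
     * When ignoreCase (a hypothetical option) is true, stems that differ only
     * in case are treated as duplicates.
     */
    static List<String> unique(List<String> stems, boolean ignoreCase) {
        Set<String> seen = new HashSet<>();
        List<String> result = new ArrayList<>();
        for (String stem : stems) {
            String key = ignoreCase ? stem.toLowerCase(Locale.ROOT) : stem;
            if (seen.add(key)) { // add() returns false for an already-seen key
                result.add(stem);
            }
        }
        return result;
    }
}
```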
-
newStem
-
stem
private List<CharsRef> stem(char[] word, int length, int previous, int prevFlag, int prefixFlag, int recursionDepth, boolean doPrefix, boolean doSuffix, boolean previousWasPrefix, boolean circumfix, boolean caseVariant) throws IOException
Generates a list of stems for the provided word.
Parameters:
word - Word to generate the stems for
previous - Previous affix that was removed (so we don't remove the same one twice)
prevFlag - Flag from a previous stemming step that needs to be cross-checked with any affixes in this recursive step
prefixFlag - Flag of the innermost removed prefix, so that when removing a suffix, it is also checked against the word
recursionDepth - Current recursion depth
doPrefix - true if we should remove prefixes
doSuffix - true if we should remove suffixes
previousWasPrefix - true if the previous removal was a prefix. Removing a suffix that has no continuation requirements is then fine, but two prefixes (COMPLEXPREFIXES) or two suffixes must have continuation requirements to recurse.
circumfix - true if the previous prefix removal was marked as a circumfix, which means the innermost suffix must also carry the circumfix flag
caseVariant - true if we are searching for a case variant. If the word has the KEEPCASE flag, it cannot succeed.
Returns:
List of stems, or an empty list if no stems are found
Throws:
IOException
-
checkCondition
private boolean checkCondition(int condition, char[] c1, int c1off, int c1len, char[] c2, int c2off, int c2len)
Checks the condition against the concatenation of two strings.
-
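One way to test a condition against two character slices as if they were a single string is sketched below. It is only an illustration: the real method takes an int condition, whereas this sketch assumes the condition is already available as a `java.util.regex.Pattern`, and the `Concat` view class is hypothetical.

```java
import java.util.regex.Pattern;

/** Sketch: match a pattern against two char slices treated as one string. */
final class ConditionSketch {

    /** Read-only view of c1[c1off, c1off+c1len) followed by c2[c2off, c2off+c2len). */
    static final class Concat implements CharSequence {
        private final char[] c1, c2;
        private final int c1off, c1len, c2off, c2len;

        Concat(char[] c1, int c1off, int c1len, char[] c2, int c2off, int c2len) {
            this.c1 = c1; this.c1off = c1off; this.c1len = c1len;
            this.c2 = c2; this.c2off = c2off; this.c2len = c2len;
        }

        @Override public int length() {
            return c1len + c2len;
        }

        @Override public char charAt(int index) {
            // Indices below c1len fall in the first slice, the rest in the second.
            return index < c1len ? c1[c1off + index] : c2[c2off + index - c1len];
        }

        @Override public CharSequence subSequence(int start, int end) {
            return toString().subSequence(start, end);
        }

        @Override public String toString() {
            return new String(c1, c1off, c1len) + new String(c2, c2off, c2len);
        }
    }

    /** Returns true if the whole concatenation matches the (assumed) pattern. */
    static boolean checkCondition(Pattern condition,
                                  char[] c1, int c1off, int c1len,
                                  char[] c2, int c2off, int c2len) {
        return condition.matcher(new Concat(c1, c1off, c1len, c2, c2off, c2len)).matches();
    }
}
```

For example, with the pattern `.*[^aeiou]y`, the slices "pon" and "y" match (as "pony"), while "pla" and "y" do not (as "play").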
applyAffix
List<CharsRef> applyAffix(char[] strippedWord, int length, int affix, int prefixFlag, int recursionDepth, boolean prefix, boolean circumfix, boolean caseVariant) throws IOException
Applies the affix rule to the given word, producing a list of stems if any are found.
Parameters:
strippedWord - Word with the affix removed and the strip added
length - Valid length of the stripped word
affix - HunspellAffix representing the affix rule itself
prefixFlag - When we have already stripped a prefix, we can't simply recurse and check the suffix unless both are compatible, so we must check the dictionary form against both before adding it as a stem
recursionDepth - Current recursion depth
prefix - true if we are removing a prefix (false if it's a suffix)
Returns:
List of stems for the word, or an empty list if none are found
Throws:
IOException
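The phrase "affix removed and the strip added" refers to undoing a hunspell rule: the appended text is taken off and the characters the rule originally stripped are put back. The sketch below shows that step for a single suffix rule. The class `SuffixRuleSketch`, its fields, and the use of a regex for the condition are assumptions for illustration, and the dictionary lookup and flag checks that would follow are omitted.

```java
import java.util.Optional;
import java.util.regex.Pattern;

/** Sketch of undoing one suffix rule; not the Lucene implementation. */
final class SuffixRuleSketch {
    private final String strip;      // text the rule removed from the stem
    private final String append;     // text the rule added to the stem
    private final Pattern condition; // assumed: regex the stem must match

    SuffixRuleSketch(String strip, String append, String conditionRegex) {
        this.strip = strip;
        this.append = append;
        this.condition = Pattern.compile(conditionRegex);
    }

    /**
     * Removes the appended text, restores the stripped text, and checks the
     * condition. Returns the candidate stem, or empty if the rule cannot apply.
     */
    Optional<String> unapply(String word) {
        if (word.length() <= append.length() || !word.endsWith(append)) {
            return Optional.empty(); // the word does not carry this suffix
        }
        String candidate = word.substring(0, word.length() - append.length()) + strip;
        return condition.matcher(candidate).matches()
            ? Optional.of(candidate)
            : Optional.empty();
    }
}
```

With a rule built as `new SuffixRuleSketch("y", "ies", ".*[^aeiou]y")`, the word "ponies" yields the candidate "pony". A candidate is only a possible stem: it would still need to be found in the dictionary with a compatible flag.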
-
hasCrossCheckedFlag
private boolean hasCrossCheckedFlag(char flag, char[] flags, boolean matchEmpty)
Checks if the given flag cross checks with the given array of flags.
Parameters:
flag - Flag to cross check with the array of flags
flags - Array of flags to cross check against. Can be null
Returns:
true if the flag is found in the array or the array is null, false otherwise
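The documented contract can be sketched as follows. The handling of `matchEmpty` is an assumption, since this page does not describe that parameter: here it decides whether an empty flag array counts as a match.

```java
/** Sketch of the documented cross-check contract; matchEmpty handling is assumed. */
final class CrossCheckSketch {
    static boolean hasCrossCheckedFlag(char flag, char[] flags, boolean matchEmpty) {
        if (flags == null) {
            return true; // documented: a null array always cross checks
        }
        if (flags.length == 0) {
            return matchEmpty; // assumption: an empty array matches only on request
        }
        for (char candidate : flags) {
            if (candidate == flag) {
                return true; // documented: the flag is found in the array
            }
        }
        return false;
    }
}
```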
-