Class Stemmer

java.lang.Object
org.apache.lucene.analysis.hunspell.Stemmer

final class Stemmer extends Object
Stemmer uses the affix rules declared in the Dictionary to generate one or more stems for a word. It follows the original hunspell algorithm, including recursive suffix stripping.
  • Constructor Details

    • Stemmer

      public Stemmer(Dictionary dictionary)
      Constructs a new Stemmer which will use the provided Dictionary to create its stems.
      Parameters:
      dictionary - Dictionary that will be used to create the stems
  • Method Details

    • stem

      public List<CharsRef> stem(String word)
      Find the stem(s) of the provided word.
      Parameters:
      word - Word to find the stems for
      Returns:
      List of stems for the word
    • stem

      public List<CharsRef> stem(char[] word, int length)
      Find the stem(s) of the provided word.
      Parameters:
      word - Word to find the stems for
      length - valid length of the word
      Returns:
      List of stems for the word
    • caseOf

      private int caseOf(char[] word, int length)
      Returns the EXACT_CASE, TITLE_CASE, or UPPER_CASE type for the word.
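
The three case types can be illustrated with a minimal, self-contained sketch. The page only names the constants; the constant values, the class name `CaseSketch`, and the exact classification rule below are assumptions made for illustration, not the actual implementation.

```java
// Hypothetical sketch: constant values and the classification rule are assumptions.
final class CaseSketch {
  static final int EXACT_CASE = 0;
  static final int TITLE_CASE = 1;
  static final int UPPER_CASE = 2;

  /** Classifies the first {@code length} characters of {@code word}. */
  static int caseOf(char[] word, int length) {
    if (length == 0 || !Character.isUpperCase(word[0])) {
      return EXACT_CASE; // empty, or does not start with an upper-case letter
    }
    boolean seenUpper = false;
    boolean seenLower = false;
    for (int i = 1; i < length; i++) {
      boolean upper = Character.isUpperCase(word[i]);
      seenUpper |= upper;
      seenLower |= !upper;
    }
    if (!seenLower) {
      return UPPER_CASE; // every remaining character is upper case, e.g. "NASA"
    }
    if (!seenUpper) {
      return TITLE_CASE; // only the first character is upper case, e.g. "Paris"
    }
    return EXACT_CASE; // mixed case, e.g. "McDonald"
  }
}
```

Under these assumptions, a word that is neither fully upper-cased nor title-cased is treated as exact case, so no case variants would be tried for it.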
    • caseFoldTitle

      private void caseFoldTitle(char[] word, int length)
      Folds the title-case variant of the word into titleBuffer.
    • caseFoldLower

      private void caseFoldLower(char[] word, int length)
      Folds the lower-case variant of the (title-cased) word into lowerBuffer.
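
One way to picture the two folding steps is the sketch below. It is an assumption-laden illustration: the class `CaseFoldSketch` and its methods are hypothetical, they return fresh arrays instead of writing to the titleBuffer and lowerBuffer fields, and they use `Character.toLowerCase` where the real class may apply dictionary-specific folding.

```java
import java.util.Arrays;

// Hypothetical sketch: returns new arrays rather than filling internal buffers.
final class CaseFoldSketch {
  /** Keeps the first character and lower-cases the rest, e.g. "NASA" -> "Nasa". */
  static char[] foldTitle(char[] word, int length) {
    char[] titleBuffer = Arrays.copyOf(word, length);
    for (int i = 1; i < length; i++) {
      titleBuffer[i] = Character.toLowerCase(titleBuffer[i]);
    }
    return titleBuffer;
  }

  /** Lower-cases only the first character of a title-cased word, e.g. "Nasa" -> "nasa". */
  static char[] foldLower(char[] word, int length) {
    char[] lowerBuffer = Arrays.copyOf(word, length);
    if (length > 0) {
      lowerBuffer[0] = Character.toLowerCase(lowerBuffer[0]);
    }
    return lowerBuffer;
  }
}
```

Chaining the two calls turns an upper-case word into its title-case and then lower-case variants, which are the candidates a case-variant lookup would try.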
    • doStem

      private List<CharsRef> doStem(char[] word, int length, boolean caseVariant)
    • uniqueStems

      public List<CharsRef> uniqueStems(char[] word, int length)
      Find the unique stem(s) of the provided word.
      Parameters:
      word - Word to find the stems for
      length - valid length of the word
      Returns:
      List of stems for the word
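
The difference from stem is that duplicate stems are dropped. A minimal sketch of that idea follows, using `String` in place of `CharsRef` so that it runs without Lucene. The class `UniqueStemsSketch`, the order-preserving behavior, and the case-sensitive comparison are assumptions, not documented properties of uniqueStems.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical sketch: de-duplicates stems while keeping first-seen order.
final class UniqueStemsSketch {
  static List<String> unique(List<String> stems) {
    if (stems.size() < 2) {
      return stems; // zero or one stem cannot contain duplicates
    }
    // LinkedHashSet drops repeats and preserves insertion order.
    return new ArrayList<>(new LinkedHashSet<>(stems));
  }
}
```

For example, passing the stems "walk", "walk", "walking" yields "walk", "walking".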
    • newStem

      private CharsRef newStem(char[] buffer, int length, IntsRef forms, int formID)
    • stem

      private List<CharsRef> stem(char[] word, int length, int previous, int prevFlag, int prefixFlag, int recursionDepth, boolean doPrefix, boolean doSuffix, boolean previousWasPrefix, boolean circumfix, boolean caseVariant) throws IOException
      Generates a list of stems for the provided word.
      Parameters:
      word - Word to generate the stems for
      previous - previous affix that was removed (so we don't remove the same one twice)
      prevFlag - Flag from a previous stemming step that needs to be cross-checked with any affixes in this recursive step
      prefixFlag - flag of the innermost removed prefix, so that when removing a suffix, it is also checked against the word
      recursionDepth - current recursion depth
      doPrefix - true if we should remove prefixes
      doSuffix - true if we should remove suffixes
      previousWasPrefix - true if the previous removal was a prefix. Removing a suffix after a prefix is allowed even if the suffix has no continuation requirements, but two prefixes (COMPLEXPREFIXES) or two suffixes must have continuation requirements to recurse.
      circumfix - true if the previous prefix removal was marked as a circumfix, which means the innermost suffix must also carry the circumfix flag.
      caseVariant - true if we are searching for a case variant. If the word has the KEEPCASE flag, the search cannot succeed.
      Returns:
      List of stems, or empty list if no stems are found
      Throws:
      IOException
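
The recursion described by these parameters can be pictured with a much smaller toy. The sketch below is not the real algorithm: it handles suffixes only, ignores flags, conditions, continuation classes, circumfixes and case variants, and its word list, rule table, depth limit and class name are all hypothetical. It only shows how a stripped form is fed back into the same routine with an increased recursion depth.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy: suffix-only recursive stripping with no flag checks.
final class RecursiveStripSketch {
  // Toy dictionary of known stems (assumption).
  static final Set<String> WORDS = new HashSet<>(Arrays.asList("hope", "walk", "pony"));
  // Each rule is {suffix to remove, strip string to add back} (assumption).
  static final String[][] SUFFIXES = {
    {"s", ""}, {"ing", ""}, {"ful", ""}, {"ly", ""}, {"ies", "y"}
  };
  // At most two suffixes are removed from a single word (assumption).
  static final int MAX_DEPTH = 2;

  static List<String> stem(String word, int recursionDepth) {
    List<String> stems = new ArrayList<>();
    if (WORDS.contains(word)) {
      stems.add(word); // the current form is itself a dictionary entry
    }
    if (recursionDepth >= MAX_DEPTH) {
      return stems; // stop recursing once the depth limit is reached
    }
    for (String[] rule : SUFFIXES) {
      String suffix = rule[0];
      if (word.length() > suffix.length() && word.endsWith(suffix)) {
        // Remove the suffix, add the strip string back, then recurse.
        String stripped = word.substring(0, word.length() - suffix.length()) + rule[1];
        stems.addAll(stem(stripped, recursionDepth + 1));
      }
    }
    return stems;
  }
}
```

With this toy data, `stem("hopefully", 0)` removes "ly" and then "ful" to reach "hope", and `stem("ponies", 0)` removes "ies" and adds back "y" to reach "pony". The real method additionally cross-checks flags at every step, which is what the previous, prevFlag and prefixFlag parameters are for.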
    • checkCondition

      private boolean checkCondition(int condition, char[] c1, int c1off, int c1len, char[] c2, int c2off, int c2len)
      Checks the condition against the concatenation of the two strings.
    • applyAffix

      List<CharsRef> applyAffix(char[] strippedWord, int length, int affix, int prefixFlag, int recursionDepth, boolean prefix, boolean circumfix, boolean caseVariant) throws IOException
      Applies the affix rule to the given word, producing a list of stems if any are found.
      Parameters:
      strippedWord - Word from which the affix has been removed and to which the strip has been added
      length - valid length of stripped word
      affix - HunspellAffix representing the affix rule itself
      prefixFlag - flag of a prefix that has already been stripped. We can't simply recurse and check the suffix unless both are compatible, so the dictionary form must be checked against both before it is added as a stem.
      recursionDepth - current recursion depth
      prefix - true if we are removing a prefix (false if it's a suffix)
      Returns:
      List of stems for the word, or an empty list if none are found
      Throws:
      IOException
    • hasCrossCheckedFlag

      private boolean hasCrossCheckedFlag(char flag, char[] flags, boolean matchEmpty)
      Checks if the given flag cross-checks with the given array of flags.
      Parameters:
      flag - Flag to cross check with the array of flags
      flags - Array of flags to cross check against. Can be null
      Returns:
      true if the flag is found in the array or the array is null, false otherwise
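
The documented return contract can be sketched as follows. This is an illustration only: the class `CrossCheckSketch` is hypothetical, the linear scan is an assumption, and the matchEmpty parameter is left out because this page does not describe its effect.

```java
// Hypothetical sketch of the documented contract; matchEmpty is not modeled.
final class CrossCheckSketch {
  static boolean hasCrossCheckedFlag(char flag, char[] flags) {
    if (flags == null) {
      return true; // documented: a null array counts as a match
    }
    for (char candidate : flags) {
      if (candidate == flag) {
        return true; // flag found in the array
      }
    }
    return false;
  }
}
```

For instance, flag 'A' matches the array {'A', 'B'} and a null array, while flag 'C' does not match {'A', 'B'}.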