Class BrazilianStemmer

java.lang.Object
org.apache.lucene.analysis.br.BrazilianStemmer

public class BrazilianStemmer extends Object
A stemmer for Brazilian Portuguese words.
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    private String
     
    private static final Locale
     
    private String
     
    private String
     
    private String
     
    private String
    Changed term
  • Constructor Summary

    Constructors
    Constructor
    Description
    BrazilianStemmer()
     
  • Method Summary

    Modifier and Type
    Method
    Description
    private String
    changeTerm(String value)
    1) Turn to lowercase; 2) remove accents; 3) ã -> a, õ -> o; 4) ç -> c.
    private void
    createCT(String term)
    Creates CT (changed term), substituting 'ã' and 'õ' for 'a~' and 'o~'.
    private String
    getR1(String value)
    Gets R1. R1 is the region after the first non-vowel following a vowel, or the null region at the end of the word if there is no such non-vowel.
    private String
    getRV(String value)
    Gets RV. If the second letter is a consonant, RV is the region after the next following vowel; if the first two letters are vowels, RV is the region after the next consonant; otherwise (consonant-vowel case), RV is the region after the third letter.
    private boolean
    isIndexable(String term)
    Checks whether a term can be indexed.
    private boolean
    isStemmable(String term)
    Checks whether a term can be processed correctly.
    private boolean
    isVowel(char value)
    Checks whether the character is one of 'a', 'e', 'i', 'o', 'u'.
    String
    log()
    For logging and debugging purposes.
    private String
    removeSuffix(String value, String toRemove)
    Remove a string suffix
    private String
    replaceSuffix(String value, String toReplace, String changeTo)
    Replace a string suffix by another
    protected String
    stem(String term)
    Stems the given term to a unique discriminator.
    private boolean
    step1()
    Standard suffix removal.
    private boolean
    step2()
    Verb suffixes.
    private void
    step3()
    Delete suffix 'i' if in RV and preceded by 'c'.
    private void
    step4()
    Residual suffix. If the word ends with one of the suffixes (os a i o á í ó) in RV, delete it.
    private void
    step5()
    If the word ends with one of (e é ê) in RV, delete it, and if preceded by 'gu' (or 'ci') with the 'u' (or 'i') in RV, delete the 'u' (or 'i'). Or, if the word ends in ç, remove the cedilla.
    private boolean
    suffix(String value, String suffix)
    Check if a string ends with a suffix
    private boolean
    suffixPreceded(String value, String suffix, String preceded)
    See if a suffix is preceded by a String

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

  • Constructor Details

    • BrazilianStemmer

      public BrazilianStemmer()
  • Method Details

    • stem

      protected String stem(String term)
      Stems the given term to a unique discriminator.
      Parameters:
      term - The term that should be stemmed.
      Returns:
      Discriminator for term
    • isStemmable

      private boolean isStemmable(String term)
      Checks whether a term can be processed correctly.
      Returns:
      true if, and only if, the given term consists only of letters.
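      The letters-only check described above can be sketched as follows. This is a minimal illustration, not the class's actual code: the class name StemmableSketch and the method name isAllLetters are hypothetical, and treating an empty string as stemmable is an assumption of the sketch.

```java
// Hypothetical helper illustrating a "letters only" check; not the Lucene implementation.
final class StemmableSketch {

    // Returns true only when every character of the term is a letter.
    // Assumption: an empty term passes vacuously.
    static boolean isAllLetters(String term) {
        for (int i = 0; i < term.length(); i++) {
            if (!Character.isLetter(term.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```

      For example, StemmableSketch.isAllLetters("casa1") rejects the term because of the digit.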
    • isIndexable

      private boolean isIndexable(String term)
      Checks whether a term can be indexed.
      Returns:
      true if it can be indexed
    • isVowel

      private boolean isVowel(char value)
      Checks whether the character is one of 'a', 'e', 'i', 'o', 'u'.
      Returns:
      true if the character is a vowel
    • getR1

      private String getR1(String value)
      Gets R1. R1 is the region after the first non-vowel following a vowel, or the null region at the end of the word if there is no such non-vowel.
      Returns:
      null or a string representing R1
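      The R1 definition above can be sketched as a single left-to-right scan. This is an illustration under stated assumptions, not the class's actual code: the names R1Sketch and r1 are hypothetical, the input is assumed to be lowercase with accents already removed, and the sketch represents the "null region" as an empty string (returning null only for null input), which may differ from the real method.

```java
// Hypothetical sketch of the R1 region computation; not the Lucene implementation.
final class R1Sketch {

    // Plain vowels only; assumes accents were stripped beforehand.
    static boolean isVowel(char c) {
        return "aeiou".indexOf(c) >= 0;
    }

    // Returns the region after the first non-vowel that follows a vowel.
    // Assumption: the "null region" is modelled as "".
    static String r1(String word) {
        if (word == null) {
            return null;
        }
        for (int i = 1; i < word.length(); i++) {
            if (isVowel(word.charAt(i - 1)) && !isVowel(word.charAt(i))) {
                return word.substring(i + 1);
            }
        }
        return "";
    }
}
```

      With this sketch, "trabalho" yields "alho": the first vowel is 'a', the non-vowel after it is 'b', and R1 is everything after that 'b'.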
    • getRV

      private String getRV(String value)
      Gets RV. If the second letter is a consonant, RV is the region after the next following vowel; if the first two letters are vowels, RV is the region after the next consonant; otherwise (consonant-vowel case), RV is the region after the third letter. RV is the end of the word if these positions cannot be found.
      Returns:
      null or a string representing RV
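      The three RV cases above can be sketched as follows. This is an illustration, not the class's actual code: the names RvSketch, rv, find and after are hypothetical, the input is assumed to be lowercase with accents already removed, and "the end of the word" is modelled as an empty string.

```java
// Hypothetical sketch of the RV region computation; not the Lucene implementation.
final class RvSketch {

    static boolean isVowel(char c) {
        return "aeiou".indexOf(c) >= 0;
    }

    // Index of the first character at or after 'from' whose vowel-ness equals wantVowel, else -1.
    private static int find(String word, int from, boolean wantVowel) {
        for (int i = from; i < word.length(); i++) {
            if (isVowel(word.charAt(i)) == wantVowel) {
                return i;
            }
        }
        return -1;
    }

    // Region after index i, or "" when the position was not found.
    private static String after(String word, int i) {
        return i < 0 ? "" : word.substring(i + 1);
    }

    static String rv(String word) {
        if (word == null) {
            return null;
        }
        if (word.length() < 3) {
            return ""; // too short for any of the three cases to find a position
        }
        if (!isVowel(word.charAt(1))) {
            return after(word, find(word, 2, true));   // second letter is a consonant
        }
        if (isVowel(word.charAt(0))) {
            return after(word, find(word, 2, false));  // first two letters are vowels
        }
        return word.substring(3);                      // consonant-vowel case
    }
}
```

      For instance, "oliva" falls in the first case (RV is "va"), "aureo" in the second (RV is "eo"), and "macho" in the third (RV is "ho").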
    • changeTerm

      private String changeTerm(String value)
      1) Turn to lowercase 2) Remove accents 3) ã -> a ; õ -> o 4) ç -> c
      Returns:
      null or a string transformed
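      One way to obtain the same four effects is Unicode decomposition followed by removal of combining marks. This is a swapped-in technique shown for illustration; the actual method may well use explicit character-by-character mappings. The names ChangeTermSketch and normalize are hypothetical, and the use of Locale.ROOT for lowercasing is an assumption.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical sketch of term normalization; not the Lucene implementation.
final class ChangeTermSketch {

    static String normalize(String value) {
        if (value == null) {
            return null;
        }
        // 1) lowercase (locale choice is an assumption of this sketch)
        String lower = value.toLowerCase(Locale.ROOT);
        // 2-4) decompose accented letters into base letter + combining mark,
        // then drop the marks: ã -> a, õ -> o, ç -> c, é -> e, and so on
        String decomposed = Normalizer.normalize(lower, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }
}
```

      Under these assumptions, "AÇÃO" becomes "acao".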
    • suffix

      private boolean suffix(String value, String suffix)
      Check if a string ends with a suffix
      Returns:
      true if the string ends with the specified suffix
    • replaceSuffix

      private String replaceSuffix(String value, String toReplace, String changeTo)
      Replace a string suffix by another
      Returns:
      the replaced String
    • removeSuffix

      private String removeSuffix(String value, String toRemove)
      Remove a string suffix
      Returns:
      the String without the suffix
    • suffixPreceded

      private boolean suffixPreceded(String value, String suffix, String preceded)
      See if a suffix is preceded by a String
      Returns:
      true if the suffix is preceded
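      The four suffix helpers (suffix, replaceSuffix, removeSuffix, suffixPreceded) can be sketched on top of String.endsWith. This is a minimal illustration, not the class's actual code: the class name SuffixSketch and the method name endsWith are hypothetical, and returning the value unchanged when the suffix does not match is an assumption about the intended behaviour.

```java
// Hypothetical sketch of the suffix helpers; not the Lucene implementation.
final class SuffixSketch {

    // True if value ends with suffix; null arguments never match.
    static boolean endsWith(String value, String suffix) {
        return value != null && suffix != null && value.endsWith(suffix);
    }

    // Removes the suffix if present, otherwise returns value unchanged.
    static String removeSuffix(String value, String toRemove) {
        return endsWith(value, toRemove)
                ? value.substring(0, value.length() - toRemove.length())
                : value;
    }

    // Replaces the suffix if present (changeTo is assumed non-null), otherwise returns value unchanged.
    static String replaceSuffix(String value, String toReplace, String changeTo) {
        return endsWith(value, toReplace)
                ? removeSuffix(value, toReplace) + changeTo
                : value;
    }

    // True if value ends with suffix and the text just before the suffix ends with 'preceded'.
    static boolean suffixPreceded(String value, String suffix, String preceded) {
        return endsWith(value, suffix)
                && endsWith(removeSuffix(value, suffix), preceded);
    }
}
```

      For example, SuffixSketch.suffixPreceded("publici", "i", "c") holds because the final 'i' is directly preceded by 'c'.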
    • createCT

      private void createCT(String term)
      Creates CT (changed term), substituting 'ã' and 'õ' for 'a~' and 'o~'.
    • step1

      private boolean step1()
      Standard suffix removal. Search for the longest among the following suffixes, and perform the following actions:
      Returns:
      false if no ending was removed
    • step2

      private boolean step2()
      Verb suffixes. Search for the longest among the following suffixes in RV, and if found, delete.
      Returns:
      false if no ending was removed
    • step3

      private void step3()
      Delete suffix 'i' if in RV and preceded by 'c'
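      The rule above can be sketched as a pure function. The real method takes no arguments and returns nothing, so it presumably works on the instance's own state; passing the changed term and its RV region as parameters, as well as the names Step3Sketch and step3, are assumptions made for illustration.

```java
// Hypothetical sketch of step 3; not the Lucene implementation.
final class Step3Sketch {

    // ct: the changed term; rv: its RV region (assumed to be a suffix of ct).
    static String step3(String ct, String rv) {
        if (ct == null || rv == null) {
            return ct;
        }
        boolean iInRv = rv.endsWith("i");        // the final 'i' lies inside RV
        boolean precededByC = ct.endsWith("ci"); // and a 'c' comes right before it
        return (iInRv && precededByC) ? ct.substring(0, ct.length() - 1) : ct;
    }
}
```

      With ct "publici" and rv "lici", the sketch returns "public"; with an empty RV the term is left unchanged.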
    • step4

      private void step4()
      Residual suffix. If the word ends with one of the suffixes (os a i o á í ó) in RV, delete it.
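      The residual-suffix rule can be sketched as a lookup over the listed endings. As with the other steps, this is an illustration rather than the class's actual code: the names Step4Sketch and step4 are hypothetical, and the term and its RV region are passed as parameters instead of being read from instance state. The accented endings are written as Unicode escapes so the example does not depend on source-file encoding.

```java
// Hypothetical sketch of step 4; not the Lucene implementation.
final class Step4Sketch {

    // "os" comes first so the longest ending is tried before the single letters.
    private static final String[] RESIDUAL = {
        "os", "a", "i", "o", "\u00E1", "\u00ED", "\u00F3" // last three: á, í, ó
    };

    // ct: the changed term; rv: its RV region (assumed to be a suffix of ct).
    static String step4(String ct, String rv) {
        if (ct == null || rv == null) {
            return ct;
        }
        for (String s : RESIDUAL) {
            if (rv.endsWith(s) && ct.endsWith(s)) {
                return ct.substring(0, ct.length() - s.length());
            }
        }
        return ct;
    }
}
```

      With ct "meninos" and rv "inos", the sketch returns "menin".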
    • step5

      private void step5()
      If the word ends with one of (e é ê) in RV, delete it, and if preceded by 'gu' (or 'ci') with the 'u' (or 'i') in RV, delete the 'u' (or 'i'). Or, if the word ends in ç, remove the cedilla.
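      The two branches of this step can be sketched as follows. This is one reading of the rule, offered as an illustration and not as the class's actual code: the names Step5Sketch, step5 and dropLast are hypothetical, the term and its RV region are passed as parameters, and the cedilla branch is only tried when no final e, é or ê was removed. Accented characters are written as Unicode escapes.

```java
// Hypothetical sketch of step 5; not the Lucene implementation.
final class Step5Sketch {

    private static final String[] E_FORMS = {"e", "\u00E9", "\u00EA"}; // e, é, ê

    private static String dropLast(String s) {
        return s.substring(0, s.length() - 1);
    }

    // ct: the changed term; rv: its RV region (assumed to be a suffix of ct).
    static String step5(String ct, String rv) {
        if (ct == null || rv == null) {
            return ct;
        }
        for (String e : E_FORMS) {
            if (ct.endsWith(e) && rv.endsWith(e)) {
                String stem = dropLast(ct);
                String rvRest = dropLast(rv);
                // 'gu' with the 'u' in RV, or 'ci' with the 'i' in RV
                boolean gu = stem.endsWith("gu") && rvRest.endsWith("u");
                boolean ci = stem.endsWith("ci") && rvRest.endsWith("i");
                return (gu || ci) ? dropLast(stem) : stem;
            }
        }
        if (ct.endsWith("\u00E7")) {          // word ends in ç
            return dropLast(ct) + "c";        // remove the cedilla
        }
        return ct;
    }
}
```

      With ct "segue" and rv "ue", the final 'e' is removed and then the 'u' of "gu", giving "seg".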
    • log

      public String log()
      For logging and debugging purposes.
      Returns:
      TERM, CT, RV, R1 and R2