Class Dictionary

java.lang.Object
org.apache.lucene.analysis.hunspell.Dictionary

public class Dictionary extends Object
In-memory structure for the dictionary (.dic) and affix (.aff) data of a hunspell dictionary.
  • Field Details

  • Constructor Details

    • Dictionary

      public Dictionary(Directory tempDir, String tempFileNamePrefix, InputStream affix, InputStream dictionary) throws IOException, ParseException
      Creates a new Dictionary from the hunspell affix and dictionary files read through the provided InputStreams. The constructor does not close the streams; the caller must close them.
      Parameters:
      tempDir - Directory to use for offline sorting
      tempFileNamePrefix - prefix to use to generate temp file names
      affix - InputStream for reading the hunspell affix file (won't be closed).
      dictionary - InputStream for reading the hunspell dictionary file (won't be closed).
      Throws:
      IOException - Can be thrown while reading from the InputStreams
      ParseException - Can be thrown if the content of the files does not meet expected formats
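      Because the constructor leaves both streams open, try-with-resources on the caller's side is a natural fit. The sketch below runs on the JDK alone: `consume` is a hypothetical stand-in for the constructor call, and `TrackedStream` is an illustrative helper that records whether `close()` was called. Neither name is part of Lucene.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class CallerClosesStreams {

    /** In-memory stream that records whether close() was called (illustrative helper). */
    static final class TrackedStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackedStream(String content) {
            super(content.getBytes(StandardCharsets.UTF_8));
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Hypothetical stand-in for the constructor: reads both streams to the end, closes neither. */
    static int consume(InputStream affix, InputStream dictionary) throws IOException {
        int bytes = 0;
        while (affix.read() != -1) bytes++;
        while (dictionary.read() != -1) bytes++;
        return bytes;
    }

    /** Returns true if consume() left both streams open and the caller then closed them. */
    static boolean demo() throws IOException {
        TrackedStream affix = new TrackedStream("SET UTF-8\n");
        TrackedStream dictionary = new TrackedStream("1\nword\n");
        boolean openAfterConsume = false;
        // The caller owns the streams, so the caller's try block closes them.
        try (InputStream a = affix; InputStream d = dictionary) {
            consume(a, d);
            openAfterConsume = !affix.closed && !dictionary.closed;
        }
        return openAfterConsume && affix.closed && dictionary.closed;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("caller closed both streams: " + demo());
    }
}
```

      With Lucene on the classpath, the `consume(a, d)` line would be replaced by the constructor call documented above, inside the same try block.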
    • Dictionary

      public Dictionary(Directory tempDir, String tempFileNamePrefix, InputStream affix, List<InputStream> dictionaries, boolean ignoreCase) throws IOException, ParseException
      Creates a new Dictionary from the hunspell affix file and one or more dictionary files read through the provided InputStreams. The constructor does not close the streams; the caller must close them.
      Parameters:
      tempDir - Directory to use for offline sorting
      tempFileNamePrefix - prefix to use to generate temp file names
      affix - InputStream for reading the hunspell affix file (won't be closed).
      dictionaries - InputStreams for reading the hunspell dictionary files (won't be closed).
      ignoreCase - whether to construct the dictionary with the ignoreCase option (reported by getIgnoreCase())
      Throws:
      IOException - Can be thrown while reading from the InputStreams
      ParseException - Can be thrown if the content of the files does not meet expected formats
  • Method Details

    • lookupWord

      IntsRef lookupWord(char[] word, int offset, int length)
      Looks up Hunspell word forms from the dictionary
    • lookupPrefix

      IntsRef lookupPrefix(char[] word, int offset, int length)
    • lookupSuffix

      IntsRef lookupSuffix(char[] word, int offset, int length)
    • lookup

      IntsRef lookup(FST<IntsRef> fst, char[] word, int offset, int length)
    • readAffixFile

      private void readAffixFile(InputStream affixStream, CharsetDecoder decoder) throws IOException, ParseException
      Reads the affix file through the provided InputStream, building up the prefix and suffix maps
      Parameters:
      affixStream - InputStream to read the content of the affix file from
      decoder - CharsetDecoder to decode the content of the file
      Throws:
      IOException - Can be thrown while reading from the InputStream
      ParseException
    • affixFST

      private FST<IntsRef> affixFST(TreeMap<String,List<Integer>> affixes) throws IOException
      Throws:
      IOException
    • escapeDash

      static String escapeDash(String re)
    • parseAffix

      private void parseAffix(TreeMap<String,List<Integer>> affixes, String header, LineNumberReader reader, String conditionPattern, Map<String,Integer> seenPatterns, Map<String,Integer> seenStrips) throws IOException, ParseException
      Parses a specific affix rule putting the result into the provided affix map
      Parameters:
      affixes - Map where the result of the parsing will be put
      header - Header line of the affix rule
      reader - LineNumberReader to read the content of the rule from
      conditionPattern - String.format(String, Object...) pattern to be used to generate the condition regex pattern
      seenPatterns - map from condition -> index of patterns, for deduplication.
      Throws:
      IOException - Can be thrown while reading the rule
      ParseException
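      The roles of conditionPattern and seenPatterns can be illustrated in isolation. The sketch below is not the library's code: the two format strings (`"%s.*"` for prefixes, `".*%s"` for suffixes) are assumed values chosen for illustration, and the class and method names are hypothetical. It shows a condition being expanded through String.format and identical regexes sharing one index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

final class ConditionPatternSketch {
    // Assumed format patterns; this page does not state the real values.
    static final String PREFIX_PATTERN = "%s.*";
    static final String SUFFIX_PATTERN = ".*%s";

    // Map from generated regex to its index in `patterns`, used for deduplication.
    final Map<String, Integer> seenPatterns = new HashMap<>();
    final List<Pattern> patterns = new ArrayList<>();

    /** Returns the index of the compiled regex for this condition, reusing earlier entries. */
    int indexFor(String conditionPattern, String condition) {
        String regex = String.format(conditionPattern, condition);
        Integer existing = seenPatterns.get(regex);
        if (existing != null) {
            return existing; // same condition seen before: share the compiled pattern
        }
        int index = patterns.size();
        patterns.add(Pattern.compile(regex));
        seenPatterns.put(regex, index);
        return index;
    }

    public static void main(String[] args) {
        ConditionPatternSketch sketch = new ConditionPatternSketch();
        System.out.println(sketch.indexFor(SUFFIX_PATTERN, "[^y]"));
        System.out.println(sketch.indexFor(SUFFIX_PATTERN, "y"));
        System.out.println(sketch.indexFor(SUFFIX_PATTERN, "[^y]"));
    }
}
```

      Keeping one compiled pattern per distinct condition means each affix rule only needs to store a small integer rather than its own regex.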
    • parseConversions

      private FST<CharsRef> parseConversions(LineNumberReader reader, int num) throws IOException, ParseException
      Throws:
      IOException
      ParseException
    • getDictionaryEncoding

      static String getDictionaryEncoding(InputStream affix) throws IOException, ParseException
      Parses the encoding specified in the affix file readable through the provided InputStream
      Parameters:
      affix - InputStream for reading the affix file
      Returns:
      Encoding specified in the affix file
      Throws:
      IOException - Can be thrown while reading from the InputStream
      ParseException - Thrown if the first non-empty non-comment line read from the file does not adhere to the format SET <encoding>
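      The rule described above (the first non-empty, non-comment line must read SET <encoding>) can be sketched as follows. This is a simplified illustration, not the library's implementation: it assumes comment lines start with `#`, decodes the stream as ISO-8859-1 because the real encoding is not yet known, and does not handle a leading byte order mark. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.text.ParseException;

final class AffixEncodingSketch {

    /** Returns the encoding named on the first non-empty, non-comment line. */
    static String readEncoding(InputStream affix) throws IOException, ParseException {
        // ISO-8859-1 maps every byte to a char, so it is safe for locating an ASCII "SET" line.
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(affix, StandardCharsets.ISO_8859_1));
        String line;
        int lineNumber = 0;
        while ((line = reader.readLine()) != null) {
            lineNumber++;
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // assumption: '#' introduces a comment line
            }
            String[] parts = trimmed.split("\\s+");
            if (parts.length == 2 && parts[0].equals("SET")) {
                return parts[1];
            }
            // The line number is passed as ParseException's errorOffset.
            throw new ParseException("Expected 'SET <encoding>' but found: " + trimmed, lineNumber);
        }
        throw new ParseException("No 'SET <encoding>' line found", lineNumber);
    }

    public static void main(String[] args) throws Exception {
        byte[] content = "# sample\n\nSET UTF-8\n".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(readEncoding(new ByteArrayInputStream(content)));
    }
}
```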
    • getJavaEncoding

      private CharsetDecoder getJavaEncoding(String encoding)
      Retrieves the CharsetDecoder for the given encoding. Note: the mapping is imperfect, because hunspell may also allow encoding names such as ISCII-DEVANAGARI and MICROSOFT-CP1251.
      Parameters:
      encoding - Encoding to retrieve the CharsetDecoder for
      Returns:
      CharsetDecoder for the given encoding
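      A minimal sketch of such a lookup using only java.nio.charset is shown below. The alias table is hypothetical (one illustrative entry mapping a hunspell-style name to a Java charset name), and replacing malformed input instead of failing is a choice made for this sketch, not something this page specifies.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.util.Locale;
import java.util.Map;

final class EncodingLookupSketch {
    // Hypothetical alias table for names the JDK does not recognise directly.
    static final Map<String, String> ALIASES = Map.of("microsoft-cp1251", "windows-1251");

    /** Resolves an encoding name to a decoder; unknown names raise an IllegalArgumentException subtype. */
    static CharsetDecoder decoderFor(String encoding) {
        String javaName = ALIASES.getOrDefault(encoding.toLowerCase(Locale.ROOT), encoding);
        Charset charset = Charset.forName(javaName);
        // Sketch choice: substitute the replacement character rather than abort on bad bytes.
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
    }

    public static void main(String[] args) {
        System.out.println(decoderFor("UTF-8").charset());
    }
}
```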
    • getFlagParsingStrategy

      static Dictionary.FlagParsingStrategy getFlagParsingStrategy(String flagLine)
      Determines the appropriate Dictionary.FlagParsingStrategy based on the FLAG definition line taken from the affix file
      Parameters:
      flagLine - Line containing the flag information
      Returns:
      FlagParsingStrategy that handles parsing flags in the way specified in the FLAG definition
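      To make the mapping from FLAG line to parsing behaviour concrete, here is a hedged sketch based on the flag types described in the Hunspell file format (`long`, `num`, `UTF-8`, with single-character flags when no FLAG line is present). The enum, class, and method names are hypothetical and the flags are returned as plain strings; the library's own strategy classes and flag representation are not shown on this page.

```java
import java.util.ArrayList;
import java.util.List;

final class FlagLineSketch {
    enum FlagType { SIMPLE, LONG, NUM, UTF8 }

    /** Maps a line such as "FLAG long" to a flag type. */
    static FlagType fromFlagLine(String flagLine) {
        String[] parts = flagLine.trim().split("\\s+");
        if (parts.length != 2 || !parts[0].equals("FLAG")) {
            throw new IllegalArgumentException("Illegal FLAG specification: " + flagLine);
        }
        switch (parts[1]) {
            case "long": return FlagType.LONG;
            case "num": return FlagType.NUM;
            case "UTF-8": return FlagType.UTF8;
            default: throw new IllegalArgumentException("Unknown flag type: " + parts[1]);
        }
    }

    /** Splits the raw flag text of a dictionary entry into individual flags. */
    static List<String> splitFlags(FlagType type, String rawFlags) {
        List<String> flags = new ArrayList<>();
        switch (type) {
            case NUM: // comma-separated decimal numbers
                for (String number : rawFlags.split(",")) {
                    flags.add(number.trim());
                }
                break;
            case LONG: // every two characters form one flag
                if (rawFlags.length() % 2 != 0) {
                    throw new IllegalArgumentException("Odd number of characters: " + rawFlags);
                }
                for (int i = 0; i < rawFlags.length(); i += 2) {
                    flags.add(rawFlags.substring(i, i + 2));
                }
                break;
            default: // SIMPLE and UTF8: one flag per code point
                rawFlags.codePoints().forEach(cp -> flags.add(new String(Character.toChars(cp))));
        }
        return flags;
    }

    public static void main(String[] args) {
        System.out.println(splitFlags(fromFlagLine("FLAG long"), "AaBb"));
    }
}
```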
    • unescapeEntry

      String unescapeEntry(String entry)
    • morphBoundary

      static int morphBoundary(String line)
    • indexOfSpaceOrTab

      static int indexOfSpaceOrTab(String text, int start)
    • readDictionaryFiles

      private void readDictionaryFiles(Directory tempDir, String tempFileNamePrefix, List<InputStream> dictionaries, CharsetDecoder decoder, Builder<IntsRef> words) throws IOException
      Reads the dictionary files through the provided InputStreams, building up the words map
      Parameters:
      dictionaries - InputStreams to read the dictionary files through
      decoder - CharsetDecoder used to decode the contents of the file
      Throws:
      IOException - Can be thrown while reading from the file
    • decodeFlags

      static char[] decodeFlags(BytesRef b)
    • encodeFlags

      static void encodeFlags(BytesRefBuilder b, char[] flags)
    • parseAlias

      private void parseAlias(String line)
    • getAliasValue

      private String getAliasValue(int id)
    • getStemException

      String getStemException(int id)
    • parseMorphAlias

      private void parseMorphAlias(String line)
    • parseStemException

      private String parseStemException(String morphData)
    • hasFlag

      static boolean hasFlag(char[] flags, char flag)
    • cleanInput

      CharSequence cleanInput(CharSequence input, StringBuilder reuse)
    • caseFold

      char caseFold(char c)
      Folds a single character (according to LANG, if present).
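      One reason case folding might depend on LANG is the Turkic dotted and dotless i. The sketch below rests on that assumption: it treats language codes starting with `tr` or `az` specially and otherwise falls back to Character.toLowerCase. The extra `lang` parameter and the class name are hypothetical; the documented method takes only the character.

```java
final class CaseFoldSketch {

    /** Folds one character to lower case, with assumed special handling for Turkic languages. */
    static char caseFold(char c, String lang) {
        boolean turkic = lang != null && (lang.startsWith("tr") || lang.startsWith("az"));
        if (turkic) {
            if (c == 'I') {
                return '\u0131'; // capital I folds to dotless i
            }
            if (c == '\u0130') {
                return 'i'; // capital dotted I folds to plain i
            }
        }
        return Character.toLowerCase(c);
    }

    public static void main(String[] args) {
        System.out.println(caseFold('I', "tr_TR") == '\u0131');
        System.out.println(caseFold('I', "en_US") == 'i');
    }
}
```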
    • applyMappings

      static void applyMappings(FST<CharsRef> fst, StringBuilder sb) throws IOException
      Throws:
      IOException
    • getIgnoreCase

      public boolean getIgnoreCase()
      Returns true if this dictionary was constructed with the ignoreCase option
    • setDefaultTempDir

      public static void setDefaultTempDir(Path tempDir)
      Used by test framework
    • getDefaultTempDir

      static Path getDefaultTempDir() throws IOException
      Returns the default temporary directory, which is java.io.tmpdir unless one has been set explicitly. If it is not accessible or not available, an IOException is thrown.
      Throws:
      IOException
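      The interplay between setDefaultTempDir and getDefaultTempDir can be sketched with JDK classes alone. This is an illustration under stated assumptions, not the library's code: "not available" is taken to mean that the java.io.tmpdir system property is unset, and "not accessible" is approximated with Files.isWritable. The class name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class DefaultTempDirSketch {
    private static Path defaultTempDir; // null until set explicitly or resolved lazily

    /** Overrides the default, for example from a test framework. */
    static synchronized void setDefaultTempDir(Path tempDir) {
        defaultTempDir = tempDir;
    }

    /** Returns the explicit default if set, otherwise resolves java.io.tmpdir once. */
    static synchronized Path getDefaultTempDir() throws IOException {
        if (defaultTempDir == null) {
            String property = System.getProperty("java.io.tmpdir");
            if (property == null) {
                throw new IOException("System property java.io.tmpdir is not set");
            }
            Path dir = Paths.get(property);
            if (!Files.isWritable(dir)) { // assumption: writability stands in for "accessible"
                throw new IOException("Temporary directory is not writable: " + dir);
            }
            defaultTempDir = dir;
        }
        return defaultTempDir;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(getDefaultTempDir());
    }
}
```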