Class Operations

java.lang.Object
org.apache.lucene.util.automaton.Operations

public final class Operations extends Object
Automata operations.
  • Field Details

  • Constructor Details

    • Operations

      private Operations()
  • Method Details

    • concatenate

      public static Automaton concatenate(Automaton a1, Automaton a2)
      Returns an automaton that accepts the concatenation of the languages of the given automata.

      Complexity: linear in total number of states.

    • concatenate

      public static Automaton concatenate(List<Automaton> l)
      Returns an automaton that accepts the concatenation of the languages of the given automata.

      Complexity: linear in total number of states.

    • optional

      public static Automaton optional(Automaton a)
      Returns an automaton that accepts the union of the empty string and the language of the given automaton. This may create a dead state.

      Complexity: linear in number of states.

    • repeat

      public static Automaton repeat(Automaton a)
      Returns an automaton that accepts the Kleene star (zero or more concatenated repetitions) of the language of the given automaton. Never modifies the input automaton language.

      Complexity: linear in number of states.

    • repeat

      public static Automaton repeat(Automaton a, int count)
      Returns an automaton that accepts min or more concatenated repetitions of the language of the given automaton.

      Complexity: linear in number of states and in min.

    • repeat

      public static Automaton repeat(Automaton a, int min, int max)
      Returns an automaton that accepts between min and max (including both) concatenated repetitions of the language of the given automaton.

      Complexity: linear in number of states and in min and max.

    • toSet

      private static Set<Integer> toSet(Automaton a, int offset)
    • complement

      public static Automaton complement(Automaton a, int maxDeterminizedStates)
      Returns a (deterministic) automaton that accepts the complement of the language of the given automaton.

      Complexity: linear in number of states if already deterministic and exponential otherwise.

      Parameters:
      maxDeterminizedStates - maximum number of states determinizing the automaton can result in. Set higher to allow more complex queries and lower to prevent memory exhaustion.
    • minus

      public static Automaton minus(Automaton a1, Automaton a2, int maxDeterminizedStates)
      Returns a (deterministic) automaton that accepts the intersection of the language of a1 and the complement of the language of a2. As a side-effect, the automata may be determinized, if not already deterministic.

      Complexity: quadratic in number of states if a2 already deterministic and exponential in number of a2's states otherwise.

    • intersection

      public static Automaton intersection(Automaton a1, Automaton a2)
      Returns an automaton that accepts the intersection of the languages of the given automata. Never modifies the input automata languages.

      Complexity: quadratic in number of states.

    • sameLanguage

      public static boolean sameLanguage(Automaton a1, Automaton a2)
      Returns true if these two automata accept exactly the same language. This is a costly computation! Both automata must be determinized and have no dead states!
    • hasDeadStates

      public static boolean hasDeadStates(Automaton a)
      Returns true if this automaton has any states that cannot be reached from the initial state or cannot reach an accept state. Cost is O(numTransitions+numStates).
    • hasDeadStatesFromInitial

      public static boolean hasDeadStatesFromInitial(Automaton a)
      Returns true if there are dead states reachable from an initial state.
    • hasDeadStatesToAccept

      public static boolean hasDeadStatesToAccept(Automaton a)
      Returns true if there are dead states that reach an accept state.
    • subsetOf

      public static boolean subsetOf(Automaton a1, Automaton a2)
      Returns true if the language of a1 is a subset of the language of a2. Both automata must be determinized and must have no dead states.

      Complexity: quadratic in number of states.

    • union

      public static Automaton union(Automaton a1, Automaton a2)
      Returns an automaton that accepts the union of the languages of the given automata.

      Complexity: linear in number of states.

    • union

      public static Automaton union(Collection<Automaton> l)
      Returns an automaton that accepts the union of the languages of the given automata.

      Complexity: linear in number of states.

    • determinize

      public static Automaton determinize(Automaton a, int maxDeterminizedStates)
      Determinizes the given automaton.

      Worst case complexity: exponential in number of states.

      Parameters:
      maxDeterminizedStates - Maximum number of states created when determinizing. Higher numbers allow this operation to consume more memory but allow more complex automatons. Use DEFAULT_MAX_DETERMINIZED_STATES as a decent default if you don't know how many to allow.
      Throws:
      TooComplexToDeterminizeException - if determinizing a creates an automaton with more than maxDeterminizedStates
    • isEmpty

      public static boolean isEmpty(Automaton a)
      Returns true if the given automaton accepts no strings.
    • isTotal

      public static boolean isTotal(Automaton a)
      Returns true if the given automaton accepts all strings. The automaton must be minimized.
    • isTotal

      public static boolean isTotal(Automaton a, int minAlphabet, int maxAlphabet)
      Returns true if the given automaton accepts all strings for the specified min/max range of the alphabet. The automaton must be minimized.
    • run

      public static boolean run(Automaton a, String s)
      Returns true if the given string is accepted by the automaton. The input must be deterministic.

      Complexity: linear in the length of the string.

      Note: for full performance, use the RunAutomaton class.

    • run

      public static boolean run(Automaton a, IntsRef s)
      Returns true if the given string (expressed as unicode codepoints) is accepted by the automaton. The input must be deterministic.

      Complexity: linear in the length of the string.

      Note: for full performance, use the RunAutomaton class.

    • getLiveStates

      private static BitSet getLiveStates(Automaton a)
      Returns the set of live states. A state is "live" if an accept state is reachable from it and if it is reachable from the initial state.
    • getLiveStatesFromInitial

      private static BitSet getLiveStatesFromInitial(Automaton a)
      Returns bitset marking states reachable from the initial state.
    • getLiveStatesToAccept

      private static BitSet getLiveStatesToAccept(Automaton a)
      Returns bitset marking states that can reach an accept state.
    • removeDeadStates

      public static Automaton removeDeadStates(Automaton a)
      Removes transitions to dead states (a state is "dead" if it is not reachable from the initial state or no accept state is reachable from it.)
    • isFinite

      public static boolean isFinite(Automaton a)
      Returns true if the language of this automaton is finite. The automaton must not have any dead states.
    • isFinite

      private static boolean isFinite(Transition scratch, Automaton a, int state, BitSet path, BitSet visited, int level)
      Checks whether there is a loop containing state. (This is sufficient since there are never transitions to dead states.)
    • getCommonPrefix

      public static String getCommonPrefix(Automaton a)
      Returns the longest string that is a prefix of all accepted strings and visits each state at most once. The automaton must be deterministic.
      Returns:
      common prefix, which can be an empty (length 0) String (never null)
    • getCommonPrefixBytesRef

      public static BytesRef getCommonPrefixBytesRef(Automaton a)
      Returns the longest BytesRef that is a prefix of all accepted strings and visits each state at most once. The automaton must be deterministic.
      Returns:
      common prefix, which can be an empty (length 0) BytesRef (never null)
    • getSingleton

      public static IntsRef getSingleton(Automaton a)
      If this automaton accepts a single input, return it. Else, return null. The automaton must be deterministic.
    • getCommonSuffixBytesRef

      public static BytesRef getCommonSuffixBytesRef(Automaton a, int maxDeterminizedStates)
      Returns the longest BytesRef that is a suffix of all accepted strings. Worst case complexity: exponential in number of states (this calls determinize).
      Parameters:
      maxDeterminizedStates - maximum number of states determinizing the automaton can result in. Set higher to allow more complex queries and lower to prevent memory exhaustion.
      Returns:
      common suffix, which can be an empty (length 0) BytesRef (never null)
    • reverseBytes

      private static void reverseBytes(BytesRef ref)
    • reverse

      public static Automaton reverse(Automaton a)
      Returns an automaton accepting the reverse language.
    • reverse

      static Automaton reverse(Automaton a, Set<Integer> initialStates)
      Reverses the automaton, returning the new initial states.
    • totalize

      static Automaton totalize(Automaton a)
      Returns a new automaton accepting the same language with added transitions to a dead state so that from every state and every label there is a transition.
    • topoSortStates

      public static int[] topoSortStates(Automaton a)
      Returns the topological sort of all states reachable from the initial state. Behavior is undefined if this automaton has cycles. CPU cost is O(numTransitions), and the implementation is recursive so an automaton matching long strings may exhaust the java stack.
    • topoSortStatesRecurse

      private static int topoSortStatesRecurse(Automaton a, BitSet visited, int[] states, int upto, int state, int level)