Residual probabilistic automata are not closed under convex sums

9 Jan 2020
Joshua Moerman

Just a quick counterexample. Let's fix a finite set $\Sigma$ as our alphabet.

Definition A probabilistic automaton (PA), $\mathcal{A}$, consists of a finite state space $Q$; a distribution of initial states $I \in \mathcal{D}(Q)$; an acceptance function $F \colon Q \to [0, 1]$; and a probabilistic transition function $\delta_a \colon Q \to \mathcal{D}(Q)$ for each $a \in \Sigma$.

This is roughly the definition of Rabin automata. Via the usual definitions, the semantics of a state is given as $\mathcal{L}(q) \colon \Sigma^\ast \to [0, 1]$. The value $\mathcal{L}(q)(w)$ denotes the probability of accepting $w$ starting in state $q$. The language of the automaton, $\mathcal{L}(\mathcal{A})$, is then given by the initial states, $\mathcal{L}(I)$.
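To make these semantics concrete, here is a small Python sketch that computes $\mathcal{L}(I)(w)$. The `(init, accept, delta)` encoding and the name `accept_prob` are my own illustrative choices, not standard notation or a library API.

```python
# Illustrative encoding (not standard): a PA is a triple (init, accept, delta),
# where init is the initial distribution I as a list indexed by state,
# accept[q] is the acceptance value F(q), and delta[a][q] is the distribution
# delta_a(q) over successor states, given as a row of a transition matrix.

def accept_prob(pa, word):
    """Compute L(I)(word): the probability of accepting `word`."""
    dist, accept, delta = pa
    for a in word:
        n = len(dist)
        # push the current state distribution through the matrix for letter a
        dist = [sum(dist[q] * delta[a][q][r] for q in range(n)) for r in range(n)]
    # finally, weight each state's acceptance value F(q) by its probability
    return sum(x * f for x, f in zip(dist, accept))
```

For instance, a two-state PA over a one-letter alphabet that moves from state 0 to the accepting state 1 with probability 1/2 on each $a$ accepts $a^n$ with probability $1 - 2^{-n}$.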

Any function $\mathcal{L} \colon \Sigma^\ast \to [0, 1]$ is called a language. Given a word $w$, we define the derivative language $w^{-1} \mathcal{L}$ as

$$ w^{-1} \mathcal{L}(u) = \mathcal{L}(wu) . $$
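Operationally, a derivative is just the language shifted by a prefix. A tiny sketch (the names and the toy language are illustrative):

```python
# Languages as functions from words (strings) to [0, 1].

def derivative(L, w):
    """The derivative language w^{-1} L, i.e. u |-> L(w + u)."""
    return lambda u: L(w + u)

# Toy language over {'a'}: accept a^n with probability 1 - 2^(-n).
L = lambda u: 1.0 - 0.5 ** len(u)

La = derivative(L, "a")  # a^{-1} L: accepts a^n with probability 1 - 2^(-n-1)
```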

Definition An automaton $\mathcal{A}$ is residual if for all states $q \in Q$, there is a word $w \in \Sigma^\ast$ such that $\mathcal{L}(q) = w^{-1} \mathcal{L}(\mathcal{A})$.

This definition comes from learning theory. There, all information about the automaton has to be inferred from the values $\mathcal{L}(\mathcal{A})(w)$ for some words $w$. This is challenging, as nothing is known about the states. Residuality helps: it tells us that $\mathcal{L}(q)$ will be observable at some point (namely, after the word $w$). It means that we can recover all states if we have enough data. Residual PA can be learned efficiently (in the limit), but we do not have such results for general PA.

The class of languages accepted by a residual PA is a strict subclass of languages accepted by PA. But it's strictly bigger than the class accepted by deterministic PA. The subclass can be characterised as follows.

Theorem (Denis & Esposito) Given a language $\mathcal{L}$, TFAE:

  • L\mathcal{L} can be accepted by a residual PA;

  • There is a finite subset GβŠ†Der(L)G \subseteq \text{Der}(\mathcal{L}) which generates Der(L)\text{Der}(\mathcal{L}) by convex sums.

Here $\text{Der}(\mathcal{L}) = \{ w^{-1} \mathcal{L} \mid w \in \Sigma^\ast \}$ is the set of all derivatives. (I use a reactive version of PA, whereas Denis and Esposito use a generative model. The differences are inessential.) To get an intuition for the above theorem, compare it to the Myhill-Nerode theorem. The (proof of the) above theorem also gives a canonical construction.

Closed under convex sums?

Given two languages $\mathcal{L}_1, \mathcal{L}_2$ and some $p \in [0, 1]$, we can consider their convex sum $p \cdot \mathcal{L}_1 + (1-p) \cdot \mathcal{L}_2$. The class of languages accepted by PA is closed under this operation. The usual union construction for NFA does the trick.

What about residual PA? It turns out those languages are not closed under convex sums. I'll give an example. First, consider the following PA on a one-letter alphabet.

Non-residual PA

This automaton is not residual. We can make any automaton residual by adding so-called "anchors." This drastically changes the language (and alphabet), though. By doing this, we get the automaton $\mathcal{A}_1$ with two letters.

Residual PA

This automaton is residual. Consider the same automaton, but with $a$ and $b$ swapped; call it $\mathcal{A}_2$. The language $\mathcal{L} = \frac{1}{2} \cdot \mathcal{L}(\mathcal{A}_1) + \frac{1}{2} \cdot \mathcal{L}(\mathcal{A}_2)$ cannot be accepted by a residual automaton!

To see this, consider the derivatives $\mathcal{L}$, $a^{-n}\mathcal{L}$, $b^{-n}\mathcal{L}$, and $(ab)^{-1}\mathcal{L}$. (Note that $w^{-1}\mathcal{L} = (ab)^{-1}\mathcal{L}$ whenever $w$ contains at least one $a$ and one $b$.) The problem is that the languages $a^{-n}\mathcal{L}$ approach a limit, but this limit is not itself a derivative. There is no finite set of derivatives which generates all the $a^{-n}\mathcal{L}$. So by the theorem, there is no residual PA for this language.

I don't consider this a problem. Closure properties simply do not play an important role in learning theory. For instance, the must-read paper by Denis and Esposito develops the theory of residual PA in depth, providing many results, such as canonical minimal automata, decision procedures, and complexity results. However, they do not even mention closure under convex sums.

Together with Matteo Sammartino, I have some work in progress on residual nominal automata. It mirrors the situation for probabilistic automata closely, with a bit more undecidability sprinkled around. I think it's a nice topic, and it's good to understand residuality and its use in learning in different contexts.