Just a quick counterexample. Let’s fix a finite set $\Sigma$ as our alphabet.

Definition A probabilistic automaton (PA), $\mathcal{A}$, consists of a finite state space $Q;$ a distribution of initial states $I \in \mathcal{D}(Q);$ an acceptance function $F \colon Q \to [0, 1];$ and a probabilistic transition function $\delta_a \colon Q \to \mathcal{D}(Q)$ for each $a \in \Sigma$.

This is roughly the definition of Rabin automata. Via the usual definitions, the semantics of a state is given as $\mathcal{L}(q) \colon \Sigma^\ast \to [0, 1]$. The value $\mathcal{L}(q)(w)$ denotes the probability of accepting $w$ starting in state $q$. The language of the automaton, $\mathcal{L}(\mathcal{A})$, is then given by the initial states $\mathcal{L}(I)$.

Any function $\mathcal{L} \colon \Sigma^\ast \to [0, 1]$ is called a language. Given a word $w$, we define the derivative language $w^{-1} \mathcal{L}$ as

$w^{-1} \mathcal{L} (u) = \mathcal{L}(w u) .$

Definition An automaton $\mathcal{A}$ is residual if for all states $q \in Q$, there is a word $w \in \Sigma^\ast$ such that $\mathcal{L}(q) = w^{-1} \mathcal{L}(\mathcal{A})$.

This definition comes from learning theory. There, all information about the automaton has to be inferred from the values $\mathcal{L}(\mathcal{A})(w)$ form some words $w$. This is challenging, as nothing is known about the states. Residuality helps: It tells us that $\mathcal{L}(q)$ will be observable at some point. (Namely after the word $w$.) It means that we can recover all states if we have enough data. Residual PA can be learned efficiently (in the limit), but we do not have such results for general PA.

The class of languages accepted by a residual PA is a strict subclass of languages accepted by PA. But it’s strictly bigger than the class accepted by deterministic PA. The subclass can be characterised as follows.

Theorem (Denis & Esposito) Given a language $\mathcal{L}$, TFAE:

• $\mathcal{L}$ can be accepted by a residual PA;

• There is a finite subset $G \subseteq \text{Der}(\mathcal{L})$ which generates $\text{Der}(\mathcal{L})$ by convex sums.

Here $\text{Der}(\mathcal{L}) = \{ w^{-1} \mathcal{L} \mid w \in \Sigma^\ast \}$ is the set of all derivatives. (I use a reactive version of PA, whereas Denis and Esposito use a generative model. The differences are inessential.) To get an intuition for the above theorem, compare it to the Myhill-Nerode theorem. The (proof of the) above theorem also gives a canonical construction.

## Closed under convex sums?

Given two languages $\mathcal{L}_1, \mathcal{L}_2$, we can consider their convex sum $p \cdot \mathcal{L}_1 + (1-p) \cdot \mathcal{L}_2$. The class of languages accepted by PA is closed under this operation. The usual union construction for NFA does the trick.

What about residual PA? It turns out those languages are not closed under convex sums. I’ll give an example. First, consider the following PA on a one-letter alphabet.

This automaton is not residual. We can make any automaton residual by adding so-called “anchors.” This drastically changes the language (and alphabet), though. By doing this, we get the automaton $\mathcal{A}_1$ with two letters.

This automaton is residual. Consider the same automaton, but swap $a$ and $b$, call it $\mathcal{A}_2$. The language $\mathcal{L} = \frac{1}{2} \cdot \mathcal{L}(\mathcal{A}_1) + \frac{1}{2} \cdot \mathcal{L}(\mathcal{A}_2)$ cannot be accepted by a residual automaton!

To see this, consider the derivatives $\mathcal{L}$, $a^{-n}\mathcal{L}$, $b^{-n}\mathcal{L}$, and $ab^{-1}\mathcal{L}$. (Note that $w^{-1}\mathcal{L} = ab^{-1}\mathcal{L}$ whenever $w$ contains at least one $a$ and one $b$.) The problem is the that $a^{-n}\mathcal{L}$ approach a limit, but this limit itself is not a derivative. There is no finite set of derivatives which generate all $a^{-n}\mathcal{L}$. So by the theorem, there is no residual PA for this langauge.

I don’t consider this a problem. Closure properties simply do not play an important role in learning theory. For instance, the must-read paper by Denis and Esposito develop the theory of residual PA in depth, providing many results, such as canonical minimal automata, decision procedures, and complexity results. However, they do not even mention closure under convex sums.

Together with Matteo Sammartino, we have some work in progress on residual nominal automata. It mirrors the situation of probabilistic automata closely, with a bit more undecidability sprinkled around. I think it’s a nice topic, and it’s good to understand residuality and its use in learning in different contexts.