Parametric Weighted Automata

12 Apr 2019
Joshua Moerman

I moved to Aachen for a post-doc on probabilistic programs (see FRAPPANT). This means that I am learning about new, exciting topics. I will collect notes on this blog, as they may be useful to others. One of the research lines here is that of parametric Markov chains, see for example Parameter Synthesis for Markov Models: Faster Than Ever and references therein.

Parametric Markov chains are just like Markov chains, except that they allow for unspecified, or parametric, edges. Instead of a precise probability, these edges will have a formal parameter $p$. To make life a bit easier, I generalise from probabilities in $[0, 1]$ to any value in $\mathbb{R}$. So in this post, I talk about parametric weighted finite automata (pWFAs). I will do this for a single parameter $p$, but it all generalises nicely to multiple parameters.


Example Let’s start with an example, before we define pWFAs. The image below defines a pWFA with two states: $q_0$ (initial) and $q_1$, which have the outputs $o(q_0) = 0$ and $o(q_1) = 1$ respectively. We have a single letter $a \in \Sigma$ in our alphabet. On this letter, the state $q_0$ transitions to $q_0$ with weight $p$ and to $q_1$ with weight $1-p$. State $q_1$ simply transitions to itself with weight $1$. (If there is just a single transition with unit weight, we often simplify the labels in the picture.)

Example pWFA

What is the acceptance of – say – the word $aa$? In total there are three paths: $q_0 \to q_0 \to q_0$, $q_0 \to q_0 \to q_1$, and $q_0 \to q_1 \to q_1$, with the respective weights $p \cdot p \cdot 0$, $p \cdot (1-p) \cdot 1$, and $(1-p) \cdot 1 \cdot 1$. Summing these up gives an acceptance of $1 - p^2$. This polynomial can be evaluated for any $p \in \mathbb{R}$.
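To make this concrete, here is a minimal sketch in Python (the encoding, the function name, and the concrete value $p = 1/3$ are my own choices, not from the original post). It computes the acceptance of $a^n$ by iterating the transition matrix with exact fractions, using the weights $\delta_a(q_0) = p \cdot q_0 + (1-p) \cdot q_1$, $\delta_a(q_1) = q_1$ and outputs $o(q_0) = 0$, $o(q_1) = 1$:

```python
from fractions import Fraction

def acceptance(word_length, p):
    """Acceptance weight of the word a^n for the example pWFA,
    with the parameter instantiated at a concrete rational p."""
    delta = [[p, 1 - p],   # q0 -> p*q0 + (1-p)*q1
             [0, 1]]       # q1 -> q1
    out = [0, 1]           # o(q0) = 0, o(q1) = 1
    vec = [1, 0]           # initial vector i = 1 * q0
    for _ in range(word_length):
        # one application of delta_a, extended linearly
        vec = [sum(vec[i] * delta[i][j] for i in range(2)) for j in range(2)]
    return sum(vec[j] * out[j] for j in range(2))

print(acceptance(2, Fraction(1, 3)))   # 1 - p^2 at p = 1/3 -> 8/9
```

For $n = 2$ this yields $1 - p^2$, matching the sum over the three paths.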


Let us fix an alphabet $\Sigma$.

Definition A parametric weighted finite automaton, $\mathcal{A}$, consists of

  • a state space: a finite set $Q$,
  • an initial vector: an element $i \in \mathbb{R} \langle Q \rangle$,
  • an output function: a map $o \colon Q \to \mathbb{R}$, and
  • transition functions: for each $a \in \Sigma$ we have a map $\delta_a \colon Q \times Q \to \mathbb{R}[p]$.

Let me explain the notation: $\mathbb{R} \langle Q \rangle$ is the free $\mathbb{R}$-vector space with $Q$ as a basis. As a set it is $\mathbb{R} \langle Q \rangle = \mathbb{R}^Q = \{ f \colon Q \to \mathbb{R} \}$, but we think of the elements $v \in \mathbb{R} \langle Q \rangle$ as formal sums $v = \sum_q r_q \cdot q$ for some values $r_q \in \mathbb{R}$. So the initial vector $i$ is really a linear combination of states. In the example above, the initial vector is the state $q_0$ with weight $1$.

In the transition function we see $\mathbb{R}[p]$, which is the polynomial ring with indeterminate $p$. (It is also a free vector space, but with basis $1, p, p^2, \ldots$) This allows transitions to use normal weights from $\mathbb{R}$, but also polynomials involving $p$. The weight is $\delta_a(q, q') = 0$ if there is no $a$-transition from $q$ to $q'$. The maps $\delta_a$ can be equivalently defined as $\delta_a \colon Q \to \mathbb{R}[p] \langle Q \rangle$. (This is the free $\mathbb{R}[p]$-module on $Q$.) Going further, they can be equivalently expressed as $\mathbb{R}$-linear maps $\delta_a \colon \mathbb{R} \langle Q \rangle \to \mathbb{R}[p] \langle Q \rangle$, or even as $\mathbb{R}[p]$-linear maps. All these manipulations are one-to-one, and I will switch to whatever is useful in the context.

We can do the same for the output map. It is equivalently defined as a linear map $o \colon \mathbb{R}[p] \langle Q \rangle \to \mathbb{R}[p]$, i.e., it is a dual vector $o \in \mathbb{R}[p] \langle Q \rangle^{\ast}$.
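One way to see the definition in the small is to write the example down as plain data; the encoding below (names and representation are my own, just one of many possible choices) stores a polynomial in $\mathbb{R}[p]$ as its tuple of coefficients, so that transition weights really live in the polynomial ring while outputs stay in $\mathbb{R}$:

```python
from fractions import Fraction

def eval_poly(coeffs, p):
    """Evaluate a polynomial in R[p], given as coefficients (c0, c1, ...)."""
    return sum(c * p**k for k, c in enumerate(coeffs))

# The example pWFA, in one possible concrete encoding:
states = ["q0", "q1"]
initial = {"q0": 1}              # i = 1 * q0
output = {"q0": 0, "q1": 1}      # o : Q -> R
delta = {                        # delta_a : Q x Q -> R[p]
    ("q0", "a", "q0"): (0, 1),   # weight p
    ("q0", "a", "q1"): (1, -1),  # weight 1 - p
    ("q1", "a", "q1"): (1,),     # weight 1
}

print(eval_poly(delta[("q0", "a", "q1")], Fraction(1, 3)))  # 1 - p at p = 1/3 -> 2/3
```

Missing entries of `delta` are read as the zero polynomial, matching the convention above.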


Example To see why these equivalent characterisations matter, we can compute the example from above:
$$\begin{aligned} o(\delta_a(\delta_a(i))) &= o(\delta_a(\delta_a(q_0))) \\ &= o(\delta_a(p \cdot q_0 + (1-p) \cdot q_1)) \\ &= o(p \cdot \delta_a(q_0) + (1-p) \cdot \delta_a(q_1)) \\ &= o(p^2 \cdot q_0 + (1-p)p \cdot q_1 + (1-p) \cdot q_1) \\ &= p^2 \cdot o(q_0) + (p - p^2 + 1 - p) \cdot o(q_1) \\ &= 1 - p^2. \end{aligned}$$
I have used linearity in many of these steps.


Applications

Why do we bother with all this? We often use Markov chains in model checking, and typically we want to compute things like β€œwhat is the probability that something good eventually happens?” But how do we actually obtain a Markov chain in the first place? Some engineer, or domain expert, will design it and has to justify why it models reality. It may be easier to leave some edges unspecified, by using a parameter. Luckily, as we will see, we can still do model checking for parametrised systems. This probability will then, of course, depend on the parameter. So instead of a single value, we obtain a function from the model checker.

I will not be precise about the type of properties we can model check. But I will assume that “good” states are states with output $1$, and that good states have no outgoing edges (once we are in a good state, we stay there).

Disclaimer You can imagine that for general weighted automata, the outcome may not be a probability (a value in $[0, 1]$). We will need all the weights to sum up to $1$, and we need bounds on $p$, etc. We ignore well-definedness in this post and pretend everything is fine.

The probability of reaching a good state is the sum over all paths leading to a good state:
$$\text{outcome} = \sum_{w \in \Sigma^{\ast}} o(\delta_w(i)).$$
Here I have abbreviated $\delta_{a_1 \cdots a_k} = \delta_{a_k} \circ \cdots \circ \delta_{a_1}$. Let’s keep our fingers crossed and hope that this sum converges. (“Mathematics is wishful thinking” – Wim Veldman.) How can we compute it?
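Note that the running example does not fit the assumption above (its good state has a self-loop, so the sum diverges). So, for a quick experiment, let me use a hypothetical toy automaton of my own that does fit it: $q_0$ loops on $a$ with weight $p$ and moves to $q_1$ with weight $p$ (deliberately not summing to $1$, see the disclaimer), and $q_1$ is good with output $1$ and no outgoing edges. A truncated version of the sum, at a concrete rational $p$, looks like this:

```python
from fractions import Fraction

def truncated_outcome(N, p):
    """Partial sum of o(delta_w(i)) over all words w with |w| <= N.
    Toy automaton (hypothetical): q0 --a,p--> q0 and q0 --a,p--> q1;
    q1 is good (output 1) and has no outgoing edges. The alphabet has
    a single letter, so the only words are a^k."""
    delta = [[p, p],   # q0 -> p*q0 + p*q1
             [0, 0]]   # good state: no outgoing edges
    out = [0, 1]       # o(q0) = 0, o(q1) = 1
    vec = [1, 0]       # initial vector i = q0
    total = 0
    for _ in range(N + 1):
        total += sum(v * o for v, o in zip(vec, out))
        vec = [sum(vec[i] * delta[i][j] for i in range(2)) for j in range(2)]
    return total

print(truncated_outcome(10, Fraction(1, 2)))   # -> 1023/1024
```

For $|p| < 1$ these partial sums converge to $p/(1-p)$: already a rational function in $p$, rather than a polynomial.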

We first generalise this quantity to any state:
$$v(q) = \sum_w o(\delta_w(q)).$$
Then, by unrolling the transition function once, we see:
$$v(q) = o(q) + \sum_a v(\delta_a(q)),$$
where I’ve silently extended $v$ linearly.

Now we manipulate this equation. First we write it as a composition (again using linearity to push the sum inwards):
$$v(q) = o(q) + (v \circ \sum_a \delta_a)(q).$$
Then, let’s get rid of $q$ and keep the linear functions. We might as well write the composition as a matrix multiplication:
$$v = o + v \times \sum_a \delta_a.$$
Put things to the other side (here $1$ is the identity matrix):
$$o = v \times (1 - \sum_a \delta_a).$$
And, finally, transpose and rename:
$$A v^{T} = o^{T},$$
where the matrix $A = 1 - \sum_a \delta_a^{T}$.

So, we see that we can compute $v(i)$, the value that we are interested in, by solving a linear system of equations! Wait, what? Is that really all there is to it? Well, we have cheated a bit: the maps $\delta_a$ live in an $\mathbb{R}[p]$-linear space. This is not a field, and so we cannot expect to just solve this equation. Can we even ask for a map $v \colon \mathbb{R}[p] \langle Q \rangle \to \mathbb{R}[p]$?

There is a way out: consider the field of fractions $\mathbb{R}(p)$. This is the set of rational functions $\frac{f}{g}$, where $f$ and $g$ are polynomials. This is a field which extends the polynomial ring, and so we can solve the equation here. One can solve it with, for example, Gaussian elimination, which is very similar to state elimination. (Side note: value iteration will not work in such a field, I believe.)
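As a sanity check, here is a sketch of that elimination in Python, with the parameter instantiated at concrete rational values (the toy automaton and all names are hypothetical, my own: $q_0$ loops on $a$ with weight $p$ and moves to the good state $q_1$ with weight $p$). The very same elimination steps work symbolically over $\mathbb{R}(p)$; for this toy the solution is the rational function $v(q_0) = p/(1-p)$:

```python
from fractions import Fraction

def reach_value(p):
    """Solve (1 - sum_a delta_a) v = o by Gaussian elimination, with the
    weights instantiated at a concrete rational p. Toy automaton
    (hypothetical): q0 --a,p--> q0, q0 --a,p--> q1; q1 is good."""
    delta = [[p, p],                         # q0 -> p*q0 + p*q1
             [Fraction(0), Fraction(0)]]     # good state: no outgoing edges
    o = [Fraction(0), Fraction(1)]           # o(q0) = 0, o(q1) = 1
    n = 2
    # augmented matrix [1 - delta | o]
    A = [[Fraction(i == j) - delta[i][j] for j in range(n)] + [o[i]]
         for i in range(n)]
    for col in range(n):                     # Gauss-Jordan elimination
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]      # swap a nonzero pivot into place
        A[col] = [x / A[col][col] for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                A[r] = [a - A[r][col] * b for a, b in zip(A[r], A[col])]
    return A[0][n]                           # v(q0)

print(reach_value(Fraction(1, 3)))   # p/(1-p) at p = 1/3 -> 1/2
```

Running the same elimination with $p$ left symbolic (e.g. in a computer algebra system, working over $\mathbb{Q}(p)$) produces $p/(1-p)$ directly.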


In the end we find that the probability of eventually ending up in a good state is not a polynomial in $p$. Rather, it is a rational function. I find it cool that we can derive this with pure linear algebra. Although this derivation seems easy, it of course hinges on the assumption that a solution exists, a fact which has to be shown separately. Also note that we can take $\mathbb{Q}$ as our base field, instead of $\mathbb{R}$, in order to do all of this numerically exactly.

Sebastian Junges has implemented such a tool (together with colleagues at RWTH Aachen) and has a nice slide (see slide 41) where he shows that these rational functions can be monstrous expressions. I would like to thank Jip Spel for introducing me to the basics of parametric Markov chains.

Learning

In a subsequent post, I hope to talk a bit about (active) learning of parametric weighted automata. Stay tuned!

