## The uniform word problem for finite semigroups

The goal of this post is to prove the following theorem of Gurevich. Stronger results have been proven by Mark Sapir and can be found in his survey with Olga Kharlampovich. The version stated here lets one prove the results of this post without appealing to Malcev’s theorem on residual finiteness.

Theorem. (Gurevich). There exists a finite set of relations $R$ over the alphabet $\{a,b\}$ and a word $v\in \{a,b\}^*$ such that if $\mathcal C$ is any class of semigroups containing all finite nilpotent semigroups and $S=\langle a,b\mid R\rangle$, then the following algorithmic problem is undecidable. Given a word $u\in \{a,b\}^*$, it is undecidable whether $\varphi(u)=\varphi(t)$ for every homomorphism $\varphi\colon S\to T$ with $T\in \mathcal C$.

We recall here that a semigroup $S$ is nilpotent if $S^n=\{z\}$ for some $n\geq 1$ and $z\in S$. Necessarily $z$ is a zero, that is, $zs=z=sz$ for all $s\in S$.

We’ll prove this theorem first over a much bigger alphabet. There is then a standard trick to convert the proof to a two letter alphabet that I mostly leave as an exercise.

There is a standard construction of Markov and Post that encodes a Turing machine into a finite presentation of a semigroup (we mostly follow Rotman’s book here). A Turing machine is essentially a rewriting system for instantaneous descriptions of the Turing machine configuration. One can view it as a semigroup presentation. One then adds some rules that rewrite all halting configurations to some fixed symbol. The only subtlety is that you need to show that the symmetry in semigroup derivations doesn’t break the encoding. This is handled by exploiting the determinism to obtain confluence of a sort.

Gurevich’s fundamental insight was that to obtain undecidability for finite semigroups, one needs to in fact encode recursively inseparable sets into a semigroup presentation rather than a non-recursive set. Two disjoint sets $A,B$ of words over some alphabet are said to be recursively inseparable if there is no recursive set $C$ such that $A\subseteq C$ and $B\cap C=\emptyset$. Notice that neither $A$ nor $B$ can be recursive.

For simplicity we work here with Turing machines that have a two-way infinite tape and that in any transition either the tape head moves left or right, or the machine writes on the current cell. We assume all Turing machines have exactly two halting states.

Formally, a Turing machine $M$ has a finite state set $Q$, an alphabet $\Sigma$, a blank symbol $\square\notin \Sigma$, an initial state $q_0\in Q$, halting states $q_1,q_2\in Q$ and a transition function $\delta$. Set $\Sigma_\square=\Sigma\cup \{\square\}$. Then $\delta\colon Q\setminus \{q_1,q_2\}\times \Sigma_\square\to Q\times \left(\{L,R\}\cup \Sigma_\square\right)$. For example $\delta(q,a)=(q',R)$ means that if the current state is $q$ and the symbol under the tape head is $a$ then the machine changes to state $q'$ and moves right. On the other hand, $\delta(q,a)=(q',b)$ means that if the current state is $q$ and the symbol under the tape head is $a$ then the machine changes to state $q'$ and replaces the $a$ under the tape head by a $b$.

A Turing machine starts a computation in the initial state with a word written on the tape (and the rest of the tape blank) and the head above the first letter of the word. We assume that input strings are non-empty. The following is a standard result in the theory of computation.

Theorem A. There exists a Turing machine $M$ such that the set $A$ of input words on which $M$ halts in state $q_1$ is recursively inseparable from the set $B$ of input words in which $M$ halts in state $q_2$.

From now one we fix a Turing machine $M$ as in Theorem A. Let $\Delta=\{[,]\}\cup Q\cup \Sigma_\square$. A word of the form $[xqay]$ with $x,y\in \Sigma_\square^*$, $q\in Q$ and $a\in \Sigma_\square$ will be called an instantaneous description of the Turing machine or a special word for simplicity. One should interpret this word as saying that the machine is in state $q$ and the tape holds the word $xay$ with the head over the $a$. The representation of a tape configuration by special words is not quite unique because of possible padding by blanks at either side, but this will cause no problems.

We now define rewriting rules which simulate the action of the Turing machine $M$. They come in 5 types.

(1) $q_ia\rightarrow q_jb$ if $\delta(q_i,a)=(q_j,b)$ for $a,b\in \Sigma_\square$

(2) $q_iab\to aq_jb$ if $\delta(q_i,a)=(q_j,R)$ for $a,b\in \Sigma_\square$

(3) $q_ia]\to aq_j\square]$ if $\delta(q_i,a)=(q_j,R)$ for $a\in \Sigma_\square$

(4) $aq_ib\to q_jab$ if $\delta(q_i,a)=(q_j,L)$ for $a,b\in \Sigma_\square$

(5) $[q_ib\to [q_j\square a$ if $\delta(q_i,a)=(q_j,L)$ for $a\in \Sigma_\square$

We write $w\Rightarrow_M w'$ if $w=uxv$ and $w'=uyv$ where $x\rightarrow y$ is one of the above rules. The reflexive-transitive closure of $\Rightarrow_M$ is denoted $\Rightarrow^*_M$. The equivalence relation generated by $\Rightarrow_M$ is denoted $\Leftrightarrow^*_M$. It will also be convenient to write $\alpha \Leftrightarrow_M \beta$ if $\alpha\Rightarrow_M \beta$ or $\beta\Rightarrow_M\alpha$.

Note that we have

$A=\{w\in \Sigma^*\mid [q_0w]\Rightarrow_M^* [xq_1ay], \text{for some }x,y\in \Sigma_\square^*, a\in \Sigma_\square\}.$

$B=\{w\in \Sigma^*\mid [q_0w]\Rightarrow_M^* [xq_2ay], \text{for some }x,y\in \Sigma_\square^*, a\in \Sigma_\square\}.$

By a derivation from $w$ to $w'$, we mean a sequence $w=\alpha_0,\ldots,\alpha_m=w'$ such that $\alpha_i\Leftrightarrow_M \alpha_{i+1}$ for each $i$.

The following lemma says that the rewriting system associated to $M$ is confluent when restricted to special words. It is, in fact, nothing more than the determinism of $M$, but it is the key trick to simulating a Turing machine with a semigroup.

Lemma 1. (Confluence)

1. If $w\in \Delta^*$ is a special word and $w\Leftrightarrow_M^* w'$, then $w'$ is special.
2. If $w,w'\in \Delta^*$ are special words and $w\Leftrightarrow_M^* w'$, then there is a special word $v\in \Delta^*$ such that $w\Rightarrow_M^* v\Leftarrow_M^* w'$

Proof. The first statement is clear because applying any of the rewriting rules in the forward or backward direction preserves speciality.

To prove the second statement, suppose that $w=\alpha_0,\ldots,\alpha_m=w'$ is a shortest derivation. We claim that there are no `peaks’ $\alpha_{i-1}\Leftarrow_M \alpha_i\Rightarrow_M \alpha_{i+1}$. Indeed, because $M$ is deterministic at most one rewriting rule can be applied to any special word. Thus $\alpha_{i-1}=\alpha_{i+1}$ and so $w=\alpha_0,\ldots,\alpha_{i-1},\alpha_{i+2},\ldots\alpha_m=w'$ is a shorter derivation.

It is now immediate that either $w\Rightarrow^*_M w'$, $w\Leftarrow^*_M w'$ or there is an index $i$ such that $w\Rightarrow^*_M \alpha_i\Leftarrow^*_M w'$.

Let $\Gamma=\Delta\cup \{\omega\}$ where $\omega$ is a new symbol. We define a rewriting system over $\Gamma$ by retaining the rules of the form 1-5 above and adding the following 3 types of rules, which serve to collapse all special words that lead to the halting state $q_2$.

(6) $q_2a\to q_2$ for $a\in \Sigma_\square$

(7) $aq_2]\to q_2]$ for $a\in \Sigma_\square$

(8) $[q_2]\to \omega$

Let $R$ be the rewriting system consisting of rules (1)-(8). We use $\Rightarrow_R$, $\Rightarrow^*_R$, $\Leftrightarrow_R$ and $\Leftrightarrow_R^*$ analogously to the case of the rewriting system $M$. We need two technical lemmas.

Lemma 2

(a) $[xq_2y]\Rightarrow_R^*\omega$ for any $x,y\in \Sigma_\square^*$.

(b) If $w_1$ is a special word, $w_2$ is not a special word and $w_1\Leftrightarrow_R w_2$, then

1. $w_1=[xq_2a]$ with $x\in \Sigma_\square^*, a\in \Sigma_\square$
2. $w_2=[xq_2]$
3. $w_1\Rightarrow_R w_2$.

Proof. By (6) and (7) we have $[xq_2y]\Rightarrow_R^*[xq_2]\Rightarrow_R^* [q_2]\Rightarrow_R \omega$. This proves (a). For (b), observe that rewriting rules (1)-(5) preserve speciality when applied in either direction, whereas rules (7) and (8) have non-special words on both sides and so preserve non-speciality. Rewriting rule (6) preserves speciality in the backward direction but can remove it in the forward direction if $w_1$ and $w_2$ have the form above.

Lemma 3. If $w$ is special, then one of the following mutually exclusive statements holds.

1. $w\Rightarrow_R^* \omega$
2. $w\Leftrightarrow_R^* w'$ implies that $w'$ is special.

Proof. Since $\omega$ is not a special word, it suffices to show that if (2) does not hold, then (1) does. So assume $w\Leftrightarrow_R^* w'$ with $w'$ not special. Let $w=\alpha_0,\ldots,\alpha_m=w'$ be a derivation. Then because $w$ is special and $w'$ is not, there is a least $i>0$ with $\alpha_i$ not special. By Lemma 2(b), we have $\alpha_{i-1}=[xq_2a]$ and $\alpha_i=[xq_2]$ for some $a\in \Sigma_\square$. By Lemma 1, there exists a special word $u$ such that $w\Rightarrow_M^* u\Leftarrow_M^* \alpha_{i-1}$. But $q_2$ is a halting state and so does not appear on the left hand side of rules (1)-(5). Thus $u=\alpha_{i-1}$ and so $w\Rightarrow^*_R \omega$ by Lemma 2(a).

We can now show that the recursively inseparable sets $A,B$ are encoded in the semigroup presented by the rewriting system $R$.

Corollary 4. The following hold.

(a) $A=\{w\in \Sigma^*\mid [q_0w]\Leftrightarrow^*_R [xq_1ay],\ \text{for some}\ x,y\in \Sigma_\square^*, a\in \Sigma_\square\}.$

(b) $B=\{w\in \Sigma^*\mid [q_0w]\Leftrightarrow^*_R \omega\}$.

Proof.

(b) If $w\in B$, then $[q_0w]\Rightarrow_M^* [xq_2ay]$ for some $x,y\in \Sigma_\square^*$ and $a\in \Sigma_\square$. Lemma 2(a) shows that $[xq_2ay]\Rightarrow_R^*\omega$ and so $[q_0w]\Leftrightarrow^*_R \omega$.

Conversely, if $[q_0w]\Leftrightarrow_R^* \omega$, then by Lemma 3 we have $[q_0w]\Rightarrow_R^* \omega$. Since the left hand side of each of the rewriting rules (6)-(8) contains $q_2$, we conclude that $[q_0w]\Rightarrow_M^*[xq_2ay]$ with $x,y\in \Sigma_\square^*$ and $a\in \Sigma_\square$. Thus $w\in B$.

(a) If $w\in A$, then $[q_0w]\Rightarrow_M^* [xq_1ay]$ for some $x,y\in \Sigma_\square^*$ and $a\in \Sigma_\square$. Suppose conversely that $[q_0w]\Leftrightarrow^*_R [xq_1ay]$ for some $x,y\in \Sigma_\square^*, a\in \Sigma_\square$. First note that because $q_1$ is halting, no rewriting rule from $R$ can be applied to $[xq_1ay]$ and so Lemma 3 implies that all words equivalent to $[xq_1ay]$ are special. Moreover, none of these words can contain $q_2$ by Lemma 2(a), since otherwise we could derive $\omega$. It follows that $[q_0w]\Leftrightarrow^*_M [xq_1ay]$. Therefore, by Lemma 1 there is a special word $z$ such that $[q_0w]\Rightarrow^*_M z\Leftarrow^*_M [xq_1ay]$. But again, we must have $z=[xq_1ay]$ because $q_1$ is halting. Thus $w\in A$.

As the set $B$ is not recursive, we conclude that the word problem is undecidable in the semigroup presented by the rewriting system $R$.

Corollary 5 There is a finitely presented semigroup $S=\langle \Gamma\mid R\rangle$ with undecidable word problem. In fact, there is a fixed word $v\in \Gamma^*$ such that it is undecidable whether another word $u\in \Gamma^*$ is equal to $v$ in $S$.

To prove Gurevich’s theorem, we need to show that certain congruence classes are finite. Let $S$ be the semigroup with presentation $\langle \Gamma\mid R\rangle$. Let $C(w)$ denote the congruence class of the word $w\in \Gamma^*$.

Lemma 6. If $w\in A$, then $C([q_0w])$ is finite.
Proof. Let $m$ be the number of steps for the Turing machine $M$ to halt on input $w$ (in state $q_1$). By Lemma 2, Lemma 3 and Corollary 4, each word in $C([q_0w])$ is special and does not contain the symbol $q_2$. So if $u\in C([q_0w])$, then $[q_0w]\Leftrightarrow_M^* u$. By Lemma 1 this means there is a special word $z$ such that $[q_0w]\Rightarrow_M^* z\Leftarrow_M^* u$. Note that $z$ is an instantaneous description of the Turing machine at some point during its computation on input $w$ and hence we can derive $z$ in at most $m$ steps from $[q_0w]$. Since each of rules (1)-(5) either preserves length or increases it by one, we have that $|u|\leq |z|\leq m+|w|+3$.

We can now use ideal quotients (or Rees quotients) to obtain suitable finite nilpotent quotients of $S$. If $X\subseteq \Gamma^*$, then $F(X)$ is the set of all factors of $X$. So $w\in F(X)$ if and only if $uwv\in X$ for some $u,v\in \Gamma^*$.

We say a subset $Y$ of $\Gamma^*$ is saturated if $w\in Y$ and $w\Leftrightarrow_R^* w'$ implies $w'\in Y$. It is easy to check that if $X$ is saturated, then so is $F(X)$.

Corollary 7. If $w\in A$, then $F(C([q_0w]))$ is finite and saturated.

An immediate consequence of Lemma 6 and Corollary 7 is the following.

Proposition 8. Let $w\in A$ and let $I(w)=\{C(z)\mid z\notin F(C([q_0w]))\}$. Then

1. $I(w)$ is an ideal of $S$.
2. $S/I(w)$ is a finite nilpotent semigroup.
3. If $\eta\colon S\to S/I(w)$ is the canonical projection then $\eta(C(\omega))=0$ and $\eta(C([q_0w]))\neq 0$.

We can now prove a weakened form of the main result.

Theorem B. There exists a finitely presented semigroup $S=\langle \Gamma\mid R \rangle$ and an element $\omega\in \Gamma^*$ such that if $\mathcal C$ is any class of semigroups containing all finite nilpotent semigroups, then it is undecidable given a word $w\in \Gamma^*$ whether, for every homomorphism $\varphi\colon S\to T$ with $T\in \mathcal C$, one has that $\varphi(C(w))=\varphi (C(\omega))$ where $C(u)$ is the element of $S$ represented by a word $u\in \Gamma^*$.

Proof. Let $S=\langle \Gamma \mid R\rangle$ be the finitely presented semigroup we have just constructed above and retain all the previous notation (including $\omega)$. Let $C$ be the set of all words $w\in \Sigma^*$ such that $\varphi(C([q_0w]) = \varphi(C(\omega))$ for all homomorphisms $\varphi\colon S\to T$ with $T\in \mathcal C$. Because $\mathcal C$ contains all finite nilpotent semigroups, Corollary 4 and Proposition 8 shows that $C$ contains $B$ and does not intersect $A$. Thus $C$ is not recursive by the recursive inseparability of $A$ and $B$. The result now follows.

We sketch now the proof of how to obtain an example with two generators. Here we follow Albert, Baldinger and Rhodes, who proved a stronger result that Gurevich in which a relatively free semigroup in a finitely based variety is constructed. Let $\Gamma=\{x_1,\ldots, x_n\}$ be an alphabet. Then Murskii defines an injective homomorphism $\theta\colon \Gamma^*\to \{a,b\}^*$ by $\theta(x_i) = ab^{i+n}a^2$. If $R$ is a rewriting system over $\Gamma$, define $R'$ to be the rewriting system over $\{a,b\}$ with rules $\theta(\ell)\to \theta(r)$ whenever $\ell\to r$ is a rule of $R$.

The following lemma is straightforward combinatorics on words.

Lemma 9. (Murski). If $u,v\in \Gamma^*$ and $\theta(u)=r\theta(v)s$ with $r,s\in \{a,b\}^*$, then $u=r'vs'$ with $\theta(r')=r$, $\theta(s')=s$.

From this one easily deduces that if $u\in \Gamma^*$ then $\theta(u)\Leftrightarrow_{R'} z$ if and only if $z=\theta(z')$ where $u\Leftrightarrow_R z'$. It is then straightforward to prove Gurevich’s theorem. Namely, let $\Gamma$ and $R$ be constructed as above and consider the semigroup presentation $\langle a,b\mid R'\rangle$ with $R'$ constructed from $R$ via $\theta$ as above. It is routine to verify that all results from Corollary 5 and onward now apply if we replace each word over $\Gamma^*$ with its image under $\theta$.