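We can write the equations for the belief state in a compact form by writing the conditional probability tables as transition matrices. To do so, for each action we gather the corresponding rows of the complete conditional probability table into a state transition matrix. In the table below, each row gives the distribution over the next room (the last five columns) for a current room (first column) and an action (second column: L, R, U, D for move left, right, up, down). Selecting the rows for R gives the matrix $M_r$ for the action "move right".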
$$ \begin{array}{|l|l|l|l|l|l|l|} \hline X_1 & A_1 & \text{Living Room} & \text{Kitchen} & \text{Office} & \text{Hallway} & \text{Dining Room} \\ \hline \text{Living Room} & L & 1 & 0 & 0 & 0 & 0 \\ \hline \text{Living Room} & R & 0.2 & 0.8 & 0 & 0 & 0 \\ \hline \text{Living Room} & U & 1 & 0 & 0 & 0 & 0 \\ \hline \text{Living Room} & D & 0.2 & 0 & 0 & 0.8 & 0 \\ \hline \text{Kitchen} & L & 0.8 & 0.2 & 0 & 0 & 0 \\ \hline \text{Kitchen} & R & 0 & 1 & 0 & 0 & 0 \\ \hline \text{Kitchen} & U & 0 & 1 & 0 & 0 & 0 \\ \hline \text{Kitchen} & D & 0.2 & 0 & 0 & 0 & 0.8 \\ \hline \text{Office} & L & 0 & 0 & 1 & 0 & 0 \\ \hline \text{Office} & R & 0 & 0 & 0.2 & 0.8 & 0 \\ \hline \text{Office} & U & 0 & 0 & 1 & 0 & 0 \\ \hline \text{Office} & D & 0 & 0 & 1 & 0 & 0 \\ \hline \text{Hallway} & L & 0 & 0 & 0.8 & 0.2 & 0 \\ \hline \text{Hallway} & R & 0 & 0 & 0 & 0.2 & 0.8 \\ \hline \text{Hallway} & U & 0.8 & 0 & 0 & 0.2 & 0 \\ \hline \text{Hallway} & D & 0 & 0 & 0 & 1 & 0 \\ \hline \text{Dining Room} & L & 0 & 0 & 0 & 0.8 & 0.2 \\ \hline \text{Dining Room} & R & 0 & 0 & 0 & 0 & 1 \\ \hline \text{Dining Room} & U & 0 & 0.8 & 0 & 0 & 0.2 \\ \hline \text{Dining Room} & D & 0 & 0 & 0 & 0 & 1 \\ \hline \end{array} $$
$$ M_r=\left[\begin{array}{lllll} 0.2 & 0.8 & 0.0 & 0.0 & 0.0 \\ 0.0 & 1.0 & 0.0 & 0.0 & 0.0 \\ 0.0 & 0.0 & 0.2 & 0.8 & 0.0 \\ 0.0 & 0.0 & 0.0 & 0.2 & 0.8 \\ 0.0 & 0.0 & 0.0 & 0.0 & 1.0 \end{array}\right] $$
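Assuming the belief $b_t$ is written as a row vector over the five rooms in the table's column order (the symbols $b_t$, $X_t$, $A_t$ are introduced here for illustration), applying "move right" is a single vector-matrix product. This covers only the effect of the action, with no sensor update, and repeating it $n$ times multiplies by the $n$-th power of $M_r$:

$$ b_{t+1}(x') = \sum_{x} b_t(x)\, P(X_{t+1} = x' \mid X_t = x, A_t = R), \qquad b_{t+1} = b_t M_r, \qquad b_n = b_0 M_r^{\,n} $$

For example, starting with certainty in the Living Room, $b_0 = [1, 0, 0, 0, 0]$, one step gives $b_1 = [0.2, 0.8, 0, 0, 0]$, which is just the first row of $M_r$.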
https://gist.github.com/viadean/22a6431ecc85a007fcc61b535eacfd7f
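The row-gathering step can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the gist linked above: it assumes the table is transcribed as a dictionary keyed by (current room, action), with next-room probabilities in column order, and it uses `fractions.Fraction` so that row sums can be checked exactly. The names `CPT`, `transition_matrix` and `MATRICES` are hypothetical.

```python
from fractions import Fraction as F

# Rooms in the column order of the table (next room) and the row order of M_r.
ROOMS = ["Living Room", "Kitchen", "Office", "Hallway", "Dining Room"]

# Hypothetical transcription of the table: (current room, action) -> next-room probabilities.
CPT = {
    ("Living Room", "L"): "1 0 0 0 0",
    ("Living Room", "R"): "0.2 0.8 0 0 0",
    ("Living Room", "U"): "1 0 0 0 0",
    ("Living Room", "D"): "0.2 0 0 0.8 0",
    ("Kitchen", "L"): "0.8 0.2 0 0 0",
    ("Kitchen", "R"): "0 1 0 0 0",
    ("Kitchen", "U"): "0 1 0 0 0",
    ("Kitchen", "D"): "0.2 0 0 0 0.8",
    ("Office", "L"): "0 0 1 0 0",
    ("Office", "R"): "0 0 0.2 0.8 0",
    ("Office", "U"): "0 0 1 0 0",
    ("Office", "D"): "0 0 1 0 0",
    ("Hallway", "L"): "0 0 0.8 0.2 0",
    ("Hallway", "R"): "0 0 0 0.2 0.8",
    ("Hallway", "U"): "0.8 0 0 0.2 0",
    ("Hallway", "D"): "0 0 0 1 0",
    ("Dining Room", "L"): "0 0 0 0.8 0.2",
    ("Dining Room", "R"): "0 0 0 0 1",
    ("Dining Room", "U"): "0 0.8 0 0 0.2",
    ("Dining Room", "D"): "0 0 0 0 1",
}


def transition_matrix(cpt, action):
    """Gather the rows for one action, one row per current room in ROOMS order."""
    return [[F(p) for p in cpt[(room, action)].split()] for room in ROOMS]


MATRICES = {action: transition_matrix(CPT, action) for action in "LRUD"}

# Every gathered row must be a probability distribution over the next room.
for action, matrix in MATRICES.items():
    assert all(sum(row) == 1 for row in matrix), action
```

`MATRICES["R"]` holds the same five rows as $M_r$; the other three actions yield their matrices the same way, so one dictionary lookup per action replaces the full table during belief updates.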
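Here's how the belief state evolves when the "move right" action is applied repeatedly from three different starting beliefs. Each step moves probability mass to the room on the right until it reaches the Kitchen or the Dining Room, the two rooms whose rows in $M_r$ keep all of their mass in place, so the belief drifts toward more certain states because of the structure of $M_r$. Belief vectors are listed in the order Living Room, Kitchen, Office, Hallway, Dining Room, and each result is the belief after five applications of $M_r$.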
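The drift can be read directly off $M_r$: a room whose row puts probability 1 on itself never loses mass under "move right", while every other room keeps only the diagonal share of its own mass per step. A small sketch, with $M_r$ copied from above as plain Python lists (the names `absorbing` and `retention` are illustrative), that lists those rooms:

```python
ROOMS = ["Living Room", "Kitchen", "Office", "Hallway", "Dining Room"]

# M_r copied from the matrix above: row = current room, column = next room.
M_r = [
    [0.2, 0.8, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.2, 0.8, 0.0],
    [0.0, 0.0, 0.0, 0.2, 0.8],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]

# A room is absorbing under "move right" if its row keeps all of its mass in place.
absorbing = [ROOMS[i] for i, row in enumerate(M_r) if row[i] == 1.0]

# Every other room keeps only its diagonal share per step; the rest moves right.
retention = {ROOMS[i]: row[i] for i, row in enumerate(M_r) if row[i] < 1.0}

print(absorbing)  # ['Kitchen', 'Dining Room']
print(retention)  # {'Living Room': 0.2, 'Office': 0.2, 'Hallway': 0.2}
```

Because every non-absorbing room retains only 0.2 of its own mass, whatever starts there shrinks geometrically, which is why the beliefs below concentrate in the Kitchen and the Dining Room.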
Starting from a uniform belief, [0.2, 0.2, 0.2, 0.2, 0.2], five steps give [0.000064, 0.399936, 0.000064, 0.001344, 0.598592]. The agent becomes highly confident that it is in either the Kitchen or the Dining Room.
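The uniform-start numbers can be checked by hand, assuming the listed result is five applications of $M_r$ (this is what makes $0.2^5 = 0.00032$ appear). The Living Room and the Office keep only 0.2 of their mass per step and receive nothing; the Kitchen collects everything that leaves the Living Room; the Hallway keeps 0.2 of its own mass and, at each of the five steps, receives 0.8 of what is left in the Office, and each of those five inflows has decayed to $0.8 \cdot 0.2^5$ by the end:

$$ \begin{aligned} b_5(\text{Living Room}) &= 0.2 \cdot 0.2^5 = 0.000064, \\ b_5(\text{Kitchen}) &= 0.2 + (0.2 - 0.000064) = 0.399936, \\ b_5(\text{Office}) &= 0.2 \cdot 0.2^5 = 0.000064, \\ b_5(\text{Hallway}) &= 0.2 \cdot 0.2^5 + 5 \cdot 0.8 \cdot 0.2^5 = 0.000064 + 0.00128 = 0.001344, \\ b_5(\text{Dining Room}) &= 1 - (0.000064 + 0.399936 + 0.000064 + 0.001344) = 0.598592. \end{aligned} $$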
Shows how uncertainty spreads, but mostly settles in the Kitchen.
Starting with certainty in the Living Room, [1.0, 0.0, 0.0, 0.0, 0.0], five steps give [0.00032, 0.99968, 0.0, 0.0, 0.0]. The system becomes almost completely confident that it is in the Kitchen.
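Both results can be reproduced with a short simulation. This is a minimal sketch, assuming the belief is a row vector in the column order of $M_r$ and that each listed result is five applications of the prediction step; it uses exact fractions (0.2 = 1/5) so the results can be compared exactly, and `predict` and `run` are hypothetical helper names.

```python
from fractions import Fraction as F

# M_r from above as exact fractions (0.2 = 1/5, 0.8 = 4/5).
# Rows: current room, columns: next room, both in the order
# Living Room, Kitchen, Office, Hallway, Dining Room.
M_r = [
    [F(1, 5), F(4, 5), 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, F(1, 5), F(4, 5), 0],
    [0, 0, 0, F(1, 5), F(4, 5)],
    [0, 0, 0, 0, 1],
]


def predict(belief, matrix):
    """One prediction step: b'(x') = sum over x of b(x) * matrix[x][x']."""
    n = len(matrix)
    return [sum(belief[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def run(belief, matrix, steps):
    """Apply the same action `steps` times, starting from `belief`."""
    for _ in range(steps):
        belief = predict(belief, matrix)
    return belief


uniform = [F(1, 5)] * 5
certain_living_room = [F(1), 0, 0, 0, 0]

for start in (uniform, certain_living_room):
    end = run(start, M_r, 5)
    print([f"{float(p):.6f}" for p in end])
```

Working with `Fraction` instead of floats avoids rounding noise, so a result such as 0.000064 comes out as exactly $1/15625$ rather than an approximation.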