Looking deeper into the training procedure for this problem, Figure 4 shows how each of the attention mechanisms is used. As the location tuples are provided, they are loaded sequentially into memory. When a query is issued, the location of the starting station (Victoria) and the lines to be travelled on are written into the memory matrix. The first read head (in pink) then retrieves the relevant stations in order from the memory matrix using the temporal linkage mechanism. The second read head (in blue) computes the destination for each station retrieved in this way. It does so via the content lookup mechanism: the head emits a key vector, which is compared against every location stored in the memory matrix, and the most similar entry is returned; the full tuple is then decoded as starting from Victoria and ending at Tottenham Court Road.
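The two read mechanisms above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the function names, the toy memory contents, and the link matrix below are invented for the example. Content lookup compares a key against every memory row by cosine similarity and sharpens the result with a strength parameter; temporal linkage uses a learned link matrix whose entry (i, j) records the degree to which row i was written immediately after row j, so multiplying it by the previous read weighting shifts attention to the next item written.

```python
import numpy as np

def content_lookup(memory, key, beta):
    """Content-based addressing: cosine similarity of `key` against each
    memory row, sharpened by strength `beta` via a softmax."""
    eps = 1e-8
    norms = np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + eps
    sim = memory @ key / norms          # cosine similarity per row
    w = np.exp(beta * sim)
    return w / w.sum()                  # normalized read weighting

def forward_read(link, w_prev):
    """Temporal-linkage read: shift the previous read weighting to the
    memory rows written immediately afterwards."""
    return link @ w_prev

# Toy memory: three one-hot "location" rows.
M = np.eye(3)

# Content lookup with a key close to row 1.
key = np.array([0.1, 0.9, 0.0])
w_content = content_lookup(M, key, beta=10.0)   # concentrates on row 1

# Toy link matrix: row 1 was written after row 0, row 2 after row 1.
L = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
w_prev = np.array([1.0, 0.0, 0.0])              # currently reading row 0
w_next = forward_read(L, w_prev)                # attention moves to row 1
```

In the route-planning task the pink head would iterate with the linkage step to walk the written tuples in order, while the blue head would use the content step to pull out the matching destination for each one.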