This is a very important step: it is where expert knowledge gets merged with the information extracted from the data. We will show just one example of how this can be done. We might hold some beliefs that are not well represented by the learned graph. This can happen for many reasons. Maybe the data set does not contain enough data for the algorithm to discover a relationship. Maybe a direction was imposed just to keep the graph acyclic, or maybe, when the algorithm could choose between two directions, it picked one without any understanding of what those variables actually mean.
We therefore have various options:
- Remove an existing arrow;
- Add an arrow that is not currently in the graph;
- Reverse the direction of an arrow.
When doing this, especially when removing an arrow, remember that the thickness of an arrow represents the strength of that link. We would therefore not want to remove a very thick link; at most, we might consider reversing it.
Arrows are changed by white-listing and black-listing links. Looking at the graph, we select the links we want to white-list, i.e. those that must be present, and the links we want to black-list, i.e. those that must not be present. The structure is then learned again under these constraints.
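One way to picture how the two lists constrain a score-based search such as hill climbing is to enumerate the single-arc changes the search may consider at each step: whitelisted arcs can never be removed or reversed, blacklisted arcs can never be added, and no change may create a cycle. The sketch below is a plain-Python illustration under these assumptions, not the implementation used by any particular library; the variable `X` and the blacklisted arc are hypothetical. Note how the three options listed above (remove, add, reverse) all show up as candidate moves.

```python
from itertools import permutations


def is_acyclic(arcs):
    """Return True if the directed graph given by `arcs` has no cycle (Kahn's algorithm)."""
    nodes = {node for arc in arcs for node in arc}
    indegree = {node: 0 for node in nodes}
    children = {node: [] for node in nodes}
    for parent, child in arcs:
        indegree[child] += 1
        children[parent].append(child)
    queue = [node for node in nodes if indegree[node] == 0]
    visited = 0
    while queue:
        node = queue.pop()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return visited == len(nodes)


def satisfies_lists(arcs, whitelist, blacklist):
    """A network respects the lists if it keeps every whitelisted arc and no blacklisted one."""
    return whitelist <= arcs and not (arcs & blacklist)


def legal_moves(arcs, nodes, whitelist, blacklist):
    """Single-arc changes a constrained search could consider from the graph `arcs`."""
    moves = []
    for a, b in permutations(nodes, 2):
        arc = (a, b)
        if arc in arcs:
            if arc in whitelist:
                continue  # whitelisted arcs can be neither removed nor reversed
            moves.append(("remove", arc))
            flipped = (arcs - {arc}) | {(b, a)}
            if (b, a) not in blacklist and is_acyclic(flipped):
                moves.append(("reverse", arc))
        elif arc not in blacklist and is_acyclic(arcs | {arc}):
            moves.append(("add", arc))  # forbidden if blacklisted or cycle-creating
    return moves


nodes = ["PUNTAJE", "AMBIENTE.ESCOLAR", "X"]        # X is a hypothetical variable
whitelist = {("AMBIENTE.ESCOLAR", "PUNTAJE")}       # expert-imposed direction
blacklist = {("PUNTAJE", "X")}                      # hypothetical forbidden arc
learned = {("PUNTAJE", "AMBIENTE.ESCOLAR"), ("X", "AMBIENTE.ESCOLAR")}
current = {("AMBIENTE.ESCOLAR", "PUNTAJE"), ("X", "AMBIENTE.ESCOLAR")}

print(satisfies_lists(learned, whitelist, blacklist))  # False
print(satisfies_lists(current, whitelist, blacklist))  # True
for move in legal_moves(current, nodes, whitelist, blacklist):
    print(move)
# ('add', ('X', 'PUNTAJE'))
# ('remove', ('X', 'AMBIENTE.ESCOLAR'))
# ('reverse', ('X', 'AMBIENTE.ESCOLAR'))
```

Adding PUNTAJE → AMBIENTE.ESCOLAR never appears as a move, because it would form a cycle with the whitelisted opposite arc: whitelisting one direction is enough to exclude the other.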
Just for the sake of the example, let us assume that, for some reason, we believe the arrow going from PUNTAJE to AMBIENTE.ESCOLAR should point in the opposite direction. We would then white-list the link (AMBIENTE.ESCOLAR, PUNTAJE) and retrain the model; since a DAG cannot contain both directions, this also rules out the original arrow. With score-based algorithms we can compute the score of a particular network, so we can also use that score to evaluate how much the change affects the fit of the network to the data. Remember, though, that you are adding information that is supposedly not contained in the data, so you may end up with a network with a lower score. Don't worry too much if that happens. This phase requires you to critically assess what is going on and decide whether a network makes sense or not.
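To see how a score can react to such a change, the sketch below computes the BIC of a discrete network directly from counts (maximised log-likelihood minus half of log N times the number of free parameters, with higher meaning better) and compares a learned structure with the expert-modified one. The data are a tiny made-up sample and `X` is a hypothetical third variable; in a real analysis you would use the scoring function of whatever structure-learning package you are working with.

```python
import math
from collections import Counter


def bic_score(data, arcs):
    """BIC of a discrete Bayesian network, with the convention that higher is better.

    Maximised log-likelihood, sum of N_ijk * log(N_ijk / N_ij), minus
    0.5 * log(N) times the number of free parameters.
    """
    n = len(data)
    variables = list(data[0])
    levels = {v: {row[v] for row in data} for v in variables}
    score = 0.0
    for v in variables:
        parents = sorted(p for p, child in arcs if child == v)
        config_counts = Counter(tuple(row[p] for p in parents) for row in data)
        joint_counts = Counter((tuple(row[p] for p in parents), row[v]) for row in data)
        for (config, _), count in joint_counts.items():
            score += count * math.log(count / config_counts[config])
        # every combination of parent levels counts, observed or not
        n_configs = math.prod(len(levels[p]) for p in parents)
        score -= 0.5 * math.log(n) * (len(levels[v]) - 1) * n_configs
    return score


# Tiny made-up sample: PUNTAJE and X take every combination of levels equally
# often and AMBIENTE.ESCOLAR is their XOR; each combination appears 10 times.
data = [
    {"PUNTAJE": p, "X": x, "AMBIENTE.ESCOLAR": p ^ x}
    for p, x in [(0, 0), (0, 1), (1, 0), (1, 1)]
    for _ in range(10)
]

learned = {("PUNTAJE", "AMBIENTE.ESCOLAR"), ("X", "AMBIENTE.ESCOLAR")}
expert = {("AMBIENTE.ESCOLAR", "PUNTAJE"), ("X", "AMBIENTE.ESCOLAR")}
print(f"learned: {bic_score(data, learned):.2f}")  # learned: -66.52
print(f"expert:  {bic_score(data, expert):.2f}")   # expert:  -92.40

chain = {("PUNTAJE", "AMBIENTE.ESCOLAR"), ("AMBIENTE.ESCOLAR", "X")}
fork = {("AMBIENTE.ESCOLAR", "PUNTAJE"), ("AMBIENTE.ESCOLAR", "X")}
print(math.isclose(bic_score(data, chain), bic_score(data, fork)))  # True
```

In this toy data the reversal destroys the v-structure PUNTAJE → AMBIENTE.ESCOLAR ← X, so the expert network scores lower, which is the kind of drop described above. A reversal does not always change the score, though: BIC gives the same value to structures that share the same skeleton and the same v-structures, as the chain and fork at the end show.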