Materials and Methods Brain network construction Generally, to perform network analyses one must define the two important elements of the network -- nodes and edges. These two elements are the building blocks of networks and their accurate definitions are very important for any network models (Butts 2009). The standard method of defining network nodes in the field of network neuroscience is to consider neuroimaging data such fMRI and apply a structural atlas or parcellation that separates the whole brain volume into different regions defined by known anatomical differences. Network nodes thus represent the collection of voxels within anatomically defined region. The network edges reflect statistical dependencies between the activity time series of two nodes. In this study, the brain is parcellated into 112 subcortical and cortical regions (see Supplementary Table 1) defined by the structural Harvard-Oxford atlas of the fMRIB (Smith et al. 2004; Woolrich et al. 2009). Each region's activity is given by the mean time series across all voxels within that region. The edge weights that link network nodes (brain regions) were defined as the wavelet transform coherence (WTC) (Torrence and Compo 1998), smoothed over time and frequency to avoid bias toward unity coherence. We use Morlet wavelets with coefficients given by\(:\) \(\begin{equation}w(t,f)=(\sigma_{t}\sqrt{(}\pi))^{-\frac{1}{2}}e^{-i2\pi ft}e^{-\frac{t^{2}}{2\sigma_{t}^{2}}}\nonumber \\ \end{equation}\) , where f is the centre frequency and \(\sigma_{t}\) is the temporal standard deviation. The time-frequency estimate, X(t,f) of time series x(t) was computed by a convolution with the wavelet coefficients: \(\begin{equation}X(t,f)=x(t)*w(t,f)\nonumber . \\ \end{equation}\) We selected the central frequency of 1/12 Hz corresponding to a spectral width of 0.05 to 0.11 Hz for full width at half maximum. Then the wavelet transform coherence between two time series x(t) and y(t) is defined as follows (Torrence and Compo 1998; Cazelles et al. 2007; Grinsted, Moore, and Jevrejeva 2004)\(:\) \begin{equation} TC^{2}(f,t)=\frac{|S(s^{-1}X_{xy}(t,f)|^{2}}{S(s^{-1}|X_{x}(t,f)|^{2})\cdot S(s^{-1}|X_{y}(t,f)|^{2})}\nonumber \\ \end{equation} ,where \(X_{xy}\) is the cross-wavelet of \(X_{x}\) and \(X_{y}\), s is the scale which depends on the frequency (Cazelles et al. 2007; Grinsted, Moore, and Jevrejeva 2004), and S is the smoothing operator. This definition closely resembles that of a traditional coherence, with the marked difference that the wavelet coherence provides a localized correlation coefficient in both time and frequency. Higher scales are required for lower frequency signals (Cazelles et al. 2007; Grinsted, Moore, and Jevrejeva 2004) and in this study, we used s=32 for the smoothing operation. This procedure was repeated for all pair of regions yielding the 112 by 112 adjacency matrix, A , representing the functional connectivity between brain regions. Network modularity In neuroscience, the term network modularity can be used to refer to the concept that brain regions cluster into modules or communities. These communities can be identified computationally using machine learning techniques in the form of community detections algorithms (Girvan and Newman 2001). A community of nodes is a group of nodes that are tightly interconnected. In this study, we implemented a generalized Louvain community detection algorithm (De Meo et al. 2011; Mucha et al. 2010) which considers multiple adjacency matrices as slices of a multilayer network, and which then produces a partition of brain regions into modules that reflects each subject's community structure across the multiple stages of learning instantiated in the four days of task practice. The multilayer network was constructed by connecting the adjacency matrices of all scans and subjects with interlayer links. We then maximized a multilayer modularity quality function, Q , that seeks a partition of nodes into communities that maximizes intra-community connections (Mucha et al. 2010)\(:\) \(\begin{equation}Q=\frac{1}{2\mu}\sum_{ijs}[(A_{ijs}-\gamma_{s}V_{ijs})\delta_{sr}+\delta_{ij}\omega_{jsr}]\delta(g_{is},g_{jr})\nonumber ,\\ \end{equation}\) where \(A_{ijs}\) is the ijth element of the adjacency matrix of slice s, and element \(V_{ijs}\) is the component of the null model matrix tuned by the structural resolution parameter \(\gamma\). In this study, we set \(\gamma\)=1, which is the standard practice in the field when no a priori hypotheses exist to otherwise inform the choice of \(\gamma\). We employed the Newman-Girvan null model within each layer by using \(V_{ijs}=\frac{k_{is}k_{js}}{2m_{s}}\), where k is the total edge weight and \(m_{s}\) is the total edge weight in slice s. The interslice coupling parameter, \(\omega_{jsr},\) is the connection strength of the interlayer link between node j in slice s and node j in slice r, and the total edge in the network is \(\mu=\frac{1}{2}\sum_{jr}\kappa_{jr}\). The node strength, \(\kappa_{jr}\), is the sum of the intraslice strength and interslice strength: \(\kappa_{jr}=k_{jr}+c_{jr}\), and \(c_{jr}=\sum_{s}\omega_{jrs}.\) In this study, we set \(\omega=1,\) which is the standard practice in the field when no a priori hypotheses exist to otherwise inform the choice of \(\omega\). Finally, the indicator \(\delta(g_{i},g_{j})=1\) if nodes i and j are assigned to the same community, and is 0 otherwise. We obtained a partition of the brain into communities for each scan and subject, and from that ensemble of partitions we constructed a module allegiance matrix\cite{Bassett_2015} , whose elements correspond to the probability that two regions belong to the same community across all scans and subjects. The seven network communities generated with this procedure are shown in Table 1 in the main text. Edge strength In complementary analyses, we also investigated which regions of the brain were characterized by high strength within the network. The edge strength of node i is defined as \(\begin{equation}S_{i}=\frac{1}{N-1}\sum_{j\epsilon N}a_{ij}\nonumber \\ \end{equation}\) ,where \(a_{ij}\) is the ijth element of the adjacency matrix with N nodes.