Figure 2: Patch-HAN. The network builds an image representation as a weighted combination of slice representations, and each slice representation is in turn a weighted combination of patch representations.
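
The two-level weighted combination described in the caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes attention weights come from a softmax over dot products with a context vector, and the names `attention_pool`, `patch_han_pool`, `patch_context`, and `slice_context` are hypothetical. A real model would likely use learned encoders and projections before scoring.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attention_pool(vectors, context):
    """Return a weighted combination of `vectors` and the weights used.

    Assumption: weights are a softmax over dot products between each
    vector and a context vector (learned in a real model, fixed here).
    """
    weights = softmax([dot(v, context) for v in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return pooled, weights


def patch_han_pool(image, patch_context, slice_context):
    """Hierarchical pooling: patches -> slice vectors -> image vector.

    `image` is a list of slices; each slice is a list of patch vectors.
    """
    slice_vectors = [attention_pool(patches, patch_context)[0] for patches in image]
    return attention_pool(slice_vectors, slice_context)


# Toy input: 2 slices, each with 2 patch vectors of dimension 2.
image = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 2.0], [4.0, 4.0]],
]
image_vector, slice_weights = patch_han_pool(image, [0.0, 0.0], [0.0, 0.0])
```

With zero context vectors, as in the toy call above, every score is equal, so both levels reduce to plain averaging. Non-zero context vectors would let some patches and slices contribute more than others.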