Authorea

Xavier Holt edited Results_discussion_and_Analysis_All__.md almost 8 years ago

Commit id: 7543c395f4ff6bf5c25426a128b91226be66c6e4


All results below are calculated on the test dataset, as outlined above.

## Automatic Metrics

### ROUGE Scores (Fig. ?)

Looking at the results of our ROUGE tests, we find:

* Both HDP models sit between our upper and lower baselines. This is expected and supports the validity of our approach.
* The F-scores for the variational method were all slightly higher than those of the GB model. Given that the motivation for VI was its speed, the fact that it performs at least comparably is an excellent result.
* The relative performance of the different ROUGE metrics did not appear to change depending on the classifier.

### Per-Word Log Likelihood (Simulated data, results are expected) (Fig. ?)

* As expected, the variational methods had a shorter burn-in time and converged faster.
* As above, the results after convergence for the VI model were at least comparable with GB.
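
For readers unfamiliar with the per-word log likelihood reported above, the following is a minimal sketch of how such a quantity can be computed on held-out text. It is not taken from our implementation: the function name `per_word_log_likelihood`, the inputs `docs`, `theta` and `beta`, and the assumption that each word's probability is a mixture of topic-word distributions weighted by per-document topic proportions are all illustrative.

```python
import math


def per_word_log_likelihood(docs, theta, beta):
    """Average log-probability per token under a topic mixture.

    Hypothetical inputs (assumptions for this sketch):
      docs  -- list of documents, each a list of integer word ids
      theta -- theta[d][k]: proportion of topic k in document d
      beta  -- beta[k][w]: probability of word w under topic k
    """
    total_log_lik = 0.0
    total_tokens = 0
    for d, doc in enumerate(docs):
        for w in doc:
            # p(w | d) = sum_k theta[d][k] * beta[k][w]
            p = sum(theta[d][k] * beta[k][w] for k in range(len(beta)))
            total_log_lik += math.log(p)
            total_tokens += 1
    if total_tokens == 0:
        raise ValueError("no tokens to evaluate")
    # Normalise by token count so corpora of different sizes are comparable.
    return total_log_lik / total_tokens


# Toy example: two documents, two topics, a three-word vocabulary.
docs = [[0, 1, 1], [2]]
theta = [[0.5, 0.5], [1.0, 0.0]]
beta = [[0.5, 0.25, 0.25], [0.1, 0.8, 0.1]]
score = per_word_log_likelihood(docs, theta, beta)
```

Values closer to zero indicate a better fit. Evaluating this score after each iteration of an inference procedure gives a convergence curve of the kind discussed above.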