As previously stated, the test suites were never used in practice. Effectiveness is difficult to define, so evaluating the suites against real faults would be a good first step. Inozemtseva et al. also mention the possibility of studying how coverage and effectiveness change over time. Bugs that the test suite does not catch immediately could be studied to determine which kinds of bugs are the most difficult to detect, so that more effective methods of detecting them can be developed.