The test suites that Inozemtseva et al. refer to throughout the paper were never used in practice, so their actual effectiveness is difficult to measure. They also studied only Java code, so the results may not generalize to other languages. Some of the studies they compare against used hand-seeded faults, which are notoriously harder to detect than both mutants and real faults. They also note that some of the earlier studies they compare against found that the correlation between coverage and effectiveness only appeared at high coverage levels. They dismiss this on the grounds that such coverage is impractical for most developers, which is probably true, but I feel that test-driven development routinely produces very high coverage and is therefore a practical example of high coverage that should be taken into account. A sketch of what a mutant looks like, and why it tends to be easy to kill, follows below.
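
To make the mutant-versus-seeded-fault distinction concrete: a mutant is a single, mechanical syntactic change to the program (such as replacing one relational operator with another), whereas a hand-seeded fault is crafted by a person to mimic a realistic bug. The following minimal sketch is my own illustration, not code from the paper; the Account class, its methods, and the JUnit 4 test are all hypothetical.

    // Hypothetical class under test (illustrative only, not from the paper).
    public final class Account {
        private int balance;

        public Account(int balance) {
            this.balance = balance;
        }

        // Original condition: a withdrawal succeeds when funds are sufficient.
        // A typical mutation operator would change "<=" to "<" here.
        public boolean withdraw(int amount) {
            if (amount <= balance) {
                balance -= amount;
                return true;
            }
            return false;
        }

        public int getBalance() {
            return balance;
        }
    }

A single boundary-value test is enough to "kill" that mutant, which hints at why mutants tend to be easier to detect than carefully seeded faults:

    import static org.junit.Assert.assertTrue;
    import org.junit.Test;

    public class AccountTest {
        // Kills the "<=" -> "<" mutant: under the mutant, withdrawing
        // the exact balance is wrongly rejected, so this assertion fails.
        @Test
        public void withdrawExactBalanceSucceeds() {
            Account account = new Account(100);
            assertTrue(account.withdraw(100));
        }
    }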