Alberto Pepe edited Introduction.md  almost 11 years ago

Commit id: 13623100068a91502caba248f7d46daf67146afe


It is difficult to anticipate how your data will be useful in the future. Documenting the known limitations imposed by your data’s provenance is vital. Describing why your data was collected and the context in which you anticipate the data to be used can be a useful way of conveying implicit assumptions. In making decisions about what data to share, consider cost, privacy, statistical efficiency, and simplicity. Is it too much to expect future researchers to address all of the intricacies of your data collection and protocols? If so, you could consider providing robust intermediate results for future use. However, keep in mind that data reduction and summary statistics limit the scope of future analysis. For example, the mean and standard deviation are sufficient statistics for a normal distribution, but not for a more general statistical model. When encryption or statistical methods are applied to reduce the disclosure risk of sensitive information, analysis of the redacted data can even lead to false inference (for example: \cite{10.1093/poq/nfq033}). These tradeoffs are unavoidable, so provide data products at multiple stages in the processing spectrum if possible. Your data community can be an invaluable resource to help you understand the potential future use of your data and the level of processing that would provide them with the greatest value. This is a paradigm shift: conducting research with sharing and reuse in mind is now essential. How can you put this into practice?
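
The point about summary statistics limiting future analysis can be made concrete with a minimal sketch. The two samples below are invented for illustration (they are not from any real study), and the helper names `summarize` and `skewness` are hypothetical. Both samples share the same mean and standard deviation, yet they have different shapes, so a reader given only the summary could not tell them apart.

```python
import statistics

def summarize(values):
    """Return the (mean, population standard deviation) of a sample."""
    return statistics.fmean(values), statistics.pstdev(values)

def skewness(values):
    """Population skewness: the mean cubed z-score of the sample."""
    mean, sd = summarize(values)
    return statistics.fmean(((x - mean) / sd) ** 3 for x in values)

# Illustrative, made-up samples: one symmetric, one right-skewed.
symmetric = [-1.0, 1.0] * 5
skewed = [-0.5] * 8 + [2.0] * 2

# The shared summary hides the difference between the two samples.
print("symmetric summary:", summarize(symmetric))
print("skewed summary:   ", summarize(skewed))

# A statistic that needs the raw values reveals it.
print("symmetric skewness:", skewness(symmetric))
print("skewed skewness:   ", skewness(skewed))
```

If only the mean and standard deviation had been shared, any later analysis that depends on asymmetry or on extreme values would be impossible, which is one argument for releasing data at more than one stage of processing.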