Open Data

Because we often produce large-scale empirical analyses that rest on specific data about the scientific literature, the lab has adopted a strict open data policy. What does this mean? As defined by the EU for Horizon 2020 actions, an open data policy entails that we will:

- deposit in a research data repository and take measures to make it possible for third parties to access, mine, exploit, reproduce, and disseminate – free of charge for any user – the following:
  - the data, including associated metadata, needed to validate the results presented in scientific publications as soon as possible,
  - other data, including associated metadata, as specified and within the deadlines laid down in the project’s data management plan,
- provide information – via the repository – about tools and instruments which we have that are necessary for validating the results, and, where possible, provide the tools and instruments themselves.

As it turns out, a number of tools already exist to support open data sharing. You must upload all data, from the very first raw data through the intermediate results of as many processing phases as is practical, along with a lab notebook or “recipe” describing how to reproduce any and all processing that you did to the data. In order of preference, you should upload this data to:

1. The journal where the paper is published. This is the best choice when suitable storage is available from the publisher, as it minimizes the “distance” for end users from the data package to the final, published product.
2. An open repository that provides DOIs. If you upload your data package to an open repository that assigns your data a DOI (digital object identifier), and I strongly recommend figshare, then the data itself becomes formally citable. This cleanly connects your published article to your data package and allows any future users of your data to cite it in turn.
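
Whichever destination you choose, it can help to ship the data package with a machine-readable inventory, so that third parties can check that they have received every file intact. The following is only a minimal sketch of that idea, not a lab requirement or a repository API: the `MANIFEST.json` filename, the function names, and the assumption that the whole package lives in a single directory are all illustrative choices.

```python
import hashlib
import json
from pathlib import Path

# Hypothetical filename for the inventory; not mandated by any repository.
MANIFEST_NAME = "MANIFEST.json"


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(package_dir: Path) -> dict:
    """List every file in the package with its size and checksum."""
    files = []
    for path in sorted(package_dir.rglob("*")):
        # Skip directories and any previously written manifest.
        if path.is_file() and path.name != MANIFEST_NAME:
            files.append(
                {
                    "path": path.relative_to(package_dir).as_posix(),
                    "bytes": path.stat().st_size,
                    "sha256": sha256_of(path),
                }
            )
    return {"files": files}


def write_manifest(package_dir: Path) -> Path:
    """Write the manifest into the package directory and return its path."""
    manifest_path = package_dir / MANIFEST_NAME
    manifest_path.write_text(json.dumps(build_manifest(package_dir), indent=2))
    return manifest_path
```

Calling `write_manifest(Path("my_data_package"))` on a (hypothetical) package directory holding the raw data, intermediate files, and the recipe would add a `MANIFEST.json` alongside them before upload.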

If you have questions, ask! These norms are still rapidly evolving, and our group in particular can have weird problems with copyrighted source data coming from journal publishers.