History of the project

In clinical research, software and data infrastructure development is undervalued and generally underfunded, particularly for small- to mid-sized research organizations. Clinical and health researchers largely lack formal training, support, and awareness in research software and data engineering and in building and managing data infrastructures. As a result, the overall software and computational ecosystem, as well as the technical capacity to maintain it, lags far behind that of several other scientific domains (e.g., bioinformatics).

Combined with the recent rise of data science and the greater focus on analytical reproducibility, this issue has become increasingly apparent as data grow larger and more complex and as the skills required to work with them become more technical. Investing in and implementing scalable, modern data infrastructures and software and data engineering processes, built with open source software, has the potential to greatly improve the quality of science, to produce more transparent and streamlined workflows, to make research more reproducible, and generally to yield better science in less time (1).

Thankfully, building modern data infrastructures has slowly become a higher priority for funding and research agencies globally. For instance, the UK Biobank (2,3) is a large-scale biomedical database with highly detailed data on ~500,000 participants. It is regularly expanded with additional data, is globally accessible to approved researchers, and is a role model for building a functioning research data infrastructure.

While the UK Biobank is a source of inspiration, the underlying infrastructure itself is not openly accessible or reusable. The same applies to a similar Danish initiative, the “Single path to access Danish health data” project (4), where the Danish government and individual regions are collaborating to map out all Danish health data. Another state-of-the-art initiative, led by the University of Chicago, USA, is Gen3 (5), which consists of modular open source services that can form the basis for a data infrastructure (6,7) and powers several research platforms, including some at the National Institutes of Health (8). However, we are unaware of any similar current national efforts that are open source, reusable, and suitable for the Danish and EU legal context.

Against this background, the idea for Seedcase initially formed out of a need to improve the data infrastructure of the Danish Centre for Strategic Research in Type 2 Diabetes (DD2) study (9). We then generalized the idea to improve the infrastructure not only of DD2 but also of other Danish studies.

In 2021, the Novo Nordisk Foundation (Novo Nordisk Fonden) in Denmark had a grant application call titled Data Science Research Infrastructure Grant; we’ve included a modified version of its instructions here. We submitted an application and were awarded the requested funds.

References

1. Lowndes JSS, Best BD, Scarborough C, Afflerbach JC, Frazier MR, O’Hara CC, et al. Our path to better science in less time using open data science tools. Nature Ecology & Evolution [Internet]. 2017 May [cited 2021 May 2];1(6):1–7. Available from: https://www.nature.com/articles/s41559-017-0160
2. Sudlow C, Gallacher J, Allen N, Beral V, Burton P, Danesh J, et al. UK biobank: An open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLOS Medicine [Internet]. 2015 Mar [cited 2021 May 2];12(3):e1001779. Available from: https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.1001779
3. UK biobank [Internet]. [cited 2021 May 2]. Available from: https://www.ukbiobank.ac.uk/
4. En indgang til sundhedsdata [Internet]. [cited 2021 May 2]. Available from: https://www.enindgangtilsundhedsdata.dk/
5. Gen3 data commons [Internet]. [cited 2021 May 2]. Available from: http://gen3.org/
6. Center for translational data science GitHub repositories [Internet]. GitHub. [cited 2021 May 2]. Available from: https://github.com/uc-cdis
7. Gen3 software [Internet]. Center for Translational Data Science. [cited 2021 May 2]. Available from: https://ctds.uchicago.edu/gen3
8. Gen3 - powered by Gen3 [Internet]. [cited 2021 May 2]. Available from: http://gen3.org/powered-by-gen3/
9. Christensen DH, Nicolaisen SK, Berencsi K, Beck-Nielsen H, Rungby J, Friborg S, et al. Danish centre for strategic research in type 2 diabetes (DD2) project cohort of newly diagnosed patients with type 2 diabetes: A cohort profile. BMJ Open. 2018 Apr;8(4):e017273.