51 WGS 1KG dataset
- 51 WGS samples from the 1000 Genomes Project
  - 19 725 105 unique variants
  - ~51×19.7×10⁶ ≈ 1×10⁹ genotypes
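The genotype total is the size of a samples × variants matrix, one genotype per pair. A minimal sketch of that arithmetic follows; the function name `genotype_count` is an illustrative assumption, not something from the source.

```python
def genotype_count(samples: int, variants: int) -> int:
    """Number of cells in a samples x variants genotype matrix."""
    return samples * variants


# Figures quoted above: 51 samples, 19 725 105 unique variants.
total = genotype_count(51, 19_725_105)
print(f"{total:,} genotypes (~{total:.1e})")
# prints: 1,005,980,355 genotypes (~1.0e+09)
```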
- Memory, disk, and startup time footprints
  - On disk: 1.1 GiB
  - RAM footprint in a running cluster: ~1.7 GiB
  - Cluster startup time on a single node (laptop): t = 4.6 s
  - Cluster memory and disk footprints vary slightly with the number of nodes
2890 WGS dataset
- 2890 WGS samples
  - ~86×10⁶ unique variants
  - ~2890×86×10⁶ ≈ 2.49×10¹¹ total genotypes
  - VEP annotated
- Memory and disk footprints
  - On disk: 29 GiB
  - RAM footprint in a running k8s cluster with 4 pods: ~48.6 GiB
  - Cluster memory and disk footprints vary slightly with the number of nodes
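To put these footprints in perspective, the sketch below derives the on-disk bytes per genotype and the RAM per pod from the figures quoted for this dataset. The helper names are illustrative, and the per-pod figure assumes RAM is split evenly across the 4 pods, which the source does not state.

```python
GIB = 2**30  # bytes per GiB


def bytes_per_genotype(size_gib: float, samples: int, variants: float) -> float:
    """Storage footprint divided by the samples x variants genotype count."""
    return size_gib * GIB / (samples * variants)


def ram_per_pod_gib(total_ram_gib: float, pods: int) -> float:
    """Average RAM per pod; assumes an even split (not stated in the source)."""
    return total_ram_gib / pods


# Figures quoted above: 29 GiB on disk, 2890 samples, ~86e6 variants,
# ~48.6 GiB RAM across 4 pods.
disk = bytes_per_genotype(29, 2890, 86e6)
print(f"disk: {disk:.3f} bytes/genotype (~{disk * 8:.1f} bits)")
print(f"RAM per pod: {ram_per_pod_gib(48.6, 4):.2f} GiB")
```

With these inputs the on-disk footprint works out to roughly one bit per genotype.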
76 156 WGS gnomAD dataset
- 76 156 WGS samples
  - ~759×10⁶ unique variants
  - 415.071703278×10⁹ non-reference (hom + het) genotypes, modelled under Hardy-Weinberg equilibrium (HWE) with a 0.005 no-call rate
  - ~57.802404×10¹² total genotypes, including homozygous-reference and missing genotypes (76 156 × 759×10⁶)
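The source does not say exactly how the HWE model was applied. One plausible reading, sketched below, treats each variant as biallelic with alt-allele frequency q, so the probability of a non-reference genotype is 1 − (1 − q)², and scales by the fraction of called genotypes. The function name, the biallelic assumption, the independence of no-calls from genotype, and the example frequencies are all assumptions made for illustration.

```python
from typing import Iterable


def expected_nonref_genotypes(
    samples: int, alt_freqs: Iterable[float], no_call_rate: float = 0.005
) -> float:
    """Expected count of het + hom-alt genotypes under HWE.

    Assumes biallelic variants and no-calls independent of genotype.
    """
    called = samples * (1.0 - no_call_rate)
    # Under HWE, P(hom-ref) = (1 - q)^2, so P(non-ref) = 1 - (1 - q)^2.
    return sum(called * (1.0 - (1.0 - q) ** 2) for q in alt_freqs)


# Hypothetical alt-allele frequencies for three variants (illustrative only).
freqs = [0.001, 0.01, 0.2]
print(f"{expected_nonref_genotypes(76_156, freqs):.0f} expected non-ref genotypes")

# Share of non-reference genotypes implied by the totals quoted above.
nonref_share = 415.071703278e9 / (76_156 * 759e6)
print(f"non-reference share: {nonref_share:.2%}")
```

The second figure shows that, by the quoted totals, under 1% of all genotypes in this dataset are non-reference.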
- Memory, disk, and startup time footprints
  - On disk: 312 GiB
  - RAM footprint in a running cluster: 450 GiB
  - Cluster startup time on a single node (testbed): t = 260 s
  - Cluster startup time on N nodes: t / N
  - Cluster memory and disk footprints vary slightly with the number of nodes
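The t / N rule for startup time can be sketched as follows. The function name is illustrative, and the linear scaling is taken from the statement above rather than measured here; real clusters may deviate from it.

```python
def startup_time_sec(t_single_node: float, nodes: int) -> float:
    """Estimated cluster startup time, assuming ideal t / N scaling."""
    if nodes < 1:
        raise ValueError("nodes must be >= 1")
    return t_single_node / nodes


# t = 260 s on a single node, as quoted above for the testbed.
for n in (1, 2, 4, 8):
    print(f"{n} node(s): ~{startup_time_sec(260, n):.1f} s")
```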