51 WGS dataset
- 51 WGS samples from 1000 Genomes Project
- 19 725 105 unique variants
- 51 × 19.7×10⁶ ≈ 1.0×10⁹ genotypes
- Memory, disk, and startup time footprints:
  - On disk: 1.1 GiB
  - RAM footprint in a running cluster: ~1.7 GiB
  - Cluster startup time on a single node (laptop): t = 4.6 s
  - Cluster memory and disk footprints vary slightly with the number of nodes
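
The genotype count and the storage cost per genotype follow directly from the figures listed for this dataset. A minimal sketch in Python; the variable names are illustrative, and the per-genotype values are derived ratios, not measurements reported for the dataset:

```python
# Figures quoted for the 51 WGS dataset.
SAMPLES = 51
UNIQUE_VARIANTS = 19_725_105
DISK_GIB = 1.1   # on-disk footprint
RAM_GIB = 1.7    # approximate RAM footprint in a running cluster

BYTES_PER_GIB = 2**30

# One genotype per (sample, variant) pair.
genotypes = SAMPLES * UNIQUE_VARIANTS

# Derived ratios (illustrative only): average bytes spent per genotype.
disk_bytes_per_genotype = DISK_GIB * BYTES_PER_GIB / genotypes
ram_bytes_per_genotype = RAM_GIB * BYTES_PER_GIB / genotypes

print(f"genotypes: {genotypes:.3e}")
print(f"disk: {disk_bytes_per_genotype:.2f} bytes per genotype")
print(f"RAM:  {ram_bytes_per_genotype:.2f} bytes per genotype")
```

These ratios average over everything stored, so they are only a rough guide to how the footprint scales with sample and variant counts.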
3202 WGS dataset
- 1KGP 30x on GRCh38 dataset
- 3202 WGS samples
- 138 044 724 unique variants
- 19 348 414 multiallelic variants
- 138 044 724 × 3202 ≈ 4.42×10¹¹ genotypes (some missing or absent)
- VEP, ClinVar, gnomAD AF and AlphaMissense annotations
- Memory and disk footprints:
  - On disk: 41 GiB
  - RAM footprint in a k8s cluster with 8 pods: 64 GiB
  - Cluster memory and disk footprints vary slightly with the number of nodes
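
The arithmetic behind the figures for this dataset can be checked in a few lines. A minimal sketch in Python with illustrative variable names; note that the on-disk footprint also covers the annotations, so the bits-per-genotype value is an upper bound on the genotype storage cost, not a reported measurement:

```python
# Figures quoted for the 3202 WGS dataset.
SAMPLES = 3202
UNIQUE_VARIANTS = 138_044_724
MULTIALLELIC_VARIANTS = 19_348_414
DISK_GIB = 41

# One genotype slot per (sample, variant) pair; some are missing in practice.
genotypes = SAMPLES * UNIQUE_VARIANTS

# Share of variants that are multiallelic.
multiallelic_fraction = MULTIALLELIC_VARIANTS / UNIQUE_VARIANTS

# Derived ratio (illustrative): disk bits per genotype slot, annotations included.
disk_bits_per_genotype = DISK_GIB * 2**30 * 8 / genotypes

print(f"genotypes: {genotypes:.3e}")
print(f"multiallelic variants: {multiallelic_fraction:.1%}")
print(f"disk: {disk_bits_per_genotype:.2f} bits per genotype")
```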
76 156 WGS gnomAD dataset
- gnomAD v3.1
- 76 156 WGS samples
- ~759×10⁶ unique variants
- 415.071703278×10⁹ non-reference (hom + het) genotypes, estimated under Hardy-Weinberg equilibrium (HWE) with a 0.005 no-call rate
- ~57.802404×10¹² genotypes in total, including homozygous-reference and missing genotypes (76 156 × 759×10⁶)
- Memory, disk, and startup time footprints:
  - On disk: 312 GiB
  - RAM footprint in a running cluster: 450 GiB
  - Cluster startup time on a single node (testbed): t = 260 s
  - Cluster startup time on N nodes: t / N
  - Cluster memory and disk footprints vary slightly with the number of nodes
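
The gnomAD figures combine a simple product, an HWE-based estimate, and a linear startup-time rule. The sketch below, in Python, reproduces the product and the t / N rule from the list and shows one plausible reading of the HWE step for a single biallelic site. The function names and the per-site formula are assumptions made for illustration; the list does not describe how the model was actually applied across variants.

```python
# Figures quoted for the gnomAD v3.1 dataset.
SAMPLES = 76_156
UNIQUE_VARIANTS = 759e6              # approximate count
NON_REF_GENOTYPES = 415.071703278e9  # modelled estimate quoted in the list
NO_CALL_RATE = 0.005
SINGLE_NODE_STARTUP_S = 260.0


def hwe_non_ref_fraction(alt_af: float, no_call_rate: float = NO_CALL_RATE) -> float:
    """Expected fraction of called non-reference genotypes at a biallelic site.

    Assumption: under HWE with alternate allele frequency p, a genotype is
    heterozygous with probability 2p(1 - p) and homozygous-alternate with
    probability p^2; a fraction `no_call_rate` of genotypes is not called.
    """
    het = 2 * alt_af * (1 - alt_af)
    hom_alt = alt_af**2
    return (1 - no_call_rate) * (het + hom_alt)


def startup_time_s(nodes: int, single_node_s: float = SINGLE_NODE_STARTUP_S) -> float:
    """Idealised startup time on `nodes` nodes, using the t / N rule from the list."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return single_node_s / nodes


# Total genotype slots, including homozygous-reference and missing genotypes.
total_genotypes = SAMPLES * UNIQUE_VARIANTS

# Share of all genotype slots that are non-reference according to the estimate.
non_ref_share = NON_REF_GENOTYPES / total_genotypes

print(f"total genotypes: {total_genotypes:.6e}")
print(f"non-reference share: {non_ref_share:.2%}")
print(f"startup on 8 nodes: {startup_time_s(8):.1f} s")
```

The non-reference share below 1% illustrates how sparse the genotype matrix is at this scale: most genotype slots are homozygous reference.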