
RAM & Disk Footprints

Footprints and Scaling

Scaling from local development environments to production clusters.


51 WGS dataset
  • Dataset: 51 WGS samples from 1000 Genomes Project
    • 19,725,105 unique variants
    • 51 × 19,725,105 ≈ 1.0×10⁹ genotypes
  • Memory, disk, and startup time footprints:
    • On disk: 1.1 GiB
    • RAM footprint: ~1.7 GiB
    • Startup time on a single node (laptop): t = 4.6 s
    • Footprints vary slightly with the number of nodes.
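The genotype total above is samples × unique variants: one cell per (sample, variant) pair, counting homozygous-reference calls too. A quick Python check, assuming a dense genotype matrix:

```python
# Total genotype count: one genotype cell per (sample, variant) pair,
# including homozygous-reference calls (dense-matrix assumption).
SAMPLES = 51
UNIQUE_VARIANTS = 19_725_105

total_genotypes = SAMPLES * UNIQUE_VARIANTS
print(f"{total_genotypes:,} genotypes (~{total_genotypes:.2e})")
```

The exact product is 1,005,980,355, which the bullet rounds to ~1×10⁹.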

3202 WGS dataset
  • Dataset: 1KGP 30x on GRCh38
    • 3,202 WGS samples
    • 138,044,723 unique variants
    • 19,348,414 multiallelic variants
    • 138,044,723 × 3,202 ≈ 4.42×10¹¹ genotypes
    • Annotations:
      • VEP (impact, biotype, feature type, variant class, consequences), ClinVar, gnomAD v4.1 exomes (gnomADe) and genomes (gnomADg), HGVSp, AlphaMissense score and class
      • GENCODE Basic set
      • annotation composition
  • Memory and disk footprints:
    • On disk: 42 GiB
    • RAM footprint (k8s cluster with 8 pods): 63 GiB
    • Footprints vary slightly based on the number of nodes.
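Two figures can be derived directly from the numbers above: the share of multiallelic variants and the average RAM per pod. The per-pod value is a sketch that assumes the data is sharded evenly across the 8 pods; the page does not state how data is distributed.

```python
RAM_TOTAL_GIB = 63          # cluster-wide RAM footprint reported above
PODS = 8                    # pods in the k8s cluster
UNIQUE_VARIANTS = 138_044_723
MULTIALLELIC = 19_348_414

# Assumption: even sharding, so each pod holds ~1/PODS of the total footprint.
ram_per_pod_gib = RAM_TOTAL_GIB / PODS

# Fraction of unique variants that are multiallelic.
multiallelic_share = MULTIALLELIC / UNIQUE_VARIANTS

print(f"~{ram_per_pod_gib:.1f} GiB per pod (even-sharding assumption)")
print(f"{multiallelic_share:.1%} of unique variants are multiallelic")
```

Under that assumption each pod needs roughly 7.9 GiB, and about 14% of the variants are multiallelic.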

76,156 WGS gnomAD dataset
  • Dataset: gnomAD v3.1
    • 76,156 WGS samples
    • ~759×10⁶ unique variants
    • 415,071,703,278 (~4.15×10¹¹) non-reference (het + hom-alt) genotypes, modelled under Hardy-Weinberg equilibrium (HWE) with a 0.005 no-call rate
    • 76,156 × 759×10⁶ = 57,802,404×10⁶ ≈ 5.78×10¹³ total genotypes, including homozygous-reference and missing (no-call) genotypes
  • Memory, disk, and startup time footprints:
    • On disk: 312 GiB
    • RAM footprint: 450 GiB
    • Startup time on a single node (R630 testbed): t = 260 s
    • Startup time on N nodes: ≈ t / N
    • Footprints vary slightly with the number of nodes.
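One plausible reading of "modelled by HWE with 0.005 no-call rate" is sketched below; the page does not give the exact procedure. It assumes biallelic sites with alternate allele frequency p, Hardy-Weinberg genotype proportions (so P(non-ref) = 1 − (1 − p)²), and missing calls spread uniformly across genotypes. The function names and the allele frequencies in the example are hypothetical, not gnomAD data. The second function applies the t / N startup rule from the list above.

```python
from typing import Iterable


def expected_nonref_genotypes(alt_freqs: Iterable[float],
                              n_samples: int,
                              no_call_rate: float) -> float:
    """Expected called non-reference (het + hom-alt) genotypes.

    Assumes biallelic sites and Hardy-Weinberg proportions:
    P(hom-ref) = (1 - p)**2, so P(non-ref) = 1 - (1 - p)**2.
    A fraction `no_call_rate` of genotypes is treated as missing,
    independently of the true genotype.
    """
    called = n_samples * (1.0 - no_call_rate)
    return sum(called * (1.0 - (1.0 - p) ** 2) for p in alt_freqs)


def ideal_startup_time(t_single_node: float, n_nodes: int) -> float:
    """Startup time under the linear-scaling rule t / N."""
    if n_nodes < 1:
        raise ValueError("n_nodes must be >= 1")
    return t_single_node / n_nodes


if __name__ == "__main__":
    # Hypothetical allele frequencies, for illustration only.
    freqs = [0.5, 0.01, 0.0001]
    est = expected_nonref_genotypes(freqs, 76_156, 0.005)
    print(f"{est:,.0f} expected non-ref genotypes over {len(freqs)} sites")
    for n in (1, 2, 4, 8):
        print(f"{n} node(s): ~{ideal_startup_time(260, n):.1f} s")
```

Summing `expected_nonref_genotypes` over every site's allele frequency would give a cohort-wide estimate comparable in kind to the ~4.15×10¹¹ figure. Real multi-node startup usually falls somewhat short of the ideal t / N.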

Resource Comparison Table

Cohort size   Unique variants   Total genotypes   Disk (GiB)   RAM (GiB)
51 WGS        19.7M             1.0×10⁹           1.1          1.7
3,202 WGS     138M              4.42×10¹¹         42           63
76,156 WGS    759M              5.78×10¹³         312          450
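Dividing each footprint by the total genotype count gives an average cost per genotype cell. The sketch below does only that arithmetic on the table's values. It uses the rounded ~759×10⁶ variant count for gnomAD and treats GiB as 2³⁰ bytes.

```python
GIB = 2 ** 30  # bytes per GiB

# (cohort, total genotypes, disk GiB, RAM GiB) from the comparison table.
ROWS = [
    ("51 WGS", 51 * 19_725_105, 1.1, 1.7),
    ("3,202 WGS", 3_202 * 138_044_723, 42, 63),
    ("76,156 WGS", 76_156 * 759_000_000, 312, 450),
]


def bytes_per_genotype(size_gib: float, genotypes: int) -> float:
    """Average storage cost per genotype cell, in bytes."""
    return size_gib * GIB / genotypes


for cohort, genotypes, disk_gib, ram_gib in ROWS:
    disk = bytes_per_genotype(disk_gib, genotypes)
    ram = bytes_per_genotype(ram_gib, genotypes)
    print(f"{cohort:>11}: disk {disk:.4f} B/genotype, RAM {ram:.4f} B/genotype")
```

The per-genotype cost falls from roughly 1.2 B on disk for 51 samples to under 0.01 B for 76,156 samples. One plausible explanation is that larger cohorts are dominated by homozygous-reference cells, which are cheap to store. This is an inference from the totals, not a measured breakdown.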

Testbed Hardware

Benchmarks were run on the R630 testbed unless otherwise specified.

CPU:  Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
      28 Cores w/ HT (56 Logical Cores)
RAM:  503 GiB