1000 Genomes Project Dataset
The 1000 Genomes 30x on GRCh38 dataset (sequenced & aligned by the New York Genome Center) is available for real-time querying.
Connection Endpoints:
| Protocol | Non-TLS (Port 80) | TLS (Port 443) |
|---|---|---|
| gRPC | http://db.dnaerys.org | https://db.dnaerys.org |
| gRPC-web | http://db.dnaerys.org | https://db.dnaerys.org |
| MCP | http://db.dnaerys.org/mcp | https://db.dnaerys.org/mcp |
Dataset Composition: The cluster hosts 3,202 samples with 138,044,723 unique variants. These comprise 2,504 unrelated samples from the Phase 3 panel and 698 related samples; in total, 1,598 samples are male and 1,604 are female.
Data Annotations and Attribution
This dataset was annotated using:
- Ensembl Variant Effect Predictor (VEP) software, developed by the Ensembl project at EMBL-EBI and the Wellcome Sanger Institute. Annotations use the GENCODE Primary set. Data was used as provided.
- AlphaMissense annotations, developed by Google DeepMind and EMBL-EBI, licensed under the Creative Commons Attribution 4.0 International License. Data was used as provided.
- gnomAD AF annotations, derived from the Genome Aggregation Database, developed by the gnomAD consortium and the Broad Institute. Data was used as provided in the VEP cache, in accordance with its terms of use (CC0 Public Domain Dedication).
- ClinVar annotations, derived from ClinVar, the public archive of interpretations of clinically relevant variants maintained by the National Center for Biotechnology Information (NCBI). The information is publicly available for use without restriction, as set out in NCBI's data use policy. Data was used as provided.
- Variant consequence terms from the Sequence Ontology (SO), available under the permissive CC BY 4.0 license. The SO is developed as an open source project for the genomics community. Data was used as provided.
Annotation versions:
- VEP="v115.2"
- cache="115_GRCh38" ensembl=115.266b84d ensembl-compara=115.ae48a7a ensembl-funcgen=115.57f7061 ensembl-io=115.25061d3 ensembl-variation=115.b7c2637 1000genomes="phase3" ClinVar="202502" assembly="GRCh38.p14" gencode="GENCODE 49" genebuild="GENCODE49" gnomADe="v4.1" gnomADg="v4.1"
- AlphaMissense thresholds:
  - 'Likely benign' if score < 0.34, 'Likely pathogenic' if score > 0.564, 'ambiguous' otherwise
  - see doi.org/10.1126/science.adg7492
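You can apply the AlphaMissense thresholds listed above to raw scores locally. The sketch below does this with a small shell function. The function name `am_class_of` is hypothetical and is not part of the Dnaerys API. awk performs the floating-point comparisons, which POSIX sh cannot do.

```shell
# Hypothetical helper: map a raw AlphaMissense score to its class using the
# thresholds listed above (< 0.34 likely benign, > 0.564 likely pathogenic).
am_class_of() {
  # awk does the floating-point comparison that POSIX sh cannot do itself.
  awk -v score="$1" 'BEGIN {
    s = score + 0
    if (s < 0.34)       print "Likely benign"
    else if (s > 0.564) print "Likely pathogenic"
    else                print "ambiguous"
  }'
}

# Example: classify a few scores, including both boundary values.
for s in 0.12 0.34 0.564 0.9; do
  printf '%s\t%s\n' "$s" "$(am_class_of "$s")"
done
```

Both boundaries are strict comparisons, so a score of exactly 0.34 or 0.564 falls into 'ambiguous'.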
Note
db.dnaerys.org is located in Australia ("Down Under"). Expect additional network latency when querying from distant regions.
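Because every call crosses a long network path, it can help to give grpcurl explicit time limits. The sketch below wraps grpcurl and passes its `-connect-timeout` and `-max-time` flags. Several details are assumptions chosen for illustration:

- the function name `dnaerys_call`;
- the default limits of 10 s to connect and 120 s overall;
- the `DRY_RUN` switch.

Tune the limits to your own connection.

```shell
# Hypothetical wrapper around grpcurl with explicit time limits (TLS, port 443).
# Usage: dnaerys_call METHOD JSON_REQUEST
# Env:   CONNECT_TIMEOUT, MAX_TIME (seconds), PROTO (path to the .proto),
#        DRY_RUN=1 prints the command instead of running it.
dnaerys_call() {
  method=$1
  request=$2
  # Rebuild the positional parameters as the full grpcurl command line.
  set -- grpcurl \
    -connect-timeout "${CONNECT_TIMEOUT:-10}" \
    -max-time "${MAX_TIME:-120}" \
    -proto "${PROTO:-dnaerys_1.17.4.proto}" \
    -d "$request" \
    db.dnaerys.org:443 \
    "org.dnaerys.cluster.grpc.DnaerysService/$method"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

For example, `dnaerys_call SelectVariantsInRegion '{"chr":"17", "start":"7661779", "end":"7687546", "het":"true", "assembly":"GRCh38"}'` runs a TP53 region query with these limits applied.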
Model Context Protocol
OneKGPd gives LLMs real-time access to the 1000 Genomes Project dataset via the Model Context Protocol (MCP).
- Documentation, Source & Installation: github.com/dnaerys/onekgpd-mcp
- Remote Service (Streamable HTTP): https://db.dnaerys.org/mcp
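For clients without built-in MCP support, you can in principle exercise the remote service directly. The sketch below uses curl to send a JSON-RPC `initialize` request in the form the MCP Streamable HTTP transport specification describes. Several details are assumptions:

- the protocol version string;
- the client name;
- the helper names.

Check the OneKGPd documentation for the versions and tools the server actually supports.

```shell
# Hypothetical helpers for a raw MCP handshake over Streamable HTTP.
MCP_URL=${MCP_URL:-https://db.dnaerys.org/mcp}

# Build the JSON-RPC "initialize" request (protocol version is an assumption).
mcp_initialize_payload() {
  printf '%s' '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-06-18","capabilities":{},"clientInfo":{"name":"shell-example","version":"0.1"}}}'
}

# POST the request. Streamable HTTP clients must accept both JSON and SSE replies.
# -i prints response headers, which may include an Mcp-Session-Id for later calls.
mcp_initialize() {
  curl -sS -i "$MCP_URL" \
    -H 'Content-Type: application/json' \
    -H 'Accept: application/json, text/event-stream' \
    --data "$(mcp_initialize_payload)"
}
```

Under the MCP lifecycle, a client that completes `initialize` then sends a `notifications/initialized` notification before calling methods such as `tools/list`. It reuses any `Mcp-Session-Id` the server returned.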
gRPC Queries
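The following examples use grpcurl. The API is defined in the dnaerys_1.17.4.proto file, which must be available locally; grpcurl reads it via the -proto flag. The examples connect over TLS on port 443, which is grpcurl's default. To use the non-TLS endpoint instead, add -plaintext and connect to db.dnaerys.org:80.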
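Before you run the examples, it is worth confirming that grpcurl can parse the local proto file. The sketch below runs grpcurl's `list` command against the `-proto` file. The function name `check_dnaerys_proto` is hypothetical.

```shell
# Hypothetical pre-flight check: verify the .proto is present and parseable,
# then list the methods of DnaerysService without contacting the server.
check_dnaerys_proto() {
  proto=${1:-dnaerys_1.17.4.proto}
  if [ ! -f "$proto" ]; then
    echo "proto file not found: $proto" >&2
    return 1
  fi
  if ! command -v grpcurl >/dev/null 2>&1; then
    echo "grpcurl is not installed or not on PATH" >&2
    return 1
  fi
  # -import-path lets grpcurl resolve the file relative to its own directory.
  grpcurl -import-path "$(dirname "$proto")" \
    -proto "$(basename "$proto")" \
    list org.dnaerys.cluster.grpc.DnaerysService
}
```

No server address is given, so grpcurl works from the proto sources alone. If the check succeeds, the output should include the method names used below, such as SelectVariantsInRegion.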
- Homozygous and heterozygous variants in TP53 across all samples, limiting the response to 10 variants
grpcurl \
-proto dnaerys_1.17.4.proto \
-d '{"chr":"17", "start":"7661779", "end":"7687546", "hom":"true", "het":"true", "limit":"10", "assembly":"GRCh38"}' \
db.dnaerys.org:443 \
org.dnaerys.cluster.grpc.DnaerysService/SelectVariantsInRegion
- Pathogenic variants in TP53
grpcurl \
-proto dnaerys_1.17.4.proto \
-d '{"chr":"17", "start":"7661779", "end":"7687546", "hom":"true", "het":"true", "ann": {"clinsgn":["PATHOGENIC"]}, "assembly":"GRCh38"}' \
db.dnaerys.org:443 \
org.dnaerys.cluster.grpc.DnaerysService/SelectVariantsInRegion
- High-impact heterozygous variants in TP53 transcripts
grpcurl \
-proto dnaerys_1.17.4.proto \
-d '{"chr":"17", "start":"7661779", "end":"7687546", "het":"true", "ann": {"feature_type":["TRANSCRIPT"], "impact":["HIGH"]}, "assembly":"GRCh38"}' \
db.dnaerys.org:443 \
org.dnaerys.cluster.grpc.DnaerysService/SelectVariantsInRegion
- Pathogenic heterozygous variants in TP53 in sample NA10842
grpcurl \
-proto dnaerys_1.17.4.proto \
-d '{"chr":"17", "start":"7661779", "end":"7687546", "het":"true", "samples":"NA10842", "ann": {"clinsgn":["PATHOGENIC"]}, "assembly":"GRCh38"}' \
db.dnaerys.org:443 \
org.dnaerys.cluster.grpc.DnaerysService/SelectVariantsInRegionInSamples
- Samples carrying pathogenic heterozygous variants in TP53 transcripts with gnomAD exomes AF < 0.0001
grpcurl \
-proto dnaerys_1.17.4.proto \
-d '{"chr":"17", "start":"7661779", "end":"7687546", "het":"true", "ann": {"feature_type":["TRANSCRIPT"], "clinsgn":["PATHOGENIC"], "gnomad_exomes_af_lt":"0.0001"}, "assembly":"GRCh38"}' \
db.dnaerys.org:443 \
org.dnaerys.cluster.grpc.DnaerysService/SelectSamplesInRegion
- De Novo: de novo variants on chromosome 1 in a trio (parents HG00418 and HG00419, proband HG00420), restricted to those AlphaMissense classifies as likely pathogenic
grpcurl \
-proto dnaerys_1.17.4.proto \
-d '{"parent1":"HG00418", "parent2":"HG00419", "proband":"HG00420", "chr":"1", "start":"1", "end":"248956422", "ann": {"am_class":"AM_LIKELY_PATHOGENIC"}}' \
db.dnaerys.org:443 \
org.dnaerys.cluster.grpc.DnaerysService/SelectDeNovo
- Homozygous Recessive: homozygous recessive variants on chromosome 1 in a trio (unaffected parents HG00418 and HG00419, affected child HG00420), restricted to those AlphaMissense classifies as likely pathogenic
grpcurl \
-proto dnaerys_1.17.4.proto \
-d '{"unaffected_parent1":"HG00418", "unaffected_parent2":"HG00419", "affected_child":"HG00420", "chr":"1", "start":"1", "end":"248956422", "ann": {"am_class":"AM_LIKELY_PATHOGENIC"}}' \
db.dnaerys.org:443 \
org.dnaerys.cluster.grpc.DnaerysService/SelectHomRecessive
- Heterozygous Dominant: heterozygous dominant variants on chromosome 1 in a trio (affected parent HG00418, unaffected parent HG00419, affected child HG00420), restricted to those AlphaMissense classifies as likely pathogenic
grpcurl \
-proto dnaerys_1.17.4.proto \
-d '{"affected_parent":"HG00418", "unaffected_parent":"HG00419", "affected_child":"HG00420", "chr":"1", "start":"1", "end":"248956422", "ann": {"am_class":"AM_LIKELY_PATHOGENIC"}}' \
db.dnaerys.org:443 \
org.dnaerys.cluster.grpc.DnaerysService/SelectHetDominant
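The region examples above differ mainly in their request bodies, so it can be convenient to build the body from parameters. The sketch below is a hypothetical helper, `region_request`. For any chromosome, range and limit, it emits the same JSON shape as the first TP53 example. The field names are copied from that example; the helper does not cover other fields the proto may define.

```shell
# Hypothetical helper: build a SelectVariantsInRegion request body with the
# same fields as the first TP53 example (hom + het, GRCh38).
# Usage: region_request CHR START END [LIMIT]
region_request() {
  chr=$1 start=$2 end=$3 limit=${4:-10}
  # Coordinates and limit must be plain non-negative integers.
  for v in "$start" "$end" "$limit"; do
    case $v in
      ''|*[!0-9]*) echo "start, end and limit must be integers" >&2; return 1 ;;
    esac
  done
  # An inverted range is almost certainly a typo, so fail before querying.
  if [ "$start" -gt "$end" ]; then
    echo "start ($start) is greater than end ($end)" >&2
    return 1
  fi
  printf '{"chr":"%s", "start":"%s", "end":"%s", "hom":"true", "het":"true", "limit":"%s", "assembly":"GRCh38"}\n' \
    "$chr" "$start" "$end" "$limit"
}
```

The following invocation should send the same request as the first TP53 example: `grpcurl -proto dnaerys_1.17.4.proto -d "$(region_request 17 7661779 7687546 10)" db.dnaerys.org:443 org.dnaerys.cluster.grpc.DnaerysService/SelectVariantsInRegion`.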
References
- 1000 Genomes 30x on GRCh38 Portal
- Byrska-Bishop et al. (2022) Cell 185(18):3412–3432.e24
- 1000 Genomes data reuse statement
- Auton et al. (2015) Nature 526:68–74
- The International Genome Sample Resource (IGSR) collection of open human genomic variation resources
Terms and Conditions
Disclaimer of Warranties
The Services and data provided on dnaerys.org and db.dnaerys.org (“the Services”) are supplied on an “AS IS” and “AS AVAILABLE” basis without warranties of any kind. We do not warrant that the Services will be uninterrupted, error-free, secure, accurate, or complete. All information is provided for informational purposes only, and while reasonable efforts are made to ensure accuracy, we do not guarantee the correctness, completeness, reliability, or timeliness of any data or content made available through the Services.
Limitation of Liability
To the fullest extent permitted by applicable law, in no event shall Dnaerys Pty Ltd, its directors, employees, partners, agents, suppliers, or affiliates be liable for any direct, indirect, incidental, special, exemplary, consequential, or punitive damages, including without limitation loss of profits, business interruption, loss of data, or other losses, arising out of or in connection with:
- your access to or use of, or inability to access or use, the Services;
- any errors, omissions, inaccuracies, or delays in any data or content;
- any results obtained, decisions made, or actions taken based on the Services;
- any other matter relating to the Services.
You acknowledge that your use of the Services is at your sole risk.
No Medical or Clinical Advice
Data and information provided through the Services do not constitute clinical, medical, diagnostic, therapeutic, or patient-care guidance. Users are responsible for independently verifying scientific results and drawing their own conclusions.
No Professional or Guaranteed Results
No guarantees are made regarding accuracy, completeness, performance, or suitability of the Services for a particular purpose. Users bear full responsibility for any decisions or actions taken based on data obtained through the Services.
Service Interruptions and Maintenance
We reserve the right to modify, suspend, or discontinue the Services, in whole or in part, at any time without notice. We shall not be liable for any modification, suspension, downtime, or discontinuation of the Services.
Third-party data sources
Some data may originate from third-party public datasets. We are not responsible for the accuracy, completeness, availability, or licensing of external data sources. Users are responsible for complying with all applicable third-party terms and attribution requirements.