How to Request Datasets from dbGaP
The database of Genotypes and Phenotypes (dbGaP) was developed to archive and distribute the results of studies that have investigated the interaction of genotype and phenotype. Such studies include genome-wide association studies, medical sequencing, and molecular diagnostic assays, as well as studies of association between genotype and non-clinical traits. The advent of high-throughput, cost-effective methods for genotyping and sequencing has provided powerful tools for generating the massive amounts of genotypic data that these analyses require.
dbGaP provides two levels of access – open and controlled – to allow broad release of non-sensitive data, while providing oversight and investigator accountability for sensitive data sets involving personal health information.
See the NIH Genomic Data Sharing (GDS) Policy (NOT-OD-14-124), which sets forth NIH’s expectations for the broad and responsible sharing of large-scale human and non-human genomic data. In addition, please refer to NOT-OD-24-157, Implementation Update for Data Management and Access Practices Under the Genomic Data Sharing Policy, which includes additional IT security requirements.
Requesting datasets from dbGaP includes steps within the eRA Commons as well as within SAGE.
Steps to Request Datasets from dbGaP
- Are you qualified and ready to do so?
- Must be an employee of the University of Washington.
- Appropriate system credentials in eRA Commons. If you need access, see Commons Roles at the UW.
- When requesting access to controlled-access genomic data, you must have:
- the requisite IT systems or a license to UW’s third-party computing infrastructure compliant with NIST SP 800-171.
- See information from UW IT on Computing for Restricted Access Data.
- Authorized IT Director confirmation that the IT environment meets NIH Security Best Practices for Users of Controlled-Access Data.
- A System Security Plan (SSP) approved by the IT Director.
- Assurance signed by Approved User that the NIH Security Best Practices can be met.
- Review NIH How to Request and Access Datasets from dbGaP
- If you will request controlled-access genomic data, reach out to an authorized IT Director for consultation prior to completing your Data Access Request (DAR).
- Start your Data Access Request (DAR).
- Choose datasets you wish to access.
- Some datasets require IRB approval. See the Human Subjects Division guidance on obtaining IRB approval.
- Select the Signing Official: choose the authorized official. Your OSP reviewer will change the Signing Official to themselves after they receive the accompanying SAGE request. See steps to Prepare your Request in SAGE to OSP.
- In the DAR, list the authorized IT Director who has firsthand knowledge of the IT environment you intend to use. This is the same person who signs the IT Director Confirmation.
- If using a Cloud Computing IT Environment (UW Government Community Cloud or UW GCC), upload the UW Cloud Computing IT Environment Statement into the DAR.
- Read the attestation language.
- Add other necessary attachments required by NIH, such as IRB Approval.
- Read and agree to the terms and conditions as the “Approved User”:
- Investigators and their institutions are responsible for safeguarding the accessed datasets. Pay close attention to the Data Use Certification (DUC) being made by you as an Approved User.
- Review and approve the Data Access Request so that it begins routing to the Signing Official.
- Download a copy of the DAR, then proceed with next steps to prepare your SAGE request to OSP.
Prepare your SAGE Request to OSP
There are two scenarios:
- Is the DAR associated with an existing sponsored program? Route an OSP & GCA Modification Request (MOD) in SAGE.
- Is the DAR not associated with a specific sponsored program? Route a Non-award Agreement (NAA) eGC1.
- Gather these items and attach them to your Award Modification or eGC1:
- Copy of the DAR.
- If you are requesting access to controlled-access genomic data, a copy of the signed confirmation from the IT Director named in your DAR, stating that you have an SSP in place and access to a compliant IT Environment.
- Do NOT attach the SSP itself.
- Assurance signed by Approved User that the NIH Security Best Practices can be met.
- If the dataset you wish to access requires IRB approval, a copy of the IRB approval.
- OSP will review the Award Modification Request (MOD) or NAA eGC1 together with the DAR in eRA Commons.
- Check the status on the “My Requests” page in eRA Commons.
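For teams that track requests in a script or spreadsheet, the routing choice and attachment list above can be sketched as a small Python checklist. This is only an illustration: `DarRequest`, `sage_route`, and `required_attachments` are hypothetical names, not part of SAGE or eRA Commons, and the sketch assumes the signed Assurance is needed only for controlled-access data, as in the prerequisites earlier on this page.

```python
from dataclasses import dataclass


@dataclass
class DarRequest:
    """Hypothetical summary of a DAR; not a SAGE or eRA Commons data model."""
    has_sponsored_program: bool  # is the DAR tied to an existing sponsored program?
    controlled_access: bool      # is controlled-access genomic data requested?
    irb_required: bool           # does the chosen dataset require IRB approval?


def sage_route(req: DarRequest) -> str:
    """Pick the SAGE item to route, following the two scenarios above."""
    if req.has_sponsored_program:
        return "OSP & GCA Modification Request (MOD)"
    return "Non-award Agreement (NAA) eGC1"


def required_attachments(req: DarRequest) -> list[str]:
    """List the items to attach to the MOD or NAA eGC1."""
    items = ["Copy of the DAR"]
    if req.controlled_access:
        # The SSP itself is never attached, only the signed confirmation.
        items.append("Signed IT Director confirmation (not the SSP itself)")
        # Assumption: the Assurance applies to controlled-access requests only.
        items.append("Assurance signed by the Approved User")
    if req.irb_required:
        items.append("Copy of the IRB approval")
    return items


if __name__ == "__main__":
    example = DarRequest(
        has_sponsored_program=False,
        controlled_access=True,
        irb_required=False,
    )
    print(sage_route(example))
    for item in required_attachments(example):
        print("-", item)
```

The sketch does not replace OSP review; it only mirrors the checklist so that missing attachments are easier to spot before routing.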
Signing Official (OSP) Review
- DAR is complete.
- An authorized IT Director is identified.
- For controlled-access genomic data, a corresponding signed confirmation statement from the IT Director is attached in SAGE to the NAA eGC1 or Award Modification, identifying the IT Environment.
- Assurance statement signed by the Approved User is attached to the SAGE item.
- If the IT Environment used is “GCC High”, the PI has uploaded a Cloud Computing Statement in the DAR.
- IRB approval, if needed, is attached to the DAR, and corresponds to the study in question.