UW Research - https://www.washington.edu/research

Genomic Data Sharing

QUICK GUIDE

Researchers who plan to submit genomic and linked phenotypic data to NIH-designated repositories [1] must obtain institutional certification that their data submission plans are consistent with NIH policies. The UW IRB is responsible for reviewing researchers’ genomic data sharing plans and consent forms to verify that NIH certification requirements have been met.

Detailed information can be found in this guidance and in the support materials listed below.

Purpose and Applicability

This webpage provides guidance to IRB members, HSD staff, and researchers about the review of: (1) research involving plans for sharing genomic data with NIH-designated repositories; and (2) requests for certification of the data.

Overall Considerations for Institutional Certification

The institutional certification [24] should state whether the data will be submitted to an unrestricted or controlled-access database.

The institutional certification should assure:

Consistency with Applicable Laws and Policies

HSD staff identify applicable laws and policies as they would for review of any application (GUIDANCE Human Subjects Regulations [25]; WORKSHEET Pre-Review).

Applicable policies include HSD SOPs and may include UW Privacy Policies.

Applicable laws frequently include HHS human subjects protections regulations (45 CFR 46), FDA human subjects protection regulations (21 CFR Parts 50 and 56), and the Health Insurance Portability and Accountability Act Privacy Rule (45 CFR Part 160 and Part 164, Subparts A and E).

Data submission must also be consistent with applicable tribal laws when the data are from American Indian and Alaska Native peoples. For example, tribal nations have jurisdiction over research conducted on tribal lands with tribal citizens. In general, the IRB relies on the researcher to provide relevant information about tribal laws as requested in the IRB Protocol form.

Data Collection is Consistent with 45 CFR 46

Data collection procedures must be consistent with HHS human subjects protections regulations.

Consideration of Risks

Risk Assessment

The IRB considers the risks associated with the genomic information in the event of re-identification and disclosure. The IRB considers ways to minimize those risks within the context of the expected benefits of broad sharing.

The IRB also considers the extent to which genomic information associated with the participant could be used to identify an individual, or their family, by matching data sets to other sources of information.

UW considers the sharing of genomic data through NIH-designated repositories to involve minimal risk provided the criteria listed below are met. It is important to note that sharing of genomic information through NIH repositories that does not meet these criteria is not inherently more than minimal risk.

Risks of Re-identification

Currently, NIH-designated repositories that share genomic data do not meet the definition of human subjects research under HHS regulations at 45 CFR 46 because the data submitted to the repositories are collected solely for other research studies, and because the data are coded and the identity of the individuals from whom the data were obtained will not be readily ascertainable to the investigators maintaining the repository.

NIH notes that this review and certification process goes beyond the requirements of 45 CFR 46. However, NIH has implemented these policy requirements because of concerns that the evolution of genomic technology and analytical methods could increase the risk of re-identification and, consequently, the risks associated with inadvertent or inappropriate use or disclosure.

Technologies available within the public domain today, and expected technological advances, make the identification of specific individuals from their genomic information increasingly straightforward.

The number of DNA markers, such as single nucleotide polymorphisms (SNPs), needed to uniquely identify an individual is small, and these data can be used with high certainty to confirm that two samples come from the same person. Nevertheless, the ease of identifying people from genomic data should not be overstated: it cannot be done without reference data and a high degree of expertise.
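The statement that only a small number of SNPs is needed can be checked with a back-of-the-envelope calculation. The sketch below is illustrative only and rests on simplifying assumptions that real data do not fully satisfy: unrelated individuals, independent (unlinked) SNPs, Hardy-Weinberg equilibrium, an allele frequency of 0.5 at every SNP, and a world population of roughly 8 billion. The function names are hypothetical. Under these assumptions, two unrelated people share a genotype at a single SNP with probability 0.375, so about two dozen such SNPs make a chance match less likely than 1 in 8 billion. Matching still requires a reference sample to compare against.

```python
import math


def genotype_match_probability(p: float) -> float:
    """Chance that two unrelated people share the same genotype at one
    biallelic SNP with allele frequency p (assumes Hardy-Weinberg equilibrium)."""
    q = 1.0 - p
    # Genotype frequencies under Hardy-Weinberg: AA = p^2, Aa = 2pq, aa = q^2.
    freqs = (p * p, 2 * p * q, q * q)
    # Two independent draws match when both fall in the same genotype class.
    return sum(f * f for f in freqs)


def snps_needed(population: float, p: float = 0.5) -> int:
    """Smallest number of independent SNPs whose combined chance-match
    probability falls below 1 / population (illustrative assumption only)."""
    per_snp = genotype_match_probability(p)
    # Solve per_snp ** n < 1 / population for the smallest whole n.
    return math.floor(math.log(population) / -math.log(per_snp)) + 1


if __name__ == "__main__":
    print(genotype_match_probability(0.5))  # 0.375
    print(snps_needed(8e9))                 # 24
```

Linkage between nearby SNPs, rarer alleles, and close relatives all change these numbers, which is why the result is an order-of-magnitude illustration rather than a risk estimate.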

Examples of populations that may be at a higher risk of re-identification include:

Risks Associated with the Freedom of Information Act (FOIA)

Data in NIH-designated repositories are U.S. government records that are subject to the Freedom of Information Act (FOIA). NIH is required to release government records unless the records are exempt from release under one of the FOIA exemptions.

NIH considers the release of certain information to be a clearly unwarranted invasion of personal privacy under FOIA exemption 6, 5 U.S.C. § 552(b)(6). Accordingly, NIH expects to protect the privacy of research participants and the confidentiality of genetic information by, for example, redacting individual-level genotype and phenotype data from any disclosures made in response to FOIA requests and denying requests for unredacted data.

Risks Associated with Law Enforcement

Although NIH-designated repositories hold only coded data, it is conceivable that law enforcement agencies could request genomic information from the repositories and, for example, search it for matches to DNA samples for forensic purposes. Law enforcement might also seek to compel disclosure of identifying information from the institution that holds it.

Release of identifiable information may be protected from compelled disclosure if a Certificate of Confidentiality is or was obtained for the original study. See GUIDANCE Certificate of Confidentiality [26].

Potential Harms to Individuals, Family Members, Specific Populations, Groups, and Communities

Harms that result from inappropriate use or disclosure of genomic data may include denial of employment or insurance.

The Genetic Information Nondiscrimination Act of 2008 (GINA) provides a baseline level of protection against genetic discrimination in health insurance and employment in the United States.

Harms may also include psychosocial harms such as stress, anxiety, stigmatization, or embarrassment resulting from disclosure of information about family relationships, ethnic heritage, or potentially stigmatizing conditions.

Research has shown that some populations have a higher predisposition than others to developing certain diseases or disorders. Genetic variants associated with physical disorders, diseases, and behavioral traits, including causative variants, are found in all populations, though at differing frequencies. Higher or lower frequencies that contribute to observed health patterns, particularly patterns that may be viewed negatively, can lead to genetic stereotyping and stigmatization of a population group.

Return of Individual Research Results

Return of individual research results to participants from research using data shared through NIH-designated repositories is expected to be extremely rare. Nonetheless, the return of results must be carefully considered because the information can have a psychological impact (e.g., stress and anxiety) as well as implications for the participant’s health and well-being. While clinically valid and meaningful results can have a positive impact on an individual’s health, harms can occur if unvalidated research results are provided to participants or used for medical decision-making.

Secondary investigators will not be able to return results directly to participants because they will not have access to the identities of these individuals. If a secondary investigator does generate clinically valid results of immediate clinical significance, they can only facilitate their return by contacting the contributing investigator who holds the key (if still maintained) to the code that identifies participants.
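To make the role of the key concrete, the sketch below shows one simple way coded data could be produced. It is a hypothetical illustration, not a UW or NIH procedure: the function names `code_records` and `relink`, the record layout, and the use of Python's `secrets` module to generate random codes are all assumptions. It shows why a secondary investigator holding only the shared records cannot identify anyone: only the holder of the key can link a code back to a person, and only while the key is maintained.

```python
import secrets
from typing import Dict, List, Optional, Tuple


def code_records(records: List[dict], id_field: str = "name") -> Tuple[List[dict], Dict[str, str]]:
    """Replace the direct identifier in each record with a random code.

    Returns (coded_records, key). Only coded_records would be shared; the key,
    which maps each code back to an identifier, stays with the contributing
    investigator. (Hypothetical helper for illustration.)
    """
    key: Dict[str, str] = {}
    coded: List[dict] = []
    for record in records:
        # A random code carries no information derived from the identifier.
        code = secrets.token_hex(8)
        while code in key:  # regenerate on the (very unlikely) collision
            code = secrets.token_hex(8)
        key[code] = record[id_field]
        shared = {field: value for field, value in record.items() if field != id_field}
        shared["code"] = code
        coded.append(shared)
    return coded, key


def relink(code: str, key: Dict[str, str]) -> Optional[str]:
    """Look up the identifier for a code; None if the code is unknown or the key was destroyed."""
    return key.get(code)
```

For example, `coded, key = code_records([{"name": "Participant A", "genotype": "AG"}])` yields shared records containing only `genotype` and `code`; a secondary investigator who flags a result can report the code, and only the contributing investigator can call `relink(code, key)` to find the participant.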

When links to identifying information are retained, individual participants should be given the option to receive or decline results. If participants are given the option of receiving results, researchers should be aware that results may be returned years after the study data were submitted to NIH.

De-identification of Data is Consistent with GDS Policy

De-identification Requirements:

Informed Consent

Consent Requirements and Expectations for Genomic Data Sharing

Use the worksheet Genomic Data Sharing Certification to identify the applicable consent requirements for genomic data sharing and to determine whether the requirements are met. If the consent requirements cannot be met, the data will not qualify for GDS certification.

Studies Involving Minors

If the study involves children, the IRB must consider the appropriateness of the continued maintenance and sharing of the data when the child reaches the legal age of consent.

In particular, it is important to consider whether consent should be obtained from the now-adult subject. When a link to identifiers is maintained, researchers must provide the subject with the opportunity to withdraw data from the NIH-designated repositories, unless the IRB approves a waiver of the consent requirement for the now-adult subjects. See GUIDANCE Consent Protected and Vulnerable Populations [29] for information about consent waivers.

Studies Involving Consent by Legally Authorized Representative (LAR)

If the study proposes to obtain consent from legally authorized representatives, the IRB must consider the issues related to LAR consent as described in GUIDANCE Consent Diminished or Fluctuating Consent Capacity and Legally Authorized Representative (LAR) [30].

In particular, it is important to consider reconsenting subjects who regain the capacity to consent for themselves. When a link to identifiers is maintained, researchers must obtain consent from these subjects and provide them with the opportunity to withdraw data from the NIH-designated repositories, unless the IRB approves a waiver of the consent requirement.

Data Use Limitations

Consistency With Informed Consent

Through the Controlled Access process for providing data access to secondary users, mechanisms are in place to minimize the likelihood that genomic data will be used in ways that are inconsistent with the original informed consent. The IRB is expected to: (1) have reviewed all proposed submissions of data to NIH-designated repositories to ensure that the submission and subsequent sharing for research purposes are consistent with the informed consent of the study participants; (2) certify the appropriate research uses of the data; and (3) identify the specific data use limitations.

The IRB accomplishes this by reviewing the terms of the consent form and documenting any limitations to use of the data, as expressed in the consent form, in the Institutional Certification.

For example, if the consent form includes the possibility of data sharing but states that the data will be used only for the study or for a particular disease, a disease-specific data use limitation should be documented in the Institutional Certification unless subjects are reconsented for broader use of the data.

Four Main Categories of Limitations. See NIH reference [31].

Modifiers to the Main Categories. The following limitations are modifiers of the four main categories:

Definitions

This section provides definitions for key Genomic Data Sharing concepts, as described in NIH Policies.

Coded: Any identifying information (such as name) that would enable the investigator to readily ascertain the identity of the individual to whom the private information or specimens pertain has been replaced with a number, letter, symbol, or combination thereof (i.e., the code) and a key to decipher the code exists, enabling linkage of the identifying information to the private information or specimens.

Controlled-access: Data are available to an investigator for a specific project only if certain stipulations are met.

dbGaP (database of Genotypes and Phenotypes): A central data repository at the National Center for Biotechnology Information (NCBI), a branch of the National Library of Medicine.

De-identified data: Note that this definition is specific to NIH’s Genomic Data Sharing policy. Data that have been de-identified according to all of the following criteria: the identities of data subjects cannot be readily ascertained or otherwise associated with the data by the repository staff or secondary data users (45 CFR 46.102(f)); the 18 identifiers enumerated at 45 CFR 164.514(b)(2) (the HIPAA Privacy Rule) are removed; and the submitting institution has no actual knowledge that the remaining information could be used alone or in combination with other information to identify the subject of the data.

Large-scale genomic data: The GDS Policy applies to all NIH-funded research that generates large-scale human or non-human genomic data, as well as to the use of these data for subsequent research. Large-scale data include genome-wide association studies (GWAS), single nucleotide polymorphism (SNP) arrays, and genome sequence, transcriptomic, metagenomic, epigenomic, and gene expression data. Examples are included below. See Supplemental Information to the NIH Genomic Data Sharing Policy [32] for more examples.

NIH GWAS Data Repository: Also known as the database of Genotypes and Phenotypes (dbGaP), the NIH GWAS Data Repository is a database developed by the National Center for Biotechnology Information (a division of the National Library of Medicine) to archive and distribute the results of studies that have investigated the interaction of genotype and phenotype.

NIH-designated repository: Any data repository maintained or supported by NIH either directly or through collaboration.

Unrestricted-access: Data are accessible to anyone via public website (previously referred to as “open access”).

UW IO (Institutional Official): A senior official at the institution who is credentialed through the NIH eRA Commons system and is authorized to enter the institution into a legally binding contract and to sign on behalf of an investigator who has submitted data or a data access request to NIH. At UW, the Institutional Official who has the authority to provide institutional certification for data sharing under the GWAS and GDS Policies is the Grant and Contract Administrator processing the award.
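Of the three criteria in the De-identified data definition above, only the second (removal of the 18 HIPAA identifiers) lends itself to a mechanical check; the first and third require human judgment. The sketch below assumes, hypothetically, that the study team has already labeled each dataset column with the HIPAA Safe Harbor category it falls under (or with `None` if it is not an identifier). The category labels are paraphrased, and `SAFE_HARBOR_CATEGORIES` and `screen_fields` are illustrative names, not part of any UW or NIH tool. Passing this check does not by itself establish that data meet the GDS definition.

```python
from typing import Dict, List, Optional, Tuple

# The 18 HIPAA Safe Harbor identifier categories at 45 CFR 164.514(b)(2)(i),
# paraphrased as short labels (hypothetical labeling scheme).
SAFE_HARBOR_CATEGORIES = frozenset({
    "names",
    "geographic subdivisions smaller than a state",
    "dates (except year) and ages over 89",
    "telephone numbers",
    "fax numbers",
    "email addresses",
    "social security numbers",
    "medical record numbers",
    "health plan beneficiary numbers",
    "account numbers",
    "certificate/license numbers",
    "vehicle identifiers",
    "device identifiers",
    "web URLs",
    "IP addresses",
    "biometric identifiers",
    "full-face photographs",
    "other unique identifying numbers or codes",
})


def screen_fields(field_labels: Dict[str, Optional[str]]) -> Tuple[List[str], List[str]]:
    """Split dataset columns into (drop, keep) lists.

    field_labels maps each column name to the Safe Harbor category it falls
    under, or to None if the study team judged it not to be an identifier.
    Only criterion 2 of the GDS definition is checked here.
    """
    unknown = sorted(
        str(label) for label in field_labels.values()
        if label is not None and label not in SAFE_HARBOR_CATEGORIES
    )
    if unknown:
        # Refuse to guess: an unrecognized label may be a typo, not a safe column.
        raise ValueError(f"unrecognized identifier categories: {unknown}")
    drop = sorted(name for name, label in field_labels.items() if label is not None)
    keep = sorted(name for name, label in field_labels.items() if label is None)
    return drop, keep
```

For example, labeling `participant_name` as "names" and `zip_code` as "geographic subdivisions smaller than a state" while leaving `sample_code` and `genotype_calls` unlabeled returns `(["participant_name", "zip_code"], ["genotype_calls", "sample_code"])`.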

Related Materials

GUIDANCE Certificate of Confidentiality [26]
GUIDANCE Human Subjects Regulations [25]
GUIDANCE Consent [33]
SOP Genomic Data Sharing Certification – HSD Staff [HSD staff access only]
SOP Request for Genomic Data Sharing – Investigators [3]
SUPPLEMENT Genomic Data Sharing [2]
WORKSHEET Genomic Data Sharing Certification [4]
WORKSHEETs Pre-Review [HSD staff access only]

Regulatory References

Version Information

Version History

Version Number | Posted Date | Implementation Date | Change Notes
1.7 | 03.28.2024 | 03.28.2024 | Revise to note that when there is no consent, NIH will review requests to use genomic data collected after 1/25/15; retire GDS consent worksheet and roll relevant information into WORKSHEET GDS Certification; update NIH reference hyperlinks
1.6 | 01.27.2022 | 01.27.2022 | Minor wordsmithing, moderate reorganization of content, and transfer content from app-based Word document to HTML webpage
1.5 | 06.24.2021 | 06.24.2021 | Remove gendered terms; update formatting
1.4 | 01.03.2020 | 01.03.2020 | Removed link to retired document
1.3 | 12.13.2019 | 12.13.2019 | Updated links
Previous versions | 10.08.2021 | 10.08.2021 | For older versions: HSD staff see the SharePoint Document Library; Others – contact hsdinfo@uw.edu

Keywords: Ancillary review; GDS; Results