
A total of 371 articles published between 2013 and 2024 were identified and screened on PubMed, Institute of Electrical and Electronics Engineers (IEEE) Xplore, Association for Computing Machinery (ACM), and JSTOR. Articles influenced by the COVID-19 pandemic were published between March 2020 and December 2024. After abstract screening, which involved removal of duplicates and confirmation of article type, a total of 53 full-text articles were reviewed based on this study’s systematic review inclusion and exclusion criteria. After further review of the full text, 37 indexed, peer-reviewed articles were deemed eligible for analysis and synthesis of results (11 excluded for non-health focus): 21 from PubMed, 13 from JSTOR, 2 from ACM, and 1 from IEEE Xplore (shown in Fig. 1). Additionally, 4 policy briefs and communiqués from the gray literature were reviewed; these came from The University of Arizona Native Nations Institute (U.S.), First Nations Information Governance Center (Canada), and Te Mana Raraunga Māori Data Sovereignty Network (New Zealand), and are discussed in the eSupplement. Article types primarily consisted of original research articles (n = 20, 49%), followed by essays, commentaries, and perspectives (n = 12, 29%), reviews (n = 5, 12%), and policy briefs (n = 4, 10%). Study designs, key findings, and other review-relevant summary information are presented in Table 1.
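The reported counts can be cross-checked with a short tally. The sketch below uses only the figures stated in this section; the variable names are illustrative, and the denominator is assumed to be the database-indexed articles plus the gray literature items.

```python
# Counts as reported in the text; names are illustrative only.
by_database = {"PubMed": 21, "JSTOR": 13, "ACM": 2, "IEEE": 1}
gray_literature = 4
by_type = {
    "original research": 20,
    "essays, commentaries, perspectives": 12,
    "reviews": 5,
    "policy briefs": 4,
}

# Assumed denominator: indexed articles plus gray literature items.
total = sum(by_database.values()) + gray_literature

# Percentage of each article type, rounded to the nearest whole number.
percentages = {name: round(100 * n / total) for name, n in by_type.items()}

for name, pct in percentages.items():
    print(f"{name}: n = {by_type[name]} ({pct}%)")
```

Under this assumption, the article-type counts sum to the same total as the source counts, and the rounded percentages match those reported.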
Nearly half (n = 20) of included articles were original research articles that explored health-related applications of IDS, ranging from public health surveillance to data governance to responsible conduct of CBPR, primarily in North American Indigenous contexts (U.S. and Canada). One article presented the findings of a randomized controlled trial assessing the acceptability and relevance of research training modules adapted for an AI/AN-focused research project among AI/AN community members. Several others (n = 8, 20%) performed critical analyses of tribal institutional review board (T-IRB) and AI/AN-focused research review protocols or conducted interviews with regional T-IRB members, community members, and individuals interfacing with AI/AN communities. Regional perspectives from AI/AN individuals, Tribal leaders, academics, and professionals broadly covered the Pacific Northwest, Midwest, and Southwestern regions of the United States, and also made specific recommendations premised on recognition of AI/AN Tribes or Villages as governmental entities.
Challenges to tribal public health authority and the exercise of IDS during the course of the COVID-19 pandemic were frequently referenced (n = 10, 24%), likely due to increased discussions on which entities should regulate and oversee the collection and management of, and access to, public health data obtained from Indigenous Peoples. A number of articles focused on the ethical, legal, and social considerations of collecting genomic and genetic data from Indigenous Peoples (n = 11, 27%), and were less focused on considerations for any other types of data generated by or collected from AI/AN communities. Several articles compared and contrasted a data system framework that incorporated IDS principles with another framework, namely the FAIR and CARE frameworks (defined later in this review). Two articles detailed specific guidelines or principles for safeguarding digital health and online multimedia data generated by Tribal members living away from Tribal lands. Additional sections characterize IDS principles and frameworks, research considerations, and technology use cases.
The articles discussed in this review primarily focus on the proposed collection, access, and use of research and health data from Indigenous communities. Other relevant sub-themes focused on how the misuse of data from Indigenous communities can perpetuate health disparities and conflict with Indigenous cultures and origin narratives.
More than half of the included articles (n = 23) articulated a theoretical framework or detailed a set of cultural and practice guidelines for using Indigenous data. Several (n = 5) with direct relevance to the management of digital data from AI/AN communities are presented in Table 2. These purposefully operationalize IDS in different global contexts.
The CARE Principles for Indigenous Data Governance, designed and validated by a group of AI/AN and Indigenous researchers, are meant to encourage the responsible use of big data and associated technologies. Key considerations under the first CARE principle (Collective Benefit) include inclusive technology development and innovation, improved data governance and user engagement, and equitable outcomes. Frequent engagement with and iterative feedback from end-users of data-driven technologies can result in updates and adjustments that maximize benefit and return on investment of Indigenous data. Key considerations under the second CARE principle (Authority to Control) include recognizing Indigenous rights and interests, collecting data that further Indigenous self-governance, and prioritizing Indigenous governance of data, especially when data are housed at external academic centers and biorepositories. Considerations under the third CARE principle (Responsibility) focus on positive relationships, promoting employment opportunities and research capacity, and respect for Indigenous languages and worldviews. The fourth CARE principle (Ethics) focuses on minimizing harm and maximizing benefit, justice, and restrictions on secondary use of data.
The SEEDS Principles focus on population health data linkages and complement the sociopolitical implications of the CARE framework, which contrasts “governance of data” (a set of practices) with “data for governance” (data-driven policymaking). The first two principles (Self-Determination and Exercise Sovereignty) acknowledge the fundamental rights of AI/AN and Indigenous communities to oversee the conduct of ethical Indigenous health research (third principle), implement robust data governance mechanisms (fourth principle), and produce meaningful research findings that support reconciliation (fifth principle) with non-Indigenous societies. By linking together Indigenous and non-Indigenous data systems, a fuller picture of Indigenous health outcomes can be obtained. This information can then be used by policymakers and Indigenous rights advocates to secure resources and financial support.
The OCAP Principles were developed by the First Nations Information Governance Center in Canada and are specific to North American Indigenous contexts. The principles of Ownership, Control, Access, and Possession have been incorporated into policies and procedures concerning the release of provincial data to First Nations communities and researchers. This prior implementation of the OCAP principles was instrumental to a robust response to the COVID-19 pandemic among First Nations communities and medical personnel.
The framework “Data Refusal from Below” explores the practice of refusal in a North American Indigenous context through considerations of autonomy, time, power, and cost. Community refusal is distinct from non-consent and functions to increase research utility and relevance to Indigenous knowledge in contemporary society. Refusal is also a form of resistance against policies and research practices that do not honor the authority and data sovereignty of Indigenous communities. In the case of digital technology systems, researchers and Indigenous leaders may want to consider how the constructs in these frameworks map to specific data structures, protocols, and algorithms for the purposes of IDS and responsible data management.
The final framework is presented through a data and computer science perspective and the New Zealand Indigenous context. In Bowen and Hinze, the Te Mana o te Raraunga Framework is mapped onto the implementation of the Hakituri project, which incorporates data from occupational work biosensors, environmental hazard monitoring systems, and feedback systems routed to supervisory parties at worksites. This Framework details measurable concepts and characteristics that researchers and community members can assess, qualitatively and quantitatively, as being met at a low, medium, or high level. Data users are assessed for their level of relationship, expertise, authority, and responsibility specific to the aims of the implementation, and data use cases are assessed for their level of value, trust, originality, and nature of application. This Māori framework promotes IDS by facilitating participatory data design wherein user-identified needs and challenges form the basis for the development of technology-driven solutions, rather than trying to use a one-size-fits-all design irrespective of local contexts.
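To illustrate how such an assessment might be represented in a data system, the following is a minimal sketch and not the implementation described by Bowen and Hinze. The class and function names are hypothetical, the eight criteria and three levels are taken from the description above, and the idea of flagging criteria that fall below a chosen level is an assumption made for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, List


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Criteria named in the text; the identifiers themselves are illustrative.
USER_CRITERIA = ("relationship", "expertise", "authority", "responsibility")
USE_CRITERIA = ("value", "trust", "originality", "application")


@dataclass
class Assessment:
    """Hypothetical record of one data user and one data use case."""
    user: Dict[str, Level]
    use: Dict[str, Level]

    def __post_init__(self) -> None:
        # Every criterion must be rated before the assessment is usable.
        missing = [c for c in USER_CRITERIA if c not in self.user]
        missing += [c for c in USE_CRITERIA if c not in self.use]
        if missing:
            raise ValueError(f"unrated criteria: {missing}")

    def below(self, threshold: Level) -> List[str]:
        """Return criteria rated below `threshold` (an assumed review step)."""
        rated = {**self.user, **self.use}
        return sorted(name for name, level in rated.items() if level < threshold)


example = Assessment(
    user={"relationship": Level.HIGH, "expertise": Level.MEDIUM,
          "authority": Level.LOW, "responsibility": Level.HIGH},
    use={"value": Level.HIGH, "trust": Level.MEDIUM,
         "originality": Level.MEDIUM, "application": Level.HIGH},
)
print(example.below(Level.MEDIUM))
```

In this sketch the flagged criteria are returned for discussion rather than combined into a single score, since how ratings are weighed is a decision for researchers and community members together.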
Together, all of these frameworks provide examples of IDS being operationalized in AI/AN and global Indigenous contexts and the potential benefits that can be derived from external partnerships.
Researchers believe that data originating from or generated by tribal communities, especially genomic data, can provide unique insights into the evolution of humanity. However, many in the scientific community believe that such data have no value until researchers spend significant time and resources to transform the data into something that is considered “valuable” by researchers. Given that there are many types of AI/AN data that may interest researchers, ranging from the anthropological to the biological, it is important to specify the tribal governmental entities and community-based groups that can regulate or co-manage the collection of AI/AN data by external entities and outside researchers on and off tribal lands, and to define the scope of IDS (see Table 3).
Notably, data management challenges often arise when non-AI/AN entities (e.g., local and state governments) collect data from AI/AN communities, especially during times of public health crisis, and refuse to share that data with tribal public health authorities. Tribal governments and tribal epidemiology centers have the same public health authority as local and state governments, as designated by Congress, but local and state governments do not readily share primary source data with tribal authorities. In the midst of COVID-19, AI/AN morbidity and mortality data were misclassified as “other” or multi-racial, or were not reported at all, due to the low proportion of AI/AN COVID-19 cases relative to other racial and ethnic groups. It was estimated that just more than half of U.S. states (n = 26) were able to report COVID-19 mortality rates for AI/AN communities, despite there being 574 federally-recognized tribes in 37 states and disparate COVID-19 mortality and morbidity rates compared to the non-Hispanic White population.
Other considerations include collecting AI/AN research data electronically and also via paper forms, an important consideration given long-standing challenges in accessing reliable Internet coverage among AI/AN communities and limited sharing of data between local, state, tribal, and federal data systems. However, attention should be given to the accuracy of paper-based responses transcribed into electronic research databases because of the risk of human error. While paper-based data collection may seem outdated, it allows AI/AN research participants to limit the collection of metadata that can be used for other purposes not related to primary data collection and analysis, and promotes greater participation of rural and urban-underserved AI/AN communities in human subjects research.
Data collected from an AI/AN community may be stored at an academic institution, managing entity (e.g., biorepository at a non-profit institution), or by tribal health authorities. Regardless, it is important that all data maintain relationality with the Indigenous context they came from, as researchers may seek to deconstruct or parse data for the purposes of analysis, but such a reductivist approach may conflict with Indigenous knowledge systems. Indigenous researchers have proposed alternative approaches, such as “weaving together” Indigenous and Western methods of inquiry and knowledge generation to strengthen research findings and make them relevant to AI/AN communities. Notably, even data analyses and interpretations are subject to AI/AN data management protections because there is potential for misuse of data and biospecimens. In human genomics and genetics research, investigators have been criticized by AI/AN scientists and communities for using “destructive” analytical tools (e.g., chemical solvents, drilling) that physically damage ancestral remains and objects of cultural significance. Data digitalization can minimize the need for destructive analyses, so long as data management plans are in place that ensure ownership after the conclusion of research.
Tribal interests and priorities largely shape permitted research involving AI/AN individuals, communities, and entities. Mismanagement of AI/AN research data has led to AI/AN governments barring specific types of research (e.g., 2002 Navajo Nation moratorium on genetic research studies) and prioritizing research that is tied to real issues faced by AI/AN communities, rather than research for the sake of research. This system maximizes the potential benefit of the research for AI/AN communities and promotes cultural safety by prohibiting research that can challenge the validity of AI/AN cultural knowledge. Of note, these considerations vary from community to community, with some allowing for investigation into their genetic origins, while others may only permit genetics research to address present-day health disparities.
Indigenous researchers have importantly characterized restrictions on permitted research and non-participation in external (non-AI/AN-based) research parties as a matter of refusal. This concept challenges external researchers to better engage with AI/AN communities and identify research questions that are tied to identified community needs, rather than institutional priorities or requirements of a funding agency. The affirmative exercise of refusal at all stages of the research process also has relevance in an increasingly digital world, wherein the iterative or step-wise design of machine learning algorithms and artificial intelligence can benefit from operationalizing IDS and purposefully incorporating community feedback.
Tribal governments have used numerous mechanisms to protect and exert ownership over their health and research data when improper research practices have been identified by AI/AN communities, including the application of intellectual and cultural property claims and use of relevant state and federal laws, such as the Native American Graves Protection and Repatriation Act in the United States. With IDS, the focus is not solely on individual ownership of data, but also community ownership of data, because a single AI/AN or Indigenous individual does not necessarily have the authority to represent the interests and priorities of their community. This principle is most applicable to tribal members living on tribal lands because T-IRBs have authority over all research conducted on reservations. These legal protections do not prevent or discourage tribal members from participating in research conducted off tribal lands. However, tribal governance and exercise of IDS over academic and related research conducted outside of the boundaries of tribal lands are limited and may not be respected by the parties overseeing the research, which can perpetuate AI/AN health disparities.
There are outstanding questions of data ownership pertaining to non-recognized AI/AN governmental entities, Native Hawaiian or Pacific Islander governments (e.g., entities with no federal recognition or a process to obtain it), or those in the active process of seeking recognition in the U.S. While these groups are unable to use federal statutes and regulations specific to federally-recognized AI/AN governmental entities, the United Nations Declaration on the Rights of Indigenous Peoples (UNDRIP) articulates inherent rights for Indigenous entities that should ultimately be respected by signatories of this UN declaration, such as the U.S. The UNDRIP may also help non-recognized AI/AN governments and urban AI/AN communities secure similar research protections as given to federally-recognized AI/AN governments, even if these guidelines and principles are not enshrined in state and federal research statutes and regulations.
When data are available from AI/AN communities, unique identifiers are often removed (e.g., tribal affiliation, health facilities) to protect the community. Other practices in furtherance of IDS goals, which protect Indigenous knowledge from being exploited, may include removing discussion of sensitive cultural practices and spiritual knowledge incidentally collected on audio recordings, limiting access to primary data to specified team members, and specifying policies and procedures for the return of data and dissemination of findings to the community for policymaking and health promotion. These considerations are typically part of a larger data management plan that is agreed upon by all parties at the outset of the proposed research and can be modified by a T-IRB. Use of data should also be limited to individuals who have completed training on responsible conduct of research and any other applicable tribal requirements, including cultural sensitivity training and certificates of confidentiality regarding AI/AN cultural practices. Training modules from the Collaborative Institutional Training Initiative are frequently completed by persons involved in biomedical, behavioral, and social sciences research, but these modules are not readily adapted for research studies involving AI/AN and other Indigenous communities.
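Two of the practices above, removing community identifiers before data are shared and limiting primary data access to trained team members, can be sketched in code. This is an illustrative sketch only: the field names, training labels, and function names are hypothetical, and removing fields does not by itself guarantee that a community cannot be re-identified. The actual rules would be set by the data management plan and the T-IRB.

```python
from typing import Dict, Iterable, Set

# Hypothetical field names for community-level identifiers.
COMMUNITY_IDENTIFIERS = frozenset({"tribal_affiliation", "health_facility"})


def strip_community_identifiers(
    record: Dict[str, object],
    identifiers: Iterable[str] = COMMUNITY_IDENTIFIERS,
) -> Dict[str, object]:
    """Return a copy of `record` without the listed identifier fields."""
    blocked = set(identifiers)
    return {field: value for field, value in record.items() if field not in blocked}


def may_access_primary_data(
    person: str,
    approved_team: Set[str],
    completed: Dict[str, Set[str]],
    required: Set[str],
) -> bool:
    """Allow access only to approved team members with all required training."""
    return person in approved_team and required <= completed.get(person, set())


record = {
    "participant_id": "P-001",
    "tribal_affiliation": "Example Nation",   # placeholder value
    "health_facility": "Example Clinic",      # placeholder value
    "hba1c": 6.1,
}
shared = strip_community_identifiers(record)

required = {"responsible_conduct_of_research", "cultural_sensitivity"}
completed = {
    "analyst_a": {"responsible_conduct_of_research", "cultural_sensitivity"},
    "analyst_b": {"responsible_conduct_of_research"},
}
team = {"analyst_a", "analyst_b"}
print(shared)
print(may_access_primary_data("analyst_a", team, completed, required))
print(may_access_primary_data("analyst_b", team, completed, required))
```

The original record is left unchanged so that the primary dataset can remain with the party designated in the data management plan.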
Tribal data-sharing and data-use agreements are typically signed between academic institutions, researchers, T-IRBs, and, in some cases, the Indian Health Service. These agreements specify how research findings are to be disseminated (e.g., conferences, publication) and often require internal review of manuscripts and abstracts before submission. Federal law and tribal law provide authority to T-IRBs and governmental agencies to request manuscript language changes, non-disclosure of sensitive data analyses, and removal of information that could identify an AI/AN community engaged in research. Other actions include limiting access to primary datasets, return of all data, culturally-appropriate destruction of biospecimens (e.g., hair, blood), and presentation of research findings to the community to facilitate knowledge exchange and capacity-building.
Federally-funded research studies may require data submission to open-environment biorepositories that lack the robust protections and governance structures used by AI/AN research authorities. Pooling data from numerous AI/AN communities risks generalizing research findings from one community to another or an unrelated community on the basis of race alone, rather than on genetics and other relevant factors. Further, the federally-appointed members of the committees that oversee data requests from these repositories usually do not have expertise in AI/AN-focused data management. This creates tension between non-AI/AN-engaged researchers who favor “open” research environments and AI/AN communities who favor “closed” research environments or “use-by-consent” to safeguard against misuse of data that may perpetuate health disparities and stereotyping of AI/AN lifestyles and practices through a Western lens.
Engaging Tribal leaders throughout the entire research process, especially in subsequent policy and program development, remains a challenge. The implementation of the U.S. National Institutes of Health All of Us Research Program was informed by a Tribal Advisory Committee (TAC) that was hastily formed after grantees began regional recruitment in areas with large AI/AN populations without first consulting Tribal governments. Policies resulting from the work of the TAC included: (1) a pause on active recruitment from AI/AN populations until consultations could be conducted nationwide; (2) no specification of a participant’s AI/AN tribal affiliation unless the tribal government has an agreement with All of Us; and (3) barring nationwide access to AI/AN participant data until clear mechanisms for AI/AN data governance could be developed. Some tribal leaders responded to the All of Us Research Program by supporting the creation of the Native BioData Consortium on the lands of the Cheyenne River Sioux Nation. The Consortium is the first Indigenous-led U.S. biorepository working to ensure that emerging technologies and large-scale health research studies, like the All of Us Research Program, benefit all AI/AN communities.
The primary purpose of IDS is to use lessons learned from past misuses of research data to protect communities from exploitation and safeguard resources. Open data environments do not readily use mechanisms that protect the rights and interests of Tribal governments, which has encouraged organizations like the IEEE to identify practices for the provenance and responsible use of digital Indigenous data. Some argue that open data has a dual benefit of reducing the need to create new data and the need to conduct destructive analyses, but AI/AN leaders and communities have concerns about the longevity of these data. These discussions warrant considering sociocultural reasons for data use and whether these needs outweigh the inherent rights of AI/AN communities to limit access to their data.
The movement for open data and inherent longevity of clinical, genetic, and digital data raise questions about informed consent and the “politics of reuse” across the entire research timeline, especially in Indigenous communities. Tribes can permit varying types of consent, such as consent specific to a single research study or broad consent, which is specific to data reuse across multiple research studies, either with or without requirements for additional T-IRB review and participant consent. Broad consent conflicts with IDS because it minimizes the involvement of T-IRBs, and it may increase the risk of harm and loss of data to third parties with an interest in co-opting and commercially exploiting AI/AN health data. In the U.S., these concerns were raised during the peak of the COVID-19 pandemic when AI/AN communities participated in novel vaccine development and relied on third-party COVID-19 antigen testing services without time to review data use agreements.
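The consent options described above can be expressed as a simple decision rule for data reuse. The sketch below is illustrative rather than a statement of any tribe’s policy: the type and function names are hypothetical, and the default of requiring both T-IRB review and participant re-consent for reuse is an assumption chosen to err on the side of community oversight.

```python
from dataclasses import dataclass
from enum import Enum


class ConsentType(Enum):
    STUDY_SPECIFIC = "study_specific"  # one named study only
    BROAD = "broad"                    # reuse across multiple studies


@dataclass(frozen=True)
class Consent:
    """Hypothetical consent record for one participant's data."""
    consent_type: ConsentType
    original_study: str
    reuse_requires_tirb_review: bool = True   # assumed default
    reuse_requires_reconsent: bool = True     # assumed default


def reuse_permitted(
    consent: Consent,
    study: str,
    tirb_approved: bool = False,
    participant_reconsented: bool = False,
) -> bool:
    """Decide whether data may be used in `study` under `consent`."""
    if study == consent.original_study:
        return True
    if consent.consent_type is ConsentType.STUDY_SPECIFIC:
        return False
    if consent.reuse_requires_tirb_review and not tirb_approved:
        return False
    if consent.reuse_requires_reconsent and not participant_reconsented:
        return False
    return True


narrow = Consent(ConsentType.STUDY_SPECIFIC, original_study="study_a")
broad = Consent(ConsentType.BROAD, original_study="study_a",
                reuse_requires_reconsent=False)

print(reuse_permitted(narrow, "study_b"))                     # False
print(reuse_permitted(broad, "study_b"))                      # False
print(reuse_permitted(broad, "study_b", tirb_approved=True))  # True
```

Making the review and re-consent requirements explicit fields keeps the “with or without” variants of broad consent visible in the record instead of leaving them implicit.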
Increasing recognition of the importance of IDS has led to the design of technologies and data systems that respect its principles and the real-world application of IDS. Joffrion and Fernández present a case study with the Penobscot Nation in the U.S. state of Maine, highlighting how the CARE Framework was used to collect, store, and disseminate data from geographic information system (GIS) forestry analyses, environmental health monitoring, and vegetation density measurements. Bowen and Hinze detail IDS design considerations and data flows for the Māori Hakituri Project, which collects streaming personal sensor data from forestry workers (e.g., physical activity) and contextual data from the work environment (e.g., environmental sensors) on a closed-network system. These data are then processed and routed back to workers and managers in the form of real-time feedback, such as hazard alerts and fatigue warnings. In this case, research data management plans and data use agreements were used to operationalize IDS and delineate varying levels of data access. Notably, there were no discussions about the use of these data for internal research and development by the manufacturers of these technologies as specified in the relevant Terms and Conditions of Service and Privacy Policies, which may warrant future investigation and action by tribal authorities.
Before the COVID-19 pandemic, many proposed applications of IDS focused on genetics and genomics research, including biobanking operations and concerns regarding digital data longevity and informed consent. In an effort to bolster tribal precision medicine infrastructure, the Native BioData Consortium (as described in Mackey et al.) is working to incorporate privacy-preserving data governance protocols into the operation of their genomic biorepository, after concerns were raised with regard to federal handling of AI/AN genetic data by the U.S. National Institutes of Health and Mayo Clinic. A Community Advisory Board composed of individuals from the U.S. Northern Plains Region provides input on data access requests, potential benefits sharing, and handling of biospecimens. Other IDS discussions have explored the potential benefits and harms of training machine learning algorithms on data attributable to AI/AN communities in open-data environments. Radin highlights how the Pima Indians Diabetes Dataset has helped researchers and AI/AN communities model and identify obesity and diabetes trends among tribal members in the Southwestern U.S., then highlights a concerning use of the same dataset by non-AI/AN data scientists to train algorithms in New York City to predict the likelihood of catastrophic sewer explosions and manhole fires. This is an example of how AI/AN data can be transformed and used in a way that strips away the original context, so that no benefit derived from use of the data can return to the community.
The literature identified key considerations for the responsible collection, management, and use of research, health, and digital data generated by Indigenous Peoples. It appears that new technologies and systems, such as machine learning algorithms and artificial intelligence, are being rapidly designed without explicit rules and regulations that operationalize IDS and protect Indigenous Peoples from exploitation and cultural harm in the era of big data. Emerging digital health technologies have clear potential to benefit communities, but only when they are used in a manner that respects the communities that provide their foundational data.

