dARK - Decentralized Archival Resource Key

A decentralized implementation of the ARK persistent identifier

  • What is dARK

    dARK is a decentralized implementation of the Archival Resource Key (ARK) that assigns and resolves ARK identifiers through blockchain nodes run by institutions. It operates as a "public good" network in which data ownership, storage, and control are distributed among all participating organizations.

    The initial project was hosted and funded primarily by the Brazilian Institute of Information in Science and Technology (IBICT). LA Referencia provided additional support, made possible by pledges from the Global Sustainability Coalition for Open Science Services (SCOSS).

    Decentralization

    An initial implementation of decentralized ARK based on a lightweight private blockchain network

    Fault Tolerance

    Fault-tolerant decentralized attribution and resolution of ARK identifiers through a distributed network

    Integration

    An aggregator-level ARK attribution system for legacy research production in the Brazilian open science ecosystem (OasisBr, IBICT Brazil)

    Motivations

    ARK Persistent Identifier

    The ARK identifier has emerged as a viable, low-cost alternative because institutions can run their own local providers behind the global resolver. Its use facilitates long-term access to and preservation of digital resources, ensuring stable and reliable links.

    Research Assessment

    Persistent identifiers are essential for building more robust research graphs, generating accurate indicators, and improving the evaluation of scientific output. Their ability to link various information objects enhances the analysis and understanding of research impact.

    Challenges in the Global South

    In Global South countries, the lack of persistent identifier coverage is a common issue. This is primarily due to the costs associated with these services, limiting access to essential infrastructures for ensuring the visibility and preservation of research outputs.

    Need for Decentralization

    Currently, most persistent identifier systems operate under centralized models, relying on a few agencies to maintain service infrastructure. A decentralized approach, such as the one proposed by dARK, reduces this dependency and increases system resilience, promoting greater equity in access and management of identifiers.

    Long-term Objectives

    Open Infrastructure

    Providing an open and non-centralized system for unique/deduplicated persistent identifiers accessible to all

    Resolution Services

    Offering a decentralized resolution service for the Open Science ecosystem, interoperable with other PID services (such as DOI agencies)

    Metadata Preservation

    Ensuring the decentralized preservation of metadata associated with digital objects referenced by ARK identifiers, aiming to provide consistent PIDs and metadata to research graphs (OpenAIRE, OpenAlex, among others)

    Important Note

    This development is not intended to replace or compete with DOI identifiers/agencies, but to serve as a complementary solution that will also be interoperable with DOI providers.

  • Architecture and Components

    The dARK system architecture is designed with a clear separation of components, organized into the Service Layer and the Core Layer.

    [Figure: dARK Architecture Diagram]

    Service Layer

    The Service Layer provides essential services that interface with the Core Layer components. These services include:

    dARK Resolver

    Integrated with the global N2T.net resolver system, enabling persistent identifier resolution

    dARK Minter

    Used to create and register new PIDs in the system

    dARK Dashboard

    Provides monitoring and administrative capabilities for the platform

    dARK API

    Facilitates communication between applications and the underlying blockchain

    dARK Backup

    Ensures data durability and system reliability

    dARK LA Referencia

    Implements bulk dARK minting on the LA Referencia Harvester platform

    These services are supported by load balancing mechanisms to ensure high availability and optimal system performance.
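
    To make the Resolver and Minter roles above more concrete, the following minimal sketch parses an ARK string and builds the URL a client would request from a global resolver. The resolver base URL, the function names, and the sample identifier (with the placeholder NAAN 12345) are assumptions for illustration, not part of the dARK codebase.

```python
import re
from typing import NamedTuple

# Assumption: the public N2T resolver base URL; a deployment may use another.
GLOBAL_RESOLVER = "https://n2t.net/"


class Ark(NamedTuple):
    naan: str  # Name Assigning Authority Number
    name: str  # identifier assigned by that authority


# Accepts both the classic "ark:/NAAN/name" and the newer "ark:NAAN/name" form.
_ARK_RE = re.compile(r"^ark:/?([0-9a-z]+)/(\S+)$", re.IGNORECASE)


def parse_ark(text: str) -> Ark:
    """Split an ARK string into its NAAN and name parts."""
    match = _ARK_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not an ARK: {text!r}")
    return Ark(naan=match.group(1), name=match.group(2))


def resolver_url(ark: Ark, base: str = GLOBAL_RESOLVER) -> str:
    """Build the URL a client would request to resolve the ARK."""
    return f"{base}ark:/{ark.naan}/{ark.name}"


if __name__ == "__main__":
    # Hypothetical identifier under the placeholder NAAN 12345.
    ark = parse_ark("ark:/12345/x6np1wh8k")
    print(resolver_url(ark))
```

    The NAAN is the part a global resolver can use to decide which local provider, such as a dARK resolver, should handle the request.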

    Core Layer (dARK dApp)

    The Core Layer is a public permissioned blockchain network that forms the backbone of the dARK system. It runs a Proof of Authority (PoA) consensus mechanism, in which a known set of authorized nodes validates blocks, providing both security and efficiency for PID management.

    dARK dApp

    Core decentralized application that implements the PID management smart contracts and ensures data integrity through blockchain technology

    Blockchain Foundation

    The network leverages Hyperledger Besu technology to provide a secure and efficient blockchain foundation. Hyperledger Besu is an Ethereum client designed for enterprise use that supports both public and private permissioned network deployments. Its implementation of the Ethereum Virtual Machine (EVM) allows for sophisticated smart contracts that manage PID operations with full transparency and auditability.
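
    Because Besu is an Ethereum client, its nodes speak the standard Ethereum JSON-RPC protocol. The sketch below shows what a request and reply for the standard eth_blockNumber method look like, without opening a network connection. The endpoint URL and helper names are assumptions, and the reply is a canned example rather than output from a dARK node.

```python
import json
from itertools import count

# Assumption: a node's JSON-RPC endpoint; 8545 is Besu's default HTTP RPC port.
RPC_URL = "http://localhost:8545"

_request_ids = count(1)


def build_request(method, params=None) -> str:
    """Serialize a JSON-RPC 2.0 request body for the given method."""
    payload = {
        "jsonrpc": "2.0",
        "method": method,
        "params": params or [],
        "id": next(_request_ids),
    }
    return json.dumps(payload)


def parse_block_number(response_body: str) -> int:
    """Decode the hex quantity returned by eth_blockNumber."""
    reply = json.loads(response_body)
    if "error" in reply:
        raise RuntimeError(reply["error"].get("message", "RPC error"))
    return int(reply["result"], 16)


if __name__ == "__main__":
    # The body that would be POSTed to RPC_URL.
    print(build_request("eth_blockNumber"))
    # A canned reply, standing in for what a node might return.
    print(parse_block_number('{"jsonrpc": "2.0", "id": 1, "result": "0x10"}'))
```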

    Network Architecture

    Designed with resilience and reliability as core principles, the architecture begins with a Minimum Viable dARK Network (MVDN). This network consists of the essential blockchain nodes required for the system to operate. These nodes manage RPC/API communications and maintain the distributed ledger of persistent identifiers. Each full node exposes API endpoints to external services, which reach them through a load balancer.

    To guarantee continuous operation even during node failures, the architecture incorporates fault-tolerant redundancy through backup nodes and data replication systems. This distributed approach ensures that no single point of failure can compromise the integrity or availability of the PID infrastructure.
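
    As a rough illustration of how many node failures such a network can absorb, the sketch below applies the usual Byzantine fault tolerance arithmetic. It assumes a BFT-style PoA protocol such as Besu's QBFT or IBFT 2.0; the text above does not say which protocol dARK uses, and other PoA protocols (for example Clique) follow different rules.

```python
def max_faulty(validators: int) -> int:
    """Largest number of faulty validators a BFT network of this size tolerates."""
    if validators < 1:
        raise ValueError("need at least one validator")
    return (validators - 1) // 3


def quorum(validators: int) -> int:
    """Validators that must agree on a block: the ceiling of two thirds."""
    if validators < 1:
        raise ValueError("need at least one validator")
    return (2 * validators + 2) // 3  # integer form of ceil(2n / 3)


if __name__ == "__main__":
    for n in (3, 4, 7, 10):
        print(f"{n} validators: tolerates {max_faulty(n)}, quorum {quorum(n)}")
```

    Under this assumption, a network needs at least four validators before it can tolerate the failure of one.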

    Application Layer

    At the application layer, the dARK dApp delivers the central functionality for managing persistent identifiers through smart contracts. This application logic handles the creation, updating, and resolution of PIDs while enforcing governance rules defined by the network participants.
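
    The create, update, and resolve operations described above can be modeled in a few lines of ordinary Python. This is an in-memory sketch of the general idea, not the dARK smart contracts: the class names, the single "authorized participants" rule, and the sample identifiers are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class PidRecord:
    ark: str
    target_url: str
    owner: str  # participant that minted the identifier


@dataclass
class PidRegistry:
    """Hypothetical stand-in for on-chain PID state."""
    authorized: Set[str]
    records: Dict[str, PidRecord] = field(default_factory=dict)

    def mint(self, caller: str, ark: str, target_url: str) -> PidRecord:
        # Governance rule (assumed): only authorized participants may mint.
        if caller not in self.authorized:
            raise PermissionError(f"{caller} is not an authorized participant")
        if ark in self.records:
            raise ValueError(f"{ark} is already registered")
        record = PidRecord(ark=ark, target_url=target_url, owner=caller)
        self.records[ark] = record
        return record

    def update_target(self, caller: str, ark: str, new_url: str) -> None:
        record = self.records[ark]  # KeyError if the ARK is unknown
        # Governance rule (assumed): only the owner may change the target.
        if record.owner != caller:
            raise PermissionError(f"{caller} does not own {ark}")
        record.target_url = new_url

    def resolve(self, ark: str) -> str:
        return self.records[ark].target_url


if __name__ == "__main__":
    registry = PidRegistry(authorized={"ibict"})
    registry.mint("ibict", "ark:/12345/a1", "https://old.example.org/item/1")
    registry.update_target("ibict", "ark:/12345/a1", "https://new.example.org/item/1")
    print(registry.resolve("ark:/12345/a1"))
```

    Keeping the identifier fixed while only the target changes is what lets links survive a move of the underlying resource.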

    Federated Infrastructure

    The architecture supports multiple independent blockchain networks operated by different authorities, creating a truly federated PID infrastructure.

    Scalable Design

    The system can scale horizontally by adding more nodes to the network, ensuring high performance even with increasing numbers of PIDs.

    Future Extensions

    The modular design enables future incorporation of additional storage solutions like IPFS for larger metadata payloads, while maintaining data integrity through on-chain cryptographic verification.
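
    One common way to combine off-chain storage with on-chain verification is to store only a digest of the metadata on the chain and check any retrieved payload against it. The sketch below illustrates that pattern with plain SHA-256 over canonical JSON. It is an assumption about how such an extension could work, not the dARK design, and it does not use IPFS content identifiers.

```python
import hashlib
import json


def canonical_bytes(metadata: dict) -> bytes:
    """Serialize metadata deterministically so equal content hashes equally."""
    text = json.dumps(metadata, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    return text.encode("utf-8")


def metadata_digest(metadata: dict) -> str:
    """Hex SHA-256 digest that would be recorded alongside the PID."""
    return hashlib.sha256(canonical_bytes(metadata)).hexdigest()


def verify(metadata: dict, recorded_digest: str) -> bool:
    """Check a payload fetched from off-chain storage against the record."""
    return metadata_digest(metadata) == recorded_digest


if __name__ == "__main__":
    # Hypothetical metadata payload.
    payload = {"title": "Example thesis", "year": 2024}
    recorded = metadata_digest(payload)
    print(verify(payload, recorded))                      # untouched payload
    print(verify({**payload, "year": 2025}, recorded))    # altered payload
```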

    Ecosystem Integration

    The dARK system is designed to integrate seamlessly with the existing scholarly ecosystem, particularly with repository networks, diamond journals and metadata aggregators, following this initial workflow:

    1. Metadata Harvesting

    Aggregators regularly harvest metadata from institutional repositories, journals, and other content providers through standard protocols like OAI-PMH or custom APIs.

    2. PID Assignment

    For content without persistent identifiers, the aggregator can request new ARKs through the dARK Minter API. Existing ARKs are validated and registered in the dARK system.

    3. Blockchain Registration

    The dARK system records each ARK on the blockchain, along with its target URL and essential metadata, providing a decentralized, tamper-evident registry of identifiers.

    4. PID Distribution

    The newly minted or validated ARKs can be sent back to repositories for inclusion in their metadata records, enabling a standardized approach to persistent identification across the network.

    5. Resolution

    When a user accesses an ARK, the global resolver redirects to the dARK resolver, which uses the blockchain to retrieve the current location information, ensuring persistent access even when resource locations change.

    This integration approach enables metadata aggregators like LA Referencia to enhance their services with decentralized PID infrastructure while preserving existing workflows and adding value to the repository network as a whole. It also allows for seamless transitions when repositories move content or change platforms, as the PID resolution system can be updated without breaking external links.
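
    The five workflow steps can be sketched end to end with in-memory stand-ins. Everything here is illustrative: the record layout, the sequential "demoN" names, the placeholder NAAN 12345, and the dictionary playing the role of the blockchain registry are assumptions, and a real minter would not generate names this way.

```python
from itertools import count
from typing import Dict, Iterator, List, Optional

NAAN = "12345"  # placeholder NAAN, not a real assignment


def process_harvest(records: List[dict], registry: Dict[str, str],
                    names: Iterator[int]) -> List[dict]:
    """Assign, register, and write back ARKs for harvested records."""
    for record in records:
        ark = record.get("ark")
        if ark is None:
            # Step 2: mint a new ARK for content that has none.
            ark = f"ark:/{NAAN}/demo{next(names)}"
        elif not ark.startswith("ark:"):
            # Step 2: minimal validation of an existing identifier.
            raise ValueError(f"not an ARK: {ark!r}")
        # Step 3: register the ARK with its current target URL.
        registry[ark] = record["url"]
        # Step 4: send the ARK back for inclusion in the metadata record.
        record["ark"] = ark
    return records


def resolve(ark: str, registry: Dict[str, str]) -> Optional[str]:
    """Step 5: look up the current location, or None if unknown."""
    return registry.get(ark)


if __name__ == "__main__":
    # Step 1: stand-in for records harvested from repositories.
    harvested = [
        {"url": "https://repo.example.org/item/1"},
        {"url": "https://repo.example.org/item/2", "ark": "ark:/12345/existing"},
    ]
    registry: Dict[str, str] = {}
    process_harvest(harvested, registry, count(1))

    # The repository moves an item; only the registry entry changes.
    registry[harvested[0]["ark"]] = "https://new.example.org/item/1"
    print(resolve(harvested[0]["ark"], registry))
```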

    Future Development

    In the next development phases, the dARK project plans to:

    • Transform this initial project (currently operating at IBICT in Brazil) into a comprehensive regional service designed as a public infrastructure, following the principles established by LA Referencia
    • Develop plugins for the most widely used repository and journal systems to facilitate seamless integration with the dARK infrastructure
    • Implement decentralized metadata persistence to preserve bibliographic information and serve as a reliable data source for analytical systems like OpenAlex

    These enhancements will further strengthen the dARK ecosystem and expand its utility within the scholarly communication landscape across Latin America and beyond.
