
dARK - Decentralized Archival Resource Key
A decentralized implementation of the ARK persistent identifier
What is dARK
dARK is a decentralized implementation of the Archival Resource Key (ARK) that assigns and resolves ARK identifiers through institutional blockchain nodes. It operates on a "public good" network where data ownership, storage, and control are distributed among all participating organizations.
The initial project was hosted and funded primarily by the Brazilian Institute of Information in Science and Technology (IBICT), with additional support from LA Referencia made possible by pledges from the Global Sustainability Coalition for Open Science Services (SCOSS).
Decentralization
An initial implementation of decentralized ARK based on a lightweight private blockchain network
Fault Tolerance
Fault-tolerant decentralized attribution and resolution of ARK identifiers through a distributed network
Integration
An aggregator-level ARK attribution system for legacy research production in the Brazilian open science ecosystem (OasisBr, IBICT Brazil)
Motivations
ARK Persistent Identifier
The ARK identifier has emerged as a viable, low-cost alternative due to the possibility of implementing local providers for the global resolver. Its use facilitates long-term access and preservation of digital resources, ensuring stable and reliable links.
Research Assessment
Persistent identifiers are essential for building more robust research graphs, generating accurate indicators, and improving the evaluation of scientific output. Their ability to link various information objects enhances the analysis and understanding of research impact.
Challenges in the Global South
In Global South countries, poor persistent identifier coverage is a common issue. The main cause is the cost of these services, which limits access to infrastructure that is essential for ensuring the visibility and preservation of research outputs.
Need for Decentralization
Currently, most persistent identifier systems operate under centralized models, relying on a few agencies to maintain service infrastructure. A decentralized approach, such as the one proposed by dARK, reduces this dependency and increases system resilience, promoting greater equity in access and management of identifiers.
Long-term Objectives
Open Infrastructure
Providing an open and non-centralized system for unique/deduplicated persistent identifiers accessible to all
Resolution Services
Offering a decentralized resolution service for the Open Science ecosystem, interoperable with other PID services (such as DOI agencies)
Metadata Preservation
Ensuring the decentralized preservation of metadata associated with digital objects referenced by ARK identifiers, aiming to provide consistent PIDs and metadata to research graphs (OpenAIRE, OpenAlex, among others)
Important Note
This development is not intended to replace or compete with DOI identifiers/agencies, but to serve as a complementary solution that will also be interoperable with DOI providers.
Architecture and Components
The dARK system architecture is designed with a clear separation of components, organized into the Service Layer and the Core Layer.
Service Layer
The Service Layer provides essential services that interface with the Core Layer components. These services include:
dARK Resolver
Integrated with the global n2t.net resolver system, enabling persistent identifier resolution
dARK Dashboard
Provides monitoring and administrative capabilities for the platform
dARK API
Facilitates communication between applications and the underlying blockchain
dARK LA Referencia
Implements bulk dARK minting on the LA Referencia harvester platform
These services are supported by load balancing mechanisms to ensure high availability and optimal system performance.
Core Layer (dARK dApp)
The Core Layer is the backbone of the dARK system. It is built on a public permissioned blockchain network that uses a Proof of Authority (PoA) consensus mechanism, providing both security and efficiency for PID management.
dARK dApp
Core decentralized application that implements the PID management smart contracts and ensures data integrity through blockchain technology
Blockchain Foundation
The network leverages Hyperledger Besu technology to provide a secure and efficient blockchain foundation. Hyperledger Besu is an Ethereum client designed for enterprise use that supports both public and private permissioned network deployments. Its implementation of the Ethereum Virtual Machine (EVM) allows for sophisticated smart contracts that manage PID operations with full transparency and auditability.
Network Architecture
Designed with resilience and reliability as core principles, the architecture begins with a Minimum Viable dARK Network (MVDN). This network consists of the essential blockchain nodes that provide the fundamental functionality required for system operation. These nodes manage RPC/API communications and maintain the distributed ledger of persistent identifiers. Each full node exposes API endpoints that external services reach through a load balancer.
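As an illustration of the RPC communication mentioned above, the following minimal sketch builds a request for the standard Ethereum JSON-RPC method eth_blockNumber, which Ethereum clients such as Hyperledger Besu expose, and decodes a reply. The helper names and the sample reply are assumptions made for this example. No network call is made, and the actual dARK API may wrap these calls differently.

```python
import json


def build_rpc_request(method: str, params: list, request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 request body for an Ethereum-compatible node."""
    return json.dumps(
        {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}
    )


def parse_block_number(response_body: str) -> int:
    """Decode the hex-encoded quantity returned by eth_blockNumber."""
    reply = json.loads(response_body)
    if "error" in reply:
        raise RuntimeError(f"RPC error: {reply['error']}")
    return int(reply["result"], 16)


# Body that a service would POST to a node's RPC endpoint.
# The endpoint URL is deployment-specific and is not shown here.
request_body = build_rpc_request("eth_blockNumber", [])

# Hypothetical reply in the shape a node returns; the value is made up.
sample_reply = '{"jsonrpc": "2.0", "id": 1, "result": "0x1b4"}'
latest_block = parse_block_number(sample_reply)
```

In a deployment, the load balancer would forward such a request to any healthy full node, since every node holds the same ledger.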
To guarantee continuous operation even during node failures, the architecture incorporates fault-tolerant redundancy through backup nodes and data replication systems. This distributed approach ensures that no single point of failure can compromise the integrity or availability of the PID infrastructure.
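The text does not say which PoA protocol the network runs. Assuming a Byzantine-fault-tolerant protocol such as IBFT 2.0 or QBFT (both available in Hyperledger Besu), the number of validator failures the network can absorb follows from the validator count. The sketch below computes that bound; the function names are illustrative and the quorum rule is the usual two-thirds supermajority, stated here as an assumption about the deployment.

```python
def max_faulty_validators(validator_count: int) -> int:
    """Largest number of faulty validators tolerated: floor((n - 1) / 3)."""
    if validator_count < 1:
        raise ValueError("a network needs at least one validator")
    return (validator_count - 1) // 3


def quorum_size(validator_count: int) -> int:
    """Smallest supermajority of validators: ceil(2n / 3)."""
    if validator_count < 1:
        raise ValueError("a network needs at least one validator")
    return (2 * validator_count + 2) // 3


# Tolerance for a few candidate network sizes.
for n in (4, 5, 7):
    print(n, max_faulty_validators(n), quorum_size(n))
```

Under this assumption, four validators is the smallest network that survives one failed or misbehaving validator, which gives a rough lower bound for sizing a minimum viable network.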
Application Layer
At the application layer, the dARK dApp delivers the central functionality for managing persistent identifiers through smart contracts. This application logic handles the creation, updating, and resolution of PIDs while enforcing governance rules defined by the network participants.
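The create, update, and resolve operations can be pictured with a small in-memory model. This is a sketch for illustration only, not the dARK smart contract: the class and method names, the single governance rule (only authorized organizations may create, only the owner may update), and the sample identifiers are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class PidRecord:
    ark: str
    target_url: str
    owner: str


@dataclass
class PidRegistry:
    """Toy stand-in for on-chain PID state; names are hypothetical."""

    authorized: set[str]
    records: dict[str, PidRecord] = field(default_factory=dict)

    def create(self, caller: str, ark: str, target_url: str) -> PidRecord:
        # Governance rule (assumed): only authorized organizations may mint.
        if caller not in self.authorized:
            raise PermissionError(f"{caller} may not create PIDs")
        if ark in self.records:
            raise ValueError(f"{ark} is already registered")
        record = PidRecord(ark=ark, target_url=target_url, owner=caller)
        self.records[ark] = record
        return record

    def update_target(self, caller: str, ark: str, new_url: str) -> None:
        record = self.records[ark]  # raises KeyError for unknown ARKs
        # Governance rule (assumed): only the owner may change the target.
        if record.owner != caller:
            raise PermissionError(f"{caller} does not own {ark}")
        record.target_url = new_url

    def resolve(self, ark: str) -> str:
        return self.records[ark].target_url


# Placeholder organization, ARK, and URLs for demonstration.
registry = PidRegistry(authorized={"org-a"})
registry.create("org-a", "ark:/99999/fk4demo", "https://repo.example.org/item/1")
registry.update_target("org-a", "ark:/99999/fk4demo", "https://new.example.org/item/1")
current_target = registry.resolve("ark:/99999/fk4demo")
```

The identifier stays fixed while its target changes, which is the property that keeps external links working when content moves.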
Federated Infrastructure
The architecture supports multiple independent blockchain networks operated by different authorities, creating a truly federated PID infrastructure.
Scalable Design
The system can scale horizontally by adding more nodes to the network, ensuring high performance even with increasing numbers of PIDs.
Future Extensions
The modular design enables future incorporation of additional storage solutions like IPFS for larger metadata payloads, while maintaining data integrity through on-chain cryptographic verification.
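One way to picture on-chain verification of off-chain metadata is to store only a digest on the blockchain and recompute it whenever the payload is fetched. The sketch below assumes SHA-256 over a canonical JSON serialization; both choices, and the function names, are assumptions for illustration. It does not compute IPFS content identifiers, which use their own encoding.

```python
import hashlib
import json


def metadata_digest(metadata: dict) -> str:
    """Hash a canonical JSON form so that key order does not change the digest."""
    canonical = json.dumps(
        metadata, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_payload(metadata: dict, on_chain_digest: str) -> bool:
    """Check an off-chain payload against the digest recorded on-chain."""
    return metadata_digest(metadata) == on_chain_digest


# Made-up metadata record for demonstration.
original = {"title": "Sample dataset", "year": 2024}
recorded_digest = metadata_digest(original)  # value that would be stored on-chain

tampered = {"title": "Sample dataset", "year": 2025}
is_original_valid = verify_payload(original, recorded_digest)
is_tampered_valid = verify_payload(tampered, recorded_digest)
```

With this arrangement the chain stores a fixed-size digest regardless of payload size, and any change to the off-chain copy is detectable.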
Ecosystem Integration
The dARK system is designed to integrate seamlessly with the existing scholarly ecosystem, particularly with repository networks, diamond journals, and metadata aggregators, following this initial workflow:
1. Metadata Harvesting
Aggregators regularly harvest metadata from institutional repositories, journals, and other content providers through standard protocols like OAI-PMH or custom APIs.
2. PID Assignment
For content without persistent identifiers, the aggregator can request new ARKs through the dARK Minter API. Existing ARKs are validated and registered in the dARK system.
3. Blockchain Registration
The dARK system records each ARK on the blockchain, along with its target URL and essential metadata, providing a decentralized, tamper-evident registry of identifiers.
4. PID Distribution
The newly minted or validated ARKs can be sent back to repositories for inclusion in their metadata records, enabling a standardized approach to persistent identification across the network.
5. Resolution
When a user accesses an ARK, the global resolver redirects to the dARK resolver, which uses the blockchain to retrieve the current location information, ensuring persistent access even when resource locations change.
This integration approach enables metadata aggregators like LA Referencia to enhance their services with decentralized PID infrastructure while preserving existing workflows and adding value to the repository network as a whole. It also allows for seamless transitions when repositories move content or change platforms, as the PID resolution system can be updated without breaking external links.
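Steps 1 and 2 of the workflow above can be sketched as follows: build an OAI-PMH ListRecords request, then decide for each harvested record whether an ARK must be minted or an existing one validated. The repository URL, the record layout, and the function names are hypothetical. The ARK pattern is deliberately simplified (a numeric NAAN of five or more digits, with or without the slash after "ark:"), whereas the full ARK specification allows more.

```python
import re
from typing import Optional
from urllib.parse import urlencode

# Simplified ARK pattern (assumption): accepts ark:/NAAN/name and ark:NAAN/name.
ARK_PATTERN = re.compile(r"^ark:/?(?P<naan>\d{5,})/(?P<name>\S+)$")


def build_harvest_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def parse_ark(identifier: str) -> Optional[tuple[str, str]]:
    """Return (NAAN, name) if the string looks like an ARK, else None."""
    match = ARK_PATTERN.match(identifier.strip())
    if match is None:
        return None
    return match.group("naan"), match.group("name")


def plan_pid_action(record: dict) -> str:
    """Return 'validate' when the record already carries an ARK, else 'mint'."""
    for identifier in record.get("identifiers", []):
        if parse_ark(identifier) is not None:
            return "validate"
    return "mint"


# Hypothetical repository endpoint and harvested records.
harvest_url = build_harvest_url("https://repo.example.org/oai")
with_ark = {"identifiers": ["ark:/99999/fk4demo"]}
without_ark = {"identifiers": ["https://repo.example.org/item/2"]}
actions = [plan_pid_action(with_ark), plan_pid_action(without_ark)]
```

Records marked "mint" would then go to the minting service, while records marked "validate" would have their existing ARK checked and registered.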
Future Development
In the next development phases, the dARK project plans to:
- Transform this initial project (currently operating at IBICT in Brazil) into a comprehensive regional service designed as a public infrastructure, following the principles established by LA Referencia
- Develop plugins for the most widely used repository and journal systems to facilitate seamless integration with the dARK infrastructure
- Implement decentralized metadata persistence to preserve bibliographic information and serve as a reliable data source for analytical systems like OpenAlex
These enhancements will further strengthen the dARK ecosystem and expand its utility within the scholarly communication landscape across Latin America and beyond.