Keep Calm and Shard On

SuperMassive is an open-source (BSD-3 licensed), high-performance distributed key-value database designed for unlimited scale without compromise. It combines lightning-fast in-memory operations with robust reliability through intelligent sharding and parallel processing across an unlimited number of nodes. The system features automatic self-healing, recovery, and synchronization capabilities.

SuperMassive's elegant timestamp-based consistency model enables effortless horizontal scaling, delivering exceptional throughput while maintaining zero-downtime resilience - simply add nodes to expand your capacity.

SuperMassive is designed for mission-critical applications that demand speed, reliability, and scalability from a key-value database.

Features

Horizontal Scaling

Add nodes on demand to increase capacity. The cluster refreshes its configuration and connects to newly added nodes automatically.

Smart Sharding

Intelligent data distribution using a sequence-based round-robin approach ensures balanced write operations across primary nodes.
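
As a rough illustration of the idea (not SuperMassive's actual routing code; the types and names below are hypothetical), sequence-based round-robin routing can be pictured as a counter that advances on every write and selects the primary at that position:

Go Sketch
// Hypothetical sketch of sequence-based round-robin write routing.
package main

import (
	"fmt"
	"sync/atomic"
)

type cluster struct {
	sequence  uint64   // monotonically increasing write counter
	primaries []string // addresses of primary nodes
}

// nextPrimary advances the sequence and picks the primary at that position,
// so consecutive writes rotate evenly across all primaries.
func (c *cluster) nextPrimary() string {
	seq := atomic.AddUint64(&c.sequence, 1)
	return c.primaries[seq%uint64(len(c.primaries))]
}

func main() {
	c := &cluster{primaries: []string{"localhost:4001", "localhost:4003", "localhost:4005"}}
	for i := 0; i < 6; i++ {
		fmt.Println("write", i, "routed to", c.nextPrimary())
	}
}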

High Availability

Automatic failover and self-healing capabilities ensure continuous operation even when nodes fail.

Data Consistency

Timestamp-based version control with automatic conflict resolution maintains data consistency across the cluster.
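
A minimal sketch of the underlying last-write-wins idea (illustrative only; the types below are hypothetical and not taken from SuperMassive's source): when two versions of the same key conflict, the version with the newer timestamp is kept.

Go Sketch
// Hypothetical sketch of timestamp-based (last-write-wins) conflict resolution.
package main

import (
	"fmt"
	"time"
)

type version struct {
	value     string
	timestamp time.Time
}

// resolve keeps whichever version carries the newer timestamp.
func resolve(a, b version) version {
	if b.timestamp.After(a.timestamp) {
		return b
	}
	return a
}

func main() {
	older := version{value: `{"name": "John"}`, timestamp: time.Now().Add(-time.Second)}
	newer := version{value: `{"name": "John Doe"}`, timestamp: time.Now()}
	fmt.Println("kept:", resolve(older, newer).value)
}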

Parallel Operations

Read operations are performed in parallel across primaries and, when needed, replicas. Write operations are performed in sequence.
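
As a simplified illustration of the parallel read path (assumed behavior, with hypothetical in-memory stores standing in for network calls to nodes):

Go Sketch
// Hypothetical sketch of a parallel read fan-out: query every node
// concurrently and return the first one that reports a hit.
package main

import (
	"fmt"
	"sync"
)

func parallelGet(nodes []map[string]string, key string) (string, bool) {
	var wg sync.WaitGroup
	hits := make(chan string, len(nodes))
	for _, node := range nodes {
		wg.Add(1)
		go func(store map[string]string) {
			defer wg.Done()
			if v, ok := store[key]; ok {
				hits <- v // this node holds the key
			}
		}(node)
	}
	wg.Wait()
	close(hits)
	v, ok := <-hits
	return v, ok
}

func main() {
	// Each map stands in for one primary (or replica) node's key space.
	nodes := []map[string]string{
		{"user:1": `{"name": "Ada"}`},
		{"user:1001": `{"name": "John Doe"}`},
	}
	fmt.Println(parallelGet(nodes, "user:1001"))
}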

Multiplatform

SuperMassive is available for Linux, macOS, and Windows. It can be compiled from source or downloaded as pre-built binaries.

Quick Start Guide

To get SuperMassive for your system, download the pre-built binaries from the releases page on GitHub: https://github.com/supermassivedb/supermassive/releases

1. Basic Setup

When starting a cluster, primary node, or replica that has no existing .cluster, .node, or .nodereplica configuration file, a new one with default settings is created on startup.

A shared key is always required for each instance type.

A cluster requires a username and password to start. These credentials are then used to access the cluster through a client.

Terminal
# Start a cluster node
./supermassive --instance-type=cluster --username admin --password secure123 --shared-key cluster_key

# In another terminal, start a node
./supermassive --instance-type=node --shared-key cluster_key

# Start a replica (run it from a different directory if running everything locally)
./supermassive --instance-type=nodereplica --shared-key cluster_key

2. Basic Operations

Client Connection Example
# Connect and authenticate
(echo -n "AUTH " && echo -n $"admin\0secure123" | base64 && cat) | nc -C localhost 4000
OK authenticated

# Write data
PUT user:1001 '{"name": "John Doe", "email": "john@example.com"}'
OK key-value written

# Read data
GET user:1001
user:1001 {"name": "John Doe", "email": "john@example.com"}

# Delete data
DEL user:1001
OK key-value deleted
Note: The default cluster configuration uses port 4000 for the cluster, 4001 for the first node, and 4002 for the first replica.

Architecture

SuperMassive follows a distributed architecture with distinct component types.

Component Hierarchy

Component      Description                 Role
Cluster        Central coordination unit   Manages node distribution, request routing, and health monitoring
Primary Node   Data storage unit           Handles write operations and maintains the primary data copy, with replica health monitoring
Replica Node   Redundancy unit             Maintains a synchronized copy of primary node data

Data Flow

Write Operation Flow
1. Client → Cluster: Write request
2. Cluster → Primary Node: Route based on sequence
3. Primary Node → Journal: Async write
4. Primary Node → Replicas: Relay operation
5. Primary Node → Cluster: Confirmation (if a primary node is unavailable, the write goes to the next primary node in sequence)
6. Cluster → Returns result to client
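
A rough sketch of the routing and failover behavior in steps 2 and 5 (illustrative only; the types and names below are hypothetical, not SuperMassive's actual code):

Go Sketch
// Hypothetical sketch of write routing with failover: try the primary chosen
// by the sequence, and fall back to the next primary in sequence if it is
// unavailable.
package main

import (
	"errors"
	"fmt"
)

type primary struct {
	addr string
	up   bool
}

// routeWrite walks the primaries starting at the current sequence position
// and returns the first healthy one.
func routeWrite(primaries []primary, sequence int) (string, error) {
	for i := 0; i < len(primaries); i++ {
		p := primaries[(sequence+i)%len(primaries)]
		if p.up {
			return p.addr, nil
		}
	}
	return "", errors.New("no primary available")
}

func main() {
	primaries := []primary{
		{addr: "localhost:4001", up: false}, // chosen by sequence, but down
		{addr: "localhost:4003", up: true},  // failover target
	}
	addr, err := routeWrite(primaries, 0)
	fmt.Println(addr, err)
}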

Authentication

SuperMassive implements a multi-layer authentication system to secure both client-cluster and inter-node communications.

Security Note: Always use secure passwords and protect your shared keys. Consider using environment variables for sensitive credentials.

Authentication Methods

  1. Client Authentication

    Clients must authenticate using base64-encoded credentials in the format username\0password, a form of basic authentication (see the encoding sketch after this list).

  2. Inter-node Authentication

    Nodes authenticate with one another using the shared key specified at startup.

  3. TLS Support

    Optional TLS encryption for all communications.
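
For reference, the base64 payload sent with AUTH (see the quick-start example) can be built programmatically. The sketch below only encodes the username\0password string described above; sending it over the TCP connection is left out.

Go Sketch
// Hypothetical sketch: build the AUTH line for client authentication by
// base64-encoding "username\0password", as described above.
package main

import (
	"encoding/base64"
	"fmt"
)

func authLine(username, password string) string {
	credentials := username + "\x00" + password // username\0password
	return "AUTH " + base64.StdEncoding.EncodeToString([]byte(credentials))
}

func main() {
	// Matches the quick-start credentials; send this line over the TCP
	// connection to the cluster (port 4000 by default) before other commands.
	fmt.Println(authLine("admin", "secure123"))
}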

Configuration

Cluster Config Example

This example describes one primary node with one replica for that primary.

YAML Configuration

health-check-interval: 2
server-config:
    address: localhost:4000
    use-tls: false
    cert-file: /
    key-file: /
    read-timeout: 10
    buffer-size: 1024
node-configs:
    - node:
        server-address: localhost:4001
        use-tls: false
        ca-cert-file: ""
        connect-timeout: 5
        write-timeout: 5
        read-timeout: 5
        max-retries: 3
        retry-wait-time: 1
        buffer-size: 1024
      replicas:
        - server-address: localhost:4002
          use-tls: false
          ca-cert-file: ""
          connect-timeout: 5
          write-timeout: 5
          read-timeout: 5
          max-retries: 3
          retry-wait-time: 1
          buffer-size: 1024

Primary Config Example

Continuing the example above, this is the configuration for the single primary node (which has one replica).

YAML Configuration

health-check-interval: 2
max-memory-threshold: 75
server-config:
    address: localhost:4001
    use-tls: false
    cert-file: /
    key-file: /
    read-timeout: 10
    buffer-size: 1024
read-replicas:
    - server-address: localhost:4002
      use-tls: false
      ca-cert-file: /
      connect-timeout: 5
      write-timeout: 5
      read-timeout: 5
      max-retries: 3
      retry-wait-time: 1
      buffer-size: 1024


Replica Config Example

Replica configurations are simple: they contain only server-related settings.

YAML Configuration

server-config:
    address: localhost:4002
    use-tls: false
    cert-file: /
    key-file: /
    read-timeout: 10
    buffer-size: 1024
max-memory-threshold: 75

Command Reference

SuperMassive supports a simple set of commands for data manipulation and cluster management.

Data Operations

Command                        Description
PUT key value                  Store a key-value pair
GET key                        Retrieve a value by key
DEL key                        Delete a key-value pair
INCR key [value]               Increment a numeric value by the given amount
DECR key [value]               Decrement a numeric value by the given amount
REGX pattern [offset] [limit]  Search keys using a regular expression, with optional pagination

Cluster & Node Operations

Command    Description
RCNF       Refresh configuration files across the cluster and all nodes
PING       Liveness check (the cluster answers a ping with a pong)

Advanced Pattern Matching

REGX Examples
# Match user keys with ID between 1000-1999
REGX ^user:1[0-9]{3}$

# Find all session keys from today
REGX session:2025-02-23.*

# Get first 10 logs from February
REGX ^log:2025-02.* 0 10

# Find all temporary keys
REGX ^temp:.*:([0-9]+)$

# Example results
OK user:1001 {"data": 123}
user:1002 {"data": 1234}
user:1003 {"data": 12345}
user:1004 {"data": 123456}
..

Monitoring & Statistics

SuperMassive provides detailed statistics and monitoring capabilities through the STAT command.

Statistics Example
# Get full cluster stats
STAT
CLUSTER localhost:4000
    current_sequence 0
    client_connection_count 1
PRIMARY localhost:4001
DISK
    sync_enabled true
    sync_interval 128ms
    avg_page_size 1024.00
    file_mode -rwxrwxr-x
    is_closed false
    last_page 99
    storage_efficiency 0.9846
    file_name .journal
    page_size 1024
    total_pages 100
    total_header_size 1600
    total_data_size 102400
    page_utilization 1.0000
    header_overhead_ratio 0.0154
    file_size 104000
    modified_time 2025-02-23T04:39:31-05:00
MEMORY
    load_factor 0.3906
    grow_threshold 0.7500
    max_probe_length 2
    empty_buckets 156
    utilization 0.3906
    needs_grow false
    needs_shrink false
    size 256
    used 100
    shrink_threshold 0.2500
    avg_probe_length 0.2600
    empty_bucket_ratio 0.6094

Cluster Statistics

current_sequence           Sequence counter indicating which primary node receives the next write (example: 32)
client_connection_count    Number of clients connected to the cluster (example: 2)

Memory Statistics

Basic Metrics
    size                  Total number of buckets in the hash table (example: 32)
    used                  Number of occupied buckets (example: 20)
    load_factor           Ratio of used buckets to total size (used/size) (example: 0.6250)

Threshold Controls
    grow_threshold        Load factor threshold that triggers table growth (example: 0.7500)
    shrink_threshold      Load factor threshold that triggers table shrinking (example: 0.2500)

Performance Metrics
    avg_probe_length      Average number of steps to find an item (example: 1.2500)
    max_probe_length      Maximum number of steps needed to find an item (example: 3)
    empty_buckets         Number of unused bucket slots (example: 12)
    empty_bucket_ratio    Proportion of empty buckets to total size (example: 0.3750)
    utilization           Same as load_factor; an efficiency measure (example: 0.6250)

Disk Statistics

File Information
    file_name                Name of the pager file (example: data.db)
    file_size                Total size of the file in bytes (example: 1048576)
    file_mode                File permissions (example: -rw-r--r--)
    modified_time            Last modification timestamp (example: 2025-02-23T10:30:00Z)

Configuration
    page_size                Size of each page in bytes (example: 4096)
    sync_enabled             Whether background syncing is enabled (example: true)
    sync_interval            How often the file is synced to disk (example: 1s)
    is_closed                Whether the pager has been closed (example: false)

Page Statistics
    total_pages              Number of pages in the file (example: 250)
    last_page                Index of the last page (example: 249)

Storage Metrics
    total_header_size        Total size used by page headers, 16 bytes per page (example: 4000)
    total_data_size          Actual data size without headers (example: 1044576)
    storage_efficiency       Ratio of data size to total size (example: 0.9962)
    avg_page_size            Average amount of data per page (example: 4178.30)
    page_utilization         How full pages are on average (example: 0.9845)
    header_overhead_ratio    Proportion of space used by headers (example: 0.0038)

Replication & Recovery

SuperMassive uses a journal-based replication system to ensure data consistency and fault tolerance.

Replication Process

  1. Primary Write

    Primary node writes data to journal and sends operation to replicas.

  2. Replica Sync

    Replicas receive operation and write to their own journal for recovery.

  3. Recovery

    At startup, all nodes recover from the journal. A replica starts a sync with its primary once the STARTSYNC command is transmitted; this happens during primary node health checks. The synchronization process is checkpoint-like: the replica informs the primary of its current journal position (for example, "my journal is at page 44"), and the primary then examines its own journal and sends any missing entries, as sketched below. Once this process completes, the replica is fully synchronized. Full synchronizations from an empty journal take longer than those where the replica only needs to catch up on a few entries.
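
A minimal sketch of that checkpoint exchange (illustrative only; the journal types and names below are hypothetical, not SuperMassive's actual journal API):

Go Sketch
// Hypothetical sketch of the checkpoint-style catch-up described above:
// the replica reports its last journal page, and the primary sends back
// every entry written after that point.
package main

import "fmt"

type journalEntry struct {
	page int
	op   string
}

// entriesAfter returns the journal entries the replica is missing,
// given the last page the replica reports having.
func entriesAfter(journal []journalEntry, replicaLastPage int) []journalEntry {
	var missing []journalEntry
	for _, e := range journal {
		if e.page > replicaLastPage {
			missing = append(missing, e)
		}
	}
	return missing
}

func main() {
	primaryJournal := []journalEntry{
		{page: 43, op: `PUT user:1001 {"name": "John Doe"}`},
		{page: 44, op: "DEL user:1"},
		{page: 45, op: "PUT user:1002 {}"},
	}
	// Replica reports "my journal is at page 44"; it receives page 45 onward.
	fmt.Println(entriesAfter(primaryJournal, 44))
}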

Recovery Process

Recovery Steps
1. Replica → Journal: Read journal entries

Performance Tuning

Node Memory Configuration

Node Configuration
max-memory-threshold: 75  # Percentage of max memory to cap at
Performance Tip: Add more nodes to the cluster to avoid hitting memory caps, allowing for more data storage and faster operations.