Pickbox is a distributed storage system implemented in Go that provides file operations with replication and consistency guarantees.
- File operations (OPEN, READ, WRITE, CLOSE)
- Distributed storage with multiple nodes
- Chunk-based storage with replication
- Vector clock-based conflict resolution
- Concurrent request handling
- Structured logging
The current implementation provides multi-directional file replication: any node can initiate changes, which propagate automatically to all other nodes while Raft consensus maintains strong consistency.
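Concretely, the write path can be sketched in Go, assuming the hashicorp/raft library (the proposeChange name and the FORWARD wire format below are illustrative, not the project's actual API):

```go
// Sketch of the write path, assuming hashicorp/raft; proposeChange and
// the FORWARD wire format are illustrative, not the project's actual API.
package replication

import (
	"fmt"
	"net"
	"time"

	"github.com/hashicorp/raft"
)

func proposeChange(r *raft.Raft, leaderAdminAddr string, cmd []byte) error {
	if r.State() == raft.Leader {
		// The leader applies directly; Raft replicates the log entry
		// to every follower's FSM.
		return r.Apply(cmd, 5*time.Second).Error()
	}
	// Followers forward the change to the leader's admin port over TCP.
	conn, err := net.Dial("tcp", leaderAdminAddr)
	if err != nil {
		return err
	}
	defer conn.Close()
	_, err = fmt.Fprintf(conn, "FORWARD %s\n", cmd)
	return err
}
```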
graph TB
subgraph "Pickbox Multi-Directional Distributed Storage System"
subgraph "Node 1 (Leader)"
N1[Node 1<br/>127.0.0.1:8001]
FW1[File Watcher<br/>fsnotify]
FSM1[Enhanced FSM<br/>Content Hash<br/>Deduplication]
RF1[Raft Instance<br/>Leader]
FS1[Local Storage<br/>data/node1/]
ADM1[Admin Server<br/>:9001<br/>FORWARD Support]
STATE1[File State<br/>SHA-256 Tracking]
N1 --> FW1
N1 --> FSM1
N1 --> ADM1
FW1 -->|"Detect Changes"| FSM1
FSM1 --> RF1
FSM1 --> FS1
FSM1 <--> STATE1
FSM1 -.->|"Pause During Apply"| FW1
end
subgraph "Node 2 (Follower + Watcher)"
N2[Node 2<br/>127.0.0.1:8002]
FW2[File Watcher<br/>fsnotify]
FSM2[Enhanced FSM<br/>Content Hash<br/>Deduplication]
RF2[Raft Instance<br/>Follower]
FS2[Local Storage<br/>data/node2/]
ADM2[Admin Server<br/>:9002<br/>FORWARD Support]
STATE2[File State<br/>SHA-256 Tracking]
N2 --> FW2
N2 --> FSM2
N2 --> ADM2
FW2 -->|"Detect Changes"| FSM2
FSM2 --> RF2
FSM2 --> FS2
FSM2 <--> STATE2
FSM2 -.->|"Pause During Apply"| FW2
end
subgraph "Node 3 (Follower + Watcher)"
N3[Node 3<br/>127.0.0.1:8003]
FW3[File Watcher<br/>fsnotify]
FSM3[Enhanced FSM<br/>Content Hash<br/>Deduplication]
RF3[Raft Instance<br/>Follower]
FS3[Local Storage<br/>data/node3/]
ADM3[Admin Server<br/>:9003<br/>FORWARD Support]
STATE3[File State<br/>SHA-256 Tracking]
N3 --> FW3
N3 --> FSM3
N3 --> ADM3
FW3 -->|"Detect Changes"| FSM3
FSM3 --> RF3
FSM3 --> FS3
FSM3 <--> STATE3
FSM3 -.->|"Pause During Apply"| FW3
end
subgraph "Users & Applications"
USER1[User/App A<br/>Edits Node 1]
USER2[User/App B<br/>Edits Node 2]
USER3[User/App C<br/>Edits Node 3]
CLI[Admin CLI<br/>Cluster Mgmt]
end
%% User Interactions
USER1 -->|"Create/Edit/Delete Files"| FS1
USER2 -->|"Create/Edit/Delete Files"| FS2
USER3 -->|"Create/Edit/Delete Files"| FS3
CLI --> ADM1
CLI --> ADM2
CLI --> ADM3
%% Multi-Directional Replication Flow
%% Leader Direct Processing
FSM1 -->|"Direct Apply (Leader)"| RF1
%% Follower Forwarding to Leader
FSM2 -->|"TCP FORWARD Command"| ADM1
FSM3 -->|"TCP FORWARD Command"| ADM1
%% Raft Consensus Distribution
RF1 -->|"Log Replication"| RF2
RF1 -->|"Log Replication"| RF3
RF2 -.->|"Heartbeats/Votes"| RF1
RF3 -.->|"Heartbeats/Votes"| RF1
%% Apply Commands to All FSMs
RF1 -->|"Apply Log Entry"| FSM1
RF1 -->|"Apply Log Entry"| FSM2
RF1 -->|"Apply Log Entry"| FSM3
%% Smart File System Updates
FSM1 -->|"Hash-Verified Write"| FS1
FSM2 -->|"Hash-Verified Write"| FS2
FSM3 -->|"Hash-Verified Write"| FS3
%% File System Event Detection
FS1 -.->|"inotify Events"| FW1
FS2 -.->|"inotify Events"| FW2
FS3 -.->|"inotify Events"| FW3
%% Result: Synchronized State
FS1 -.->|"Identical Content"| FS2
FS2 -.->|"Identical Content"| FS3
FS3 -.->|"Identical Content"| FS1
%% Key Features Callouts
subgraph "Key Features"
FEAT1[✅ Any Node → All Nodes]
FEAT2[✅ Strong Consistency]
FEAT3[✅ Content Deduplication]
FEAT4[✅ Real-time Sync]
FEAT5[✅ Fault Tolerant]
FEAT6[✅ Concurrent Users]
end
end
- 🔄 Multi-Directional Replication: Any node can initiate file changes that replicate to all others
- 🛡️ Strong Consistency: Raft consensus ensures all nodes maintain identical state
- ⚡ Real-time Synchronization: File changes are detected and replicated within 1-4 seconds
- 🔒 Content Deduplication: SHA-256 hashing prevents infinite replication loops (see the sketch after this list)
- 👥 Concurrent Users: Multiple users can edit files simultaneously on different nodes
- 🚀 High Performance: Sub-second change detection with an efficient consensus protocol
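The deduplication step can be sketched as follows; the fileState type is hypothetical, and the real tracking lives in the enhanced FSM:

```go
// Sketch of SHA-256 content deduplication; fileState is hypothetical,
// and the real tracking lives in the enhanced FSM. A watcher event is
// only proposed to the cluster when its content hash differs from the
// last write this node applied, which breaks watch -> apply -> watch loops.
package replication

import (
	"crypto/sha256"
	"encoding/hex"
	"sync"
)

type fileState struct {
	mu     sync.Mutex
	hashes map[string]string // path -> SHA-256 of last applied content
}

func newFileState() *fileState {
	return &fileState{hashes: make(map[string]string)}
}

// shouldReplicate reports whether a detected change carries new content
// or is just the echo of a write applied from the Raft log.
func (s *fileState) shouldReplicate(path string, content []byte) bool {
	sum := sha256.Sum256(content)
	h := hex.EncodeToString(sum[:])

	s.mu.Lock()
	defer s.mu.Unlock()
	if s.hashes[path] == h {
		return false // identical content; ignore the event
	}
	s.hashes[path] = h
	return true
}
```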
.
├── cmd/
│   └── pickbox/                  # Main CLI application
│       ├── main.go               # Entry point
│       ├── node.go               # Node management commands
│       ├── multi_replication.go  # Multi-directional replication
│       ├── cluster.go            # Cluster management
│       └── script.go             # Script execution
├── pkg/
│   ├── storage/                  # Storage layer
│   │   ├── manager.go            # Storage manager implementation
│   │   ├── raft_manager.go       # Raft consensus implementation
│   │   └── *_test.go             # Tests
│   ├── replication/              # Replication logic
│   │   ├── fsm.go                # Finite state machine
│   │   └── fsm_test.go           # Tests
│   ├── watcher/                  # File watching
│   │   ├── file_watcher.go       # File system monitoring
│   │   ├── state_manager.go      # State management
│   │   └── *_test.go             # Tests
│   ├── monitoring/               # Monitoring and metrics
│   │   ├── metrics.go            # Metrics collection
│   │   ├── dashboard.go          # Dashboard UI
│   │   └── *_test.go             # Tests
│   └── admin/                    # Admin interface
│       ├── server.go             # Admin server
│       └── server_test.go        # Tests
├── test/                         # Integration tests
│   ├── integration_test.go       # End-to-end tests
│   ├── n_node_test.go            # N-node cluster tests
│   └── *_test.go                 # Other test files
├── scripts/                      # Automation scripts
│   ├── cluster_manager.sh        # Cluster management
│   ├── demo_n_nodes.sh           # N-node demos
│   └── tests/                    # Test scripts
├── examples/                     # Example configurations
│   └── cluster-configs/          # Cluster configuration files
├── .cursor/debug/                # Architecture documentation
└── README.md                     # This file
- Go 1.21 or later
- Git for cloning the repository
- Clone the repository:
  git clone <repository-url>
  cd pickbox
- Build the application:
  make build    # or: go build -o bin/pickbox ./cmd/pickbox
- Set up the development environment (optional but recommended):
  make setup    # Install tools and pre-commit hooks
- Start a cluster using the CLI:
  # Start a 3-node cluster
  ./bin/pickbox node start --node-id node1 --port 8001 --bootstrap &
  ./bin/pickbox node start --node-id node2 --port 8002 --join 127.0.0.1:8001 &
  ./bin/pickbox node start --node-id node3 --port 8003 --join 127.0.0.1:8001 &
  # Or use multi-directional replication mode
  ./bin/pickbox node multi --node-id node1 --port 8001 --bootstrap &
  ./bin/pickbox node multi --node-id node2 --port 8002 --join 127.0.0.1:8001 &
  ./bin/pickbox node multi --node-id node3 --port 8003 --join 127.0.0.1:8001 &
- Alternative: use the cluster management scripts:
  # 3-node cluster (backward compatible)
  ./scripts/cluster_manager.sh start -n 3
  # 5-node cluster
  ./scripts/cluster_manager.sh start -n 5
  # 7-node cluster with custom ports
  ./scripts/cluster_manager.sh start -n 7 -p 9000 -a 10000
  # Use a configuration file
  ./scripts/cluster_manager.sh start -c examples/cluster-configs/5-node-cluster.conf
- Test the system:
  # Create files on any node - they replicate everywhere!
  echo "Hello from node1!" > data/node1/test1.txt
  echo "Hello from node2!" > data/node2/test2.txt
  echo "Hello from node3!" > data/node3/test3.txt
  # Verify replication (all nodes should have all files)
  ls data/node*/
- Run the comprehensive tests:
  # Test a specific cluster size
  ./scripts/tests/test_n_replication.sh -n 5
  # Test with the original 3-node scripts
  ./scripts/tests/test_multi_replication.sh
Port Assignment Schema (for N nodes starting at BASE_PORT=8001):
- node1: Raft=8001, Admin=9001, Monitor=6001
- node2: Raft=8002, Admin=9002, Monitor=6002
- node n: Raft=8000+n, Admin=9000+n, Monitor=6000+n
- Dashboard: 8080 (shared across all nodes)
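The schema is a fixed offset from each base port; a hypothetical Go helper makes the arithmetic explicit:

```go
// Hypothetical helper mirroring the schema above (with BASE_PORT=8001):
// node n listens on raft=8000+n, admin=9000+n, monitor=6000+n, and the
// dashboard port is shared by the whole cluster.
package cluster

type nodePorts struct {
	Raft, Admin, Monitor int
}

func portsFor(n int) nodePorts {
	return nodePorts{Raft: 8000 + n, Admin: 9000 + n, Monitor: 6000 + n}
}
```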
The pickbox CLI provides comprehensive cluster management:
# Start a node
./bin/pickbox node start --node-id node1 --port 8001 --bootstrap
# Start multi-directional replication
./bin/pickbox node multi --node-id node1 --port 8001 --bootstrap
# Join existing cluster
./bin/pickbox node start --node-id node2 --port 8002 --join 127.0.0.1:8001
# Check cluster status
./bin/pickbox cluster status --addr 127.0.0.1:9001
# Join cluster
./bin/pickbox cluster join --leader 127.0.0.1:8001 --node-id node4 --node-addr 127.0.0.1:8004
# Run predefined scripts
./bin/pickbox script demo-3-nodes
./bin/pickbox script cleanup
Pickbox now supports generic N-node clusters with flexible configuration. You can run anywhere from 1 to 20+ nodes with automatic port assignment and cluster management.
The new cluster_manager.sh provides comprehensive cluster lifecycle management:
# Start clusters of any size
./scripts/cluster_manager.sh start -n 5 # 5-node cluster
./scripts/cluster_manager.sh start -n 10 -p 18000 # 10-node with high ports
# Manage cluster lifecycle
./scripts/cluster_manager.sh status -n 5 # Check status
./scripts/cluster_manager.sh logs -n 5 # View logs
./scripts/cluster_manager.sh restart -n 5 # Restart cluster
./scripts/cluster_manager.sh clean # Clean everything
# Use configuration files
./scripts/cluster_manager.sh start -c examples/cluster-configs/10-node-high-ports.conf
Pre-built configurations for common scenarios:
- examples/cluster-configs/5-node-cluster.conf - Standard 5-node setup
- examples/cluster-configs/7-node-cluster.conf - 7-node cluster
- examples/cluster-configs/10-node-high-ports.conf - 10-node cluster with high ports
Example configuration:
NODE_COUNT=5
BASE_PORT=8001
ADMIN_BASE_PORT=9001
MONITOR_BASE_PORT=6001
DASHBOARD_PORT=8080
HOST=127.0.0.1
DATA_DIR=data
BINARY=./bin/pickbox
BINARY_ARGS="node multi"
# Multi-environment clusters
./scripts/cluster_manager.sh start -n 3 -p 8001 # Development
./scripts/cluster_manager.sh start -n 5 -p 12001 --data-dir staging # Staging
./scripts/cluster_manager.sh start -n 7 -p 18001 --data-dir prod # Production
# Dynamic expansion
./scripts/cluster_manager.sh start -n 3 # Start with 3 nodes
go run scripts/add_nodes.go -nodes 2 -start 4 # Add node4, node5
# Generic testing
./scripts/tests/test_n_replication.sh -n 5 -v # Test 5-node cluster
./scripts/tests/test_n_replication.sh -n 10 -p 18001 # Test with custom ports
All existing 3-node scripts remain functional:
# Legacy scripts (still work)
./scripts/run_multi_replication.sh # 3-node cluster
./scripts/run_live_replication.sh # Live replication demo
./scripts/tests/test_multi_replication.sh # 3-node tests
The system automatically replicates file operations across all nodes. You can work with files directly through the file system:
Creating Files:
# Create a file on any node
echo "Hello World!" > data/node1/example.txt
echo "Content from node2" > data/node2/another.txt
echo "Data from node3" > data/node3/document.txt
Reading Files:
# Read files from any node (content is identical across all nodes)
cat data/node1/example.txt
cat data/node2/example.txt # Same content as node1
cat data/node3/example.txt # Same content as node1
Editing Files:
# Edit files on any node using any editor
echo "Updated content" >> data/node2/example.txt
nano data/node3/document.txt
vim data/node1/another.txt
Verifying Replication:
# Check that all nodes have identical files
find data/ -name "*.txt" -exec echo "=== {} ===" \; -exec cat {} \;
Cluster Status:
# Check cluster status via admin interface
echo "STATUS" | nc localhost 9001 # Node 1 admin port
echo "STATUS" | nc localhost 9002 # Node 2 admin port
echo "STATUS" | nc localhost 9003 # Node 3 admin port
# Or use the CLI
./bin/pickbox cluster status --addr 127.0.0.1:9001
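The same query can be issued from Go. This sketch assumes the admin server accepts a newline-terminated command and closes the connection after replying, as the nc examples above suggest:

```go
// Sketch of a STATUS query in Go. Assumes the admin server accepts a
// newline-terminated command (as the nc examples suggest) and closes
// the connection after writing its reply.
package main

import (
	"fmt"
	"io"
	"net"
	"time"
)

func clusterStatus(addr string) (string, error) {
	conn, err := net.DialTimeout("tcp", addr, 2*time.Second)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	_ = conn.SetReadDeadline(time.Now().Add(2 * time.Second))

	if _, err := fmt.Fprintln(conn, "STATUS"); err != nil {
		return "", err
	}
	reply, err := io.ReadAll(conn) // reads until the server closes
	return string(reply), err
}

func main() {
	status, err := clusterStatus("127.0.0.1:9001")
	if err != nil {
		fmt.Println("status query failed:", err)
		return
	}
	fmt.Print(status)
}
```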
Cleanup:
# Clean up all processes and data
./scripts/cleanup_replication.sh
# or
./bin/pickbox script cleanup
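All of the flows above start with each node noticing local changes through fsnotify. The loop below is a minimal standalone sketch of that detection step, not the project's actual watcher:

```go
// Minimal fsnotify detection loop, similar in spirit to each node's
// file watcher (a standalone sketch, not the project's watcher code).
package main

import (
	"log"

	"github.com/fsnotify/fsnotify"
)

func main() {
	w, err := fsnotify.NewWatcher()
	if err != nil {
		log.Fatal(err)
	}
	defer w.Close()

	// Watch one node's replicated directory.
	if err := w.Add("data/node1"); err != nil {
		log.Fatal(err)
	}

	for {
		select {
		case ev := <-w.Events:
			// In Pickbox these events would be hashed, deduplicated,
			// and proposed to the cluster; here we just log them.
			if ev.Op&(fsnotify.Create|fsnotify.Write|fsnotify.Remove) != 0 {
				log.Printf("detected %s on %s", ev.Op, ev.Name)
			}
		case err := <-w.Errors:
			log.Println("watch error:", err)
		}
	}
}
```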
The storage system is implemented with the following components:
- Storage Manager: Manages multiple storage nodes and coordinates operations
- Storage Node: Handles chunk storage and replication
- Vector Clock: Implements vector clocks for conflict resolution
- Each client connection is handled in a separate goroutine
- Storage operations are protected by mutexes for thread safety
- Vector clock operations are atomic
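To make that concrete, here is a minimal sketch of a mutex-protected vector clock; it is illustrative, not the storage package's actual implementation:

```go
// Minimal mutex-protected vector clock (an illustrative sketch, not the
// storage package's actual implementation).
package storage

import "sync"

type VectorClock struct {
	mu     sync.Mutex
	clocks map[string]uint64 // node ID -> logical time
}

func NewVectorClock() *VectorClock {
	return &VectorClock{clocks: make(map[string]uint64)}
}

// Tick records a local event on this node.
func (vc *VectorClock) Tick(node string) {
	vc.mu.Lock()
	defer vc.mu.Unlock()
	vc.clocks[node]++
}

// Merge folds in a peer's clock, taking the element-wise maximum.
func (vc *VectorClock) Merge(other map[string]uint64) {
	vc.mu.Lock()
	defer vc.mu.Unlock()
	for node, t := range other {
		if t > vc.clocks[node] {
			vc.clocks[node] = t
		}
	}
}

// Dominates reports whether this clock has seen every event in other.
// If neither clock dominates the other, the two updates are concurrent
// and must be resolved as a conflict.
func (vc *VectorClock) Dominates(other map[string]uint64) bool {
	vc.mu.Lock()
	defer vc.mu.Unlock()
	for node, t := range other {
		if t > vc.clocks[node] {
			return false
		}
	}
	return true
}
```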
The system uses structured logging via logrus for better observability. Logs include:
- Server startup and shutdown
- Client connections and disconnections
- File operations
- Storage operations
- Error conditions
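A sketch of what one such structured entry looks like with logrus (the field names are illustrative, not the project's exact schema):

```go
// Sketch of one structured log entry with logrus; the field names are
// illustrative, not the project's exact schema.
package main

import log "github.com/sirupsen/logrus"

func main() {
	log.WithFields(log.Fields{
		"node": "node1",
		"op":   "WRITE",
		"path": "example.txt",
	}).Info("applied replicated write")
}
```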
Pickbox includes a comprehensive test suite covering unit tests, integration tests, and benchmarks. The system provides:
- Unit Tests: Storage package, Raft manager, and pickbox CLI components (active)
- Integration Tests: End-to-end 3-node cluster testing (currently disabled for CI/CD stability)
- Benchmark Tests: Performance testing for critical operations (active)
- Test Scripts: Automated testing for all replication modes (manual execution only)
# Run all tests with coverage
./scripts/run_tests.sh
# Run integration tests
cd test && go test -v .
# Run unit tests
go test -v ./pkg/storage ./cmd/pickbox
- scripts/tests/test_replication.sh - Basic Raft replication tests
- scripts/tests/test_live_replication.sh - Live file watching tests
- scripts/tests/test_multi_replication.sh - Multi-directional replication tests
For comprehensive testing documentation, see test/README.md.
Pickbox enforces strict code quality standards through comprehensive linting and automated checks:
- golangci-lint: Comprehensive Go linter with 25+ enabled checks
- staticcheck: Advanced static analysis for Go
- gosec: Security vulnerability scanner
- pre-commit: Automated quality checks on every commit
- ✅ Unused Code Detection: Catches unused variables, functions, and struct fields
- ✅ Security Scanning: Detects potential security vulnerabilities
- ✅ Code Formatting: Enforces consistent formatting with gofmt and goimports
- ✅ Performance Analysis: Identifies inefficient code patterns
- ✅ Style Consistency: Maintains consistent coding style across the project
# Setup development environment
make setup # Install tools + pre-commit hooks
# Code quality commands
make lint # Run all linters
make lint-fix # Auto-fix issues where possible
make check-unused # Check for unused code specifically
make security # Run security analysis (go vet + gosec if available)
make security-install # Install gosec and run full security analysis
make verify-all # Run all checks (lint + test + security)
# Pre-commit integration
git commit # Automatically runs quality checks
make pre-commit # Run pre-commit hooks manually
All quality checks run automatically in GitHub Actions:
- Pre-commit hooks prevent bad code from being committed
- CI pipeline runs comprehensive linting on every push/PR
- Security scanning generates SARIF reports for GitHub Security tab
- Coverage enforcement maintains quality thresholds
Pickbox uses GitHub Actions for continuous integration and deployment:
- Multi-Go Version Testing: Tests against Go 1.21 and 1.22
- Comprehensive Test Suite: Unit tests, integration tests, and benchmarks
- Code Quality Checks: go vet, staticcheck, and security scanning
- Cross-Platform Builds: Linux, macOS, and Windows binaries
- Coverage Reporting: Automated coverage reports via Codecov
- Security Scanning: Gosec security analysis
- Automated Releases: Binary releases on main branch pushes
- Test Suite (test) - Runs unit tests with coverage
- Integration Tests (integration-test) - End-to-end testing (currently disabled - see Improvements section)
- Build (build) - Cross-platform binary compilation
- Security (security) - Security vulnerability scanning
- Release (release) - Automated GitHub releases
- Notify (notify) - Pipeline status notifications
- Coverage Reports: HTML and raw coverage data
- Binaries: Cross-platform executables for the pickbox CLI
- Security Reports: SARIF format security scan results
- Integration Logs: Debug logs from failed integration tests