Game Analyzing Model Methods Attentively · Guessing Alternative Model Mechanics Analytically · Grasping Attention Mechanism Mysteries Accessibly
╭─────────────────────────────────────────────────────────╮
│                                                         │
│       ☇ GAMMA - LLM Learning & Experimentation ☇        │
│                                                         │
╰─────────────────────────────────────────────────────────╯
GAMMA is a comprehensive toolkit for exploring, comparing, and experimenting with Large Language Models (LLMs). It transforms complex AI concepts into interactive experiences.
- ☇ Interactive Game: Predict what the model will generate next and compete against AI
- ☛ Chat Interface: Simple, direct conversations with any supported model
- ☰ Tutorial Mode: Learn how LLMs work through guided lessons
- ☄ Quick Inference: Single-shot generation with performance metrics
- ☲ Model Comparison: Side-by-side analysis of different models
- ⚗ Mind Meld: Experimental multi-model collaboration system
- ⚗ Language Comparison: TypeScript vs JavaScript LLM code generation benchmarks
- ⚘ Color Library: dream.js - Material Design 3 color utilities with HCT color space
# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install base requirements
pip install -r requirements.txt
# Choose your engine (install at least one):
pip install -r requirements-pytorch.txt # PyTorch (recommended)
pip install -r requirements-llamacpp.txt # llama.cpp for GGUF models
pip install -r requirements-onnx.txt # ONNX Runtime
pip install -r requirements-mlx.txt # Apple Silicon
# For language comparison benchmarks (optional)
cd src/benchmarks/dream
npm install
cd ../../..
# Unified CLI
python gamma.py game # Interactive game
python gamma.py comparison # Model comparison
python gamma.py mind-meld # Mind meld experiments
python gamma.py language-comparison # Benchmarks
# Direct entry points
python game.py # Interactive mode
python game.py --chat # Chat mode
python game.py --tutorial # Tutorial mode
python game.py --prompt "Explain quantum computing" # Quick inference
GAMMA supports models from multiple sources with automatic detection:
# GAMMA auto-detects Ollama models
ollama list
# Use directly - no configuration needed
python game.py # Interactive menu shows your Ollama models
Features:
- ☑ Auto-detection of all Ollama models
- ☑ No downloads required
- ☑ Works with either the llamacpp or ollama engine
- ☑ Deduplicates models found in multiple locations
- ☑ Shows model source (Ollama, HuggingFace, local files)
# Auto-downloaded on first use
python game.py --engine pytorch --model google/gemma-2-2b-it
# For gated models (like Gemma), login first:
huggingface-cli login
# Place GGUF files in models/ directory
python game.py --engine llamacpp --model models/my-model.gguf
# Or create symlinks to Ollama models:
ln -s ~/.ollama/models/blobs/sha256-abc123... models/qwen-coder.gguf
| Engine | Best For | Hardware | Status | When to Use |
|---|---|---|---|---|
| llamacpp | GGUF models | CPU, GPU (ROCm/CUDA) | ☑ Fully Supported | Default: quantized models, efficient inference |
| pytorch | HF Transformers | CUDA, ROCm, MPS | ☑ Fully Supported | Full-precision models, latest HF models |
| tensorflow | TF/Keras models | CUDA, CPU | ⚠ Experimental | TF-specific models or pipelines |
| jax | JAX/Flax models | TPU, CUDA | ⚠ Experimental | TPU support or JAX models |
| onnx | ONNX Runtime | CPU, CUDA, DirectML | ⚠ Experimental | Cross-platform, DirectML on Windows |
| mlx | MLX-optimized | Apple M1/M2/M3/M4 | ⚠ Experimental | Apple Silicon MLX optimizations |
Quick Guide:
- ☐ Local models (Ollama) → Use `llamacpp`
- ☁ HuggingFace models → Use `pytorch` (or `llamacpp` for GGUF)
- ♁ Apple Silicon → Use `llamacpp` (or `mlx` if you have MLX models)
- ☐ Windows without CUDA → Use `llamacpp` or `onnx`
- ⚗ TPU/specialized → Use the matching engine (`jax` for TPU, `tensorflow` for TF Serving)
Engine Selection Logic:
- Interactive menu auto-detects Ollama models → recommends llamacpp
- Falls back to PyTorch if HuggingFace is authenticated
- Shows warnings for gated models without authentication
- Displays available VRAM and memory requirements
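A minimal sketch of that priority order, assuming hypothetical helpers for the detection steps (the shipped logic lives in the interactive menu and may differ):

```python
# Hypothetical sketch of the selection order above; names are
# illustrative, not GAMMA's actual API.
def recommend_engine(ollama_models: list[str], hf_authenticated: bool) -> str:
    if ollama_models:
        # Local GGUF files from Ollama run through the llamacpp engine
        return "llamacpp"
    if hf_authenticated:
        # Authenticated HuggingFace users can pull full-precision models
        return "pytorch"
    # Otherwise default to llamacpp for CPU-friendly quantized models
    return "llamacpp"
```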
Note: Ollama is a model provider (like HuggingFace), not an engine. GAMMA uses the llamacpp
engine to run Ollama's GGUF files directly.
python game.py
Features:
- Hardware detection (GPU, VRAM, CPU)
- Auto-detects Ollama models
- Shows memory requirements before loading
- Recommends engines based on your setup
- Local vs. downloadable model indicators
Menu Options:
- Just Play - Classic game with smart defaults
- Quick Tutorial - Start learning immediately
- Quick Compare - Compare 2 small models
- Classic Game - Full configuration options
- Tutorial Mode - Customized learning experience
- Comparison Mode - Multi-model analysis
- Mind Meld Mode - Experimental collaboration
# Classic game with specific model
python game.py --engine llamacpp --model models/qwen3-coder-30b.gguf
# Chat mode
python game.py --engine ollama --model qwen3-coder:30b --chat
# Single-shot inference with performance stats
python game.py --prompt "Write a Python hello world" --steps 20
# Compare two models
python game.py --comparison \
--comparison-models \
llamacpp:models/model1.gguf \
ollama:qwen3:30b
# Tutorial with specific model
python game.py --tutorial --engine pytorch --model google/gemma-2-2b-it
# Advanced options
python game.py \
--engine llamacpp \
--model models/my-model.gguf \
--temperature 0.7 \
--top-k 40 \
--top-p 0.95 \
--steps 50 \
--show-attention \
--verbose
# Core Settings
--engine ENGINE # ollama, llamacpp, pytorch, etc.
--model MODEL # Model name or path
--steps N # Max generation steps (default: 8)
--temperature T # Sampling temperature 0.1-2.0 (default: 0.7)
--top-k K # Top-K filtering (default: 8)
--top-p P # Nucleus sampling 0.0-1.0 (default: 0.95)
# Display Options
--show-attention # Show attention heatmaps
--verbose # Detailed explanations
--num-choices N # Choices per round (default: 4)
# Game Modes
--chat # Chat mode
--tutorial # Tutorial mode
--comparison # Comparison mode
--prompt "TEXT" # Single-shot inference
# Engine-Specific
--llama-cpp-n-gpu-layers N # GPU layers for llama.cpp (-1 = all)
--llama-cpp-n-ctx N # Context size (default: 2048)
--load-in-4bit # 4-bit quantization (PyTorch)
--load-in-8bit # 8-bit quantization (PyTorch)
- Auto-Detection: Finds models in Ollama, HuggingFace cache, and local directories
- Memory Estimation: Calculates VRAM requirements before loading
- Smart Defaults: Recommends engine and models based on hardware
- Deduplication: Detects same model in multiple locations
- Source Indicators: Shows where each model comes from
- Hardware Detection: CUDA, ROCm, Metal, CPU backends
- GPU Offloading: Configurable layer distribution
- KV Cache: Efficient context management
- Quantization: 4-bit and 8-bit support (PyTorch)
- Memory Mapping: Efficient model loading
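The memory estimate above is, at heart, parameters × bytes per weight plus runtime overhead. A back-of-the-envelope version (illustrative only, not GAMMA's actual calculation):

```python
# Rough VRAM estimate; GAMMA's real calculation may weigh more factors.
def estimate_vram_gb(n_params_billion: float, bits_per_weight: int = 16,
                     overhead: float = 1.2) -> float:
    """Weights dominate: params * bytes-per-weight, plus ~20% overhead
    for KV cache, activations, and runtime buffers."""
    weight_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

print(estimate_vram_gb(9, 16))  # fp16 9B model: ~21.6 GB
print(estimate_vram_gb(9, 4))   # 4-bit quantized: ~5.4 GB
```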
Predict the model's next token choice. Learn about:
- Temperature effects on randomness
- Top-K and Top-P sampling
- Attention mechanisms
- Token probabilities
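For reference, these knobs are conventionally applied in order: temperature scales the logits, top-k keeps the K most likely tokens, and top-p keeps the smallest set whose cumulative probability reaches P. A minimal sketch (illustrative; each engine ships its own sampler):

```python
import numpy as np

def sample_next_token(logits, temperature=0.7, top_k=8, top_p=0.95):
    logits = np.asarray(logits, dtype=np.float64) / temperature  # <1 sharpens, >1 flattens
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                              # softmax
    order = np.argsort(probs)[::-1][:top_k]           # keep top-k candidates
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, top_p) + 1   # smallest set >= top_p
    order = order[:cutoff]
    kept = probs[order] / probs[order].sum()          # renormalize survivors
    return int(np.random.choice(order, p=kept))

token_id = sample_next_token(np.random.randn(32000))  # dummy 32k-token vocab
```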
Simple conversation interface with:
- Multi-turn conversations
- Context preservation
- Exit commands (`/quit`, `/exit`, `/bye`)
Interactive lessons covering:
- How LLMs work
- Tokenization
- Sampling strategies
- Attention visualization
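The attention lessons build on the standard scaled dot-product formula, weights = softmax(QKᵀ/√d): each row says how much one token attends to every other. A minimal NumPy version of the math behind the heatmaps (a teaching sketch, not GAMMA's rendering code):

```python
import numpy as np

def attention_weights(Q, K):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # query/key similarity
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)      # each row sums to 1

# 5 tokens with 64-dim heads -> a 5x5 "who attends to whom" heatmap
W = attention_weights(np.random.randn(5, 64), np.random.randn(5, 64))
```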
Side-by-side model analysis:
- Compare predictions
- See probability differences
- Understand model biases
- Test prompts across models
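One simple way to put numbers on those probability differences is KL divergence plus top-k overlap between two models' next-token distributions. A hypothetical helper (comparison mode has its own displays):

```python
import numpy as np

def compare_distributions(p, q, k=8):
    """p, q: next-token probability vectors over a shared vocabulary."""
    eps = 1e-12
    kl = float(np.sum(p * np.log((p + eps) / (q + eps))))  # KL(p || q)
    top_p = set(np.argsort(p)[::-1][:k])
    top_q = set(np.argsort(q)[::-1][:k])
    overlap = len(top_p & top_q) / k                       # shared top-k picks
    return kl, overlap
```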
Multi-model collaboration featuring:
- Dynamic model swapping
- KV cache bridging
- Weighted averaging
- Agreement-based ensembling
- Custom swap strategies
# Download a model file from the HuggingFace Hub
python tools/download_model.py --repo-id <REPO_ID> --filename <FILENAME>
# Serve a model over a local API
python tools/run_api_server.py --model <MODEL> --engine <ENGINE>
gamma/
├── gamma.py # Unified CLI entry point
├── game.py # Game entry point
├── src/
│ ├── game/ # ☇ Game-specific code
│ │ ├── game_logic.py
│ │ ├── game_displays.py
│ │ └── tutorial_mode.py
│ ├── comparison/ # ☲ Model comparison tools
│ │ └── comparison_mode.py
│ ├── mind_meld/ # ⚗ Multi-model collaboration
│ │ ├── strategies/
│ │ └── advanced/
│ ├── benchmarks/ # ⚗ Benchmarking suite
│ │ ├── mind_meld_benchmark.py
│ │ └── dream/ # DREAM: TypeScript vs JavaScript benchmarks
│ │ ├── index.js
│ │ ├── tasks/ # 20+ coding tasks
│ │ ├── runner/
│ │ └── evaluator/
│ ├── color_utils/ # ⚘ Color utilities
│ │ ├── dream.js # Material Design 3 colors
│ │ ├── demo/ # MILCHICK demo
│ │ └── test/
│ ├── core/ # Core shared utilities
│ │ ├── engine_interface.py
│ │ ├── interactive_menu.py
│ │ ├── model_catalog.py
│ │ └── ...
│ └── engines/ # Model execution engines
│ ├── pytorch_engine.py # ☑ Fully implemented
│ ├── llamacpp_engine.py # ☑ Fully implemented
│ ├── ollama_engine.py # ☑ Fully implemented
│ └── ...
├── tools/ # Standalone CLI tools
├── models/ # Local model storage
├── requirements*.txt # Dependencies
└── docs/ # Documentation
# GAMMA auto-detects your Ollama models
python game.py
# Select Ollama from the menu
# Choose your model from the list
# Models are marked with ☐ (local) and show size
python game.py \
--engine llamacpp \
--model models/qwen3-coder-30b.gguf \
--chat
python game.py \
--comparison \
--comparison-models \
ollama:qwen3-coder:30b \
ollama:deepseek-r1:32b \
--prompt "Write a Python function to calculate fibonacci"
# Use quantization for large models
python game.py \
--engine pytorch \
--model google/gemma-2-9b-it \
--load-in-4bit \
--chat
# Learn about LLMs interactively
python game.py --tutorial
# Or with a specific model
python game.py --tutorial \
--engine ollama \
--model gemma3:1b-it-qat
# Check Ollama is running
ollama list
# Restart GAMMA
python game.py
# Use smaller model
python game.py --model google/gemma-2-2b-it
# Use quantization
python game.py --load-in-4bit
# Reduce context size
python game.py --llama-cpp-n-ctx 1024
# Use CPU layers
python game.py --llama-cpp-n-gpu-layers 0
# For gated models (Gemma, Llama, etc.)
huggingface-cli login
# Or set token
export HF_TOKEN=your_token_here
# Rebuild llama-cpp-python with GPU support
pip uninstall llama-cpp-python
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
# Or for ROCm (AMD)
CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python
Multi-model collaboration system (experimental):
python game.py \
--mind-meld \
--meld-models \
pytorch:google/gemma-2-2b-it \
pytorch:Qwen/Qwen2-1.5B-Instruct \
--meld-strategy round_robin
Swap Strategies:
- `fixed_interval`: Swap every N tokens
- `round_robin`: Rotate through models
- `pattern`: Swap on specific patterns
- `confidence`: Swap when the model is uncertain
- `random`: Random switching
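As one example, the confidence strategy boils down to a single check: hand off when the active model's top token probability drops below a threshold. A sketch (the shipped strategies live in src/mind_meld/strategies/ and may differ):

```python
# Illustrative confidence trigger; the threshold is a made-up default.
def should_swap(next_token_probs, threshold=0.5):
    """Swap models when the current model is uncertain."""
    return max(next_token_probs) < threshold
```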
Advanced Features:
- Weighted averaging of logits
- Agreement-based ensembling (ABE)
- KV cache bridging (limited support)
- Vocabulary translation
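Weighted averaging, the simplest of these, blends per-token logits before sampling. A minimal sketch assuming both models share a vocabulary (real Mind Meld also handles vocabulary translation):

```python
import numpy as np

# Illustrative blend; agreement-based ensembling would instead upweight
# tokens that both models rank highly.
def meld_logits(logits_a, logits_b, weight_a=0.5):
    return weight_a * np.asarray(logits_a) + (1.0 - weight_a) * np.asarray(logits_b)
```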
See Mind Meld Documentation for details.
# In game.py or a custom script
engine_config = {
    'llama_cpp_n_ctx': 4096,         # context window size in tokens
    'llama_cpp_n_gpu_layers': -1,    # -1 = offload all layers to GPU
    'llama_cpp_lib_verbose': False,  # silence llama.cpp's own logging
    'seed': 42                       # fixed seed for reproducible sampling
}
engine = get_engine('llamacpp', 'models/model.gguf', engine_config)
GAMMA and PAWS/REPLOID are complementary tools serving different purposes:
GAMMA:
- ☇ Educational focus - Learn how LLMs work
- ☲ Model comparison - Understand differences between models
- ⚗ Experimentation - Test sampling strategies, attention mechanisms
- ☐ Local operation - All models run locally
PAWS/REPLOID:
- ☇ Development focus - AI-assisted code generation
- ☲ Multi-agent competition - 3-5 LLMs compete with test-driven consensus
- ⚗ Production workflows - Git-backed reproducibility
- ☥ Visual review - Browser interface with diff viewer
Use GAMMA when: Learning about LLMs, testing models, exploring AI concepts.
Use PAWS/REPLOID when: Developing software, refactoring code, making production changes.
Both projects share:
- Philosophy of transparency and human oversight
- Support for local models (Ollama, HuggingFace)
- Educational value through clear explanations
- Open-source MIT license
Contributions welcome! Areas for help:
- ☇ Bug fixes and improvements
- ☐ Documentation
- ⚗ Tests
- ⚛ New game modes
- ⚙ Engine implementations
- ⛶ Benchmarking tools
See CONTRIBUTING.md for guidelines.
MIT License - See LICENSE for details.
- ☰ Full docs: docs/
- ⚠ Report issues: GitHub Issues
- ☛ Discussions: GitHub Discussions
Made by developers who believe understanding AI is the first step to using it wisely.
☇ Interactive Learning × ☲ Model Comparison × ⚗ Experimentation = GAMMA