Decentralized memory custodian for personalized agentic experiences

We'd love to partner with early-stage companies to build together. Join us in the Tiles Discord server. Subscribe to our blog Neurons for updates on on-device AI and personalization research.

Consider supporting our research and open-source projects through GitHub Sponsors.

This work is currently supported by Boris Mann, Luke Hubbard, Curran Dwyer, Xi Zhang, Dietrich Ayala, and Hugo Duprez.

Resources

Below is a living index of resources that inform and inspire our work.

Engineering

✨ Modelfile Reference - Ollama English Documentation
✨ Introducing Gemma 3n: The developer guide
✨ Foundation Models adapter training - Apple Intelligence - Apple Developer
vLLM Semantic Router: Next Phase in LLM inference
✨ Use MergeKit to Extract LoRA Adapters from any Fine-Tuned Model
✨ Apple’s New Containerization Framework: A Deep Dive into macOS’s Future for Developers
instavm/coderunner: A secure local sandbox to run LLM-generated code using Apple containers
Introducing the unified multi-modal MLX engine architecture in LM Studio
Announcing Spiral, Data 3.0, with backing from the best
mem-agent: Persistent, Human Readable Memory Agent Trained with Online RL
✨ Optimizing AI Inference at Character.AI
✨ Optimizing AI Inference at Character.AI (Part Deux)
✨ PrimeIntellect-ai/prime-iroh: Asynchronous P2P communication backend for decentralized pipeline parallelism
✨ Introducing Gemma 3 270M: The compact model for hyper-efficient AI
✨ Unternet Kernel
✨ SwiftWasm, WebAssembly support for the Swift programming language
DSPy Notebook, The pretty much "official" DSPy framework for Typescript
Structured outputs for LLMs
Accelerated PyTorch training on Mac
✨ Unsloth AI - Open Source Fine-tuning & RL for LLMs
✨ Introducing LFM2: The Fastest On-Device Foundation Models on the Market
✨ Mistral.rs, a cross-platform, highly-multimodal inference engine
Osmosis, Unlocking AI self-improvement at production scale
Supermemory MCP
✨ Introducing the v0 composite model family, Vercel
Agent Reinforcement Trainer, OpenPipe
Universal Quantized File Format: UQFF
GGUF Tool Suite
uqff_maker
Minions, Big & Small LLMs working together
✨ The Kaitchup Index: A Leaderboard for Quantized LLMs
Pipecat Cloud: Enterprise Voice Agents Built On Open Source - Kwindla Hultman Kramer, Daily
Serving Voice AI at $1/hr: Open-source, LoRAs, Latency, Load Balancing - Neil Dwyer, Gabber
📏RULER: Easy Mode for RL Rewards
ART·E: How We Built an Email Research Agent That Beats o3
OpenBench, Provider-agnostic, open-source evaluation infrastructure for language models
✨ LoRA's Limitations: Head-to-Head with Full RL
✨ A case for client-side machine learning, Christopher Fleetwood
Democratizing AI: The Psyche Network Architecture, Nous Research
Interoperability: Swift’s Super Power, Speaking in Swift by The Browser Company

Research

LoRA Learns Less and Forgets Less
✨ The Bitter Lesson is coming for Tokenization
On the Way to LLM Personalization: Learning to Remember User Conversations, Apple Machine Learning Research
✨ Text-to-LoRA: Hypernetworks that adapt LLMs for specific benchmark tasks using only textual task description as the input, Sakana AI
Transformer²: Self-Adaptive LLMs
How memory augmentation can improve large language models, IBM Research
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
✨ The Power of Efficiency: Edge AI’s Role in Sustainable Generative AI Adoption
✨ Small Language Models are the Future of Agentic AI, NVIDIA Research
✨ Defeating Prompt Injections by Design, Google DeepMind
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Introducing FlexOlmo: a new paradigm for language model training and data collaboration, Allen AI
WhisperKit: On-device Real-time ASR with Billion-Scale Transformers, Argmax
✨ Towards Large-scale Training on Apple Silicon, Exo Labs
Kinetics: Rethinking Test-Time Scaling Laws
Enhancing Reasoning Capabilities of Small Language Models with Blueprints and Prompt Template Search
LoFT: Low-Rank Adaptation That Behaves Like Full Fine-Tuning
AirLLM: Diffusion Policy-based Adaptive LoRA for Remote Fine-Tuning of LLM over the Air
Comparative Analysis of Retrieval Systems in the Real World
FedVLM: Scalable Personalized Vision-Language Models through Federated Learning
A Preliminary Report On Edge-Verified Machine Learning, Exo Labs
✨ Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities
✨ Intent-Based Architecture and Their Risks
Your LLM Knows the Future: Uncovering Its Multi-Token Prediction Potential
Cephalo: Multi-Modal Vision-Language Models for Bio-Inspired Materials Analysis and Design
Towards Feasible Private Distributed LLM Inference, Dria

Reference

RFT, DPO, SFT: Fine-tuning with OpenAI — Ilan Bigio, OpenAI
✨ Hand-picked selection of articles on AI fundamentals and concepts, covering the entire process from building neural nets to training them and evaluating the results.
✨ The State of On-Device LLMs
✨ Planetary-Scale Inference: Previewing our Peer-To-Peer Decentralized Inference Stack
How to Scale Your Model
✨ r/LocalLLaMA
✨ An Analogy for Understanding Transformers
✨ Neural networks, 3Blue1Brown
GGUF Quantization Docs (Unofficial)
Reverse-engineering GGUF | Post-Training Quantization
Reference implementation of the Transformer architecture optimized for Apple Neural Engine
H100 PCIe vs SXM vs NVL: Which H100 GPU Is Fastest and Most Cost-Effective for Fine-Tuning LLMs?
Apple Developer, Technotes, Learn about specific development topics through these in-depth technical articles.
The Apple Wiki
LLMs on a Budget

This resource index is inspired by the GPU Glossary from Modal.
