> **Warning**
> Prototype Stage: This project is in a pre-release/prototype stage. Core coding functionality is not yet fully implemented while research into optimization is ongoing.
# Zweek Code

**AI Coding Assistant That Runs on YOUR Machine**

No cloud. No telemetry. Just you and your code.

## What Is This?
A terminal-based AI coding assistant powered by specialized local models. Unlike cloud assistants, everything runs offline on consumer hardware using optimized small models (135M-1.1B parameters).
**Philosophy:** Use deterministic tools when possible, AI when necessary.
## Features
- ✅ **100% Offline** – No internet, no tracking, no data leaves your machine
- ✅ **Fast** – Router stays resident (~150MB RAM), responses in seconds
- ✅ **Private** – Your code is yours
- ✅ **Lightweight** – Runs on older hardware (4GB RAM minimum)
- ✅ **Smart Routing** – SmolLM classifies intent (CODE/CHAT/TOOL) instantly
## Architecture

**Optimized 3-Model System:**

- **Router (SmolLM-135M)** – Stays loaded, classifies intent with a GBNF grammar (see the sketch after this list)
- **Code Drafter (StarCoder-Tiny)** – Generates code when needed
- **Chat (Qwen3-0.6B-Q8_0)** – Answers questions about your code
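The router's actual grammar isn't included in this README; the sketch below shows, under that caveat, what a minimal intent grammar and its downstream handling could look like. The constant, enum, and `parseIntent` helper are illustrative names, not code from the repo.

```cpp
#include <string>

// Hypothetical GBNF grammar: constrains the router to emit exactly one
// of three labels, so its output never needs fuzzy parsing.
static const char *kRouterGrammar = R"GBNF(
root ::= "CODE" | "CHAT" | "TOOL"
)GBNF";

enum class Intent { Code, Chat, Tool };

// With the grammar active, the model can only produce these strings,
// so routing reduces to an exact string match.
inline Intent parseIntent(const std::string &label) {
    if (label == "CODE") return Intent::Code;
    if (label == "TOOL") return Intent::Tool;
    return Intent::Chat;
}
```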
**Key Optimizations:**

- Resident models stay in memory (~350MB idle)
- GBNF grammars constrain outputs to a fixed format, so the router cannot produce free-form or malformed responses
- A compiler check (`cl.exe` on Windows) validates generated code instantly, with no AI involved (see the sketch after this list)
- Peak RAM: ~500MB during inference
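A hedged sketch of that compiler check, assuming generated snippets are written to a temporary file and validated in syntax-only mode; the helper name and file path are illustrative, not the repo's actual implementation:

```cpp
#include <cstdlib>
#include <fstream>
#include <string>

// Illustrative helper: returns true if the snippet parses cleanly.
// Assumes the compiler is on PATH (e.g. a VS developer prompt on Windows).
bool compilesCleanly(const std::string &code) {
    const std::string path = "zweek_check.cpp"; // hypothetical temp file
    std::ofstream(path) << code;
#ifdef _WIN32
    // /Zs: MSVC syntax check only; no object file is produced.
    int rc = std::system(("cl /nologo /Zs " + path).c_str());
#else
    // -fsyntax-only: the GCC/Clang equivalent.
    int rc = std::system(("c++ -fsyntax-only " + path).c_str());
#endif
    return rc == 0;
}
```

Gating acceptance on a deterministic exit code is what lets the tool skip an AI round-trip for obviously broken output.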
## Quick Start

### Requirements
- Windows 10/11, Linux, or macOS
- 4GB RAM minimum (8GB recommended)
- CMake 3.20+
- C++17 compiler
### Build

```bash
git clone https://github.com/wedsmoker/zweek-code.git
cd zweek-code

# Download models (place in the models/ directory):
# - smollm-135m-router.gguf
# - Qwen3-0.6B-Q8_0.gguf
# - starcoder-tiny.gguf

cmake -S . -B build -G Ninja
cmake --build build

.\build\zweek.exe   # Windows
./build/zweek       # Linux/macOS
```
See QUICKSTART.md for detailed instructions.
## Usage

```text
> /help
  Show commands and tips

> what is this function doing?
  Chat mode answers questions

> add error handling here
  Code mode generates fixes
```
**Keyboard:**

- `m` – Switch between Plan and Auto mode
- `y`/`n` – Accept/reject changes
- `Ctrl+C` – Exit
## Models

Place these in `models/`:
| Model | Size | Purpose | Loading |
|---|---|---|---|
| `smollm-135m-router.gguf` | ~150MB | Intent classification | Resident |
| `starcoder-tiny.gguf` | ~200MB | Code generation | On-demand |
| `Qwen3-0.6B-Q8_0.gguf` | ~700MB | Q&A | On-demand |
Download from HuggingFace (GGUF Q8 quantized versions).
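For orientation, here is a hedged sketch of how the resident router might be loaded with llama.cpp. llama.cpp's loading API has been renamed across releases (older versions use `llama_load_model_from_file`), so treat the calls as illustrating the flow, not as the repo's exact code.

```cpp
#include "llama.h"

int main() {
    llama_backend_init();

    // The 135M router is small enough to stay loaded for the whole session.
    llama_model_params mparams = llama_model_default_params();
    llama_model *router =
        llama_model_load_from_file("models/smollm-135m-router.gguf", mparams);
    if (!router) return 1;

    // ... create a context and classify prompts with the GBNF grammar ...

    llama_model_free(router);
    llama_backend_free();
    return 0;
}
```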
## Performance

- Target: <15 seconds for most operations
- Idle RAM: ~350MB (Router + Code Drafter resident)
- Peak RAM: ~500MB during chat inference
## Status

**v1.0.0-alpha – Phase 2 Complete**
- ✅ TUI with FTXUI
- ✅ Router with GBNF
- ✅ Chat mode with Qwen3
- ✅ Compiler-based validation
- ✅ Command system
- ✅ Persistent chat history
## Commands

- `/help` – Show available commands
- `/history [n]` – Show the last n messages
- `/sessions` – List saved sessions
- `/load <index>` – Load a previous session
- `/clear-history` – Clear the current session history
- `/cd <path>` – Change the working directory
- `/ls [path]` – List files in a directory (current directory if no path is given)
## Keyboard Shortcuts

- `m` – Switch between Plan and Auto mode
- `t` – Toggle visibility of "Thinking" sections
- `y`/`n` – Accept/reject changes
- `Ctrl+C` – Exit
## Tech Stack

- llama.cpp – GGUF model inference
- FTXUI – Terminal UI
- nlohmann/json – JSON handling (see the sketch after this list)
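As one illustration of how the persistent chat history could be serialized with nlohmann/json, the sketch below appends `{role, content}` objects to a JSON array on disk. The file layout and helper are assumptions; the repo's actual schema isn't documented here.

```cpp
#include <fstream>
#include <string>
#include <nlohmann/json.hpp>

using nlohmann::json;

// Hypothetical persistence helper: loads the session file (if any),
// appends one message, and writes the array back. Assumes an existing
// file contains valid JSON.
void appendMessage(const std::string &path,
                   const std::string &role,
                   const std::string &content) {
    json history = json::array();
    std::ifstream in(path);
    if (in.good()) in >> history;
    history.push_back({{"role", role}, {"content", content}});
    std::ofstream(path) << history.dump(2); // pretty-printed, 2-space indent
}
```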
## License
MIT - See LICENSE
*No cloud. No compromises. Your code, your machine.*
