## Overview
tamebi is a general-purpose Python package for working with open-source AI models. It is designed to grow with you, from understanding your hardware, to running models, to building agents and deploying in production.
The current release focuses on hardware detection and model compatibility. Give it one command and it automatically scans your CPU, RAM, GPU, and disk, then tells you exactly which models you can run locally, at which precision, and what performance to expect. Every result includes a memory breakdown, throughput estimates, and ready-to-copy Ollama commands.
More features are on the way, including model inference, agent primitives, provider abstraction, and deployment helpers. This is the foundation.
NVIDIA, AMD, Intel, and Apple Silicon are all detected automatically. No extra flags or environment variables needed. The model catalog updates weekly and covers the latest releases from major labs.
## Installation
Install with pip:

```shell
pip install tamebi
```

Or with uv (recommended, significantly faster):

```shell
uv pip install tamebi
```

If you don't have uv yet, install it first:

```shell
curl -LsSf https://astral.sh/uv/install.sh | sh
```

No extra flags, extras, or system dependencies are needed. Platform detection (NVIDIA via NVML, AMD via ROCm, Intel via OpenCL, Apple via system_profiler) is handled automatically at runtime.
## Quick Start
Run a hardware scan with a single command:

```shell
tamebi check
```

Output is divided into three sections: Hardware, Top Recommendations, and Runnable Models.
## CLI Reference
### `tamebi check`
Detect hardware and show what's runnable. Output has three sections: Hardware, Top Recommendations, and Runnable Models.
```shell
tamebi check [OPTIONS]
```

| Flag | Short | Default | Description |
|---|---|---|---|
| `--json` | `-j` | false | Output as JSON instead of rich tables. |
| `--context-length` | `-c` | 4096 | Context length in tokens. KV cache scales linearly with this; 4K vs 128K changes memory dramatically. |
| `--batch-size` | `-b` | 1 | Concurrent requests. Each gets its own KV cache. Set >1 if planning to serve multiple users. |
| `--verbose` | - | false | Show detailed detection info: driver versions, compute capability, etc. |
### `tamebi models`
Show the full model compatibility matrix. Every model in the catalog across all precisions (INT4, INT8, FP16), with fit status and memory at each level.
```shell
tamebi models [OPTIONS]
```

| Flag | Short | Default | Description |
|---|---|---|---|
| `--context-length` | `-c` | 4096 | Context length for KV cache estimation. |
| `--batch-size` | `-b` | 1 | Batch size for KV cache estimation. |
### `tamebi update`
Pull the latest model catalog from the remote. The catalog updates automatically in the background, but you can force a refresh at any time.
```shell
tamebi update
```

## Examples
Scan your machine and see all compatible models with Ollama commands:

```shell
tamebi check
```

Machine-readable output. Pipe into jq, scripts, or CI pipelines:

```shell
tamebi check --json
```

Each concurrent user gets their own KV cache. This estimates memory for 4 simultaneous requests with 8K context each:

```shell
tamebi check --batch-size 4 --context-length 8192
```

Pass `--context-length 0` to use each model's own maximum context window instead of the 4K default:

```shell
tamebi check --context-length 0
```

See every model in the catalog and their compatibility across INT4, INT8, and FP16 precisions:

```shell
tamebi models
```

## Supported Hardware
tamebi detects your hardware automatically using platform-native APIs. No extra configuration is needed.
| Platform | Detection Method | What's Reported |
|---|---|---|
| NVIDIA | nvidia-ml-py (NVML) | Model, VRAM, CUDA version, compute capability |
| AMD | rocm-smi (subprocess) | Model, VRAM (requires ROCm) |
| Intel | OpenCL / WMI | Model, VRAM (Arc discrete + integrated graphics) |
| Apple Silicon | system_profiler | Chip model (M1/M2/M3/M4), unified memory |
| CPU-only | psutil + py-cpuinfo | Cores, threads, frequency, architecture |
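To illustrate the platform-native approach in the table above, here is a minimal, stdlib-only sketch of the dispatch logic. This is not tamebi's actual implementation — real detection shells out to vendor tooling (NVML, rocm-smi, OpenCL, system_profiler), which this sketch only names in comments.

```python
import platform

def detect_platform() -> dict:
    """Classify the host the way a hardware scanner might, using only the stdlib.

    Vendor-specific probing (NVML for NVIDIA, rocm-smi for AMD, OpenCL for
    Intel, system_profiler for Apple) is where the real detail comes from;
    this sketch only shows how the dispatch might branch.
    """
    system = platform.system()
    machine = platform.machine()
    if system == "Darwin" and machine == "arm64":
        kind = "apple-silicon"  # would run `system_profiler SPHardwareDataType`
    elif system in ("Linux", "Windows"):
        kind = "pc"             # would probe NVML / rocm-smi / OpenCL for GPUs
    else:
        kind = "unknown"
    return {"system": system, "machine": machine, "kind": kind}

print(detect_platform())
```

On any host this returns the OS, CPU architecture, and a coarse classification; the per-vendor probes in the table fill in model names and VRAM.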
## Model Catalog
The model catalog is maintained automatically: it is fetched directly from HuggingFace Hub, with no manual curation needed. It updates in the background weekly and covers the latest releases from all major labs.
Models are catalogued across multiple precisions (INT4, INT8, FP16) with accurate parameter counts, context windows, and layer counts for GQA-aware KV cache estimation. Run tamebi update at any time to pull the latest catalog without reinstalling.
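For intuition about what those precision levels mean for memory, weight size is roughly parameter count × bytes per parameter: 0.5 bytes at INT4, 1 at INT8, 2 at FP16. A quick sketch (the 8B parameter count is a hypothetical example, and real quantized files carry some extra overhead this ignores):

```python
BYTES_PER_PARAM = {"INT4": 0.5, "INT8": 1.0, "FP16": 2.0}

def weight_gib(params: float, precision: str) -> float:
    """Approximate weight memory in GiB, ignoring quantization overhead."""
    return params * BYTES_PER_PARAM[precision] / 2**30

# Hypothetical 8B-parameter model at each catalog precision.
for p in ("INT4", "INT8", "FP16"):
    print(f"{p}: {weight_gib(8e9, p):.1f} GiB")
```

This is why the same model can fit comfortably at INT4 yet be impossible at FP16 on the same GPU: each precision step roughly halves or doubles the weight footprint.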
## How Estimation Works
Memory is estimated per model and precision as the sum of the model weights (parameter count × bytes per parameter at that precision) and the KV cache.
The KV cache formula is GQA-aware: models that use grouped query attention (like Llama 3, Qwen 2.5, Gemma) have far fewer KV heads than query heads, so their KV cache is proportionally smaller. tamebi uses the actual architecture metadata from HuggingFace, not a rough approximation.
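The GQA effect can be made concrete with a standard KV-cache estimate — 2 tensors (K and V) × layers × KV heads × head dim × context × batch × bytes per element. This is a common formula, not necessarily tamebi's exact code, and the architecture numbers below are illustrative Llama-3-8B-like values, not catalog data:

```python
def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 context: int, batch: int = 1, bytes_per_elem: int = 2) -> float:
    """KV cache size in GiB: one K and one V entry per layer, per token."""
    elems = 2 * layers * kv_heads * head_dim * context * batch
    return elems * bytes_per_elem / 2**30

# Illustrative Llama-3-8B-like shape: 32 layers, 128-dim heads,
# 32 query heads but only 8 KV heads thanks to grouped query attention.
gqa = kv_cache_gib(layers=32, kv_heads=8, head_dim=128, context=8192)
mha = kv_cache_gib(layers=32, kv_heads=32, head_dim=128, context=8192)
print(f"GQA: {gqa:.2f} GiB vs full MHA: {mha:.2f} GiB")  # 4x smaller with GQA
```

With these numbers the GQA cache is exactly a quarter of the full multi-head one, and the formula also shows why `--context-length` and `--batch-size` both scale it linearly.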