local-llm-troubleshooter
by LB Creations
Diagnose and fix broken local LLM stacks, GPU issues, and stalled model downloads across Ollama, LM Studio, and more.
- Identify why a local inference server is unresponsive or timing out
- Debug CUDA Out-of-Memory (OOM) errors and GPU-to-CPU fallback issues
- Detect and resume stalled Hugging Face or Ollama model downloads
Free
One-time purchase
See it in action
A real example of what this skill takes in and produces.
Sample output
Verdict: STUCK (Ollama)
Cause: CUDA Out of Memory (OOM) detected in logs.
Model: Llama-3-70b (Q4_K_M)
Fix:
- Set OLLAMA_NUM_PARALLEL=1
- Reduce context window from 8k to 4k in Modelfile.
- If issue persists, switch to Q2 quantization.
Verification: Re-run diagnostic to confirm GPU runner load.
Included in download
- Downloadable skill package
- 1 permission declared
- Instant install
About This Skill
What it does
The Local LLM Troubleshooter is a diagnostic power-tool for developers and AI engineers whose local inference stacks (Ollama, LM Studio, llama.cpp, vLLM, or Hugging Face) are failing. It eliminates the guesswork of "why is my model slow?" or "why won't this load?" by running a bundled diagnostic script that probes connection states, scans logs for failure signatures, and detects stalled downloads.
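As a rough illustration of what "probing connection states" could look like, the sketch below classifies a local endpoint as up, down, or stuck. It is not taken from the bundled script: the function name `probe_server`, the timeout value, and the rule that "stuck" means the port accepts a connection but sends nothing back are all assumptions made for this example.

```python
import socket


def probe_server(host: str, port: int, timeout: float = 2.0) -> str:
    """Hypothetical helper: classify a TCP endpoint as 'up', 'down', or 'stuck'."""
    try:
        # The timeout set here also applies to the later send/recv calls.
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Connection refused, unreachable, or timed out: nothing is listening.
        return "down"
    with sock:
        try:
            request = f"GET / HTTP/1.0\r\nHost: {host}\r\n\r\n"
            sock.sendall(request.encode("ascii"))
            data = sock.recv(1)  # wait for the first byte of any reply
        except OSError:
            # The port accepted the connection but never answered in time.
            return "stuck"
    # An empty read means the peer closed without replying.
    return "up" if data else "stuck"
```

For example, `probe_server("127.0.0.1", 11434)` would check Ollama's default port; other servers listen on different ports, so the port number should be adjusted to match the local setup.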
Why use this skill
Prompting a generic AI about local hardware issues often leads to circular advice. This skill avoids that by running llm_doctor.py as a system sensor. It identifies specific technical blockers such as GGUF version mismatches, CUDA OOM (Out of Memory) errors, port conflicts, runner crashes, and stalled Hugging Face blobs, then maps these findings to a curated playbook of OS-specific fixes for Apple Silicon, NVIDIA, and WSL2 environments.
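The idea of matching log lines against known failure signatures can be sketched as follows. The listing does not show the patterns llm_doctor.py actually uses, so the `SIGNATURES` table, its two example patterns, and the `scan_log` function are hypothetical and only illustrate the approach.

```python
import re

# Illustrative signatures only (assumed for this sketch, not the skill's real list).
SIGNATURES = {
    "cuda_oom": re.compile(r"cuda.*out of memory|out of memory.*cuda", re.IGNORECASE),
    "port_conflict": re.compile(r"address already in use", re.IGNORECASE),
}


def scan_log(text: str) -> list[str]:
    """Return the names of matched signatures, in the order first seen."""
    found: list[str] = []
    for line in text.splitlines():
        for name, pattern in SIGNATURES.items():
            if name not in found and pattern.search(line):
                found.append(name)
    return found
```

Each signature name could then be used as a key into a table of fixes, which is one simple way to implement the "findings to playbook" mapping described above.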
Supported tools
- Inference Servers: Ollama, LM Studio, llama.cpp, vLLM
- Model Sources: Hugging Face (hub downloads), Ollama library
- Frameworks: GGUF, local runners, GPU-accelerated backends
What the output looks like
The skill provides a structured triage report that includes a connectivity verdict (up/down/stuck), the specific bottleneck identified, and an ordered list of high-probability fixes, ranging from context window adjustments to environment variable corrections.
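One way to picture such a report is as a small record that renders to text. The `TriageReport` class and its field names below are assumptions for illustration; the skill's real output schema is not documented in this listing.

```python
from dataclasses import dataclass, field


@dataclass
class TriageReport:
    """Hypothetical container for a triage result."""
    verdict: str                 # "up", "down", or "stuck"
    tool: str                    # e.g. "Ollama"
    cause: str                   # the bottleneck that was identified
    fixes: list[str] = field(default_factory=list)  # ordered, most likely first

    def render(self) -> str:
        """Format the report as plain text, one fix per bullet."""
        lines = [
            f"Verdict: {self.verdict.upper()} ({self.tool})",
            f"Cause: {self.cause}",
            "Fix:",
        ]
        lines += [f"- {fix}" for fix in self.fixes]
        return "\n".join(lines)
```

Keeping the fixes in a list preserves their priority order, so a reader can work through them from top to bottom.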
Use Cases
- Identify why a local inference server is unresponsive or timing out
- Debug CUDA Out-of-Memory (OOM) errors and GPU-to-CPU fallback issues
- Detect and resume stalled Hugging Face or Ollama model downloads
- Resolve port conflicts and runner crashes in LM Studio or llama.cpp
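A stalled download can be approximated by checking whether a partially downloaded file stops growing. The sketch below assumes this size-comparison heuristic; the function name `is_stalled`, the default interval, and the injectable `sleep` parameter are hypothetical and not part of the skill.

```python
import os
import time
from typing import Callable


def is_stalled(path: str, interval: float = 5.0,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Hypothetical check: True if the file at `path` did not grow during `interval` seconds."""
    before = os.path.getsize(path)
    sleep(interval)  # injectable so callers can substitute their own wait
    after = os.path.getsize(path)
    return after == before
```

A finished download also stops growing, so a check like this is only meaningful for files known to be incomplete.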
How to Install
mkdir -p ~/.claude/skills && curl -sL https://www.agensi.io/api/install/local-llm-troubleshooter | tar xz -C ~/.claude/skills/
Free skills install directly. Paid skills require purchase: use the download button above after buying.
Reviews
No reviews yet - be the first to share your experience.
Only users who have downloaded or purchased this skill can leave a review.
Security Scanned
Passed automated security review
Creator
LB designs and builds autonomous AI systems optimized for local deployment, specializing in distributed inference fleets, multi-model orchestration, and agent-native tooling. Everything runs on your hardware, with zero API fees.