
    local-llm-troubleshooter

    by LB Creations

    Diagnose and fix broken local LLM stacks, GPU issues, and stalled model downloads across Ollama, LM Studio, and more.

    Updated May 2026
    0 installs

    Free

    One-time purchase

    Included in download

    • Downloadable skill package
    • 1 permission declared
    • Instant install

    See it in action

    A real example of what this skill takes in and produces.

    Sample output

    Verdict: STUCK (Ollama)
    Cause: CUDA Out of Memory (OOM) detected in logs.
    Model: Llama-3-70b (Q4_K_M)
    Fix:

    1. Set OLLAMA_NUM_PARALLEL=1
    2. Reduce context window from 8k to 4k in Modelfile.
    3. If issue persists, switch to Q2 quantization.

    Verification: Re-run diagnostic to confirm GPU runner load.

    About This Skill

    What it does

    The Local LLM Troubleshooter is a diagnostic power-tool for developers and AI engineers whose local inference stacks (Ollama, LM Studio, llama.cpp, vLLM, or Hugging Face) are failing. It eliminates the guesswork of "why is my model slow?" or "why won't this load?" by running a bundled diagnostic script that probes connection states, scans logs for failure signatures, and detects stalled downloads.
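As a rough illustration of what "probing connection states" and "scanning logs for failure signatures" can look like, here is a minimal Python sketch. It is not the bundled script: the signature names, the regular expressions, and both function names are assumptions made for this example.

```python
import re
import socket

# Hypothetical failure signatures for illustration only; the patterns the
# bundled diagnostic script actually uses are not documented on this page.
SIGNATURES = {
    "cuda_oom": re.compile(r"CUDA.*out of memory", re.IGNORECASE),
    "port_conflict": re.compile(r"address already in use", re.IGNORECASE),
}


def scan_log(text: str) -> list[str]:
    """Return the sorted names of every signature found in the log text."""
    return sorted(name for name, pattern in SIGNATURES.items() if pattern.search(text))


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `port_is_open("127.0.0.1", 11434)` checks the port Ollama listens on by default, and `scan_log(log_text)` returns the matching signature names for a log you have already read into a string.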

    Why use this skill

    Prompting a generic AI about local hardware issues often leads to circular advice. This skill instead runs llm_doctor.py as a system sensor, so the diagnosis starts from what the script actually observes. It identifies specific technical blockers such as GGUF version mismatches, CUDA Out of Memory (OOM) errors, port conflicts, runner crashes, and stalled Hugging Face blob downloads, then maps each finding to a curated playbook of OS-specific fixes for Apple Silicon, NVIDIA, and WSL2 environments.
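One way to picture the mapping from findings to a playbook is a lookup keyed by finding and platform. The sketch below is an assumption about the shape of such a table, not the skill's actual playbook; the keys, the `"any"` fallback, and `fixes_for` are hypothetical, and the CUDA OOM steps are taken from the sample output on this page.

```python
# Hypothetical playbook: (finding, platform) -> ordered list of fixes.
PLAYBOOK = {
    ("cuda_oom", "nvidia"): [
        "Set OLLAMA_NUM_PARALLEL=1",
        "Reduce the context window in the Modelfile",
        "Switch to a smaller quantization",
    ],
    ("port_conflict", "any"): [
        "Stop the process holding the port, or move the server to a free port",
    ],
}


def fixes_for(finding: str, platform: str) -> list[str]:
    """Return ordered fixes, preferring a platform-specific entry over 'any'."""
    specific = PLAYBOOK.get((finding, platform))
    if specific:
        return specific
    return PLAYBOOK.get((finding, "any"), [])
```

Keeping the fixes in an ordered list preserves the "try this first" priority that the triage report relies on.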

    Supported tools

    • Inference Servers: Ollama, LM Studio, llama.cpp, vLLM
    • Model Sources: Hugging Face (hub downloads), Ollama library
    • Formats and runtimes: GGUF, local runners, GPU-accelerated backends

    What the output looks like

    The skill provides a structured triage report: a connectivity verdict (up/down/stuck), the specific bottleneck, and an ordered list of high-probability fixes, ranging from context window adjustments to environment variable corrections.
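The report structure described above could be modeled roughly as follows. This is a sketch under assumptions: the `TriageReport` class, its field names, and the rendered layout are hypothetical and only mirror the sample output shown earlier on this page.

```python
from dataclasses import dataclass, field


@dataclass
class TriageReport:
    """Hypothetical container for one triage result."""
    verdict: str                 # "up", "down", or "stuck"
    server: str                  # e.g. "Ollama"
    cause: str                   # the identified bottleneck
    fixes: list[str] = field(default_factory=list)  # ordered, most likely first

    def render(self) -> str:
        """Format the report as plain text with numbered fixes."""
        lines = [
            f"Verdict: {self.verdict.upper()} ({self.server})",
            f"Cause: {self.cause}",
            "Fix:",
        ]
        lines += [f"{i}. {fix}" for i, fix in enumerate(self.fixes, start=1)]
        return "\n".join(lines)
```

Calling `render()` on a report with verdict `"stuck"` and server `"Ollama"` produces a header line reading `Verdict: STUCK (Ollama)`, followed by the cause and the numbered fixes.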

    Use Cases

    • Identify why a local inference server is unresponsive or timing out
    • Debug CUDA Out-of-Memory (OOM) errors and GPU-to-CPU fallback issues
    • Detect and resume stalled Hugging Face or Ollama model downloads
    • Resolve port conflicts and runner crashes in LM Studio or llama.cpp
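For the stalled-download case, a simple heuristic is to sample the size of the partially downloaded file a few times and check whether it grew. The sketch below assumes that heuristic; it is not necessarily how the skill detects stalls, and `sample_sizes` and `is_stalled` are hypothetical names.

```python
import os
import time


def sample_sizes(path: str, samples: int = 3, interval: float = 2.0) -> list[int]:
    """Record the file size `samples` times, sleeping `interval` seconds between reads."""
    sizes = []
    for i in range(samples):
        sizes.append(os.path.getsize(path))
        if i < samples - 1:
            time.sleep(interval)
    return sizes


def is_stalled(sizes: list[int]) -> bool:
    """Treat a download as stalled if the file did not grow across the samples."""
    return len(sizes) >= 2 and sizes[-1] <= sizes[0]
```

A short sampling window can misreport a slow but live download as stalled, so the interval and sample count are worth tuning to the connection speed.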


    Security Scanned

    Passed automated security review

    Permissions

    Environment Variables

    File Scopes

    local-llm-troubleshooter/**

    Creator

    LB designs and builds autonomous AI systems optimized for local deployment, specializing in distributed inference fleets, multi-model orchestration, and agent-native tooling. Everything runs on your hardware, with zero API fees.
