Before I bring any AI idea to the vendors — SIEM copilots, ticket summarizers, phishing triagers — I prototype it on a local model first. Two reasons. First, my prototype data is real SIEM output. I am not sending EDR telemetry to someone else's API while I am sketching. Second, if the idea does not work against a decent 8B model, it is not going to magically work against GPT-5. Bad prompts do not get saved by big models.

The Hardware

One dedicated box sitting on the lab VLAN:

  • Ryzen 7 7700 (8 cores, cheap)
  • 64 GB DDR5
  • RTX 4070 Ti Super with 16 GB VRAM — the sweet spot for models up to 13B at Q5_K_M
  • 2 TB NVMe because quantized models add up fast
  • Ubuntu 24.04 LTS, nothing fancy

Total around $1,800 in early 2026. A used 3090 with 24 GB is also excellent if you can find one under $800 — you get to run 34B models at usable speeds.

The Stack

Ollama is the boring right answer. Install:

curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.1:8b
ollama pull qwen2.5-coder:14b
ollama pull nomic-embed-text

Then a thin FastAPI wrapper and a Streamlit page for each experiment. Do not adopt LangChain on day one. You will spend more time debugging abstractions than writing prompts. A hundred lines of Python plus direct Ollama HTTP calls gets you further.
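The "hundred lines of Python plus direct Ollama HTTP calls" approach can be sketched in a few lines. This is a minimal, illustrative wrapper around Ollama's stock `/api/generate` endpoint (default port 11434); the helper names are mine, not from any framework.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model: str, prompt: str) -> dict:
    # Non-streaming request body for Ollama's /api/generate endpoint.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    # One blocking completion over Ollama's local HTTP API -- no SDK needed.
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (with Ollama running):
# print(generate("llama3.1:8b", "Three-bullet brief of this alert: ..."))
```

Wrap `generate` in a FastAPI route or a Streamlit text box and you have an experiment page; there is nothing to debug but your own prompt.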

What Actually Works Locally

Prototype these without paying a cloud bill:

  • Alert summarization. Feed a Splunk search result as JSON, ask for a three-bullet incident brief. Llama 3.1 8B does this at about 85% of GPT-4 quality after one round of prompt tuning.
  • Phishing triage. Email headers plus body → structured verdict with reasoning. Great for building a gold-standard dataset you can later evaluate models against.
  • Rule translation. Sigma to Splunk SPL, KQL to Splunk, YARA authoring from a sample. Code models like Qwen2.5-Coder:14B shine here.
  • Log normalization. Unknown vendor log → CEF. Deterministic enough that an 8B model is fine.
  • Semantic search over your own runbooks. Embed with nomic-embed-text, store in SQLite with the sqlite-vec extension. No pgvector cluster required.
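The alert-summarization prototype is mostly prompt construction. A sketch, where the prompt wording and the example Splunk fields are illustrative assumptions, not a tuned production prompt:

```python
import json

def brief_prompt(alert: dict) -> str:
    # Turn one Splunk search result (as a dict) into a summarization prompt
    # asking for the three-bullet incident brief described above.
    return (
        "You are a SOC analyst. Summarize this alert as exactly three "
        "bullets: what happened, the affected asset, and the recommended "
        "next step.\n\n"
        f"Alert JSON:\n{json.dumps(alert, indent=2)}"
    )

alert = {"rule": "Brute Force Detected", "host": "web-01", "count": 412}
prompt = brief_prompt(alert)
# Feed `prompt` to the local model, e.g. via Ollama's HTTP API or
# `ollama run llama3.1:8b`.
```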

What Does Not Work Locally

Be honest about the ceiling. Things I stopped trying to prototype at home:

  • Scale. A local box at 40 tokens per second will not tell you if your idea survives 10,000 alerts per hour. That is a production question.
  • Long-context reasoning. 128K windows exist locally but degrade badly past 32K on 8B quantized models. If your workflow needs a full week of logs in context, test on a hosted model.
  • Tool-use reliability. Function calling on small models is flaky. Prototype the shape of the workflow; do not benchmark reliability here.
  • Multi-agent orchestration. Latency compounds. Two 8B hops feel like waiting for dial-up.

A Workflow I Use Weekly

"Does this prompt idea even deserve a meeting?" Pipeline:

  1. Pull 50 representative inputs from prod (scrubbed).
  2. Write a naive prompt. Run it on Llama 3.1 8B.
  3. Hand-label outputs as good/bad.
  4. If the small model scores under 60%, a GPT-4-class model might get me to 80%. Still not enough to make a production case.
  5. If I get above 70% locally, the idea is worth a real eval on a hosted model.
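The decision logic in steps 4 and 5 reduces to a few lines. A sketch of the go/no-go filter, operating on the hand-labels from step 3 (`True` = good output); the thresholds mirror the 60%/70% rule of thumb above, and the function name is mine:

```python
def verdict(labels: list[bool]) -> str:
    # labels: one hand-label per scrubbed prod input (True = good output).
    rate = sum(labels) / len(labels)
    if rate < 0.60:
        return "kill"      # under 60% on a small model: not worth a meeting
    if rate > 0.70:
        return "promote"   # above 70% locally: run a real eval on a hosted model
    return "iterate"       # in between: tune the prompt and re-run

labels = [True] * 36 + [False] * 14   # e.g. 36 of 50 labeled good = 72%
print(verdict(labels))                # -> promote
```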

That filter has killed more bad ideas than any strategy session. Three hours on a $1,800 box has saved me from three months of vendor POCs that were never going to succeed. Prototype locally. Ship with the best tool for the actual production load.