We turned on LLM-assisted tier-1 triage in our SOC in late summer 2025. Six months is long enough to separate the real signal from the launch-day adrenaline. Here is the honest report.
What the thing actually does
When an alert fires in our SIEM, a pipeline sends it to Claude Sonnet along with enrichment data: asset owner, recent login patterns, threat intel context, and the text of related historical tickets. The model writes a structured triage note: likely category, suggested priority, one-line rationale, and a confidence score. The analyst sees this in their queue alongside the raw alert. They can accept, override, or escalate. The model never closes a ticket by itself.
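To make that concrete, here is a minimal sketch of the triage note and the model call, assuming the Anthropic Python SDK. The schema fields, prompt wording, and model string are illustrative assumptions, not our production pipeline.

```python
import json
from dataclasses import dataclass

from anthropic import Anthropic

@dataclass
class TriageNote:
    category: str       # likely alert category, e.g. "phishing" or "credential misuse"
    priority: str       # suggested priority for the analyst queue
    rationale: str      # one-line justification shown next to the raw alert
    confidence: float   # model's self-reported confidence, 0.0 to 1.0

SYSTEM_PROMPT = (
    "You are a tier-1 SOC triage assistant. Given an alert and its enrichment, "
    "respond with JSON containing: category, priority, rationale, confidence. "
    "You advise; the analyst decides. Never recommend closing a ticket."
)

def triage(alert: dict, enrichment: dict, client: Anthropic) -> TriageNote:
    """Send one alert plus its distilled context to the model and parse the note."""
    resp = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder model string
        max_tokens=400,
        system=SYSTEM_PROMPT,
        messages=[{
            "role": "user",
            "content": json.dumps({"alert": alert, "enrichment": enrichment}),
        }],
    )
    # Assumes the model returns bare JSON; real code needs schema validation and retries.
    return TriageNote(**json.loads(resp.content[0].text))
```

The analyst-facing queue renders the note alongside the raw alert; nothing in this path closes a ticket.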
What it handled well
Enrichment was the biggest unambiguous win. Analysts used to spend the first 90 seconds of every ticket pulling context from four different tools. The model now arrives with that context already distilled, which saves roughly two minutes per ticket. At our volume, that is about 40 analyst-hours a week back.
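The arithmetic behind the 40-hour figure, with the weekly ticket volume back-derived from the two numbers above rather than quoted from our metrics:

```python
minutes_saved_per_ticket = 2
tickets_per_week = 1_200  # implied tier-1 volume: 40 hours * 60 minutes / 2 minutes per ticket
hours_back_per_week = minutes_saved_per_ticket * tickets_per_week / 60
print(hours_back_per_week)  # 40.0 analyst-hours a week
```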
Noise reduction was the second win. We were drowning in low-fidelity phishing reports from the user-reported mailbox. The model does a first-pass classification and clusters duplicates. About 55 percent of that queue now resolves without a human opening it, because ten copies of the same phish campaign get auto-merged and one analyst closes the cluster. False negatives on this path are audited weekly. So far we have seen two. Neither was a real attack.
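For the curious, here is a sketch of one reasonable way to do the duplicate clustering. Keying on sender domain plus the set of link hosts is an assumption about the approach, and the field names are made up for illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse

def campaign_key(report: dict) -> tuple:
    """Key a user-reported phish by sender domain and the hosts it links to."""
    sender_domain = report["sender"].split("@")[-1].lower()
    link_hosts = frozenset(urlparse(u).netloc.lower() for u in report.get("urls", []))
    return (sender_domain, link_hosts)

def cluster_reports(reports: list[dict]) -> dict[tuple, list[dict]]:
    """Group reports so ten copies of the same campaign become one cluster."""
    clusters: dict[tuple, list[dict]] = defaultdict(list)
    for report in reports:
        clusters[campaign_key(report)].append(report)
    return clusters
```

One analyst closes the cluster, and the weekly false-negative audit samples the tickets that were merged away without a human ever opening them.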
Mean time to detect (MTTD) dropped roughly 20 percent. Mean time to respond (MTTR) for tier-1 categories dropped about 30 percent. Neither number moved for tier-2 or above, which is the point: the tool only touches tier-1.
What it did not handle
Novel attacks, by definition, have no precedent in the training data or the ticket history. The model either flags them as unknown (good) or confidently misclassifies them as something benign (bad). We caught one of these the hard way. The model called a living-off-the-land technique "expected admin activity" with 70 percent confidence. A human would have called it the same way from the same data. That is not a defense. It is a reminder that the model is not smarter than the analyst, and pretending otherwise is the failure mode that ends careers.
Political calls are the other limit. When a senior exec's account is implicated, the question is not just "is this malicious" but "how do we handle this." No model handles this. No model should.
Analyst morale
This surprised me. I expected resistance. What we got instead, after the first month, was pull. Analysts asked for the tool to be extended to more alert types. The reason, when I asked, was not "AI is cool." It was "the boring parts got shorter and the interesting parts got longer." Retention data is too early to be statistically meaningful, but the survey deltas are encouraging.
The caveat: two senior analysts did not like it and said so. Their complaint was that junior analysts now trust the model's confidence score too much and skip the sanity checks senior folks internalized over years. They are right. We added a training module on how to disagree with the model, and the metric we track is override rate. If the override rate on a given alert type drops to near zero, we worry; we do not celebrate.
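The override-rate check itself is a few lines. A sketch follows, where the ticket fields and the 5 percent alarm threshold are assumptions for illustration, not our exact tuning.

```python
from collections import Counter

LOW_OVERRIDE_THRESHOLD = 0.05  # assumed alarm level; tune to your own baseline

def override_rates(tickets: list[dict]) -> dict[str, float]:
    """Fraction of tickets per alert type where the analyst disagreed with the model.
    Each ticket is assumed to carry 'alert_type' and a boolean 'analyst_overrode'."""
    totals: Counter = Counter()
    overrides: Counter = Counter()
    for ticket in tickets:
        totals[ticket["alert_type"]] += 1
        overrides[ticket["alert_type"]] += int(ticket["analyst_overrode"])
    return {alert_type: overrides[alert_type] / totals[alert_type] for alert_type in totals}

def suspiciously_low(rates: dict[str, float]) -> list[str]:
    """A near-zero override rate reads as rubber-stamping, not model perfection."""
    return [alert_type for alert_type, rate in rates.items() if rate < LOW_OVERRIDE_THRESHOLD]
```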
Cost reality
Inference costs landed at around four thousand dollars a month at current volume. Engineering time to build and maintain the pipeline was about one FTE for the first quarter, now roughly 20 percent of one FTE ongoing. The value of 40 analyst-hours a week at fully loaded cost is an order of magnitude above the spend. That is the honest ROI. It is not "AI replaces the SOC." It is "AI pays for one more senior hire."
What I would tell another IT leader
Start narrow. Pick the most repetitive, highest-volume, lowest-stakes alert category you have. Instrument heavily before you turn it on so you can tell a real improvement from a vibe. Keep the human as the decision-maker. Audit override rates monthly. And budget for the maintenance, because the alerting landscape changes and your prompts and evals will need to change with it.