Most web stacks log everything by default — IPs, user agents, referer, request URL, cookies if you're sloppy — and then the ops team forgets about it until a breach forces a retention review. The data you don't have can't be stolen and can't be subpoenaed. For privacy-sensitive services, that's the whole argument. You can run a useful, debuggable, abuse-resilient service while collecting strictly less.

Turn Off Identifying Access Logs

Nginx's default access log format (combined) records the client IP, the authenticated user name, the referrer, and the user agent on every request. A common pattern is to define a custom format that omits the IP and identifying headers, and to keep the default format only for a short-retention debug buffer if needed.

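# log_format must be declared in the http {} context.
# Note: "$request" includes the query string; log "$request_method $uri"
# instead if query strings can carry identifiers.
log_format privacy '- - - [$time_local] "$request" '
                   '$status $body_bytes_sent';

server {
    access_log /var/log/nginx/access.log privacy;
    # For an onion service, replace the line above with:
    # access_log off;
    # Error log entries still include the client address; crit keeps them rare.
    error_log /var/log/nginx/error.log crit;
}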

For an onion service specifically, access logging is rarely useful: every connection arrives from the local tor daemon, so the source IP is always 127.0.0.1 (or a Unix socket), which tells you nothing. Turn it off entirely and rely on metrics.

Aggregate Without Storing Identifiers

You almost always want counts, not events. "How many requests to /api/foo in the last hour" is answerable from a simple in-memory counter that writes one row per hour to disk. No IPs, no user-agents, no session IDs. Tools like Prometheus' counter and histogram primitives are literally built for this.
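As a rough sketch of the in-memory counter described above: the class below keeps one count per (hour, method, route template, status) and hands back completed hours as rows for the caller to write to disk. The name HourlyCounter and its methods are hypothetical, chosen for illustration; a real deployment would more likely use an existing metrics library.

```python
import time
from collections import Counter


class HourlyCounter:
    """Hypothetical sketch: per-hour request counts with no identifiers stored."""

    def __init__(self):
        self._counts = Counter()

    def inc(self, method, route_template, status, now=None):
        ts = time.time() if now is None else now
        hour = int(ts // 3600) * 3600  # start of the hour, in epoch seconds
        self._counts[(hour, method, route_template, status)] += 1

    def flush(self, current_hour):
        """Return rows for hours before current_hour and drop them from memory."""
        done = {k: v for k, v in self._counts.items() if k[0] < current_hour}
        for k in done:
            del self._counts[k]
        # Each row: (hour, method, route_template, status, count)
        return sorted((*k, v) for k, v in done.items())
```

The only things that ever reach disk are the label tuple and a number, so there is nothing per-request to retain or hand over.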

  • Replace log_line(event) with counter.inc(labels). Pick labels that are low-cardinality by construction: HTTP method, status code, route template — never raw path, never user ID.
  • Rotate per-request samples aggressively. If you need a sampled trace for debugging, keep it for minutes, not weeks.
  • When you do need per-user metrics (rare), hash the identifier with HMAC under a key that rotates daily, and destroy the old key at rotation, so pseudonyms from different days cannot be linked to each other or back to the user.
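The daily-rotating HMAC idea from the last bullet might look roughly like this. DailyPseudonymizer is a hypothetical name, and the sketch assumes the key lives only in process memory, which means a restart mid-day also breaks the mapping; whether that is acceptable depends on the metric.

```python
import hashlib
import hmac
import secrets
from datetime import date


class DailyPseudonymizer:
    """Hypothetical sketch: stable pseudonyms within a day, unlinkable across days."""

    def __init__(self):
        self._day = None
        self._key = None

    def _key_for(self, today):
        if today != self._day:
            # The previous key is overwritten, never archived, so old
            # pseudonyms cannot be recomputed from a user ID later.
            self._key = secrets.token_bytes(32)
            self._day = today
        return self._key

    def pseudonym(self, user_id, today=None):
        today = today or date.today()
        key = self._key_for(today)
        return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
```

A plain unkeyed hash would not do here: user IDs are usually guessable, so anyone holding the hashes could enumerate candidates and reverse them.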

Differential Privacy, Briefly

When you publish aggregate numbers, such as monthly usage stats, you can leak information about individuals through differencing attacks. The canonical example: reporting "active users in region X" for two consecutive months, where exactly one user joined or left, reveals that user's membership. Differential privacy is the mathematical framework that quantifies and bounds this leakage; its mechanisms add calibrated noise (typically Laplace or Gaussian) to outputs so that any single user's contribution is masked.
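For intuition, here is a minimal sketch of the Laplace mechanism applied to a single count. It assumes each user changes the count by at most 1 (sensitivity 1) and that epsilon is the privacy parameter you have chosen; the function names are illustrative. Floating-point noise sampling has known subtleties, so a vetted DP library is the safer route for real releases.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential samples with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng=None, sensitivity=1.0):
    """Return true_count plus Laplace noise with scale sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.SystemRandom()  # OS entropy unless a generator is passed in
    return true_count + laplace_noise(sensitivity / epsilon, rng)
```

Smaller epsilon means more noise and stronger masking. Publishing the same statistic repeatedly spends privacy budget each time, which is one reason the full machinery gets complicated quickly.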

You probably don't need full DP machinery. You probably do need:

  • Minimum cohort sizes before reporting a count (say, suppress any cell with <10 entries).
  • Rounded numbers rather than exact counts.
  • Delayed reporting (publish monthly, not real-time) to reduce side-channel leaks.
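The first two measures fit in a few lines. This sketch assumes counts arrive as a mapping from cell name to integer; the threshold of 10 comes from the example above, while rounding to the nearest 5 is an arbitrary illustrative choice.

```python
def publishable(counts, min_cohort=10, round_to=5):
    """Drop cells below min_cohort and round the rest to a multiple of round_to."""
    out = {}
    for cell, n in counts.items():
        if n < min_cohort:
            continue  # suppress small cells entirely rather than publishing them
        # Integer round-half-up to the nearest multiple of round_to.
        out[cell] = (n + round_to // 2) // round_to * round_to
    return out
```

For example, publishable({"X": 3, "Y": 12, "Z": 48}) drops X and reports Y as 10 and Z as 50. If you also publish an exact total, a suppressed cell can sometimes be recovered by subtraction, so round or suppress totals consistently.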

Apple's, Google's, and the US Census Bureau's DP deployments are good reading if you go deeper.

Reverse-Proxy Hygiene and When Not Logging Hurts

If your app sits behind a reverse proxy or CDN, the proxy is often logging things you thought you disabled. Audit both tiers. Strip X-Forwarded-For before it hits application logs if you've decided not to retain IPs.
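On the application tier, one way to make sure forwarded addresses never reach your logs is to drop them before any handler runs. The sketch below assumes a Python WSGI application; StripClientAddress and the header list are illustrative, and your proxy may use other headers that need the same treatment.

```python
# WSGI exposes request headers as HTTP_* keys in the environ dict.
STRIPPED = ("HTTP_X_FORWARDED_FOR", "HTTP_X_REAL_IP", "HTTP_FORWARDED")


class StripClientAddress:
    """Hypothetical WSGI middleware: remove client-address data before the app sees it."""

    def __init__(self, app, placeholder="0.0.0.0"):
        self.app = app
        self.placeholder = placeholder

    def __call__(self, environ, start_response):
        for key in STRIPPED:
            environ.pop(key, None)
        # The direct peer is usually the proxy, but overwrite it as well so
        # no address can end up in application logs or error reports.
        environ["REMOTE_ADDR"] = self.placeholder
        return self.app(environ, start_response)
```

This only covers the application; the proxy's own logs still need their own audit, and anything in the app that relied on client IPs will stop working.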

Honest tradeoffs:

  • Debugging gets harder. A useful compromise is short-retention detailed logs (hours to a day) plus long-retention aggregates (years). You can diagnose yesterday's incident without carrying forever's data.
  • Abuse response gets harder. If you truly retain nothing, you cannot tell one user from another, which makes targeted rate-limiting and banning difficult. Ephemeral per-session tokens (refreshed often) give you session-level rate limiting without durable identity.
  • Compliance may demand logs. Some regulations (PCI DSS, some financial rules) mandate retention of specific log fields. Know which of your data actually falls under those regimes and minimize the rest.
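The ephemeral-token compromise for abuse response could be sketched as a fixed-window limiter keyed by a random, short-lived token. EphemeralRateLimiter and its default limits are assumptions for illustration, and state is kept in memory only.

```python
import secrets
import time


class EphemeralRateLimiter:
    """Hypothetical sketch: per-token fixed-window limits, no durable identity."""

    def __init__(self, limit=60, window=60.0, token_ttl=900.0):
        self.limit, self.window, self.token_ttl = limit, window, token_ttl
        self._tokens = {}  # token -> [expires_at, window_start, count]

    def issue(self, now=None):
        now = time.monotonic() if now is None else now
        token = secrets.token_urlsafe(16)  # random, carries no user information
        self._tokens[token] = [now + self.token_ttl, now, 0]
        return token

    def allow(self, token, now=None):
        now = time.monotonic() if now is None else now
        state = self._tokens.get(token)
        if state is None or now >= state[0]:
            self._tokens.pop(token, None)  # unknown or expired: client needs a new token
            return False
        if now - state[1] >= self.window:
            state[1], state[2] = now, 0  # start a fresh counting window
        if state[2] >= self.limit:
            return False
        state[2] += 1
        return True
```

Because a client can simply request a new token, issuance itself has to be limited or made costly for this to slow down a determined abuser.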

The goal isn't zero logs. The goal is logs that answer operational questions while being useless to an adversary who compels them from you. You can design for both.