ValkDB does not store query results. In the production `hash_only` mode, raw SQL text is not stored; audit logs record query hashes and decision metadata.
What ValkDB stores
| Data | Stored | Notes |
|---|---|---|
| Query hash (SHA-256) | Always | Irreversible hash of SQL text |
| Decision (allowed/blocked) | Always | What happened and why |
| Rows returned (count) | Always | Count only, not the data |
| Budget state | Always | Used/limit at time of query |
| Tables touched | Always | Table names accessed |
| Columns touched | Always | Column names via AST analysis |
| Agent/session/task ID | When provided | Identity metadata |
| Timestamp + latency | Always | When and how long |
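The table above can be read as the shape of one audit log entry. The following is a minimal sketch of such an entry in hash-only mode, not ValkDB's actual schema: the function name `build_audit_record` and every field name are assumptions made for illustration. It shows that only the SHA-256 digest of the SQL text is kept, and that identity fields appear only when provided.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_audit_record(sql, decision, reason, rows_returned, budget_used,
                       budget_limit, tables, columns, latency_ms,
                       agent_id=None, session_id=None, task_id=None):
    """Build a hash-only audit record (hypothetical field names).

    The SQL text is hashed with SHA-256 and never placed in the record.
    """
    record = {
        "query_hash": hashlib.sha256(sql.encode("utf-8")).hexdigest(),
        "decision": decision,            # "allowed" or "blocked"
        "reason": reason,                # why the decision was made
        "rows_returned": rows_returned,  # count only, not the data
        "budget": {"used": budget_used, "limit": budget_limit},
        "tables": sorted(tables),
        "columns": sorted(columns),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "latency_ms": latency_ms,
    }
    # Identity metadata is included only when the caller provided it.
    identity = {"agent_id": agent_id, "session_id": session_id, "task_id": task_id}
    for key, value in identity.items():
        if value is not None:
            record[key] = value
    return record


record = build_audit_record(
    sql="SELECT email FROM users WHERE id = 42",
    decision="allowed",
    reason="within budget",
    rows_returned=1,
    budget_used=1,
    budget_limit=1000,
    tables={"users"},
    columns={"email", "id"},
    latency_ms=12,
    agent_id="agent-7",
    session_id="sess-1",
)
print(json.dumps(record, indent=2))
```

The printed record contains a 64-character hex digest in place of the query text, and no `task_id` key because none was passed.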
What ValkDB does NOT store
| Data | Stored | Reason |
|---|---|---|
| Query results | Never | Results returned to agent and discarded |
| Raw SQL text | Not in `hash_only` mode | Only the SHA-256 hash is stored |
| WHERE clause values | Not in `hash_only` mode | Hashed, not stored in cleartext |
| Customer database content | Never | ValkDB proxies, does not persist |
Security architecture
- All queries are parsed into an AST (sqlparser-rs); there is no regex-based filtering
- Sensitive columns are detected through aliases, CTEs, and subqueries
- DML/DDL blocked at AST level (only SELECT allowed)
- Dangerous functions blocked (pg_read_file, lo_import, etc.)
- Budget enforcement is deterministic: there is no AI in the security layer
- Rate limiting per API key (token bucket)
- Connection strings encrypted with AES-256-GCM
- Response sanitization removes PII patterns before returning to agent
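As an illustration of the per-API-key rate limiting listed above, here is a minimal token bucket sketch. It is not ValkDB's implementation: the class name, the `capacity` and `refill_per_second` parameters, and the in-memory `buckets` dictionary are all assumptions. The clock is injectable so the behavior can be checked without waiting on real time.

```python
import time


class TokenBucket:
    """Token bucket: holds up to `capacity` tokens, refilled at a fixed rate."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self.tokens = float(capacity)  # start full
        self.last = clock()

    def allow(self, cost=1.0):
        """Refill based on elapsed time, then spend `cost` tokens if available."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_second)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# One bucket per API key (hypothetical in-memory store).
buckets = {}


def allow_request(api_key, capacity=5, refill_per_second=1.0,
                  clock=time.monotonic):
    """Return True if the request for this API key is within its rate limit."""
    if api_key not in buckets:
        buckets[api_key] = TokenBucket(capacity, refill_per_second, clock)
    return buckets[api_key].allow()
```

Each key gets its own bucket, so one client exhausting its tokens does not affect another. A burst of up to `capacity` requests is accepted, after which requests pass only at the refill rate.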
What ValkDB can tell you after the fact
- Which agent queried (agent_id)
- Which session it was (session_id)
- What task it was performing (task_id)
- How many rows it consumed (cumulative)
- Which tables it touched
- Which columns it accessed
- How many queries it executed
- Whether it was blocked and why
- When each query happened
- How long each query took
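The questions above can be answered by aggregating audit records. The sketch below assumes the records have already been exported as a list of dictionaries; the field names (`agent_id`, `decision`, `rows_returned`, `tables`, `columns`) are hypothetical and may not match the real log format.

```python
from collections import defaultdict


def summarize_by_agent(records):
    """Aggregate audit records into per-agent totals."""
    summary = defaultdict(lambda: {
        "queries": 0,
        "rows": 0,
        "blocked": 0,
        "tables": set(),
        "columns": set(),
    })
    for r in records:
        # agent_id is only present when the caller provided it.
        s = summary[r.get("agent_id") or "unknown"]
        s["queries"] += 1
        s["rows"] += r["rows_returned"]
        s["tables"].update(r["tables"])
        s["columns"].update(r["columns"])
        if r["decision"] == "blocked":
            s["blocked"] += 1
    return dict(summary)


# Illustrative records, not real log output.
records = [
    {"agent_id": "agent-7", "decision": "allowed", "rows_returned": 120,
     "tables": ["users"], "columns": ["email", "id"]},
    {"agent_id": "agent-7", "decision": "blocked", "rows_returned": 0,
     "tables": ["payments"], "columns": ["card_number"]},
    {"decision": "allowed", "rows_returned": 5,
     "tables": ["orders"], "columns": ["id"]},
]

summary = summarize_by_agent(records)
for agent, stats in summary.items():
    print(agent, stats["queries"], stats["rows"], stats["blocked"],
          sorted(stats["tables"]))
```

Records without an `agent_id` are grouped under `"unknown"`, which reflects that identity metadata is stored only when provided.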
Recommended usage
Use ValkDB alongside scoped database credentials, row-level security (RLS), read replicas, and least-privilege access. It adds session budgets on top of your existing security stack; it does not replace that stack.
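To make the idea of a session budget concrete, here is a minimal sketch of a deterministic, cumulative row budget. It assumes the budget is counted in rows per session, which matches the "rows consumed (cumulative)" and "used/limit" fields described earlier but is not confirmed as the exact mechanism. The class name `SessionBudget` and its method are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SessionBudget:
    """Cumulative row budget for one session (hypothetical model)."""
    row_limit: int
    rows_used: int = 0

    def charge(self, rows):
        """Charge `rows` against the budget and return (decision, reason).

        The check is plain arithmetic, so the same inputs always produce
        the same decision.
        """
        if self.rows_used + rows > self.row_limit:
            reason = (f"row budget exceeded: "
                      f"{self.rows_used}+{rows} > {self.row_limit}")
            return "blocked", reason
        self.rows_used += rows
        return "allowed", f"row budget {self.rows_used}/{self.row_limit}"


budget = SessionBudget(row_limit=100)
print(budget.charge(60))  # fits within the limit
print(budget.charge(50))  # would push usage past the limit
print(budget.charge(40))  # exactly reaches the limit
```

A blocked request leaves `rows_used` unchanged, so a later, smaller request in the same session can still succeed.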
Current limitations
- Budget limits are global (set via env vars); per-API-key configuration through the API is not available yet
- No SOC 2 certification
- No formal DPA available yet
- No SLA guarantees during the controlled preview
- No webhook notifications yet
- No protocol-level Postgres wire proxy (queries go through HTTP API)