# vault_audit.py — HashiCorp Vault Comprehensive Audit Script

A Python 3 script that performs a full audit of a HashiCorp Vault cluster:
secrets inventory across all namespaces, user/entity activity, and last-access
timestamps sourced from the Vault audit log.

Works with **HCP Vault** (managed), **Vault Enterprise** (on-prem), and
**Vault OSS** (which has no namespace support, so namespace discovery is skipped).

---

## Features

| Area | What it collects |
|---|---|
| **Namespaces** | Full recursive namespace tree via `sys/namespaces` |
| **Secrets — KV v1** | All secret paths (metadata not available in v1) |
| **Secrets — KV v2** | All secret paths + `created_time`, `updated_time`, version history, custom metadata |
| **Secrets — AWS** | All configured roles (credential_type, role ARNs, policy ARNs) |
| **Secrets — Terraform Cloud** | All configured roles (org, team_id, token_account_type) |
| **Secrets — KMIP** | All scopes → roles (enabled operations, key type/bits) |
| **Last access** | Exact timestamp, entity name, originating IP — **requires audit log** |
| **Users/Entities** | Full identity store scan: aliases, policies, groups, auth method |
| **Last login** | Exact timestamp, auth path, IP — **requires audit log** |
| **Last activity** | Any API call by the entity (not just logins) — **requires audit log** |
| **Token proxy** | Without audit log: token `creation_time` / `last_renewal_time` as a proxy |
| **Auth method users** | Discovers users configured in userpass, ldap, github, approle, cert, oidc, jwt, k8s, etc. |

### Outputs

- **Console** — coloured tables (requires `rich`) or plain text fallback
- **JSON** — full structured report: `vault_audit_YYYYMMDD_HHMMSS.json`
- **Secrets CSV** — one row per secret: `vault_secrets_YYYYMMDD_HHMMSS.csv`
- **Entities CSV** — one row per user/entity: `vault_entities_YYYYMMDD_HHMMSS.csv`

---

## Requirements

```bash
pip install requests urllib3        # required
pip install rich                    # optional — coloured console output
```

Python 3.8+. No other dependencies.

---

## Setup

### Environment variables

```bash
export VAULT_ADDR="https://your-cluster.hashicorp.cloud:8200"
export VAULT_TOKEN="hvs.your-audit-token"

# On-prem with internal CA:
export VAULT_CACERT="/etc/ssl/certs/internal-ca.pem"
```
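As a rough illustration of how these variables could translate into request settings, here is a minimal sketch. The function name `vault_connection_settings` and the returned dict layout are hypothetical and not taken from the script; `X-Vault-Token` and `X-Vault-Namespace` are the standard Vault HTTP headers, and the `verify` value follows the convention used by `requests` (a CA bundle path, or `True` for the system default).

```python
import os


def vault_connection_settings(env=os.environ, namespace=""):
    """Build base URL, headers and TLS settings from Vault environment variables.

    Hypothetical helper for illustration; not the script's actual code.
    """
    addr = env.get("VAULT_ADDR", "").rstrip("/")
    token = env.get("VAULT_TOKEN", "")
    if not addr or not token:
        raise ValueError("VAULT_ADDR and VAULT_TOKEN must both be set")

    headers = {"X-Vault-Token": token}
    if namespace:
        # Enterprise/HCP only: scopes the request to one namespace.
        headers["X-Vault-Namespace"] = namespace

    # A CA bundle path overrides the system trust store; True keeps the default.
    verify = env.get("VAULT_CACERT") or True
    return {"base_url": addr + "/v1", "headers": headers, "verify": verify}
```

The returned values can be passed to whichever HTTP client is in use, for example as `headers=` and `verify=` arguments to a `requests` call.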

### Token requirements

**If you have an audit log** (recommended): run with `--no-token-scan`.
The policy then requires only pure `read`/`list` — **no `sudo`, no `update`**.

**Without an audit log**: the token scan (`auth/token/accessors`) acts as a
last-activity proxy, but that endpoint requires `sudo`. Avoid this by enabling
audit logging instead.

| Path | Capability | Why |
|---|---|---|
| `sys/namespaces`, `sys/namespaces/*` | `list` | Namespace discovery |
| `sys/mounts`, `sys/auth` | `read` | Engine/auth mount enumeration |
| `+/metadata`, `+/metadata/*` | `read`, `list` | KV v2 secret metadata (no secret values read) |
| `+/*` | `list` | KV v1 path listing |
| `+/roles`, `+/roles/*` | `list`, `read` | AWS roles |
| `+/role`, `+/role/*` | `list`, `read` | Terraform Cloud roles (auth-method roles are covered by `auth/+/role` below) |
| `+/scope`, `+/scope/+/role`, `+/scope/+/role/*` | `list`, `read` | KMIP scopes and roles |
| `identity/entity/id`, `identity/entity/id/*` | `list`, `read` | User/entity collection |
| `identity/group/id`, `identity/group/id/*` | `list`, `read` | Group membership |
| `auth/+/users`, `auth/+/users/*` | `list`, `read` | userpass / ldap users |
| `auth/+/groups`, `auth/+/groups/*` | `list`, `read` | ldap groups |
| `auth/+/role`, `auth/+/role/*` | `list`, `read` | approle / oidc / jwt / k8s roles |
| `auth/+/roles`, `auth/+/roles/*` | `list`, `read` | aws / gcp / azure auth roles |
| `auth/+/certs`, `auth/+/certs/*` | `list`, `read` | cert auth roles |
| `auth/+/map/users`, `auth/+/map/users/*` | `list`, `read` | github auth users |
| ~~`auth/token/accessors`~~ | ~~`sudo`, `list`~~ | Token scan only — **skip with `--no-token-scan`** |
| ~~`auth/token/lookup-accessor`~~ | ~~`update`~~ | Token scan only — **skip with `--no-token-scan`** |

---

## Minimal read-only Vault policy

Save the policy below as `vault-audit-readonly.hcl`, then apply it and create a short-lived token:

```bash
# For HCP Vault or namespace-rooted clusters:
vault policy write -namespace=admin vault-audit-readonly vault-audit-readonly.hcl
vault token create -namespace=admin \
  -policy=vault-audit-readonly \
  -ttl=1h \
  -display-name="vault-audit-run"
```

```hcl
# vault-audit-readonly.hcl
# Pure read/list — no sudo, no write access.
# Use with: python3 vault_audit.py --no-token-scan --audit-log /path/to/audit.log

# ── Namespace discovery ───────────────────────────────────────────────────────
path "sys/namespaces" {
  capabilities = ["list"]
}
path "sys/namespaces/*" {
  capabilities = ["list"]
}

# ── Mount enumeration ─────────────────────────────────────────────────────────
path "sys/mounts" {
  capabilities = ["read"]
}
path "sys/auth" {
  capabilities = ["read"]
}

# ── KV v2 (metadata only — secret values are never read) ──────────────────────
path "+/metadata" {
  capabilities = ["list"]
}
path "+/metadata/*" {
  capabilities = ["read", "list"]
}

# ── KV v1 (path listing only — secret values are never read) ──────────────────
path "+/*" {
  capabilities = ["list"]
}

# ── AWS secrets engine ────────────────────────────────────────────────────────
path "+/roles" {
  capabilities = ["list"]
}
path "+/roles/*" {
  capabilities = ["read"]
}

# ── Terraform Cloud secrets engine (auth-method roles: see auth/+/role below) ─
path "+/role" {
  capabilities = ["list"]
}
path "+/role/*" {
  capabilities = ["read"]
}

# ── KMIP secrets engine ───────────────────────────────────────────────────────
path "+/scope" {
  capabilities = ["list"]
}
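# "*" is only a glob as the last character of a path, so the scope name
# segment in the middle of the path must be matched with "+".
path "+/scope/+/role" {
  capabilities = ["list"]
}
path "+/scope/+/role/*" {
  capabilities = ["read"]
}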

# ── Identity store ────────────────────────────────────────────────────────────
path "identity/entity/id" {
  capabilities = ["list"]
}
path "identity/entity/id/*" {
  capabilities = ["read"]
}
path "identity/group/id" {
  capabilities = ["list"]
}
path "identity/group/id/*" {
  capabilities = ["read"]
}

# ── Auth method user/role discovery ───────────────────────────────────────────
path "auth/+/users" {
  capabilities = ["list"]
}
path "auth/+/users/*" {
  capabilities = ["read"]
}
path "auth/+/groups" {
  capabilities = ["list"]
}
path "auth/+/groups/*" {
  capabilities = ["read"]
}
path "auth/+/map/users" {
  capabilities = ["list"]
}
path "auth/+/map/users/*" {
  capabilities = ["read"]
}
path "auth/+/role" {
  capabilities = ["list"]
}
path "auth/+/role/*" {
  capabilities = ["read"]
}
path "auth/+/roles" {
  capabilities = ["list"]
}
path "auth/+/roles/*" {
  capabilities = ["read"]
}
path "auth/+/certs" {
  capabilities = ["list"]
}
path "auth/+/certs/*" {
  capabilities = ["read"]
}

# ── Token scan (only needed WITHOUT --no-token-scan) ──────────────────────────
# If you have an audit log, use --no-token-scan and omit these two blocks.
# auth/token/accessors requires the "sudo" capability — avoid if possible.
#
# path "auth/token/accessors" {
#   capabilities = ["sudo", "list"]
# }
# path "auth/token/lookup-accessor" {
#   capabilities = ["update"]   # POST endpoint, but it's a read operation
# }
```

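> **`+` vs `*` in path globs**
> `+` matches exactly one path segment (e.g. the mount name), so a mount at a nested path such as `team/kv/` needs one `+` per segment.
> `*` is a glob only as the last character of a path, where it matches the rest of the path including slashes.
> Using `+` is more precise and avoids overly broad grants.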
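
To make the two wildcards concrete, the sketch below models policy path matching in Python. It is a simplified approximation for reasoning about the policy above, not Vault's actual matcher: the helper name `policy_path_matches` is hypothetical, priority rules between overlapping paths are ignored, and `+` is assumed to require a non-empty segment.

```python
import re


def policy_path_matches(pattern, path):
    """Approximate Vault policy path matching for '+' and a trailing '*'."""
    # '*' acts as a glob only when it is the last character of the pattern.
    trailing_glob = pattern.endswith("*")
    body = pattern[:-1] if trailing_glob else pattern

    parts = []
    for segment in body.split("/"):
        # '+' stands for one whole segment; anything else is literal text.
        parts.append("[^/]+" if segment == "+" else re.escape(segment))

    regex = "/".join(parts) + (".*" if trailing_glob else "")
    return re.fullmatch(regex, path) is not None


# One '+' covers a single-segment mount name.
print(policy_path_matches("+/metadata/*", "secret/metadata/myapp/db-creds"))
# A nested mount path has two segments, so a single '+' does not cover it.
print(policy_path_matches("+/metadata", "team/secret/metadata"))
# A '*' in the middle of a pattern is treated as a literal character.
print(policy_path_matches("kv/*/config", "kv/app/config"))
```

Under this model the three calls print `True`, `False`, and `False` respectively.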

---

## Usage

### Quick start

```bash
# HCP Vault — root namespace is auto-detected, starts in "admin"
# With audit log (recommended — enables exact timestamps, no sudo needed)
python3 vault_audit.py --no-token-scan --audit-log /path/to/audit.log

# HCP Vault — interactive: script will prompt for the audit log path
python3 vault_audit.py --no-token-scan

# On-prem Enterprise
python3 vault_audit.py --no-token-scan --audit-log /var/log/vault/audit.log

# Scan a specific namespace subtree only
python3 vault_audit.py --no-token-scan --namespace team-a/prod

# Save nothing to disk (console only)
python3 vault_audit.py --no-token-scan --no-save

# Verbose/debug logging
python3 vault_audit.py --no-token-scan -v
```

### All options

```
Connection:
  --addr URL          Vault address (default: $VAULT_ADDR)
  --token TOKEN       Vault token (default: $VAULT_TOKEN)
  --no-tls-verify     Disable TLS certificate verification
  --ca-cert PATH      CA bundle for on-prem TLS ($VAULT_CACERT)
  --timeout SEC       Per-request timeout, seconds (default: 15)
  --namespace NS      Start scan from this namespace (default: root)

Scope:
  --no-secrets               Skip secret collection
  --no-users                 Skip user/entity collection
  --no-token-scan            Skip token accessor scan (use this when you have an audit log)
  --no-auth-method-scan      Skip auth method user discovery
  --max-accessors N          Max token accessors per namespace (default: 2000)
  --max-depth N              Max KV directory recursion depth (default: 12)

Audit Log:
  --audit-log PATH    Path to Vault audit JSONL log (enables timestamps)

Output:
  --output-dir DIR    Output directory (default: ./vault_audit_output)
  --no-save           Console only, no files
  --no-color          Disable rich/colour output

Performance:
  --workers N         Thread pool size (default: 10)
  --rate-limit RPS    Max API requests/second (default: 50)

Debug:
  -v, --verbose       Debug logging
```

---

## About audit logs

> **"Last accessed" and "last login" are not stored natively in Vault.**
> They only become available by parsing the Vault audit log.

### Enabling audit logging

```bash
# File-based audit device
vault audit enable file file_path=/var/log/vault/audit.log

# Verify
vault audit list
```

**HCP Vault**: Portal → your cluster → **Observability** → **Audit Logging** → enable, then
stream/export to an S3 bucket, Datadog, Splunk, etc. and download the JSONL file.

### Audit log format

Vault writes one JSON object per line (JSONL). Relevant fields the script uses:

```jsonc
{
  "type": "response",                     // "request" entries are skipped (avoids double-counting)
  "time": "2025-03-01T14:23:11.123456Z",
  "auth": {
    "entity_id": "abc-123",               // NOT hashed — used to correlate with identity store
    "display_name": "alice",
    "token_type": "service",
    "policies": ["default", "kv-read"]
  },
  "request": {
    "operation": "read",
    "path": "secret/data/myapp/db-creds",
    "namespace": { "id": "...", "path": "team-a/" },
    "remote_address": "10.0.1.5"         // NOT hashed — real IP (or proxy IP if behind LB)
  },
  "response": {
    "auth": { ... }                       // present only for login events
  }
}
```
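The following sketch shows one way the fields above could be folded into a last-access index. It is an illustration under stated assumptions, not the script's `AuditIndex`: the names `parse_vault_time` and `index_last_access` are hypothetical, timestamps are assumed to be UTC with a trailing `Z`, and fractional seconds are truncated to microseconds because Vault may emit more digits than `datetime` accepts.

```python
import json
from datetime import datetime, timezone


def parse_vault_time(value):
    """Parse an RFC 3339 UTC timestamp with an optional fractional part."""
    base, _, fraction = value.rstrip("Z").partition(".")
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    micros = int((fraction + "000000")[:6]) if fraction else 0
    return parsed.replace(microsecond=micros, tzinfo=timezone.utc)


def index_last_access(lines):
    """Map each request path to its most recent 'response' entry and a hit count."""
    index = {}
    for line in lines:
        try:
            entry = json.loads(line)
            when = parse_vault_time(entry["time"])
        except (ValueError, KeyError, TypeError, AttributeError):
            continue  # blank, malformed, or timestamp-less line
        if entry.get("type") != "response":
            continue  # request entries are skipped to avoid double-counting
        request = entry.get("request") or {}
        auth = entry.get("auth") or {}
        path = request.get("path")
        if not path:
            continue
        record = index.setdefault(path, {"count": 0, "when": None})
        record["count"] += 1
        if record["when"] is None or when > record["when"]:
            record.update(
                when=when,
                entity_id=auth.get("entity_id", ""),
                display_name=auth.get("display_name", ""),
                ip=request.get("remote_address", ""),
                operation=request.get("operation", ""),
            )
    return index
```

Typical use would be `index_last_access(open(path, encoding="utf-8"))`, which streams the file line by line rather than loading it whole. Timestamps are compared as parsed `datetime` values because string comparison misorders entries whose fractional parts have different lengths.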

**Note on IPs behind a load balancer**: if Vault sits behind a proxy or load balancer,
`remote_address` shows the proxy IP. To have the `X-Forwarded-For` header (which the proxy must set) recorded unhashed in the audit log, run:

```bash
vault write sys/config/auditing/request-headers/X-Forwarded-For hmac=false
```

The script automatically extracts the real client IP from `X-Forwarded-For` when present.
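
A minimal sketch of that extraction is shown below. The function name `client_ip` is hypothetical, and the layout of `request.headers` (header name mapped to a list of string values) is an assumption about the audit entry shape rather than something confirmed by the script. The left-most address is taken as the client, which is only trustworthy when the proxy overwrites any header supplied by the caller.

```python
def client_ip(request):
    """Return the left-most X-Forwarded-For address, else remote_address."""
    headers = request.get("headers") or {}
    for name, value in headers.items():
        if name.lower() != "x-forwarded-for":
            continue
        # Accept either a list of header values or a single string.
        values = value if isinstance(value, list) else [value]
        for raw in values:
            first_hop = str(raw).split(",")[0].strip()
            if first_hop:
                return first_hop
    # No usable header: fall back to the address Vault saw directly.
    return request.get("remote_address", "")
```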

---

## Architecture

```
main()
 ├─ Phase 1 — Namespace discovery      sys/namespaces (recursive)
 ├─ (interactive prompt for audit log if TTY and --audit-log not given)
 ├─ Phase 2 — Audit log parsing        JSONL → AuditIndex in-memory map
 ├─ Phase 3 — Secret collection        ThreadPoolExecutor over all mounts
 │   ├─ KV v1  list_kv1_recursive()
 │   ├─ KV v2  list_kv2_recursive() + kv2_get_meta()
 │   ├─ AWS    LIST {mount}/roles → GET {mount}/roles/{role}
 │   ├─ TF     LIST {mount}/role  → GET {mount}/role/{role}
 │   └─ KMIP   LIST {mount}/scope → LIST scope/{s}/role → GET scope/{s}/role/{r}
 ├─ Phase 3b — Audit enrichment        enrich_secrets() matches paths in AuditIndex
 ├─ Phase 4 — Entity collection        LIST + GET identity/entity/id/*
 ├─ Phase 5 — Token accessor scan      LIST auth/token/accessors → lookup-accessor
 │                                     (skipped with --no-token-scan)
 ├─ Phase 5b — Auth method user scan   collect_auth_method_users()
 ├─ Phase 5c — Audit enrichment        enrich_entities() (login + activity events)
 └─ Phase 6 — Output
     ├─ Console (rich or plain)
     ├─ vault_audit_<ts>.json
     ├─ vault_secrets_<ts>.csv
     └─ vault_entities_<ts>.csv
```
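The recursive KV listing in Phase 3 can be sketched as a depth-limited walk. This is an illustration, not the script's `list_kv1_recursive()` or `list_kv2_recursive()`: the name `walk_kv_paths` and the `list_keys` callback are hypothetical, and the exact meaning of the depth limit is an assumption. The one fact relied on is that Vault `LIST` responses mark sub-folders with a trailing `/`.

```python
def walk_kv_paths(list_keys, prefix="", max_depth=12, _depth=0):
    """Yield leaf secret paths below prefix.

    list_keys(prefix) is a caller-supplied function that returns the child
    keys for a prefix, with folders ending in '/'.
    """
    if _depth > max_depth:
        return  # stop descending; deeper folders are left unexplored
    for key in list_keys(prefix):
        child = prefix + key
        if key.endswith("/"):
            yield from walk_kv_paths(list_keys, child, max_depth, _depth + 1)
        else:
            yield child


# Usage with an in-memory stand-in for the LIST calls:
tree = {
    "": ["myapp/", "top"],
    "myapp/": ["db-creds", "nested/"],
    "myapp/nested/": ["deep"],
}
print(list(walk_kv_paths(lambda p: tree.get(p, []))))
```

Passing the listing function in as a callback keeps the walk independent of the HTTP layer, so the same traversal can serve both KV versions with different list endpoints.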

### Key classes

| Class | Purpose |
|---|---|
| `VaultClient` | HTTP wrapper with rate-limiting, retry, thread-local sessions, namespace header |
| `TokenBucket` | Thread-safe token-bucket rate limiter |
| `AuditIndex` | Parses JSONL audit log into three in-memory dicts: `secret_access`, `entity_login`, `entity_activity` |
| `SecretRecord` | One per discovered secret/role — engine type, metadata, last-access from audit log |
| `EntityRecord` | One per user/entity — aliases, policies, last login, last activity from audit log |
| `TokenProxyRecord` | One per token accessor — `last_renewal_time` used as activity proxy **without** audit log |
| `VaultAuditReport` | Top-level container passed to all output functions |
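
For readers unfamiliar with the technique, here is a minimal thread-safe token bucket. It shares the class name for readability but is a sketch of the general algorithm, not the script's implementation; the constructor parameters (`rate`, `capacity`, and the injectable `clock` and `sleep`) are assumptions made so the behaviour can be tested without real delays.

```python
import threading
import time


class TokenBucket:
    """Allow roughly `rate` acquisitions per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity=None, clock=time.monotonic, sleep=time.sleep):
        self.rate = float(rate)
        self.capacity = float(capacity if capacity is not None else rate)
        self._tokens = self.capacity
        self._clock = clock
        self._sleep = sleep
        self._last = clock()
        self._lock = threading.Lock()

    def acquire(self):
        """Block until one token is available, then consume it."""
        while True:
            with self._lock:
                now = self._clock()
                # Refill in proportion to elapsed time, capped at capacity.
                elapsed = now - self._last
                self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
                self._last = now
                if self._tokens >= 1.0:
                    self._tokens -= 1.0
                    return
                wait = (1.0 - self._tokens) / self.rate
            # Sleep outside the lock so other threads are not blocked.
            self._sleep(wait)
```

Each worker thread would call `acquire()` before issuing a request, so a pool of 10 workers sharing one bucket created with `rate=50` stays near 50 requests per second overall.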

---

## Example outputs

### Console — startup log

```
10:42:01  INFO     Connecting to: https://myvault.example.com:8200  (TLS verify=True)
10:42:01  INFO     Vault version: 1.17.3+ent, cluster: vault-cluster-prod
10:42:01  INFO     Phase 1/6: Discovering namespaces from root ...
10:42:02  INFO     Found 13 namespace(s) (plus root)

  ┌─ Audit log ──────────────────────────────────────────────────────┐
  │ Provide the path to your Vault audit log (JSON/JSONL format).    │
  │ This enables last-access and last-login timestamps for all       │
  │ secrets and users.                                                │
  │                                                                   │
  │ HCP Vault: portal → cluster → Audit → enable + download logs     │
  │ On-prem  : check your audit device path (vault audit list)        │
  │ Leave empty to continue without last-access data.                 │
  └───────────────────────────────────────────────────────────────────┘
  Audit log path > /var/log/vault/audit.log

10:42:05  INFO     Phase 2/6: Parsing audit log: /var/log/vault/audit.log
10:42:07  INFO     Audit log: 142,831 entries parsed, 48 secret paths, 312 entity logins, 1,205 entity activity records
10:42:07  INFO     Phase 3/6: Collecting secrets from all engine mounts ...
10:42:07  INFO     Found 9 secret engine mount(s) to scan across 14 namespace(s) (types: kv, aws, terraform, kmip)
10:42:09  INFO     Collected 34 secret(s)
10:42:09  INFO     Phase 4/6: Collecting users/entities from identity store ...
10:42:10  INFO     Identity store: 28 entity/user record(s)
10:42:11  INFO     Phase 5/6: Token scan — skipped (--no-token-scan)
10:42:11  INFO     Phase 5b/6: Scanning auth method mounts for configured users ...
10:42:12  INFO     Auth method scan complete. 5 additional principal(s) discovered.
10:42:12  INFO     Phase 6/6: Generating output ...
10:42:12  INFO     Done. 194 API calls in 11.1s. Secrets: 34. Entities: 33.
```

---

### Console — namespace tree

```
══════════════════════════════════════════════════════════════════════════════════════
  HCP VAULT AUDIT REPORT
  Generated : 2025-03-01T10:42:12Z
  Cluster   : https://myvault.example.com:8200  (version 1.17.3+ent, vault-cluster-prod)
  Namespaces: 13  |  Secrets: 34  |  Entities: 33
  Audit log : 142,831 entries from /var/log/vault/audit.log
══════════════════════════════════════════════════════════════════════════════════════

──────────────────────────────────────────────────────────────────────────────────────
  NAMESPACE TREE
──────────────────────────────────────────────────────────────────────────────────────
  (root)
  └─ admin/
    └─ team-a/
      └─ team-a/prod/
      └─ team-a/staging/
    └─ team-b/
    └─ platform/
      └─ platform/aws/
      └─ platform/kmip/
```

---

### Console — secrets table

```
──────────────────────────────────────────────────────────────────────────────────────
  SECRETS  (34 total)
──────────────────────────────────────────────────────────────────────────────────────

  Namespace: admin/team-a/prod/  (11 secrets)
  MOUNT                  PATH                                     ENGINE   CREATED              UPDATED              VER  LAST READ            BY                        IP
  ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  secret                 myapp/db-creds                           kv_v2    2024-08-10 09:12:44  2025-01-15 16:33:01   3   2025-02-28 14:23:11  alice (alice@ldap)         10.0.1.42
  secret                 myapp/api-keys                           kv_v2    2024-08-10 09:13:22  2024-11-20 10:01:55   1   2025-02-20 08:55:30  svc-backend (approle)     10.0.2.100
  secret                 shared/tls-cert                          kv_v2    2024-06-01 00:00:00  2025-01-01 00:00:00   5   never                -                         -
  legacy                 old-config/settings                      kv_v1    -                    -                         2025-01-10 11:20:00  bob (bob@ldap)             10.0.1.55
  aws                    prod-role-readonly                        aws      -                    -                         2025-02-25 09:00:12  svc-infra (approle)        10.0.2.101
  aws                    prod-role-admin                           aws      -                    -                         2025-02-01 17:44:03  charlie (charlie@ldap)     10.0.1.88
  terraform              tfc-workspace-deploy                      terraform -                   -                         2025-02-19 14:05:55  svc-ci (approle)           10.0.2.105

  Namespace: admin/platform/kmip/  (4 secrets)
  MOUNT                  PATH                                     ENGINE   ...
  kmip                   acme-corp/db-encrypt                     kmip     ...   2025-02-27 22:10:03  svc-db (token)            10.0.3.10
  kmip                   acme-corp/backup-keys                    kmip     ...   never                -                         -
```

---

### Console — user summary table

```
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  USER ACTIVITY SUMMARY  (33 total)
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  NAME                         NAMESPACE              AUTH TYPE      LAST LOGIN           LAST ACTIVITY        IP (login)         STATUS    LOGINS
  ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  alice                        admin/                 ldap           2025-03-01 08:44:02  2025-03-01 14:23:11  10.0.1.42          active    142
  svc-backend                  admin/team-a/prod/     approle        2025-02-28 01:00:00  2025-03-01 12:10:55  10.0.2.100         active    8,312
  svc-infra                    admin/team-a/prod/     approle        2025-02-25 09:00:08  2025-02-25 09:00:12  10.0.2.101         active    44
  bob                          admin/                 ldap           2025-01-10 11:18:44  2025-01-10 11:20:00  10.0.1.55          active    23
  charlie                      admin/                 ldap           2025-02-01 17:43:50  2025-02-01 17:44:03  10.0.1.88          active    11
  svc-ci                       admin/platform/        approle        2025-02-19 14:05:50  2025-02-19 14:05:55  10.0.2.105         active    189
  old-service-account          admin/                 approle        2024-06-30 23:59:00  2024-06-30 23:59:00  10.0.2.200         DISABLED  1,044
```

---

### Secrets CSV columns

```
namespace_path, mount_path, secret_path, engine_type, kv_version,
created_time, updated_time, current_version, oldest_version, max_versions,
last_accessed_time, last_accessed_by_entity_id, last_accessed_by_display_name,
last_accessed_from_ip, last_accessed_operation, access_count,
metadata_error, custom_metadata, engine_data
```
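A sketch of how rows with these columns could be written using the standard `csv` module follows. The function name `write_secrets_csv` is hypothetical, and serialising nested values such as `custom_metadata` and `engine_data` as JSON strings is an assumption about the output format, not confirmed behaviour of the script.

```python
import csv
import io
import json

SECRET_COLUMNS = [
    "namespace_path", "mount_path", "secret_path", "engine_type", "kv_version",
    "created_time", "updated_time", "current_version", "oldest_version", "max_versions",
    "last_accessed_time", "last_accessed_by_entity_id", "last_accessed_by_display_name",
    "last_accessed_from_ip", "last_accessed_operation", "access_count",
    "metadata_error", "custom_metadata", "engine_data",
]


def write_secrets_csv(records, stream):
    """Write one CSV row per secret record (a dict) to a text stream."""
    writer = csv.DictWriter(
        stream, fieldnames=SECRET_COLUMNS, restval="", extrasaction="ignore"
    )
    writer.writeheader()
    for record in records:
        row = {
            # Nested structures are flattened to JSON so they fit in one cell.
            key: json.dumps(value, sort_keys=True) if isinstance(value, (dict, list)) else value
            for key, value in record.items()
        }
        writer.writerow(row)


# Usage with an in-memory buffer; a real run would open a file with newline="".
buffer = io.StringIO()
write_secrets_csv(
    [{"namespace_path": "admin/", "mount_path": "secret", "secret_path": "myapp/db-creds"}],
    buffer,
)
```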

### Entities CSV columns

```
namespace_path, entity_id, name, disabled, policies, groups,
creation_time, last_update_time,
aliases_summary,
token_accessor, token_display_name, token_auth_path,
token_creation_time, token_expire_time, token_last_renewal_time,
last_login_time, last_login_from_ip, last_login_auth_method,
last_login_auth_path, last_login_namespace, login_count,
last_activity_time, last_activity_ip, last_activity_operation,
last_activity_path, last_activity_mount_type, activity_count,
last_seen_time, auth_method_extra
```

---

## Limitations & notes

| Limitation | Detail |
|---|---|
| **Vault OSS** | No namespace support — `sys/namespaces` returns 403; script warns and continues |
| **HCP Vault** | Audit logs are not queryable via API — export from portal first |
| **HCP Vault root** | All real resources live under `admin/` — auto-detected from `.hashicorp.cloud` in URL |
| **Audit log required for timestamps** | Without it, only token `creation_time` / `last_renewal_time` is available as a proxy |
| **HMAC** | Secret values and some metadata fields are HMAC-hashed in audit logs (by design); `entity_id`, `request.path`, and `remote_address` are NOT hashed |
| **Proxy IPs** | If Vault is behind a load balancer, `remote_address` shows the proxy IP; the real client IP is only recoverable when the `X-Forwarded-For` header is recorded through `sys/config/auditing/request-headers` |
| **AWS STS roles** | Access via `{mount}/sts/{role}` is also captured (in addition to `/creds/`) |
| **KMIP** | The script lists roles (configs); actual KMIP protocol operations are logged differently |
| **Rate limiting** | Default 50 RPS; tune with `--rate-limit` to avoid overwhelming Vault |

---

## Security recommendations

1. **Use a dedicated audit token** — apply the minimal policy above; do not use `root`.
2. **Always use `--no-token-scan`** when you have an audit log — it removes the only `sudo` requirement from the policy.
3. **Never paste tokens in chat or logs** — use environment variables only.
4. **Revoke the token immediately** after the audit run (`vault token revoke`), or issue short-TTL tokens (`-ttl=1h`) so it expires on its own.
5. **Protect the output files** — they contain your full secret inventory. Store them in a restricted location and delete when done.
6. **Keep audit logging enabled** — without it, there is no reliable record of who accessed what and when.

---

## License

MIT. Use at your own risk. Not an official HashiCorp product.