Operations¶
Operations documentation covers deployment, observability, security posture, and maintenance workflows for production environments.
Operational Pillars¶
| Pillar | Outcome | Example artifacts |
|---|---|---|
| Deployment | Predictable releases | compose profiles, env templates |
| Observability | Fast incident detection | health checks, logs, metrics |
| Security | Reduced attack surface | secrets rotation, RBAC, CORS policies |
| Reliability | Stable runtime behavior | backups, restore drills, dependency updates |
Production Topology (Reference)¶
graph LR
U[Operators] --> F[Frontend]
F --> B[Backend API]
B --> DB[(PostgreSQL)]
B --> A1[Agent: Site A]
B --> A2[Agent: Site B]
A1 --> D1[Device subnet A]
A2 --> D2[Device subnet B]
style B fill:#7c3aed,stroke:#5b21b6,color:#fff
style DB fill:#db2777,stroke:#9d174d,color:#fff
style A1 fill:#0284c7,stroke:#075985,color:#fff
style A2 fill:#0284c7,stroke:#075985,color:#fff
Runbook Baseline¶
- Check service health endpoints
- Review failed wake attempts
- Confirm active agents and heartbeat freshness
- Review cluster coverage for devices that require more than one relay path
- Rotate logs and verify retention
- Review security events and admin changes
- Validate backups and restore points
- Upgrade dependencies and base images
- Rotate secrets where feasible
- Rehearse incident response checklist
Docker Desktop limitation
Direct LAN wake broadcasts from containers on Windows/macOS are unreliable. Keep WOL execution on LAN-adjacent agents.