AWS Monitoring Pages
Split from aws-monitoring-alerting-design.md:
- One infrastructure component or service per Markdown file.
- Every alert must have a threshold, duration, query, and rule example.
- CloudWatch Alarm covers the P0/P1 fallback.
- VictoriaMetrics / vmalert covers daily trend and service alerts.
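The rule that every alert carries a threshold, duration, query, and rule example can be checked mechanically. The following is a minimal sketch, not part of the original design: the field names (`threshold`, `duration`, `query`, `rule_example`), the sample alert, its metric name, and the `backends_for` helper are all hypothetical. The backend mapping assumes one reading of the rules above, namely that P0/P1 alerts get a CloudWatch Alarm as a fallback in addition to vmalert.

```python
# Hypothetical field names; the source only lists the four requirements.
REQUIRED_FIELDS = ("threshold", "duration", "query", "rule_example")


def missing_fields(alert: dict) -> list:
    """Return the required fields that are absent or blank in an alert spec."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = alert.get(field)
        if value is None or str(value).strip() == "":
            missing.append(field)
    return missing


def backends_for(severity: str) -> list:
    """Assumed mapping: vmalert for all alerts, CloudWatch fallback for P0/P1."""
    backends = ["vmalert"]
    if severity in ("P0", "P1"):
        backends.append("cloudwatch")
    return backends


# Hypothetical alert definition used only for illustration.
example_alert = {
    "name": "SQSMessageAgeHigh",
    "severity": "P1",
    "threshold": 300,
    "duration": "5m",
    "query": "sqs_oldest_message_age_seconds > 300",  # metric name is made up
    "rule_example": "see the SQS page",
}

if __name__ == "__main__":
    print(missing_fields(example_alert))
    print(backends_for(example_alert["severity"]))
```

A check like this could run in CI over each service page's alert definitions, so that a page missing any of the four items is flagged before review.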
Common Pages
| Page | Scope |
| --- | --- |
| Alerting | alert naming, severity, routing, standard payload |
| PrometheusAlerting | Prometheus alert rules and Alertmanager routing |
| Vmalert | VictoriaMetrics vmalert rules and notifier setup |
| GrafanaAlerting | Grafana Alerting contact points, policies, provisioning |
Service Pages
| Page | Scope |
| --- | --- |
| ECSNodeJS | ECS service and Node.js application metrics |
| ALB | Application Load Balancer |
| AuroraPostgreSQL | Aurora PostgreSQL / RDS metrics |
| ElastiCacheValkey | Valkey / ElastiCache |
| SQS | queue backlog, age, DLQ |
| DynamoDB | throttle, latency, capacity |
| EC2 | monitoring host and VM resources |
| S3 | request metrics, security events, growth |
| CloudFront | CDN errors, latency, cache hit rate |
| SecretsManager | rotation and access failure events |