Links#
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/using_firelens.html
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/firelens-taskdef.html
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/using_awslogs.html
https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/SubscriptionFilters.html
https://docs.aws.amazon.com/lambda/latest/dg/services-cloudwatchlogs.html
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Install-CloudWatch-Agent.html
https://docs.victoriametrics.com/victorialogs/
https://docs.victoriametrics.com/victorialogs/data-ingestion/
https://docs.victoriametrics.com/victorialogs/querying/
1. Important Points#
Goal:
ECS application logs -> self-hosted VictoriaLogs
Recommended priority:
1. ECS FireLens / Fluent Bit -> VictoriaLogs
best for new ECS services
works with Fargate and EC2
no CloudWatch Logs ingestion cost for app logs
2. awslogs -> CloudWatch Logs subscription -> Lambda -> VictoriaLogs
best for existing services already using awslogs
keeps CloudWatch Logs as buffer/inspection layer
extra CloudWatch Logs + Lambda cost
3. CloudWatch Agent on ECS EC2 -> CloudWatch Logs -> Lambda -> VictoriaLogs
only for ECS EC2 host/file logs
not for Fargate
not ideal for normal container stdout
4. sidecar log agent with shared volume -> VictoriaLogs
useful when app writes files
more moving parts than FireLens
Big picture:
FireLens:
container stdout/stderr -> log router sidecar -> VictoriaLogs
awslogs subscription:
container stdout/stderr -> CloudWatch Logs -> subscription Lambda -> VictoriaLogs
CloudWatch Agent:
EC2 host/file logs -> CloudWatch Logs -> subscription Lambda -> VictoriaLogs
Key decisions:
If you control the ECS task definition:
use FireLens direct
If many services already use awslogs:
use subscription bridge first, migrate later
If you must keep CloudWatch Logs:
use awslogs + subscription
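The decision rules above can be written down as a small lookup, which may be handy in a platform script or a review checklist. This is only a sketch: the function name `choose_log_path`, its parameters, and the order in which the rules are checked are assumptions for illustration, not part of any AWS or VictoriaLogs API.

```python
def choose_log_path(controls_task_definition: bool,
                    has_existing_awslogs: bool,
                    must_keep_cloudwatch: bool) -> str:
    """Return the suggested delivery path for ECS app stdout logs."""
    # Assumption: a hard requirement to keep CloudWatch Logs outranks the other rules.
    if must_keep_cloudwatch:
        return "awslogs + subscription"
    # Many services already on awslogs: bridge first, migrate later.
    if has_existing_awslogs:
        return "subscription bridge first, migrate later"
    # Full control of the task definition and nothing to migrate: go direct.
    if controls_task_definition:
        return "FireLens direct"
    # Assumption: without control of the task definition, fall back to the bridge.
    return "subscription bridge first, migrate later"


if __name__ == "__main__":
    print(choose_log_path(controls_task_definition=True,
                          has_existing_awslogs=False,
                          must_keep_cloudwatch=False))
```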
2. Architecture Options#
| Option | Path | Fargate | ECS EC2 | Best For | Tradeoff |
| --- | --- | --- | --- | --- | --- |
| FireLens direct | app -> Fluent Bit -> VictoriaLogs | Yes | Yes | new services | need log router sidecar |
| awslogs bridge | app -> CloudWatch Logs -> Lambda -> VictoriaLogs | Yes | Yes | migration / dual-write pattern | extra cost and latency |
| CloudWatch Agent bridge | file/host logs -> CloudWatch Logs -> Lambda -> VictoriaLogs | No | Yes | host logs / legacy file logs | not for Fargate |
| custom sidecar | app file -> shared volume -> agent -> VictoriaLogs | Yes | Yes | app writes files | file lifecycle and backpressure complexity |
recommendation:
For ECS app stdout logs:
FireLens direct is the cleanest.
For audit/compliance requiring CloudWatch Logs copy:
awslogs bridge is simpler.
For EC2 instance system logs:
CloudWatch Agent bridge is acceptable.
3. VictoriaLogs Ingest Basics#
VictoriaLogs commonly accepts:
JSON line ingestion
Elasticsearch bulk-compatible ingestion
syslog / other ingestion paths, depending on the deployment
example endpoint:
http://victorialogs.internal:9428/insert/jsonline
recommended labels / stream fields:
service
env
cluster
task_definition
container
verify VictoriaLogs#
curl -s http://victorialogs.internal:9428/health
insert one JSON line#
printf '{"_msg":"hello from ecs","service":"order-api","env":"dev","container":"app"}\n' \
| curl -sS \
-H 'content-type: application/stream+json' \
--data-binary @- \
'http://victorialogs.internal:9428/insert/jsonline?_stream_fields=service,env,container'
query#
curl -G 'http://victorialogs.internal:9428/select/logsql/query' \
--data-urlencode 'query=service:order-api'
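The same ingest request can be assembled in code. The sketch below only builds the URL and the newline-delimited body that the curl example sends; the helper names `build_ingest_url` and `build_jsonline_body` are hypothetical, and the base URL is the example host used in this document.

```python
import json
from urllib.parse import urlencode


def build_ingest_url(base_url: str, stream_fields: list[str]) -> str:
    """Build the /insert/jsonline URL with a comma-separated _stream_fields list."""
    # safe="," keeps the commas readable instead of percent-encoding them.
    query = urlencode({"_stream_fields": ",".join(stream_fields)}, safe=",")
    return f"{base_url}/insert/jsonline?{query}"


def build_jsonline_body(records: list[dict]) -> bytes:
    """Serialize records as one compact JSON object per line, newline-terminated."""
    if not records:
        return b""
    lines = [json.dumps(r, separators=(",", ":"), ensure_ascii=False) for r in records]
    return ("\n".join(lines) + "\n").encode("utf-8")


if __name__ == "__main__":
    url = build_ingest_url("http://victorialogs.internal:9428",
                           ["service", "env", "container"])
    body = build_jsonline_body([
        {"_msg": "hello from ecs", "service": "order-api",
         "env": "dev", "container": "app"},
    ])
    print(url)
    print(body.decode("utf-8"), end="")
```

Keeping URL and body construction separate from the HTTP call makes both easy to unit test without a running VictoriaLogs instance.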
4. Option A: ECS FireLens Direct#
when to use#
use when:
new ECS service
app logs to stdout/stderr
want direct delivery to VictoriaLogs
want to avoid CloudWatch Logs as the main ingestion path
works with:
ECS Fargate
ECS EC2
network#
ECS task must reach VictoriaLogs:
same VPC:
use private IP / internal NLB / Cloud Map DNS
other VPC:
VPC peering / Transit Gateway / PrivateLink
public endpoint:
use NAT Gateway / egress proxy
TLS and auth strongly recommended
task execution role policy#
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowPullImageAndWriteRouterLogs",
"Effect": "Allow",
"Action": [
"ecr:GetAuthorizationToken",
"ecr:BatchCheckLayerAvailability",
"ecr:GetDownloadUrlForLayer",
"ecr:BatchGetImage",
"logs:CreateLogStream",
"logs:PutLogEvents"
],
"Resource": "*"
}
]
}
why CloudWatch Logs permissions still appear:
log_router container itself often uses awslogs for its own diagnostics
application logs can still go directly to VictoriaLogs through FireLens
least privilege:
ECR actions often need Resource="*"
CloudWatch Logs can be scoped to router log group when you create it upfront
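One way to apply the least-privilege note is to split the policy into two statements: ECR pull with `Resource: "*"`, and CloudWatch Logs writes scoped to the router log group. The sketch below generates such a document; the function name `execution_role_policy` is hypothetical, and it assumes the router log group already exists (so `logs:CreateLogGroup` is not granted).

```python
import json


def execution_role_policy(region: str, account_id: str, router_log_group: str) -> dict:
    """Build a task execution role policy with log writes scoped to one log group."""
    # The trailing ":*" covers the log streams inside the group.
    log_group_arn = (
        f"arn:aws:logs:{region}:{account_id}:log-group:{router_log_group}:*"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowPullImage",
                "Effect": "Allow",
                "Action": [
                    "ecr:GetAuthorizationToken",
                    "ecr:BatchCheckLayerAvailability",
                    "ecr:GetDownloadUrlForLayer",
                    "ecr:BatchGetImage",
                ],
                "Resource": "*",
            },
            {
                "Sid": "AllowWriteRouterLogs",
                "Effect": "Allow",
                "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
                "Resource": log_group_arn,
            },
        ],
    }


if __name__ == "__main__":
    policy = execution_role_policy("ap-east-1", "123456789012",
                                   "/ecs/order-api/firelens")
    print(json.dumps(policy, indent=2))
```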
task definition sample#
{
"family": "order-api",
"networkMode": "awsvpc",
"requiresCompatibilities": ["FARGATE"],
"cpu": "512",
"memory": "1024",
"executionRoleArn": "arn:aws:iam::123456789012:role/ecs-task-execution-role",
"taskRoleArn": "arn:aws:iam::123456789012:role/order-api-task-role",
"containerDefinitions": [
{
"name": "log_router",
"image": "public.ecr.aws/aws-observability/aws-for-fluent-bit:stable",
"essential": true,
"firelensConfiguration": {
"type": "fluentbit"
},
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/ecs/order-api/firelens",
"awslogs-region": "ap-east-1",
"awslogs-stream-prefix": "firelens"
}
}
},
{
"name": "app",
"image": "123456789012.dkr.ecr.ap-east-1.amazonaws.com/order-api:2026-06-02",
"essential": true,
"portMappings": [
{
"containerPort": 3000,
"protocol": "tcp"
}
],
"logConfiguration": {
"logDriver": "awsfirelens",
"options": {
"Name": "http",
"Host": "victorialogs.internal",
"Port": "9428",
"URI": "/insert/jsonline?_stream_fields=service,env,ecs_cluster,ecs_task_definition,container_name&_msg_field=log",
"Format": "json_lines",
"Header": "Content-Type application/stream+json"
}
},
"environment": [
{
"name": "SERVICE",
"value": "order-api"
},
{
"name": "ENV",
"value": "prod"
}
]
}
]
}
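notes:
awsfirelens sends app stdout/stderr to the Fluent Bit log router.
Each stdout line arrives as a record whose log field holds the raw line; FireLens adds container_name, ecs_cluster and ecs_task_definition (among other fields) by default.
service and env become queryable fields only if the app's JSON is parsed out of the log field, for example with a Fluent Bit parser filter.
The http output sends records to VictoriaLogs.
log_router diagnostics still go to CloudWatch Logs.
For a TLS endpoint, use port 443 with the Fluent Bit tls option set to on, and configure CA/auth according to what your Fluent Bit image supports.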
sample app log line (JSON to stdout):
{
"level": "info",
"msg": "order created",
"service": "order-api",
"env": "prod",
"order_id": "ord_001",
"request_id": "req_001"
}
best practice:
log JSON to stdout
include service/env/version/request_id
avoid secrets / tokens / raw PII
keep high-cardinality fields out of stream fields
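The "log JSON to stdout" practice can be sketched with the standard `logging` module. The class `JsonFormatter`, the helper `make_logger`, and the convention of passing extra context through `extra={"fields": {...}}` are assumptions made for this example, not something the app in this document is known to use.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single compact JSON line."""

    def __init__(self, service: str, env: str):
        super().__init__()
        self.service = service
        self.env = env

    def format(self, record: logging.LogRecord) -> str:
        doc = {
            "event_time": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname.lower(),
            "msg": record.getMessage(),
            "service": self.service,
            "env": self.env,
        }
        # Per-call context, e.g. request_id or order_id (hypothetical convention).
        doc.update(getattr(record, "fields", {}))
        return json.dumps(doc, separators=(",", ":"))


def make_logger(service: str, env: str, stream=sys.stdout) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream (stdout by default)."""
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(service, env))
    logger.addHandler(handler)
    logger.propagate = False  # avoid duplicate lines through the root logger
    return logger


if __name__ == "__main__":
    log = make_logger("order-api", "prod")
    log.info("order created",
             extra={"fields": {"order_id": "ord_001", "request_id": "req_001"}})
```

Fields such as `order_id` and `request_id` stay as ordinary log fields here; they should not be promoted to stream fields.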
verify#
aws ecs describe-tasks \
--cluster prod-app \
--tasks arn:aws:ecs:ap-east-1:123456789012:task/prod-app/abc
curl -G 'http://victorialogs.internal:9428/select/logsql/query' \
--data-urlencode 'query=service:order-api env:prod'
common failures#
log_router exits:
check /ecs/order-api/firelens CloudWatch log group
check Fluent Bit output plugin option names
no logs in VictoriaLogs:
ECS task cannot reach victorialogs.internal:9428
security group / NACL / route table blocked
wrong URI or content-type
logs arrive but query is hard:
missing service/env/container fields
wrong stream fields
app logs are plain text instead of JSON
5. Option B: awslogs -> CloudWatch Logs -> Lambda -> VictoriaLogs#
when to use#
use when:
ECS service already uses awslogs
you want CloudWatch Logs as source of truth for a while
you want low-risk migration to VictoriaLogs
compliance requires logs in CloudWatch Logs
tradeoff:
extra CloudWatch Logs ingestion/storage cost
Lambda forwarding latency
retry/error handling belongs to Lambda
ECS awslogs config#
{
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/ecs/order-api",
"awslogs-region": "ap-east-1",
"awslogs-stream-prefix": "app"
}
}
}
Lambda execution role policy#
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowLambdaWriteOwnLogs",
"Effect": "Allow",
"Action": [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents"
],
"Resource": "arn:aws:logs:ap-east-1:123456789012:log-group:/aws/lambda/cwlogs-to-victorialogs:*"
}
]
}
if Lambda runs in VPC:
add AWSLambdaVPCAccessExecutionRole permissions
configure subnets / security group to reach VictoriaLogs
allow CloudWatch Logs to invoke Lambda#
aws lambda add-permission \
--function-name cwlogs-to-victorialogs \
--statement-id AllowCloudWatchLogsInvoke \
--action lambda:InvokeFunction \
--principal logs.ap-east-1.amazonaws.com \
--source-arn arn:aws:logs:ap-east-1:123456789012:log-group:/ecs/order-api:*
create subscription filter#
aws logs put-subscription-filter \
--log-group-name /ecs/order-api \
--filter-name to-victorialogs \
--filter-pattern "" \
--destination-arn arn:aws:lambda:ap-east-1:123456789012:function:cwlogs-to-victorialogs
Lambda forwarder minimal code#
import base64
import gzip
import json
import os
import urllib.request

# Full ingest URL, including query parameters such as _stream_fields.
VICTORIALOGS_URL = os.environ["VICTORIALOGS_URL"]


def lambda_handler(event, context):
    # CloudWatch Logs delivers each batch as base64-encoded, gzip-compressed JSON.
    compressed = base64.b64decode(event["awslogs"]["data"])
    payload = json.loads(gzip.decompress(compressed))

    lines = []
    for log_event in payload.get("logEvents", []):
        item = {
            "_msg": log_event.get("message", ""),
            # Epoch milliseconds from CloudWatch Logs. Stored as a regular field;
            # event time defaults to ingestion time unless the ingest URL names
            # this field via _time_field.
            "timestamp": log_event.get("timestamp"),
            "log_group": payload.get("logGroup"),
            "log_stream": payload.get("logStream"),
            "owner": payload.get("owner"),
            "subscription_filters": ",".join(payload.get("subscriptionFilters", [])),
        }
        lines.append(json.dumps(item, separators=(",", ":")))

    if not lines:
        return {"records": 0}

    # One JSON object per line, newline-terminated.
    body = ("\n".join(lines) + "\n").encode("utf-8")
    req = urllib.request.Request(
        VICTORIALOGS_URL,
        data=body,
        headers={"content-type": "application/stream+json"},
        method="POST",
    )
    # HTTP or network errors raise here, so the invocation fails and can be retried.
    with urllib.request.urlopen(req, timeout=5) as resp:
        resp.read()
    return {"records": len(lines)}
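To exercise the forwarder locally, it helps to build a fake subscription event in the same envelope CloudWatch Logs uses (JSON, gzip-compressed, base64-encoded under `awslogs.data`). The helpers `make_awslogs_event` and `decode_awslogs_event` below are hypothetical test utilities, and the account ID, filter name, and base timestamp are placeholder values.

```python
import base64
import gzip
import json


def make_awslogs_event(log_group, log_stream, messages, base_ts_ms=1700000000000):
    """Build a Lambda event shaped like a CloudWatch Logs subscription delivery."""
    payload = {
        "messageType": "DATA_MESSAGE",
        "owner": "123456789012",  # placeholder account ID
        "logGroup": log_group,
        "logStream": log_stream,
        "subscriptionFilters": ["to-victorialogs"],
        "logEvents": [
            {"id": str(i), "timestamp": base_ts_ms + i, "message": message}
            for i, message in enumerate(messages)
        ],
    }
    raw = json.dumps(payload).encode("utf-8")
    data = base64.b64encode(gzip.compress(raw)).decode("ascii")
    return {"awslogs": {"data": data}}


def decode_awslogs_event(event):
    """Reverse the envelope: base64-decode, gunzip, then parse the JSON payload."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


if __name__ == "__main__":
    event = make_awslogs_event("/ecs/order-api", "app/app/abc",
                               ['{"level":"info","msg":"order created"}'])
    decoded = decode_awslogs_event(event)
    print(decoded["logGroup"], len(decoded["logEvents"]))
```

Passing the generated event to `lambda_handler` with `VICTORIALOGS_URL` pointed at a test endpoint gives a quick end-to-end check before the subscription filter is attached.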
Lambda environment#
VICTORIALOGS_URL=http://victorialogs.internal:9428/insert/jsonline?_stream_fields=log_group,log_stream
verify#
aws logs describe-subscription-filters \
--log-group-name /ecs/order-api
aws logs tail /aws/lambda/cwlogs-to-victorialogs --follow
curl -G 'http://victorialogs.internal:9428/select/logsql/query' \
--data-urlencode 'query=log_group:"/ecs/order-api"'
common failures#
subscription filter not invoking:
Lambda permission source ARN wrong
region mismatch
filter is attached to wrong log group
Lambda timeout:
VictoriaLogs network path blocked
endpoint DNS cannot resolve inside VPC
batch too large / timeout too low
duplicated logs:
Lambda retry after partial failure
design ingestion to tolerate duplicates
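Two of the failure modes above (oversized batches and duplicates after a retry) can be softened inside the forwarder. The sketch below splits lines into size-bounded request bodies and derives a stable key per CloudWatch log event so that duplicates can at least be recognized at query time. The names `chunk_lines` and `dedupe_key` are hypothetical, and nothing here assumes VictoriaLogs deduplicates on its own.

```python
import hashlib


def dedupe_key(log_group: str, log_stream: str, event_id: str) -> str:
    """Stable key for one CloudWatch log event; identical across Lambda retries."""
    raw = f"{log_group}\x00{log_stream}\x00{event_id}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()[:32]


def chunk_lines(lines: list[str], max_bytes: int) -> list[bytes]:
    """Group JSON lines into newline-terminated bodies of at most max_bytes each.

    A single line larger than max_bytes is still sent, alone in its own chunk.
    """
    chunks, current, size = [], [], 0
    for line in lines:
        encoded = (line + "\n").encode("utf-8")
        if current and size + len(encoded) > max_bytes:
            chunks.append(b"".join(current))
            current, size = [], 0
        current.append(encoded)
        size += len(encoded)
    if current:
        chunks.append(b"".join(current))
    return chunks


if __name__ == "__main__":
    print(dedupe_key("/ecs/order-api", "app/app/abc", "3793"))
    for body in chunk_lines(['{"_msg":"a"}', '{"_msg":"b"}', '{"_msg":"c"}'], 30):
        print(len(body), body)
```

The key could be stored as an extra field on each record (using the `id` of each entry in `logEvents`), and each chunk posted as a separate request with its own timeout.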
6. Option C: CloudWatch Agent On ECS EC2#
when to use#
use when:
ECS launch type is EC2
logs are written to host files
you need to collect /var/log/messages or custom app files
do not use for:
Fargate
normal container stdout/stderr
replacing FireLens for new ECS app logs
CloudWatch Agent config#
{
"logs": {
"logs_collected": {
"files": {
"collect_list": [
{
"file_path": "/var/log/ecs/ecs-agent.log",
"log_group_name": "/ecs/container-instance/ecs-agent",
"log_stream_name": "{instance_id}",
"timezone": "UTC"
},
{
"file_path": "/var/log/order-api/*.log",
"log_group_name": "/ecs/order-api/file",
"log_stream_name": "{instance_id}",
"timezone": "UTC"
}
]
}
}
}
}
then:
CloudWatch Agent -> CloudWatch Logs
CloudWatch Logs subscription -> Lambda -> VictoriaLogs
same subscription pattern as Option B.
EC2 instance role policy#
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowCloudWatchAgentLogs",
"Effect": "Allow",
"Action": [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents",
"logs:DescribeLogStreams"
],
"Resource": "arn:aws:logs:ap-east-1:123456789012:log-group:/ecs/*"
}
]
}
7. Option D: Custom Sidecar Agent#
when to use#
use when:
app writes logs to file
app and sidecar can share a volume
you need agent features not available through FireLens config
avoid when:
app already writes JSON to stdout
FireLens can solve the problem
task volume pattern#
{
"volumes": [
{
"name": "app-logs"
}
],
"containerDefinitions": [
{
"name": "app",
"mountPoints": [
{
"sourceVolume": "app-logs",
"containerPath": "/var/log/order-api"
}
]
},
{
"name": "log-agent",
"image": "fluent/fluent-bit:latest",
"mountPoints": [
{
"sourceVolume": "app-logs",
"containerPath": "/var/log/order-api",
"readOnly": true
}
]
}
]
}
sidecar risks:
file rotation must be correct
sidecar must not fall behind silently
shared volume lifetime is tied to task
multi-line logs need explicit parser
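The multi-line risk is easiest to see with a stack trace: a naive tailer emits each continuation line as its own record. The sketch below shows the grouping idea in plain Python, not an actual agent configuration. It assumes every new record starts with an ISO-like date; the pattern `RECORD_START` and the function `group_multiline` are hypothetical.

```python
import re

# Assumption: a new record always begins with a date such as "2026-06-02T..." or "2026-06-02 ...".
RECORD_START = re.compile(r"^\d{4}-\d{2}-\d{2}[T ]")


def group_multiline(lines):
    """Merge continuation lines (e.g. stack trace frames) into the preceding record."""
    records, current = [], []
    for line in lines:
        line = line.rstrip("\n")
        # A line matching the start pattern closes the record collected so far.
        if RECORD_START.match(line) and current:
            records.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        records.append("\n".join(current))
    return records


if __name__ == "__main__":
    raw = [
        "2026-06-02T10:00:00Z ERROR payment failed",
        "Traceback (most recent call last):",
        '  File "app.py", line 1, in <module>',
        "2026-06-02T10:00:01Z INFO retry scheduled",
    ]
    for record in group_multiline(raw):
        print(repr(record))
```

A real agent does the same thing with a configured multi-line parser; the important part is agreeing on the record-start pattern with the application team.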
8. Best Practices#
log schema:
JSON logs
service
env
version
request_id / trace_id
level
msg
event_time
do not log:
password
access token
refresh token
raw authorization header
full credit card / identity document
stream fields:
good:
service
env
cluster
container
bad:
request_id
user_id
order_id
full_url
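A quick way to sanity-check a stream field choice is to count distinct values per candidate field over a sample of logs. This is a rough sketch: the helper names and the `max_distinct=100` threshold are assumptions for illustration, not a VictoriaLogs limit.

```python
from collections import defaultdict


def stream_field_cardinality(records, candidate_fields):
    """Count distinct values per candidate field across a sample of log records."""
    seen = defaultdict(set)
    for record in records:
        for field in candidate_fields:
            if field in record:
                seen[field].add(record[field])
    return {field: len(seen[field]) for field in candidate_fields}


def pick_stream_fields(records, candidate_fields, max_distinct=100):
    """Keep fields that are present and stay under the distinct-value threshold."""
    counts = stream_field_cardinality(records, candidate_fields)
    return [f for f in candidate_fields if 0 < counts[f] <= max_distinct]


if __name__ == "__main__":
    sample = [
        {"service": "order-api", "env": "prod", "request_id": f"req_{i}"}
        for i in range(1000)
    ]
    print(stream_field_cardinality(sample, ["service", "env", "request_id"]))
    print(pick_stream_fields(sample, ["service", "env", "request_id"]))
```

On this sample, `service` and `env` each have one distinct value while `request_id` has a thousand, which is why it belongs in the "bad" list.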
cost:
FireLens direct:
avoids CloudWatch Logs app ingestion cost
requires operating log router path
CloudWatch bridge:
easier migration
duplicates storage/ingestion path
higher AWS cost
reliability:
log delivery is usually at-least-once
tolerate duplicates
monitor backlog/errors
keep local app logging non-blocking
avoid application crash when logging backend is down
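Keeping application logging non-blocking can be sketched with the standard library's queue-based handlers: the request path only enqueues, and a background thread does the slow write. `DroppingQueueHandler` and `start_nonblocking_logging` are hypothetical names, and dropping records when the queue is full is one possible policy, chosen here so a stalled log sink cannot stall the app.

```python
import logging
import logging.handlers
import queue


class DroppingQueueHandler(logging.handlers.QueueHandler):
    """Enqueue records without blocking; count and drop them when the queue is full."""

    def __init__(self, log_queue):
        super().__init__(log_queue)
        self.dropped = 0

    def enqueue(self, record):
        try:
            self.queue.put_nowait(record)
        except queue.Full:
            # Losing a log line is preferred over blocking or crashing the app.
            self.dropped += 1


def start_nonblocking_logging(logger, slow_handler, max_queue=10000):
    """Attach a bounded queue to the logger and drain it on a background thread."""
    log_queue = queue.Queue(maxsize=max_queue)
    handler = DroppingQueueHandler(log_queue)
    logger.addHandler(handler)
    listener = logging.handlers.QueueListener(log_queue, slow_handler)
    listener.start()
    return handler, listener


if __name__ == "__main__":
    app_log = logging.getLogger("order-api")
    app_log.setLevel(logging.INFO)
    handler, listener = start_nonblocking_logging(app_log, logging.StreamHandler())
    app_log.info("order created")
    listener.stop()  # flushes queued records before exit
    print("dropped:", handler.dropped)
```

Exposing `handler.dropped` as a metric gives a signal that the sink is falling behind, which ties into the "monitor backlog/errors" point.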
9. Monitoring#
ECS / FireLens:
log_router container health
log_router CloudWatch diagnostic logs
task stopped reason
CPU/memory of log_router
CloudWatch bridge:
Lambda Errors
Lambda Throttles
Lambda Duration
Lambda IteratorAge is not applicable here
subscription filter delivery errors
VictoriaLogs:
ingest request rate
ingest errors
disk usage
query latency
retention
alerts:
log_router exits
Lambda errors > 0
VictoriaLogs ingest errors > 0
no logs from service for N minutes
VictoriaLogs disk free low
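The "no logs from service for N minutes" alert reduces to comparing each service's latest log timestamp with the current time. The sketch below covers only that comparison; how `last_seen` is populated (for example from a periodic query against VictoriaLogs) is left open, and the function name `silent_services` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def silent_services(last_seen: dict, now: datetime, max_gap: timedelta) -> list:
    """Return services whose most recent log is older than max_gap, sorted by name."""
    return sorted(
        service for service, ts in last_seen.items() if now - ts > max_gap
    )


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    last_seen = {
        "order-api": now - timedelta(minutes=2),
        "billing-api": now - timedelta(minutes=25),
    }
    print(silent_services(last_seen, now, timedelta(minutes=10)))
```

Services that are legitimately idle (batch jobs, low-traffic environments) would need their own `max_gap` or an exclusion list to avoid noisy alerts.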
10. Production Checklist#
architecture:
chosen path documented
FireLens direct preferred for new ECS app logs
CloudWatch bridge used only when CloudWatch retention/migration is needed
CloudWatch Agent used only for ECS EC2 host/file logs
security:
VictoriaLogs endpoint private or protected by TLS/auth
task execution role least privilege
Lambda role least privilege
sensitive fields redacted at app or agent
operations:
verify command documented
log_router diagnostics enabled
Lambda forwarder alarms enabled
VictoriaLogs ingest/disk alarms enabled
replay strategy exists for CloudWatch bridge
schema:
JSON log format standardized
service/env/container fields included
high-cardinality fields not used as stream fields