https://developer.hashicorp.com/terraform/intro
https://developer.hashicorp.com/terraform/language
https://developer.hashicorp.com/terraform/cli/commands
https://developer.hashicorp.com/terraform/language/backend/s3
https://registry.terraform.io/providers/hashicorp/aws/latest/docs

1. One Sentence#

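Terraform describes the desired state of infrastructure as code, previews changes with plan, executes them with apply, and uses state to record the mapping between real resources and code.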

2. Mental Model#

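Concept      In one sentence
provider     which platform's API to call, e.g. AWS / Kubernetes / GitHub
resource     a resource to create and manage, e.g. VPC / ECS / RDS
data source  reads existing resources only; creates nothing
variable     input parameter
local        intermediate value; reduces repeated expressions
output       result exposed to people or to other Terraform configurations
module       a reusable group of Terraform code
state        Terraform's database of record for the resources it manages
plan         preview of the changes about to happen
apply        executes the plan

developer writes .tf
    -> terraform plan
    -> review changes
    -> terraform apply
    -> provider calls cloud API
    -> state records resource mapping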

3. Core Workflow#

# Format all Terraform files.
terraform fmt -recursive

# Download providers/modules and initialize backend.
terraform init

# Check syntax and internal consistency.
terraform validate

# Preview changes before touching real infrastructure.
terraform plan -var-file=envs/dev.tfvars

# Save a reviewed plan file.
terraform plan -var-file=envs/dev.tfvars -out=tfplan

# Apply exactly the reviewed plan.
terraform apply tfplan

# Show current outputs.
terraform output

# Show current state in human-readable form.
terraform show

Do not teach developers to run blind apply:

# Avoid this in shared environments because the plan is not reviewed.
terraform apply -auto-approve

4. Project Structure#

infra-live
├── backend.tf
├── provider.tf
├── variables.tf
├── locals.tf
├── main.tf
├── outputs.tf
├── envs
│   ├── dev.tfvars
│   ├── uat.tfvars
│   └── prod.tfvars
└── modules
    └── s3-static-site
        ├── main.tf
        ├── variables.tf
        └── outputs.tf

Rules:

backend.tf:
    only remote state config

provider.tf:
    provider versions and provider config

variables.tf:
    typed input contract

locals.tf:
    naming, tags, derived values

main.tf:
    resources and module calls

outputs.tf:
    important IDs, ARNs, DNS names

envs/*.tfvars:
    environment values, not logic

5. Minimal AWS Example#

provider.tf#

terraform {
  required_version = ">= 1.10.0" # use_lockfile in backend.tf needs Terraform 1.10 or later

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = var.aws_region

  default_tags {
    tags = local.common_tags
  }
}

backend.tf#

terraform {
  backend "s3" {
    bucket       = "company-terraform-state-prod"
    key          = "network/dev/terraform.tfstate"
    region       = "ap-east-1"
    encrypt      = true
    use_lockfile = true
  }
}

Backend rules:

enable S3 bucket versioning
enable state locking
do not commit local terraform.tfstate
do not put access_key / secret_key in backend config
one state file should map to one ownership boundary
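
The first rule can be sketched as a small bootstrap configuration for the state bucket itself. This is a minimal sketch, assuming the bucket is managed by Terraform in a separate configuration with its own state; the resource name tf_state is hypothetical, and the bucket name is copied from backend.tf above.

```hcl
# Hypothetical bootstrap configuration for the state bucket.
# Assumption: this lives outside the configurations that use the bucket as a backend.
resource "aws_s3_bucket" "tf_state" {
  bucket = "company-terraform-state-prod"

  lifecycle {
    prevent_destroy = true # losing this bucket means losing every state file in it
  }
}

# Versioning lets you recover an earlier state file after a bad write.
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# State can contain sensitive attributes, so block all public access.
resource "aws_s3_bucket_public_access_block" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```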

AWS S3 backend naming convention and backend config are covered in Terraform AWS.

variables.tf#

variable "env" {
  type        = string
  description = "Environment name: dev, uat, prod"
}

variable "aws_region" {
  type        = string
  description = "AWS region"
}

variable "vpc_cidr" {
  type        = string
  description = "VPC CIDR"
}
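
A typed contract can also reject bad values before plan reaches the provider. The sketch below shows one possible way to do that with validation blocks; these declarations would replace the plain env and vpc_cidr declarations above, not sit next to them, and the error messages are illustrative.

```hcl
variable "env" {
  type        = string
  description = "Environment name: dev, uat, prod"

  validation {
    # Fail early if the value is not one of the known environments.
    condition     = contains(["dev", "uat", "prod"], var.env)
    error_message = "env must be one of: dev, uat, prod."
  }
}

variable "vpc_cidr" {
  type        = string
  description = "VPC CIDR"

  validation {
    # cidrhost() errors on malformed input; can() turns that error into false.
    condition     = can(cidrhost(var.vpc_cidr, 0))
    error_message = "vpc_cidr must be a valid CIDR block, for example 10.10.0.0/16."
  }
}
```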

locals.tf#

locals {
  name_prefix = "order-${var.env}"

  common_tags = {
    Project     = "order"
    Environment = var.env
    ManagedBy   = "terraform"
  }
}

main.tf#

resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "${local.name_prefix}-vpc"
  }
}

outputs.tf#

output "vpc_id" {
  value = aws_vpc.main.id
}

output "vpc_cidr" {
  value = aws_vpc.main.cidr_block
}

envs/dev.tfvars#

env        = "dev"
aws_region = "ap-east-1"
vpc_cidr   = "10.10.0.0/16"

6. HCL Basics#

resource#

resource "aws_s3_bucket" "logs" {
  bucket = "${local.name_prefix}-logs"
}

resource address:
    aws_s3_bucket.logs

format:
    resource "<type>" "<local_name>"

data source#

data "aws_caller_identity" "current" {}

output "account_id" {
  value = data.aws_caller_identity.current.account_id
}

A data source only reads; it does not create anything.

for_each#

variable "buckets" {
  type = set(string)
}

resource "aws_s3_bucket" "this" {
  for_each = var.buckets

  bucket = "${local.name_prefix}-${each.key}"
}

prefer for_each over count when each item has a stable name
count index changes can cause accidental replacement

Bad example:

variable "subnet_names" {
  type    = list(string)
  default = ["public-a", "public-b", "private-a"]
}

resource "aws_subnet" "this" {
  count = length(var.subnet_names)

  vpc_id     = aws_vpc.main.id
  cidr_block = cidrsubnet(var.vpc_cidr, 8, count.index)

  tags = {
    Name = var.subnet_names[count.index]
  }
}

problem:
    if "public-b" is removed
    private-a moves from index 2 to index 1
    Terraform plans to retag aws_subnet.this[1] (the old public-b subnet) as private-a
    and to destroy aws_subnet.this[2] (the real private-a subnet)

Good example:

variable "subnets" {
  type = map(object({
    cidr = string
    az   = string
  }))

  default = {
    # CIDRs must fall inside the VPC range (vpc_cidr = "10.10.0.0/16" in envs/dev.tfvars).
    public-a = {
      cidr = "10.10.0.0/24"
      az   = "ap-east-1a"
    }
    public-b = {
      cidr = "10.10.1.0/24"
      az   = "ap-east-1b"
    }
    private-a = {
      cidr = "10.10.10.0/24"
      az   = "ap-east-1a"
    }
  }
}

resource "aws_subnet" "this" {
  for_each = var.subnets

  vpc_id            = aws_vpc.main.id
  cidr_block        = each.value.cidr
  availability_zone = each.value.az

  tags = {
    Name = each.key
  }
}

result:
    resource address is aws_subnet.this["private-a"]
    removing public-b does not shift private-a
    plan is easier to review
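
Because for_each instances are keyed by name, other code can refer to them by name as well. A minimal sketch, assuming the aws_subnet.this resource from the good example; the output names are hypothetical.

```hcl
# Map of subnet name => subnet ID, built from every for_each instance.
output "subnet_ids" {
  description = "Subnet IDs keyed by the names used in var.subnets"
  value       = { for name, subnet in aws_subnet.this : name => subnet.id }
}

# A single instance is addressed by its key, not by a numeric index.
output "private_a_subnet_id" {
  value = aws_subnet.this["private-a"].id
}
```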

depends_on#

resource "aws_cloudwatch_log_group" "app" {
  name = "/aws/ecs/${local.name_prefix}"
}

resource "aws_ecs_service" "app" {
  name = local.name_prefix

  depends_on = [
    aws_cloudwatch_log_group.app
  ]
}

Terraform can usually infer dependencies automatically.
Use depends_on only when an implicit dependency cannot express the relationship.
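
To contrast the two kinds of dependency, here is a small sketch. It assumes the aws_vpc.main resource from section 5; the names main and nat are hypothetical. The internet gateway depends on the VPC implicitly through an attribute reference, while the Elastic IP references nothing on the gateway, so the ordering has to be stated explicitly.

```hcl
# Implicit dependency: referencing aws_vpc.main.id is enough for Terraform
# to create the VPC before the internet gateway.
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
}

# Explicit dependency: the EIP uses no attribute of the gateway, but AWS may
# require the gateway to exist first, so depends_on states the ordering.
resource "aws_eip" "nat" {
  domain = "vpc"

  depends_on = [
    aws_internet_gateway.main
  ]
}
```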

lifecycle#

resource "aws_db_instance" "main" {
  identifier = "${local.name_prefix}-db"

  lifecycle {
    prevent_destroy = true
  }
}

production databases and critical buckets can set prevent_destroy
do not overuse ignore_changes to mask drift
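
One case where ignore_changes is reasonable is a field that another system owns on purpose. The sketch below assumes an ECS service whose task count is adjusted by autoscaling outside Terraform; the resource name worker is hypothetical, and most service arguments are left out to keep the focus on lifecycle.

```hcl
resource "aws_ecs_service" "worker" {
  name          = "${local.name_prefix}-worker"
  desired_count = 2 # initial value only; autoscaling changes it afterwards

  # cluster, task_definition, and networking arguments omitted for brevity

  lifecycle {
    # Ignore exactly one field that is owned elsewhere, so every other
    # kind of drift on this service still shows up in plan.
    ignore_changes = [desired_count]
  }
}
```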

7. Module Pattern#

A module is a reusable encapsulation, not a directory to copy and paste.

module implementation#

modules/s3-bucket
├── main.tf
├── variables.tf
└── outputs.tf

# modules/s3-bucket/variables.tf
variable "name" {
  type = string
}

# modules/s3-bucket/main.tf
resource "aws_s3_bucket" "this" {
  bucket = var.name
}

# modules/s3-bucket/outputs.tf
output "bucket_name" {
  value = aws_s3_bucket.this.bucket
}

module call#

module "app_logs" {
  source = "./modules/s3-bucket"

  name = "${local.name_prefix}-app-logs"
}

Module rules:

module exposes variables and outputs
module should hide resource details, not hide important decisions
root module owns environment values
shared module owns implementation pattern
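
As an illustration of hiding resource details without hiding decisions, the module could expose versioning as an input while keeping the versioning resource internal. This is only a sketch: the versioning_enabled variable, the versioning resource, and the root output name are hypothetical additions to the s3-bucket module shown above.

```hcl
# modules/s3-bucket/variables.tf (hypothetical addition)
variable "versioning_enabled" {
  type        = bool
  description = "Whether object versioning is enabled on the bucket"
  default     = true
}

# modules/s3-bucket/main.tf (hypothetical addition)
# The caller makes the decision; the module owns how it is implemented.
resource "aws_s3_bucket_versioning" "this" {
  bucket = aws_s3_bucket.this.id

  versioning_configuration {
    status = var.versioning_enabled ? "Enabled" : "Suspended"
  }
}

# Root module: read the module's output instead of reaching into its resources.
output "app_logs_bucket_name" {
  value = module.app_logs.bucket_name
}
```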

8. State#

State is Terraform's most important file.

state stores:
    resource address
    cloud resource ID
    attributes used for diff
    dependency mapping

Never:

edit state manually
commit terraform.tfstate
share local state by chat
run apply from two machines at the same time

Useful commands:

# List resources tracked by state.
terraform state list

# Show one resource from state.
terraform state show aws_vpc.main

# Move resource address after refactor.
terraform state mv aws_s3_bucket.old aws_s3_bucket.new

# Remove resource from state without deleting real infrastructure.
terraform state rm aws_s3_bucket.legacy
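
The last two commands also have declarative counterparts that go through plan and code review instead of changing state directly. A minimal sketch, reusing the addresses from the commands above; moved blocks need Terraform 1.1 or later and removed blocks need 1.7 or later.

```hcl
# Equivalent of `terraform state mv`: record the rename in code so that
# every workspace picks it up on its next plan.
moved {
  from = aws_s3_bucket.old
  to   = aws_s3_bucket.new
}

# Equivalent of `terraform state rm`: stop managing the bucket without
# deleting the real infrastructure. The matching resource block must
# already be deleted from the configuration.
removed {
  from = aws_s3_bucket.legacy

  lifecycle {
    destroy = false
  }
}
```

Both blocks can be deleted once every environment has applied them.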

9. Import Existing Resource#

Use import when resource already exists and Terraform should start managing it.

import {
  to = aws_s3_bucket.logs
  id = "company-prod-logs"
}

resource "aws_s3_bucket" "logs" {
  bucket = "company-prod-logs"
}

# Generate and review plan for imported resource.
terraform plan

# Apply import into state.
terraform apply

Rules:

write resource block first
import into matching address
run plan until there is no unexpected replacement
do not import production resources blindly

10. Team Workflow#

developer:
    change .tf
    terraform fmt -recursive
    terraform validate
    terraform plan
    open PR with plan summary

reviewer:
    check created / updated / destroyed resources
    check IAM permissions
    check public exposure
    check state boundary
    check naming and tags

pipeline:
    init
    fmt -check
    validate
    plan
    manual approval
    apply reviewed plan

CI example:

# Run in CI before PR merge.
terraform init -backend=false
terraform fmt -check -recursive
terraform validate

# Run in deployment pipeline.
terraform init
terraform plan -var-file=envs/prod.tfvars -out=tfplan

# Run only after manual approval of the saved plan.
terraform apply tfplan

11. What Developers Must Learn#

Topic             Must know
plan              read create/update/delete/replacement
resource address  know aws_vpc.main and module.network.aws_vpc.main
variables         use typed variables, not hardcoded env values
state             state is not a cache; it is the mapping from code to real resources
modules           call modules; do not copy module internals
drift             manual console changes create drift
secrets           do not put secrets in .tfvars committed to Git
destroy           never approve destroy without understanding blast radius
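
For the secrets row, one common pattern is to declare the value as a sensitive variable and supply it from outside Git. A minimal sketch; the variable name db_password is hypothetical, and supplying it through the TF_VAR_db_password environment variable is one option among several.

```hcl
# Assumption: the value comes from the TF_VAR_db_password environment
# variable or a secret store, never from a committed .tfvars file.
variable "db_password" {
  type        = string
  description = "Master password for the database"
  sensitive   = true # redacts the value in plan and apply output
}
```

Note that sensitive only hides the value in CLI output; it is still written to state in plain text, which is one more reason to keep state in an encrypted remote backend with restricted access.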

12. Common Mistakes#

Mistake                        Correct way
one huge state for everything  split by ownership / blast radius
local state for team work      remote backend with locking
apply without reviewing plan   save and apply reviewed plan
commit .terraform/             commit .terraform.lock.hcl, ignore .terraform/
use count for named resources  use for_each
hardcode env in resources      use tfvars and locals
fix drift in console           fix code, then apply
use ignore_changes everywhere  use only for fields intentionally owned elsewhere