1. Important Points

This section covers only the spoken English used at work:
    meetings
    daily standup
    technical communication
    code review
    interview
    small talk with coworkers in a work context

Not covered:
    non-work scenarios
    everyday service scenarios
    travel scenarios

corpus target:
    each scene targets 1000 unique utterances
    utterances must come from existing public web sources or open datasets
    every utterance must keep source_url and license / source note

copyright rule:
    do not paste long transcript blocks
    do not bulk-copy copyrighted articles
    prefer open corpus / public transcript / short quote with attribution

2. Scene Index

| Scene | Page | Target |
|---|---|---|
| Meeting | Meetings.md | 1000 meeting utterances |
| Daily Standup | DailyStandup.md | 1000 standup utterances |
| Technical Communication | TechnicalCommunication.md | 1000 technical update utterances |
| Code Review | CodeReview.md | 1000 review comment utterances |
| Interview | Interview.md | 1000 interview utterances |
| Small Talk At Work | SmallTalkAtWork.md | 1000 workplace small talk utterances |

3. Corpus Schema

required fields:
    id
    utterance
    tag
    source_url
    source_license
    note
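
The required fields above could be modeled as a small record type. The following is a minimal sketch, not a fixed format: the class name `Utterance`, the helper `missing_fields`, and the choice of plain strings for every field are assumptions made for illustration.

```python
from dataclasses import dataclass, asdict

# Field names taken from the "required fields" list.
REQUIRED_FIELDS = ("id", "utterance", "tag", "source_url", "source_license", "note")


@dataclass(frozen=True)
class Utterance:
    """One corpus entry. All fields are strings (an assumption)."""
    id: str
    utterance: str
    tag: str
    source_url: str
    source_license: str
    note: str


def missing_fields(record: dict) -> list:
    """Return the required field names that are absent (or None) in a raw record."""
    return [name for name in REQUIRED_FIELDS if record.get(name) is None]
```

A raw record can be checked with `missing_fields(record)` before it is turned into `Utterance(**record)`, and `asdict(...)` converts it back to a plain dict for storage.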

dedupe key (computed from the utterance text):
    convert to lowercase
    trim leading and trailing spaces
    remove repeated punctuation
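
The three normalization steps can be sketched as a single function. This is one possible reading, with assumptions stated plainly: "remove repeated punctuation" is interpreted as collapsing a run of the same punctuation character into one, and the function name `dedupe_key` is hypothetical.

```python
import re

# A punctuation character (not a word character, not whitespace)
# followed by one or more copies of itself.
_REPEATED_PUNCT = re.compile(r"([^\w\s])\1+")


def dedupe_key(utterance: str) -> str:
    """Build the key used to detect duplicate utterances."""
    key = utterance.lower()               # lowercase
    key = key.strip()                     # trim leading and trailing spaces
    key = _REPEATED_PUNCT.sub(r"\1", key) # "!!" -> "!", "..." -> "."
    return key
```

Two utterances count as duplicates when their keys are equal, so `"Sounds good!!"` and `"  sounds good!"` would map to the same entry under this sketch.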

4. Collection Checklist

before adding utterances:
    source is public
    license / usage note is recorded
    sentence is short enough for learning
    sentence belongs to work context
    sentence is not already in the corpus
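
Parts of this checklist can be automated; the rest still needs a human reviewer. The sketch below is an illustration under several assumptions: the 25-word limit (`MAX_WORDS`) is an arbitrary stand-in for "short enough", the function names are hypothetical, and a recorded web URL is only a weak proxy for "source is public". Whether a sentence belongs to a work context is not checked at all.

```python
import re

MAX_WORDS = 25  # assumed threshold for "short enough for learning"

_REPEATED_PUNCT = re.compile(r"([^\w\s])\1+")


def dedupe_key(utterance: str) -> str:
    """Lowercase, trim, and collapse repeated punctuation (assumed reading)."""
    return _REPEATED_PUNCT.sub(r"\1", utterance.lower().strip())


def check_candidate(record: dict, seen_keys: set) -> list:
    """Return a list of checklist problems; an empty list means no automated objection."""
    problems = []
    utterance = record.get("utterance", "")
    if not utterance.strip():
        problems.append("utterance is empty")
    # Proxy only: a reviewer must still confirm the source is public.
    if not record.get("source_url", "").startswith(("http://", "https://")):
        problems.append("source_url is missing or not a web URL")
    if not record.get("source_license", "").strip():
        problems.append("license / usage note is not recorded")
    if len(utterance.split()) > MAX_WORDS:
        problems.append("utterance is too long")
    if utterance.strip() and dedupe_key(utterance) in seen_keys:
        problems.append("utterance is already in the corpus")
    return problems
```

Here `seen_keys` is the set of dedupe keys for utterances already collected; after a candidate passes, its key would be added to that set.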