1. Important Points
This section covers only the spoken English used at work:
meetings
daily standup
technical communication
code review
interview
workplace small talk with coworkers
Not covered:
non-work scenarios
everyday service scenarios
travel scenarios
corpus target:
each scene targets 1000 unique utterances
utterances must come from existing public web sources or open datasets
every utterance must keep source_url and license / source note
copyright rule:
do not paste long transcript blocks
do not bulk-copy copyrighted articles
prefer open corpus / public transcript / short quote with attribution
2. Scene Index
| Scene | Page | Target |
| --- | --- | --- |
| Meeting | Meetings.md | 1000 meeting utterances |
| Daily Standup | DailyStandup.md | 1000 standup utterances |
| Technical Communication | TechnicalCommunication.md | 1000 technical update utterances |
| Code Review | CodeReview.md | 1000 review comment utterances |
| Interview | Interview.md | 1000 interview utterances |
| Small Talk At Work | SmallTalkAtWork.md | 1000 workplace small talk utterances |
3. Corpus Schema
required fields:
id
utterance
tag
source_url
source_license
note
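As a rough illustration of the required fields, a record could be held as a plain dictionary and checked for gaps before it is stored. This is only a sketch: the helper name `missing_fields`, the string types, and the rule that a blank value counts as missing are assumptions, and the example record is a made-up placeholder rather than a real corpus entry.

```python
from __future__ import annotations

# Field names come from the schema above; everything else here is an assumption.
REQUIRED_FIELDS = ("id", "utterance", "tag", "source_url", "source_license", "note")


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent, None, or blank in `record`."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Placeholder record for illustration only (not taken from any real source).
example = {
    "id": "meeting-0001",
    "utterance": "Can everyone see my screen?",
    "tag": "meeting",
    "source_url": "https://example.com/placeholder",
    "source_license": "CC BY 4.0",
    "note": "placeholder record for illustration",
}

print(missing_fields(example))
```

A record that passes this check still needs a human to confirm that the source and license values are accurate.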
dedupe key:
lowercase
trim spaces
remove repeated punctuation
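The three dedupe steps can be sketched as a small normalization function. The name `dedupe_key` is hypothetical, and "remove repeated punctuation" is read here as collapsing a run of the same punctuation character into a single one, which is one possible interpretation of the rule.

```python
import re

# A punctuation character (anything that is not a word character or whitespace)
# followed by one or more copies of itself.
_REPEATED_PUNCT = re.compile(r"([^\w\s])\1+")


def dedupe_key(utterance: str) -> str:
    """Build a comparison key: lowercase, trim spaces, collapse repeated punctuation."""
    key = utterance.lower().strip()
    return _REPEATED_PUNCT.sub(r"\1", key)


print(dedupe_key("  Sounds good!!  "))  # sounds good!
```

Two utterances with the same key would be treated as duplicates, so "Sounds good!" and "sounds good!!!" would count once toward a scene's target.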
4. Collection Checklist
before adding utterances:
source is public
license / usage note is recorded
sentence is short enough for learning
sentence belongs to work context
sentence is not already in the corpus