Goal#
target:
1000 unique code review utterances
scope:
question
suggestion
issue
nit
blocking feedback
non-blocking feedback
approval
request changes
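The target and scope above could be pinned down in code so that tagging stays consistent. This is a minimal sketch: the names `ReviewTag`, `TARGET_SIZE`, `parse_tag`, and `remaining` are hypothetical, and only the tag values and the 1000 figure come from this plan.

```python
from enum import Enum


class ReviewTag(str, Enum):
    """Scope tags as listed in the plan; the member names are illustrative."""
    QUESTION = "question"
    SUGGESTION = "suggestion"
    ISSUE = "issue"
    NIT = "nit"
    BLOCKING = "blocking feedback"
    NON_BLOCKING = "non-blocking feedback"
    APPROVAL = "approval"
    REQUEST_CHANGES = "request changes"


# Target corpus size taken from the Goal section.
TARGET_SIZE = 1000


def parse_tag(raw: str) -> ReviewTag:
    """Map a free-form label onto a scope tag; raises ValueError if unknown."""
    return ReviewTag(raw.strip().lower())


def remaining(collected: int) -> int:
    """Number of unique utterances still needed to reach the target."""
    return max(TARGET_SIZE - collected, 0)
```

Rejecting unknown labels early keeps out-of-scope tags from drifting into the corpus.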
Source Plan#
Collection Rules#
do:
collect short review comments
tag severity and intent
keep source_url and license
redact private, code-specific names (internal identifiers, project names) when needed
deduplicate LGTM / nit variants
do not:
scrape random GitHub comments without checking license
include long code snippets
mix generated review comments into the corpus
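Two of the rules above (deduplicating LGTM / nit variants, keeping comments short) lend themselves to a small filter. The sketch below is one possible approach, not a fixed procedure: `normalize`, `deduplicate`, `is_short`, and the `MAX_CHARS` / `MAX_LINES` thresholds are all assumptions for illustration.

```python
import re

# Hypothetical thresholds; the plan only says "short" and "no long code snippets".
MAX_CHARS = 200
MAX_LINES = 3


def normalize(utterance: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace so trivial variants match."""
    text = utterance.lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def deduplicate(utterances):
    """Keep the first occurrence of each normalized utterance, preserving order."""
    seen = set()
    unique = []
    for utterance in utterances:
        key = normalize(utterance)
        if key and key not in seen:
            seen.add(key)
            unique.append(utterance)
    return unique


def is_short(utterance: str) -> bool:
    """Rough length check that also rejects multi-line pasted snippets."""
    return len(utterance) <= MAX_CHARS and len(utterance.splitlines()) <= MAX_LINES
```

With this normalization, `"LGTM"`, `"lgtm!"`, and `"LGTM."` collapse to a single entry, while differently worded nits are kept.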
Corpus Schema#
| id | utterance | tag | source_url | source_license | note |
| --- | --- | --- | --- | --- | --- |
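The schema columns could be mirrored by a small record type that refuses entries without provenance. This is a sketch under assumptions: `CorpusRecord` and `to_row` are hypothetical names, the column types are guesses (the schema gives names only), and the URL in the usage note is a placeholder.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CorpusRecord:
    """One corpus row; field names follow the schema, types are assumed."""
    id: int
    utterance: str
    tag: str
    source_url: str
    source_license: str
    note: str = ""

    def __post_init__(self):
        # The collection rules require every entry to keep its source and license.
        if not self.source_url or not self.source_license:
            raise ValueError("source_url and source_license are required")


def to_row(record: CorpusRecord) -> str:
    """Render a record as one pipe-delimited table row, escaping literal pipes."""
    cells = [str(getattr(record, f.name)).replace("|", "\\|") for f in fields(record)]
    return "| " + " | ".join(cells) + " |"
```

For example, `to_row(CorpusRecord(1, "nit: typo in docstring", "nit", "https://example.com/review/1", "MIT", "example"))` yields a row in the same column order as the schema.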