CodeReview


Goal

target:
    1000 unique code review utterances

scope:
    question
    suggestion
    issue
    nit
    blocking feedback
    non-blocking feedback
    approval
    request changes

Source Plan

Source                                                                Use
https://conventionalcomments.org/                                     review comment labels
https://google.github.io/eng-practices/review/                        review principles and phrasing
https://crop-repo.github.io/                                          open code review dataset
https://www.augmentcode.com/guides/what-does-nit-mean-in-code-review  nit language
https://www.codelantis.com/blog/code-review-acronyms-lgtm-nit         review acronyms

Collection Rules

do:
    collect short review comments
    tag severity and intent
    keep source_url and license
    redact private, code-specific names where needed
    deduplicate LGTM / nit variants

do not:
    scrape random GitHub comments without checking license
    include long code snippets
    mix generated review comments into the corpus
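
The "deduplicate LGTM / nit variants" rule could be sketched roughly as below. This is only a minimal sketch: it assumes that variants differ solely in letter case, punctuation, and spacing, which the plan does not state. The names `normalize` and `dedupe` are hypothetical helpers, not part of the plan.

```python
import re


def normalize(utterance: str) -> str:
    """Build a comparison key (assumption: case, punctuation, and spacing are ignored)."""
    lowered = utterance.lower()
    # Replace anything that is neither a word character nor whitespace with a space.
    no_punct = re.sub(r"[^\w\s]", " ", lowered)
    # Collapse runs of whitespace into single spaces and trim the ends.
    return " ".join(no_punct.split())


def dedupe(utterances: list[str]) -> list[str]:
    """Keep the first occurrence of each normalized utterance, preserving input order."""
    seen: set[str] = set()
    unique: list[str] = []
    for text in utterances:
        key = normalize(text)
        # Skip utterances that are empty after normalization or already seen.
        if key and key not in seen:
            seen.add(key)
            unique.append(text)
    return unique
```

With this rule, "LGTM", "lgtm!" and "LGTM." collapse to one entry, while "Looks good" stays separate. Merging such paraphrases would need a looser matching rule than the one assumed here.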

Corpus Schema

columns:
    id
    utterance
    tag
    source_url
    source_license
    note
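
One possible way to represent a corpus row in code is sketched below. The field names come from the schema above; everything else is an assumption made for illustration. That covers the `CorpusRow` and `to_tsv` names, the choice of tab-separated output, the rule that `tag` must be one of the scope values, and the rule that `source_url` and `source_license` must be non-empty.

```python
import csv
import io
from dataclasses import dataclass, fields

# Tag values taken from the scope list; treating them as the only
# allowed values for `tag` is an assumption of this sketch.
ALLOWED_TAGS = {
    "question", "suggestion", "issue", "nit",
    "blocking feedback", "non-blocking feedback",
    "approval", "request changes",
}


@dataclass
class CorpusRow:
    id: str
    utterance: str
    tag: str
    source_url: str
    source_license: str
    note: str = ""

    def __post_init__(self) -> None:
        # Assumed validation rules; the plan does not spell them out.
        if self.tag not in ALLOWED_TAGS:
            raise ValueError(f"unknown tag: {self.tag!r}")
        if not self.source_url or not self.source_license:
            raise ValueError("source_url and source_license are required")


def to_tsv(rows: list[CorpusRow]) -> str:
    """Serialize rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    header = [f.name for f in fields(CorpusRow)]
    writer.writerow(header)
    for row in rows:
        writer.writerow([getattr(row, name) for name in header])
    return buffer.getvalue()
```

Validating at construction time means a row with a missing license or an unknown tag fails as soon as it is created, before it can reach the corpus file.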