This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revision | ||
oss-health-metrics:metrics:contributors [2017/04/10 16:57] abuhman [7. Pseudo Code/Query] |
oss-health-metrics:metrics:contributors [2017/10/06 20:54] (current) GeorgLink fixed link |
||
---|---|---|---|
Line 1: | Line 1: | ||
====== Contributors ====== | ====== Contributors ====== | ||
- | ===== 1. Acronym ===== | ||
- | ===== 2. Alternate Names ===== | + | Page permanently moved to |
- | + | https://github.com/chaoss/metrics/blob/master/activity-metrics/contributors.md | |
- | ===== 3. Classification ===== | + | |
- | Community Health | + | |
- | + | ||
- | ===== 4. Description ===== | + | |
- | Number of contributors | + | |
- | + | ||
- | ==== Visualization ==== | + | |
- | + | ||
- | ==== Interpretation ==== | + | |
- | + | ||
- | ===== 5. Known Implementations ===== | + | |
- | + | ||
- | ===== 6. Data Source ===== | + | |
- | GHTorrent | + | |
- | + | ||
- | ===== 7. Pseudo Code/Query ===== | + | |
- | SQL | + | |
- | # Gets the count of types of contributions by user, only returning users who have interacted with a project in some way | + | |
- | + | ||
- | SET @owner = "rails"; | + | |
- | SET @repo = "rails"; | + | |
- | + | ||
- | USE msr14; # Test database | + | |
- | # USE ghtorrent; # Production database | + | |
- | + | ||
- | SET @proj = (SELECT projects.id FROM projects INNER JOIN users ON projects.owner_id = users.id WHERE projects.name = @repo AND users.login = @owner); | + | |
- | + | ||
- | SELECT * FROM | + | |
- | + | ||
- | ( | + | |
- | SELECT users.id as "user_id", | + | |
- | users.login as "login", | + | |
- | com.count as "commits", | + | |
- | pulls.count as "pull_requests", | + | |
- | iss.count as "issues", | + | |
- | comcoms.count as "commit_comments", | + | |
- | pullscoms.count as "pull_request_comments", | + | |
- | isscoms.count as "issue_comments" | + | |
- | + | ||
- | FROM users | + | |
- | + | ||
- | LEFT JOIN (SELECT committer_id AS id, COUNT(*) AS count FROM commits INNER JOIN project_commits ON project_commits.commit_id = commits.id WHERE project_commits.project_id = @proj GROUP BY commits.committer_id) AS com | + | |
- | ON com.id = users.id | + | |
- | + | ||
- | LEFT JOIN (SELECT pull_request_history.actor_id AS id, COUNT(*) AS count FROM pull_request_history JOIN pull_requests ON pull_requests.id = pull_request_history.pull_request_id WHERE pull_requests.base_repo_id = @proj AND pull_request_history.action = 'merged' GROUP BY pull_request_history.actor_id) AS pulls | + | |
- | ON pulls.id = users.id | + | |
- | + | ||
- | LEFT JOIN (SELECT reporter_id AS id, COUNT(*) AS count FROM issues WHERE issues.repo_id = @proj GROUP BY issues.reporter_id) AS iss | + | |
- | ON iss.id = users.id | + | |
- | + | ||
- | LEFT JOIN (SELECT commit_comments.user_id AS id, COUNT(*) AS count FROM commit_comments JOIN project_commits ON project_commits.commit_id = commit_comments.commit_id WHERE project_commits.project_id = @proj GROUP BY commit_comments.user_id) AS comcoms | + | |
- | ON comcoms.id = users.id | + | |
- | + | ||
- | LEFT JOIN (SELECT pull_request_comments.user_id AS id, COUNT(*) AS count FROM pull_request_comments JOIN pull_requests ON pull_request_comments.pull_request_id = pull_requests.id WHERE pull_requests.base_repo_id = @proj GROUP BY pull_request_comments.user_id) AS pullscoms | + | |
- | ON pullscoms.id = users.id | + | |
- | + | ||
- | LEFT JOIN (SELECT issue_comments.user_id AS id, COUNT(*) AS count FROM issue_comments JOIN issues ON issue_comments.issue_id = issues.id WHERE issues.repo_id = @proj GROUP BY issue_comments.user_id) AS isscoms | + | |
- | ON isscoms.id = users.id | + | |
- | + | ||
- | GROUP BY users.id | + | |
- | ORDER BY com.count DESC | + | |
- | ) user_activity | + | |
- | + | ||
- | WHERE commits IS NOT NULL | + | |
- | OR pull_requests IS NOT NULL | + | |
- | OR issues IS NOT NULL | + | |
- | OR commit_comments IS NOT NULL | + | |
- | OR pull_request_comments IS NOT NULL | + | |
- | OR issue_comments IS NOT NULL; | + | |
- | + | ||
- | ### Number of Contributors (committers) per Project: | + | |
- | + | ||
- | GitHub defines contributors as those who have made "Contributions to master, excluding merge commits" | + | |
- | + | ||
- | I do not see in the GHTorrent database schema a way to determine commits to master vs other branches, | + | |
- | Nor a way to differentiate merge commits from other commits. | + | |
- | + | ||
- | Because of this, for the following SQL query I will define contributors as "users who have made a commit" | + | |
- | + | ||
- | When viewing the table, one may notice that there is a separate author_id and committer_id for each commit | + | |
- | the commit author has written the code of the commit. | + | |
- | the commit committer has made the commit | + | |
- | example: author writes some code and does a pull request. committer approves/merges the pull request | + | |
- | + | ||
- | For the following SQL, I am considering the author to be the contributor. | + | |
- | + | ||
- | select projects.name as project_name, projects.url as url, count(distinct commits.author_id) as num_contributers | + | |
- | from commits | + | |
- | join project_commits on commits.id = project_commits.commit_id | + | |
- | join projects on projects.id = project_commits.project_id | + | |
- | group by project_commits.project_id | + | |
- | + | ||
- | + | ||
- | ===== 8. Data Exchange Format ===== | + | |
- | Describe how the metric is encoded for sharing and communicating between programs. | + | |
- | + | ||
- | ===== 9. References to Academic Literature ===== | + | |
- | Papers released in the academic literature that uses the metric. | + | |
- | + | ||
- | ===== 10. Internet References ===== | + | |
- | Links to websites that add value to the understanding of the metric. | + | |
- | + | ||
- | ===== 11. Contributors ===== | + | |
- | Authors of the metric page and authors who made significant changes. | + |