
Health Indicators

We roughly group health indicators into three categories: code health, community health, and compliance health.

Comments

Disclaimer: We list and describe health indicators. By no means are we evaluating them for suitability. Open source communities have a wide range of stakeholders and projects, each with a different interpretation of the indicators. The indicators carry different meanings in different situations.

Caution: The occurrence count is a rough estimate for how often we encounter the indicator. This is not an exact science.

Keep in mind, many projects do not use the GitHub issue tracker.

Keep in mind, different GitHub projects use pull requests to a greater or lesser degree.

Issue/3: Many of the indicators are also informative when tracked over time.

Interviewees often clarify that contributions are not merely code commits, but also include documentation, issues, and community management.

We should agree on a template for the metrics.

Community Health

Community health contains indicators descriptive of community interactions and behavior.

Name	Source	Description	Related Code/Queries	Occurrence
Contributor Diversity	Statistic	Ratio of contributors from a single company over all contributors. Also described as: Maintainers from different companies. Diversity of contributor affiliation. This is mentioned frequently.	Contributor Diversity Queries	Interviews: 3
Issue Response Rate	Statistic	Time between a new issue being opened and a maintainer responding. Also called: bug response rate. The maintainer is expected not to “pile on” but to try to solve the issue. This is mentioned frequently.	Issue Response Rate Queries	Interviews: 3
Community Activity	Indicator	Contribution Frequency (Contribution = commit, issue, comment, …)		Interviews: 3 Issue/1
Contributor Breadth	Statistic	Ratio of non-core committers (drive-by committers). Can indicate openness to outsiders.	Commits from non-core committers	Interviews: 2
Contribution Diversity		Ratio of code committed by contributors other than original project initiator Contributions are going up beyond the core team		Interviews: 1
Contribution Acceptance		Ratio of contributions accepted vs. closed without acceptance	Pull Request Acceptance Rate	Issue/1
Bus Factor		see: community/truckFactor.md The number of developers a project would need to lose for its progress to be destroyed. Alternatively: Number of companies that would have to stop support.		Issue/1 Literature
Contributors		Number of contributors	Contributors per Project	Interviews: 2 Issue/1
Contributor Activity		Activity level of individual contributors		Issue/1
Relative Activity		I sum up the activities (GH issues+comments, GH pull requests+comments and GH commits) for the project members and for the non-project members, then I create a ratio of the two. Compare the activity between committers-as-a-group and contributors-as-a-group. It easily shows when a project is not yet popular, or when a project is not paying attention to its users. I also feel that a balance between the two groups is essential; i.e., a project with a lot more contributor than committer activity is one that is failing to 'recruit' committers quickly enough.		Mailing list
Distribution of Work		How much recent activity is distributed?		Issue/1
Contribution Age		Time since last contribution Gives a sense of how active the community is. (Contribution = commit, issue, comment, …)		Interviews: 1
Forks		Number of forks	Forks Query	Interviews: 2
Stars		Number of stars		Interviews: 2
Watchers		Number of watchers	Watchers Query	Interviews: 2
Issues Open		Number of open issues	Open Issues	Issue/1
Issues submitted/closed		Issues submitted vs. issues closed Example	Issues Submitted vs Closed	Interviews: 2
Issue Comments		Number of Comments per Issue	Issue Comments	Issue/3
Time to Contributor		Time to becoming a contributor		Interviews: 1 Issue/1
Path to Leadership		A communicated path from lurker to contributor to maintainer (or: track members' time from user to maintainer/leader). Rationale: If active contributors are not included in leadership decisions they might lose interest and leave. (Focus on least likely contributor)		Interviews: 2 LFOSLS
Blogposts		Number of blogposts that mention the project		LFOSLS
YouTube Videos		Number of YouTube videos that mention or specifically deal with the project (e.g. tutorials)		LFOSLS
Job Postings		Number of job postings that mention the project as a preferred or required skill		LFOSLS
Downloads		Number of downloads. Beware: downloads might be skewed by builders. Used as a measure of 'success' (Grewal, Lilien, & Mallapragada, 2006)		LFOSLS (Grewal, Lilien, & Mallapragada, 2006)
Reopened issues		Rate of issues closed but discussion continues or issues that were closed and re-opened		LFOSLS
Release Velocity		Time between releases Regular releases are a reliability metric		LFOSLS
Release Maturity		Ratio of major and minor releases		LFOSLS
Decision Distribution		Central vs. distributed decision making Governance model, scalability of community		LFOSLS
Transparency		Number of comments per issue. Discussion is occurring openly; could also indicate level of agreement		LFOSLS
Roadmap		Existence and quality of roadmap Best Practice: community engagement and scalability (might not be automatically computable)
Gatherings		Number of face-to-face/in-person meetings per year Resets contentious issues; Resolve tensions; Avoid longstanding grudges		LFOSLS
Role Definitions		Existence and quality of role definitions. Governance related. Relates to “Path to Leadership”		LFOSLS
Rewards		Rewards, shout-outs, recognition, and mentions in pull-requests or change logs - might improve contribution levels		LFOSLS
Retrospectives		Existence of after release meetings Collect lessons learned, improve processes, recognize contributors		LFOSLS
Onion Layers		Distance between onion model layers (users, contributors, committers, and steering committee) Rule of thumb: factor of 10x between layers. (Node.js keynote)		LFOSLS
Release Note Completeness		Number of functionality changes and bug fixes represented in release notes vs. release. Good for users, also shows diligence of community		LFOSLS
Unity		Rivalry or unity of community (sentiment analysis?)		LFOSLS
Use of Acronym		Frequency of acronyms used Specialized language can be a barrier for new contributors.		LFOSLS
Language Bias		Diversity metric: Bias against gender, ethnicity, … in use of language (maybe use sentiment analysis)		LFOSLS
Commit Bias		Diversity metric: acceptance rate (and time to acceptance) differences per gender, ethnicity, etc…		LFOSLS
Stack Overflow		Several metrics: # of questions asked, response rate, number of responding people that have verified solutions		LFOSLS
Non-Source Contributions		Track contributions like running tests in test environment, writing blog posts, producing videos, giving talks, etc…		LFOSLS
Maturity Label		Community assigned label Some communities label projects as incubator, mature, (or something)		LFOSLS
User Groups		User groups perform a variety of crucial marketing, service support, and business-development functions at the grassroots level		(Bagozzi & Dholakia, 2006)
Age of Community		Time since repository/organization was registered; or Time since first release. “Results showed that the age of the project played a marginally significant role in attracting active users, but not developers. We attribute this differential effect of age on users and developers to the fact that age may be seen as an indicator of application maturity by users, and hence taken as a positive signal, whereas it may convey more ambiguous signals to developers.” (Chengalur-Smith et al., 2010, p.674)		(Chengalur-Smith, Sidorova, & Daniel, 2010; Grewal, Lilien, & Mallapragada, 2006)
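
Several of the ratio-based indicators in the table above can be computed from simple counts. The following is a minimal sketch, not an agreed implementation: the function names, the input shapes (a contributor-to-company mapping, plain event counts), and the direction of the Relative Activity ratio (members over non-members) are all assumptions made for illustration.

```python
from collections import Counter


def single_company_ratio(affiliations):
    """Contributor Diversity: share of contributors from the largest single company.

    `affiliations` is a hypothetical mapping of contributor name -> company.
    """
    if not affiliations:
        return 0.0
    counts = Counter(affiliations.values())
    _, largest = counts.most_common(1)[0]
    return largest / len(affiliations)


def acceptance_ratio(accepted, closed_without_acceptance):
    """Contribution Acceptance: accepted contributions over all closed contributions."""
    total = accepted + closed_without_acceptance
    return accepted / total if total else 0.0


def relative_activity(member_events, non_member_events):
    """Relative Activity: project-member activity divided by non-member activity.

    Each argument is the summed count of issues, pull requests, comments,
    and commits for that group. The ratio direction is an assumption.
    """
    if non_member_events == 0:
        return float("inf") if member_events else 0.0
    return member_events / non_member_events


if __name__ == "__main__":
    people = {"ana": "Acme", "bo": "Acme", "cy": "Initech", "di": "Globex"}
    print(single_company_ratio(people))
    print(acceptance_ratio(3, 1))
    print(relative_activity(10, 5))
```

A value of `single_company_ratio` close to 1.0 would suggest a project dominated by one employer; how to interpret any of these numbers depends on the project, as noted in the disclaimer.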

Code Health

Code health contains indicators descriptive of a code base and its quality.

Name	Description	Related Code/Queries	Occurrence
Pull Request made/closed	Pull requests made vs. pull requests closed Example Encompasses number of pull requests rejected (Issue/1)	Pull Requests Made vs Closed	Interviews: 3
Pull Requests Open	Number of open pull requests Might be more telling than total pull requests	Pull Requests Open	Interviews: 1 Issue/1
Pull Request Comments	Number of comments per pull request	Pull Request Comments	Interviews: 1
Pull Request Discussion Diversity	Number of different people discussing each pull request	Pull Discussion Diversity	Interviews: 1
Update Rate	Number of updates over period x		Issue/1
Update Regularity	How consistently and frequently are updates provided.		Interviews: 1 Issue/1
Update Age	Time since last update		Interviews: 1 Issue/1
Repository Size	Overall size of the repository or number of commits	Total Commits	Issue/1
Size of Code Base	Lines of code		Mailing list
Bugs after Release	Number of bugs reported after a release		LFOSLS
Code Modularity	Modular code allows parallel development, which Linus Torvalds drove for Linux		Linus Torvalds at LFOSLS (Baldwin & Clark, 2006)
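
Update Rate, Update Regularity, and Update Age from the table above can all be derived from a list of update dates. The sketch below assumes the dates are already collected (for example from commits or releases) and uses the standard deviation of the gaps between updates as one possible regularity measure; that choice, the function name, and the default window are assumptions rather than an agreed definition.

```python
from datetime import date
from statistics import mean, pstdev


def update_metrics(update_dates, today, window_days=90):
    """Summarize update behavior from a list of `datetime.date` values."""
    dates = sorted(update_dates)
    if not dates:
        return None
    # Update Rate: number of updates within the last `window_days` days.
    recent = [d for d in dates if 0 <= (today - d).days <= window_days]
    # Gaps in days between consecutive updates, used for regularity.
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    return {
        "update_rate": len(recent),
        "update_age_days": (today - dates[-1]).days,
        "mean_gap_days": mean(gaps) if gaps else None,
        # A low spread means updates arrive at a steady pace.
        "gap_stdev_days": pstdev(gaps) if gaps else None,
    }


if __name__ == "__main__":
    history = [date(2024, 1, 1), date(2024, 1, 11), date(2024, 1, 21)]
    print(update_metrics(history, today=date(2024, 1, 31), window_days=15))
```

Tracking these values over time, rather than as a single snapshot, fits the earlier comment that many indicators are more informative as trends.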

Compliance (Risk) Health

Compliance health contains indicators informative of vulnerabilities and license obligations.

Name	Description	Occurrence
Test Coverage		Interviews: 1
Bug Age	Age of known bugs in issue tracker Use label for determining bugs?	Issue/1
Known Vulnerabilities	Number of reported vulnerabilities Could be limited to issue-tracker or extended vulnerability databases (e.g. CVE)	Interviews: 1 Issue/1
Dependency Depth	Number of projects included in code base + number of projects relying on focal project (recursive). Indicator of centrality in the open source dependency network	Interviews: 1
License Declared	What license does the project declare	Issue/1
License Conflict	Does the project contain incompatible licenses
All Licenses	List of licenses
License Count	Number of licenses
License Coverage	Number of files with a file notice (copyright notice + license notice)
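
License Coverage, All Licenses, and License Count can be approximated by scanning file headers. The sketch below is a rough heuristic under stated assumptions: it treats a file as covered when its first 2000 characters contain both the word "copyright" and an `SPDX-License-Identifier:` tag, and it only captures the first token of the SPDX expression. Projects that use full license texts in headers, or compound expressions such as `MIT OR Apache-2.0`, would need a dedicated license scanner instead.

```python
import re

# Matches the SPDX short-form tag; only the first token is captured.
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*(\S+)")
COPYRIGHT_RE = re.compile(r"copyright", re.IGNORECASE)


def license_summary(files):
    """Summarize license notices.

    `files` is a hypothetical mapping of file path -> file contents.
    """
    covered = 0
    licenses = set()
    for text in files.values():
        header = text[:2000]
        match = SPDX_RE.search(header)
        if match:
            licenses.add(match.group(1))
        # Covered = copyright notice AND license notice in the header.
        if match and COPYRIGHT_RE.search(header):
            covered += 1
    return {
        "license_coverage": covered,
        "all_licenses": sorted(licenses),
        "license_count": len(licenses),
    }


if __name__ == "__main__":
    sample = {
        "a.py": "# Copyright 2017 Example\n# SPDX-License-Identifier: MIT\n",
        "b.py": "# SPDX-License-Identifier: Apache-2.0\n",
        "c.py": "print('hi')\n",
    }
    print(license_summary(sample))
```

Detecting License Conflict is out of scope for this sketch, since it requires knowledge of which licenses are compatible with one another.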

Reasons why community health is assessed

This section collects notes on what the possible goals of assessing health might be, including reasons why metrics are considered for other purposes.

Track Corporate Engagement (is an organization creating value, are organizational goals met, employee contributions)
Risk mitigation
Identify open source projects that need support.
Identify single points of failure (and hopefully prevent them)
Assess value generated through community and engagement
Show that active community management bears desired results. (Measurable outcomes)
Avoid intake of an inactive project, because it is difficult to maintain and might carry unknown bugs and security issues.
Sustainability: “we define a sustainable project as one that exhibits software development and maintenance activity over the long run.” (Chengalur-Smith, Sidorova, & Daniel, 2010, p.660)

Broad categories of indicators that we hear often

Timeliness of maintainers
Diversity of community, contributions, and in code base
Distribution of code contributions (beyond project creator)
Activity level - Responsiveness
Viability (Bus Factor - individual contributors and clustered by employer)
Maturity
Ecosystem health (upstream, downstream, and related projects)
Vanity metrics (might have use in other cases, e.g. stars)
Aggregate project-tree health (combined health metrics of all linked dependencies)
Attentiveness of maintainers to users. See Mailing list

Context: Considerations when evaluating health

Style of project
Programming language
Maturity of project (Projects might seem inactive but rather have fulfilled their goal and community remains responsive to bug reports and security issues, just no new features)
Quality of Ecosystem (metrics of related projects)
Value driven metrics (not just activity)
Development of metrics over time
External users might not be a homogeneous group - consider different metrics
Compare similar projects (manually determine which projects to compare)
Classifications (based on a set of metrics, which projects 'behave' similarly)
Interrelationships between categories of indicators (maturity might be high while activity is low and response rate is up)
Aggregate from repository, to project, to community, (to company)

Other classifications for indicators

We have heard other classifications that we simply list here.

Ideas for working with these classifications are to: 1. generate a uniform classification by merging the different classifications through conversations; 2. create mappings of the indicators to the different classifications.

Community/Code/Risk
Activity/Viability/Risk

References

Bagozzi, R. P., & Dholakia, U. M. (2006). Open Source Software User Communities: A Study of Participation in Linux User Groups. Management Science, 52(7), 1099–1115. Retrieved from http://www.jstor.org/stable/20110583
Baldwin, C. Y., & Clark, K. B. (2006). The Architecture of Participation: Does Code Architecture Mitigate Free Riding in the Open Source Development Model? Management Science, 52(7), 1116–1127. Retrieved from http://www.jstor.org/stable/20110584
Chengalur-Smith, I., Sidorova, A., & Daniel, S. (2010). Sustainability of Free/Libre Open Source Projects: A Longitudinal Study. Journal of the Association for Information Systems, 11(11). Retrieved from http://aisel.aisnet.org/jais/vol11/iss11/5
Grewal, R., Lilien, G. L., & Mallapragada, G. (2006). Location, Location, Location: How Network Embeddedness Affects Project Success in Open Source Systems. Management Science, 52(7), 1043–1056. Retrieved from http://www.jstor.org/stable/20110579
