Collect ideas for GSoC student projects on Improving Kernel Workflows here.
In previous work on MAINTAINERS and process conformance, Pia Eichinger [1] investigated whether patches are integrated by the maintainers that MAINTAINERS names as responsible.
In this project, we are interested in a related (possibly simpler) question: Are the commits integrated into the appropriate integration trees referenced in MAINTAINERS?
The mentor believes that a main difference between considering maintainers and considering integration trees is that the integration tree information in MAINTAINERS contains more errors: with the widespread use of ./scripts/get_maintainer.pl, it is not used as prominently as the personal maintainer information (name and email). So, correcting errors in the integration tree entries of MAINTAINERS is likely to play a larger role (but should also be simpler) than correcting errors in the personal maintainer information.
The answer to the question above can then ultimately be used to identify which integration tree entries should be added to specific sections in MAINTAINERS to best match the actual integration observed in git.
Determining the factors and the metric for what is best is, of course, the challenging task of identifying a suitable heuristic that is:
Background:
A MAINTAINERS section includes references, through its T: entries, to the location of a source configuration management (SCM) tree together with its type, e.g., git, quilt, hg. For each commit, the kernel git history carries the commit's integration tree path, i.e., the information through which SCM trees the commit passed until it was finally integrated into Linus Torvalds' tree.
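As a rough illustration of the two data sources described above, here is a minimal Python sketch. It assumes the usual MAINTAINERS layout (a title line followed by `X:<tab>value` entries, with sections separated by blank lines) and git's default merge subjects of the form `Merge tag '...' of <url>`. The sample section, names and URLs are made up for illustration, and the helper names `parse_trees` and `tree_from_merge_subject` are hypothetical; the real file also has a prose header that this sketch does not handle.

```python
import re

# Hypothetical sample in the MAINTAINERS layout; not taken from the real file.
SAMPLE = (
    "EXAMPLE DRIVER\n"
    "M:\tJane Doe <jane@example.org>\n"
    "L:\tlinux-example@example.org\n"
    "T:\tgit git://git.example.org/driver.git\n"
    "T:\tquilt https://example.org/patches/\n"
    "F:\tdrivers/example/\n"
    "\n"
    "EXAMPLE SUBSYSTEM\n"
    "M:\tJohn Roe <john@example.org>\n"
    "F:\tdrivers/subsystem/\n"
)

# An entry line is a single capital letter, a colon, whitespace, then the value.
ENTRY = re.compile(r"^([A-Z]):\s+(.*)$")

# Assumption: merges use git's default subject, "Merge tag|branch 'x' of <url>".
MERGE = re.compile(r"^Merge (?:tag|branch) '[^']+' of (\S+)")


def parse_trees(text):
    """Map each section title to its list of (scm_type, location) T: entries."""
    trees = {}
    title = None
    for line in text.splitlines():
        if not line.strip():
            title = None  # a blank line ends the current section
            continue
        match = ENTRY.match(line)
        if match is None:
            # Any non-entry line is treated as the title of a new section.
            title = line.strip()
            trees[title] = []
        elif title is not None and match.group(1) == "T":
            scm_type, _, location = match.group(2).partition(" ")
            trees[title].append((scm_type, location.strip()))
    return trees


def tree_from_merge_subject(subject):
    """Return the tree URL named in a merge subject, or None if there is none."""
    match = MERGE.match(subject)
    return match.group(1) if match else None
```

Walking the merge commits between a commit and Linus Torvalds' tree and applying `tree_from_merge_subject` to each subject would then give one approximation of that commit's integration tree path.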
Ideally the references in the MAINTAINERS sections are:
Goal:
We identify and measure the properties above: completeness, soundness and precision.
Then, we use that information to determine which integration tree entries should be added to which specific sections to maximally increase the three properties.
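One possible starting point for this step, sketched in Python below, is to count for each section how many commits were integrated through a tree that the section does not list, and to rank those (section, tree) pairs by commit count. This is only an illustrative heuristic, not the metric the project is meant to settle on. The function names, the URL normalization and the input shapes are all assumptions made for the example.

```python
from collections import Counter


def normalize(url):
    """Reduce a tree URL to a comparable form (assumed normalization):
    drop the scheme, any trailing slash and a trailing '.git'."""
    url = url.split("://", 1)[-1].rstrip("/")
    return url[:-4] if url.endswith(".git") else url


def rank_missing_trees(listed, observed):
    """Rank trees that integrated commits for a section but are not listed there.

    listed:   dict mapping section title -> iterable of tree URLs from T: entries
    observed: iterable of (section title, tree URL) pairs, one pair per commit
    Returns a list of (section, normalized tree, commit count), largest first.
    """
    known = {
        section: {normalize(url) for url in urls}
        for section, urls in listed.items()
    }
    counts = Counter((section, normalize(url)) for section, url in observed)
    missing = [
        (section, tree, count)
        for (section, tree), count in counts.items()
        if tree not in known.get(section, set())
    ]
    # Sort by descending commit count, then by name for a stable order.
    return sorted(missing, key=lambda item: (-item[2], item[0], item[1]))
```

A threshold on the commit count could then bound the number of proposed additions per section.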
To evaluate the adequacy of this method, we can obtain feedback from the responsible kernel maintainers by proposing patches that modify the MAINTAINERS file for the additions that we identified as most relevant, i.e., those that maximally increase the properties. We limit these proposals by a reasonable threshold on the number of patches (to not swamp maintainers initially) and a threshold on relevance (to not send out minor changes that are largely irrelevant to the community).
In this project, we can make use of:
git://git.lwn.net/gitdm.git: gitdm includes some scripts to parse MAINTAINERS and obtain the integration tree path of a commit.
Potential project phases:
Mentor contact: Lukas Bulwahn; lukas.bulwahn-at-gmail.com
References:
Many Linux kernel developers and maintainers use both Patchwork for patch tracking and Gmail or G Suite as their email provider. Patchwork is a working solution for storing patch review state, but some developers may prefer to view and modify patch state from their chosen mailer.
The Clk tree (https://git.kernel.org/pub/scm/linux/kernel/git/clk/linux.git/) does this today by syncing Patchwork patch state to notmuch tags with a local cron job, and then using a popular tool (https://github.com/gauteh/lieer) to sync notmuch tags with Gmail Labels. This three-part solution allows the Clk team to stay synchronized on patch state without ever leaving the comfort of their chosen MUA. The downside of this solution is that the cron job is ugly and runs locally on machines with questionable uptime.
A proposal for GSoC is to build a better mechanism to sync Patchwork patch status with Gmail Labels. This would grow the potential userbase beyond the current set of Patchwork+Gmail+Notmuch users into the larger set of Patchwork+Gmail users. Such a solution might be cloud-based, using tools such as Google Apps Script and the Patchwork REST API.
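To make the data flow concrete, the Python sketch below turns patch records, shaped like the JSON objects the Patchwork REST API returns for its patch list, into a plan of Gmail label operations. It is written in Python rather than Google Apps Script purely for illustration and performs no network access. The `patchwork/` label prefix, the server URL used in the usage example and the function names are hypothetical; the `project` filter, the `msgid` and `state` fields, and Gmail's `rfc822msgid:` search operator are used as the author understands them and should be checked against the respective documentation.

```python
from urllib.parse import urlencode

LABEL_PREFIX = "patchwork/"  # hypothetical label naming scheme


def patches_url(base_url, project, page=1):
    """Build the URL of one page of the patch list for a project."""
    query = urlencode({"project": project, "page": page})
    return f"{base_url.rstrip('/')}/api/patches/?{query}"


def label_plan(patches):
    """Turn patch records into (Gmail search query, label name) operations.

    Each record is assumed to carry 'msgid' (the Message-ID of the patch
    email) and 'state' (the patch state as a short string).
    """
    plan = []
    for patch in patches:
        msgid = patch["msgid"].strip("<>")  # tolerate enclosing angle brackets
        plan.append(
            {
                "query": f"rfc822msgid:{msgid}",
                "label": LABEL_PREFIX + patch["state"],
            }
        )
    return plan
```

For example, `patches_url("https://patchwork.example.org", "linux-clk")` yields the first page of a hypothetical project's patches, and each entry returned by `label_plan` names the message to find in Gmail and the label to apply to it.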
Dependencies: Gmail/G Suite-based email and Patchwork for patch state tracking
Submitted by: Michael Turquette mturquette@baylibre.com (I'm happy to provide pointers to the current scripts used by the Clk team, hosted on GitHub)