Main GSoC Linux Foundation page: How to apply, deadlines, other workgroups, ...
To use Linux in high-integrity regulated environment, such as safety-critical systems, security systems or systems subject to other regulatory norms, it requires to show evidences that Linux has a high software quality. High software quality is roughly assessed by two classes of measurements:
The Google Summer of Code Projects are activities that contribute to those two fields of work.
Primary mentoring contacts: Lukas Bulwahn, lukas.bulwahn at gmail.com; Julia Lawall, julia.lawall at lip6.fr; Nicholas Mc Guire, der.herr at hofr.at; Ralf Ramsauer, ralf.ramsauer at oth-regensburg.de
The SIL2LinuxMP project is an collaborative research project to provide procedures and methods to qualify Linux on a multi-core embedded platform at safety integrity level 2 (SIL2) according to IEC 61508 Ed 2.
Some more information on the SIL2LinuxMP project is available here:
http://www.osadl.org/SIL2LinuxMP.sil2-linux-project.0.html
https://www.youtube.com/watch?v=1eyJ6dAqMmg
The SIL2LinuxMP project has ended at the end of 2018 and the activities around Linux in safety-critical systems will be continued in a new organisational structure in 2019.
The Linux kernel can be compiled with clang recently. This allows to employ clang tools to the Linux kernel source. The clang thread safety analyser is a tool to detect when data is accessed without being locked before, when suitable annotations are provided. The GSoC student project proposal should describe how you would use the clang thread safety Analyser to detect missing locks for on widely used and central kernel data structures in the Linux kernel. The GSoC student project proposal should sketch how this would implemented, it would be best to provide an example and foreseen challenges and explain which the first investigations and design decisions need to be done and provide a first working assumption on those design decisions.
Main contact person for this project proposal: Lukas Bulwahn, lukas.bulwahn at gmail.com
The fuzzer for system calls, syzkaller, is used to detect kernel crashes and issues in the kernel's internal state, but with its current setup, it is not able to detect if a system call provides wrong output, i.e., output that does not follow the POSIX specification. The Linux Test Project has some basic tests that some of the POSIX specification. The GSoC student project proposal should describe how you would implement a fuzzer to fuzz system calls against the POSIX specifications, possibly making use of the aforementioned resources. The GSoC student project proposal should sketch how this would implemented, it would be best to provide an example and foreseen challenges and explain which the first investigations and design decisions need to be done and provide a first working assumption on those design decisions.
Main contact person for this project proposal: Lukas Bulwahn, lukas.bulwahn at gmail.com
The PaStA tool, developed by Ralf Ramsauer, relates patch emails sent on the mailing lists to commits in the git repository. This allows to analyse the development process on the mailing list, measure some interesting metrics on the development, and identify outliers with respect to some central properties.
The issue tracker, https://github.com/lfd/PaStA/issues, provides a good overview of possible student activities in a GSoC project related to PaStA. A student start look into those issues and determine a suitable selection of tasks and goals from that issue tracker that can be handled within the timeframe of the GSoC project. A project proposal should breakdown the goals stated in the issue tracker to a more detailed plan of activities, needed extensions of PaStA and the implementation tasks.
Main contact person for this project proposal: Ralf Ramsauer, ralf.ramsauer at oth-regensburg.de
Coccinelle, developed by Julia Lawall and her group, is a program matching and transformation engine which provides the language SmPL (Semantic Patch Language) for specifying desired matches and transformations in C code. The GSoC project proposal would be to improve the management of complex types and attributes (const, __user, etc) in the program matching and transformation tool Coccinelle. Currently, complex types are represented incorrectly, which makes it impossible to match against them in a general way. Various kinds of attributes are increasingly used in Linux kernel code, and it is important to be able to check these annotations and add them where they are missing. This project is relevant to Linux, Zephyr and many more applications.
Main contact person for this project proposal: Julia Lawall, julia.lawall at lip6.fr
We use a number of tools, checkpatch.pl, coccinelle scripts, sparse, etc. and these tools report certain findings. While the valid ones are addressed by the kernel developers, the invalid tool findings are manually assessed and not acted upon. Over time with addressing the valid findings, the proportion of invalid findings increase compared to newly appearing valid findings, as invalid findings of those tools are not marked and tracked over the various versions.
In this GSoC project, the student should work out methods and tools to track the tool findings and make these tools useful in the Linux kernel community.
Required Knowledge:
Main contact person for this project proposal: Lukas Bulwahn, lukas.bulwahn at gmail.com
Recently, some efforts by Google's kernel developers have made it possible to compile large parts of the Linux kernel with the clang/LLVM compiler. The clang compiler is known to include a number of static analysis methods that identify typical bug patterns and bad code smells, which the compiler indicates to the user by compiler warnings. However, when one compiles the current Linux kernel with clang and switches on to warn on certain warning classes, the compiler shows a tremendous large number of warnings.
The final goal is to find a proper setup that reduces the number of warnings with clang compiler with a suitable post-processing and filtering, such that Linux kernel developers can practically improve their submitted patches during the development with the reported findings of potential bugs, bug patterns and code smells.
A first analysis of specific cases of the clang warnings showed that many of the found warnings fall into a comparatively small set of classes that follow a very specific and characteristic syntactic pattern and hence, many warnings can be assessed once.
As a first step, one needs to determine the relevance of the type of clang warnings for the Linux kernel and its contribution to the avoidance of certain typical bug patterns. If the clang warning type in general considered relevant, the next step is to identify the typical set of subclasses of this warning. Then, we can write the syntactic patterns in coccinelle scripts to automatically group all warnings of the identified subclasses. As a last step, the subclasses can then be assessed to determine if they are generally false reports (so, an irrelevant subclass within the relevant class of warnings), or if this subclass points to code where a bug can be uncovered with a comparatively high probability. Once the different classes have been identified and assessed, further warnings can be automatically marked and these warnings can be presented to the developers weighted by the confidence and relevance of the warnings.
This way, the developers should have useful static analysis report for their patches and the kernel community can gradually improve the different parts of the Linux kernel during the continuous evolution of the overall source code. If time within the project permits, this approach can further be applied to finding of other static analysis tools, e.g., clang tidy or sparse.
Required Knowledge:
Benefit: The student project provides more evidence that kernel development follows a defined process and discussion on identified patterns shows that a controlled and informed process is followed to decide which code patterns should be generally followed and which are enforced by the different tools and their setup.
To work on this project, you should show that you are able to use all the needed tools, especially coccinelle, and that you can properly analyse Linux kernel source code and write a precise and well-written report on your finding. Details for what is required for a successful application is provided on request.
Goal of this project is to develop a tool that describes the conditions when a source file is included in the build of the Linux kernel with a symbolic term, i.e., a boolean formula over the kernel configuration flags and further kernel build variables.
For example, the `kernel/Makefile` includes the following build instructions:
obj-$(CONFIG_FUTEX) += futex.o ifeq ($(CONFIG_COMPAT),y) obj-$(CONFIG_FUTEX) += futex_compat.o endif
Here, the file `kernel/futex.c` is included in the build if and only if CONFIG_FUTEX is set. Similarly, the `kernel/futex_compat.c` is include if and only if CONFIG_FUTEX and CONFIG_COMPAT is set.
The developed tool should be able to analyse this and provide for this example an output that would look like this:
kernel/futex.c <- CONFIG_FUTEX kernel/futex_compat.c <- CONFIG_COMPAT && CONFIG_FUTEX
To achieve this in perfection, it would require a Herculean task: As the kernel build system is make, which is already quite complex by itself and can further invoke arbitrary shell scripts, it would require to write the complete make build system and shell evaluation as symbolic evaluator. This would just much too complex and laborious considering the value of the information that shall be collected.
However, we are convinced that the Makefiles in the kernel tree have only a rather low number of relevant patterns that are relevant to obtain the information that we are interested in. Hence, the symbolic expression can be quite simply be determined by writing a rather simple symbolic evaluator that understands the most prominent patterns and ignores the parts in the Makefile that are not relevant for the symbolic expression.
This symbolic evaluator provides a first good approximation of that information to answer and address further questions. With an increasing number of patterns covered, the approximation of the tool becomes increasingly better. At any point of the development, the quality of the symbolic evaluation can be evaluated to its ground truth, defined by comparing the tool's symbolic prediction against actual kernel builds with a large number of concrete build configurations.
Furthermore, these boolean expressions can be combined with the constraints of the kconfig tool to find potential contradictions in the overall kernel build configuration setup.
Once, this tool works properly on the granularity, we could consider to extend that even further on precise information for each line within a file. Again, we would follow a similarly pragmatic approach parsing a small set of patterns to determine if certain lines are only included under certain build configuration conditions.
The result of this symbolic evaluation is useful in multiple ways:
Required Knowledge:
To work on this project, you should show that you are able to use all the needed tools, and that you can implement a simple interpreter, e.g., you can write a parser, interpreter and static analyser for a small toy programming language. Details for what is required for a successful application is provided on request.
Task: Run Facebook Infer on the Linux kernel source code, write models in Facebook Infer to improve the analysis.
This is a quite challenging project; in the application, we expect that you have proved that you can run Facebook Infer on the complete Linux kernel source code and you obtained a first analysis result.
If you cannot run Facebook Infer on the complete Linux kernel source, you should prove that you understand why Facebook Infer fails on certain parts, suggest different alternative work arounds and solutions, and already applied the work arounds, so that you can run Facebook Infer on the kernel source code with a fully known, understood and limited number of work arounds.
The project proposal should include first technical steps that show how you write models in Infer.
Required Knowledge:
We use a number of tools, checkpatch.pl, coccinelle scripts, sparse, etc. and these tools report certain findings. While the valid ones are addressed by the kernel developers, the invalid tool findings are manually assessed and not acted upon. Over time with addressing the valid findings, the proportion of invalid findings increase compared to newly appearing valid findings, as invalid findings of those tools are not marked and tracked over the various versions.
In this GSoC project, the student should work out methods and tools to track the tool findings and make these tools useful in the Linux kernel community.
Required Knowledge:
The Linux kernel community has a number of tools to ensure the quality of the continuous kernel development. Among these tools are coccinelle, sparse, checkpatch.pl, lock dependency validator, KASAN, syzkaller and many more.
In the GSoC project, the student should find suitable ways to make the Linux developers aware of the tools' findings. There are various way in which this could be implemented, e.g.:
This project idea is quite wide and we expect the student to provide a more specific description of the task to tackle with some evidence that he/she will be able to implement the proposal.
Required Knowledge:
Goal of this project is to create and gather detailed information on bugs in the Linux kernel.To obtain this information, the student will have to analyse commits of the Linux kernel stabilisation process in a large number by manual assessment.
The assessment should allow us to answer these kind of questions:
This project will require to create an ontology of bugs; this will require some work reading general literature and quite some effort assessing and understanding the bug fix commits. You will not mainly program new features, but you will learn a lot about the kernel code by looking and assessing bug fixes and trying to derive general statements from these observations.
Required Knowledge: