GSoC Projects around Enabling Linux in Safety Applications (ELISA)
- - Some background on Enabling Linux in Safety Applications
Student Project Proposals 2019
Student Project Proposals 2018

GSoC Projects around Enabling Linux in Safety Applications (ELISA)

Main GSoC Linux Foundation page: How to apply, deadlines, other workgroups, ...

To use Linux in high-integrity regulated environment, such as safety-critical systems, security systems or systems subject to other regulatory norms, it requires to show evidences that Linux has a high software quality. High software quality is roughly assessed by two classes of measurements:

Observation, Measurement and Assessment of the Software Development Process and Practices
Verification, Analysis and Assessment of the Software Artefact

The Google Summer of Code Projects are activities that contribute to those two fields of work.

Primary mentoring contacts: Lukas Bulwahn, lukas.bulwahn at gmail.com; Julia Lawall, julia.lawall at lip6.fr; Nicholas Mc Guire, der.herr at hofr.at; Ralf Ramsauer, ralf.ramsauer at oth-regensburg.de

Some background on Enabling Linux in Safety Applications

The SIL2LinuxMP project is an collaborative research project to provide procedures and methods to qualify Linux on a multi-core embedded platform at safety integrity level 2 (SIL2) according to IEC 61508 Ed 2.

Some more information on the SIL2LinuxMP project is available here:

http://www.osadl.org/SIL2LinuxMP.sil2-linux-project.0.html

http://events17.linuxfoundation.org/sites/events/files/slides/2017-10-24_ELCE-2017_Bulwahn_Safety-Critical-Linux_v1.2-presentation.pdf

https://www.youtube.com/watch?v=1eyJ6dAqMmg

The SIL2LinuxMP project has ended at the end of 2018 and the activities around Linux in safety-critical systems will be continued in a new organisational structure in 2019.

Student Project Proposals 2019

Applying Clang Thread Safety Analyser to Linux Kernel

The Linux kernel can be compiled with clang recently. This allows to employ clang tools to the Linux kernel source. The clang thread safety analyser is a tool to detect when data is accessed without being locked before, when suitable annotations are provided. The GSoC student project proposal should describe how you would use the clang thread safety Analyser to detect missing locks for on widely used and central kernel data structures in the Linux kernel. The GSoC student project proposal should sketch how this would implemented, it would be best to provide an example and foreseen challenges and explain which the first investigations and design decisions need to be done and provide a first working assumption on those design decisions.

Main contact person for this project proposal: Lukas Bulwahn, lukas.bulwahn at gmail.com

Fuzzing System Calls against POSIX Specifications

The fuzzer for system calls, syzkaller, is used to detect kernel crashes and issues in the kernel's internal state, but with its current setup, it is not able to detect if a system call provides wrong output, i.e., output that does not follow the POSIX specification. The Linux Test Project has some basic tests that some of the POSIX specification. The GSoC student project proposal should describe how you would implement a fuzzer to fuzz system calls against the POSIX specifications, possibly making use of the aforementioned resources. The GSoC student project proposal should sketch how this would implemented, it would be best to provide an example and foreseen challenges and explain which the first investigations and design decisions need to be done and provide a first working assumption on those design decisions.

Main contact person for this project proposal: Lukas Bulwahn, lukas.bulwahn at gmail.com

Patch Trace Analysis with PaStA

The PaStA tool, developed by Ralf Ramsauer, relates patch emails sent on the mailing lists to commits in the git repository. This allows to analyse the development process on the mailing list, measure some interesting metrics on the development, and identify outliers with respect to some central properties.

The issue tracker, https://github.com/lfd/PaStA/issues, provides a good overview of possible student activities in a GSoC project related to PaStA. A student start look into those issues and determine a suitable selection of tasks and goals from that issue tracker that can be handled within the timeframe of the GSoC project. A project proposal should breakdown the goals stated in the issue tracker to a more detailed plan of activities, needed extensions of PaStA and the implementation tasks.

Main contact person for this project proposal: Ralf Ramsauer, ralf.ramsauer at oth-regensburg.de

Handling Complex Types and Attributes in Coccinelle

Coccinelle, developed by Julia Lawall and her group, is a program matching and transformation engine which provides the language SmPL (Semantic Patch Language) for specifying desired matches and transformations in C code. The GSoC project proposal would be to improve the management of complex types and attributes (const, __user, etc) in the program matching and transformation tool Coccinelle. Currently, complex types are represented incorrectly, which makes it impossible to match against them in a general way. Various kinds of attributes are increasingly used in Linux kernel code, and it is important to be able to check these annotations and add them where they are missing. This project is relevant to Linux, Zephyr and many more applications.

Main contact person for this project proposal: Julia Lawall, julia.lawall at lip6.fr

Develop Methods for Tracking Tool Analysis Findings over Time

We use a number of tools, checkpatch.pl, coccinelle scripts, sparse, etc. and these tools report certain findings. While the valid ones are addressed by the kernel developers, the invalid tool findings are manually assessed and not acted upon. Over time with addressing the valid findings, the proportion of invalid findings increase compared to newly appearing valid findings, as invalid findings of those tools are not marked and tracked over the various versions.

In this GSoC project, the student should work out methods and tools to track the tool findings and make these tools useful in the Linux kernel community.

Required Knowledge:

Required: Very good knowledge of C, skill to READ AND UNDERSTAND source code in the Linux kernel in independent work
Required: Very good knowledge of python
Required: Good understanding of git
Recommended: Some understanding of static analysis tools
Recommended: Some understanding of coccinelle

Main contact person for this project proposal: Lukas Bulwahn, lukas.bulwahn at gmail.com

Student Project Proposals 2018

Tailoring clang compiler warnings with coccinelle scripts

Recently, some efforts by Google's kernel developers have made it possible to compile large parts of the Linux kernel with the clang/LLVM compiler. The clang compiler is known to include a number of static analysis methods that identify typical bug patterns and bad code smells, which the compiler indicates to the user by compiler warnings. However, when one compiles the current Linux kernel with clang and switches on to warn on certain warning classes, the compiler shows a tremendous large number of warnings.

The final goal is to find a proper setup that reduces the number of warnings with clang compiler with a suitable post-processing and filtering, such that Linux kernel developers can practically improve their submitted patches during the development with the reported findings of potential bugs, bug patterns and code smells.

A first analysis of specific cases of the clang warnings showed that many of the found warnings fall into a comparatively small set of classes that follow a very specific and characteristic syntactic pattern and hence, many warnings can be assessed once.

As a first step, one needs to determine the relevance of the type of clang warnings for the Linux kernel and its contribution to the avoidance of certain typical bug patterns. If the clang warning type in general considered relevant, the next step is to identify the typical set of subclasses of this warning. Then, we can write the syntactic patterns in coccinelle scripts to automatically group all warnings of the identified subclasses. As a last step, the subclasses can then be assessed to determine if they are generally false reports (so, an irrelevant subclass within the relevant class of warnings), or if this subclass points to code where a bug can be uncovered with a comparatively high probability. Once the different classes have been identified and assessed, further warnings can be automatically marked and these warnings can be presented to the developers weighted by the confidence and relevance of the warnings.

This way, the developers should have useful static analysis report for their patches and the kernel community can gradually improve the different parts of the Linux kernel during the continuous evolution of the overall source code. If time within the project permits, this approach can further be applied to finding of other static analysis tools, e.g., clang tidy or sparse.

Required Knowledge:

required: C and coccinelle, skill to READ AND UNDERSTAND source code in the Linux kernel in independent work, good analytical skills to understand why warnings are reported in certain source code parts, solid documentation skills, good English writing skills with clear precise style
desired: knowledge of compiler construction and static analysis, python, bash

Benefit: The student project provides more evidence that kernel development follows a defined process and discussion on identified patterns shows that a controlled and informed process is followed to decide which code patterns should be generally followed and which are enforced by the different tools and their setup.

To work on this project, you should show that you are able to use all the needed tools, especially coccinelle, and that you can properly analyse Linux kernel source code and write a precise and well-written report on your finding. Details for what is required for a successful application is provided on request.

Symbolic kernel config build evaluator

Goal of this project is to develop a tool that describes the conditions when a source file is included in the build of the Linux kernel with a symbolic term, i.e., a boolean formula over the kernel configuration flags and further kernel build variables.

For example, the `kernel/Makefile` includes the following build instructions:

obj-$(CONFIG_FUTEX) += futex.o
ifeq ($(CONFIG_COMPAT),y)
obj-$(CONFIG_FUTEX) += futex_compat.o
endif

Here, the file `kernel/futex.c` is included in the build if and only if CONFIG_FUTEX is set. Similarly, the `kernel/futex_compat.c` is include if and only if CONFIG_FUTEX and CONFIG_COMPAT is set.

The developed tool should be able to analyse this and provide for this example an output that would look like this:

kernel/futex.c <- CONFIG_FUTEX
kernel/futex_compat.c <- CONFIG_COMPAT && CONFIG_FUTEX

To achieve this in perfection, it would require a Herculean task: As the kernel build system is make, which is already quite complex by itself and can further invoke arbitrary shell scripts, it would require to write the complete make build system and shell evaluation as symbolic evaluator. This would just much too complex and laborious considering the value of the information that shall be collected.

However, we are convinced that the Makefiles in the kernel tree have only a rather low number of relevant patterns that are relevant to obtain the information that we are interested in. Hence, the symbolic expression can be quite simply be determined by writing a rather simple symbolic evaluator that understands the most prominent patterns and ignores the parts in the Makefile that are not relevant for the symbolic expression.

This symbolic evaluator provides a first good approximation of that information to answer and address further questions. With an increasing number of patterns covered, the approximation of the tool becomes increasingly better. At any point of the development, the quality of the symbolic evaluation can be evaluated to its ground truth, defined by comparing the tool's symbolic prediction against actual kernel builds with a large number of concrete build configurations.

Furthermore, these boolean expressions can be combined with the constraints of the kconfig tool to find potential contradictions in the overall kernel build configuration setup.

Once, this tool works properly on the granularity, we could consider to extend that even further on precise information for each line within a file. Again, we would follow a similarly pragmatic approach parsing a small set of patterns to determine if certain lines are only included under certain build configuration conditions.

The result of this symbolic evaluation is useful in multiple ways:

One can find out which files are actually never included in any kernel build. With over 25,000 files, a complex build configuration setup and its continuous rate of change, it is actually quite likely that the kernel repository contains such files due to mistakes in the build configuration.
Kernel developers can quickly determine how they must configure the kernel config that make sure that the files they modified are included in their local test builds.
Knowing the constraints on the kernel config, kernel maintainers can decide how much review is needed and from which developer reviews and tests especially relevant for this patch.
For system maintainers, we already developed a tool that can determine which patches are relevant for a given kernel configuration. With the described tool here, we can extend this information to link which patch is relevant due to which selected kernel configuration in the given kernel configuration. This supports system maintainers to judge which impact an update of the Linux kernel could potentially have on the overall system, and which patches in an update have significant more potential to impact the system than others.

Required Knowledge:

Required: Very good understanding of compiler construction and processing in compilers, e.g., parsing, syntax analysis, code transformations and static analysis.
Required: Good knowledge of make and python
Desired: Basic knowledge of the kernel build system
Desired: Knowledge and experience with some functional programming language

To work on this project, you should show that you are able to use all the needed tools, and that you can implement a simple interpreter, e.g., you can write a parser, interpreter and static analyser for a small toy programming language. Details for what is required for a successful application is provided on request.

Apply Facebook Infer to Linux kernel source code

Task: Run Facebook Infer on the Linux kernel source code, write models in Facebook Infer to improve the analysis.

This is a quite challenging project; in the application, we expect that you have proved that you can run Facebook Infer on the complete Linux kernel source code and you obtained a first analysis result.

If you cannot run Facebook Infer on the complete Linux kernel source, you should prove that you understand why Facebook Infer fails on certain parts, suggest different alternative work arounds and solutions, and already applied the work arounds, so that you can run Facebook Infer on the kernel source code with a fully known, understood and limited number of work arounds.

The project proposal should include first technical steps that show how you write models in Infer.

Required Knowledge:

Required: Very good understanding of static analysis
Required: Very good knowledge of C, skill to READ AND UNDERSTAND source code in the Linux kernel in independent work
Required: Very good knowledge of make and python
Required: Good knowledge and practical experience with OCaml
Required: good analytical skills to understand why static analysis reports findings in certain source code parts, solid documentation skills, good English writing skills with clear precise style
Desired: Basic knowledge of the kernel build system

Develop Methods for Tracking Tool Analysis Findings over Time

In this GSoC project, the student should work out methods and tools to track the tool findings and make these tools useful in the Linux kernel community.

Required Knowledge:

Required: Very good knowledge of C, skill to READ AND UNDERSTAND source code in the Linux kernel in independent work
Required: Very good knowledge of python
Required: Good understanding of git
Recommended: Some understanding of static analysis tools
Recommended: Some understanding of coccinelle

Make Linux kernel community aware of tool findings

The Linux kernel community has a number of tools to ensure the quality of the continuous kernel development. Among these tools are coccinelle, sparse, checkpatch.pl, lock dependency validator, KASAN, syzkaller and many more.

In the GSoC project, the student should find suitable ways to make the Linux developers aware of the tools' findings. There are various way in which this could be implemented, e.g.:

Setting up an infrastructure that runs those tools on patches provided on the mailing list and reports the findings back to the patch authors
Including the tool findings in the elixir development service
Providing means to tag and comment on tool findings in the distributed Linux kernel development

This project idea is quite wide and we expect the student to provide a more specific description of the task to tackle with some evidence that he/she will be able to implement the proposal.

Required Knowledge:

Required: Very good knowledge of a suitable programming language, e.g., python
Required: Good understanding of git
Required: Good knowledge of C
Recommended: Basic understanding how the kernel community works
Recommended: Basic understanding of the kernel tools

Assess, Aggregate and Provide Detailed Statistics on Kernel Bug Fixes

Goal of this project is to create and gather detailed information on bugs in the Linux kernel.To obtain this information, the student will have to analyse commits of the Linux kernel stabilisation process in a large number by manual assessment.

The assessment should allow us to answer these kind of questions:

Which type of problems are identified in the early stage of the stabilisation process?
Which type of problems are identified in the later stages of the stabilisation process?
How are problems identified, by review, by testing, by tools, by production use?

This project will require to create an ontology of bugs; this will require some work reading general literature and quite some effort assessing and understanding the bug fix commits. You will not mainly program new features, but you will learn a lot about the kernel code by looking and assessing bug fixes and trying to derive general statements from these observations.

Required Knowledge:

Required: Very good knowledge of C, skill to READ AND UNDERSTAND source code in the Linux kernel in independent work
Required: Very good understanding of software development processes and software development methods
Required: Good understanding of git

Table of Contents

GSoC Projects around Enabling Linux in Safety Applications (ELISA)

Some background on Enabling Linux in Safety Applications

Student Project Proposals 2019

Applying Clang Thread Safety Analyser to Linux Kernel

Fuzzing System Calls against POSIX Specifications

Patch Trace Analysis with PaStA

Handling Complex Types and Attributes in Coccinelle

Develop Methods for Tracking Tool Analysis Findings over Time

Student Project Proposals 2018

Tailoring clang compiler warnings with coccinelle scripts

Symbolic kernel config build evaluator

Apply Facebook Infer to Linux kernel source code

Develop Methods for Tracking Tool Analysis Findings over Time

Make Linux kernel community aware of tool findings

Assess, Aggregate and Provide Detailed Statistics on Kernel Bug Fixes