LinuxCon Europe 2013 - About Data Races in the Linux Kernel

From Rosalab Wiki
Jump to: navigation, search


October 22, 2013 - Eugene Shatokhin, one of our developers, gave a talk devoted to data races in the Linux kernel modules at LinuxCon Europe.

According to one of possible definitions, a data race is a situation when two or more threads simultaneously access the same memory location and at least one of the threads modifies that memory location.

Such errors could be very hard to find and their consequences may vary from negligible to critical. Hunting the races down is especially important for the Linux kernel, where, for example, the code of the drivers can be executed by many threads at the same time. Now add interrupts and other asynchronous events and remember that the synchronization rules for the data are not always fully described (if described at all)...

Most of the talk was about the tools that can detect data races in the Linux kernel. KernelStrider and RaceHound tools, Eugene was actively involved in development of, were covered in more detail.

KernelStrider collects data about the operation of the kernel component (e.g. a driver) under analysis in runtime. The information about memory accesses, allocations and deallocations, locks and unlocks, etc., is then analyzed in the use space by ThreadSanitizer (Google). The algorithm of searching for races is briefly described here.

KernelStrider may issue false alarms in some cases. For example, false alarms happen when a network driver turns off interrupts in hardware and then accesses some common data without the risk of conflicts with the interrupt handlers.

RaceHound tool allows to check the warnings about data races issued by KernelStrider and find real races among them. RaceHound works as follows.

  • A software breakpoint is placed on an instruction in the binary code of the driver that may be involved in a data race.
  • When the breakpoint triggers, RaceHound determines the address of the memory area which is about to be accessed by the instruction. Then it places a hardware breakpoint to track the accesses of the needed kinds (writes only or both reads and writes) to that memory area.
  • A small delay is made before the execution of the instruction.
  • If some other thread accesses that memory area during the delay, the hardware breakpoint will trigger and RaceHound will report a race.

That is, KernelStrider plays the part of a "detective" or an "analyst" here that narrows the range of "suspects" - the fragments of the code possibly involved in data races. RaceHound is then a covert monitoring system tracking these suspects. If it catches a suspect "red-handed", everything is clear, the "crime" (data race) is confirmed.

There were many questions asked both during the talk and after it. Among other things, the audience was interested in the following.

  • Plans to support ARM (the mentioned tools currently work on x86 only) - may be later, not in the near future.
  • Situations when KernelStrider misses data races - yes, this is possible in some cases, mostly due to how ThreadSanitizer works as well as due to sometimes inaccurate event ordering rules used.
  • Support for suspend/resume in KernelStrider - yes, KernelStrider operates during suspend and resume too.
  • Support for analysis of the kernel proper rather than the modules in KernelStrider and RaceHound - not implemented at the moment.
  • Instrumentation of the code to be analyzed during the compilation rather than during its loading as KernelStrider does now - may be beneficial, it is actually one of the future directions of the development.
  • and so on.

The slides and notes for this talk are available at the project page, in "Talks and Slides" section.

[ List view ]Comments

Very interesting story, thanks for the report.

Please login to comment.