Kernel

From Rosalab Wiki
Revision as of 15:30, 13 March 2014 by NicCo (Talk | contribs)


ROSA Kernel

This text was started on 27 Feb 2014. It is only a very first draft, an early work in progress.

When complete, it will contain the main specs of each flavour and suggestions for use.


ROSA has a large number of kernel series available:

  • Kernel ONE with three flavour series: basic, nrj, nrjQL
Kernel ONE is the most complex and complete source. With its many configs and features, it can configure and generate many different specialized flavours. It is also called (69) Yin/Yang for its completeness.
  • Kernel Vanilla with one basic vanilla flavour plus two vanilla + nrj based flavours
  • Kernel RT with one basic rt flavour plus one rt based flavour (rtQL)


The sources are shared with OpenMandriva Linux, so the same sources can generate a large number of different kernel flavours for OpenMandriva Lx http://openmandriva.org/, ROSA Linux http://www.rosalab.com/, and their spin-offs such as MagOS http://www.magos-linux.ru/, MoonDrake http://moondrake.org/, Unity Linux http://unity-linux.org/, ...


The Kernel ONE

The basic, nrj and nrjQL kernel flavours

nrj and nrjQL are two codenames used to distinguish the two advanced flavour series from the basic ones.

These three flavour series can all be generated from a single kernel source, called Kernel ONE.

[1]

History about the nrj and nrjQL kernels

The link below is where it all started:

http://mib.pianetalinux.org/forum/viewtopic.php?f=38&t=3463


Main configs and features

The same source can configure and generate three levels of kernel flavours:


1> basic flavours use the 'old mdv model': a fully featured set of kernels with simple common configs and the same old mdv names:

examples: kernel-desktop-i586, kernel-desktop, kernel-server, ...


2> nrj flavours contain the same patches and features as 1> plus a few extended features:

  • Full Preemption
  • RCU preemption
  • RCU boosting
  • BFQ (disk I/O sched) enabled by default (instead of standard CFQ)
examples: kernel-nrj-desktop, kernel-nrj-laptop, kernel-nrj-realtime, ...

The ROSA Linux distro has chosen kernel-nrj-desktop as its default kernel flavour, but once the OS has been installed on the hard disk, any of the other kernel flavours can be used.


3> nrjQL flavours contain the same patches and features as 2> nrj plus a few extended features:

  • C.K. patches, designed to improve system responsiveness and interactivity
  • BFS (Process scheduler), enabled by default (instead of standard CFS)
  • UKSM (Ultra Kernel Samepage Merging), a memory de-duplicator, enabled by default
  • TOI (Tux On Ice), suspend-to-disk or hibernate, enabled by default
examples: kernel-nrjQL-desktop, kernel-nrjQL-laptop, kernel-nrjQL-realtime, ...

The OpenMandriva Lx distro has chosen kernel-nrjQL-desktop as its default kernel flavour, but once the OS has been installed on the HD or SSD, any of the other specialized kernel flavours can be installed and used. Some examples:

  • If you have a laptop PC and need better CPU cooling and energy saving, we suggest nrjQL-laptop
  • If you have a laptop / netbook PC and need better energy saving, the lightest flavour is nrjQL-netbook
  • If you need the most responsive system for audio applications, we suggest installing nrjQL-realtime
  • If you need to prepare a server for your LAMP applications, we suggest installing nrjQL-server
  • If you need to prepare a game server for CounterStrike or other FPS games, there is nrjQL-server-games
  • If you need a high-performance server for encoding/decoding or building sources, there is nrjQL-server-computing


Table of configs and features with descriptions

Main flavour configs and specs (from basic to nrjQL)

Flavour names          | basic configs             | nrj model                 | nrjQL model               | misc
kernel                 | Hertz | TkLs | Gov. | Xen | C.Pr | R.Pr | R.bs | Disk | C.K. | Prcs | MemD | Hybr | #1  | #2  | #3
desktop                | 1000  | yes  | OnD  | no  |      |      |      |      |      |      |      |      | yes |     |
laptop                 | 300   | yes  | OnD  | no  |      |      |      |      |      |      |      |      | yes |     |
netbook                | 250   | yes  | OnD  | no  |      |      |      |      |      |      |      |      | yes |     |
server                 | 100   | yes  | OnD  | yes |      |      |      |      |      |      |      |      | yes |     |
nrj-desktop            | 1000  | yes  | OnD  | no  | yes  | yes  | yes  | BFQ  |      |      |      |      | yes | yes |
nrj-laptop             | 300   | yes  | OnD  | no  | yes  | yes  | yes  | BFQ  |      |      |      |      | yes | yes |
nrj-netbook            | 250   | yes  | OnD  | no  | yes  | yes  | yes  | BFQ  |      |      |      |      | yes | yes |
nrj-realtime           | 2000  | no   | Perf | no  | yes  | yes  | yes  | BFQ  |      |      |      |      | yes | yes |
nrjQL-desktop          | 1000  | no   | OnD  | no  | yes  | yes  | yes  | BFQ  | yes  | BFS  | UKSM | TOI  | yes | yes | yes
nrjQL-laptop           | 300   | yes  | OnD  | no  | yes  | yes  | yes  | BFQ  | yes  | BFS  | UKSM | TOI  | yes | yes | yes
nrjQL-netbook          | 250   | yes  | OnD  | no  | yes  | yes  | yes  | BFQ  | yes  | BFS  | UKSM | TOI  | yes | yes | yes
nrjQL-realtime         | 2000  | no   | Perf | no  | yes  | yes  | yes  | BFQ  | yes  | BFS  | UKSM | TOI  | yes | yes | yes
nrjQL-server           | 100   | yes  | OnD  | yes | yes  | yes  | yes  | BFQ  | yes  | BFS  | UKSM | TOI  | yes | yes | yes
nrjQL-server-computing | 100   | yes  | OnD  | yes | yes  | yes  | yes  | BFQ  | yes  | BFS  | UKSM | TOI  | yes | yes | yes
nrjQL-server-games     | 3000  | yes  | Perf | no  | yes  | yes  | yes  | BFQ  | yes  | BFS  | UKSM | TOI  | yes | yes | yes


LEGEND

Hertz=Scheduler frequency - TkLs=TickLess mode - Gov.=Power Governor - Xen=XEN Server

C.Pr=CPU Preempt - R.Pr=RCU Preempt - R.bs=RCU Boost - Disk=Disk I/O scheduler

C.K.=Con Kolivas patches - Prcs=Process scheduler - MemD=Memory Deduplicator - Hybr=Hibernation/Suspend

misc1 basic available features: 3rd party, AUFS3, OverlayFS, NdisWrapper

misc2 nrj available features: 3rd party, AUFS3, OverlayFS, NdisWrapper, ReiserFS4, ESF, ...

misc3 nrjQL available features: 3rd party, AUFS3, OverlayFS, NdisWrapper, ReiserFS4, ESF, ...



Hertz > Timer Wheel, Jiffies and HZ (or, the way it was) http://elinux.org/Kernel_Timer_Systems#Timer_Wheel.2C_Jiffies_and_HZ_.28or.2C_the_way_it_was.29


The original kernel timer system (called the "timer wheel") was based on incrementing a kernel-internal value (jiffies) every timer interrupt. The timer interrupt becomes the default scheduling quantum, and all other timers are based on jiffies. The timer interrupt rate (and jiffy increment rate) is defined by a compile-time constant called HZ. Different platforms use different values for HZ. Historically, the kernel used 100 as the value for HZ, yielding a jiffy interval of 10 ms. With 2.4, the HZ value for i386 was changed to 1000, yielding a jiffy interval of 1 ms. Recently (2.6.13) the kernel changed HZ for i386 to 250. (1000 was deemed too high).
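
The relation between HZ and the jiffy interval described above is simple arithmetic (interval = 1000 ms / HZ). The short Python sketch below only illustrates that formula for the HZ values listed in the flavour table; the function name is made up for this example.

```python
def jiffy_interval_ms(hz: int) -> float:
    """Return the timer tick period in milliseconds for a given HZ value."""
    if hz <= 0:
        raise ValueError("HZ must be a positive integer")
    return 1000.0 / hz


# HZ values taken from the flavour table on this page
for hz in (100, 250, 300, 1000, 2000, 3000):
    print(f"HZ={hz:<5} jiffy interval = {jiffy_interval_ms(hz):.3f} ms")
```

A higher HZ gives a shorter scheduling quantum (more responsive, more timer interrupts); a lower HZ gives fewer interrupts and less overhead.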



Tickless Mode / Dynamic ticks (TkLs) http://elinux.org/Kernel_Timer_Systems


Tickless kernel, dynamic ticks or NO_HZ is a config option that enables a kernel to run without a regular timer tick. The timer tick is a timer interrupt that is usually generated HZ times per second, with the value of HZ being set at compile time and varying between around 100 to 1500. Running without a timer tick means the kernel does less work when idle and can potentially save power because it does not have to wake up regularly just to service the timer. The configuration option is CONFIG_NO_HZ and is set by Tickless System (Dynamic Ticks), on the Kernel Features configuration menu.
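
To see whether a given kernel was built tickless, one can look for CONFIG_NO_HZ in its build configuration. The Python sketch below parses config text in the usual KEY=value format; the sample text is a made-up illustration, and the location of the real config file (often something like /boot/config-<version>) is an assumption that may differ on your system.

```python
def parse_kernel_config(text: str) -> dict:
    """Parse kernel .config style text into a {option: value} dict.

    Comment lines such as '# CONFIG_FOO is not set' are skipped.
    """
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            options[key] = value.strip('"')
    return options


# Hypothetical sample, not copied from a real ROSA/OpenMandriva kernel
SAMPLE_CONFIG = """\
# CONFIG_HZ_100 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
CONFIG_NO_HZ=y
"""

opts = parse_kernel_config(SAMPLE_CONFIG)
print("tickless:", opts.get("CONFIG_NO_HZ") == "y")
print("HZ:", opts.get("CONFIG_HZ"))
```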



Power Governors (Gov.) https://www.kernel.org/doc/Documentation/cpu-freq/governors.txt


The OpenMandriva desktop kernels use OnD (OnDemand); the most responsive realtime flavours use Perf (Performance).


Ondemand


The CPUfreq governor "ondemand" sets the CPU frequency depending on the current usage. To do this the CPU must have the capability to switch the frequency very quickly.

Performance


The CPUfreq governor "performance" sets the CPU statically to the highest frequency within the borders of scaling_min_freq and scaling_max_freq.
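
The active governor is also exposed by the cpufreq sysfs interface (the scaling_governor file next to the scaling_min_freq and scaling_max_freq files mentioned above). A minimal Python sketch for reading it follows; the function name and the sysfs_root parameter are made up for this example, and the file only exists when a cpufreq driver is loaded.

```python
from pathlib import Path
from typing import Optional


def current_governor(cpu: int = 0,
                     sysfs_root: str = "/sys/devices/system/cpu") -> Optional[str]:
    """Return the active cpufreq governor for one CPU, or None if unavailable."""
    path = Path(sysfs_root) / f"cpu{cpu}" / "cpufreq" / "scaling_governor"
    try:
        return path.read_text().strip()
    except OSError:
        # No cpufreq driver loaded, no such CPU, or not running on Linux
        return None


print("cpu0 governor:", current_governor(0))
```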



Xen hypervisor (Xen) http://en.wikipedia.org/wiki/Xen http://www.xenproject.org/


Xen support in the server flavours allows the kernel to boot in a paravirtualized environment under the Xen hypervisor.



CPU Preemption (C.Pr) http://en.wikipedia.org/wiki/Preemption_%28computing%29


"Preemptive multitasking allows the computer system to more reliably guarantee each process a regular "slice" of operating time. It also allows the system to rapidly deal with important external events like incoming data, which might require the immediate attention of one or another process."

For the user, CPU preemption makes the operating system more reactive and responsive to input, and more effective at multitasking and when the PC is used as a multimedia workstation.



RCU Preemption (R.Pr) http://www.rdrop.com/users/paulmck/RCU/whatisRCU.html


https://lwn.net/Articles/541037/

RCU has four modes; the nrj model is configured as shown in item 4 (SMP && PREEMPT: TREE_PREEMPT_RCU):

1. !SMP && !PREEMPT: TINY_RCU, which is used for embedded systems with tiny memories (tens of megabytes).

2. !SMP && PREEMPT: TINY_PREEMPT_RCU, for deep sub-millisecond realtime response on small-memory systems.

3. SMP && !PREEMPT: TREE_RCU, which is used for high performance and scalability on server-class systems where scheduling latencies in milliseconds are acceptable.

4. SMP && PREEMPT: TREE_PREEMPT_RCU, which is used for systems requiring high performance, scalability, and deep sub-millisecond response.

So, if you currently use TINY_PREEMPT_RCU, please go forth and test TREE_PREEMPT_RCU on your hardware and workloads.
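
The four combinations listed above form a small lookup table keyed on SMP and PREEMPT. The Python sketch below only restates that list; the function name is made up for this example.

```python
def rcu_flavour(smp: bool, preempt: bool) -> str:
    """Return the RCU implementation selected for a given SMP/PREEMPT combination."""
    table = {
        (False, False): "TINY_RCU",          # 1. !SMP && !PREEMPT
        (False, True): "TINY_PREEMPT_RCU",   # 2. !SMP && PREEMPT
        (True, False): "TREE_RCU",           # 3. SMP && !PREEMPT
        (True, True): "TREE_PREEMPT_RCU",    # 4. SMP && PREEMPT (nrj model)
    }
    return table[(smp, preempt)]


print(rcu_flavour(smp=True, preempt=True))
```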



RCU Boosting (R.bs) http://cateee.net/lkddb/web-lkddb/RCU_BOOST_PRIO.html


This option specifies the real-time priority to which long-term preempted RCU readers are to be boosted. If you are working with a real-time application that has one or more CPU-bound threads running at a real-time priority level, you should set RCU_BOOST_PRIO to a priority higher than the highest-priority real-time CPU-bound thread. The default RCU_BOOST_PRIO value of 1 is appropriate in the common case, which is real-time applications that do not have any CPU-bound threads.

Some real-time applications might not have a single real-time thread that saturates a given CPU, but instead might have multiple real-time threads that, taken together, fully utilize that CPU. In this case, you should set RCU_BOOST_PRIO to a priority higher than the lowest-priority thread that is conspiring to prevent the CPU from running any non-real-time tasks. For example, if one thread at priority 10 and another thread at priority 5 are between themselves fully consuming the CPU time on a given CPU, then RCU_BOOST_PRIO should be set to priority 6 or higher.
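
The two rules above can be written as a small worked example. This Python sketch is only an illustration of the text: the function names are made up, and the upper limit of 99 is an assumption based on the usual real-time priority range.

```python
MAX_RT_PRIO = 99  # assumed upper bound for RCU_BOOST_PRIO


def _checked(prio: int) -> int:
    if prio > MAX_RT_PRIO:
        raise ValueError("no valid priority above the given threads")
    return prio


def boost_prio_for_cpu_bound(thread_prios) -> int:
    """Threads that are each CPU-bound: boost above the highest of them."""
    if not thread_prios:
        return 1  # default, no CPU-bound real-time threads
    return _checked(max(thread_prios) + 1)


def boost_prio_for_conspiring(thread_prios) -> int:
    """Threads that only together saturate a CPU: boost above the lowest."""
    if not thread_prios:
        return 1
    return _checked(min(thread_prios) + 1)


# Example from the text: priorities 10 and 5 sharing one CPU
print(boost_prio_for_conspiring([10, 5]))
```

For the example in the text (threads at priority 10 and 5 that together consume one CPU), the second function gives the "6 or higher" threshold.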



Disk I/O scheduler (Disk) http://algo.ing.unimo.it/people/paolo/disk_sched/ http://lwn.net/Articles/275978/


We are currently using BFQ v7r2, waiting for v7r3, which promises to double the throughput under random load.

"BFQ is a proportional-share storage-I/O scheduler that also supports hierarchical scheduling with a cgroups interface. Here are the main nice features of BFQ.

Low latency for interactive applications: according to our results, whatever the background load is, for interactive tasks the storage device is virtually as responsive as if it was idle."

Just a video: http://www.youtube.com/watch?feature=player_embedded&v=J-e7LnJblm8
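
To check which I/O scheduler a disk is using, the kernel exposes a sysfs file per block device (/sys/block/<device>/queue/scheduler) that lists the available schedulers with the active one in square brackets. The Python sketch below parses such a line; the sample line and the function name are made up for this example.

```python
import re
from typing import Optional


def active_io_scheduler(scheduler_line: str) -> Optional[str]:
    """Return the scheduler shown in [brackets], or None if none is marked."""
    match = re.search(r"\[([^\]]+)\]", scheduler_line)
    return match.group(1) if match else None


# Hypothetical content of /sys/block/sda/queue/scheduler on an nrj kernel
sample_line = "noop deadline cfq [bfq]"
print(active_io_scheduler(sample_line))
```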



C.K. > Con Kolivas patches


http://users.on.net/~ckolivas/kernel/

These are patches designed to improve system responsiveness and interactivity with specific emphasis on the desktop, but suitable to any workload.



BFS - Process Scheduler (Prcs)


http://ck.kolivas.org/patches/bfs/3.0/3.12/3.12-sched-bfs-444.patch

BFS is the Brain Fuck Scheduler. It was designed to be forward looking only, make the most of lower spec machines, and not scale to massive hardware. ie, it is a desktop orientated scheduler, with extremely low latencies for excellent interactivity by design rather than "calculated", with rigid fairness, nice priority distribution and extreme scalability within normal load levels.



UKSM - Memory Deduplicator (MemD)


http://www.phoronix.com/scan.php?page=news_item&px=MTEzMTI http://kerneldedup.org/en/projects/uksm/

The Ultra KSM (UKSM) patch-set for the Linux kernel continues to be maintained for providing transparent full-system memory de-duplication for Linux. UKSM is about de-duplication of data in system memory rather than being another de-duplicating file-system. UKSM can work for KVM virtualization as well to reduce memory usage for guest virtual machines and there is also a KernelDeDup project for supporting Xen virtualization too, in an effort to reduce memory pressure.



TuxOnIce - TOI (Hybr)


http://en.wikipedia.org/wiki/TuxOnIce http://tuxonice.nigelcunningham.com.au/

TuxOnIce (formerly known as Suspend2) is an implementation of the suspend-to-disk (or hibernate) feature which is available as patches for the 2.6 Linux kernel. During the 2.5 kernel era, Pavel Machek forked the original out-of-tree version of swsusp (then at approximately beta 10) and got it merged into the vanilla kernel, while development continued in the swsusp/Suspend2/TuxOnIce line. TuxOnIce includes support for SMP, highmem and preemption.



AUFS3 (misc#1) http://en.wikipedia.org/wiki/Aufs


http://aufs.sourceforge.net/ http://sourceforge.net/p/aufs/aufs3-standalone/ci/master/tree/

aufs (AnotherUnionFS in version 1, but advanced multi layered unification filesystem since version 2) implements a union mount for Linux file systems.

Developed by Junjiro Okajima in 2006, aufs is a complete rewrite of the earlier UnionFS. It aimed to improve reliability and performance, but also introduced some new concepts, like writable branch balancing, and other improvements - some of which are now implemented in the UnionFS 2.x branch.



OverlayFS (misc#1)


http://sourceforge.net/projects/olfs/

A FUSE filesystem module that transparently merges the contents of several directories into a single directory.



Commands and Tools

Some command-line tools are generated from the same kernel SRPM:

cpupower
to monitor and / or change the Power Governor profile
kernel-header
it contains the needed headers for some applications
perf
to execute some interesting performance comparison tests



To manage the energy profiles, we can use the command-line tools, mainly cpupower and perf.

The available governors are: conservative, userspace, powersave, ondemand, performance.

All kernel flavours for OpenMandriva and ROSA are configured with the OnDemand governor by default, except the realtime flavours and server-games, which need the highest responsiveness and are configured with the Performance governor.


If these tools are not installed, you can install them now:

# urpmi cpupower perf



We can see the list of command options

[root@localhost ~]# cpupower
Usage:  cpupower [-d|--debug] [-c|--cpu cpulist ] <command> [<args>]
Supported commands are:
        frequency-info
        frequency-set
        idle-info
        idle-set
        set
        info
        monitor
        help



To query the current configuration (in the case below, the governor is ondemand):

[root@localhost ~]# cpupower frequency-info
analyzing CPU 0:
  driver: acpi-cpufreq
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: 10.0 us.
  hardware limits: 1000 MHz - 1.67 GHz
  available frequency steps: 1.67 GHz, 1.33 GHz, 1000 MHz
  available cpufreq governors: conservative, userspace, powersave, ondemand, performance
  current policy: frequency should be within 1000 MHz and 1.67 GHz.
                  The governor "ondemand" may decide which speed to use
                  within this range.
  current CPU frequency is 1.67 GHz (asserted by call to hardware).
  boost state support:
    Supported: no
    Active: no



If you prefer to enable "powersave" (best CPU cooling and longest battery life, but the worst performance):

[root@localhost ~]# cpupower frequency-set -g powersave
Setting cpu: 0
Setting cpu: 1



Now we check that powersave is active (and it is):

[root@localhost ~]# cpupower frequency-info
analyzing CPU 0:
  driver: acpi-cpufreq
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: 10.0 us.
  hardware limits: 1000 MHz - 1.67 GHz
  available frequency steps: 1.67 GHz, 1.33 GHz, 1000 MHz
  available cpufreq governors: conservative, userspace, powersave, ondemand, performance
  current policy: frequency should be within 1000 MHz and 1.67 GHz.
                  The governor "powersave" may decide which speed to use
                  within this range.
  current CPU frequency is 1000 MHz (asserted by call to hardware).
  boost state support:
    Supported: no
    Active: no



To make the setting permanent, edit the config file:

/etc/sysconfig/cpupower

Replace 'ondemand' with your preferred governor, then save the file and reboot.



Other Kernels

Vanilla Kernels flavours

OpenMandriva has other kernels available, generated from different SRPMS.

The vanilla flavours are prepared with the most basic vanilla features and configs, with no third-party patches added.

What is a Vanilla Kernel?
It is the basic kernel source available from http://www.kernel.org


  • kernel-vanilla
the basic vanilla 
  • kernel-vanilla-nrj-desktop
vanilla plus the full nrj preemption mode for the CPU and RCU tree, RCU boosting
  • kernel-vanilla-nrj-laptop
vanilla plus the full nrj preemption mode for the CPU and RCU tree, RCU boosting, but at 300Hz


RT Kernels flavours

Kernel RT and the -rt flavours are based on the -rt (PREEMPT_RT) patchset, started by Ingo Molnar and maintained by Thomas Gleixner.

What is an RT Kernel?
It is the basic vanilla kernel source plus the -rt patches
https://rt.wiki.kernel.org/


  • kernel-rt
the basic -rt flavour
  • kernel-rtQL
rt plus other features, mostly from QL, such as AUFS3, BFQ, REISERFS4, TOI, UKSM


How to Install

When you choose the kernel flavour you want to install, we suggest also installing the kernel source rpm or, at least, the -devel package of that flavour.

example:

kernel-nrjQL-laptop + kernel-source

otherwise you may install:

kernel-nrjQL-laptop + kernel-nrjQL-laptop-devel


To simplify installation and get further updates automatically, we can install everything through the metapackages named "-latest".

example:

kernel-nrjQL-laptop-latest + kernel-source-latest

otherwise you may install:

kernel-nrjQL-laptop-latest + kernel-nrjQL-laptop-devel-latest
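
The package naming pattern used in the examples above can be summarised in a few lines of Python. This is only a sketch of the naming convention as shown on this page; the function name and its parameters are made up for this example, and it does not check that a package actually exists in the repositories.

```python
def kernel_packages(flavour: str, with_source: bool = True, latest: bool = True) -> list:
    """Build the pair of package names to install for one kernel flavour.

    with_source=True  -> kernel + kernel-source
    with_source=False -> kernel + the flavour's own -devel package
    latest=True       -> use the "-latest" metapackages
    """
    suffix = "-latest" if latest else ""
    kernel = f"kernel-{flavour}{suffix}"
    if with_source:
        companion = f"kernel-source{suffix}"
    else:
        companion = f"kernel-{flavour}-devel{suffix}"
    return [kernel, companion]


print(" ".join(kernel_packages("nrjQL-laptop", with_source=False)))
```

The resulting names can then be passed to urpmi.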



VMWare Virtualization

The current versions of the widespread virtualization software VMware Workstation 10.0.1 and VMware Player 6.0.1 work fine with the stock kernel only up to kernel 3.11.

With kernel 3.12, the vmci and vsock modules do not build when the kernel has NAMESPACES with UIDGID enabled.
With kernel 3.13 there is a further build problem, this time with the vmnet module, which needs to be fixed properly.

We searched for unofficial patches from other communities, but with no luck.

MIB has prepared solutions and source patches for these problems.

The patched archives (vmci.tar, vsock.tar) for the kernel 3.12 are available here:
http://mib.pianetalinux.org/MIB/rosa2012.1/others/vmware-kernel312/

The patched archive (vmnet.tar) for kernel 3.13.6+ can be downloaded from:
http://mib.pianetalinux.org/MIB/rosa2012.1/others/vmware-kernel313/
Of course, you also need to replace the two archives patched for kernel 3.12 listed above.

Make a safety backup of the original contents of the source folder:

/usr/lib/vmware/modules/source/

then put our patched archives into the same folder:

/usr/lib/vmware/modules/source/

Now, all the needed vmware modules should be built properly!
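
The backup-then-replace step above can be scripted. The Python sketch below is a minimal illustration, not an official MIB tool: the function name and the backup location are made up for this example, and running it against /usr/lib/vmware/modules/source/ requires root.

```python
import shutil
from pathlib import Path


def backup_and_replace(source_dir, patched_archives, backup_dir):
    """Copy source_dir to backup_dir, then copy the patched archives into source_dir.

    copytree refuses to run if backup_dir already exists, so an earlier
    backup of the original archives is never overwritten.
    """
    source_dir = Path(source_dir)
    shutil.copytree(source_dir, Path(backup_dir))
    replaced = []
    for archive in patched_archives:
        target = source_dir / Path(archive).name
        shutil.copy2(archive, target)
        replaced.append(target.name)
    return replaced


# Example call (paths other than the source folder are hypothetical):
# backup_and_replace("/usr/lib/vmware/modules/source",
#                    ["vmci.tar", "vsock.tar", "vmnet.tar"],
#                    "/root/vmware-source-backup")
```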



Who is the developer and maintainer

NicCo (Nicolò Costanza)

Kernel designer, engineer, maintainer and tester for ROSA Desktop and OpenMandriva Lx OSes

System admin, and moderator for the User Community of ROSA and OpenMandriva Linux OSes

MIB Blog > http://mib.pianetalinux.org/blog - MIB Forum > http://mib.pianetalinux.org/forum