Urpmi, rpmdrake and automated dependency resolution
A peculiarity of ROSA repositories (and of the repositories of many other Linux-based systems) is that the dependencies of some packages can be resolved in more than one way, because several packages sometimes provide the same feature. For clarity, let's look at an example.
Our repositories contain Tesseract OCR, which needs language-specific data packs to work with particular languages. Tesseract currently supports more than 70 languages, and a separate data package exists in our repositories for each of them. It is unlikely that a user will need all of these languages; for most users, support for their native language is enough. So when installing tesseract, we have to decide somehow which language packs to install. To indicate that there is a choice, the following trick is performed at the package level: the tesseract package itself requires tesseract-language, and every package with language-specific data provides tesseract-language. When installing tesseract, urpmi (or Rpmdrake) detects that the tesseract-language requirement can be resolved in multiple ways. If this is the case, urpmi and Rpmdrake either ask the user to choose or make the selection automatically (if the --auto option is passed to urpmi or the corresponding checkbox is ticked in the Rpmdrake settings).
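The Requires/Provides trick can be illustrated with a small Python sketch. This is only a toy model, not urpmi code: the PACKAGES table, the tesseract-tgl name and both helper functions are made up for illustration.

```python
# Toy package metadata (illustrative only, not real repository data).
PACKAGES = {
    "tesseract": {"requires": ["tesseract-language"], "provides": ["tesseract"]},
    "tesseract-eng": {"requires": [], "provides": ["tesseract-language"]},
    "tesseract-rus": {"requires": [], "provides": ["tesseract-language"]},
    "tesseract-tgl": {"requires": [], "provides": ["tesseract-language"]},
}


def providers(capability, packages):
    """Return the sorted names of all packages that provide `capability`."""
    return sorted(
        name for name, meta in packages.items() if capability in meta["provides"]
    )


def ambiguous_requirements(package, packages):
    """Map each requirement of `package` with several providers to its candidates."""
    result = {}
    for requirement in packages[package]["requires"]:
        candidates = providers(requirement, packages)
        if len(candidates) > 1:
            result[requirement] = candidates
    return result


if __name__ == "__main__":
    # tesseract-language has three providers, so a choice has to be made.
    print(ambiguous_requirements("tesseract", PACKAGES))
```

Any requirement that shows up in the result is one where the installer must either ask the user or pick a provider on its own.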
By the way, in recent versions automatic dependency selection in Rpmdrake is turned on by default.
So how does the automatic dependency resolution work?
The algorithms used to resolve dependencies automatically are buried inside the urpmi sources and are not known to most users, or even to most developers. Below we'll try to clarify this topic.
Previously, in most situations where it had to choose between several packages, urpmi just selected the first package that provided the necessary requirement. It is often said that the choice was made randomly, but this is not true. Before making a choice, urpmi ordered all packages by their identifiers (used by urpmi itself) and then selected the one with the lowest identifier. However, there is no guarantee that the same packages have the same identifiers on different systems, since these identifiers are assigned when the repository metadata is scanned, and different users can have different repositories enabled. As a result, when the '--auto' option is used to install the same package on different systems, package requirements can be resolved in different ways. This is actually not a big problem; for example, if some package requires Java and works fine with both OpenJDK 6 and OpenJDK 7, then it makes no difference to the user whether urpmi selects OpenJDK 6 or 7. Nevertheless, we would like to have more uniform behavior across systems, and at the same time to make this behavior smarter, so that we avoid at least some of the issues caused by incorrect package "Provides" records (remember that maintainers can make mistakes from time to time, and so can the automated dependency generator used by rpmbuild).
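Why the old "lowest identifier" rule gives different results on different systems can be shown with a toy model. The sketch below simply assumes that identifiers are handed out in the order packages are met while scanning the enabled repositories; the real assignment inside urpmi may differ in detail, and the repository contents and function names are invented for the example.

```python
# Toy repositories: each one is just a list of package names (illustrative).
MAIN = ["java-openjdk-1.7.0"]
EXTRA = ["java-openjdk-1.6.0"]

# Packages that provide the 'java' requirement in this toy model.
JAVA_PROVIDERS = ["java-openjdk-1.6.0", "java-openjdk-1.7.0"]


def assign_ids(enabled_repos):
    """Give each package an increasing id in repository scan order (assumed)."""
    ids = {}
    for repo in enabled_repos:
        for name in repo:
            if name not in ids:
                ids[name] = len(ids)
    return ids


def pick_lowest_id(candidates, ids):
    """Old behaviour: among the available candidates, take the lowest id."""
    available = [name for name in candidates if name in ids]
    return min(available, key=ids.__getitem__)


if __name__ == "__main__":
    # System A has only MAIN enabled; system B has EXTRA and MAIN enabled.
    print(pick_lowest_id(JAVA_PROVIDERS, assign_ids([MAIN])))
    print(pick_lowest_id(JAVA_PROVIDERS, assign_ids([EXTRA, MAIN])))
```

The choice is deterministic on each system, but it depends on which repositories are enabled and in what order they were scanned, which is why it looks random from the outside.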
To achieve both of these goals, we have implemented selection of the package whose name has the longest common substring with the requested requirement. That is, if some package requires 'java', then urpmi will choose either 'java-openjdk-1.6.0' or 'java-openjdk-1.7.0', but not something like 'cacao' or other exotic Java machines. Of course, some ambiguity can still remain (as the 'java' example shows), but in real life this approach turned out to give better results than selecting the first package that provides the necessary dependency.
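The idea can be sketched in a few lines of Python. This is not the urpmi implementation (urpmi is not written in Python); the function names are ours, and returning all tied candidates is our way of making the remaining ambiguity visible.

```python
def longest_common_substring_len(a, b):
    """Length of the longest contiguous substring shared by `a` and `b`."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the common run that ended at the previous characters.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def best_matches(requirement, candidates):
    """Return every candidate whose name shares the longest substring."""
    scores = {
        name: longest_common_substring_len(requirement, name) for name in candidates
    }
    top = max(scores.values())
    return sorted(name for name, score in scores.items() if score == top)


if __name__ == "__main__":
    print(best_matches("java", ["cacao", "java-openjdk-1.6.0", "java-openjdk-1.7.0"]))
```

Here 'cacao' shares only a single character with 'java', while both OpenJDK packages share the whole word, so 'cacao' is ruled out and the two OpenJDK packages remain tied.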
In addition, in certain cases it really does matter to the user which package urpmi selects. The most common case is the selection of packages that depend on the system language settings. In particular, in our tesseract example, users would like to get support for their native language (or at least for English) and not for something like Old Spanish. To resolve such cases correctly, we have to solve two tasks:
- tell urpmi that package selection should be based on the system language settings;
- on the urpmi side, take the system language into account and choose a package that matches that language.
The first task is solved by making language-specific packages require the corresponding locales-* package, so tesseract-rus requires locales-ru, tesseract-eng requires locales-en and so on.
The second task is solved by urpmi in two steps. First, it checks which locales-* packages besides locales-en are installed in the system. If several such packages are installed, the system locale (the LANG environment variable) is checked as well. We have implemented this "double pass" because our users are very different and set up their systems in different ways. Some of them work in a completely localized system, while others prefer to see menus and help pages in English even when working with documents in their native language. We are still not sure about the best way to satisfy them all; ideas are welcome!
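Here is a rough Python sketch of this "double pass", following the description above rather than the urpmi sources. The function names, the tesseract-dan package name and the fallback to locales-en when nothing else matches are our assumptions.

```python
def preferred_locale_package(installed, lang):
    """Pick the locales-* package that should drive the selection."""
    # Pass 1: look at installed locales-* packages other than locales-en.
    non_english = sorted(
        p for p in installed if p.startswith("locales-") and p != "locales-en"
    )
    if len(non_english) == 1:
        return non_english[0]
    if len(non_english) > 1:
        # Pass 2: several candidates, so consult LANG (e.g. "ru_RU.UTF-8").
        code = lang.split(".")[0].split("_")[0]
        wanted = "locales-" + code
        if wanted in non_english:
            return wanted
    # Assumption: fall back to English when nothing else matches.
    return "locales-en" if "locales-en" in installed else None


def choose_language_pack(candidates, installed, lang):
    """`candidates` maps a package name to the locales-* package it requires."""
    locale_pkg = preferred_locale_package(installed, lang)
    for name in sorted(candidates):
        if candidates[name] == locale_pkg:
            return name
    return None


if __name__ == "__main__":
    packs = {
        "tesseract-rus": "locales-ru",
        "tesseract-eng": "locales-en",
        "tesseract-dan": "locales-da",  # illustrative package name
    }
    installed = {"locales-ru", "locales-en", "locales-da", "locales-ja"}
    print(choose_language_pack(packs, installed, "ru_RU.UTF-8"))
```

In practice LANG would be read from the environment (for instance with os.environ.get("LANG", "")); it is passed in explicitly here so that the example does not depend on the machine it runs on.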
The improvements mentioned above also affect the case where urpmi asks the user to select a package. In this case, the top of the list of alternatives is always occupied by the package that urpmi itself would select if launched with the '--auto' option. Moreover, when selecting language-specific packages, urpmi won't display a package if the corresponding locales-* package is not installed. For example, I have locales-ru, locales-en, locales-da and locales-ja installed in my system. The system locale is set to Russian. In this case urpmi tesseract gives me the following options:
(Sorry for the Russian screenshot, but with an English locale it's hard to illustrate the whole magic.)
As we can see, the Russian language package is at the top, followed by packages that depend on installed locales other than the English one, and at the end come the packages that require locales-en (Tagalog and Cherokee depend on locales-en because there are no appropriate locales for them, but heb-com should depend on locales-he; heh, we have found a bug while writing this article :)). Packages that depend on other locales are not listed at all.
Note that some of these changes are implemented only in urpmi and not in Rpmdrake. The thing is that we are working on a new software center that will replace Rpmdrake; stay tuned!
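To recap how the list of alternatives is filtered and ordered when urpmi asks the user, here is a small Python sketch. It is only a model of the behavior described in this article: the function name and some package names are made up, and alphabetical order inside each group is an assumption.

```python
def order_alternatives(candidates, installed, auto_choice):
    """Filter and order language packs the way the list is described above.

    `candidates` maps a package name to the locales-* package it requires.
    """
    # Hide packages whose locales-* package is not installed.
    visible = {name: loc for name, loc in candidates.items() if loc in installed}

    def rank(name):
        if name == auto_choice:
            return 0  # what '--auto' would pick goes first
        if visible[name] != "locales-en":
            return 1  # then packs tied to installed non-English locales
        return 2      # packs that require locales-en go last

    return sorted(visible, key=lambda name: (rank(name), name))


if __name__ == "__main__":
    packs = {
        "tesseract-rus": "locales-ru",
        "tesseract-eng": "locales-en",
        "tesseract-dan": "locales-da",
        "tesseract-jpn": "locales-ja",
        "tesseract-deu": "locales-de",  # hidden: locales-de is not installed
        "tesseract-tgl": "locales-en",
    }
    installed = {"locales-ru", "locales-en", "locales-da", "locales-ja"}
    for name in order_alternatives(packs, installed, "tesseract-rus"):
        print(name)
```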
(Footnote to "Old Spanish" above: unless the user is a scientist specializing in this area.)