Tuesday, May 29, 2007

Reorganizing data in the icon cache #2

Same plot as before, but with a far longer session...

Reorganizing data in the icon cache

I was complaining earlier about the organization of the data in the gtk+ icon cache. So I decided to reorder these data and simulate the cache access. Here is the graph.





Few notes:


  1. "Default layout" is what you get currently in Gnome. "Reorganized" is what I would like to see. Note that the reorganization does not use any statistical information from the recorded session.
  2. "All" vs. "Meta-data": The current reorganization only apply to the meta-data. There is no reorganization on the picture data themselves. So for each data organization I counted the number of page accessed when accessing "all" kind of data, or "meta-data" only. The second one being included in the first one.

I think that once we will start extracting statistical information from the sessions, the numbers can be crunched again. The idea would be to separate "hot" meta-data (often accessed) from "cold" meta-data (almost never when not never accessed), similar to what some compilers are doing with "hot" traces of code and "cold" traces of code in order to reduce I-TLB miss.

Tuesday, May 22, 2007

GTK+Icon Cache #4

How many different pages did gtkiconcache access in order to load the first n icons from my (simple) session? Here is the graph:

This not only include the pages where the icons live, but the pages gtkiconcache had to access to find the icons.

The red line shows the actual number of pages that have been loaded. The green one is the "ideal" situation where all the icons are stored continuously and not metadata are needed to find these icons. In other words: 1+sum(icon size) / page size.

The code to generate this graph needs to be polished a little bit before beeing published. Should be soon though.

Monday, May 21, 2007

GTK+Icon Cache #3

Assumptions, assumptions...

I was of course wrong. I started to write a parser to read the content of the icon cache. My curiosity was raised when I started to dump what was storead and where... and I discovered that the chained elements of the hash table are spread accross the cache, near of their related group of icon. I mean group as each chain element is not related to a specific icon but to a set of related icons that have the same name but stored in a different directory.

On one side, this is not a big deal as whenever you have a match between the icon name and the chain element, the page where the icon is stored is likely to have been loaded. On the other side...

  • If the chain element you are comparing to is not the one related to the icon you are looking for, you loaded a block into memory for nothing.
  • The current layout assume the same icons for different directory are grouped together in the cache file. This assumptions is no longer true if we decide to reorder the icons in the cache file based on their usage...

The first point is supporting my idea that chain elements should be reordered based on their access frequency. I computed for the default gnome theme the values for "How many chains containx x elements?"

Length of Chains# of chains
015
137
240
335
417
59
64
73
83

This means that in the worst case, you may end-up accessing 7 pages that you should not have. For one icon that's a lot of wasted disk access.

Friday, May 18, 2007

GTK+Icon Cache #2

As I announced, here is a first version of the remapper.

I should warn that I did not really test it completely. at least, with no remapping rules, the input and output files are the same,a s by default, an icon with no remapping definition does not have its offset altered.

This means it is the responsability of whichever produce the mapping file to ensure every icon is defined. Otherwise, two icons may end up at the same offset and the statistics displayed will be wrong.

Next step: I need a gtkicon cache reader so that I can not only extract the exact list of icons, but the size of the metadata too so that algorithms output offset actually starting after the metadata. This is under the assumption that the size is not affected by the order of the icons. This is assuming too that there is two areas: The hash table and then the icons, not a mix of both.

GTK+Icon Cache #1

I looked a little bit closer to Federico's suggestions about GTK+Icon Cache

I wrote a small script, split-collated.py, which split the collated file to one file per cache.

My next step is to write a remap script, which takes two input files:

  • A mapping file which associate icons to an offset
  • A file which contains the log access to the cache, i.e. one that has been created by my script

The output will looks similar to the second file, except that the original offsets in the cache file will be substituted with the values from the mapping file.

I'll plan on working on different algorithms. They will output a mapping file, so that we can use with this script the same plotting scripts we use for the original logs.

Other thoughts about the Icon Cache but not related to the reorganization of the icons in the cache file:

  • Nautilus (he is not the only one) likes to access to the same icon several times (2-3 times). It is sometimes the same icon (pair directory/icon-name), or the same icon-name but different directory. However, the meta-data ar organised as a hash-table whose hash is computed based on the icon name. Once the icon entry is found, then you iterate through the different directories existing for this icon. This means that if you get several requests for the same icon name, but on a different directory, you would not have to find again the entry in the hash table if you keep the "last accessed" icon. You would just have to re-iterate through the directory list.
  • In addition to reorganizing the physical position of the icon in the cache file, you can in addition reorganize the position of the icon entry in the hash table based on its frequency. Once you get the hash line, it is a linked list. Let's put the most frequently used icons at the begining of the list.
  • Likewise, if you know which icon is likely to be loaded after the current one, you could add this information in the metadata of the icon with a direct offset to the information related to these icons in the metadata. This would spare the computation/search time to find the information

Note that for the first and last points, the overhead related to compare with these stored values may be higher than the actual gain that it may sometimes provide. But worth having a look. The second point do not add overhead, except at the creation of the cache.