I wrote a small script, split-collated.py, which splits the collated file into one file per cache.
My next step is to write a remap script, which takes two input files:
- A mapping file which associates icons with an offset
- A file which contains the access log of the cache, i.e. one that has been created by my script
The output will look similar to the second file, except that the original offsets in the cache file will be replaced with the values from the mapping file.
I plan to work on different algorithms. Each of them will output a mapping file, so that after running this script we can use the same plotting scripts we use for the original logs.
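To make the remap step concrete, here is a minimal sketch. The file formats are assumptions of mine, not the real ones: I assume each mapping line is `<icon> <new_offset>` and each log line starts with `<offset> <icon>`, with whitespace-separated fields and no spaces inside the icon key. The function names `load_mapping` and `remap` are hypothetical too.

```python
def load_mapping(lines):
    """Parse '<icon> <new_offset>' lines into a dict (assumed format)."""
    mapping = {}
    for line in lines:
        fields = line.split()
        # Skip blank lines, comments and anything that is not a pair.
        if len(fields) != 2 or fields[0].startswith("#"):
            continue
        icon, offset = fields
        mapping[icon] = int(offset)
    return mapping


def remap(log_lines, mapping):
    """Yield log lines with the offset field replaced by the mapped value.

    Assumes field 0 is the original offset and field 1 is the icon key.
    Lines whose icon is not in the mapping are passed through unchanged.
    """
    for line in log_lines:
        fields = line.split()
        if len(fields) >= 2 and fields[1] in mapping:
            fields[0] = str(mapping[fields[1]])
        yield " ".join(fields)


if __name__ == "__main__":
    # Tiny in-memory example; real input would come from the two files.
    demo_mapping = load_mapping(["48x48/apps/foo 0", "16x16/apps/bar 4096"])
    for out in remap(["8192 48x48/apps/foo", "120 16x16/apps/bar"], demo_mapping):
        print(out)
```

Because the output keeps the shape of the input log, it should be possible to feed it to the existing plotting scripts unchanged, whatever algorithm produced the mapping.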
Other thoughts about the icon cache, unrelated to the reorganization of the icons in the cache file:
- Nautilus (and it is not the only one) likes to access the same icon several times (2-3 times). Sometimes it is exactly the same icon (same directory/icon-name pair), sometimes the same icon name in a different directory. However, the metadata is organised as a hash table whose hash is computed from the icon name. Once the icon entry is found, you iterate through the directories that exist for this icon. This means that if you get several requests for the same icon name but for different directories, you would not have to look up the entry in the hash table again if you kept the "last accessed" icon. You would just have to iterate through the directory list again.
- Besides reorganizing the physical position of the icons in the cache file, you could also reorder the icon entries within the hash table based on their frequency of use. Each hash bucket is a linked list, so let's put the most frequently used icons at the beginning of the list.
- Likewise, if you know which icons are likely to be loaded after the current one, you could store this in the icon's metadata as direct offsets to the metadata of those icons. This would save the computation/search time needed to find them.
Note that for the first and last points, the overhead of comparing against these stored values may sometimes be higher than the gain they provide. Still, they are worth a look. The second point does not add any overhead, except when the cache is created.
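The first two points can be illustrated with a toy model. This is only a sketch and not the real icon cache code: the class `ToyIconCache`, its byte-sum hash and the `chain_steps` counter are all hypothetical, chosen to make the effect measurable. Chains are sorted by frequency once, at construction (the second point), and a "last accessed" entry is kept to skip the hash lookup on repeated names (the first point).

```python
class ToyIconCache:
    """Toy hash table: icon name -> chain entry holding its directories."""

    def __init__(self, icons, freq=None, n_buckets=8):
        # icons: dict mapping icon name -> list of directories (assumed shape)
        # freq:  dict mapping icon name -> observed access count (optional)
        freq = freq or {}
        self.n_buckets = n_buckets
        self.buckets = [[] for _ in range(n_buckets)]
        for name, dirs in icons.items():
            self.buckets[self._bucket(name)].append((name, list(dirs)))
        # Second point: most frequently used icons first in each chain.
        # This cost is paid only once, when the cache is built.
        for chain in self.buckets:
            chain.sort(key=lambda entry: -freq.get(entry[0], 0))
        self.last = None       # first point: the "last accessed" entry
        self.chain_steps = 0   # number of chain entries visited so far

    def _bucket(self, name):
        # Deterministic toy hash, NOT the hash used by the real cache.
        return sum(name.encode("utf-8")) % self.n_buckets

    def lookup(self, name, directory):
        """Return True if the icon exists in the given directory."""
        if self.last is not None and self.last[0] == name:
            entry = self.last  # same name as last time: skip the hash table
        else:
            entry = None
            for candidate in self.buckets[self._bucket(name)]:
                self.chain_steps += 1
                if candidate[0] == name:
                    entry = candidate
                    break
            if entry is None:
                return False
            self.last = entry
        # Only the directory list has to be walked again.
        return directory in entry[1]
```

Replaying an access log through `lookup` with and without the frequency table, and comparing `chain_steps`, would give a rough idea of whether the reordering is worth it. Note that the extra string comparison against `self.last` on every call is exactly the overhead mentioned above.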