A more comprehensive LocalSearch index for GNOME 50

Hi all,

It has been a long-standing issue that file search in Nautilus (and GNOME Shell) only has a fast, recursive index for the XDG folders, and has to resort to in-place crawling when searching in other locations. This is both slower and inferior in features (e.g. no content search).

In order to improve the search experience, there are two changes lined up in LocalSearch for GNOME 50:

The intent behind these is to have an index for fast filesystem search available more universally, with fewer places where it makes sense to fall back to in-place recursive crawling. These changes are not being made casually: a list of improvements too long to summarize here has landed over the last few years to accommodate the extra workload.

There are still a number of filters in effect so that LocalSearch doesn't go unhinged on the filesystem: git repositories are still ignored, as are directories containing a .nomedia file, and only a few selected plain-text file extensions get their content indexed. A higher workload is expected compared to methodically using the XDG folders, but not a substantially higher one.

Users who might find the new behavior undesirable have plenty of mechanisms to avoid it, from .nomedia files, to configuration changes, to inhibiting the indexer.
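
As an illustration of the first of those mechanisms: since directories containing a .nomedia file are skipped, opting a folder out amounts to creating an empty file with that name in it. A minimal sketch in POSIX shell; the helper name exclude_from_index is hypothetical, not part of LocalSearch:

```shell
#!/bin/sh
# Hypothetical helper (not a LocalSearch command): opt one or more
# directories out of indexing by creating an empty .nomedia file in each.
exclude_from_index() {
    for dir in "$@"; do
        # Refuse anything that is not an existing directory.
        if [ ! -d "$dir" ]; then
            echo "not a directory: $dir" >&2
            return 1
        fi
        touch "$dir/.nomedia" || return 1
    done
}
```

For example: $ exclude_from_index ~/Music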

We intend to merge these changes soon after 3.10.x (GNOME 49) is branched. If you want to test the first change in advance, you can run:

$ gsettings set org.freedesktop.Tracker3.Miner.Files index-single-directories "[]"
$ gsettings set org.freedesktop.Tracker3.Miner.Files index-recursive-directories "['$HOME']"
$ localsearch reset --filesystem
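
The commands above overwrite the current indexed folders configuration, so testers may want to save it first. A minimal sketch in POSIX shell, using only gsettings get and gsettings set; the function names and the backup file path are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers: save and restore the two folder settings changed above.
SCHEMA=org.freedesktop.Tracker3.Miner.Files
KEYS="index-single-directories index-recursive-directories"

# Write one "key<TAB>value" line per setting into the file given as $1.
backup_index_settings() {
    : > "$1" || return 1
    for key in $KEYS; do
        value=$(gsettings get "$SCHEMA" "$key") || return 1
        printf '%s\t%s\n' "$key" "$value" >> "$1"
    done
}

# Re-apply every "key<TAB>value" line written by backup_index_settings.
restore_index_settings() {
    while IFS="$(printf '\t')" read -r key value; do
        gsettings set "$SCHEMA" "$key" "$value" || return 1
    done < "$1"
}
```

Usage: run $ backup_index_settings ~/localsearch-settings.bak before the commands above, and $ restore_index_settings ~/localsearch-settings.bak to go back. To return a key to its default value instead, gsettings reset org.freedesktop.Tracker3.Miner.Files index-recursive-directories also works.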

There is a wide variety of scenarios to consider for hard numbers (initial indexing vs. an already indexed tree, indexing on battery vs. on mains power, some file formats being more expensive than others, …), as well as a wide range of drive speeds (HDDs, SD cards, SSDs, NVMe drives, …) and disk sizes these days. As such, it does not make sense to discuss hard numbers, only ballpark figures if anything. Also, a reminder that indexing a file (tree) is most often a one-time operation meant to speed up later searches, not something experienced every day.

Constructive discussion on this change is welcome.


Hello, Garnacho!

Are these settings intended to be tested on GNOME 49 already, or only with the alpha/beta/rc versions of GNOME 50? For context, I'm running GNOME 49 on Debian testing.

Also, what should testers watch out for?

Hi Leonardo,

LocalSearch 3.10.x is already branched; these changes will only be merged into the main branch (likely this week) and will be available in a release starting with 3.11.alpha. None of the changes are intended for versions before 3.11, unless testers enable some of them with the instructions provided.

However, for reasons not worth bringing up, Debian stayed with localsearch 3.8 (for reference, the version paired with GNOME 47), which I consider too old for this kind of experiment. I would advise testing with 3.10.x, as it is closest to what 3.11 will be.

At the moment, we are mostly looking for data points from users who consider themselves above average in terms of number of files (photography hobbyists, audiophiles, documentalists, people who wind up with TBs of downloaded files, etc.). I am particularly interested in the long-term experience after the files are already indexed, e.g. whether localsearch settles into an idle state within a reasonable time after a GNOME session restart, or whether it remains noticeable throughout the session (e.g. making fans spin during regular interaction with the filesystem: copying, cloning, downloading, etc.).

Measuring fresh indexing after a reset (e.g. the wall clock time of localsearch reset --filesystem; localsearch status --follow), together with the disk characteristics and a count of the indexed files from localsearch status --stat FileDataObject, would yield anecdotal data. But as said, there is too wide a variety of scenarios, so I wouldn't be concerned about those numbers except in extreme cases.
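
For testers who want to record such a data point anyway, a minimal sketch in POSIX shell. The function names are hypothetical; findmnt and lsblk from util-linux are assumed to be installed; and the timing only makes sense under the assumption, implied above, that localsearch status --follow returns once indexing has finished.

```shell
#!/bin/sh
# Hypothetical helpers for collecting an (anecdotal) indexing data point.

# Run a command, print its wall-clock duration in whole seconds,
# and return the command's own exit status.
elapsed_seconds() {
    start=$(date +%s)
    "$@"
    rc=$?
    end=$(date +%s)
    echo "$((end - start))"
    return "$rc"
}

# Print the device and filesystem type backing a directory (default: $HOME),
# then the rotational flag (1 = spinning disk), size and model of each whole disk.
describe_storage() {
    findmnt -n -o SOURCE,FSTYPE -T "${1:-$HOME}"
    lsblk -d -o NAME,ROTA,SIZE,MODEL
}
```

A possible sequence:
$ describe_storage "$HOME"
$ elapsed_seconds sh -c 'localsearch reset --filesystem; localsearch status --follow'
$ localsearch status --stat FileDataObject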


Hi, I'm one of those "above average" users 🙂 I have a particularly large music collection, a fair amount of photography, and entirely too many backed-up ISOs from obsolete Linux distributions.

I've actually completely disabled Tracker indexing of my Music directory specifically on several of my computers, since my music library is so large that it was causing significant indexing load. In particular, I was running into issues where a bunch of files were triggering indexing errors, resulting in a very annoying ongoing re-indexing load on files that were mostly unchanged. This is something I could revisit with newer LocalSearch versions to see if it has improved, but in general I've found that I don't need my Music directory indexed, since I tend to use music players like Rhythmbox, which maintain and index their own libraries.

I also found in the past that this Music re-indexing load was maintained even when my laptop was on battery, significantly reducing battery life.

Another thing to consider is that on my desktop system, several of my XDG directories (Music, Documents, Downloads, Videos) are actually mounted over NFS from my NAS. This slows down indexing quite a bit, and it also means that file change notifications are not reliable, so some updates are missed (usually until a LocalSearch restart or a manual index trigger). I wonder whether it might be a good idea for LocalSearch to detect if a directory is on a network filesystem and, by default, disable indexing of it.
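
On the user side, the folders that live on a network filesystem can be spotted before deciding whether to keep them in the indexed folders configuration. A minimal sketch in POSIX shell, assuming util-linux findmnt; the function names and the list of filesystem types treated as "network" are illustrative assumptions, not something LocalSearch defines:

```shell
#!/bin/sh
# Hypothetical check: does a filesystem type name denote a network filesystem?
# The list is an illustrative assumption and not exhaustive.
is_network_fstype() {
    case "$1" in
        nfs|nfs4|cifs|smb3|smbfs|ceph|glusterfs|fuse.sshfs) return 0 ;;
        *) return 1 ;;
    esac
}

# Print each given directory whose backing filesystem is a network one.
network_dirs() {
    for dir in "$@"; do
        fstype=$(findmnt -n -o FSTYPE -T "$dir") || continue
        if is_network_fstype "$fstype"; then
            echo "$dir"
        fi
    done
}
```

For example: $ network_dirs ~/Music ~/Documents ~/Downloads ~/Videos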

Anyways, I’ll consider re-enabling indexing on a few of my systems once I get a newer LocalSearch installed, and see if I still run into some of the problems that I’ve previously had.

Regarding indexing removable devices, how will read-only devices (e.g. DVD-ROMs) be handled? I have some which I honestly wouldn't mind being indexed (especially with the ability to query the index with the disc unmounted, so I can find which disc has a file), but that would require the index to be stored in my home dir rather than on the media.

Hi @kepstin,

I will definitely appreciate some more quantitative data if you get around to it. I'm sure everyone who disables LocalSearch has their own back story, but I'm certainly less interested in how they became non-users. To all the "LocalSearch did this, did that" I can only say: try a recent version. The snapshot of that memory you preserve from the past is not representative and will not guide future development of LocalSearch.

Also note that there's no shame in changing the configuration; the indexed folders have been configurable in Settings for a long, long time, in one form or another. It is just not possible to please everybody.

This change might suit your tastes better after all. A long-standing behavior of LocalSearch is that it will not recurse into folders configured as mountpoints in fstab. With full homedir indexing, you would find that these folders are not indexed unless you explicitly bring them back into the configuration.
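
Bringing such a mount point back means listing it next to the home directory in index-recursive-directories, whose value is a GVariant string array. A minimal sketch in POSIX shell with a hypothetical helper, to_strv, that builds that value from plain paths, escaping backslashes and single quotes as the GVariant text format requires; /mnt/photos is a placeholder path:

```shell
#!/bin/sh
# Hypothetical helper: turn plain paths into a GVariant string array,
# e.g. ['/home/me', '/mnt/photos'], suitable for gsettings set.
to_strv() {
    # An empty array needs an explicit type annotation in GVariant text form.
    if [ "$#" -eq 0 ]; then
        echo "@as []"
        return 0
    fi
    out=""
    for path in "$@"; do
        # Escape backslashes first, then single quotes.
        esc=$(printf '%s' "$path" | sed -e 's/\\/\\\\/g' -e "s/'/\\\\'/g")
        if [ -n "$out" ]; then
            out="$out, "
        fi
        out="$out'$esc'"
    done
    printf '[%s]\n' "$out"
}
```

For example:
$ gsettings set org.freedesktop.Tracker3.Miner.Files index-recursive-directories "$(to_strv "$HOME" /mnt/photos)"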

We don't need to architecture-astronaut the LocalSearch configuration. With the behavior described above, that's what the indexed folders configuration is for. Another thing to keep in mind is that other users expect and/or rely on data from NFS mounts being available for search, so that possibility must remain available to them.

You are describing the existing behavior. This is not ideal for multiple reasons:

  • Indexed data from removable volumes expires after 3 days, in order to avoid inordinate database growth. This also limits the usefulness of keeping an index, unless it's frequently used media.
  • These locations still come up in results; everybody has to check tracker:available to find out whether results come from available volumes.
  • There may be clashes in mount paths between different media, or the same volume may end up on different mount paths between runs. That results in the indexer having to synchronize the database with the filesystem contents more often.

LocalSearch is actively looking to shed all handling of optical discs (e.g. the index-optical-discs setting was deprecated in 3.10). Anybody who wants to search or curate offline volumes will need to use some other, more dedicated tool.

Cheers,

Carlos

Yeah, I'll definitely have to try it again once I get a sufficiently recent version installed (I'm on 3.8.x at the moment). I am still mostly concerned about my laptop, so that's where I'll concentrate my testing, since that's where the change will be biggest: I've previously disabled a lot of indexing there due to battery/heat concerns.

I was curious about this, so I took a peek at the code, and it looks like you're using GioUnix MountEntry. I'm pretty sure that uses information about current mounts from the kernel (e.g. /proc/mounts or /proc/self/mountinfo), which includes mounts that aren't listed in fstab. But it doesn't treat nested subvolumes as mountpoints. Seems reasonable.
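
To see which folders under the home directory this would affect, the kernel's mount table can be inspected directly. A minimal sketch in POSIX shell, assuming (as the reading of the code above suggests, not confirmed here) that the skipped folders are the mount points currently known to the kernel; the function name mounts_under is hypothetical, and paths the kernel escapes in mountinfo (e.g. a space as \040) are printed undecoded:

```shell
#!/bin/sh
# Hypothetical helper: print mount points strictly nested under a directory,
# read from field 5 (the mount point) of /proc/self/mountinfo, or of the
# file given as $2. Octal escapes such as \040 are left undecoded.
mounts_under() {
    dir=${1%/}
    awk -v d="$dir/" 'index($5, d) == 1 && $5 != d { print $5 }' \
        "${2:-/proc/self/mountinfo}"
}
```

For example: $ mounts_under "$HOME". Nested subvolumes that are not mounted on their own do not appear in that table, which matches the observation above.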