Desktop search update, summer 2023

sthursfield · July 31, 2023, 2:52pm

Hello again,

At GUADEC 2023 we had a well-attended desktop search BoF.
Here’s an overview of the plans that came out of the meeting.

Nobody is working anywhere close to full time on desktop search right now,
so there is no timescale for any of these plans. If you’d like it to happen,
the best thing is to get involved in development.

There are new designs for Shell search, and we agreed on a new ‘libgnomesearch’ library
as a way forward. I made a prototype of that here.
There is a clear path forward for this, and some unanswered questions too. Roughly,
the plan would be:

1. Reimplement D-Bus search providers in the library and migrate GNOME Shell
   to use that instead of the current JavaScript code.
2. Measure and improve performance. Some ideas:
   - higher and lower priority search providers
   - disabling internet-based providers if there’s no internet
   - increasing the type-ahead delay when on low battery
3. Implement a new filesystem search provider that speaks directly to libtracker-sparql.
   Switch Nautilus to use that. Remove the Nautilus D-Bus search provider.
4. See what other core D-Bus search providers can be merged into libgnomesearch.
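
To make the performance ideas in the plan a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration only: the `Provider` type, the priority convention, and the delay values are hypothetical and are not part of libgnomesearch or GNOME Shell.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    """Hypothetical description of a search provider."""
    name: str
    priority: int              # assumed convention: lower number is queried first
    needs_network: bool = False


def providers_to_query(providers, online):
    """Skip internet-based providers when offline, then order by priority."""
    usable = [p for p in providers if online or not p.needs_network]
    return sorted(usable, key=lambda p: p.priority)


def typeahead_delay_ms(on_low_battery, base_ms=150, low_battery_ms=400):
    """Use a longer type-ahead delay on low battery (values are made up)."""
    return low_battery_ms if on_low_battery else base_ms


PROVIDERS = [
    Provider("web-results", priority=2, needs_network=True),
    Provider("files", priority=0),
    Provider("calendar", priority=1),
]

if __name__ == "__main__":
    for provider in providers_to_query(PROVIDERS, online=False):
        print(provider.name)
    print(typeahead_delay_ms(on_low_battery=True), "ms")
```

The real decisions (what counts as "online", which battery threshold to use) would come from the platform, which this sketch leaves out.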

Other things we would like to work on soon:

- Start wider testing of tracker-miner-fs indexing the entire home directory.
  The plan is to roll this out on an opt-in only basis to see how it performs.
- Proceed with renaming of “tracker” and “tracker-miner-fs”.
- Simplify our 3 websites: make https://tracker.gnome.org/ the main homepage,
  and keep Sign in · GitLab only for docs. (This might change after we rename,
  I suppose…)

A separate discussion was about Contacts and Calendar possibly using libtracker-sparql to store data, to avoid some issues with Evolution Data Server when used from Flatpak apps. The ideal would be if EDS itself can store & expose its data using libtracker-sparql. A simpler option is having a system service that keeps a read-only mirror of EDS data in a tracker-sparql database. Either of these would allow a stable D-Bus interface for querying data with a Flatpak portal and fulltext search. Corentin already has a prototype for doing this with contacts.

Let us know what you think!
Sam

johnd4dg · August 6, 2023, 7:10pm

@sthursfield

Hi Sam,

My 2 cents as a user and developer.

I come from a Windows “background”, where a few years ago I started to use “Wox” (before it was hijacked by MS to be part of their “Toys” under the new name “Run”).
See:

Original: “GitHub - Wox-launcher/Wox: A cross-platform launcher that simply works”
MS’s “version”: “PowerToys Run utility for Windows | Microsoft Learn”

These apps use a distributed search architecture, and its biggest drawback is that scoring is handled by each provider. The order of results was not as expected, because each provider could give higher scores to its own results and move them up the list.
The search results were always “95%” of what was expected, but it was always annoying when the item that should have been in 1st place appeared in 2nd or 3rd place.

In the end, I just gave up, and I wrote my own app, where all providers send their items to be indexed constantly, and there’s only one centralized scoring system.

As far as I understand from the current GNOME API, and from what you mention in your text,
the distributed approach is also going to be used in the next version of GNOME.
I know that the current GNOME “Search” has an option to show results in the order of certain providers
(via gsettings/“org.gnome.desktop.search-providers sort-order”), but I think this is still not the best option and can lead to “weird” results: some results from one provider can be more relevant to the search query, but they will still appear after other results because of the order of providers.

As I wrote before, I just “gave up” and wrote my own app (back when I was still using Windows), but since you are going to make such a major change, I think you should also consider what I mentioned above.

I will try to summarize my notes:

1. There should also be an option for indexing by a centralized engine, so that a search provider can send items to be indexed continuously instead of responding to queries.
   - E.g. for searching file names on the system, all files can be indexed before the query, and this can be combined with optimizing the search (I used some combinations of “tries” (aka prefix trees) as the DB, and searching files on the system was very fast).
   - Namely, 2 “kinds” of search provider interface should be supplied: one that just sends items to be indexed, and one that responds to queries.
   - For providers that do the search themselves (respond to queries), the result should also include the data used for the score, e.g. what string/regex was found and where. The centralized engine can then verify the search result, check whether the score is fair, block “fake” results, and sort all items with one scoring system rather than per search provider.
2. In addition, I’ve noticed that sometimes I search for something which I know “is there”, but it doesn’t show up in the results.
   This happens when I copy a large number of files, and it looks like “Files” has not indexed them yet. So I think you should have some API (and a visual indication in the “Search” results, or on a settings page) for this status, e.g. the status of a search provider could be displayed as “Ready, indexed 5000 files”, or “Indexing 4000/5000”, or “Indexing, counting new files”, etc.
   This should be something in the form of ISearchProvider::GetStatus, or similar.
3. This is again related to my previous notes. I did some tests and put all my “amazon” files in one folder. When I type the word “amazon”, I see 5 results whose order is not clear and obviously not what I expected (the order does not follow whether “amazon” is the first word of the result, or the first word after whitespace, etc., nor the modification date of the file), and in addition it says “Files 26 more”. Why not show all results and just let the user type and filter them, without needing to go to “Files”?
   In case you add more data about a search result or indexed item to the interface, the scoring system can give a higher score based on the index of the query in the word, check whether it came after whitespace (or another non-alpha character), also add a “weight” for when the file was modified, etc., and give much better results.
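
To illustrate the last point, here is a minimal sketch of one centralized scoring function applied to items from any provider. It is not code from the app described above; the function names and all the weights (100, 50, 20) are made-up assumptions, chosen only to show position, word-boundary and modification-time signals being combined.

```python
import time


def score(query, name, mtime, now=None):
    """Return a relevance score for name, or None if query does not occur in it."""
    now = time.time() if now is None else now
    q, n = query.lower(), name.lower()
    pos = n.find(q)
    if pos < 0:
        return None
    points = 0.0
    if pos == 0:
        points += 100.0                  # query starts the name
    elif not n[pos - 1].isalnum():
        points += 50.0                   # query starts a word (after space, dash, ...)
    points -= pos                        # earlier matches rank higher
    age_days = max(0.0, (now - mtime) / 86400.0)
    points += 20.0 / (1.0 + age_days)    # small bonus for recently modified files
    return points


def rank(query, items, now=None):
    """items: iterable of (name, mtime) from any provider. Returns names, best first."""
    scored = []
    for name, mtime in items:
        s = score(query, name, mtime, now)
        if s is not None:
            scored.append((s, name))
    scored.sort(key=lambda pair: -pair[0])
    return [name for _, name in scored]
```

Because every item goes through the same `score` function, the final order no longer depends on which provider an item came from.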

If you think my notes are of some value, I will be also happy to help with coding.

Regards,
Si.

ebassi · August 7, 2023, 11:07am

4 posts were split to a new topic: Disabling search providers

sthursfield · August 8, 2023, 12:24pm

Hey, thanks for writing all this up.

In terms of point (1), there is a centralized index already. It’s called ‘tracker-miner-fs-3’ and is developed here: GNOME / LocalSearch · GitLab. The idea is indeed to make use of the index as much as possible.

Points (2) and (3) are partly about the UI design side, which is something the design team handle - they generally work in Matrix in the design channel.

You also mention getting better ordering of search results, and this is something we really need to get right for the next iteration of desktop search. Currently we don’t have any definition of what “better ordering” would mean. What is the best order to show results? How can we define some testcases to run different searches and show results in the most useful order?

My current thinking on that question is, we need some example content and some example search queries. A lot of work is needed before this is useful but I think it’s important to figure it out before we start making big changes to how search works.
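
As a rough illustration of what “example content and example search queries” could look like, here is a small Python sketch. The corpus, the cases and the `naive_ranker` are invented for this example, and it only checks the top result, which is one assumption among many possible definitions of “better ordering”.

```python
# Hypothetical example content and expected top results.
CORPUS = [
    "London flying tickets.txt",
    "Presentation for meeting in London.txt",
    "Budget 2023.ods",
]
CASES = [
    ("lo", "London flying tickets.txt"),
    ("budget", "Budget 2023.ods"),
]


def naive_ranker(query, corpus):
    """Stand-in ranker: order matches by where the query first occurs."""
    q = query.lower()
    hits = [(name.lower().find(q), name) for name in corpus if q in name.lower()]
    return [name for _, name in sorted(hits)]


def evaluate(ranker, corpus, cases):
    """Return a list of (query, expected_top, actual_top) for every failed case."""
    failures = []
    for query, expected_top in cases:
        results = ranker(query, corpus)
        top = results[0] if results else None
        if top != expected_top:
            failures.append((query, expected_top, top))
    return failures


if __name__ == "__main__":
    print(evaluate(naive_ranker, CORPUS, CASES))
```

Any candidate search implementation could be plugged in as `ranker`, so different approaches can be compared against the same expectations.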

johnd4dg · August 8, 2023, 1:54pm

Hi @sthursfield,
Thanks for your answer.
So all local searches come from the centralized index? E.g. Files?
Can you give a brief explanation of how the indexing is done?

In my implementation, which worked very fast on old computers,
I did a few tests using a trie, Lucene as DB, and dictionaries.
I got the best results when:

Optimization #1

Using a trie (prefix tree) for indexing.
The indexer “broke” sentences into parts, in lower case.
E.g. “File with UPPERCASE and dash this-is-dash.txt”
was indexed (all in lower case) as the following parts:

“file with uppercase and dash this-is-dash.txt”
“with uppercase and dash this-is-dash.txt”
“uppercase and dash this-is-dash.txt”
“and dash this-is-dash.txt”
“dash this-is-dash.txt”
“this-is-dash.txt”
“is-dash.txt”
“dash.txt”

So typing any letter gives a result in time that depends only on the number of letters you type
(O(k), where k is the number of letters typed, not the number of items in the index).
Scoring is then based on distance from the start, whether upper/lower case matches the original item, etc.

If you have “Presentation for meeting in London.txt” and “London flying tickets.txt”,
typing just the 2 letters L and O gives both results in O(k) with k = 2, with different scores (distance of “LO” from the start of the sentence).
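
A minimal sketch of this kind of index, for illustration only (it is not the code of my app, and the class and method names are made up). It assumes word starts are position 0 and every position after a run of spaces or dashes, and it stores the matching items on every trie node, which trades memory for lookups that only walk k nodes.

```python
import re


class Node:
    __slots__ = ("children", "hits")

    def __init__(self):
        self.children = {}   # next character -> Node
        self.hits = {}       # item id -> smallest word-start offset that reaches this node


class SuffixTrie:
    def __init__(self):
        self.root = Node()
        self.items = []

    def add(self, title):
        item_id = len(self.items)
        self.items.append(title)
        lowered = title.lower()
        # Index the suffix starting at each word: offset 0 and after each space/dash run.
        starts = [0] + [m.end() for m in re.finditer(r"[\s-]+", lowered)]
        for start in starts:
            node = self.root
            for ch in lowered[start:]:
                node = node.children.setdefault(ch, Node())
                node.hits.setdefault(item_id, start)  # starts ascend, so first wins

    def search(self, typed):
        """Walk one node per typed character, then sort hits by distance from start."""
        node = self.root
        for ch in typed.lower():
            node = node.children.get(ch)
            if node is None:
                return []
        ranked = sorted(node.hits.items(), key=lambda hit: (hit[1], hit[0]))
        return [(self.items[item_id], offset) for item_id, offset in ranked]


if __name__ == "__main__":
    trie = SuffixTrie()
    trie.add("Presentation for meeting in London.txt")
    trie.add("London flying tickets.txt")
    print(trie.search("LO"))
```

Reaching the node costs one step per typed letter; sorting the hits afterwards depends on how many items match, so a real implementation would probably cap or pre-sort them.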

Optimization #2
The timings for showing results, sending a new search task (when the user types another letter), etc.,
are all dynamic and adaptive, not fixed.
Namely, the app continuously records the typing speed of the user and learns the best time to send new search tasks and to cancel old ones (instead of the naive way of sending after every new char or after a fixed timeout).
This makes search faster and very natural.
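
A minimal sketch of the idea, not the actual code of my app. It assumes the delay is derived from an exponential moving average of the gaps between keystrokes; the class name and all constants (`alpha`, `factor`, the min/max bounds, the pause threshold) are made-up values for illustration.

```python
class AdaptiveDebounce:
    """Pick the delay before sending a search from the user's recent typing speed."""

    def __init__(self, initial_ms=200.0, alpha=0.3, factor=1.5,
                 min_ms=50.0, max_ms=500.0, pause_ms=2000.0):
        self.avg_gap_ms = initial_ms   # running average of gaps between keystrokes
        self.alpha = alpha             # weight of the newest gap in the average
        self.factor = factor           # wait a bit longer than one typical gap
        self.min_ms = min_ms
        self.max_ms = max_ms
        self.pause_ms = pause_ms       # longer gaps count as a pause, not typing speed
        self._last_key_ms = None

    def on_keystroke(self, now_ms):
        """Record a keystroke time (milliseconds) and return the delay to wait."""
        if self._last_key_ms is not None:
            gap = now_ms - self._last_key_ms
            if 0 <= gap < self.pause_ms:
                self.avg_gap_ms = (1 - self.alpha) * self.avg_gap_ms + self.alpha * gap
        self._last_key_ms = now_ms
        return self.delay_ms()

    def delay_ms(self):
        """Current delay, clamped to [min_ms, max_ms]."""
        return min(self.max_ms, max(self.min_ms, self.factor * self.avg_gap_ms))


if __name__ == "__main__":
    debounce = AdaptiveDebounce()
    for t in range(0, 800, 80):        # a fast typist: one key every 80 ms
        delay = debounce.on_keystroke(t)
    print(round(delay), "ms")
```

The caller would start (or restart) a timer with the returned delay on each keystroke and send the query, cancelling any pending one, when the timer fires.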

From your answer I understand that some of my ideas may be more relevant to the design channel.
I’m not sure I have the energy now to try to persuade a design team…
Especially when I have my own running app that already works.

What do you suggest ?

sthursfield · August 8, 2023, 2:29pm

Hello again,

We do have some content at Sign in · GitLab where you can learn a bit about how things currently work. With regards to your optimisations, the actual full text search is done using SQLite’s FTS5 module. Your first optimisation might be relevant there.
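
For anyone unfamiliar with FTS5, here is a small standalone sketch of a prefix query using Python’s built-in sqlite3 module. It assumes the SQLite build has FTS5 enabled, and the `files` table and helper names are invented for the example; this is not the schema or query that tracker-miner-fs actually uses.

```python
import sqlite3


def build_index(titles):
    """Create an in-memory FTS5 table holding the given titles."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE files USING fts5(title)")
    conn.executemany("INSERT INTO files(title) VALUES (?)", [(t,) for t in titles])
    return conn


def prefix_search(conn, prefix):
    """Return titles containing a token that starts with prefix, best match first."""
    # Quote the user text so FTS5 treats it as a string, then add '*' for a prefix match.
    match = '"' + prefix.replace('"', '""') + '"*'
    rows = conn.execute(
        "SELECT title FROM files WHERE files MATCH ? ORDER BY rank", (match,)
    )
    return [title for (title,) in rows]


if __name__ == "__main__":
    conn = build_index([
        "Presentation for meeting in London.txt",
        "London flying tickets.txt",
        "File with UPPERCASE and dash this-is-dash.txt",
    ])
    print(prefix_search(conn, "lo"))
```

The default tokenizer splits on spaces and punctuation and folds case, so a prefix matches at the start of any word, similar in effect to the word-suffix trie described above.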

The idea about adaptive typeahead delay is interesting. It could be a good optimisation. Can you share some example code where this is implemented?

I’m not sure I have the energy now to try to persuade a design team…

It’s not so much about persuasion, more collaboration. If you’re curious I suggest joining their Matrix channel and hanging around. It does take some energy to participate, so it’s good when folk can volunteer time for that, but I understand that not everyone has time.

If you have your own app that works, I suggest making some video demos of how it works, if you haven’t already - that can be a quick way to show off the innovative ideas there.

johnd4dg · August 8, 2023, 4:23pm

@sthursfield
Thanks for your answer.
For the technical suggestions, e.g. indexing and “adaptive typeahead delay”, is there a GitLab project or other place to suggest them (as is done on GitHub), or is posting here enough (and/or the correct place)?

sthursfield · August 9, 2023, 10:52am

Here is probably enough. You could also open issues at Sam Thursfield / libgnomesearch · GitLab. That project may be the future of desktop search. As mentioned above, many folk like the idea, but it depends on someone having time available to push it forwards. AFAIK there isn’t anyone paid or with volunteer time available to really drive the project.

system · September 23, 2023, 10:52am

This topic was automatically closed 45 days after the last reply. New replies are no longer allowed.