Hi friends,
In the tracker channel today we were talking about stop words, which led on to the fact that we do only a limited amount of testing of the Tracker 3 full text search functionality. That’s a little ironic given the main purpose of Tracker is to provide full text search.
I have been thinking for some time how to improve this. Tracker already has a test suite, with many years of hard work already behind it, but there are limits to what you can achieve with unit testing. In the real world people might be searching through 10GB of content, but we can hardly ship 10GB of example content into tracker-miners.git to create an equivalent test case.
So in the occasional space where I can volunteer an hour or two for improving Tracker, I have been looking at a new “example desktop content” project: Sam Thursfield / example-desktop-content · GitLab
The idea is to define a series of personas for people who use desktop search. From there, define the types of content they might have, and create a collection of redistributable example content. (Previously).
Then, we can define searches each persona might do. When you run a search, you do it because you want to answer some question or other, so to evaluate whether the results for a search are good, we need to have some idea of what the question was. So we define an intent, e.g. “find all e-books by Charles Dickens”, then some search terms, e.g. “charles dickens”, and then report the results.
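As a rough illustration of the idea above, a persona and its searches could be modelled like this. This is only a sketch: the `Persona` and `Search` class names and their fields are hypothetical and are not taken from the example-desktop-content project.

```python
from dataclasses import dataclass, field


@dataclass
class Search:
    """One search a persona might run (hypothetical structure)."""
    intent: str  # the question the person wants answered
    terms: str   # what they would actually type into the search box


@dataclass
class Persona:
    """A kind of desktop search user and the searches they might do."""
    name: str
    searches: list[Search] = field(default_factory=list)


teacher = Persona(
    name="teacher-creative-writing",
    searches=[
        Search(intent="What books do I have by Charles Dickens?",
               terms="Charles Dickens"),
        Search(intent="Find a sentence using present perfect continuous tense",
               terms="i have been"),
    ],
)

# List each search with its index, intent and terms.
for index, search in enumerate(teacher.searches):
    print(f"{index}: {search.intent} (search terms: {search.terms})")
```

Keeping the intent next to the search terms means whoever reads the results later knows which question the search was meant to answer.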
At least for now, I am not going to try and code a heuristic for “good search results”. Instead we can generate a report, read through it and check for any obvious issues by eye.
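A report generator along these lines can stay very small. The sketch below is an assumption about how it might look, not the actual prototype: `render_report` and `fake_search` are hypothetical names, and `fake_search` is a canned stand-in for whatever really queries Tracker.

```python
from typing import Callable, Iterable


def render_report(
    persona: str,
    searches: Iterable[tuple[str, str]],
    run_search: Callable[[str], list[str]],
) -> str:
    """Render a plain-text report for one persona.

    `searches` yields (intent, terms) pairs; `run_search` maps search
    terms to a list of result lines.
    """
    lines = [f"Persona: {persona}"]
    for index, (intent, terms) in enumerate(searches):
        lines.append(f"{index}: {intent}")
        lines.append(f"Search terms: {terms}")
        lines.append("Results:")
        results = run_search(terms)
        # Make empty result sets stand out when reading by eye.
        lines.extend(results if results else ["(no results)"])
    return "\n".join(lines)


def fake_search(terms: str) -> list[str]:
    """Hypothetical stand-in for the real search backend."""
    canned = {"london": ["…City of London, even including…"]}
    return canned.get(terms.lower(), [])


report = render_report(
    "teacher-creative-writing",
    [
        ("Find a quote about London", "london"),
        ("Find a sentence using present perfect continuous tense (2)", "i’ve been"),
    ],
    fake_search,
)
print(report)
```

Passing the search function in as a parameter keeps the report code independent of how the search is actually run, and the explicit "(no results)" marker is one way to make an empty result set hard to miss.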
I spent a couple of hours today hacking together a prototype of this, and here is the first report it generated. Looking at the 4th search, you can already see a bug: “Seaching for a quoted full sentence doesn't seem to work” (#171, GNOME / TinySPARQL on GitLab).
example-desktop-content: Tracker 3 testing report
Tracker 3 version: Tracker 3.4.2
Indexing
Indexing time: 3 seconds
Status:
Currently indexed: 6 files, 4 folders
Remaining space on database partition: 7.9 GB (96.09%)
Data is still being indexed: Estimated less than one second left
Store size on disk:
- http%3A%2F%2Ftracker.api.gnome.org%2Fontology%2Fv3%2Ftracker%23Audio.db: 1264.0KB
- http%3A%2F%2Ftracker.api.gnome.org%2Fontology%2Fv3%2Ftracker%23Documents.db: 4204.0KB
- http%3A%2F%2Ftracker.api.gnome.org%2Fontology%2Fv3%2Ftracker%23FileSystem.db: 1264.0KB
- http%3A%2F%2Ftracker.api.gnome.org%2Fontology%2Fv3%2Ftracker%23Pictures.db: 1264.0KB
- http%3A%2F%2Ftracker.api.gnome.org%2Fontology%2Fv3%2Ftracker%23Software.db: 1264.0KB
- http%3A%2F%2Ftracker.api.gnome.org%2Fontology%2Fv3%2Ftracker%23Video.db: 1264.0KB
- meta.db: 2696.0KB
- ontologies.gvdb: 220.365234375KB
Search tests
Persona: teacher-creative-writing
0: What books do I have by Charles Dickens?
Search terms: Charles Dickens
Search output:
Results:
file:///home/sam/example-desktop-content/build/content/Documents/ebooks/Charles%20Dickens%20-%20A%20Christmas%20Carol.epub
Charles Dickens - A Christmas Carol…
file:///home/sam/example-desktop-content/build/content/Documents/ebooks/Charles%20Dickens%20-%20Bleak%20House.epub
Charles Dickens - Bleak House.epub
file:///home/sam/example-desktop-content/build/content/Documents/ebooks/Charles%20Dickens%20-%20Hard%20Times.epub
Charles Dickens - Hard Times.epub
file:///home/sam/example-desktop-content/build/content/Documents/ebooks/Charles%20Dickens%20-%20Oliver%20Twist.epub
Charles Dickens - Oliver Twist.epub
1: Find a quote that talks about London
Search terms: london
Search output:
Results:
file:///home/sam/example-desktop-content/build/content/Documents/ebooks/Charles%20Dickens%20-%20A%20Christmas%20Carol.epub
…City of London, even including…
file:///home/sam/example-desktop-content/build/content/Documents/ebooks/Charles%20Dickens%20-%20Bleak%20House.epub
…In Chancery London. Michaelmas term…
file:///home/sam/example-desktop-content/build/content/Documents/ebooks/Charles%20Dickens%20-%20Hard%20Times.epub
…Walker LONDON: CHAPMAN & HALL, LD…
file:///home/sam/example-desktop-content/build/content/Documents/ebooks/Charles%20Dickens%20-%20Oliver%20Twist.epub
…WALKS TO LONDON. HE ENCOUNTERS…
2: Find a chapter related to ghosts
Search terms: ghost
Search output:
Results:
file:///home/sam/example-desktop-content/build/content/Documents/ebooks/Charles%20Dickens%20-%20A%20Christmas%20Carol.epub
…CONTENTS STAVE ONE—MARLEY'S GHOST…
file:///home/sam/example-desktop-content/build/content/Documents/ebooks/Charles%20Dickens%20-%20Bleak%20House.epub
…VII. The Ghost's Walk VIII…
file:///home/sam/example-desktop-content/build/content/Documents/ebooks/Charles%20Dickens%20-%20Hard%20Times.epub
…by the ghost of damp…
3: Find a sentence using present perfect continuous tense
Search terms: i have been
Search output:
Results:
file:///home/sam/example-desktop-content/build/content/Documents/ebooks/Charles%20Dickens%20-%20A%20Christmas%20Carol.epub
…which I have been a…
file:///home/sam/example-desktop-content/build/content/Documents/ebooks/Charles%20Dickens%20-%20Bleak%20House.epub
…for I have been accustomed…
file:///home/sam/example-desktop-content/build/content/Documents/ebooks/Charles%20Dickens%20-%20Hard%20Times.epub
…father. I have been tired…
4: Find a sentence using present perfect continuous tense (2)
Search terms: i’ve been
Search output:
Results: