Tracker PDF indexing on Ubuntu

PDF full-text search does not work on my freshly installed Ubuntu 20.10 desktop.
I’d be glad to discuss how to enable this feature (or fix this bug?). Any ideas, tips, or hints?

Running Ubuntu 20.10 with GNOME 3.38.1.
tracker --version tells me I have Tracker 2.3.6 installed.
Full-text indexing works for plain text files, as shown by running tracker info -c file.txt.
Full-text indexing does not seem to work for PDF files: running tracker info -c file.pdf outputs 'nie:plainTextContent' = ''.
Similarly, searching in GNOME Files finds PDFs by metadata but not by content.
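
The tracker info -c check above can be scripted so that several files are easy to compare. This is only a minimal sketch: the function name plain_text_status is made up for illustration, and it assumes the output contains a line in the same form as the one quoted above ('nie:plainTextContent' = '...').

```shell
# Hypothetical helper: reads `tracker info -c FILE` output on stdin and
# classifies the nie:plainTextContent property.
# Assumes the property is printed as: 'nie:plainTextContent' = '...'
plain_text_status() {
  input=$(cat)
  if printf '%s\n' "$input" | grep -q "'nie:plainTextContent' = ''[[:space:]]*$"; then
    echo "empty"      # property exists but holds no text
  elif printf '%s\n' "$input" | grep -q "'nie:plainTextContent' = "; then
    echo "present"    # property exists and holds some text
  else
    echo "missing"    # property not found in the output at all
  fi
}
```

Usage would look like tracker info -c file.pdf | plain_text_status, which should report "empty" for the PDF described here and "present" for the plain text file.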

tracker daemon shows that the extractor is not running:
tracker daemon
Store:
07 Nov 2020, 11:09:17: 0% Store - Idle

Miners:
07 Nov 2020, 11:09:17: ✗ Extractor - Not running or is a disabled plugin
07 Nov 2020, 11:09:17: ✓ File System - Idle

Note that tracker daemon -s tells me it starts the extractor, but after a few minutes the extractor stops running again, with the same output as above, i.e., extractor not running.

Thank you and best regards, Johannes


Hi,
It’s normal that the tracker-extract daemon only runs when there is work to do.
A good way to test what’s happening for a specific file is to run this command:

TRACKER_VERBOSITY=3 tracker extract foo.pdf

Could you run that for one of the .pdf files and paste the output?

The actual code responsible for PDF extraction is here: https://gitlab.gnome.org/GNOME/tracker-miners/-/blob/master/src/tracker-extract/tracker-extract-pdf.c. It uses Poppler to do the work of reading the file.
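
Since the extraction goes through Poppler, one cross-check is whether Poppler's own command-line tool can pull any text out of the same file. The sketch below assumes pdftotext (shipped in the poppler-utils package on Ubuntu) is installed; the function name pdf_text_chars is made up for illustration and is not part of Tracker.

```shell
# Hypothetical helper: print how many non-whitespace characters Poppler
# can extract from a PDF. Assumes `pdftotext` from poppler-utils.
pdf_text_chars() {
  if ! command -v pdftotext >/dev/null 2>&1; then
    echo "pdftotext not found (try installing poppler-utils)" >&2
    return 127
  fi
  if [ ! -r "$1" ]; then
    echo "cannot read: $1" >&2
    return 1
  fi
  # "-" sends the extracted text to stdout; tr drops whitespace before counting
  pdftotext "$1" - | tr -d '[:space:]' | wc -c
}
```

Running pdf_text_chars foo.pdf and getting 0 would suggest the file has no text layer (for example a scanned document), in which case an empty nie:plainTextContent is expected. A non-zero count points back at the extractor or its configuration.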

