Tracker PDF indexing on Ubuntu

johannesjh · November 7, 2020, 10:11am

PDF full-text search does not work on my freshly installed Ubuntu 20.10 desktop.
I’d be glad to discuss how to enable this feature (or fix this bug?). Any ideas / tips / hints?

Running Ubuntu 20.10 with GNOME 3.38.1.
tracker --version tells me I have Tracker 2.3.6 installed.
Full-text indexing works for plain text files, as evident from running tracker info -c file.txt.
Full-text indexing does not seem to work for PDF files, as evident from running tracker info -c file.pdf, which outputs 'nie:plainTextContent' = ''.
In a similar way, searching in GNOME Files finds PDFs by metadata but not by content.
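
The tracker info -c check described above can be scripted for several files at once. This is only a minimal sketch and not part of the original post: has_fulltext is a hypothetical helper name, and it assumes the output contains a line of the form 'nie:plainTextContent' = '...' as quoted above, which may differ between Tracker versions.

```shell
#!/bin/sh
# Sketch: report whether Tracker has indexed any text content for each file.
# ASSUMPTION: `tracker info -c FILE` prints a line such as
#   'nie:plainTextContent' = '...'
# (the format quoted in the post); other versions may print it differently.

# has_fulltext (hypothetical helper): reads `tracker info -c` output on stdin.
# Returns 0 if a nie:plainTextContent line exists and is not the empty string,
# and 1 if the property is missing or empty.
has_fulltext() {
    grep "nie:plainTextContent" | grep -qv "= ''[[:space:]]*\$"
}

# Check every file named on the command line (does nothing without arguments).
for f in "$@"; do
    if tracker info -c "$f" | has_fulltext; then
        echo "$f: text content indexed"
    else
        echo "$f: no text content indexed"
    fi
done
```

Saved as, say, check-fulltext.sh (a made-up name), it could be run as sh check-fulltext.sh file.txt file.pdf to compare a plain text file with a PDF.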

Running tracker daemon shows that the extractor is not running:
tracker daemon
Store:
07 Nov 2020, 11:09:17: 0% Store - Idle

Miners:
07 Nov 2020, 11:09:17: ✗ Extractor - Not running or is a disabled plugin
07 Nov 2020, 11:09:17: ✓ File System - Idle

Note that tracker daemon -s tells me it starts the extractor, but after a few minutes the extractor stops running again, with the same output as above, i.e., extractor not running.

thank you, br, Johannes

sthursfield · November 7, 2020, 6:26pm

Hi,
It’s normal that the tracker-extract daemon only runs when there is work to do.
A good way to test what’s happening for a specific file is to run this command:

TRACKER_VERBOSITY=3 tracker extract foo.pdf

Could you run that for one of the .pdf files and paste the output?
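
If several PDFs are affected, it may help to save the verbose output for each one in a file so it can be pasted later. A rough sketch, assuming the tracker extract command shown above; save_extract_log and the .extract.log suffix are hypothetical names chosen for illustration:

```shell
#!/bin/sh
# Sketch: run the extractor verbosely on each file and keep the output.

# save_extract_log FILE (hypothetical helper): runs
#   TRACKER_VERBOSITY=3 tracker extract FILE
# and stores stdout and stderr in FILE.extract.log.
# Returns the exit status of the tracker command.
save_extract_log() {
    log="$1.extract.log"
    TRACKER_VERBOSITY=3 tracker extract "$1" >"$log" 2>&1
    status=$?
    echo "$1: exit status $status, log in $log"
    return "$status"
}

# Process every file named on the command line (does nothing without arguments).
for f in "$@"; do
    save_extract_log "$f" || true   # keep going even if one file fails
done
```

The log files are written next to the PDFs, so the directory needs to be writable.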

The actual code responsible for PDF extraction is in src/tracker-extract/tracker-extract-pdf.c (master branch of the GNOME / LocalSearch repository on GitLab). It uses Poppler to do the work of reading the file.
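
Since Poppler does the reading, one independent check is whether Poppler's own pdftotext tool can get any text out of the file. The sketch below assumes pdftotext is installed (on Ubuntu it is usually in the poppler-utils package); pdf_text_chars and count_text_chars are hypothetical helper names. A count of 0 would suggest the PDF has no text layer (for example a scanned document), in which case an empty nie:plainTextContent would be expected rather than a Tracker bug.

```shell
#!/bin/sh
# Sketch: count the non-whitespace characters Poppler can extract from a PDF.

# count_text_chars (hypothetical helper): reads text on stdin and prints
# the number of non-whitespace bytes.
count_text_chars() {
    n=$(tr -d '[:space:]' | wc -c)
    echo $((n))
}

# pdf_text_chars FILE (hypothetical helper): extracts text with pdftotext
# ("-" sends the text to stdout) and counts it.
# Prints 0 if pdftotext fails or is not installed.
pdf_text_chars() {
    pdftotext "$1" - 2>/dev/null | count_text_chars
}

# Report on every file named on the command line (does nothing without arguments).
for f in "$@"; do
    echo "$f: $(pdf_text_chars "$f") non-whitespace characters extracted"
done
```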

system · November 21, 2020, 6:27pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.