Hello,
I’m running tracker3 version 3.1.2 on Arch Linux. It seems that tracker3 ignores custom 'text-allowlist' extensions and extracts plain text only from files matching the default patterns ['*.txt', '*.md', '*.mdwn'].
I added other extensions to the list:
[arch]$ gsettings get org.freedesktop.Tracker3.Extract text-allowlist
['*.txt', '*.tech', '*.diag', '*.log', '*.md', '*.mdwn']
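The entries in text-allowlist look like ordinary shell globs. As a quick sanity check, here is a minimal bash sketch, assuming Tracker matches file names against these patterns the way a shell glob would (an assumption, not checked against Tracker's code). `matches_allowlist` is a hypothetical helper written for this post, not part of Tracker:

```bash
#!/usr/bin/env bash
# Sketch: check file names against glob patterns like those in text-allowlist.
# Assumption: Tracker treats the entries as plain globs; this uses bash 'case'
# globbing, NOT Tracker's own matching code.

# matches_allowlist NAME PATTERN... -> exit status 0 if NAME matches any PATTERN
matches_allowlist() {
  local name=$1
  shift
  local pattern
  for pattern in "$@"; do
    # $pattern is left unquoted so that '*' acts as a glob inside 'case'
    case $name in
      $pattern) return 0 ;;
    esac
  done
  return 1
}

# Same list as reported by 'gsettings get' above
allowlist=('*.txt' '*.tech' '*.diag' '*.log' '*.md' '*.mdwn')

for f in log-ToR001.txt putty-log_TOR001.log; do
  if matches_allowlist "$f" "${allowlist[@]}"; then
    echo "$f: matches text-allowlist"
  else
    echo "$f: no match"
  fi
done
```

Both files match, so under that assumption the file name alone does not explain why the .log file is skipped.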
But '*.tech', '*.diag' and '*.log' files are completely ignored: no plain text is extracted from them. Here is an example with two files from the same directory, one .txt and one .log; plain text was extracted only from the .txt one:
[arch]$ ls -la *log*
-rw-r--r-- 1 ivan ivan 22008 Apr 14 17:22 log-ToR001.txt
-rw-r--r-- 1 ivan ivan 676903 Apr 14 17:23 putty-log_TOR001.log
[arch]$ xdg-mime query filetype log-ToR001.txt
text/plain
[arch]$ file -b --mime-type log-ToR001.txt
text/plain
[arch]$ xdg-mime query filetype putty-log_TOR001.log
text/x-log
[arch]$ file -b --mime-type putty-log_TOR001.log
text/plain
[arch]$ tracker3 info -c log-ToR001.txt
Querying information for entity: 'log-ToR001.txt'
'file:///home/ivan/OneDrive/Cases/2021-04/5354807557/log-ToR001.txt'
Results:
'tracker:extractorHash' = 'd35fd368fe97892c95134d493a67d39834817454eec787cddc36b8e1ca5612c3'
'nfo:fileLastModified' = '2021-04-14T14:22:54Z'
'nfo:fileLastModified' = '2021-04-14T14:22:54Z'
'nfo:fileName' = 'log-ToR001.txt'
'nfo:fileName' = 'log-ToR001.txt'
'nfo:fileSize' = '22008'
'nfo:belongsToContainer' = 'urn:bnode:aecc5093-b7bf-4584-8d66-c8ba10830e03'
'nfo:fileLastAccessed' = '2021-06-24T16:00:18Z'
'nie:isPartOf' = 'urn:bnode:aecc5093-b7bf-4584-8d66-c8ba10830e03'
'nie:interpretedAs' = 'urn:bnode:193af09e-2a44-47ce-845a-4bb47152e4ef'
'nie:dataSource' = 'urn:bnode:306154d9-03d5-463a-aa26-2361f226b268'
'nie:byteSize' = '22008'
'nie:url' = 'file:///home/ivan/OneDrive/Cases/2021-04/5354807557/log-ToR001.txt'
'http://purl.org/dc/elements/1.1/source' = 'urn:bnode:306154d9-03d5-463a-aa26-2361f226b268'
'http://purl.org/dc/elements/1.1/date' = '2021-04-14T14:22:54Z'
'http://purl.org/dc/elements/1.1/date' = '2021-04-14T14:22:54Z'
'http://purl.org/dc/elements/1.1/date' = '2021-06-24T16:00:18Z'
'nrl:modified' = '436'
'nrl:modified' = '436'
'nrl:added' = '2021-06-24T18:46:06Z'
'nrl:added' = '2021-06-24T18:46:06Z'
'rdf:type' = 'http://www.w3.org/2000/01/rdf-schema#Resource'
'rdf:type' = 'http://tracker.api.gnome.org/ontology/v3/nie#DataObject'
'rdf:type' = 'http://tracker.api.gnome.org/ontology/v3/nfo#FileDataObject'
'rdf:type' = 'http://www.w3.org/2000/01/rdf-schema#Resource'
'rdf:type' = 'http://tracker.api.gnome.org/ontology/v3/nie#DataObject'
'rdf:type' = 'http://tracker.api.gnome.org/ontology/v3/nfo#FileDataObject'
[arch]$ tracker3 info -c putty-log_TOR001.log
Querying information for entity: 'putty-log_TOR001.log'
'file:///home/ivan/OneDrive/Cases/2021-04/5354807557/putty-log_TOR001.log'
Results:
'nfo:fileLastModified' = '2021-04-14T14:23:07Z'
'nfo:fileName' = 'putty-log_TOR001.log'
'nfo:fileSize' = '676903'
'nfo:belongsToContainer' = 'urn:bnode:aecc5093-b7bf-4584-8d66-c8ba10830e03'
'nfo:fileLastAccessed' = '2021-06-24T16:00:18Z'
'nie:isPartOf' = 'urn:bnode:aecc5093-b7bf-4584-8d66-c8ba10830e03'
'nie:dataSource' = 'urn:bnode:306154d9-03d5-463a-aa26-2361f226b268'
'nie:byteSize' = '676903'
'nie:url' = 'file:///home/ivan/OneDrive/Cases/2021-04/5354807557/putty-log_TOR001.log'
'http://purl.org/dc/elements/1.1/source' = 'urn:bnode:306154d9-03d5-463a-aa26-2361f226b268'
'http://purl.org/dc/elements/1.1/date' = '2021-04-14T14:23:07Z'
'http://purl.org/dc/elements/1.1/date' = '2021-06-24T16:00:18Z'
'nrl:modified' = '70'
'nrl:added' = '2021-06-24T18:46:06Z'
'rdf:type' = 'http://www.w3.org/2000/01/rdf-schema#Resource'
'rdf:type' = 'http://tracker.api.gnome.org/ontology/v3/nie#DataObject'
'rdf:type' = 'http://tracker.api.gnome.org/ontology/v3/nfo#FileDataObject'
[arch]$ tracker3 extract putty-log_TOR001.log
[arch]$ tracker3 extract log-ToR001.txt
@prefix nie: <http://tracker.api.gnome.org/ontology/v3/nie#> .
@prefix nfo: <http://tracker.api.gnome.org/ontology/v3/nfo#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
<file:///home/ivan/OneDrive/Cases/2021-04/5354807557/log-ToR001.txt> nie:plainTextContent "2021-04-12T14:>
a nfo:PlainTextDocument .
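To repeat the comparison above on more files, a small bash sketch can put the three observations side by side: the shared-mime-info type from xdg-mime, the libmagic type from file, and whether tracker3 extract printed any nie:plainTextContent. The helpers `has_plain_text` and `describe` are hypothetical, and the check only greps the command output rather than parsing the Turtle:

```bash
#!/usr/bin/env bash
# Sketch: for each file argument, print the xdg-mime type, the file(1) type,
# and whether 'tracker3 extract' emitted plain text. Grepping the output is a
# shortcut, not a real Turtle parser.

# has_plain_text: reads 'tracker3 extract' output on stdin;
# status 0 if it contains a nie:plainTextContent triple
has_plain_text() {
  grep -q 'nie:plainTextContent'
}

# describe FILE: print one tab-separated summary line for FILE
describe() {
  local f=$1 xdg magic text=no
  xdg=$(xdg-mime query filetype "$f")
  magic=$(file -b --mime-type "$f")
  if tracker3 extract "$f" | has_plain_text; then
    text=yes
  fi
  printf '%s\txdg-mime=%s\tfile=%s\tplain-text=%s\n' "$f" "$xdg" "$magic" "$text"
}

# Only run the report when all three tools are installed
have_tools=yes
for tool in xdg-mime file tracker3; do
  command -v "$tool" >/dev/null || have_tools=no
done

if [ "$have_tools" = yes ]; then
  for f in "$@"; do
    describe "$f"
  done
fi
```

For example, save it as mime-check.sh (a hypothetical name) and run `bash mime-check.sh log-ToR001.txt putty-log_TOR001.log`. Going by the output above, the .log line would be expected to show xdg-mime=text/x-log and plain-text=no, while the .txt line shows text/plain from both tools and plain-text=yes.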
Any thoughts on how to force tracker3 to extract plain text from files whose extensions differ from the default ones?
Thank you!