pubsub/cache: PubSub Cache Management¶
Libervia transparently runs a cache for pubsub. This means that, according to internal criteria, some pubsub items are stored locally.
The cache subcommands let the user inspect and manipulate the internal cache.
get¶
Retrieve items from the internal cache only. Most end users won’t need this command, as the usual pubsub get command uses the cache transparently. However, it may be useful for inspecting the local cache, notably for debugging.
The parameters are basically the same as for get.
example¶
Retrieve the last 2 cached items for personal blog:
$ li pubsub cache get -n urn:xmpp:microblog:0 -M 2
sync¶
Synchronise or resynchronise a pubsub node. If the node is already in cache, it is deleted and then re-cached. The node is put in cache even if internal policy doesn’t request a synchronisation for this kind of node. The node is (re-)subscribed to so that the cache stays synchronised.
All items of the node (up to the internal limit, which is high) are retrieved and put in cache, even if a previous version of those items has been deleted by the purge command.
example¶
Resynchronise personal blog:
$ li pubsub cache sync -n urn:xmpp:microblog:0
purge¶
Remove items from cache. This may be desirable to save resources, notably disk space.
Note that once a pubsub node is cached, the cache is the source of truth. This means that unless the cache is explicitly bypassed when retrieving items of a pubsub node (notably with the -C, --no-cache option of get), only items found in cache are returned. Purged items are thus no longer used or returned, even if they still exist on the original pubsub service.
If you have purged items by mistake, it is possible to retrieve them either node by node using sync, or by resetting the whole pubsub cache with reset.
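The interplay between the cache, purge and sync described above can be pictured with a toy model. This is only an illustrative Python sketch, not Libervia’s implementation: the ToyNode class and its method names are hypothetical and merely mirror the sync and purge subcommands and the --no-cache option of get.

```python
class ToyNode:
    """Toy model of one pubsub node and its local cache (hypothetical)."""

    def __init__(self, service_items):
        self.service_items = set(service_items)  # items held by the pubsub service
        self.cached_items = None  # None means the node is not cached yet

    def sync(self):
        # Drop any existing cache, then re-cache every item from the service.
        self.cached_items = set(self.service_items)

    def purge(self, item_ids):
        # Remove items from the cache only; the service is left untouched.
        if self.cached_items is not None:
            self.cached_items -= set(item_ids)

    def get(self, no_cache=False):
        # Once the node is cached, the cache answers unless explicitly bypassed.
        if self.cached_items is not None and not no_cache:
            return sorted(self.cached_items)
        return sorted(self.service_items)


node = ToyNode(["a", "b", "c"])
node.sync()
node.purge(["b"])
after_purge = node.get()            # served from the cache
bypassed = node.get(no_cache=True)  # served from the service
node.sync()
after_sync = node.get()             # the purged item is cached again
```

In this model the purged item disappears from normal reads, stays visible when the cache is bypassed, and comes back after a new sync.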
If you have a node or a profile (e.g. a component) that frequently caches a lot of items, you may run this command from a scheduler such as cron.
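As a sketch of such a scheduled purge, a small Python script could build and run the command. The helper names build_purge_command and run_purge are hypothetical; the li arguments are the ones documented in this section, and the profile and delay are example values.

```python
import subprocess


def build_purge_command(profile, created_before):
    """Build the argument list for a cache purge (hypothetical helper)."""
    return [
        "li", "pubsub", "cache", "purge",
        "-p", profile,
        "--created-before", created_before,
    ]


def run_purge(profile, created_before):
    # check=True raises CalledProcessError if the command exits with an error,
    # so that the scheduler can report the failure.
    subprocess.run(build_purge_command(profile, created_before), check=True)


command = build_purge_command("ap_gateway", "2 months ago")
```

A scheduler entry would then call run_purge("ap_gateway", "2 months ago") at the desired interval.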
examples¶
Remove all blog and event items from cache if they haven’t been updated in the last 6 months:
$ li pubsub cache purge -t blog -t event -b "6 months ago"
Remove items from profile ap_gateway if they were created more than 2 months ago:
$ li pubsub cache purge -p ap_gateway --created-before "2 months ago"
reset¶
Reset the whole pubsub cache. This means that all nodes and all their items are removed from cache. After this command, the cache is progressively refilled as if it were a new one.
Note
Use this command with caution: even though the cache is rebuilt over time, items have to be retrieved again, which may be resource intensive both for your machine and for the pubsub services involved. It also means that searching items returns fewer results until all desired items are cached again.
Also note that all items of cached nodes are retrieved again, including items you had previously purged.
example¶
Reset the whole pubsub cache:
$ li pubsub cache reset
search¶
Search items in the pubsub cache. The search is done on the whole cache; it is not restricted to a single node/profile (although it can be if suitable filters are specified). Full-Text Search can be done with the -f FTS, --fts FTS argument, as well as filtering on parsed data (with -F PATH OPERATOR VALUE, --field PATH OPERATOR VALUE, see below).
By default, parsed data are returned, with the 3 additional keys pubsub_service, pubsub_node (the search being done on the whole cache, those data are here to get the full location of each item) and node_profile.
“Parsed data” are the result of the parsing of the item’s XML payload by feature-aware plugins. Those data are usually more readable and easier to work with. Parsed data are only stored when a parser is registered for a specific feature; a pubsub item in cache may therefore have no parsed data at all, in which case an empty dict is used instead (and the -P, --payload argument should be used to get the content of the item).
Dates are normally stored as Unix time in the database, but the default output converts the updated, created and published fields to human-readable local time. Use --output simple if you want to keep the float (or int) value.
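If you keep the raw value, converting it yourself is straightforward. The following Python sketch shows one way to do it; the function names are hypothetical, the timestamp is an arbitrary example, and the exact format Libervia prints by default is not reproduced here.

```python
from datetime import datetime, timezone


def to_local_time(timestamp):
    """Convert a Unix timestamp to an aware datetime in the local time zone."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).astimezone()


def to_utc_iso(timestamp):
    """Same conversion, pinned to UTC so that the result is reproducible."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()


published = 1650000000.0  # example value, as kept by "--output simple"
print(to_local_time(published).isoformat())  # depends on the local time zone
print(to_utc_iso(published))
```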
The XML item payload is not returned by default, but it can be added in the item_payload field if the -P, --payload argument is set. You can also use --output xml (or xml_raw if you don’t want prettifying) to directly output the highlighted XML, without the parsed data, giving an output similar to that of li pubsub get.
If you are only interested in specific data (e.g. item id and title), the -k KEY, --key KEY argument can be used.
You’ll probably want to limit the result size with -l LIMIT, --limit LIMIT, and paginate with -i INDEX, --index INDEX.
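The way a limit and an index combine into pages can be sketched in Python. This sketch assumes that INDEX is the zero-based position of the first returned result, which this page does not state explicitly; the paginate helper is hypothetical.

```python
def paginate(results, limit, index=0):
    """Return at most `limit` results starting at position `index`."""
    return results[index:index + limit]


all_results = [f"item-{n}" for n in range(7)]
page_1 = paginate(all_results, limit=3, index=0)
page_2 = paginate(all_results, limit=3, index=3)
page_3 = paginate(all_results, limit=3, index=6)  # last, partial page
```

Under that assumption, each new page is requested by increasing the index by the limit.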
Filters¶
By default, search returns all items in cache; you have to use filters to specify what you are looking for. Filters fall into 3 categories: nodes/items metadata, Full-Text Search query and parsed metadata.
Nodes/items metadata are the generic information you have on a node: which profile it belongs to, which pubsub service it comes from, what the name or type of the node is, etc.
The arguments there should be self-explanatory. Type (set with -t TYPE, --type TYPE) and subtype (set with -S SUBTYPE, --subtype SUBTYPE) are values that depend on the plugin/feature associated with the node, so we can’t list them exhaustively here. The most common type is probably blog, for which a subtype can be comment. An empty string can be used to find items with no (sub)type set.
It’s usually a good idea to specify a profile with -p PROFILE, --profile PROFILE, otherwise you may get duplicated results.
Full-Text Search¶
You can specify a Full-Text Search query with the -f FTS_QUERY, --fts FTS_QUERY argument. The engine is currently SQLite FTS5, and you can check its query syntax.
FTS is done on the whole raw XML payload, that means that all data there can be matched
(including XML tags and attributes).
FTS queries are indexed, that means that they are fast and efficient.
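To get a feel for how FTS5 matches raw XML, here is a small standalone Python sketch using the sqlite3 module. The table name, column name and payloads are made up for the illustration and do not reflect Libervia’s actual database schema; the sketch also assumes that the local SQLite build includes FTS5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical one-column FTS5 table standing in for the cached payloads.
conn.execute("CREATE VIRTUAL TABLE items_fts USING fts5(payload)")
payloads = [
    "<entry><title>XMPP news</title><category term='xmpp'/></entry>",
    "<entry><title>Trip to Slovakia</title></entry>",
    "<entry><title>ActivityPub gateway</title></entry>",
]
conn.executemany(
    "INSERT INTO items_fts(payload) VALUES (?)", [(p,) for p in payloads]
)


def fts_search(query):
    """Return the rowids of the payloads matching an FTS5 query."""
    rows = conn.execute(
        "SELECT rowid FROM items_fts WHERE items_fts MATCH ? ORDER BY rowid",
        (query,),
    )
    return [row[0] for row in rows]


by_keywords = fts_search("xmpp OR activitypub")  # matching ignores case here
by_tag_name = fts_search("category")             # XML tag names match too
```

The second query illustrates the point made above: since the whole raw payload is indexed, element and attribute names are searchable just like the text content.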
Note
Future versions of Libervia will probably include other FTS engines (support for PostgreSQL and MySQL/MariaDB is planned). The syntax may thus vary depending on the engine, or a common syntax may be implemented for all engines in the future. Keep that in mind if you plan to use FTS capabilities in long-lived queries, e.g. in scripts.
Parsed Metadata Filters¶
It is possible to filter on any field of parsed data. This is done with the -F PATH OPERATOR VALUE, --field PATH OPERATOR VALUE argument (be careful: the short option is an uppercase F, the lowercase one being used for Full-Text Search).
Note
Parsed Metadata Filters are not indexed, that means that using them is less efficient than using e.g. Full-Text Search. If you want to filter on a text field, it’s often a good idea to pre-filter using Full-Text Search to have more efficient queries.
PATH and VALUE can either be specified as a plain string or using JSON syntax (if the value can’t be decoded as JSON, it is used as plain text).
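The “JSON if possible, plain text otherwise” rule can be sketched in a few lines of Python. The decode_argument helper is hypothetical and only illustrates the behaviour described above; it is not Libervia’s code.

```python
import json


def decode_argument(raw):
    """Decode a PATH or VALUE argument: JSON if possible, else plain text."""
    try:
        return json.loads(raw)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return raw


plain = decode_argument("title")          # not valid JSON: kept as text
quoted = decode_argument('"title"')       # JSON string
as_array = decode_argument('["title"]')   # JSON array
number = decode_argument("42")            # JSON number
```

One consequence of this rule is that a value such as 42 is decoded as a number; to pass it as a string it would have to be written as a JSON string.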
PATH is the name of the field to use. If you must go beyond root-level fields, you can use a JSON array to specify each element of the path: a string is an object key, and a number is an array index. Thus, to access the root title key, you can use title, '"title"' (JSON string, quoted for the shell) or '["title"]' (JSON array containing the “title” string, quoted for the shell).
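How such a path walks through parsed data can be illustrated with a short Python sketch. The resolve_path helper and the sample item (including its author field) are hypothetical and only serve the illustration.

```python
def resolve_path(parsed_data, path):
    """Follow `path` (a root key, or a list of keys/indexes) into parsed data."""
    if not isinstance(path, list):
        path = [path]
    current = parsed_data
    for element in path:
        # str elements are object keys, int elements are array indexes
        current = current[element]
    return current


item = {
    "title": "Trip to Slovakia",
    "tags": ["travel", "xmpp"],
    "author": {"name": "Louise"},
}
root_title = resolve_path(item, "title")
same_title = resolve_path(item, ["title"])
second_tag = resolve_path(item, ["tags", 1])
nested_name = resolve_path(item, ["author", "name"])
```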
Note
The extra fields pubsub_service, pubsub_node and node_profile are added to the result after the query, thus they can’t be used as fields for filtering (use the direct arguments for that).
OPERATOR indicates how to use the value to make a filter. The currently supported operators are:
== or eq
Equality operator, true if the field value is the same as the given value.
!= or ne
Inequality operator, true if the field value is different from the given value.
> or gt
Greater than, true if the field value is higher than the given value. For strings, this is according to alphabetical order.
Time Pattern can be used here, see below.
< or lt
Less than, true if the field value is lower than the given value. For strings, this is according to alphabetical order.
Time Pattern can be used here, see below.
between
The given value must be an array with 2 elements. The condition is true if the field value is between the 2 elements (for strings, this is according to alphabetical order).
Time Pattern can be used here, see below.
in
The given value must be an array of elements. The field value must be one of them to make the condition true.
not_in
The given value must be an array of elements. The field value must not be any of them to make the condition true.
overlap
This can be used only on array fields.
If the given value is not already an array, it is put in an array. The condition is true if any element of the field value matches any element of the given value. Notably useful to filter on tags.
ioverlap
Same as overlap but done in a case-insensitive way.
disjoint
This can be used only on array fields.
If the given value is not already an array, it is put in an array. The condition is true if no element of the field value matches any element of the given value. Notably useful to filter out tags.
idisjoint
Same as disjoint but done in a case-insensitive way.
like
Does pattern matching on a string. % can be used to match zero or more characters and _ can be used to match any single character.
If you’re not looking for a specific field, it’s better to use Full-Text Search when possible.
ilike
Same as like but done in a case-insensitive way.
not_like
Same as like except that the condition is true when the pattern is not matching.
not_ilike
Same as not_like but done in a case-insensitive way.
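The semantics of the array and pattern operators can be modelled in plain Python. This is a rough sketch of the behaviour described in the list above, not the way Libervia evaluates filters (which happens in the database); all function names are hypothetical.

```python
import re


def as_list(value):
    # A given value that is not already an array is put in an array.
    return value if isinstance(value, list) else [value]


def _normalise(value, ignore_case):
    return value.lower() if ignore_case and isinstance(value, str) else value


def overlap(field_value, given, ignore_case=False):
    """True if any element of the field matches any element of `given`."""
    wanted = {_normalise(v, ignore_case) for v in as_list(given)}
    return any(_normalise(v, ignore_case) in wanted for v in field_value)


def disjoint(field_value, given, ignore_case=False):
    """True if no element of the field matches any element of `given`."""
    return not overlap(field_value, given, ignore_case)


def like(field_value, pattern, ignore_case=False):
    """Pattern matching where % is any run of characters and _ is one character."""
    regex = "".join(
        ".*" if char == "%" else "." if char == "_" else re.escape(char)
        for char in pattern
    )
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    return re.fullmatch(regex, field_value, flags=flags) is not None


tags = ["XMPP", "blog"]
```

With these definitions, overlap(tags, "xmpp") is false while overlap(tags, "xmpp", ignore_case=True) (the ioverlap behaviour) is true, and like("Trip to Slovakia", "Trip%") is true.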
For gt/>, lt/< and between, you can use a Time Pattern with the syntax TP(<time pattern>) (see examples below).
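Conceptually, a TP(...) value is replaced by the timestamp it designates before the comparison is made. The Python sketch below illustrates that idea only: the helper names are hypothetical, and the toy resolver understands nothing but "<n> hours/days/weeks ago", whereas Libervia’s real Time Pattern syntax is richer and is not reproduced here.

```python
import re
from datetime import datetime, timedelta, timezone

TP_RE = re.compile(r"^TP\((?P<pattern>.+)\)$")
SIMPLE_RE = re.compile(r"^(?P<amount>\d+) (?P<unit>hour|day|week)s? ago$")


def extract_time_pattern(value):
    """Return the text inside TP(...), or None if `value` is not a Time Pattern."""
    match = TP_RE.match(value)
    return match.group("pattern") if match else None


def resolve_simple_pattern(pattern, now):
    """Resolve "<n> hours/days/weeks ago" relative to `now` (toy resolver)."""
    match = SIMPLE_RE.match(pattern)
    if match is None:
        raise ValueError(f"unsupported pattern in this sketch: {pattern!r}")
    delta = timedelta(**{match.group("unit") + "s": int(match.group("amount"))})
    return now - delta


now = datetime(2024, 1, 15, tzinfo=timezone.utc)  # fixed date for the example
threshold = resolve_simple_pattern(extract_time_pattern("TP(2 weeks ago)"), now)
published = datetime(2024, 1, 10, tzinfo=timezone.utc)
is_recent = published > threshold  # what "published gt TP(2 weeks ago)" expresses
```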
Ordering¶
Results can be ordered by a well-known order or by a parsed data field. Ordering defaults to created (see below), but this may be changed with -o ORDER [FIELD] [DIRECTION], --order-by ORDER [FIELD] [DIRECTION].
ORDER can be one of the following:
creation
Order by item creation date. Note that this is the date of creation of the item in cache (which most of the time should correspond to the order of creation of the item in the source pubsub service), and it may differ from the date of publication as specified by some features (like blog). This matters when old items are imported, e.g. when they come from another blog engine.
modification
Order by the date when the item was last modified. The modification date is the same as the creation date if the item has never been modified since it was cached. The same warning as for creation applies: this is the date of last modification in cache, not the one advertised in parsed data.
item_id
Order by XMPP id of the item. Notably useful when user-friendly IDs are used (as is often the case with blogs).
rank
Order items by Full-Text Search rank. This one can only be used when Full-Text Search is used (via -f FTS_QUERY, --fts FTS_QUERY). Rank is a value indicating how well an item matches the query. This usually needs to be used with the desc direction, so that you get the most relevant items first.
field
This special order indicates that the ordering must be done on a parsed data field. The following argument is then the path of the field to use (which can be a plain text name of a root field, or a JSON-encoded array). An optional direction can be specified as a third argument. See examples below.
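Ordering on a parsed data field amounts to sorting the items on the value found at the given path. The Python sketch below illustrates this; the order_by_field helper is hypothetical, the sample items are made up, and the ascending default direction is an assumption of this sketch.

```python
def order_by_field(items, path, direction="asc"):
    """Sort parsed items on a root field or a nested path (list of keys)."""
    keys = path if isinstance(path, list) else [path]

    def sort_key(item):
        value = item
        for key in keys:
            value = value[key]
        return value

    # "asc" as the default direction is an assumption made for this sketch.
    return sorted(items, key=sort_key, reverse=(direction == "desc"))


items = [
    {"id": "a", "published": 1700000000},
    {"id": "b", "published": 1710000000},
    {"id": "c", "published": 1690000000},
]
newest_first = [i["id"] for i in order_by_field(items, "published", "desc")]
oldest_first = [i["id"] for i in order_by_field(items, ["published"])]
```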
examples¶
Search for blog items cached for the profile louise which contain the word Slovakia:
$ li pubsub cache search -t blog -p louise -f Slovakia
Show the title, publication date and id of blog articles (excluding comments) which have been published on Louise’s blog during the last 6 months, ordered by item id. Here we use an empty string as a subtype to exclude comments (for which the subtype is comment):
$ li pubsub cache search -t blog -S "" -p louise -s louise@example.net -n urn:xmpp:microblog:0 -F published gt 'TP(6 months ago)' -k id -k published -k title -o item_id
Show all blog items from anywhere which are tagged as XMPP or ActivityPub (case insensitive) and which have been published in the last month (according to advertised publishing date, not cache creation date).
We want to order them by descending publication date (again the advertised publication date, not cache creation), and we don’t want more than 50 results.
We do an FTS query here even though it’s not mandatory, because it provides an efficient pre-filtering:
$ li pubsub cache search -f "xmpp OR activitypub" -F tags ioverlap '["xmpp", "activitypub"]' -F published gt 'TP(1 month ago)' -o field published desc -l 50