This adds some very simple metrics to all of the background jobs that
consume the event streams. Currently, the only "real" metric is a
counter tracking how many messages have been processed by that consumer,
but a lot of the value will come from being able to utilize the
automatic "up" metric provided by Prometheus to monitor and make sure
that all of the jobs are running.
I decided to use ports starting from 25010 for these jobs. The choice is
completely arbitrary: it's just a fairly large range of unassigned
ports, so it shouldn't conflict with anything.
I'm not a fan of how much hard-coding is involved here for the different
ports and jobs in the Prometheus config, but it's also not a big deal.
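One possible way to reduce the hard-coding would be to derive each job's
port from a single ordered list. The following is only a rough sketch:
the job names are hypothetical placeholders (not the project's real
consumer names), and it produces just the list of scrape targets rather
than a full Prometheus config.

```python
from typing import Dict, List

BASE_PORT = 25010  # first port in the (arbitrarily chosen) range

# Hypothetical job names, for illustration only.
JOB_NAMES: List[str] = ["consumer_a", "consumer_b", "consumer_c"]


def assign_ports(job_names: List[str], base_port: int = BASE_PORT) -> Dict[str, int]:
    """Give each job the next port in sequence, starting at base_port."""
    return {name: base_port + offset for offset, name in enumerate(job_names)}


def scrape_targets(ports: Dict[str, int], host: str = "localhost") -> List[str]:
    """Build host:port strings that a scrape config could list as targets."""
    return [f"{host}:{port}" for port in ports.values()]
```

With something like this, adding a job would mean appending one name to
the list, though new jobs would have to go at the end to keep existing
port assignments stable.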
RabbitMQ was used to support asynchronous/background processing tasks,
such as determining word count for text topics and scraping the
destinations or relevant APIs for link topics. This commit replaces
RabbitMQ's role (as the message broker) with Redis streams.
This included building a new "PostgreSQL to Redis bridge" that takes
over the previous role of pg-amqp-bridge: listening for NOTIFY messages
on a particular PostgreSQL channel and translating them to messages in
appropriate Redis streams.
One particular change of note is that the names of message "sources"
were adjusted a little and standardized. For example, the routing key
for a message caused by a new comment was previously "comment.created",
but is now "comments.insert". Similarly, "comment.edited" became
"comments.update.markdown". The new naming scheme uses the table name,
proper name for the SQL operation, and column name instead of the
previous unpredictable terms.
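As an illustration of the naming scheme, a source name could be built
roughly like this. The function name and signature are assumptions made
for this sketch, not the bridge's actual code.

```python
from typing import Optional


def message_source(table: str, operation: str, column: Optional[str] = None) -> str:
    """Build a source name of the form table.operation[.column]."""
    parts = [table, operation.lower()]
    if column is not None:
        # Only column-specific updates carry a third component.
        parts.append(column)
    return ".".join(parts)


print(message_source("comments", "INSERT"))              # comments.insert
print(message_source("comments", "UPDATE", "markdown"))  # comments.update.markdown
```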
This adds settings into pyproject.toml for the isort tool to match up
with the styles I've generally been using, and then applies it to the
whole project (by running "isort -rc").
Most of these changes are very minor, but it's good to fix the few
inconsistencies that were around.
Using bootstrap() seems to cause issues with re-declaring the Prometheus
metrics (which happens in the tweens that we don't really need or want
anyway). There might be better ways to do this, such as not attaching
the tweens for scripts, but this seems to work fine (and was already
being done this way in the YouTube API consumer).
mypy 0.640 has made it so that it's no longer necessary to annotate the
return type for __init__ methods, since it's always None. The only time
it's necessary now is if the method doesn't have any arguments, since
this shows that the method should still be type-checked.
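A small example of the two cases, using hypothetical class names that
are not taken from the project:

```python
class Point:
    # At least one argument is annotated, so mypy (0.640 and later)
    # type-checks the method and infers the None return type.
    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y


class Registry:
    # No arguments besides self: without "-> None" the method would have
    # no annotations at all, and mypy would treat it as untyped.
    def __init__(self) -> None:
        self.items: list = []
```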
This isn't working very well in a lot of cases, so it shouldn't be used
until I've got workarounds for the issues that I'm finding.
This reverts commit 369f273f8e.
As part of scraping a link, Embedly will often remove tracking
parameters from the query string, follow redirects, and so on. This
starts using the URL returned in an Embedly result to replace the one
that was originally submitted whenever the two differ (though the
original one will still be kept in the original_url column).
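The replacement logic might look roughly like the sketch below. The
LinkTopic class and apply_scraped_url function are hypothetical
stand-ins, and the sketch assumes original_url is filled in at the moment
of replacement, which may not match how the column is really populated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkTopic:
    """Hypothetical stand-in for a link topic row."""

    url: str
    original_url: Optional[str] = None


def apply_scraped_url(topic: LinkTopic, scraped_url: Optional[str]) -> None:
    """Replace the topic's URL with the scraped one, but only if it differs."""
    if not scraped_url or scraped_url == topic.url:
        return  # nothing returned, or nothing changed
    # Assumption: the submitted URL is preserved at this point.
    topic.original_url = topic.url
    topic.url = scraped_url
```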
This adds a consumer (in prod only) that uses Embedly's Extract API to
scrape the links from all new link topics and stores some of the data in
the topic's content_metadata column.