tildes


Author	SHA1	Message	Date
Deimos	b011be34ef	Add simple metrics to event stream consumer jobs. This adds some very simple metrics to all of the background jobs that consume the event streams. Currently, the only "real" metric is a counter tracking how many messages have been processed by that consumer, but a lot of the value will come from being able to utilize the automatic "up" metric provided by Prometheus to monitor and make sure that all of the jobs are running. I decided to use ports starting from 25010 for these jobs - this is completely arbitrary, it's just a fairly large range of unassigned ports, so shouldn't conflict with anything. I'm not a fan of how much hard-coding is involved here for the different ports and jobs in the Prometheus config, but it's also not a big deal.	6 years ago
Deimos	bcb5a3e079	Replace RabbitMQ uses with Redis streams. RabbitMQ was used to support asynchronous/background processing tasks, such as determining word count for text topics and scraping the destinations or relevant APIs for link topics. This commit replaces RabbitMQ's role (as the message broker) with Redis streams. This included building a new "PostgreSQL to Redis bridge" that takes over the previous role of pg-amqp-bridge: listening for NOTIFY messages on a particular PostgreSQL channel and translating them to messages in appropriate Redis streams. One particular change of note is that the names of message "sources" were adjusted a little and standardized. For example, the routing key for a message caused by a new comment was previously "comment.created", but is now "comments.insert". Similarly, "comment.edited" became "comments.update.markdown". The new naming scheme uses the table name, proper name for the SQL operation, and column name instead of the previous unpredictable terms.	6 years ago
Deimos	416daf4d7d	Make embedly scraper skip inapplicable links. And this is exactly why this needs to be refactored into a common behavior.	6 years ago
Deimos	31ab15fe51	Apply isort to make import styles consistent. This adds settings into pyproject.toml for the isort tool to match up with the styles I've generally been using, and then applies it to the whole project (by running "isort -rc"). Most of these changes are very minor, but it's good to fix the few inconsistencies that were around.	7 years ago
Deimos	39665058d2	Embedly consumer: switch to get_appsettings(). Using bootstrap() seems to cause issues with re-declaring the Prometheus metrics (which happens in the tweens that we don't really need or want anyway). There might be better ways to do this including not attaching the tweens for scripts, but this seems to work fine (and was already being done this way in the YouTube API consumer).	7 years ago
Deimos	bd350495a4	Re-queue topic for some consumers on link edit. After editing a topic's link, we want to re-process it through the scrapers, setting the domain in its metadata, etc.	7 years ago
Deimos	ccb2fb5aa9	Remove obsolete __init__ return type annotations. mypy 0.640 has made it so that it's no longer necessary to annotate the return type for __init__ methods, since it's always None. The only time it's necessary now is if the method doesn't have any arguments, since this shows that the method should still be type-checked.	7 years ago
Deimos	32bcbf1f95	Add timeout to Embedly scraper	7 years ago
Deimos	1e03c3df55	Revert using Embedly to canonicalize link topics. This isn't working very well in a lot of cases, shouldn't be used until I've got some workarounds for a lot of the issues that I'm finding. This reverts commit `369f273f8e`.	7 years ago
Deimos	369f273f8e	Use Embedly result to canonicalize link topics. As part of scraping a link, Embedly will often remove tracking vars from the query, follow redirects, and so on. This will start using the url returned back from an Embedly result to replace the one that was originally submitted when it was different (though the original one will still be kept in the original_url column).	7 years ago
Deimos	db9d485cc5	Add the Embedly Extract scraper and consumer. This adds a consumer (in prod only) that uses Embedly's Extract API to scrape the links from all new link topics and stores some of the data in the topic's content_metadata column.	7 years ago

11 Commits (b011be34ef03b19bc8f29b77c169bcb02494cf09)
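The message-source naming scheme described in commit bcb5a3e079 (table name, then SQL operation, then an optional column name) can be illustrated with a small sketch. This is not the project's actual code: the function name `stream_source_name` is hypothetical, and joining the parts with dots and lowercasing the operation are assumptions inferred only from the two examples given in that commit message ("comments.insert" and "comments.update.markdown").

```python
from typing import Optional


def stream_source_name(table: str, operation: str, column: Optional[str] = None) -> str:
    """Hypothetical helper: build a source name as table.operation[.column].

    Assumes dot-separated parts and a lowercased SQL operation name, matching
    the examples in the commit message; the real bridge may differ.
    """
    parts = [table, operation.lower()]
    if column is not None:
        # Column-specific sources, e.g. an edit that changes only one column
        parts.append(column)
    return ".".join(parts)


# The two renames mentioned in the commit message, old name -> new name
RENAMES = {
    "comment.created": stream_source_name("comments", "INSERT"),
    "comment.edited": stream_source_name("comments", "UPDATE", "markdown"),
}

for old, new in RENAMES.items():
    print(f"{old} -> {new}")
```

The point of this kind of scheme is predictability: given any table, operation, and column, the source name can be derived mechanically rather than looked up.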