As of Python 3.9, it's no longer necessary to import things like List
and Dict from the typing module, and we can just use the built-in types
like this.
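For example (count_tags is just a made-up name for illustration), an annotation that previously needed imports from typing:

from typing import Dict, List

def count_tags(tags: List[str]) -> Dict[str, int]:
    ...

can now be written with the built-in types directly:

def count_tags(tags: list[str]) -> dict[str, int]:
    ...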
This also involved installing some new type-stub packages for a few of the
major third-party libraries.
I also had to change some of the imports in some model files in strange
ways; I'm not sure why some of those changes were necessary. I suspect this
might be a bug in mypy, but I'm not sure whether I'll be able to build a
reproduction of it to be able to report it.
This adds the backend pieces (no interface yet) to configure Lua scripts
that will be applied to topics and comments in response to different events.
Initially, it only supports running a script when a new topic or comment
is posted. For example, here is a Lua script that would prepend a new
topic's title with "[Text] " or "[Link] " depending on its type, as well
as replace its tags with either "text" or "link":
function on_topic_post (topic)
    if (topic.is_text_type) then
        topic.title = "[Text] " .. topic.title
        topic.tags = {"text"}
    elseif (topic.is_link_type) then
        topic.title = "[Link] " .. topic.title
        topic.tags = {"link"}
    end
end
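For a rough idea of how the Python side could invoke such a hook, here's an illustrative sketch using the lupa library; it skips the sandboxing entirely and the function/field names are made up, so it's not the actual implementation:

from lupa import LuaRuntime

def run_topic_post_hook(script_source: str, topic_fields: dict) -> dict:
    # Illustrative only: no sandboxing, instruction limits, or memory limits here.
    lua = LuaRuntime(register_eval=False, unpack_returned_tuples=True)
    lua.execute(script_source)
    hook = lua.globals().on_topic_post
    if hook is None:
        return topic_fields
    topic_table = lua.table_from(topic_fields)
    hook(topic_table)
    # Copy any changes the script made back out (tags will come back as a Lua table).
    return {key: topic_table[key] for key in topic_fields}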
There can be a global script as well as group-specific scripts, and the
scripts are sandboxed, with limited access to data and only a restricted
subset of Lua's built-in functions available. The Lua sandboxing
code comes from Splash (https://github.com/scrapinghub/splash). It will
need to be modified, but this commit keeps it unmodified so that future
changes can be more easily tracked by comparing to the original state of
the file.
The sandboxing also includes some restrictions on the number of instructions
executed and on memory usage, but this might be more effectively managed at
the OS level. More research will still need to be done on security and
resource restrictions before this feature can be safely opened up to users.
This adds some very simple metrics to all of the background jobs that
consume the event streams. Currently, the only "real" metric is a
counter tracking how many messages have been processed by that consumer,
but a lot of the value will come from being able to utilize the
automatic "up" metric provided by Prometheus to monitor and make sure
that all of the jobs are running.
I decided to use ports starting from 25010 for these jobs. This is
completely arbitrary; it's just a fairly large range of unassigned ports, so
it shouldn't conflict with anything.
I'm not a fan of how much hard-coding is involved here for the different
ports and jobs in the Prometheus config, but it's also not a big deal.
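On the job side, the setup is basically just this (an illustrative sketch using prometheus_client; the metric/function names are made up):

from prometheus_client import Counter, start_http_server

# Each consumer exposes its metrics on its own port (starting from 25010),
# which also gives us the automatic "up" metric per job.
MESSAGES_PROCESSED = Counter(
    "consumer_messages_processed_total",
    "Number of stream messages processed by this consumer",
)

start_http_server(25010)

def handle_message(message) -> None:
    # ... actual processing of the message ...
    MESSAGES_PROCESSED.inc()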
RabbitMQ was used to support asynchronous/background processing tasks,
such as determining word count for text topics and scraping the
destinations or relevant APIs for link topics. This commit replaces
RabbitMQ's role (as the message broker) with Redis streams.
This included building a new "PostgreSQL to Redis bridge" that takes
over the previous role of pg-amqp-bridge: listening for NOTIFY messages
on a particular PostgreSQL channel and translating them to messages in
appropriate Redis streams.
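The core loop of the bridge looks roughly like this (a simplified sketch; the connection string, channel name, and payload format are made up for illustration):

import select

import psycopg2
import psycopg2.extensions
from redis import Redis

conn = psycopg2.connect("dbname=tildes")  # hypothetical connection string
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
redis = Redis()

with conn.cursor() as cursor:
    cursor.execute("LISTEN postgresql_events;")  # hypothetical channel name

while True:
    # wait up to 5 seconds for a NOTIFY to arrive
    if select.select([conn], [], [], 5) == ([], [], []):
        continue
    conn.poll()
    while conn.notifies:
        notify = conn.notifies.pop(0)
        # assume the payload looks like "<stream name>:<message data>"
        stream, _, data = notify.payload.partition(":")
        redis.xadd(stream, {"data": data})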
One particular change of note is that the names of message "sources"
were adjusted a little and standardized. For example, the routing key
for a message caused by a new comment was previously "comment.created",
but is now "comments.insert". Similarly, "comment.edited" became
"comments.update.markdown". The new naming scheme uses the table name,
proper name for the SQL operation, and column name instead of the
previous unpredictable terms.
This adds settings into pyproject.toml for the isort tool to match up
with the styles I've generally been using, and then applies it to the
whole project (by running "isort -rc").
Most of these changes are very minor, but it's good to fix the few
inconsistencies that were around.
This changes the "activity" topic-sorting method to look for
"interesting" activity instead of everything, and adds a new "All
activity" method that retains the previous behavior.
Currently, "interesting activity" excludes any comments that have active
Noise, Offtopic, or Malice labels, or any of their children. These
checks are also applied based on labeling activity: for example, if someone
posts a new comment it will bump the thread initially, but if that comment
is then labeled as Noise, the thread will "un-bump" and go back to its
previous position in the Activity sort.
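In rough pseudocode, the exclusion rule is something like this (the attribute names are hypothetical; the real check is driven by labeling activity as described above):

EXCLUDED_LABELS = {"Noise", "Offtopic", "Malice"}

def counts_as_interesting(comment) -> bool:
    # A comment is excluded if it, or any comment above it in the thread,
    # currently has an active Noise, Offtopic, or Malice label.
    node = comment
    while node is not None:
        if EXCLUDED_LABELS & {label.name for label in node.active_labels}:
            return False
        node = node.parent  # hypothetical parent-comment attribute
    return True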
There were also some minor changes to the appearance to support adding
another sorting option, such as shortening the displayed names on the
"tabs" (for example, showing "Votes" instead of "Most votes"). This probably
needs some further work, but is okay for now.
Using bootstrap() seems to cause issues with re-declaring the Prometheus
metrics (which happens in the tweens that we don't really need or want
anyway). There might be better ways to do this, including not attaching the
tweens for scripts, but this seems to work fine (and was already being done
this way in the YouTube API consumer).
The site-icons spritesheet has already become unwieldy - it's almost 1 MB,
consists mostly of rarely-needed icons, and needs to be fully replaced and
re-downloaded whenever a new icon is added. With HTTP/2 now being widely
supported, spritesheets seem to be mostly obsolete, and I probably never
should have done it that way in the first place.
This commit changes over to simply using individual icon images, and
rebuilds the CSS file whenever new icons are downloaded. This new CSS
file will probably be somewhat large, but should gzip extremely well.
This probably still needs some work to support cache-busting on the CSS
file.
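The CSS generation itself is simple - roughly something like this (the paths and class-naming scheme are illustrative, not the actual build code):

from pathlib import Path

ICONS_DIR = Path("static/images/site-icons")  # hypothetical path
OUTPUT_FILE = Path("static/css/site-icons.css")  # hypothetical path

def rebuild_site_icons_css() -> None:
    # one rule per icon image, instead of background offsets into a spritesheet
    rules = []
    for icon in sorted(ICONS_DIR.glob("*.png")):
        name = icon.stem.replace(".", "_")
        rules.append(
            f'.topic-icon-{name} {{ background-image: url("/images/site-icons/{icon.name}"); }}'
        )
    OUTPUT_FILE.write_text("\n".join(rules) + "\n")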
A lot of the code in common between this and the EmbedlyScraper should
probably be generalized out to a base class soon, but let's make sure
this works first.
mypy 0.640 has made it so that it's no longer necessary to annotate the
return type of __init__ methods, since it's always None. The only time it's
still necessary is when the method doesn't have any arguments (other than
self), since the annotation is what indicates that the method should still
be type-checked.
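To illustrate (the class names are just examples):

class Topic:
    # mypy 0.640+ infers the None return type, so "-> None" isn't needed here
    def __init__(self, title: str):
        self.title = title

class TagTracker:
    # no arguments to annotate, so "-> None" is still needed to mark the
    # method as typed and have mypy check its body
    def __init__(self) -> None:
        self.count = 0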
This adds a trigger to the scraper_results table which will add RabbitMQ
messages whenever a scrape finishes, as well as a consumer that picks up
these messages and uses Embedly data to download (and resize if necessary)
the favicons from any sites that are scraped. These are
downloaded into the input folder for the site-icons-spriter, so it
should be able to use these to generate spritesheets.
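The download/resize step is roughly this (a simplified sketch using requests and Pillow; the paths and the 32x32 size are assumptions):

from io import BytesIO
from pathlib import Path

import requests
from PIL import Image

ICON_INPUT_DIR = Path("site-icons-spriter/input")  # hypothetical path

def download_favicon(domain: str, favicon_url: str) -> None:
    response = requests.get(favicon_url, timeout=5)
    response.raise_for_status()
    image = Image.open(BytesIO(response.content))
    # shrink the favicon if it's larger than 32x32
    if image.width > 32 or image.height > 32:
        image.thumbnail((32, 32))
    image.save(ICON_INPUT_DIR / f"{domain}.png")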
This isn't working very well in a lot of cases, and shouldn't be used until
I've got some workarounds for a lot of the issues that I'm finding.
This reverts commit 369f273f8e.
As part of scraping a link, Embedly will often remove tracking variables
from the query string, follow redirects, and so on. This will start using
the URL returned in an Embedly result to replace the one that was originally
submitted when it's different (though the original one will still be kept in
the original_url column).
Not really a big deal, but deleted topics are getting sent back through
this consumer when the clean_private_data script erases their data,
since that changes the markdown and puts them into the topic.edited
queue. There shouldn't be any reason to process deleted topics and
re-add "blank" metadata (0 word count, no excerpt), so we can just skip
them.
This adds a consumer (in prod only) that uses Embedly's Extract API to
scrape the links from all new link topics and stores some of the data in
the topic's content_metadata column.
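The scrape step is essentially a call like this (an illustrative sketch; key handling and which fields get kept are simplified):

import requests

EMBEDLY_EXTRACT_URL = "https://api.embed.ly/1/extract"

def scrape_link(url: str, api_key: str) -> dict:
    response = requests.get(
        EMBEDLY_EXTRACT_URL,
        params={"key": api_key, "url": url},
        timeout=10,
    )
    response.raise_for_status()
    data = response.json()
    # keep only a few of the returned fields (this selection is illustrative)
    return {
        key: data[key]
        for key in ("title", "description", "provider_name")
        if key in data
    }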
Previously, any topic processed by this consumer would have its
content_metadata completely replaced. This won't work once other
consumers or processes start being able to set that data, since we don't
know that this one will always run first.
This commit updates the method the consumer uses so that it merges its data
with anything that's already in the topic's content_metadata column instead
of replacing it. It would probably be good to generalize this method out
somehow so that it can be used in other places more easily.
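As a sketch of the merge behavior (simplified, with illustrative attribute names):

def merge_content_metadata(topic, new_metadata: dict) -> None:
    # keep whatever other consumers/processes have already stored, and only
    # add or overwrite the keys this consumer is responsible for
    existing = topic.content_metadata or {}
    topic.content_metadata = {**existing, **new_metadata}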
This follows the REUSE practices to add license and copyright info to
all source files: https://reuse.software/practices/2.0/
In addition, LICENSE.md was switched to a plaintext LICENSE file, to
support the tag-value header as recommended.
Note that files that are closer to configuration than code did not have
headers added. This includes all Salt files, Alembic files, and Python
files such as most __init__.py files that only import other files, since
those are similar to header files which are not considered
copyrightable.
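The added headers follow the REUSE tag-value style, so the top of each Python source file looks roughly like this (placeholders here rather than the actual values):

# Copyright (c) <year> <copyright holders>
# SPDX-License-Identifier: <license identifier>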
This commit contains only changes that were made automatically by Black
(except for some minor fixes to string un-wrapping and two
format-disabling blocks in the user and group schemas). Some manual
cleanup/adjustments will probably need to be made in a follow-up commit,
but this one contains the result of running Black on the codebase
without significant further manual tweaking.
This detects mentions of users in comments using the same pattern as the
markdown parsing uses to generate user links. Mentioned users are sent a
notification, and mentions are added/deleted if needed on comment edits.
As part of this, setup was done to generate RabbitMQ messages for comment
creation and edits, and the mentions are handled by an async consumer of
these messages.
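The detection itself is essentially a regex scan over the comment markdown, conceptually like this (the pattern below is a simplified stand-in; the real one is shared with the markdown user-link parsing):

import re
from typing import Set

# simplified stand-in: "@" followed by a username of word characters/hyphens
MENTION_PATTERN = re.compile(r"@([\w-]+)")

def find_mentioned_users(markdown: str) -> Set[str]:
    return {username.lower() for username in MENTION_PATTERN.findall(markdown)}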