Browse Source

Use Embedly result to canonicalize link topics

As part of scraping a link, Embedly will often remove tracking vars from
the query, follow redirects, and so on. This will start using the url
returned back from an Embedly result to replace the one that was
originally submitted when it was different (though the original one will
still be kept in the original_url column).
merge-requests/37/head
Deimos 6 years ago
parent
commit
369f273f8e
  1. 4
      tildes/consumers/topic_embedly_extractor.py

4
tildes/consumers/topic_embedly_extractor.py

@ -66,6 +66,10 @@ class TopicEmbedlyExtractor(PgsqlQueueConsumer):
self.db_session.add(result) self.db_session.add(result)
# update the topic's link if embedly says the final url is different
if topic.link != result.data["url"]:
topic.link = result.data["url"]
new_metadata = EmbedlyScraper.get_metadata_from_result(result) new_metadata = EmbedlyScraper.get_metadata_from_result(result)
if new_metadata: if new_metadata:

Loading…
Cancel
Save