* Define project as a Go module and update dependency versions
Signed-off-by: Nikos Filippakis <me@nfil.dev>
* Update docs, configs and dockerfile to use latest Go version
Signed-off-by: Nikos Filippakis <me@nfil.dev>
* Add postgres database driver
Signed-off-by: Nikos Filippakis <me@nfil.dev>
This is an attempt to fix #133.
Previously, we just clobbered the recent GUIDs with the latest response
every single time, assuming that Atom/RSS feeds would consistently return the
same items. This appears to not be the case. In the wild, the number of items
returned on a single request can vary (sometimes even being 1 or 2 when usually
it is 50!).
This patch alters *how many* and *which* GUIDs we keep between requests, in an
attempt to prevent sending old news for buggy RSS feeds.
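
One way to picture the retention change is a merge that keeps previously seen GUIDs alongside the latest ones instead of overwriting them. This is only a sketch under stated assumptions: `mergeGUIDs` and the `maxGUIDs` cap are hypothetical names and values, not taken from the patch itself.

```go
package main

// maxGUIDs is a hypothetical cap on how many GUIDs are remembered per feed.
const maxGUIDs = 100

// mergeGUIDs returns the GUIDs from the latest response followed by any
// previously remembered GUIDs that the latest response did not contain.
// The result is truncated to maxGUIDs entries, so the oldest GUIDs are
// the first to be forgotten.
func mergeGUIDs(latest, previous []string) []string {
	seen := make(map[string]bool, len(latest)+len(previous))
	merged := make([]string, 0, len(latest)+len(previous))
	for _, list := range [][]string{latest, previous} {
		for _, guid := range list {
			if seen[guid] {
				continue // already kept, skip the duplicate
			}
			seen[guid] = true
			merged = append(merged, guid)
		}
	}
	if len(merged) > maxGUIDs {
		merged = merged[:maxGUIDs]
	}
	return merged
}
```

With a merge like this, a response that contains only 1 or 2 items (or none at all) leaves the older GUIDs in place rather than discarding them.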
In the wild it looks like some RSS feeds will occasionally return 0 items
to requests *but not return an error*. This previously meant we would clobber
our knowledge of recent GUIDs with the empty set. This meant that the next
successful poll would resend the **entire** RSS feed.
We previously relied on the published date and GUIDs to determine which items
were new. We now only rely on the GUID and not the published date as a lot of
RSS feeds don't have published dates. A lot of feeds don't have GUIDs either,
so we now fall back to the HTTP `Link` field, or worst-case, the item `Title`.
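
The fallback order described above can be sketched as a small helper. The `feedItem` type and `itemKey` function are hypothetical, simplified stand-ins for illustration and may not match the names used in the actual code.

```go
package main

// feedItem is a hypothetical, simplified stand-in for a parsed feed item.
type feedItem struct {
	GUID  string
	Link  string
	Title string
}

// itemKey returns the identifier used to decide whether an item has been
// seen before: the GUID if present, otherwise the link, otherwise the title.
func itemKey(item feedItem) string {
	if item.GUID != "" {
		return item.GUID
	}
	if item.Link != "" {
		return item.Link
	}
	return item.Title
}
```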
Some services, notably Reddit, basically force you to set a custom UA in order
to use their RSS feeds. All the common default non-browser UAs are HEAVILY
rate-limited such that you can't realistically use them.
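
For reference, setting a custom UA with Go's `net/http` might look like the sketch below. The `userAgent` value and the `newFeedRequest` helper are assumptions made for illustration, not the identifiers or string used by this change.

```go
package main

import "net/http"

// userAgent is a hypothetical value; the real string is up to the bot.
const userAgent = "example-rss-bot/0.1"

// newFeedRequest builds a GET request for feedURL with a custom User-Agent
// header, replacing Go's default non-browser UA.
func newFeedRequest(feedURL string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, feedURL, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("User-Agent", userAgent)
	return req, nil
}
```

The returned request can then be passed to `http.Client.Do` as usual.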
Some RSS feeds will edit the published time of an item AFTER putting it out,
which resulted in RSS Bot sending the same article twice. We now remember the
"GUID" field for each item and de-dupe based on that. The normal timestamp
algorithm still applies.
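
A rough sketch of combining the timestamp check with GUID de-duplication is shown below. The `polledItem` type, the `filterNew` function, and the use of Unix seconds for the published time are simplifying assumptions for illustration only.

```go
package main

// polledItem is a hypothetical, simplified feed item. PublishedUnix is the
// published time in Unix seconds.
type polledItem struct {
	GUID          string
	PublishedUnix int64
}

// filterNew returns the items that pass the timestamp check (published after
// lastSeenUnix) and whose GUID has not been sent before. GUIDs of returned
// items are recorded in seenGUIDs, so an item whose published time is edited
// later is not sent a second time.
func filterNew(items []polledItem, lastSeenUnix int64, seenGUIDs map[string]bool) []polledItem {
	var fresh []polledItem
	for _, it := range items {
		if it.PublishedUnix <= lastSeenUnix {
			continue // timestamp check: not newer than the last poll
		}
		if seenGUIDs[it.GUID] {
			continue // same article with an edited published time
		}
		seenGUIDs[it.GUID] = true
		fresh = append(fresh, it)
	}
	return fresh
}
```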
This feels a lot better because now `OnPoll` works in a similar way to
`OnReceiveWebhook` (called on a `Service`) rather than having this strange
global-per-service-type struct.
Apparently "Feed Reader" is the name of a thing popular enough that we
don't want to step on toes, so let's call it the name of a less popular thing.
Just need to send messages into rooms now for a first cut to be done. Notable
improvements to make:
- We currently do 1 goroutine per service. This could be bad if we have lots of these things running around.
- We do not cache the response to RSS feeds. If we have 10 independent services on the same feed URL, we will
hit the URL 10 times. This is similar to how we currently do 1 webhook/service, so it's plausible that in
the future we will want to have some kind of generic caching layer.
- We don't send messages to Matrix yet. We need a `Clients` instance but can't get at one. There's only ever
one, so I wonder if we should global it like we do with `GetServiceDB()` for ease of use?
- The polling interval is divorced from the actual feed repoll time. Ideally we would schedule the goroutine
only when we need it, rather than checking frequently, determining we have nothing to do, and going back
to sleep.
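
As an illustration of the last point, the wait before the next wake-up could be derived from the feeds' own repoll times rather than a fixed interval. This is only a sketch of the idea: `untilNextPoll` is a hypothetical helper, and representing the times as Unix seconds is an assumption.

```go
package main

// untilNextPoll returns how many seconds to sleep before the earliest feed
// is due. nowUnix and nextPollUnix are Unix timestamps in seconds. The
// second return value is false when there are no feeds to wait for.
func untilNextPoll(nowUnix int64, nextPollUnix []int64) (int64, bool) {
	if len(nextPollUnix) == 0 {
		return 0, false
	}
	earliest := nextPollUnix[0]
	for _, t := range nextPollUnix[1:] {
		if t < earliest {
			earliest = t
		}
	}
	if earliest <= nowUnix {
		return 0, true // a feed is already due, poll immediately
	}
	return earliest - nowUnix, true
}
```

A goroutine could sleep for the returned number of seconds and then poll only the feeds that are due, instead of waking up frequently to find nothing to do.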