We've been using pts_lbsearch on the text file for a few weeks now, and
it's working fine. Checks generally seem to take about 10 ms, which is
more than acceptable for the relatively uncommon events of
registrations and password changes.
This removes everything related to the previous Redis-based method,
which means we no longer need the second Redis server or the ReBloom
module.
This replaces the current method of checking for breached passwords (a
Bloom filter in Redis) with searching the text file directly using
pts_lbsearch (https://github.com/pts/pts-line-bisect/).
I'm not removing the Redis-based method yet because I want to test the
performance of this first, but this is *far* simpler and doesn't have
the possibility of false positives like the Bloom filter does.
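The check itself is just a subprocess call and an exit-status check;
roughly like this (a sketch - the file path, the prefix-search flag,
and the exit-status behavior are assumed rather than copied from the
real code):

    import subprocess
    from hashlib import sha1

    # sorted file of uppercase SHA-1 hashes, one per line (assumed layout)
    BREACHED_PASSWORDS_FILE = "/opt/tildes/breached-passwords.txt"

    def is_breached_password(password: str) -> bool:
        """Return whether the password appears in the breached list."""
        hashed = sha1(password.encode("utf-8")).hexdigest().upper()

        # pts_lbsearch bisects the sorted file; assume -p does a prefix
        # search and the exit status is 0 only when a match is found
        result = subprocess.run(
            ["pts_lbsearch", "-p", BREACHED_PASSWORDS_FILE, hashed],
            stdout=subprocess.DEVNULL,
        )
        return result.returncode == 0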
Installs the Nu Html Checker and starts using it to validate the home
page's HTML: https://validator.github.io/validator/
Also includes fixes to some lists that were nested in an invalid way.
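The test shells out to the checker's jar; a rough sketch (the jar path
and how the page's HTML gets fetched are assumptions):

    import json
    import subprocess

    def test_homepage_html_is_valid(homepage_html):
        """Check the home page's HTML with the Nu Html Checker."""
        # vnu.jar can validate a document passed on stdin; --format json
        # makes the messages easy to inspect, --stdout keeps them off stderr
        result = subprocess.run(
            ["java", "-jar", "/opt/vnu/vnu.jar",
             "--stdout", "--format", "json", "-"],
            input=homepage_html.encode("utf-8"),
            stdout=subprocess.PIPE,
        )
        messages = json.loads(result.stdout)["messages"]
        assert not messages, messages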
This reverts commit cb7be83877.
We've already found various gaps in HTML Tidy's validation, including
one that's pretty much a deal-breaker for Tildes's HTML: it doesn't
consider <menu> a valid parent for <li>.
We're still looking at alternative validators.
Adds the HTML Tidy library to the dev version, along with the pytidylib
wrapper for it, and a couple of tests that use it to validate the HTML
of the home page.
Includes a fix to the GitLab "Planned features" link that Tidy considers
invalid because it includes some un-encoded characters.
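The tests end up very small with pytidylib; a rough sketch (how the
page's HTML gets fetched is assumed):

    from tidylib import tidy_document

    def test_homepage_html_is_valid(homepage_html):
        """Validate the home page's HTML using HTML Tidy."""
        # tidy_document returns the cleaned-up document along with a
        # string describing any warnings/errors it found
        _document, errors = tidy_document(homepage_html)
        assert not errors, errors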
Trying to change the mode of this file (which often already exists)
fails on Windows. It seems fine to just not set it and let it fall back
to the default.
The generate_site_icons_css cronjob will create this file, but the site
won't work before it exists, so there's a (less than 5 min) gap where
the site is broken when first set up. This probably won't be noticeable
in dev/prod setups, but it breaks things like CI setups, where
everything is created fresh each time.
This makes sure that the file always exists on initial setup and
whenever the Salt states are re-run.
This adds some very simple metrics to all of the background jobs that
consume the event streams. Currently, the only "real" metric is a
counter tracking how many messages have been processed by that consumer,
but a lot of the value will come from being able to use the automatic
"up" metric provided by Prometheus to monitor and make sure that all of
the jobs are running.
I decided to use ports starting from 25010 for these jobs - this is
completely arbitrary; it's just a fairly large range of unassigned
ports, so it shouldn't conflict with anything.
I'm not a fan of how much hard-coding is involved here for the different
ports and jobs in the Prometheus config, but it's also not a big deal.
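For reference, the per-job side of this is only a few lines with the
prometheus_client library; a sketch of the general shape (the metric
name, port, and consumer loop are illustrative, not the actual code):

    from prometheus_client import Counter, start_http_server

    # each consumer listens on its own port in the (arbitrary) 25010+
    # range so Prometheus can scrape it and derive the "up" metric
    start_http_server(25010)

    messages_processed = Counter(
        "consumer_messages_processed_total",
        "Number of stream messages processed by this consumer",
    )

    for message in consume_stream():  # hypothetical consumer loop
        process(message)
        messages_processed.inc()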
This enables me to set a ban expiry time for a user (manually, in the
database). When an expiry is set:
* The user's page will say that they're temporarily banned, and show the
date their ban will be lifted.
* If the user tries to log in, it will say they're temporarily banned,
and give a specific datetime that the ban will be lifted by.
* An hourly cronjob will lift any bans that have expired.
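Setting the expiry manually is just an UPDATE along these lines (the
column name here is an assumption):

    UPDATE users
    SET ban_expiry_time = NOW() + INTERVAL '7 days'
    WHERE username = 'some_user';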
This is a Prometheus exporter that allows checking IPv4 and IPv6
responses, among other things. This sets it up to make sure that the
site is responding over both IPv4 and IPv6, so that I can monitor and
set up an alert if either stops working.
This starts using webassets for the site-icons.css file inside the base
template so that a cache-busting "version" string is added after the
filename as a query variable (as was already being done with the other
CSS and JS files).
It also creates a new service that's triggered by a "path changed" event
on site-icons.css, which causes gunicorn to reload. This should mean
that whenever the site-icons.css file is updated by the cronjob that
generates it, gunicorn will automatically reload and update the
cache-busting string for the CSS file, causing users' browsers to update
to the newest version.
This is just using the recommendations from PGTune for a web application
being hosted on a server with the prod server's specs. I'm sure they're
not the best values, but they should be better than the defaults.
This removes RabbitMQ as well as everything else attached to it:
Erlang; the Prometheus collector; the pg-amqp-bridge and all PostgreSQL
functions and triggers; and the amqpy Python package and the Tildes code
that used it.
Note that this commit does not actually uninstall or delete any of these
packages or services, so if you have a running instance that you want to
keep (instead of re-provisioning from scratch), you will need to
manually remove them if you want them completely gone.
RabbitMQ was used to support asynchronous/background processing tasks,
such as determining word count for text topics and scraping the
destinations or relevant APIs for link topics. This commit replaces
RabbitMQ's role (as the message broker) with Redis streams.
This included building a new "PostgreSQL to Redis bridge" that takes
over the previous role of pg-amqp-bridge: listening for NOTIFY messages
on a particular PostgreSQL channel and translating them to messages in
appropriate Redis streams.
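The core loop of the bridge is pretty small; a simplified sketch of the
idea (the channel name and payload format are assumptions, not the
actual implementation):

    import json
    import select

    import psycopg2
    import redis

    pg_conn = psycopg2.connect("dbname=tildes")
    pg_conn.autocommit = True
    redis_conn = redis.Redis()

    with pg_conn.cursor() as cursor:
        cursor.execute("LISTEN postgresql_events;")

    while True:
        # wait until the PostgreSQL connection has something to read
        if select.select([pg_conn], [], [], 60) == ([], [], []):
            continue

        pg_conn.poll()
        while pg_conn.notifies:
            notify = pg_conn.notifies.pop(0)

            # assume the NOTIFY payload identifies the destination stream
            # and carries the message's fields
            payload = json.loads(notify.payload)
            redis_conn.xadd(payload["stream"], payload["fields"])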
One particular change of note is that the names of message "sources"
were adjusted a little and standardized. For example, the routing key
for a message caused by a new comment was previously "comment.created",
but is now "comments.insert". Similarly, "comment.edited" became
"comments.update.markdown". The new naming scheme uses the table name,
proper name for the SQL operation, and column name instead of the
previous unpredictable terms.
gunicorn 20.0.0 included a change so that it will no longer read server
configuration out of Paster files. Because of this, the settings for it
in development.ini and production.ini were no longer being used. This
resulted in the auto-reloading no longer working in dev, and the number
of workers being reduced back down to 1 in production. The socket/PID
may have been impacted as well.
This commit moves the configuration into command-line args used to
launch gunicorn, and uses a pillar variable to handle the different
args between dev and prod.
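The launch command ends up looking something like this (illustrative
values, the real ones come from the pillar):

    gunicorn --paste production.ini --workers 4 --bind unix:/run/gunicorn.sock

with dev swapping in development.ini and adding --reload.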
This makes it so that posts (both topics and comments) can no longer be
voted on after they're over 30 days old. An hourly cronjob makes this
"official" by updating a flag on the post indicating that voting is
closed. The daily clean_private_data script then deletes all individual
vote records for posts with closed voting, and the triggers on the
voting tables have been updated to not decrement the vote totals when
these deletions happen.
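The cronjob's query is essentially this shape (column names here are
assumptions), run for both topics and comments:

    UPDATE topics
    SET is_voting_closed = TRUE
    WHERE created_time <= NOW() - INTERVAL '30 days'
        AND NOT is_voting_closed;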
The net result of this is that Tildes only stores users' votes for a
maximum of 30 days, removing a lot of sensitive/private data that builds
up over the long term.
This didn't get updated when boussole was split out to its own
virtualenv, and was still tied to the application's pip installs
succeeding.
Previously, the virtualenvs were owned by root and the pip installs were
done as root as well. This worked fine, but it meant that I couldn't
use pip-tools' pip-sync function without sudo. This makes it simpler by
giving ownership to the app user (tildes in prod, vagrant in dev).
I'm going to start using pip-tools to manage dependencies:
https://github.com/jazzband/pip-tools
This makes updating the dependencies and virtualenv easier in a few
ways, and makes it simple to keep dev dependencies split out (so I can
stop installing them in production).
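For context on the workflow: the .in files list only the direct
dependencies, mostly unpinned, so requirements.in just contains lines
like (hypothetical contents):

    pyramid
    SQLAlchemy
    redis

and pip-compile resolves those into fully-pinned .txt files.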
Now, to do a check and update all packages to their newest versions, the
main command is:
pip-compile --no-header --upgrade requirements.in
and again with requirements-dev.in to update that one as well. This will
update all the package versions in requirements.txt and
requirements-dev.txt. The virtualenv can then be updated to match those
versions by running:
pip-sync requirements.txt
(or requirements-dev.txt for dev environment). This currently needs to
be run with sudo, but I'm going to try to fix that shortly.
This installs PL/Python (specifically plpython3u), enables it in the
database, and creates a function id36_to_id that calls the Python
function with the same name inside the tildes.lib.id module. This will
enable queries like the following, when I have a topic's ID36 from the
site:
SELECT * FROM topics WHERE topic_id = id36_to_id('asdf');
The fact that this was possible to set up without having to port the
id36_to_id logic to a different language is blowing my mind a little.
There are some really interesting possibilities from being able to
import all of the Python code into the database itself.
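For reference, the function definition is essentially just an import
and a call; a sketch (the exact signature and types are assumptions):

    CREATE OR REPLACE FUNCTION id36_to_id(id36 TEXT) RETURNS BIGINT AS $$
        from tildes.lib.id import id36_to_id
        return id36_to_id(id36)
    $$ LANGUAGE plpython3u IMMUTABLE;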
Changing these pillar values is the only actual change to Tildes
code/config needed, but if you're upgrading an existing version from 10
to 12 you will need to do some manual steps. The below should cover it -
lines starting with a * are descriptions of things you need to do, while
the rest are actual commands to run:
sudo apt-get install postgresql-12
sudo systemctl stop postgresql@10-main.service
sudo systemctl stop postgresql@12-main.service
cd /var/lib/postgresql
sudo -u postgres /usr/lib/postgresql/12/bin/pg_upgrade -b /usr/lib/postgresql/10/bin/ -B /usr/lib/postgresql/12/bin/ -d /var/lib/postgresql/10/main/ -D /var/lib/postgresql/12/main/ -o '-c config_file=/etc/postgresql/10/main/postgresql.conf' -O '-c config_file=/etc/postgresql/12/main/postgresql.conf'
* Change pillar value to 12, and run salt
sudo systemctl stop postgresql@10-main.service
* Edit /etc/postgresql/12/main/postgresql.conf and change port to 5432
sudo systemctl restart postgresql@12-main.service
sudo -u postgres ./analyze_new_cluster.sh
* After verifying the new version seems to be working, clean up the old version:
sudo apt-get remove postgresql-10
sudo rm -rf /usr/lib/postgresql/10/
sudo rm -rf /var/lib/postgresql/10/
sudo rm -rf /etc/postgresql/10/
This adds the backend for scheduled topics, which can be set up to post
at a certain time and then (optionally) repeat on a schedule.
Currently, these topics must be text topics, and can have their title,
markdown, and tags set up. They can be configured to be posted by a
particular user, but if no user is chosen they will be posted by a
(newly added) generic user named "Tildes" that is intended to be used
for "owning" automatic actions like this.
Apparently add_header inside a location block doesn't... you know,
actually work. This should be reasonable, but I'd still rather only
allow the Stripe JS on the single page where it's necessary.
The previous method of doing this could cause Redis to try to start up
(via restart) earlier than it should. By using require_in and watch_in,
Redis should now only start up in the first place once this service has
been started, and it will also restart whenever this service needs to
run again in the future.
Vagrant on Windows has issues with creating symlinks inside shared
folders - it requires a permission that isn't granted to a user by
default. This can be fixed by changing security policies, but for our
purposes we don't need the symlinks anyway, and can run the tools
manually like this, instead of using the .bin/ symlinks.
Previously tild.es URLs would proxy_pass through to the views inside
the Pyramid app, but this caused strange behavior in some cases. For
example, anything that caused a 404 response would end up on a broken
page that still appeared to be on the tild.es domain, but would be an
HTML-only page coming from the app, since the CSS and JS would not be
available.
This method is still a bit weird in some ways (now you'll end up on a
404 page at https://tildes.net/shortener/... instead), but I think it's
an improvement overall.
This changes the "activity" topic-sorting method to look for
"interesting" activity instead of everything, and adds a new "All
activity" method that retains the previous behavior.
Currently, "interesting activity" excludes any comments that have active
Noise, Offtopic, or Malice labels, or any of their children. These
checks are also done based on labeling activity, so for example if
someone posts a new comment it will bump the thread initially, but if
that comment is then labeled as Noise, the thread will "un-bump" and go
back to its previous position in the Activity sort.
There were also some other minor changes made to the appearance to
support adding another sorting option, such as shortening the displayed
the "tabs", like showing "Votes" instead of "Most votes". This probably
needs some further work, but is okay for now.
This won't affect requests for static files or anything except ones that
get proxied to the app.
The current configuration is based on IP, and allows a rate of 4/sec,
with an additional burst of 5 above the limit permitted, and burst
requests allowed to go through immediately (nodelay). For more info:
https://www.nginx.com/blog/rate-limiting-nginx/
Sometimes the database initialization fails, generally due to some
earlier step in the setup having an issue. Even if a re-provision
resolves that issue, the database init wouldn't be re-run since it was
set up to only happen after the database was created.
This changes it so that it will try to select from the users table, and
if that fails it will re-run the initialization.
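The check is equivalent to running something like this and re-running
the initialization whenever it fails (database name assumed):

    sudo -u postgres psql tildes -c "SELECT 1 FROM users LIMIT 1"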