Youtube scraping broke earlier on a crazy duration of "P30W2DT8H2M32S"
(30 weeks?!), so I updated the parsing a little to be able to handle
that, and also not crash the consumer if it hits a duration that it
can't handle.
Now that all links in text have underlines by default, I think this
looks pretty strange for ~group and @user links, which are quite common
and unnecessary to have underlined all the time. This modifies the
markdown parser to add link-user and link-group classes to these links,
which allows them to be styled differently.
In addition, some of the markdown tests needed to be changed to use
BeautifulSoup instead of simple string-matching, since it's not as
simple to recognize links any more (and the order of attrs might
change).
This is a bit flimsy, but when I started looking at applying the
existing transformations to old posts, I found the Paradox forums as an
example of links that became broken after they were processed (because
"fixing" their links ends up breaking them).
This will give a way to exempt any other domains or urls that end up
being a problem, though over the long term it would probably be better
to make this database-based instead of code-based.
I did a bad job of testing how this would work, and didn't account
properly for people being considered not-logged-in while they're on
tild.es. This should fix the issues, but it re-adds the minor "data
leak" of people without an invite being able to determine some data,
such as existing users and groups. Since the site is going to be
publicly visible in the near future, I don't think this is a significant
concern.
Yet another issue with Bleach 3.0's linkification: when used via the
filter as part of sanitization (which is necessary right now due to
*another* issue where it escapes valid HTML tags), it doesn't properly
linkify urls that contain an ampersand.
As a (temporary?) workaround, this stops using Bleach's linkification
entirely and switches to cmark-gfm's "autolink" extension. These aren't
perfectly equivalent, and the switch results in two other issues that I
consider more minor than links including ampersands not working:
- autolink will initially create links for ftp:// urls and email
addresses. The final sanitization will remove these links due to the
protocol whitelist, but it will leave behind a bare <a> tag. So the
text will *appear* linked but not actually link to anything. If I
decide to stick with autolink, it should be pretty straightforward to
fix this by stripping all bare <a> tags from the final HTML.
- autolink doesn't create links for bare domains. For example, writing
"example.com" won't result in a link, it's necessary to write
"www.example.com" or include a protocol like "http://example.com".
As of version 3.0, the redis-py package no longer has a distinction
between its Redis and StrictRedis classes, and both behave the same
(StrictRedis is just an alias for Redis).
This would have continued working as-is, but we might as well switch it
back to the normal name now that StrictRedis doesn't have any benefit.
Until now, if people want their account deleted, I've just been banning
it. This will let me do it properly, but some additional cleanup should
be added in the future once I think through what's safe to get rid of
from deleted accounts.
Bleach supports adding html5lib filters to the cleaning process, which
is how its linkify() process works, as well as being how I implemented
the custom Tildes linkification for usernames and group names.
This commit switches to adding those filters into the clean() call,
which makes it so everything is now done in a single pass instead of
three. In addition, this also fixes the issues we were having with
bleach "fixing" things that it thought were invalid HTML (when they were
just people writing things inside angle brackets).
Updates the Black code-formatter to its newest version and applies it.
They made some small changes to how it handles numeric literals that
affected a few files - both always forcing a "0." prefix on float values
instead of allowing an implicit "0" to be there, as well as only adding
underscore separators if the number has at least 6 digits.
We're always using lowercase for all ltree usages (group paths, topic
tags), so it's best to just have the Ltree field do this conversion.
This fixes some minor issues like tag-filtering not working if the
casing of the tag was wrong.
A few other changes needed to be made as part of this, in places where I
was inadvertently passing an Ltree value into the Ltree field, instead
of a string.
This follows the REUSE practices to add license and copyright info to
all source files: https://reuse.software/practices/2.0/
In addition, LICENSE.md was switched to a plaintext LICENSE file, to
support the tag-value header as recommended.
Note that files that are closer to configuration than code did not have
headers added. This includes all Salt files, Alembic files, and Python
files such as most __init__.py files that only import other files, since
those are similar to header files which are not considered
copyrightable.
These disables no longer seem to be necessary, due to switching to
Prospector. Some may be related to newer versions of astroid, pylint, or
other reasons.
Previously this was also trying to catch ones at the beginning of new
paragraphs, but that seems to mostly just be causing unexpected issues
when people create ordered lists with a blank line between items. This
can probably be done properly in the future, but just restricting it to
the start of posts is probably better for now.
Black won't re-wrap comments because it has no way to determine when
that's a "safe" thing to do (that might break deliberate line-breaks).
So this is a quick pass through to re-wrap most multi-line comments and
docstrings to the new max line-length of 88.
This commit contains only changes that were made automatically by Black
(except for some minor fixes to string un-wrapping and two
format-disabling blocks in the user and group schemas). Some manual
cleanup/adjustments will probably need to be made in a follow-up commit,
but this one contains the result of running Black on the codebase
without significant further manual tweaking.
An example was recently added to the github cmark repo to show how to
set up the extensions from Python, so this is heavily based on that
code:
https://github.com/github/cmark/blob/master/wrappers/wrapper_ext.py
This should also fix a memory leak, since I wasn't manually freeing the
returned buffer (as the library recommends that you do).
Just a couple more tests for comment permissions that are more essential
to be working correctly - replying in locked threads, and viewing
removed comments. Also changes the "deleted comments lose all
permissions" test slightly to actually check all permissions instead of
a hard-coded (and obsolete) set of them.
Unfortunately some of the triggers aren't currently testable due to
the fact that postgresql's NOW() and similar functions will always
return the same value during the entire test transaction, but this at
least tests a couple more of the behaviors.
This moves some of the commonly-used fixtures (creating a topic, etc.)
into a separate module, which gets included into conftest.py (so the
fixtures are available everywhere) by treating it as a "plugin".
There was some special handling of apostrophes in two string-related
functions: the one for generating url slugs, as well as the one for
doing a word count. Both of these weren't handling "curly" apostrophes
(unicode char 0x2019) properly before, so they've both been updated now.
This detects mentions of users in comments using the same pattern as the
markdown parsing uses to generate user links. Mentioned users are sent a
notification, and mentions are added/deleted if needed on comment edits.
As part of this, setup was done to generate rabbitmq messages for
comment creation and edits, and the mentions are handled by an async
consumer of these messages.
This module is only being used by the breached-passwords Redis server,
which is separate. It's not relevant to any of the tests, so there's no
reason to load it.