Previously, either double or single tildes would work for strikethrough.
However, this had some strange edge cases, such as attempting to
strikethrough some words that contain a group reference. Since Tildes
uses single tildes for other purposes, it should be much less ambiguous
overall to just force double tildes for strikethrough like this.
This will affect a small number of past posts, in both good and bad
ways (getting rid of some strikethrough, but fixing some accidental
ones).
Renames BleachContext to HTMLSanitizationContext, and simplifies the
default case a little by just allowing None as the context instead of
needing an explicit DEFAULT enum member.
This was a little more strict before and would only skip linkification
if the entire path was digits and/or periods. However, I've seen it
still hitting some people if they write things like "~100k". It's very
unlikely that we're ever going to have a top-level group with a name
starting with a number, so let's just skip linkification for all
instances where a number is the first thing.
Markdown won't merge subsequent quoted paragraphs into a single
blockquote unless the blank line between them also has a ">" on it. Most
people don't expect this behavior when quoting a multi-paragraph
section, and end up with a bunch of separated blockquotes.
This should fix that issue by default, but still allows people to keep
their blockquotes separated by adding at least one more newline between
the two quoted paragraphs (so they have at least two blank lines), among
various other methods.
Now that all links in text have underlines by default, I think this
looks pretty strange for ~group and @user links, which are quite common
and unnecessary to have underlined all the time. This modifies the
markdown parser to add link-user and link-group classes to these links,
which allows them to be styled differently.
In addition, some of the markdown tests needed to be changed to use
BeautifulSoup instead of simple string-matching, since it's not as
simple to recognize links any more (and the order of attrs might
change).
Yet another issue with Bleach 3.0's linkification: when used via the
filter as part of sanitization (which is necessary right now due to
*another* issue where it escapes valid HTML tags), it doesn't properly
linkify urls that contain an ampersand.
As a (temporary?) workaround, this stops using Bleach's linkification
entirely and switches to cmark-gfm's "autolink" extension. These aren't
perfectly equivalent, and the switch results in two other issues that I
consider more minor than links including ampersands not working:
- autolink will initially create links for ftp:// urls and email
addresses. The final sanitization will remove these links due to the
protocol whitelist, but it will leave behind a bare <a> tag. So the
text will *appear* linked but not actually link to anything. If I
decide to stick with autolink, it should be pretty straightforward to
fix this by stripping all bare <a> tags from the final HTML.
- autolink doesn't create links for bare domains. For example, writing
"example.com" won't result in a link, it's necessary to write
"www.example.com" or include a protocol like "http://example.com".
Bleach supports adding html5lib filters to the cleaning process, which
is how its linkify() process works, as well as being how I implemented
the custom Tildes linkification for usernames and group names.
This commit switches to adding those filters into the clean() call,
which makes it so everything is now done in a single pass instead of
three. In addition, this also fixes the issues we were having with
bleach "fixing" things that it thought were invalid HTML (when they were
just people writing things inside angle brackets).
This follows the REUSE practices to add license and copyright info to
all source files: https://reuse.software/practices/2.0/
In addition, LICENSE.md was switched to a plaintext LICENSE file, to
support the tag-value header as recommended.
Note that files that are closer to configuration than code did not have
headers added. This includes all Salt files, Alembic files, and Python
files such as most __init__.py files that only import other files, since
those are similar to header files which are not considered
copyrightable.
Previously this was also trying to catch ones at the beginning of new
paragraphs, but that seems to mostly just be causing unexpected issues
when people create ordered lists with a blank line between items. This
can probably be done properly in the future, but just restricting it to
the start of posts is probably better for now.
This commit contains only changes that were made automatically by Black
(except for some minor fixes to string un-wrapping and two
format-disabling blocks in the user and group schemas). Some manual
cleanup/adjustments will probably need to be made in a follow-up commit,
but this one contains the result of running Black on the codebase
without significant further manual tweaking.
An example was recently added to the github cmark repo to show how to
set up the extensions from Python, so this is heavily based on that
code:
https://github.com/github/cmark/blob/master/wrappers/wrapper_ext.py
This should also fix a memory leak, since I wasn't manually freeing the
returned buffer (as the library recommends that you do).