So if you’ve ever tried to leave a comment, on here or any other site that uses Disqus (“Discuss”) for commenting, you’ll notice that it has a little array of formatting buttons below the comment box that allow you to... format your comment. Make things bold or underlined, or insert links. Cool, right? Yes, but if you look closely, you’ll notice that they’re inserting the appropriate HTML tag for whatever you just clicked. Which makes me wonder… how does Disqus deal with other HTML tags?
How Disqus Gets Displayed
Disqus, at least on here, is <iframe>'d, essentially telling the browser to load a webpage inside another webpage.
It makes a request to the frame’s URL, computes the styling, scripting, what have you, then drops the final result down into the frame.
In theory, this means that anything malicious can’t directly get to my site (with the default permissions), but it could compromise something on the Disqus side.
I wrote a comment that had two ‘unusual’ tags: a <span> with a style attribute, and a <script> that would display an alert box if it ran. That’s the simplest of all XSS checks: type some <script> into an input and see what it does.
The <span> tags were interpreted, but the style attribute was removed, for some reason.
The <script> tag was not interpreted, and displayed the actual tag markup, though the page source didn’t seem to explicitly escape it using &lt; and &gt; (for < and >), as is common.
Additionally, the Disqus moderation dashboard reported “No issues detected” with that comment.
Disqus is doing some input sanitization, obviously, though I can’t figure out the extent of it. Right now it looks safe, but that’s only with my very surface-level testing. I bet someone could break it if they tried hard enough (that’s not an invitation to try, at least not with malicious intent, by the way!).
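To make that behavior concrete, here's a rough Python sketch of an allowlist sanitizer that treats the test comment the same way: `<span>` survives without its `style` attribute, and `<script>` shows up as literal text. This is not Disqus's actual code, which isn't visible from the outside. The `ALLOWED_TAGS` table, the `AllowlistSanitizer` class, and `sanitize()` are all made-up names for illustration.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist: tag name -> attributes permitted on that tag.
ALLOWED_TAGS = {
    "b": set(),
    "i": set(),
    "u": set(),
    "span": set(),   # tag is kept, but no attributes (so style is dropped)
    "a": {"href"},
}


def _attr_ok(tag, name, value):
    """Keep an attribute only if it is allowlisted and, for href, http(s)."""
    if name not in ALLOWED_TAGS[tag]:
        return False
    if name == "href":
        return (value or "").lower().startswith(("http://", "https://"))
    return True


class AllowlistSanitizer(HTMLParser):
    """Rebuilds the markup, keeping allowlisted tags and escaping the rest."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            kept = "".join(
                f' {name}="{escape(value or "", quote=True)}"'
                for name, value in attrs
                if _attr_ok(tag, name, value)
            )
            self.out.append(f"<{tag}{kept}>")
        else:
            # Emit the tag as inert text instead of live markup.
            self.out.append(escape(self.get_starttag_text()))

    def handle_endtag(self, tag):
        closing = f"</{tag}>"
        self.out.append(closing if tag in ALLOWED_TAGS else escape(closing))

    def handle_data(self, data):
        self.out.append(escape(data))


def sanitize(comment: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(comment)
    parser.close()
    return "".join(parser.out)


print(sanitize('<span style="color:red">hi</span><script>alert(1)</script>'))
```

The important design choice is that it's an allowlist: anything not explicitly permitted gets escaped, rather than trying to enumerate every dangerous tag.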
How Other Websites Manage Comment Formatting
Bulletin Board Code (BBCode), which, if you couldn’t tell by the name, goes way back, is an HTML-like markup language, with tags like [url] replacing HTML’s <a>.
The nice thing about this is that it’s still relatively easy to interpret, and HTML parsers will ignore it because… it’s not HTML. Many forums and such still use this, actually.
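Here's a minimal sketch of why BBCode is easy to handle safely: escape everything first, then turn a small, known set of bracket tags into HTML. The tag subset and the `bbcode_to_html()` name are assumptions for illustration, and real forum software supports far more than this.

```python
import re
from html import escape

# Hypothetical subset of BBCode: a few paired tags plus [url=...] links.
SIMPLE_TAGS = {"b": "strong", "i": "em", "u": "u"}
URL_RE = re.compile(
    r"\[url=(https?://[^\]\s]+)\](.*?)\[/url\]",
    re.IGNORECASE | re.DOTALL,
)


def bbcode_to_html(text: str) -> str:
    # Escape first, so any raw HTML in the comment becomes inert text.
    html = escape(text, quote=True)
    for bb, tag in SIMPLE_TAGS.items():
        html = re.sub(
            rf"\[{bb}\](.*?)\[/{bb}\]",
            rf"<{tag}>\1</{tag}>",
            html,
            flags=re.IGNORECASE | re.DOTALL,
        )
    # Only http(s) URLs are linked; anything else stays as plain text.
    return URL_RE.sub(r'<a href="\1">\2</a>', html)


print(bbcode_to_html("[b]hi[/b] <script>alert(1)</script>"))
print(bbcode_to_html("[url=https://example.com]my site[/url]"))
```

Because the only HTML in the output is what the converter itself writes, a `<script>` typed into a comment never gets a chance to be a real tag.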
Markdown, the same format that I’m writing in right now, is a very simple markup language that doesn’t use tags or entities, but special characters for formatting, such as # Heading, and so on.
Markdown is a little more advanced in the ‘parsing’ category, but, in my opinion, a lot easier to learn how to use.
Sites like Reddit use a light-weight form of Markdown in their comments, and Discord uses almost the full Markdown suite for formatting your messages.
Markdown is cool in that it’s usually being transformed to HTML anyways (see, right here!), so any raw HTML you insert into it is passed through to the output, allowing you to style things that Markdown itself doesn’t specify.
This passthrough is part of the original Markdown syntax (and of CommonMark), so whether raw HTML survives depends on whether the parser sanitizes it.
Goldmark, the parser that Hugo (my framework) uses, will sanitize Markdown input by default, replacing any raw HTML with a <!-- raw HTML omitted --> HTML comment instead, producing no visible output.
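As a toy illustration of that difference, here's a tiny Python renderer with an `unsafe` switch. It is not Goldmark (which is written in Go) and it only understands headings and one-line paragraphs. The `render()` function and its `unsafe` parameter are invented for this sketch.

```python
import re
from html import escape

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")
RAW_HTML_LINE = re.compile(r"^\s*<[^>]+>.*$")  # crude: line starts with a tag


def render(markdown: str, unsafe: bool = False) -> str:
    out = []
    for line in markdown.splitlines():
        if not line.strip():
            continue
        heading = HEADING.match(line)
        if heading:
            level = len(heading.group(1))
            out.append(f"<h{level}>{escape(heading.group(2))}</h{level}>")
        elif RAW_HTML_LINE.match(line):
            # A non-sanitizing parser passes the HTML straight through.
            out.append(line if unsafe else "<!-- raw HTML omitted -->")
        else:
            out.append(f"<p>{escape(line)}</p>")
    return "\n".join(out)


source = "# Hi\n<script>alert(1)</script>\nJust text"
print(render(source))               # raw HTML replaced by a comment
print(render(source, unsafe=True))  # raw HTML reflected into the output
```

For your own posts, the passthrough mode is a feature. For comments written by strangers, it's exactly the hole you don't want.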
The Dangers Of User Input
In a perfect world, nobody would try to break your site. However, we’re not in a perfect world. There are as many malicious actors out there as good ones, if not more. This is why you never pass user input straight into something else without filtering it first. Forgetting to do this is how SQL injection happens, how the self-retweeting tweet happened, how one user, who puts one script, in one comment, can end up stealing data from anyone else who just visits the site. I hope you weren’t logged in: all your session cookies are belong to us.
Always sanitize your input. Always escape ‘dangerous’ characters (real definition depends on where said input is being used) so that the computer knows “I mean X but don’t think that I’m actually saying X”.
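Here's a short Python sketch of what "depends on where said input is being used" means in practice: the same hostile strings get escaped for HTML but bound as parameters for SQL. The table name and sample strings are made up for the example, and it uses only the standard library (`html` and `sqlite3`).

```python
import sqlite3
from html import escape

# Hypothetical user input.
comment = "<script>alert('hi')</script>"
author = "Robert'); DROP TABLE comments;--"

# HTML context: escape so the browser shows the text instead of running it.
safe_html = f"<p>{escape(comment)}</p>"
print(safe_html)

# SQL context: never build the query with string formatting. Bind
# parameters instead, so the driver treats the values purely as data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (author TEXT, body TEXT)")
conn.execute(
    "INSERT INTO comments (author, body) VALUES (?, ?)",
    (author, comment),
)
stored = conn.execute("SELECT author, body FROM comments").fetchall()
print(stored)  # the table still exists, and the strings are stored verbatim
```

Note that the database keeps the original, unescaped comment. The HTML escaping happens at output time, because that's the context where those characters are dangerous.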
Disqus, while you’re doing your job (as far as I can see), I think you could do one better.