Skip to content

Conversation

dmbaturin
Copy link
Contributor

Making safe mode the default is a noble idea. However, an incompatible change without a backwards compatibility option, after years of unchanging behaviour, is a serious headache.

cmark 0.28 is still in a lot of still supported OSes/distros. If you use cmark as a part of a build process and your Markdown comes from a trusted/reviewed source, you have to choose between old and new because the old one will complain about an invalid option if you run cmark --unsafe, and the new one will auto-escape HTML without it.

Environment variables can be an effective backwards compatibility mechanism: for the new versions it will disable safe mode, and for old versions it will have no ill effects.

This patch adds CMARK_UNSAFE environment variable in addition to the --unsafe option, with the same effect.

$ ./build/src/cmark 
<blink>hello world</blink> ^D

<p><!-- raw HTML omitted -->hello world<!-- raw HTML omitted --></p>

$ CMARK_UNSAFE=1 ./build/src/cmark 
<blink>hello world</blink> ^D

<p><blink>hello world</blink></p>

I'm open to discussion regarding the naming and implementation.

@jgm
Copy link
Member

jgm commented Jun 21, 2020

I'm not sure I like the idea of providing an environment variable backdoor around a security feature. It means that people might inadvertently think they're running in safe mode, because they're not aware of environment variable settings.

If you're able to set an environment variable in the calling script, you should also be able to check the cmark version and make the cmark call conditional on the result, no?

@dmbaturin
Copy link
Contributor Author

I'm not sure I like the idea of providing an environment variable backdoor around a security feature.

Well, if someone you don't trust can change your environment variables, you have a much bigger problem than possible XSS already.

If you're able to set an environment variable in the calling script, you should also be able to check the cmark version and make the cmark call conditional on the result, no?

Of course you can. We are working on Turing-complete machines, don't we? We can solve any computable problem. ;)
However, there is no universally popular command line utility for comparing versions. You'll have to resort to parsing it by hand with awk or something, which means you are making even more assumptions about the environment and Windows users are not completely out of luck.

I'm not sure I like the idea of providing an environment variable backdoor around a security feature.

Let's put it bluntly: cmark, like many programs these days, is essentially abusing semver by staying at 0.x despite being five years old and relied upon by many people.
This should have been a version change to 2.0, with appropriate deprecation warnings, but the maintainers chose to make it a sneaky change because it's 0.x and users should not expect it to stay the same.
But, if something stayed the same for FIVE years, it's a very reasonable expectation.
When I make a breaking change, I provide a compatibility mechanism or a deprecation period whether I like doing it or not, and I wish more people did the same.

@dmbaturin
Copy link
Contributor Author

dmbaturin commented Jun 21, 2020

One possible alternative is to reintroduce the --safe option and make cmark run in safe mode if that option is given, even if the CMARK_UNSAFE environment variable if it's set. This way people can make sure they are running in safe mode if it's important for them.

Besides, abrupt removal of that option might not have been good idea either—it will break people's scripts on upgrade.
Keeping options even if they became the new default is a widely accepted practice.

@nwellnhof
Copy link
Contributor

If you want to support both old and new versions, you can simply check whether the --unsafe option is supported:

unsafe_opt=''
if cmark --unsafe </dev/null >/dev/null; then
    unsafe_opt='--unsafe'
fi
cmark $unsafe_opt ...

I agree that --safe should be reintroduced, but simply as a no-op.

@jgm
Copy link
Member

jgm commented Jun 22, 2020

I'd be happy to reintroduce --safe as a no-op; that should have been done originally.

As for the versioning: the decision was made early on to track the version number of the commonmark spec, at least prior to 1.0, which is why we are still 0.x. The breaking change was announced in the changelog.

@jgm
Copy link
Member

jgm commented Jun 22, 2020

I'm on the fence about the environment variable idea, and I'd be happy to get input from more people. In general, with security simpler is better.

jgm added a commit that referenced this pull request Jun 22, 2020
@nwellnhof
Copy link
Contributor

First of all, I don't see how adding an environment variable helps with the original issue. Older binaries will always fail with --unsafe and can't know about new environment variables. The best you can do is a feature test.

@dmbaturin
Copy link
Contributor Author

@nwellnhof Old binaries use unsafe mode by default and never needed --unsafe.
Thus if you change a script to call CMARK_UNSAFE=1 cmark instead of just cmark, then

  • if the host has cmark <=0.28, then it's unsafe by default and doesn't need that variable
  • if the host has cmark >=0.29, then the variable will make it behave like old cmark

and the script will behave the same regardless of cmark version.

The explicit --safe option can be used as a way to prevent anyone from sneakily changing the behaviour by setting environment variables outside the script.

  • if --safe is given, force safe mode and ignore environment variables
  • if neither --safe nor --unsafe is given, use safe mode unless CMARK_UNSAFE is set

Regarding security implications... I agree that in the library API, safe mode should be the default because people are likely to use it in web application backends for rendering user comments/posts/etc. That's completely unattended and the risk of XSS is high.
However, how many people run command line programs on files without looking at them?

@nwellnhof
Copy link
Contributor

if the host has cmark >=0.29, then the variable will make it behave like old cmark

Only if the version is strictly greater than 0.29.0. Version 0.29.0 will never behave like older versions.

@dmbaturin
Copy link
Contributor Author

@nwellnhof Of course it won't. But fixing it for future versions is better than nothing, isn't it?

@nwellnhof
Copy link
Contributor

But fixing it for future versions is better than nothing, isn't it?

Since 0.29.0 is out, it's impossible to make sure that all cmark binaries in the wild will have the same default for safe/unsafe. Adding an environment variable could fix this for all versions except 0.29.0, but if you have to support different versions, you can't rule out that someone happens to have 0.29.0 installed. So you need some kind of feature test anyway. It isn't much more complex than setting an environment variable and it should work for all cmark versions. Why add a workaround that doesn't work on all versions and has security implications?

@dmbaturin
Copy link
Contributor Author

First five years security wasn't a concern, but now it suddenly is. So much of a concern in fact that disregarding compatibility completely is justifiable and no deprecation period was possible.

You have shown complete disregard for compatibility. I'm trying to do at least something to alleviate it. It's your right to refuse. Do whatever you want.

@dmbaturin dmbaturin closed this Jun 23, 2020
@nwellnhof
Copy link
Contributor

You have shown complete disregard for compatibility.

That's not fair. Personally, I always kept an eye on cmark development to minimize breaking changes. The safe/unsafe change is the first major breaking change that I know of. It was done deliberately in accordance with the stability guarantees stated by the project because it was deemed an important enough issue. I also tried to help by posting a rather simple workaround.

@dmbaturin
Copy link
Contributor Author

@jgm I noticed that you are also the maintainer of pandoc. Pandoc is still unsafe by default.
I'm curious if we should expect similar change in pandoc and how it will be implemented, or why it's not a security issue for pandoc.

@jgm
Copy link
Member

jgm commented Jun 23, 2020

I can attest that @nwellnhof has definitely been a force to minimize breaking changes.

I did consider the backwards compatibility issue before agreeing to this change, and the PR comments show my reluctance on that score. I hadn't thought of something like your environment variable approach at the time -- if I had, I might have done that. But I'm pretty convinced by @nwellnhof's argument: any robust script will have to reckon with the possibility of encountering cmark 0.29.0, so the environment variable approach is a half-measure.

@dmbaturin
Copy link
Contributor Author

@jgm Ok, let's forget about it and focus on pandoc.
I'm more interested to know if I can help with the fact that pandoc is still unsafe by default.
Do you have plans to make pandoc delete raw HTML by default or you think these security considerations do not apply to it?

@jgm
Copy link
Member

jgm commented Jun 24, 2020

This isn't the place to discuss pandoc. If you genuinely care about this issue, come over to the pandoc-discuss mailing list and raise it there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants