BatBadBut: You can't securely execute commands on Windows (flatt.tech)
77 points by explodingwaffle on April 9, 2024 | hide | past | favorite | 42 comments



Is there any reason Windows couldn't add an equivalent of execvpe for arguments and environment to be passed as arrays, which newer programs could then use directly? The OS could handle safely re-quoting as a string for older programs that need compatibility, rather than leaving it up to the language or programmer to hopefully do right. Which seems pretty difficult, based on the fact that seven major languages got a CVE today - plus a possible exploit in every C application that is doing this by hand.

The API could even be a more modern pointer+length interface rather than null termination, to sidestep that class of mistakes/exploits (CWE-170).

https://www.daviddeley.com/autohotkey/parameters/parameters.... is a great read on how fragmented this all seems to be.


Regrettably, Windows (MSVC) does include _exec functions, but they rejoin the arguments into a single string, which then gets split again:

> Spaces embedded in strings may cause unexpected behavior; for example, passing _exec the string "hi there" will result in the new process getting two arguments, "hi" and "there".

https://learn.microsoft.com/en-us/cpp/c-runtime-library/exec...
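The failure mode the docs describe is easy to reproduce in miniature; a quick illustrative sketch (in Python, standing in for what the _exec rejoin-and-resplit does) of how argument boundaries containing spaces get lost:

```python
# Naive join-then-split, as the _exec docs warn: an argument that
# contains a space is indistinguishable from two arguments after the
# round trip.
args = ["greet", "hi there"]
joined = " ".join(args)      # 'greet hi there'
resplit = joined.split()     # boundaries gone
print(resplit)               # ['greet', 'hi', 'there']
```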


This is worse than not having these functions at all...

Oh yeah, pass those arguments as a list, then we'll completely ignore that and fuck your shit up. Err, I mean you need to quote them! Even though they are passed as separate arguments.


Knowing Windows, this is probably a minefield of backwards compatibility that Microsoft does not want to touch. You may think you can write rules to ensure backwards compatibility here, but I would almost guarantee you can't.


> Is there any reason Windows couldn't add an equivalent of execvpe for arguments and environment to be passed as arrays, which newer programs could then use directly?

I fail to see how this would help. If I understand correctly, the issue is how cmd.exe interprets the args, not how the args get to it.


I think both sides play a role - aiui cmd.exe (and other programs) interpret the arguments in an inconsistent way, but this is only relevant because there is no standard way to pass multiple arguments. I had something like this in mind:

- Create `CreateProcessArgv`, a version of `CreateProcess` that takes `argv` rather than `lpCommandLine` (like `execv*`)

- Create `GetCommandLineArgv`, an alternative to `GetCommandLine` that returns an `argv`

- Create `ProcessCreatedWithArgv` so a program can prefer either `GetCommandLine` or `GetCommandLineArgv` (for compatibility with those that have their own quoting, such as cmd)

Then child processes can use `GetCommandLineArgv` with no overhead if the parent invoked with `CreateProcessArgv`, otherwise `CreateProcess` and `GetCommandLine` will continue to work with no overhead. There would be a compatibility layer in the kernel to either split `lpCommandLine` or quote `argv` for `CreateProcess`+`GetCommandLineArgv` or `CreateProcessArgv`+`GetCommandLine` combinations. Probably need a way to opt out of taking `lpCmdLine` in `WinMain`.

Seems not-impossible, but also a bit of a pipe dream...


This creates compatibility issues with applications that inspect the command line of running programs and, for example, restart them with the same command line. It also probably ties into a lot of general program-execution use cases, like the Task Scheduler.


Because only the OS actually knows whether it is running a plain executable or a batch script that it wraps in a call to cmd.exe. If the OS were passed an actual list of arguments, it could encode them the correct way for either path and handle all of the escaping complexity for all users.


> And unfortunately, the cmd.exe has different escaping rules compared to the usual escaping mechanism.

This "usual escaping mechanism" is a bit of a weasel word. Windows passes a single null-terminated character string to a process. Every application run-time must parse that into arguments itself.

I think what "usual escaping mechanism" refers to is the algorithm implemented in the Microsoft Visual C Run Time which takes the command line string and produces a char *argv[] for the main function.

There is no telling what uses that exact algorithm and what doesn't. Programs built with Microsoft languages probably do; obviously VC and VC++.


The “usual escaping mechanism” may also refer to CommandLineToArgvW, a Windows API function that implements this. I think it is a pretty common way to implement the parsing side, though I’m not sure if the Visual C runtime does something different.

https://learn.microsoft.com/en-us/windows/win32/api/shellapi...


Some more details on this: apparently CommandLineToArgvW implements a slightly different parsing algorithm than the Visual C++ runtime:

https://learn.microsoft.com/en-us/cpp/cpp/main-function-comm...

Also, here is another implementation of this algorithm in C#, used in .NET:

https://github.com/dotnet/runtime/blob/main/src/libraries/Sy...


This seems more like a gap in the Windows API than in the programming languages.


Is Linux really any different? I think the escaping rules are just better known.


The primary Linux interfaces for invoking programs with arguments (both in libc and at the syscall level) have each argument be its own string, so it's possible to invoke a program with arguments such that no escaping or unescaping happens at all. If you want escaping, you have to either invoke /bin/sh (and give it your escaped command+argument string as an unescaped argument), or use 'system()' (which is literally defined to be a shorthand for that /bin/sh invocation). The kernel works entirely in the unescaped proper list form, which even lets you do horrible things like make arg 0 not be the invoked binary, or not have a 0th arg at all.
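A quick Python sketch of this: subprocess.run with a list and no shell ends in an execvp-family call on POSIX, so list elements reach the child's argv untouched (the child here is just a stand-in that echoes its first argument):

```python
import subprocess
import sys

# Child process: prints its first argument exactly as received.
child = [sys.executable, "-c", "import sys; print(sys.argv[1])"]

# Shell metacharacters are inert: each list element travels to the
# child's argv as its own string, with no escaping layer in between.
arg = 'hi "there"; rm -rf / $(boom)'
out = subprocess.run(child + [arg], capture_output=True, text=True).stdout
print(out.rstrip("\n") == arg)  # → True
```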


Sure, but nothing is stopping /bin/sh from doing stupid things with its argument list once it gets them, which from what I understand is the equivalent of what is happening on Windows.


If a process uses the execv* methods to spawn a new process (as should most good implementations), it doesn't use /bin/sh in any way and thus itself cannot cause wrong things to happen; the spawned process could of course still do arbitrary computation on its inputs including introducing vulnerabilities, but that'd be strictly not the caller's fault and the caller couldn't do anything about it - there's only one format a given list of arguments can be passed to execv* and the spawned process still gets the arguments separately, whereas on Windows the spawned process can forego the standard unescaping completely.

At the core of it, on Linux the list of separate arguments is The Format of arguments, so it can be exposed and used without question. On Windows, the native format is a single string, which can still be used to achieve the same things, but now the callee must somehow know how the caller encoded multiple arguments (if it encoded any at all), and stdlibs had so far just been assuming one format while bat files use a different one.


The difference is that this bug exists at the border of two distinct components.

Suppose `/bin/sh` concatenated all arguments together, then split them back apart. That would be a stupid thing to do, but that stupidity would be entirely contained within `/bin/sh`. A bug report for `/bin/sh` could clearly point to the broken component and state that it needs to be fixed. This is possible because the `execve` API provides a list of strings. Any extra (concatenate, split) pairs must exist on one side or another of the border imposed by `execve`.

Here, there's a mismatch between two entirely separate components. The `CreateProcess` API accepts an arbitrary string. The `GetCommandLine` function returns that same arbitrary string. The (concatenate,split) pair must straddle the border between the two processes, with concatenation done on the side that calls `CreateProcess`, and splitting done on the side that calls `GetCommandLine`. A developer for the parent process can shrug and say that it's the fault of the child process for not parsing arguments correctly. A developer for the subprocess can shrug and say that it's the fault of the parent process for not providing arguments in the expected form.


The difference is if you’re using execve and friends /bin/sh will never be implicitly invoked from under you.


From https://man7.org/linux/man-pages/man2/execve.2.html

pathname must be either a binary executable, or a script starting with a line of the form: #!interpreter [optional-arg]

which is the equivalent of Windows starting CMD.EXE to execute a batch file. The only difference WRT the shell being invoked implicitly is how a script is detected (file name extension vs. first line of content), but that doesn't seem to be relevant when it comes to the shell mis-interpreting its inputs.


It is still pretty relevant, as the .bat file contents can't prevent improper arguments from executing arbitrary code, whereas a #!/bin/sh file invoked with arbitrary arguments will not do any code execution other than what the file itself asks for. And, even still, the #! form will still pass the arguments as separate elements, i.e. a file containing "#!python3" invoked via an execve argv of ["the-file", "arg1 \"foo ^%`'\\", "arg2"] will result in a total invocation of ["python3", "the-file", "arg1 \"foo ^%`'\\", "arg2"] (JSON-formatted here for clarity reasons, there's no backslash-escaping happening anywhere in reality).
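A sketch of that claim on a POSIX system, using a throwaway /bin/sh script as the #! target; the metacharacter-laden argument arrives verbatim, with no quoting layer in between:

```python
import os
import subprocess
import tempfile

# A #! script that prints its first argument verbatim. The kernel
# rewrites the exec into [/bin/sh, script, original args...], keeping
# each argv element as its own string.
with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
    f.write('#!/bin/sh\nprintf \'%s\\n\' "$1"\n')
    path = f.name
os.chmod(path, 0o755)

arg = 'arg1 "foo ^%`\'\\'   # the metacharacter soup from above
out = subprocess.run([path, arg], capture_output=True, text=True).stdout
print(out.rstrip("\n") == arg)
os.unlink(path)
```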


It's similar if you're invoking through the shell, which should be avoided for precisely this reason, but if you're launching a program directly (through execv or CreateProcess) there is one big difference:

On Linux the parsing is done on the caller's side of the interface, so the caller knows what quoting rules to apply – could be Python's rules if they're using Python to construct the argument array, bash's rules if they're using bash, etc.

On Windows, the parsing is done on the receiver's side of the interface, so the caller can't know how it's supposed to be quoted unless they have special knowledge of a specific receiver and its parsing rules.
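Python's subprocess module is a handy place to see this receiver-dependence: its list2cmdline helper quotes for the common MSVC/CommandLineToArgvW convention, which is correct for receivers that parse that way and wrong for ones (like cmd.exe) that don't:

```python
import subprocess

# list2cmdline quotes for the MSVC/CommandLineToArgvW convention:
args = ["child.exe", "hi there", 'say "hi"']
line = subprocess.list2cmdline(args)
print(line)  # child.exe "hi there" "say \"hi\""

# ...but it knows nothing about cmd.exe's rules: %VAR% sails through
# unescaped, which is exactly the BatBadBut failure mode for .bat files.
print(subprocess.list2cmdline(["script.bat", "%PATH%"]))  # script.bat %PATH%
```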


True, but even if Microsoft pushes an update, it will take decades before all Windows installations are patched.


Nice to see Rust (and Haskell too) released a fix immediately but Java is "affected, but won't fix". :)


Who's still using batch files and cmd.exe now that we have PowerShell?


PowerShell has that whole execution policy thing, so you can't just launch a PowerShell script unless you wrap it in a batch file or some other script.

I could see how allowing the user to whitelist individual scripts would make sense, but as far as I can tell that's not how it works? A blanket policy of "all scripts are forbidden unless wrapped with fragile and shady-looking hacks" doesn't seem particularly useful.


Disabling script signing on dev machines and requiring signatures on production scripts sounds like perfectly reasonable behavior to me. I know a lot of people are scared of pki but it’s way easier than people think. Signing things is a one liner, I keep certs on a portable HSM and it’s really low friction.


Unless you suddenly get sick and your HSM is unavailable?

Unless you get a 2nd person on the team (working remotely), and they want to be able to sign scripts as well?

Unless you get some sort of automated CI/CD system?


You can still turn off the script signing requirement without running a script (right?). Presumably this will be logged to the Windows Event Log, so there should be a mechanism that watches logs for this and alerts someone to investigate.


It's configurable with quite a few options.

But I guess here we have some of the underlying problem.

If something just executes whatever you throw at it, people complain.

If something doesn't just execute whatever you throw at it, people complain as well. ;)


None of the options seem very useful, though.

Why block execution of PowerShell scripts when batch files, WSH scripts and plain executables can still run? You could try to prevent those other kinds of scripts from even getting onto the machine, I guess, but then why wouldn't you simply do the same for PowerShell scripts?

The AllSigned policy where it asks you explicitly about trusting new publishers[0] seems like what I'm asking for, except that it apparently requires the certificate to be installed in Trusted Root Certificate Authorities! That's way more trust than should be necessary.

The only option that seems to make sense (aside from Unrestricted) is buying a certificate from an existing CA that's already trusted, so that users don't need to trust you with acting as a CA, but that's quite expensive.

[0] https://www.hanselman.com/blog/signing-powershell-scripts


Guilty as charged.

Sadly some software I use is so old that the only way to call Powershell scripts is via a batch script...


pwsh is slower, doesn't execute on double-click, and old bats exist


Why learn new thing when already have thing?


That's a lot of preconditions to meet in order to exploit.


Which is why the vuln is

> bad, but not the worst


Would LOVE to hear what other side of an "airtight" hatchway this one is.


It’s not an admin vulnerability, so there’s no hatchway. The real issue here is blindly passing user-provided input to a batch script, possibly from the Internet, and if you’re doing that then you’ve got much bigger problems. If you’re doing it using an account with any kind of privileges, you’re kinda asking to get broken into.


That isn't the problem. It is completely possible to process untrusted data in a batch script (even if it likely isn't the best tool for the job). The problem is that the method of getting that untrusted data to a batch script is incredibly complex and was being done wrong by a number of programming languages.

For example imagine that I have a shell script to write an entry to a guestbook. Maybe I call it from my webapp like this:

    # webapp.py
    import subprocess
    subprocess.run(['guestbook', untrusted_msg])
On Linux this is perfectly fine. I can then write my guestbook script like

    #!/bin/bash
    echo "$1" >> guestbook.txt
As far as I am aware there are no security issues here. The user can pass whatever they want as the message and other than some mess in the `guestbook.txt` file they can't cause any harm.

However this doesn't work well on Windows because in order to escape the arguments you need to know how the `guestbook` program parses its arguments. Right now basically all languages assume that the caller will use `CommandLineToArgvW`. However if `guestbook` is a batch file a different parsing mechanism is used and remote code execution can occur before the batch script even starts executing.

Basically in order to properly escape the arguments the caller needs to know what is being called. The current APIs don't have a way to know this so they can't do it right in all cases.
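For illustration, a hedged sketch of the spirit of the fixes the affected runtimes shipped: when the target resolves to a .bat/.cmd file, arguments that cmd.exe would reinterpret are escaped or, where that can't be done safely, rejected outright (Rust's fix errors out in that case). The function and names here are made up for the example, not any stdlib's real API:

```python
# Hypothetical guard: refuse cmd.exe metacharacters in arguments
# destined for a batch file, since cmd.exe re-parses the command line
# before the script ever starts executing.
CMD_METACHARACTERS = set('"%!^&|<>()')

def check_batch_arg(arg: str) -> str:
    if any(c in CMD_METACHARACTERS for c in arg):
        raise ValueError(f"unsafe argument for a batch file: {arg!r}")
    return arg

print(check_batch_arg("hello world"))   # passes through unchanged
try:
    check_batch_arg('"&calc.exe')       # cmd.exe would run calc.exe
except ValueError as e:
    print("rejected:", e)
```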


This is a weird coincidence – I ran into this exact problem two days ago, the day before the post was published.

I was just trying to write a simple batch script that accepted filenames as arguments and was surprised to find that there is no safe way to do so, as they're always passed through shell expansion, so if you have a filename like "foo %PATH% bar.txt" (which is allowed) the script will receive it with the PATH variable expanded and cannot get at the actual filename.

Also, passing arguments to programs is unsafe on Windows even if you don't go through the shell, because the quoting rules are entirely up to the program being invoked. The CreateProcess function[0] accepts a string, not an array, so you have to quote the arguments – but you can't do this quoting correctly unless you know exactly what program you're invoking and what grammar it has chosen for parsing its lpCommandLine string.

The article mentions that "many programming languages guarantee that the command arguments are escaped properly", but there is no universal "escaped properly" on Windows. There is escaped properly for the C runtime's parser[1], or escaped properly for CommandLineToArgv[2] which parses "in a way that is similar to" the C runtime, or escaped properly for .NET which has its own set of rules[3] – but there is no guarantee that any particular program is using any of these ways; any program can use whatever rules it likes!

Raymond Chen has written[4] about this as well.

PowerShell has an interesting workaround[5] of sorts: If you specify "-EncodedCommand" and "-EncodedArguments" it lets you pass base64-encoded strings when you "require complex quotation marks or curly braces".

[0] https://learn.microsoft.com/en-us/windows/win32/api/processt...

[1] https://learn.microsoft.com/en-us/cpp/c-language/parsing-c-c...

[2] https://learn.microsoft.com/en-us/windows/win32/api/shellapi...

[3] https://learn.microsoft.com/en-us/dotnet/api/system.environm...

[4] https://devblogs.microsoft.com/oldnewthing/20100917-00/?p=12...

[5] https://learn.microsoft.com/en-us/powershell/module/microsof...
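The encoding side of that workaround is simple to do from any language; a sketch in Python (PowerShell expects Base64 over the UTF-16LE bytes of the command text):

```python
import base64

# PowerShell's -EncodedCommand takes Base64 of the UTF-16LE command,
# sidestepping every layer of quoting between the caller and pwsh.
command = 'Write-Output "hi there"'
encoded = base64.b64encode(command.encode("utf-16-le")).decode("ascii")
# invoke as: powershell -EncodedCommand <encoded>

# Round-trip check:
decoded = base64.b64decode(encoded).decode("utf-16-le")
print(decoded == command)  # → True
```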



From a quick read through the source code, .NET seems to use CommandLineToArgv, so the behavior is the same. Additionally, the behavior of internal parsing in the CRT and CommandLineToArgv is very similar, and from my experience, it's easy to escape arguments in a way where both interpret them the same way (it's dumb that there's a difference between the two in the first place, but it only really comes up with manual input).

In practice, with the exception of `cmd.exe`, which is an old beast that cannot be redeemed due to backwards compatibility, there is a consistent way to round-trip argv to more-or-less all programs one encounters in the wild. It's not a guarantee and I'm sure you could find a program which does something weird, but you could find the same in the POSIX world. In both cases, we can probably agree that it's the mistake of the program that it's parsing arguments in a non-standard way.
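A sketch of that round trip: Python's subprocess.list2cmdline emits the common convention, and a simplified parser of the pre-2008 MSVC rules (deliberately ignoring edge cases such as empty "" arguments and the post-2008 in-quote "" rule) recovers the original argv:

```python
import subprocess

def msvc_split(cmdline: str) -> list:
    """Simplified parse of the common MSVC convention:
    2n backslashes before a quote -> n backslashes, quote toggles;
    2n+1 backslashes before a quote -> n backslashes + literal quote;
    backslashes elsewhere are literal."""
    args, cur, in_quotes, i = [], [], False, 0
    while i < len(cmdline):
        c = cmdline[i]
        if c == "\\":
            j = i
            while j < len(cmdline) and cmdline[j] == "\\":
                j += 1
            n = j - i
            if j < len(cmdline) and cmdline[j] == '"':
                cur.append("\\" * (n // 2))
                if n % 2:          # odd: the quote is a literal character
                    cur.append('"')
                    j += 1
            else:
                cur.append("\\" * n)
            i = j
        elif c == '"':
            in_quotes = not in_quotes
            i += 1
        elif c in " \t" and not in_quotes:
            if cur:
                args.append("".join(cur))
                cur = []
            i += 1
        else:
            cur.append(c)
            i += 1
    if cur:
        args.append("".join(cur))
    return args

argv = ["child.exe", "hi there", 'a"b', "back\\slash"]
line = subprocess.list2cmdline(argv)
print(msvc_split(line) == argv)  # → True
```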


I'm seeing a 404 error?



