Skip to main content

Never run "python" in your "Downloads" folder

 One of the wonderful things about Python is the ease with which you can start writing a script - just drop some code into a .py file, and run python my_file.py. Similarly it’s easy to get started with modularity: split my_file.py into my_app.py and my_lib.py, and you can import my_lib from my_app.py and start organizing your code into modules.

However, the details of the machinery that makes this work have some surprising, and sometimes very security-critical consequences: the more convenient it is for you to execute code from different locations, the more opportunities an attacker has to execute it as well...

Python needs a safe space to load code from

Here are three critical assumptions embedded in Python’s security model:

  1. Every entry on sys.path is assumed to be a secure location from which it is safe to execute arbitrary code.
  2. The directory where the “main script” is located is always on sys.path.
  3. When invoking python directly, the current directory is treated as the “main script” location, even when passing the -c or -m options.

If you’re running a Python application that’s been installed properly on your computer, the only location outside of your Python install or virtualenv that will be automatically added to your sys.path (by default) is the location where the main executable, or script, is installed.

For example, if you have pip installed in /usr/bin, and you run /usr/bin/pip, then only /usr/bin will be added to sys.path by this feature. Anything that can write files to that /usr/bin can already make you, or your system, run stuff, so it’s a pretty safe place. (Consider what would happen if your ls executable got replaced with something nasty.)

However, one emerging convention is to prefer calling /path/to/python -m pip in order to avoid the complexities of setting up $PATH properly, and to avoid dealing with divergent documentation of how scripts are installed on Windows (usually as .exe files these days, rather than .py files).

This is fine — as long as you trust that you’re the only one putting files into the places you can import from — including your working directory.

Your “Downloads” folder isn’t safe

As the category of attacks with the name “DLL Planting” indicates, there are many ways that browsers (and sometimes other software) can be tricked into putting files with arbitrary filenames into the Downloads folder, without user interaction.

Browsers are starting to take this class of vulnerability more seriously, and adding various mitigations to avoid allowing sites to surreptitiously drop files in your downloads folder when you visit them.1

Even with mitigations though, it will be hard to stamp this out entirely: for example, the Content-Disposition HTTP header’s filename* parameter exists entirely to allow the the site to choose the filename that it downloads to.

Composing the attack

You’ve made a habit of python -m pip to install stuff. You download a Python package from a totally trustworthy website that, for whatever reason, has a Python wheel by direct download instead of on PyPI. Maybe it’s internal, maybe it’s a pre-release; whatever. So you download totally-legit-package.whl, and then:

~$ cd Downloads
~/Downloads$ python -m pip install ./totally-legit-package.whl

This seems like a reasonable thing to do, but unbeknownst to you, two weeks ago, a completely different site you visited had some XSS JavaScript on it that downloaded a pip.py with some malware in it into your downloads folder.

Boom.

Demonstrating it

Here’s a quick demonstration of the attack:

~$ mkdir attacker_dir
~$ cd attacker_dir
~/attacker_dir$ echo 'print("lol ur pwnt")' > pip.py
~/attacker_dir$ python -m pip install requests
lol ur pwnt

PYTHONPATH surprises

Just a few paragraphs ago, I said:

If you’re running a Python application that’s been installed properly on your computer, the only location outside of your Python install or virtualenv that will be automatically added to your sys.path (by default) is the location where the main executable, or script, is installed.

So what is that parenthetical “by default” doing there? What other directories might be added?

Anything entries on your $PYTHONPATH environment variable. You wouldn’t put your current directory on $PYTHONPATH, would you?

Unfortunately, there’s one common way that you might have done so by accident.

Let’s simulate a “vulnerable” Python application:

# tool.py
try:
    import optional_extra
except ImportError:
    print("extra not found, that's fine")

Make 2 directories: install_dir and attacker_dir. Drop this in install_dir. Then, cd attacker_dir and put our sophisticated malware there, under the name used by tool.py:

# optional_extra.py
print("lol ur pwnt")

Finally, let’s run it:

~/attacker_dir$ python ../install_dir/tool.py
extra not found, that's fine

So far, so good.

But, here’s the common mistake. Most places that still recommend PYTHONPATH recommend adding things to it like so:

export PYTHONPATH="/new/useful/stuff:$PYTHONPATH";

Intuitively, this makes sense; if you’re adding project X to your $PYTHONPATH, maybe project Y had already added something, maybe not; you never want to blow it away and replace what other parts of your shell startup might have done with it, especially if you’re writing documentation that lots of different people will use.

But this idiom has a critical flaw: the first time it’s invoked, if $PYTHONPATH was previously either empty or un-set, this then includes an empty string, which resolves to the current directory. Let’s try it:

~/attacker_dir$ export PYTHONPATH="/a/perfectly/safe/place:$PYTHONPATH";
~/attacker_dir$ python ../install_dir/tool.py
lol ur pwnt

Oh no! Well, just to be safe, let’s empty out $PYTHONPATH and try it again:

~/attacker_dir$ export PYTHONPATH="";
~/attacker_dir$ python ../install_dir/tool.py
lol ur pwnt

Still not safe!

What’s happening here is that if PYTHONPATH is empty, that is not the same thing as it being unset. From within Python, this is the difference between os.environ.get("PYTHONPATH") == "" and os.environ.get("PYTHONPATH") == None.

If you want to be sure you’ve cleared $PYTHONPATH from a shell (or somewhere in a shell startup), you need to use the unset command:

~/attacker_dir$ python ../install_dir/tool.py
extra not found, that's fine

Setting PYTHONPATH used to be the most common way to set up a Python development environment; hopefully it’s mostly fallen out of favor, with virtualenvs serving this need better. If you’ve got an old shell configuration that still sets a $PYTHONPATH that you don’t need any more, this is a good opportunity to go ahead and delete it.

However, if you do need an idiom for “appending to” PYTHONPATH in a shell startup, use this technique:

export PYTHONPATH="${PYTHONPATH:+${PYTHONPATH}:}new_entry_1"
export PYTHONPATH="${PYTHONPATH:+${PYTHONPATH}:}new_entry_2"

In both bash and zsh, this results in

$ echo "${PYTHONPATH}"
new_entry_1:new_entry_2

with no extra colons or blank entries on your $PYTHONPATH variable now.

Finally: if you’re still using $PYTHONPATH, be sure to always use absolute paths!

Related risks

There are a bunch of variant unsafe behaviors related to inspecting files in your Downloads folder by doing anything interactive with Python. Other risky activities:

  • Running python ~/Downloads/anything.py (even if anything.py is itself safe) from anywhere - as it will add your downloads folder to sys.path by virtue of anything.py’s location.
  • Jupyter Notebook puts the directory that the notebook is in onto sys.path, just like Python puts the script directory there. So jupyter notebook ~/Downloads/anything.ipynb is just as dangerous as python ~/Downloads/anything.py.

Get those scripts and notebooks out of your downloads folder before you run ’em!

But cd Downloads and then doing anything interactive remains a problem too:

  • Running a python -c command that includes an import statement while in your ~/Downloads folder
  • Running python interactively and importing anything while in your ~/Downloads folder

Remember that ~/Downloads/ isn’t special; it’s just one place where unexpected files with attacker-chosen filenames might sneak in. Be on the lookout for other locations where this is true. For example, if you’re administering a server where the public can upload files, make extra sure that neither your application nor any administrator who might run python ever does cd public_uploads.

Maybe consider changing the code that handles uploads to mangle file names to put a .uploaded at the end, avoiding the risk of a .py file getting uploaded and executed accidentally.

Mitigations

If you have tools written in Python that you want to use while in your downloads folder, make a habit of preferring typing the path to the script (/path/to/venv/bin/pip) rather than the module (/path/to/venv/bin/python -m pip).

In general, just avoid ever having ~/Downloads as your current working directory, and move any software you want to use to a more appropriate location before launching it.

It’s important to understand where Python gets the code that it’s going to be executing. Giving someone the ability to execute even one line of arbitrary Python is equivalent to giving them full control over your computer!

Why I wrote this article

When writing a “tips and tricks” article like this about security, it’s very easy to imply that I, the author, am very clever for knowing this weird bunch of trivia, and the only way for you, the reader, to stay safe, is to memorize a huge pile of equally esoteric stuff and constantly be thinking about it. Indeed, a previous draft of this post inadvertently did just that. But that’s a really terrible idea and not one that I want to have any part in propagating.

So if I’m not trying to say that, then why post about it? I’ll explain.

Over many years of using Python, I’ve infrequently, but regularly, seen users confused about the locations that Python loads code from. One variety of this confusion is when people put their first program that uses Twisted into a file called twisted.py. That shadows the import of the library, breaking everything. Another manifestation of this confusion is a slow trickle of confused security reports where a researcher drops a module into a location where Python is documented to load code from — like the current directory in the scenarios described above — and then load it, thinking that this reflects an exploit because it’s executing arbitrary code.

Any confusion like this — even if the system in question is “behaving as intended”, and can’t readily be changed — is a vulnerability that an attacker can exploit.

System administrators and developers are high-value targets in the world of cybercrime. If you hack a user, you get that user’s data; but if you hack an admin or a dev, and you do it right, you could get access to thousands of users whose systems are under the administrator’s control or even millions of users who use the developers’ software.

Therefore, while “just be more careful all the time” is not a sustainable recipe for safety, to some extent, those of us acting on our users’ behalf do have a greater obligation to be more careful. At least, we should be informed about the behavior of our tools. Developer tools, like Python, are inevitably power tools which may require more care and precision than the average application.

Nothing I’ve described above is a “bug” or an “exploit”, exactly; I don’t think that the developers of Python or Jupyter have done anything wrong; the system works the way it’s designed and the way it’s designed makes sense. I personally do not have any great ideas for how things could be changed without removing a ton of power from Python.

One of my favorite safety inventions is the SawStop. Nothing was wrong with the way table saws worked before its invention; they were extremely dangerous tools that performed an important industrial function. A lot of very useful and important things were made with table saws. Yet, it was also true that table saws were responsible for a disproportionate share of wood-shop accidents, and, in particular, lost fingers. Despite plenty of care taken by experienced and safety-conscious carpenters, the SawStop still saves many fingers every year.

So by highlighting this potential danger I also hope to provoke some thinking among some enterprising security engineers out there. What might be the SawStop of arbitrary code execution for interactive interpreters? What invention might be able to prevent some of the scenarios I describe below without significantly diminishing the power of tools like Python?

Stay safe out there, friends.

Comments

Popular posts from this blog

Capture and compare stdout in python unit tests

A recent fan of TDD, I set out to write tests for whatever comes my way. And there was one feature where the code would print messages to the console. Now - I had tests written for the API but I could not get my head around ways to capture these messages in my unittests. After some searching and some stroke of genius, here's how I accomplished capturing stdout.

On working remote

The last company I worked for, did have an office space, but the code was all on Github, infra on AWS, we tracked issues over Asana and more or less each person had at least one project they could call "their own" (I had a bunch of them ;-)). This worked pretty well. And it gave me a feeling that working remote would not be very different from this. So when we started working on our own startup, we started with working from our homes. It looked great at first. I could now spend more time with Mom and could work at leisure. However, it is not as good as it looks like. At times it just feels you are busy without business, that you had been working, yet didn't achieve much. If you are evaluating working from home and are not sure of how to start, or you already do (then please review and add your views in comments) and feel like you were better off in the office, do read on. Remote work is great. But a physical office is better. So if you can, find yourself a co-working s

The economics of crypto investing

If you believe in the greater fool theory, there is no other market as speculative and volatile as the crypto market today. We are perhaps living in the biggest bubble of our times. I am not bullish on this market in particular. I am bullish on the mania. 90% of the cryptos we see today will crash. They are just tokens with no tangible value generation capability. However, I believe that the mania and euphoria will stay. Having said that, should one consider investing in this market? Certainly! The risk/reward is lovely, potential upsides and margins are huge and with 3-5% of your net worth, the bet on the mania is worth it. How does one choose where to invest? If you follow the stock markets, you are expected to do thorough Fundamental Analysis before investing. Expect the same for the crypto market. I invest in large caps. I invest in index funds. And I invest over and over again. Markets rise, always. Extrapolating the same strategy - invest in indices - the top 10 tokens by