
Jordan Hook

Software Developer


Tutorial
malware  dotnet core  C#  docker  containers  isolation  

If you haven't already done so, please read Part 1 to learn about how we developed the scraper used in this tutorial.

When working with dangerous files such as malware, it's important to take proper safety precautions. Accidentally opening or executing a malicious file on your system could have dire consequences, even on a virtual machine. To protect yourself while using the scraper, we are going to make a few changes to its code and to how we run it. With all of that said, please note it is still strongly recommended that you work inside a full virtual machine at the very least when handling malware, as there is always a chance you could get infected!

Part 1: Code Changes

The first code change we are going to make is to how files are named after they are downloaded. Our original code preserved the original file names and extensions based on the URL where each sample was stored. This is dangerous, depending on how you plan to access the files later, as you could accidentally execute one of the files by clicking on it.

string targetPath = this.outputDirectory + "/" + sampleParts[sampleParts.Length - 1];

To fix this issue, at least on Windows-based operating systems, we are going to rename the files. There is also another reason for renaming them: our current setup could download duplicate files, or even overwrite different samples that happen to share a file name. To ensure each file we keep is unique, we are going to hash each download and use the hash as the name of the file.

If you want to understand more about hashing and cryptography, I recommend visiting sciencedirect.com, as they have some good articles on the subject. For the purposes of this tutorial we will be implementing SHA-256. In order to do this we will need to complete the following code changes:

  • Implement a function to calculate the SHA-256 hash of an entire file
  • Update the existing code to download files using unique names (we are going to take a random-number approach)
  • Calculate the hash of each file
  • Rename each file to its unique hash

To get started, let's first write the code to generate a unique hash for each file. Luckily for us, most languages have built-in implementations of common hashing algorithms, so our code should look something like this:

using System.IO;
using System.Text;
using System.Security.Cryptography;

private string getFileHash(string filePath) {
    using (SHA256 sha = SHA256.Create()) {

        // Read file and compute hash as bytes
        byte[] shaHash = sha.ComputeHash(File.ReadAllBytes(filePath));

        // Convert hash to string
        StringBuilder sb = new StringBuilder();
        for(int i = 0; i < shaHash.Length; i++) {
            sb.Append(shaHash[i].ToString("x2"));
        }

        return sb.ToString();
    }
}
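
One thing to note about this implementation is that File.ReadAllBytes() loads the entire sample into memory before hashing it, which may be a problem for very large files. As a minimal alternative sketch (the getFileHashStreamed name is just for illustration and is not part of the project), SHA256 can consume a stream directly:

// Streamed variant: hashes the file without loading it fully into memory
private string getFileHashStreamed(string filePath) {
    using (SHA256 sha = SHA256.Create())
    using (FileStream stream = File.OpenRead(filePath)) {

        // ComputeHash reads from the stream in chunks internally
        byte[] shaHash = sha.ComputeHash(stream);

        // Convert hash to a lowercase hex string
        StringBuilder sb = new StringBuilder();
        foreach (byte b in shaHash) {
            sb.Append(b.ToString("x2"));
        }

        return sb.ToString();
    }
}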

Next, we need to ensure our samples are downloaded with unique names so that we don't accidentally overwrite them. To accomplish this, we will use random numbers to help generate temporary file names. Before you ask or mention it below: I do know there is built-in functionality for generating temporary file names. However, we are trying to develop an application that isolates itself to specific sections of the file system, so that we know where our samples are at all times, even if something goes wrong and the application crashes.

The code to implement this is added in a few different places in the scrapeSamples() function, so I am only going to show some of the new code below. For the full implementation, please visit the GitHub project linked below.

Random r = new Random();
.
.
.
string targetPath = this.outputDirectory + "/sample-" + randomNumber;

// Keep adding to filename until a unique filename is found
while(File.Exists(targetPath)) {
    targetPath += r.Next(1, 9);
}
.
.
.
string shaHash = getFileHash(targetPath);

// Check if we have a duplicate sample
if(File.Exists(this.outputDirectory + "/" + shaHash)) {
    File.Delete(targetPath);
} else {
    // Rename file
    File.Move(targetPath, this.outputDirectory + "/" + shaHash);
}
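
To make the overall flow easier to follow, here is a rough sketch of how these pieces fit together in one place. This is not the project's actual code; the saveSample() helper and its byte[] parameter are invented for illustration, and the real logic is spread throughout scrapeSamples() on GitHub:

// Illustrative only: write the download under a random temporary name,
// then rename it to its SHA-256 hash (or delete it if it's a duplicate)
private void saveSample(byte[] data) {
    Random r = new Random();
    string targetPath = this.outputDirectory + "/sample-" + r.Next();

    // Keep adding to the filename until a unique filename is found
    while (File.Exists(targetPath)) {
        targetPath += r.Next(1, 9);
    }

    File.WriteAllBytes(targetPath, data);

    string shaHash = getFileHash(targetPath);
    string finalPath = this.outputDirectory + "/" + shaHash;

    if (File.Exists(finalPath)) {
        // Duplicate sample, discard the temporary copy
        File.Delete(targetPath);
    } else {
        // Rename the file to its hash
        File.Move(targetPath, finalPath);
    }
}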

Now all we need to do is test out our code. I am going to run the application in my samples directory using 4 threads.

dotnet run /samples/ 4

And to see if the new files have been renamed to hashes we can list the files in our output directory:

ls
256d41b2aeef68c0194363352566cd87e6e3f2a8237c7de87eceb4ab12be67d4
3455b6723080f369bea2244aa5e14813e9b02108a24b5e54afb275fcd22396f1
7fc8129cb56bfc714c30ae1e622611b7bb7a17e5130cfe1cae596b9d1306d0c0
c33518e3c6aa31336694ad8aaa515e6214aef5cf8c4160af0150e7bf2321cc15
fccd46f6f077a5c57d5de06e06a21f89a9007048dcf339e844cabfa1bea804f4
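
As a quick sanity check (assuming a Linux environment with coreutils; on macOS, shasum -a 256 is the equivalent), you can verify that a file's name really does match its digest, since sha256sum should print the same value as the file name:

sha256sum 256d41b2aeef68c0194363352566cd87e6e3f2a8237c7de87eceb4ab12be67d4
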
Part 2: Isolated Environments

The whole reason for rewriting this application in .NET Core was the ability to run it in a Linux-based environment. For this part of the tutorial we are going to run the application within a Docker container to provide further isolation from the host operating system. With this being said, please note that Docker containers do not provide full virtualization the way a virtual machine via VMware or VirtualBox would. However, containerizing the scraper means we can run it from any operating system capable of running Docker containers. In other words, you could still use your virtual machine when analyzing malware, and execute the scraper, or even samples, within containers to help protect your analysis machine.

In order to run our project in a Docker container, we are going to pre-build the application and create a Dockerfile.

# dotnet build
Microsoft (R) Build Engine version 15.9.20+g88f5fadfbe for .NET Core
Copyright (C) Microsoft Corporation. All rights reserved.

  Restore completed in 53.8 ms for /Users/jhook/Desktop/dotnet/scraper/scraper.csproj.
  scraper -> /Users/jhook/Desktop/dotnet/scraper/bin/Debug/netcoreapp2.2/scraper.dll

Build succeeded.
    0 Warning(s)
    0 Error(s)

Time Elapsed 00:00:01.34
# touch Dockerfile

Please note the path of the scraper.dll file as it is important later. 
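
One caveat: the Dockerfile below copies from the publish/ directory rather than the raw build output shown above, so you will likely also need to publish the application first. Something like the following should work, assuming the default Debug configuration used above:

dotnet publish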

Now that we have our application built, we need to configure our Docker image to create a container utilizing the .NET Core runtime and our application. To do this, we will use the official Microsoft image for Docker along with the path to our application's binaries noted above.

FROM mcr.microsoft.com/dotnet/core/runtime:2.2
WORKDIR /app

COPY bin/Debug/netcoreapp2.2/publish/ app/

ENTRYPOINT ["dotnet", "app/scraper.dll"]

Our Dockerfile does the following:

  • Bases the image on the official Microsoft .NET Core runtime image
  • Sets the working directory to a directory called app in the root of the file system
  • Copies our application binaries into an app directory inside the image
  • Executes our application by calling the dotnet runtime

Now we need to build our Docker image:

docker build -t scraper .
Sending build context to Docker daemon  296.4kB
Step 1/4 : FROM mcr.microsoft.com/dotnet/core/runtime:2.2
 ---> 136d49fe5bd7
Step 2/4 : WORKDIR /app
 ---> Running in d7b5487adc8a
Removing intermediate container d7b5487adc8a
 ---> 787fd38d8b4d
Step 3/4 : COPY bin/Debug/netcoreapp2.2/publish/ app/
 ---> 96619b278ea7
Step 4/4 : ENTRYPOINT ["dotnet", "app/scraper.dll"]
 ---> Running in 567430bd3a45
Removing intermediate container 567430bd3a45
 ---> d667d4c9f89e
Successfully built d667d4c9f89e
Successfully tagged scraper:latest

Running the command above in the same directory as our Dockerfile will create a new image for us called scraper.

Now we can run our image within a container by executing the command below:

docker run --name=scraper -v /Users/jhook/Desktop/dotnet/scraper/samples:/samples scraper /samples 4

After a few moments you should begin to see new samples appearing in your samples folder, with SHA-256 hashes as names. If you want to run this command in the background, you can add the -d parameter to the run command to run the process in detached mode.
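
If you want to lock the container down further, Docker's standard hardening flags can be layered onto the same command. The combination below is just one possible sketch and is not part of the original project setup: --cap-drop ALL removes Linux capabilities the scraper doesn't need, and no-new-privileges blocks privilege escalation inside the container. Note that the network is left enabled because the scraper needs it to download samples.

docker run -d --name=scraper \
    --cap-drop ALL \
    --security-opt no-new-privileges \
    -v /Users/jhook/Desktop/dotnet/scraper/samples:/samples \
    scraper /samples 4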

Part 3: Conclusion

Running our sample scraper within a Docker container is a major milestone and will allow us to scrape for samples on any Docker-ready device. Our next steps could be automating this process to run on a schedule, or even adding additional layers of processing. For example, in another tutorial I may add a step to our pipeline that automatically performs basic analysis on our samples. If you have any suggestions or ideas for additions to this project, please comment below!

Full Source (GitHub)