
Contents

  • Things to remember
  • Anti-virus or next generation anti-virus
  • Endpoint detection and response
  • File static analysis
  • File emulation
  • Machine learning
    • How machine learning fits into the picture
    • Machine learning on the cloud vs endpoint
  • Runtime behavior analysis
    • DLL hooking
    • Kernel callbacks
    • ETW and ETW TI
  • Putting things into perspective
  • The idea of evasion
  • After thoughts

Things to remember

This article goes through the common mechanisms used by modern AV/EDR products to monitor and detect malicious activity. Although I have tried to include the necessary information, all these topics are vast and require further study to fully get to grips with, so I encourage you to search and learn more about each of the listed topics.

Anti-virus or next generation anti-virus

Anti-viruses are the most common products present on an endpoint to detect and prevent malicious activity. When we talk about anti-virus, most people immediately write it off, thinking it doesn't detect anything because it is signature based. If that were the case, there wouldn't be an anti-virus industry. A long time ago anti-viruses were signature based, but as malware grew more sophisticated, so did the mechanisms anti-viruses employ to detect it.

Modern anti-viruses employ sophisticated techniques like memory scanning, behavioral detection, cloud-enabled machine learning based detection and more. Below is a picture from ESET showing the kinds of techniques they employ to detect malware.

image_eset_detection_mechanisms

Terms like DNA protection and advanced machine learning would not be used if an anti-virus were purely signature based. Signature scanning is still part of the detection mechanism, but it is not the only tool.

Endpoint detection and response

An EDR (Endpoint Detection and Response) differs from an anti-virus in that it provides real-time threat intelligence and an event feed from the system. So even if the file is allowed to execute, all the events generated by the malicious process are still logged. This allows custom rules to be created that can prevent certain malicious activities. The generated events can be monitored in real time, and any suspicious activity can be treated as a possible intrusion. An EDR may or may not come with an anti-virus, i.e. it can consist of just an event monitoring component or also include detection and prevention components.
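To make "custom rules" a little more concrete, here is a minimal Python sketch of one such rule evaluated over process-creation events. The event fields, the rule itself (an Office application spawning a script interpreter) and all function names are hypothetical and only for illustration; real EDR products have their own event schemas and rule languages.

```python
from dataclasses import dataclass
from typing import Iterable, List


# Hypothetical, simplified process-creation event (real EDR telemetry has many more fields).
@dataclass
class ProcessEvent:
    pid: int
    parent_image: str
    image: str
    command_line: str


# Hypothetical rule: Office applications rarely need to spawn script interpreters.
OFFICE_PARENTS = {"winword.exe", "excel.exe", "powerpnt.exe"}
SUSPICIOUS_CHILDREN = {"powershell.exe", "cmd.exe", "wscript.exe", "mshta.exe"}


def basename(path: str) -> str:
    """Return the lower-cased file name from a Windows-style path."""
    return path.replace("/", "\\").rsplit("\\", 1)[-1].lower()


def office_spawns_interpreter(event: ProcessEvent) -> bool:
    return (basename(event.parent_image) in OFFICE_PARENTS
            and basename(event.image) in SUSPICIOUS_CHILDREN)


def evaluate(events: Iterable[ProcessEvent]) -> List[ProcessEvent]:
    """Return the events that match the rule, in arrival order."""
    return [e for e in events if office_spawns_interpreter(e)]


if __name__ == "__main__":
    events = [
        ProcessEvent(1200, r"C:\Program Files\Microsoft Office\root\Office16\WINWORD.EXE",
                     r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe",
                     "powershell.exe -nop"),
        ProcessEvent(1300, r"C:\Windows\explorer.exe",
                     r"C:\Windows\System32\notepad.exe", "notepad.exe"),
    ]
    for hit in evaluate(events):
        print("ALERT: suspicious child process, pid", hit.pid)
```

Running it prints a single alert, for pid 1200; the notepad.exe launch from explorer.exe does not match the rule.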

File static analysis

File static analysis is performed as soon as you drop a file on disk. This is where the AV/EDR's file system minifilter driver is notified of the file operation and the product starts its analysis.

In file static analysis, characteristics of the file such as the number of imports, imported APIs, checksum, certificate and file metadata are extracted. The number of characteristics extracted can be very large. These characteristics can later be fed into a machine learning model to decide whether a file is malicious. File static analysis can also involve looking for signatures of shellcode/payloads in the file, but that depends on the specific product, since signature scanning can also be performed during the file emulation phase, once the file has unpacked itself.
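As a rough illustration of feature extraction, the sketch below reads a few fields from a PE file's headers and computes the byte entropy of the whole file (high entropy often hints at packed or encrypted content). It is a minimal sketch that assumes a well-formed PE file; the feature names are made up, and it skips import-table parsing, for which a library such as pefile is more practical.

```python
import math
import struct
import sys
from collections import Counter

IMAGE_FILE_DLL = 0x2000  # flag in IMAGE_FILE_HEADER.Characteristics


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte (0.0 to 8.0); packed or encrypted data tends toward 8."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def static_features(data: bytes) -> dict:
    """Extract a few header-level features from the raw bytes of a PE file."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("missing MZ (DOS) header")
    # e_lfanew, at offset 0x3C of the DOS header, is the file offset of "PE\0\0".
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # IMAGE_FILE_HEADER (20 bytes) immediately follows the signature.
    (machine, num_sections, timestamp, _symtab_ptr, _num_symbols,
     opt_header_size, characteristics) = struct.unpack_from("<HHIIIHH", data, e_lfanew + 4)
    return {
        "machine": machine,                  # e.g. 0x8664 for x64
        "num_sections": num_sections,
        "timestamp": timestamp,
        "optional_header_size": opt_header_size,
        "is_dll": bool(characteristics & IMAGE_FILE_DLL),
        "file_size": len(data),
        "entropy": round(shannon_entropy(data), 3),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python features.py <path-to-pe-file>
    with open(sys.argv[1], "rb") as f:
        print(static_features(f.read()))
```

A dictionary like this is the kind of input that later gets turned into a numeric feature vector for a model.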

File emulation

File emulation may or may not be present (on the endpoint) in an AV/EDR product, depending on how its detection engine works. If it is present, file emulation is performed after static analysis.

Emulation makes it possible to check the behavior of a file without letting the file make changes to the system. File emulation can check for network activity, API usage and memory allocation, and can also look for malicious code inside the process space. The data retrieved from file emulation can also be fed into a machine learning model to decide whether the file is malicious. Detection can be based on the combined results of file static analysis and file emulation.
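The point that signatures may only become visible once a file has unpacked itself can be shown with a toy example. Here a hypothetical byte signature is hidden by single-byte XOR "packing", so a scan of the on-disk bytes misses it. As a simplifying assumption, the decode step is just run directly in Python; a real emulator would instead execute the file's own unpacking code on a virtual CPU.

```python
# Hypothetical byte signature; real engines use much richer signature formats.
SIGNATURE = bytes.fromhex("deadbeef41414141")
XOR_KEY = 0x5A  # hypothetical single-byte key used by the toy "packer"


def xor_bytes(data: bytes, key: int) -> bytes:
    """Single-byte XOR; applying it twice with the same key restores the input."""
    return bytes(b ^ key for b in data)


def signature_match(data: bytes, signature: bytes = SIGNATURE) -> bool:
    return signature in data


if __name__ == "__main__":
    payload = b"\x90" * 16 + SIGNATURE + b"\xcc" * 8
    on_disk = xor_bytes(payload, XOR_KEY)    # what static analysis sees
    # Stand-in for emulation: run the decode loop so the real bytes appear.
    in_memory = xor_bytes(on_disk, XOR_KEY)
    print("static scan:", signature_match(on_disk))         # False
    print("after unpacking:", signature_match(in_memory))   # True
```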

Machine learning

I have talked about file static analysis and file emulation, but to fully understand what happens in these two phases, it is important to get an idea of what machine learning is and how it is used.

How machine learning fits into the picture

Machine learning is a process in which mathematical algorithms are used to make predictions, find patterns or make informed decisions based on data, which is why terms like predictive analytics, pattern recognition and statistical analysis are often used in connection with machine learning. The kind of machine learning algorithm employed depends on the problem that needs to be solved. Machine learning approaches are commonly divided into supervised, unsupervised and reinforcement learning. Classifying a file as malicious or benign is a supervised learning problem. In supervised learning, the data used to train the model is labeled: for each input in the dataset there is a corresponding output label. The algorithm learns to map the input (features collected from the PE file during static and dynamic analysis) to the output (malicious or benign).

Once the model is trained, it can use this mapping to predict the output for new, unseen data (newer files). The number of features used to train the model can range from hundreds to thousands or more. Below is a diagram that gives an overview of how this works.

machine_learning_in_malware_detection
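A minimal supervised-learning sketch is a nearest-centroid classifier trained on labeled feature vectors: it learns the average feature vector of each label and assigns a new file to the closest one. The three features (entropy, number of imported APIs, whether the file is signed), the toy training values and the class name are all hypothetical; production models use far more features and more capable algorithms.

```python
from statistics import mean

# Hypothetical features per file: (entropy, number of imported APIs, is_signed 0/1).
# Labels: 1 = malicious, 0 = benign. Toy values for illustration only.
TRAIN_X = [
    (7.8, 4, 0), (7.5, 6, 0), (7.9, 3, 0),        # packed-looking, few imports, unsigned
    (5.1, 120, 1), (4.8, 95, 1), (5.6, 150, 1),    # ordinary-looking, signed
]
TRAIN_Y = [1, 1, 1, 0, 0, 0]


class NearestCentroid:
    """Supervised classifier: predict the label whose mean (centroid) vector is closest."""

    def fit(self, xs, ys):
        # Min-max scaling, learned from the training data, so that large-valued
        # features such as import counts do not dominate the distance.
        columns = list(zip(*xs))
        self.lo = [min(c) for c in columns]
        self.hi = [max(c) for c in columns]
        scaled = [self._scale(x) for x in xs]
        self.centroids = {}
        for label in set(ys):
            rows = [s for s, y in zip(scaled, ys) if y == label]
            self.centroids[label] = [mean(col) for col in zip(*rows)]
        return self

    def _scale(self, x):
        return [(v - lo) / (hi - lo) if hi > lo else 0.0
                for v, lo, hi in zip(x, self.lo, self.hi)]

    def predict(self, x):
        s = self._scale(x)

        def sq_dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(s, centroid))

        return min(self.centroids, key=lambda label: sq_dist(self.centroids[label]))


model = NearestCentroid().fit(TRAIN_X, TRAIN_Y)

if __name__ == "__main__":
    print(model.predict((7.7, 5, 0)))    # unseen file resembling the malicious samples
    print(model.predict((5.0, 110, 1)))  # unseen file resembling the benign samples
```

Running it prints 1 and then 0: the labeled training data is what lets the model map new, unseen feature vectors to a verdict.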

Machine learning on the cloud vs endpoint

Because of computational limitations like memory and CPU power, large machine learning models cannot be deployed on the endpoint, which leads to a division between cloud-based and endpoint-based machine learning detections. Most AV/EDR products have an ML model deployed in the cloud that is used to analyze a file and decide whether it is malicious, but since this can take minutes, many AV/EDR products nowadays also deploy a small yet powerful machine learning model on the endpoint. In many cases, if the local ML model cannot decide whether the file is malicious, the product may send the file to the cloud.

Although the cloud can be used to process a file and make a decision, it can also be used to process information collected at the endpoint and make a decision alongside the endpoint ML model. So the cloud will be used in some way whenever internet connectivity is available. If there is no connectivity, decisions are made based on the endpoint ML model.
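The endpoint/cloud split described above can be sketched as a small decision function. The thresholds, the `cloud_score` callable and the verdict strings are hypothetical placeholders, not any vendor's logic; the sketch only shows the idea that a confident local verdict is used directly, an uncertain one is escalated to the cloud when online, and the local model has the final say when offline.

```python
from typing import Callable, Optional

# Hypothetical thresholds on the endpoint model's score (probability of "malicious").
BLOCK_IF_AT_LEAST = 0.9
ALLOW_IF_AT_MOST = 0.2


def decide(local_score: float, online: bool,
           cloud_score: Optional[Callable[[], float]] = None) -> str:
    """Return "block" or "allow"; cloud_score is a hypothetical call to the cloud model."""
    if local_score >= BLOCK_IF_AT_LEAST:
        return "block"          # endpoint model is confident: no round trip needed
    if local_score <= ALLOW_IF_AT_MOST:
        return "allow"
    if online and cloud_score is not None:
        # Uncertain locally: let the larger cloud model decide.
        return "block" if cloud_score() >= 0.5 else "allow"
    # Offline (or no cloud available): the endpoint model has the final say.
    return "block" if local_score >= 0.5 else "allow"
```

For example, `decide(0.6, online=True, cloud_score=lambda: 0.1)` defers to the cloud and returns "allow", while the same local score offline returns "block".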

Runtime behavior analysis

Runtime behavior analysis monitors the activity of a process while it runs. If the file manages to get past file static analysis and file emulation, runtime behavior analysis is another hurdle that remains in place until the process stops executing (or is stopped by the AV/EDR).

For runtime behavior analysis, the mechanisms to keep in mind are:

  • DLL hooking
  • Kernel callbacks
  • ETW and ETW TI

DLL hooking

DLL hooking is a technique in which the AV/EDR product hooks specific APIs in the DLLs loaded into a process's address space. Below is a diagram that shows what happens when a process calls the VirtualAlloc API.

image_api_execution_sequence

When we call VirtualAlloc(), execution goes to kernel32.dll, which contains a jump to the actual VirtualAlloc() implementation inside kernelbase.dll. VirtualAlloc() performs some operations and then calls NtAllocateVirtualMemory inside ntdll.dll. ntdll contains a small stub that moves the 4-byte SSN (system service number) into the eax register (on x64 it first copies rcx into r10). Finally, the syscall instruction is executed, which transfers control to kernel mode, where the kernel handles the requested operation.

Note: Each system call stub in ntdll has its own SSN, and SSN values can change between Windows builds. The SSN is what directs execution in the kernel to the correct system service routine, the one that actually does what we asked for from user mode, like allocating memory or writing to memory.
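To make the stub layout concrete, the sketch below decodes an example x64 syscall stub from raw bytes and reads the SSN out of the `mov eax, imm32` instruction. The bytes follow the layout of ntdll stubs on recent 64-bit Windows builds, but the SSN value 0x18 in the example is only illustrative; real SSNs differ per function and can change between builds. Reading the bytes from the ntdll.dll loaded in a live process is out of scope here, and the helper names are hypothetical.

```python
import struct

# Example bytes laid out like an x64 ntdll syscall stub on recent Windows builds.
# The SSN (0x18) is illustrative only; real values differ per function and build.
EXAMPLE_STUB = bytes.fromhex(
    "4c8bd1"            # mov r10, rcx   (syscall clobbers rcx, so arg 1 moves to r10)
    "b818000000"        # mov eax, 0x18  (SSN as a 32-bit immediate)
    "f604250803fe7f01"  # test byte ptr [0x7FFE0308], 1
    "7503"              # jne +3         (taken: skip to the int 0x2E path)
    "0f05"              # syscall
    "c3"                # ret
    "cd2e"              # int 0x2E
    "c3"                # ret
)

STUB_PREFIX = bytes.fromhex("4c8bd1b8")  # mov r10, rcx ; opcode of mov eax, imm32


def extract_ssn(stub: bytes) -> int:
    """Read the SSN from the 'mov eax, imm32' of an unmodified stub."""
    if not stub.startswith(STUB_PREFIX):
        raise ValueError("unexpected stub prologue (modified or different layout)")
    (ssn,) = struct.unpack_from("<I", stub, len(STUB_PREFIX))
    return ssn


def contains_syscall(stub: bytes, window: int = 32) -> bool:
    """True if the 'syscall' opcode (0F 05) appears in the first `window` bytes."""
    return b"\x0f\x05" in stub[:window]


if __name__ == "__main__":
    print(hex(extract_ssn(EXAMPLE_STUB)))  # 0x18
```

Note that the parse only works on an unmodified stub: once the first bytes are overwritten, the expected prologue is gone.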

An AV/EDR product can perform inline hooking at various levels (inside kernel32.dll, kernelbase.dll or ntdll.dll), as well as on APIs in other DLLs that malware might use for its operations. Typically, though, APIs inside ntdll.dll are hooked, since ntdll is the last stage before execution transfers to kernel mode. The diagram below shows the hooked version of NtAllocateVirtualMemory.

image_api_hooking

Note: The hook can be placed at the start of the function or after a certain number of instructions, so this needs to be taken into account when checking for hooks.
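A naive hook check along those lines can be sketched as follows: compare the start of the stub against the expected `mov r10, rcx; mov eax, ...` prologue, and scan the first few bytes for a `jmp rel32` (opcode 0xE9) to compute where a hook would redirect execution. This is a sketch under simplifying assumptions: it only recognizes `jmp rel32` hooks (not, for example, `mov rax, imm64; jmp rax`), the byte scan can misfire on an 0xE9 byte that belongs to another instruction's operand (a disassembler avoids this), and the function names and example address are hypothetical.

```python
import struct

CLEAN_PROLOGUE = bytes.fromhex("4c8bd1b8")  # mov r10, rcx ; mov eax, imm32


def prologue_modified(stub: bytes) -> bool:
    """True if the function no longer starts with the expected syscall-stub prologue."""
    return not stub.startswith(CLEAN_PROLOGUE)


def find_jmp_rel32(stub: bytes, stub_address: int, window: int = 16):
    """
    Scan the first `window` bytes for a 'jmp rel32' (opcode E9) and return its
    absolute target, or None. Naive byte scan: an E9 byte inside another
    instruction's operand would be a false positive.
    """
    for i in range(min(window, len(stub) - 4)):
        if stub[i] == 0xE9:
            (rel,) = struct.unpack_from("<i", stub, i + 1)
            # rel32 is relative to the address of the next instruction (jmp is 5 bytes).
            return stub_address + i + 5 + rel
    return None


if __name__ == "__main__":
    base = 0x7FF800001000  # hypothetical address of the stub in memory
    # Hook placed after the first instruction: mov r10, rcx ; jmp <handler>
    hooked = bytes.fromhex("4c8bd1") + b"\xe9" + struct.pack("<i", 0x2000) + b"\x90" * 8
    print(prologue_modified(hooked), hex(find_jmp_rel32(hooked, base)))
```

Because the jump here sits after `mov r10, rcx`, checking only the very first byte would not be enough, which is why the sketch scans a small window.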

Kernel callbacks

Kernel callbacks are a mechanism that the Windows operating system provides to kernel drivers so they can be notified about certain events. A driver registers a function using an API provided by the kernel in order to be notified about an event. When the event occurs (e.g. process creation, handle duplication, thread creation), the driver that registered a callback for it is notified and can perform whatever operations it needs. Callbacks can be registered for different types of events, as listed in the table below:

Type | Operation | API
Process notification callback | Called when a process is created or exits | PsSetCreateProcessNotifyRoutine(), PsSetCreateProcessNotifyRoutineEx()
Thread notification callback | Called when a thread is created or exits | PsSetCreateThreadNotifyRoutine(), PsSetCreateThreadNotifyRoutineEx()
Registry notification callback | Called for registry operations | CmRegisterCallbackEx(), CmRegisterCallback()
Handle (object) notification callback | Called for process, thread and desktop handle operations (creation and duplication) | ObRegisterCallbacks()
Image notification callback | Called when an image is loaded or mapped into memory | PsSetLoadImageNotifyRoutine(), PsSetLoadImageNotifyRoutineEx()

So whenever a driver registers a callback with the kernel for one of the events above, the driver is notified when that event occurs. AV/EDR drivers can use this functionality to inject their DLLs into processes, track DLL loading and track registry activity. Take the example of process injection, in which malware uses VirtualAllocEx, WriteProcessMemory and CreateRemoteThread to inject into a process. CreateRemoteThread will notify the AV/EDR driver of the thread creation if kernel callbacks are being leveraged, and WriteProcessMemory will notify the AV/EDR of the memory write if ETW is being leveraged. Based on these notifications, the AV/EDR can make its decision and can also perform memory scanning if it believes the activity is malicious.
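The injection example can be expressed as a simple correlation rule: the same source process allocates memory in, writes to, and creates a thread in the same target process within a short time window. In the sketch below the event kinds are hypothetical normalized names; in a real product the thread event might come from a thread-creation callback and the allocation and write events from ETW TI (keywords such as ALLOCVM_REMOTE and WRITEVM_REMOTE). The 5-second window and all names are arbitrary assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


# Hypothetical normalized event; field names are illustrative, not a real EDR schema.
@dataclass
class Event:
    time: float       # seconds since some reference point
    source_pid: int   # process performing the operation
    target_pid: int   # process being operated on
    kind: str         # "alloc_remote", "write_remote" or "thread_remote"


INJECTION_SEQUENCE = ["alloc_remote", "write_remote", "thread_remote"]


def detect_injection(events: List[Event], window: float = 5.0) -> List[Tuple[int, int]]:
    """Return (source_pid, target_pid) pairs that perform the full sequence, in order,
    within `window` seconds of the first step."""
    progress: Dict[Tuple[int, int], Tuple[int, float]] = defaultdict(lambda: (0, 0.0))
    alerts = []
    for e in sorted(events, key=lambda ev: ev.time):
        if e.source_pid == e.target_pid:
            continue  # only cross-process activity matters for this rule
        pair = (e.source_pid, e.target_pid)
        step, start = progress[pair]
        if step and e.time - start > window:
            step, start = 0, 0.0  # sequence too slow: start over
        if e.kind == INJECTION_SEQUENCE[step]:
            if step == 0:
                start = e.time
            step += 1
            if step == len(INJECTION_SEQUENCE):
                alerts.append(pair)
                step, start = 0, 0.0
        progress[pair] = (step, start)
    return alerts


if __name__ == "__main__":
    events = [
        Event(0.0, 4242, 880, "alloc_remote"),
        Event(0.1, 4242, 880, "write_remote"),
        Event(0.2, 4242, 880, "thread_remote"),
    ]
    print(detect_injection(events))  # [(4242, 880)]
```

Each event on its own can be benign; it is the ordered combination against the same target that makes the activity look like injection.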

ETW and ETW TI

ETW (Event Tracing for Windows) is a mechanism implemented by Windows to log events on the system. The image below shows the architecture of ETW.

image_etw_architecture

ETW has three components: controllers, providers and consumers. A controller manages the overall tracing process: it starts and stops ETW sessions and enables or disables providers in a session. Providers are applications, or components inside applications, that generate the events. Consumers are applications or components that receive the events generated by providers and process them. An ETW session can deliver event data live to a consumer or log it for later processing and analysis.

logman is a controller available on Windows that can be used to create trace sessions and to start or stop them. It can also list the available providers. Below are some of the providers listed after running the command logman query providers in a terminal.

image_logman_query_providers

As we can see, Microsoft-Windows-Threat-Intelligence is one of the providers; it is what ETW TI (Threat Intelligence) refers to. It provides many useful events that can be monitored to understand malicious activity. The events Microsoft-Windows-Threat-Intelligence can generate can be seen in the image below.

image_threat_intelligence_events

The event keywords represent the specific types of event the provider can emit. For example, WRITEVM_REMOTE is generated when someone writes to the memory of a different process (using WriteProcessMemory/NtWriteVirtualMemory). It does not matter whether a direct or an indirect syscall is used to write to remote process memory: the event is generated at the kernel level, so ETW TI can still report it.

So imagine you use an indirect syscall to write your shellcode into another process. If the AV/EDR is consuming ETW TI, it can track this activity, and you now have two problems: 1. you are writing to remote memory, which can be inferred as malicious; 2. ETW TI can provide stack data that can be unwound to reveal that you are using an indirect syscall, which is suspicious in itself.
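The stack-based reasoning can be illustrated with a simplified heuristic. It assumes the user-mode return addresses have already been resolved to a backing module (or `None` for private, unbacked memory such as a shellcode allocation) and to a nearest exported symbol; the `Frame` type, the function name and the verdict strings are hypothetical, and real products use far more robust unwinding and checks. The idea: a syscall that returns outside ntdll points to a direct syscall, one that returns into a different Nt* stub than the system call actually made points to an indirect syscall, and an ntdll call whose caller lives in unbacked memory is suspicious either way.

```python
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    module: Optional[str]  # backing module of the return address; None = unbacked memory
    symbol: Optional[str]  # nearest exported function, if it could be resolved


def classify_syscall_origin(syscall_name: str, frames: List[Frame]) -> str:
    """
    Simplified heuristic over the user-mode frames of a stack captured for a system
    call, innermost first (frames[0] is where execution resumes after the syscall).
    """
    if not frames:
        return "unknown"
    first = frames[0]
    if (first.module or "").lower() != "ntdll.dll":
        return "direct syscall"  # the syscall instruction ran outside ntdll
    if first.symbol != syscall_name:
        # Execution resumes inside a different Nt* stub than the call that was made,
        # e.g. code jumped to another stub's 'syscall' instruction.
        return "indirect syscall (stub mismatch)"
    if len(frames) < 2:
        return "unknown"
    if frames[1].module is None:
        return "suspicious caller (unbacked memory)"  # ntdll reached from e.g. shellcode
    return "no anomaly"
```

For a normal WriteProcessMemory call, the frames would look like ntdll!NtWriteVirtualMemory followed by KERNELBASE!WriteProcessMemory, which the sketch reports as "no anomaly".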

Similarly, ALLOCVM_LOCAL refers to memory allocation in the current process and ALLOCVM_REMOTE to memory allocation in a remote process. ETW TI is a powerful way of tracking malicious activity like process injection, process memory manipulation and other kinds of attacks.

Putting things into perspective

image_file_life_cycle_detection

The idea of evasion

When we talk about evasion, one thing to keep in mind is that evasion is not a single technique, like DLL unhooking or memory patching, that we can implement once so that no one can detect our payload. AV/EDR products implement layers of detection mechanisms to track the activity of a file from the moment it exists on disk to its execution in memory. So the evasion functionality we implement should correspond to the detection mechanisms we are dealing with. If we know which AV/EDR we are up against, evasion directed at that particular product can be implemented; otherwise a general approach needs to be taken. For example, delayed execution, indirect system calls for certain APIs and making the PE file look as legitimate as possible are some general evasion techniques that need to be implemented if we want our payload to get past detections. The protection mechanisms are complicated, given the complexity of attacks. But one thing to remember is that an AV/EDR can only employ functionality provided by the operating system, and every detection mechanism has its limitations.

After thoughts

After looking through all the information above, it should be pretty clear why most people making videos about malware development turn off anti-virus runtime detection or cloud protection before executing their payload. Finally, below is an image from CrowdStrike's blog, which should make it clear why malware development for offensive security is difficult and why the complexity is only going to increase.

image_crowdstrike