
Communications of the ACM

ACM News

Predicting Cyberattacks



A growing number of researchers are working to predict cyberattacks, hoping that once identified, they can be blocked.

Credit: Getty

As computer attacks become more advanced and widespread, better ways of intercepting them are needed. In the midst of the coronavirus pandemic, for example, malicious activity has soared, as hackers exploit the crisis by targeting housebound employees whose computer systems are often less secure than company networks.

More sophisticated ransomware attacks, which block users' access to their own files and demand a ransom to unlock them, struck hospitals and health services, which may be more likely to pay up due to stretched resources. The website of the public health department in Illinois, for example, was taken offline by such an attack, while thousands of patient records were locked at a London, U.K.-based company that conducts trials of new medicines.

Security software typically aims to detect malicious activity once it has happened, by which point the damage has often already been done. That's why a growing number of researchers are now focusing on predicting attacks instead. "It would be great, because it means that we can be more proactive in blocking [an attack] rather than reactively fixing the problems later," says Pete Burnap, professor of data science and security at Cardiff University in the U.K.

Existing anti-virus software typically uses a database containing known malware code signatures—identifiers representing code unique to a virus—to detect new malicious activity. That database needs to be updated as new malware is produced. Furthermore, hackers can change features of the code they use to make it undetectable. "It's becoming more and more difficult to have code signatures as the only way to identify certain types of malware, because the evasive technology available to malware authors is continually increasing," says Burnap.
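In its simplest form, signature matching amounts to a lookup against a database of known-bad identifiers. The sketch below is a minimal illustration, with a whole-file hash standing in for the pattern matching a real anti-virus engine performs; the placeholder signature is invented. It also shows why evasion is easy: altering a single byte of the malware changes its hash and defeats the lookup.

```python
# Minimal illustration of signature-based detection: hash a file and
# check it against known-malware signatures. A whole-file SHA-256 stands
# in for real signature matching; the entry below is a placeholder.
import hashlib

KNOWN_MALWARE_SHA256 = {
    "9f2a...placeholder...",  # hypothetical signature of a known sample
}

def is_known_malware(path: str) -> bool:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    # Any change to the file, however small, changes this hash,
    # which is exactly the evasion problem Burnap describes.
    return digest.hexdigest() in KNOWN_MALWARE_SHA256
```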

Rather than looking for code signatures, Burnap and his colleagues developed a system that analyzes behavior. "When a piece of software runs on a computer system, it uses the CPU, it uses memory, it sends data in and out of the network, it starts processes and so on," says Burnap. "That is much more difficult to obfuscate and hide."
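To make the idea concrete, a behavioral monitor might sample such signals at fixed intervals to build a time series for each run. The sketch below uses Python's psutil library and machine-wide counters purely as an assumption; a real sandbox would instrument individual processes.

```python
# A rough sketch of behavioral telemetry: sample CPU, memory, network,
# and process counts once per second to form a feature sequence.
import time
import psutil

def sample_behavior(seconds: int, interval: float = 1.0) -> list:
    samples = []
    last_net = psutil.net_io_counters()
    for _ in range(seconds):
        time.sleep(interval)
        net = psutil.net_io_counters()
        samples.append({
            "cpu_percent": psutil.cpu_percent(),
            "memory_percent": psutil.virtual_memory().percent,
            "bytes_sent": net.bytes_sent - last_net.bytes_sent,
            "bytes_recv": net.bytes_recv - last_net.bytes_recv,
            "num_processes": len(psutil.pids()),
        })
        last_net = net
    return samples
```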

Using deep learning recurrent neural networks (RNNs), Burnap and his team captured behavioral features that distinguish malware from cleanware by training their system on 2,285 samples of malware obtained from online collections and 2,345 examples of clean software from trusted sources. They also modeled sequences of behavior that correspond with malicious activity, choosing RNNs because these networks have a form of memory that allows them to take previous steps of a sequence into account when making a decision about the next step.
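As a rough illustration of the architecture, not the team's actual model, an LSTM (a common RNN variant) can read a sequence of per-second behavioral feature vectors and emit a malware probability at every time step. The feature count and layer sizes below are invented, and PyTorch is an assumed choice of framework.

```python
# Sketch of a recurrent malware-vs-cleanware classifier over
# behavioral feature sequences, with a verdict at each time step.
import torch
import torch.nn as nn

class BehaviorRNN(nn.Module):
    def __init__(self, n_features: int = 5, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, n_features); returns (batch, time) malware
        # probabilities, one per time step, so a verdict is available
        # early in a program's execution.
        out, _ = self.lstm(x)
        return torch.sigmoid(self.head(out)).squeeze(-1)

model = BehaviorRNN()
probs = model(torch.randn(8, 20, 5))  # 8 runs, 20 seconds of features
```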

When the system was tested on 206 trusted and 316 malicious samples it had never encountered, the researchers found malicious activity could be predicted with 94% accuracy within the first five seconds of execution. In another test focusing solely on ransomware, using 3,000 samples, the system detected an attack with the same accuracy just one second after it started. Since this model had never encountered ransomware before, it was likely picking up on malicious delivery methods rather than ransomware-specific activity. "At the time of writing, it was the first paper that we'd seen that used this sort of early-stage prediction method," says Burnap.
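The per-time-step output is what makes such early verdicts possible: a monitor can commit as soon as its confidence clears a threshold rather than waiting for a full trace. The decision rule below is a hypothetical sketch, and the threshold value is an assumption.

```python
# Hypothetical early-stage decision rule: stop at the first time step
# where the model's malware probability is confidently high or low.
def early_verdict(probs, threshold: float = 0.95):
    for t, p in enumerate(probs, start=1):
        if p >= threshold:
            return "malicious", t       # verdict after t seconds
        if 1 - p >= threshold:
            return "benign", t
    return "undecided", len(probs)
```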

The team is aiming for the system to be used on laptops and PCs as an alternative to current anti-virus software. However, they aren't sure if the behavioral concept will still hold as operating systems, software, and malware continue to evolve. "What we're trying to do [now] is understand how we can detect that the model that we've got for behavior monitoring is not actually working as well as it should anymore, then understand why that might be," says Burnap.

Another team has been using RNNs to predict the exact next step of a cyberattack. Gianluca Stringhini, co-director of the Security Lab at Boston University, has been working with colleagues on complex multi-step attacks that often target large corporations. An attacker, for example, may first perform a port scan to identify vulnerable systems on a network, then try to break into a Web server and escalate the attack to other parts of the system.

The Boston University team trained their system on alerts from an existing intrusion detection system that can detect the individual steps of an attack. Using 4,495 individual security breach events, they trained their RNN model to recognize sequences of events, so that after observing some steps in an attack, it could predict which action would come next. "It's an extremely difficult task," says Stringhini. "It's not just binary classification in which we have a 50% probability of being right; we are actually predicting one step among over 4,000."
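One plausible shape for such a model, assuming the 4,495 events correspond to distinct alert types (matching Stringhini's "one step among over 4,000"), is an LSTM over event IDs with a softmax across all types. The sketch below is illustrative only; the embedding and hidden sizes are invented.

```python
# Sketch of next-step prediction over intrusion-alert event IDs.
import torch
import torch.nn as nn

N_EVENTS = 4495  # distinct alert types, per the article

class NextEventRNN(nn.Module):
    def __init__(self, n_events: int = N_EVENTS, emb: int = 128, hidden: int = 256):
        super().__init__()
        self.embed = nn.Embedding(n_events, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_events)

    def forward(self, events: torch.Tensor) -> torch.Tensor:
        # events: (batch, steps) of observed event IDs; returns logits
        # over which of the N_EVENTS alert types fires next.
        out, _ = self.lstm(self.embed(events))
        return self.head(out[:, -1, :])

model = NextEventRNN()
observed = torch.randint(0, N_EVENTS, (1, 5))  # five observed attack steps
predicted_next = model(observed).argmax(dim=-1)
```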

When the system was then tested on a database of 3.4 billion recent intrusions that it had never encountered before, the team found it could predict the next event with a precision of 93% after observing five or more steps. "[Our] contribution is that now we can be one step ahead, instead of just detecting something as it's happening or, even worse, after it happened," says Stringhini.

The team is currently looking at how the system can be tailored to individual organizations. Training it on a specific company's records of previous malicious activity, for example, would let it better meet that organization's needs and limit damage. "If we predict that an attacker will go after a certain asset, for example a specific server, we could start blocking connections or rate limiting connections to protect it," says Stringhini.
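That kind of mitigation could be as simple as switching on a per-asset rate limiter for whichever server the model flags. The token-bucket sketch below is hypothetical; the asset name and rates are invented.

```python
# Hypothetical per-asset rate limiting for a predicted target.
import time

class TokenBucket:
    def __init__(self, rate: float, burst: int):
        self.rate, self.capacity = rate, burst
        self.tokens, self.stamp = float(burst), time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens for elapsed time, capped at the bucket's capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.stamp) * self.rate)
        self.stamp = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Tight limit only on the asset the model predicts will be attacked.
limits = {"db-server-01": TokenBucket(rate=1.0, burst=5)}  # invented asset

def admit_connection(dest: str) -> bool:
    bucket = limits.get(dest)
    return bucket.allow() if bucket else True  # others unrestricted
```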

In the future, it may be possible to predict attacks even earlier by looking for warning signs. Burnap thinks network reconnaissance tactics could help, since attackers often assess a system before launching an attack. However, predicting with a high degree of certainty that an attack will follow is difficult.

"You need enough data to be able to do machine learning training and validate that," says Burnap. "It's a challenge, but certainly big corporations could do it."

Sandrine Ceurstemont is a freelance science writer based in London, U.K.


 
