Communications of the ACM

ACM TechNews

CSI Computer Science: Your Coding Style Can Give You Away

Looking to identify the authors of source code by their coding styles.

A new code stylometry technique uses natural language processing and machine learning to determine the authors of source code based on coding style.

Credit: protocol80.com

Researchers at Drexel University, the University of Maryland, the University of Goettingen, and Princeton University have developed a code stylometry technique that uses natural language processing and machine learning to determine the authors of source code based on coding style.

The researchers say the technology could be applicable to a wide range of situations in which ascertaining the originating coder is important, such as to help identify the author of malicious source code.

The researchers say they derived a syntactic feature set from abstract syntax trees, built from language-specific syntax and keywords, that "was created to capture properties of coding style that are completely independent from writing style." They tested the code stylometry on publicly available data from Google's Code Jam, using each coder's solutions to the same set of problems as a training dataset to learn that coder's style. The researchers then examined solutions the same coders wrote to a different problem, without knowing who wrote each one, and tried to identify its author.
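The idea of learning a coder's style from syntax-tree features can be illustrated with a minimal sketch. It is not the researchers' implementation: it assumes Python source and Python's built-in ast module, counts only how often each node type appears, and substitutes a simple cosine-similarity match against per-author profiles for whatever classifier the researchers used. The author names and code samples are hypothetical toy data.

```python
import ast
import math
from collections import Counter


def ast_features(source: str) -> Counter:
    """Count how often each AST node type appears in Python source."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(samples: dict) -> dict:
    """Build one style profile per author from that author's known solutions."""
    profiles = {}
    for author, sources in samples.items():
        counts = Counter()
        for src in sources:
            counts.update(ast_features(src))
        profiles[author] = counts
    return profiles


def predict(profiles: dict, source: str) -> str:
    """Return the author whose profile is most similar to the unseen code."""
    feats = ast_features(source)
    return max(profiles, key=lambda author: cosine(profiles[author], feats))


# Hypothetical toy data: two authors solving the same two problems.
SAMPLES = {
    "alice": [
        "def solve(xs):\n"
        "    return sum(x * x for x in xs if x % 2 == 0)\n",
        "def solve(words):\n"
        "    return [w.upper() for w in words if len(w) > 3]\n",
    ],
    "bob": [
        "def solve(xs):\n"
        "    total = 0\n"
        "    i = 0\n"
        "    while i < len(xs):\n"
        "        if xs[i] % 2 == 0:\n"
        "            total = total + xs[i] * xs[i]\n"
        "        i = i + 1\n"
        "    return total\n",
        "def solve(words):\n"
        "    out = []\n"
        "    i = 0\n"
        "    while i < len(words):\n"
        "        if len(words[i]) > 3:\n"
        "            out.append(words[i].upper())\n"
        "        i = i + 1\n"
        "    return out\n",
    ],
}

# A solution to a different problem, with the author withheld.
UNSEEN = (
    "def solve(xs):\n"
    "    return max(abs(x) for x in xs if x != 0)\n"
)

profiles = train(SAMPLES)
print(predict(profiles, UNSEEN))
```

Node-type counts are a deliberately crude feature set, and two authors with similar habits would be hard to separate this way. A realistic system would likely need richer features and far more training code per author.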

The code stylometry achieved 95-percent accuracy in identifying the author of anonymous code.

In addition, the researchers found that coding style is more distinctive in solutions to harder problems. "This might indicate that as programmers become more advanced, they build a stronger coding style compared to newbies," according to the researchers.

From ITWorld.com

Abstracts Copyright © 2015 Information Inc., Bethesda, Maryland, USA