How to protect your privacy in open-source projects

Alina Matyukhina

Open-source software is open to anyone by design, whether it is a community of developers, hackers or malicious users. Authors typically hide their identity behind nicknames and avatars, but these offer no protection against authorship attribution techniques.

Authorship attribution methods now make it possible to link binaries originating from the same group of authors (Black Hat USA 2015 "Big Game Hunting: The Peculiarities of Nation-State Malware Research") or to find stylistic fingerprints in source code (DEFCON 26/2018 "De-anonymizing Programmers from Source Code and Binaries").

This technique, however, threatens to chill the free speech (contribution) of software developers and activists. This chilling effect can be seen in several cases in which developers have been treated as suspects, intimidated by authorities and/or persuaded into removing their software from the Internet.

In light of this threat to the freedom and privacy of individual programmers around the world, in this session we show how analysts can identify the author of a piece of software and how this process can be deceived. We present two attacks on current attribution systems: author imitation and author hiding. The first attack targets user identity in open-source projects. It transforms the syntactic representation of the attacker’s source code into a version that mimics the victim’s coding style while retaining the functionality of the original code. This is particularly concerning for open-source contributors who are unaware that, by contributing to open-source projects, they reveal identifiable information that can be used to their disadvantage. For example, by imitating someone’s coding style it is possible to implicate any software developer in wrongdoing. To resist this attack, we discuss multiple approaches to hiding an author’s coding style before contributing to open source.
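To make the idea of a stylistic fingerprint concrete, here is a minimal Python sketch. It is not the attribution system discussed in the talk. The function `style_features` and the four features it computes are hypothetical, chosen only to show that two functionally equivalent snippets can differ measurably in layout and naming.

```python
import re


def style_features(source: str) -> dict:
    """Return a few toy layout and naming features for a code snippet.

    Illustrative assumption only: real attribution systems use far richer
    feature sets than these four ratios.
    """
    lines = [ln for ln in source.splitlines() if ln.strip()]
    identifiers = re.findall(r"\b[A-Za-z_][A-Za-z0-9_]*\b", source)
    n_lines = max(len(lines), 1)        # avoid division by zero on empty input
    n_ids = max(len(identifiers), 1)
    snake = sum(1 for name in identifiers if "_" in name.strip("_"))
    camel = sum(1 for name in identifiers if re.search(r"[a-z][A-Z]", name))
    return {
        "tab_indent_ratio": sum(ln.startswith("\t") for ln in lines) / n_lines,
        "avg_line_length": sum(len(ln) for ln in lines) / n_lines,
        "snake_case_ratio": snake / n_ids,
        "camel_case_ratio": camel / n_ids,
    }


# Two functionally equivalent snippets written in different styles.
AUTHOR_A = "def total_price(item_list):\n    return sum(item_list)\n"
AUTHOR_B = "def totalPrice(itemList):\n\treturn sum(itemList)\n"

features_a = style_features(AUTHOR_A)
features_b = style_features(AUTHOR_B)
```

In this toy setting, an imitation attack would rewrite AUTHOR_B's code (renaming identifiers, changing indentation) until its features match AUTHOR_A's. Author hiding would normalize both snippets to a common style so that the features no longer separate them.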

This work was conducted in collaboration with Natalia Stakhanova, Mila Dalla Preda, and Celine Perley.

Intermediate | English | Artificial Intelligence | Open Source / Free Software | Cybersecurity / Privacy | Big Data / Data Science | Social issues

Thursday 14/03/2019

10:00 - 10:50

Track 3 (4.1.D03)

About the speaker

Alina Matyukhina

Canadian Institute for Cybersecurity

Alina Matyukhina is a cyber security researcher and PhD candidate at the Canadian Institute for Cybersecurity (CIC). Her research focuses on applying machine learning, computational intelligence, and data analysis techniques to design innovative security solutions. Before joining CIC, she worked as a research assistant at the Swiss Federal Institute of Technology, where she took part in cryptography and security research projects. Alina is a member of the Association for Computing Machinery and the IEEE Computer Society. She presents her research at several security and software engineering conferences, including HackFest Canada, ISACA Security & Risk, BSides Ottawa, PyCon and DroidConSF.