lecture: Mass Surveillance abusing Computational Linguistics and Human Language Technology
Mind-sets, state-of-the-art methods and practices according to official documents including leaks
Even though the Snowden revelations clearly showed for the first time that mass surveillance of communications is carried out on a global scale, the general public and the hacker community alike seem to know little about how these so-called COMINT (communications intelligence) operations actually work. The talk focuses on mass surveillance based on methods known from research in the interdisciplinary field of Computational Linguistics (or Natural Language Processing) and demonstrates how selectors for the mass surveillance of text messages can be generated.
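The abstract does not say which generation method is meant. As one illustration, here is a minimal sketch of a standard corpus-linguistic technique, smoothed log-ratio "keyness": it ranks words that are over-represented in a target corpus relative to a background corpus as candidate selectors. The function names, the naive tokenizer and the smoothing parameter `alpha` are illustrative assumptions, not the speaker's method.

```python
import math
from collections import Counter

PUNCT = ".,!?;:\"'()"

def tokenize(text: str) -> list[str]:
    # Naive whitespace tokenizer with lowercasing (illustrative assumption only).
    return [t.strip(PUNCT).lower() for t in text.split() if t.strip(PUNCT)]

def candidate_selectors(target_docs, background_docs, top_n=5, alpha=1.0):
    """Hypothetical helper: rank words by smoothed log-ratio keyness."""
    target = Counter(t for d in target_docs for t in tokenize(d))
    background = Counter(t for d in background_docs for t in tokenize(d))
    vocab = set(target) | set(background)
    n_t, n_b, v = sum(target.values()), sum(background.values()), len(vocab)
    scores = {}
    for w in vocab:
        # Add-alpha smoothing avoids division by zero for unseen words.
        p_t = (target[w] + alpha) / (n_t + alpha * v)
        p_b = (background[w] + alpha) / (n_b + alpha * v)
        scores[w] = math.log(p_t / p_b)
    # Highest keyness first; ties broken alphabetically for determinism.
    return sorted(vocab, key=lambda w: (-scores[w], w))[:top_n]

target = ["the protest starts at the square", "join the protest tonight"]
background = ["the weather is nice", "the train is late at the station"]
print(candidate_selectors(target, background, top_n=3))
```

With these toy corpora, "protest" ranks first. Note how quickly harmless everyday words can surface as "selectors" when the corpora are small.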
Little is known about the nature of these selectors, which are supposed to be the needles the secret services search for in the haystack. Although several leaks show various forms of hard selectors (such as telephone numbers or email addresses of high-ranking politicians or economic actors), there is also evidence for soft selectors, which target no specific person but whole groups or populations.
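The difference between hard and soft selectors can be made concrete with a minimal sketch. All selector values and messages below are hypothetical placeholders. Hard selectors are exact identifiers tied to one person. Soft selectors are topic patterns that match anyone who writes about a subject.

```python
import re

# Hypothetical hard selectors: exact identifiers of specific persons.
HARD_SELECTORS = {"+1 555 0100", "alice@example.org"}

# Hypothetical soft selectors: topic patterns that target no specific person.
SOFT_SELECTORS = [
    re.compile(r"\bprotest\w*\b", re.IGNORECASE),
    re.compile(r"\bencrypt\w*\b", re.IGNORECASE),
]

def match_selectors(message: str) -> dict[str, list[str]]:
    """Return which hard and soft selectors a message triggers."""
    hard = sorted(s for s in HARD_SELECTORS if s in message)
    soft = sorted(p.pattern for p in SOFT_SELECTORS if p.search(message))
    return {"hard": hard, "soft": soft}

print(match_selectors("Mail alice@example.org about the protests"))
print(match_selectors("Should we encrypt our chat?"))
```

The second message names no target at all, yet it is flagged. This is exactly what makes soft selectors population-wide.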
There is no clear knowledge of how such selectors are defined. However, a much deeper understanding of their nature can be achieved by combining several sources: insights from the Snowden and WikiLeaks documents, the "neutral" state-of-the-art research in Computational Linguistics and Human Language Technology, specific surveillance research (e.g., from the EU-sponsored INDECT project, including its alarming assumptions about human nature) and the author's own experiments conducted during his research. In this way, the talk helps to demystify COMINT work and makes its supposed "secret knowledge" public. The talk also shows that such methods inherently suffer from catastrophic false-positive rates and therefore affect many innocent people in any case, even though governments want us to believe the opposite and tell us stories about only precise searches being carried out.
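Why false positives dominate follows from simple base-rate arithmetic. The sketch below uses purely hypothetical numbers: 1,000,000 monitored messages, of which 100 are genuinely relevant, and a filter that is 99% sensitive and 99% specific. These figures are not taken from any leak or from the talk.

```python
def screening_outcome(population, true_targets, sensitivity, specificity):
    """Expected hits when screening a population (hypothetical parameters)."""
    innocents = population - true_targets
    true_pos = true_targets * sensitivity      # relevant messages correctly flagged
    false_pos = innocents * (1 - specificity)  # innocent messages wrongly flagged
    precision = true_pos / (true_pos + false_pos)
    return true_pos, false_pos, precision

tp, fp, prec = screening_outcome(1_000_000, 100, 0.99, 0.99)
print(f"true positives: {tp:.0f}, false positives: {fp:.0f}, precision: {prec:.2%}")
# prints "true positives: 99, false positives: 9999, precision: 0.98%"
```

Even with an unrealistically accurate filter, roughly 99 out of every 100 flagged messages belong to innocent people, because the searched-for cases are so rare.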
The speaker is by education a computational linguist, sociologist and neuroinformatician. As an activist of the Chaos Computer Club in Switzerland (CCC-CH), he not only fought a surveillance law that introduced extended mass surveillance (including selector searches of communications in Switzerland), but also devoted his master's thesis to exactly this topic.
- The speaker's related master's thesis (DE)
- The speaker's personal website with related texts
- Slides on a (related) talk "Mindsets and Methods of Mass Surveillance and application in Switzerland" [PDF]
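- Slides on a (related) talk "Computerlinguistik und Massenüberwachung: State-of-the-Art nach Snowden-Fundus, INDECT-Papers & Co." ("Computational Linguistics and Mass Surveillance: State of the Art According to the Snowden Archive, INDECT Papers & Co.", DE w/ lots of EN excerpts)
- Slides on a (related) talk "Masterarbeit: Computerlinguistik und Massenüberwachung -- Im Lichte der Enthüllungen Snowdens" ("Master's Thesis: Computational Linguistics and Mass Surveillance in the Light of Snowden's Revelations", DE w/ some EN excerpts)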
- Speaker's page in SHA2017 wiki