OKCupid Data Leak – Framing the Debate

You’ve probably heard by now that a ‘researcher’ by the name of Emil Kirkegaard released the sensitive data of 70,000 individuals from OKCupid on the Open Science Framework. This is an egregious violation of research ethics and we’re already beginning to see mainstream media coverage of this unfolding story. I’ve been following this pretty closely as it involves my PhD alma mater, Aarhus University. All I want to do here is collect relevant links and facts for those who may not be aware of the story. This debacle is likely going to become a key discussion piece in future debates over how to conduct open science. Jump to the bottom of this post for a live-updated collection of news coverage, blogs, and tweets as this issue unfolds.

Emil himself continues to fan flames by being totally unapologetic:

An open letter has been formed here, currently with the signatures of over 150 individuals (myself included) petitioning Aarhus University for a full statement and investigation of the issue:

https://docs.google.com/document/d/1xjSi8gFT8B2jw-O8jhXykfSusggheBl-s3ud2YBca3E/edit

Meanwhile Aarhus University has stated that Emil acted without oversight or any affiliation with AU, and that if he has claimed otherwise they intend to take (presumably legal) action:

 

I’m sure a lot more is going to be written as this story unfolds; the implications for open science are potentially huge. Already we’re seeing scientists wonder if this portends previously unappreciated risks of sharing data:

I just want to try and frame a few things. In the initial dust-up of this story there was a lot of confusion. I saw multiple accounts describing Emil as a “PI” (principal investigator), asking for his funding to be withdrawn, etc. At the time the details surrounding this were rather unclear. Now, as more of them emerge, they paint a rather different picture, one that is not being accurately portrayed so far in the media coverage:

Emil is not a ‘researcher’. He acted without any supervision or direct affiliation to AU. He is a master’s student who claims on his website that he is ‘only enrolled at AU to collect SU [government funds]’. I’m seeing that most of the outlets describe this as ‘researchers release OKCupid data’. When considering the implications of this for open science and data sharing, we need to frame this as what it is: a group of hacktivists exploiting a security vulnerability under the guise of open science. NOT a university-backed research program.

What implications does this have for open science? From my perspective it looks like we need to discuss the role of oversight and data protection. Ongoing Twitter discussion suggests Emil violated EU data protection laws and the OKCupid terms of service. But other sources argue that this kind of scraping ‘attack’ is basically data-gathering 101 and that nearly any undergraduate with the right education could have done this. It seems like we need to have a conversation about our digital rights to data privacy, and whether those are doing enough to protect us. Doesn’t OKCupid itself hold some responsibility for allowing this data to be accessed so easily? And what is the responsibility of the Open Science Framework? Do we need to put stronger safeguards in place? Could an organization like Anonymous, or even ISIS, ‘dox’ thousands of people and host the data there? These are extreme situations, but I think we need to frame them now before people walk away with the idea that this is an indictment of data sharing in general.

Below is a collection of tweets, blogs, and news coverage of the incident:


Tweets:

Brian Nosek on the Open Science Framework’s response:

More tweets on larger issues:

 

Emil has stated he is not acting on behalf of AU:


 

News coverage:

Vox:

http://www.vox.com/2016/5/12/11666116/70000-okcupid-users-data-release?utm_campaign=vox&utm_content=chorus&utm_medium=social&utm_source=twitter

Motherboard:

http://motherboard.vice.com/read/70000-okcupid-users-just-had-their-data-published

ZDNet:

http://www.zdnet.com/article/okcupid-user-accounts-released-for-the-titillation-of-the-internet/

Forbes:

http://www.forbes.com/sites/emmawoollacott/2016/05/13/intimate-data-of-70000-okcupid-users-released/#2533c34c19bd

The Mary Sue:

http://www.themarysue.com/okcupid-profile-leak/

Here is a great example of how bad this is: Wired runs a story with the headline ‘OkCupid Study Reveals the Perils of Big-Data Science’:

OkCupid Study Reveals the Perils of Big-Data Science

This is not a study! It is not ‘science’! At least not by any principled definition!


Blogs:

https://ironholds.org/blog/when-science-goes-bad-consent-data-and-doubling-down-on-the-internet/

https://sakaluk.wordpress.com/2016/05/12/10-on-the-osfokcupid-data-dump-a-batman-analogy/

http://emilygorcenski.com/blog/when-open-science-isn-t-the-okcupid-data-breach

Here is a defense of Emil’s actions:
https://artir.wordpress.com/2016/05/13/in-defense-of-emil-kirkegaard/

 

2 thoughts on “OKCupid Data Leak – Framing the Debate”

  1. I hope this starts some more prominent conversation about ethics rules of data collection in the digital world. While it’s particularly egregious to not anonymize the data before releasing, Emil is definitely not the first to cull personal info from social media without any participant consent.

    Online behavior is still not expected to be publicly accessible by the lay population. And even with data for which there is no reasonable expectation of privacy, does the new ability to quickly gather thousands of data points per person make all those innocuous details add up to a very rich and private picture of someone? Not many subjects know what data of theirs is publicly accessible – even fewer know what is possible to do with that public data nowadays. Observational research may need to be redefined and reconsidered.

  2. Note: an early version of this article cited an anonymous source in Aarhus University regarding Emil’s affiliations and possible motivations. Several commenters took offense at this and asked that it be removed. In the interest of the debate I have removed all references to this source until more information is available.

    Apologies for any distress caused by this.
