OKCupid Data Leak – Framing the Debate

You’ve probably heard by now that a ‘researcher’ by the name of Emil Kirkegaard released the sensitive data of 70,000 individuals from OKCupid on the Open Science Framework. This is an egregious violation of research ethics, and we’re already beginning to see mainstream media coverage of this unfolding story. I’ve been following it pretty closely, as it involves my PhD alma mater, Aarhus University. All I want to do here is collect relevant links and facts for those who may not be aware of the story. This debacle is likely to become a key discussion piece in future debates over how to conduct open science. Jump to the bottom of this post for a live-updated collection of news coverage, blogs, and tweets as the issue unfolds.

Emil himself continues to fan the flames by being totally unapologetic:

An open letter has been posted here, currently signed by over 150 individuals (myself included), petitioning Aarhus University for a full statement and an investigation of the issue:


Meanwhile, Aarhus University has stated that Emil acted without oversight or any affiliation with AU, and that if he has claimed otherwise, they intend to take (presumably legal) action:


I’m sure a lot more is going to be written as this story unfolds; the implications for open science are potentially huge. Already we’re seeing scientists wonder if this portends previously unappreciated risks of sharing data:

I just want to try and frame a few things. In the initial dust-up of this story there was a lot of confusion. I saw multiple accounts describing Emil as a “PI” (principal investigator), asking for his funding to be withdrawn, and so on. At the time the details surrounding this were rather unclear. Now, as more and more emerge, they paint a rather different picture, one that is not being accurately portrayed in the media coverage so far:

Emil is not a ‘researcher’. He acted without any supervision or direct affiliation with AU. He is a master’s student who claims on his website that he is ‘only enrolled at AU to collect SU [government funds]’. Most outlets are nonetheless describing this as ‘researchers release OKCupid data’. When considering the implications for open science and data sharing, we need to frame this as what it is: a group of hacktivists exploiting a security vulnerability under the guise of open science, NOT a university-backed research program.

What implications does this have for open science? From my perspective, we need to discuss the role of oversight and data protection. Ongoing Twitter discussion suggests Emil violated EU data protection laws and the OKCupid terms of service. But other sources argue that this kind of scraping ‘attack’ is basically data-gathering 101, and that nearly any undergraduate with the right training could have done it. It seems we need a conversation about our digital rights to data privacy, and whether they are doing enough to protect us. Doesn’t OKCupid itself hold some responsibility for allowing these data to be accessed so easily? And what is the responsibility of the Open Science Framework? Do we need to put stronger safeguards in place? Could an organization like Anonymous, or even ISIS, ‘dox’ thousands of people and host the data there? These are extreme scenarios, but I think we need to frame them now, before people walk away with the idea that this is an indictment of data sharing in general.

Below is a collection of tweets, blogs, and news coverage of the incident:


Brian Nosek on the Open Science Framework’s response:

More tweets on larger issues:


Emil has stated he is not acting on behalf of AU:


News coverage:

Here is a great example of how bad this is; Wired ran a story with the headline ‘OkCupid Study Reveals the Perils of Big-Data Science’:

OkCupid Study Reveals the Perils of Big-Data Science

This is not a study! It is not ‘science’! At least not by any principled definition!

Here is a defense of Emil’s actions:



The Wild West of Publication Reform Is Now

It’s been a while since I’ve tried out my publication reform revolutionary hat (it comes in red!), but tonight as I was winding down I came across a post I simply could not resist. Titled “Post-publication peer review and the problem of privilege” by evolutionary ecologist Stephen Heard, the post argues that we should be cautious of post-publication review schemes insofar as they may bring about a new era of privilege in research consumption. Stephen writes:

“The packaging of papers into conventional journals, following pre-publication peer review, provides an important but under-recognized service: a signalling system that conveys information about quality and breadth of relevance. I know, for instance, that I’ll be interested in almost any paper in The American Naturalist*. That the paper was judged (by peer reviewers and editors) suitable for that journal tells me two things: that it’s very good, and that it has broad implications beyond its particular topic (so I might want to read it even if it isn’t exactly in my own sub-sub-discipline). Take away that peer-review-provided signalling, and what’s left? A firehose of undifferentiated preprints, thousands of them, that are all equal candidates for my limited reading time (such that it exists). I can’t read them all (nobody can), so I have just two options: identify things to read by keyword alerts (which work only if very narrowly focused**), or identify them by author alerts. In other words, in the absence of other signals, I’ll read papers authored by people who I already know write interesting and important papers.”

In a nutshell, Stephen turns the entire argument for PPPR and publishing reform on its head. High impact[1] journals don’t represent elitism; rather, they give the no-name rising young scientist a chance to have their work read and cited. This argument really made me pause, as it represents the polar opposite of almost my entire worldview on the scientific game and academic publishing. In my view, top-tier journals represent an entrenched system of elitism masquerading as meritocracy. They make arbitrary, journalistic decisions that exert intense power over career advancement. If anything, the self-publication revolution represents the ability of a ‘nobody’ to shake the field with a powerful argument or study.

Needless to say, I was at first shocked to see this argument supported by a number of other scientists on Twitter, who felt that it represented “everything wrong with the anti-journal rhetoric” spouted by loons such as myself. But then I remembered that this is in fact a version of an argument I hear almost weekly when similar discussions come up with colleagues. Ever since I wrote my pie-in-the-sky self-publishing manifesto (don’t call it a manifesto!), I’ve been subjected (and rightly so!) to a kind of trial-by-peers as a de facto representative of the ‘revolution’. Most recently I was even cornered at a holiday party by a large and intimidating physicist who yelled at me that I was naïve and that “my system” would never work, for almost exactly the reasons raised in Stephen’s post. So let’s take a look at what these common worries are.

The Filter Problem

Bar none, the most common complaint I hear when talking about various forms of publication reform is the “filter problem”. Stephen describes the fear quite succinctly: how will we ever find the stuff worth reading when the data deluge hits? How can we sort the wheat from the chaff if journals don’t do it for us?

I used to take this problem seriously and try to dream up all kinds of neato reddit-like schemes to solve it. But the truth is, it represents a way of thinking that is rapidly becoming irrelevant. Journal-based indexing isn’t a useful way to find papers; it is one signal in a sea of information, and it isn’t at all clear what it actually represents. I feel like the people who worry about the filter problem tend to be more senior scientists who already struggle to keep up with the literature. For one thing, science is marked by an incessant march towards specialization. The notion that hundreds of people must read and cite our work for it to be meaningful is largely poppycock. The average paper is mostly technical, incremental, and obvious in nature. This is absolutely fine and necessary – not everything can be groundbreaking, and even the breakthroughs must be vetted in projects that are by definition less so. For the average paper, then, being regularly cited by 20-50 people is damn good and likely represents the total target audience in that topic area. If you network with those people using social media and traditional conferences, it really isn’t hard to get your paper into their hands.

Moreover, the truly groundbreaking stuff will find its audience no matter where it is published. We solve the filter problem every single day by publicly sharing and discussing the papers that interest us. Arguing that we need journals to solve this problem ignores the fact that they obscure good papers behind meaningless brands and, more importantly, that scientists are perfectly capable of identifying excellent papers from content alone. You can smell a relevant paper from a mile away – regardless of where it is published! We don’t need to wait for some pie-in-the-sky centralised service to solve this ‘problem’ (although someday, once the dust settles, I’m sure such things will be useful). Just go out and read some papers that interest you! Follow some interesting people on Twitter. Develop a professional network worth having! And don’t buy into the idea that the whole world must read your paper for it to be worth it.

The Privilege Problem 

OK, so let’s say you agree with me to this point. Using some combination of email, social media, alerts, and RSS, you feel fully capable of finding the stuff relevant to your research (I do!). But you’re worried about this brave new world where people archive any old rubbish they like and embittered post-docs descend to sneer gleefully at it from the dark recesses of PubPeer. Won’t the new system be subject to favouritism, cults of personality, and the privilege of the elite? As Stephen says, isn’t it likely that popular persons will have their papers reviewed and promoted while all the rest fade into the background?

The answer is yes and no. As I’ve said many times, there is no utopia. We can and must fight for a better system, but cheaters will always find a way[2]. No matter how much transparency and rigor we implement, someone is going to find a loophole, and the oldest of all loopholes is good old human corruption and hero worship. I’ve personally advocated for a day when data, code, and interpretation are all separately publishable, citable items that each contribute to one’s CV. In this brave new world, PPPRs would be performed by ‘review cliques’ who build up their reputations as reliable reviewers by consistently giving high marks to science objects that go on to garner acclaim, are rarely retracted, and perform well on various meta-analytic robustness indices (reproducibility, transparency, documentation, novelty, etc.). They wouldn’t replace or supplant pre-publication peer review; rather, we can ‘let a million flowers bloom’. I am all for a continuum of rigor, ranging from preregistered, confirmatory research with pre- and post-publication review, to fully exploratory, data-driven science that is simply uploaded to a repository with a ‘use at your peril’ warning. We don’t need to pit one reform tool against another; the brave new world will be a hybrid mixture of every tool at our disposal. Such a system would be massively transparent, but of course not perfect. We’d gain a cornucopia of new metrics by which to weigh and reward scientists, but assuredly some clever folks would benefit more than others. We need to be ready when that day comes, aware of whatever pitfalls may beset our brave new science.

Welcome to the Wild West

Honestly though, all this kind of talk is just pointless. We all have our own opinions about the best way to do science and about what will happen. For my own part, I am sure some version of this sci-fi depiction is inevitable. But it doesn’t matter, because the revolution is here, it’s now, and it’s changing the way we consume and produce science right before our very eyes. Every day a new preprint lands on Twitter with a massive splash. Just last week, in my own field of cognitive neuroscience, a preprint on problems with cluster inference for fMRI rocked the field, threatening to undermine thousands of existing papers while generating heated discussion in labs around the world. The week before that, #cingulategate erupted when PNAS published a paper that was met with instant outcry and roundly debunked by an incredible series of thorough post-publication reviews. A multitude of high-profile fraud cases have been exposed, and careers ended, via anonymous comments on PubPeer. People are out there right now, finding and sharing papers, discussing the ones that matter, and arguing about the ones that don’t. The future is now, and we have almost no idea what shape it is taking, who the players are, or what it means for the future of funding and training. We need to stop acting like this is some fantasy future 10 years away; we have entered the wild west, and it is time to discuss what that means for science.

Author’s note: in case it isn’t clear, I’m quite glad that Stephen raised the important issue of privilege. I am sure there are problems to be rooted out and discussed along these lines, particularly in terms of the way PPPR and filtering are accomplished now in our wild west. What I object to is the idea that the future will look like the present; we must imagine a future where science is radically improved!

[1] I’m not sure if Stephen meant ‘high impact’ literally, as I don’t know the impact factor of The American Naturalist; maybe he just meant ‘journals I like’.

[2] Honestly, this is where we need to discuss changing the hyper-capitalist system of funding and incentives surrounding publication, but that is another post entirely! Maybe people wouldn’t cheat so much if we didn’t pit them against a thousand other scientists in a no-holds-barred cage match to the death.

Integration dynamics and choice probabilities

Very informative post – “Integration dynamics and choice probabilities”

Pillow Lab Blog

Recently in lab meeting, I presented

Sensory integration dynamics in a hierarchical network explains choice probabilities in cortical area MT

Klaus Wimmer, Albert Compte, Alex Roxin, Diogo Peixoto, Alfonso Renart & Jaime de la Rocha. Nature Communications, 2015

Wimmer et al. reanalyze and reinterpret a classic dataset of neural recordings from MT while monkeys perform a motion discrimination task. The classic result shows that the firing rates of neurons in MT are correlated with the monkey’s choice, even when the stimulus is the same. This covariation of neural activity and choice, termed choice probability, could indicate sensory variability causing behavioral variability or it could result from top-down signals that reflect the monkey’s choice. To investigate the source of choice probabilities, the authors use a two-stage, hierarchical network model of integrate and fire neurons tuned to mimic the dynamics of MT and LIP neurons and compare the model to what they find…

View original post 436 more words
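For readers new to the metric: ‘choice probability’ is simply the area under an ROC curve comparing a neuron’s spike-count distributions on trials sorted by the animal’s choice, with 0.5 meaning no choice-related signal. Below is a minimal Python sketch of that standard computation; this is a toy illustration of the definition, not the authors’ analysis code.

```python
# Minimal sketch: choice probability as the ROC area between a neuron's
# choice-conditioned spike-count distributions. Toy illustration of the
# standard definition only; not the code used in Wimmer et al.
import numpy as np

def choice_probability(pref_counts, null_counts):
    """P(count on a preferred-choice trial > count on a null-choice trial),
    with ties counted as 0.5 -- equivalent to the area under the ROC curve."""
    pref = np.asarray(pref_counts)
    null = np.asarray(null_counts)
    diff = pref[:, None] - null[None, :]  # compare every pair of trials
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()

# Identical stimulus on every trial; spike counts split by the monkey's choice.
rng = np.random.default_rng(1)
pref = rng.poisson(24, size=200)  # trials where the choice matched the neuron's preference
null = rng.poisson(20, size=200)  # trials with the opposite choice
print(choice_probability(pref, null))  # ~0.7 here; 0.5 would mean no choice-related signal
```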


Depressing Quotes on Science Overflow – Reputation is the Gateway to Scientific Success

If you haven’t done so yet, go read this new eLife paper on scientific overflow, now. The authors interviewed “20 prominent principal investigators in the US, each with between 20 and 60 years of experience of basic biomedical research”, asking how they view and deal with the exponential increase in scientific publications:

Our questions were grouped into four sections: (1) Have the scientists interviewed observed a decrease in the trustworthiness of science in their professional community and, if so, what are the main factors contributing to these perceptions? (2) How do the increasing concerns about the lack of robustness of scientific research affect trust in research? (3) What concerns do scientists have about science as a system? (4) What steps can be taken to ensure the trustworthiness of scientific research?

Some of the answers offer a strikingly sad view of the current state of the union:

On new open access journals, databases, etc:

There’s this proliferation of journals, a huge number of journals… and I tend not even to pay much attention to the work in some of these journals. (…) And you’re always asked to be an editor of some new journal. (…) I don’t pay much attention to them.

On the role of reputation in assessing scientific rigor and quality:

There are some people that I know to be really rigorous scientists whose work is consistently well done (…). If a paper came from a certain lab then I’m more likely to believe it than another paper that might have come from a different lab whose (…) head might be somebody that I know tends to cut corners, over-blows their conclusions, doesn’t do rigorous experiments, doesn’t appreciate the value of proper controls.

If I know that there’s a very well established laboratory with a great body of substantiated work behind it I think there is a human part of me that is inclined to expect that past quality will always be predicting future quality I think it’s a normal human thing. I try not to let that knee–jerk reaction be too strong though.

If I don’t know the authors then I will have to look more carefully at the data and (…) evaluate whether (…) I feel that the experiments were done the way I would have done them and whether there were some, if there are glaring omissions that then cast out the results (…) I mean [if] I don’t know anything I’ve never met the person or I don’t know their background, I don’t know where they trained (…) I’ve never had a discussion with them about science so I’ve never had an opportunity to gauge their level of rigour…

Another interviewee expressed scepticism about the rapid proliferation of new journals:

The journal that [a paper] is published in does make a difference to me, … I’m talking about (…) an open access journal that was started one year ago… along with five hundred other journals, (…) literally five hundred other journals, and that’s where it’s published, I have doubts about the quality of the peer review.

The cancer eating away at science is plain to see. If you don’t know the right people, your science will be viewed less favorably. If you don’t publish in the right journals, I’m not going to trust your science. It’s a massive self-feeding circle of power. The big rich labs will continue to get bigger and richer as their papers and grant applications are treated preferentially. This massive mess of heuristic biases is turning academia into a straight-up pyramid scheme. Of course, this is but a small sub-sample of the scientific community, but I can’t help feeling that these views represent a widespread opinion among the ‘old guard’ of science. Anecdotally, the comments certainly mirror some of my own experiences. I’m curious to hear what others think.

The Bayesian Reproducibility Project

Fantastic post by Alexander Etz (@AlxEtz), which uses a Bayes factor approach to summarise the results of the Reproducibility Project. Not only a great way to get a handle on those data, but also a great introduction to Bayes factors in general!

The Etz-Files

The Reproducibility Project was finally published this week in Science, and an outpouring of media articles followed. Headlines included “More Than 50% Psychology Studies Are Questionable: Study”, “Scientists Replicated 100 Psychology Studies, and Fewer Than Half Got the Same Results”, and “More than half of psychology papers are not reproducible”.

Are these categorical conclusions warranted? If you look at the paper, it makes very clear that the results do not definitively establish effects as true or false:

After this intensive effort to reproduce a sample of published psychological findings, how many of the effects have we established are true? Zero. And how many of the effects have we established are false? Zero. Is this a limitation of the project design? No. It is the reality of doing science, even if it is not appreciated in daily practice. (p. 7)

Very well said. The point of this project was not…

View original post 3,933 more words
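For anyone meeting Bayes factors for the first time: a Bayes factor measures how much better one hypothesis predicts the observed data than another, and unlike a p-value it can express evidence for the null. Here is a toy Savage–Dickey computation in Python with a made-up binomial example; an illustration of the general idea, not the analysis from Alexander’s post.

```python
# Toy Bayes factor via the Savage-Dickey density ratio (illustration only;
# not the analysis in Etz's post). Model: k successes in n Bernoulli trials,
# H0: theta = 0.5 vs H1: theta ~ Beta(1, 1).
from scipy.stats import beta

def bf01_binomial(k, n, theta0=0.5):
    """BF01 = posterior density at theta0 / prior density at theta0 (under H1)."""
    prior = beta(1, 1)                  # flat prior on theta under H1
    posterior = beta(1 + k, 1 + n - k)  # conjugate Beta update
    return posterior.pdf(theta0) / prior.pdf(theta0)

print(bf01_binomial(52, 100))  # ~7.5: moderate evidence FOR the null
print(bf01_binomial(70, 100))  # ~0.0008: strong evidence against it
```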


Short post – my science fiction vision of how science could work in the future

Sadly I missed the recent #isScienceBroken event at UCL, which from all reports was a smashing success. At the moment I’m just terribly focused on finishing up a series of intensive behavioral studies, plus (as always) minimizing my free energy, so it just wasn’t possible to make it. Still, a few people were interested to hear my take on things. I’m not one to try and commentate on an event I wasn’t at, so instead I’ll just wax poetic for a moment about the kind of science future I’d like to live in. Note that this has all basically been written down in my self-published article on the subject, but it might bear a re-hash, as it’s fun to think about. As before, this is mostly adapted from Clay Shirky’s sci-fi vision of a totally autonomous and self-organizing science.

Science – OF THE FUTURE!

Our scene opens in the not-too-distant future, say the year 2030. A gradual but steady trend towards self-publication has led to the emergence of a new dominant research culture, wherein the vast majority of data first appear as self-archived digital manuscripts containing data, code, and descriptive-yet-conservative interpretations on centrally maintained, publicly supported research archives, prior to publication in traditional journals. These data would be subject to fully open pre- and post-publication peer review focused solely on the technical merit and clarity of the paper.

Having published your data in a totally standardized and transparent format, you would then go on to write something more like what we currently formulate for high impact journals: short, punchy, light on gory data details and heavy on fantastical interpretations. This would be your space to really sell what you think makes those data great, or to defend them against a firestorm of critical community comments. These pieces would be submitted to journals like Nature and Science, which would have the strictly editorial role of evaluating cohesiveness, general interest, novelty, and so on. In some cases, those journals and similar entities (for example, autonomous high-reputation peer-reviewing cliques) would actively solicit authors to submit such papers based on the buzz (good or bad) their archived data had already generated. In principle, multiple publishers could solicit submissions from the same buzzworthy data, effectively competing to have your paper in their journal. In this model, publishers must actively seek out the most interesting papers, fulfilling their current editorial role without jeopardizing crucial quality-control mechanisms.

Is this crazy? Maybe. To be honest, I see some version of this story as almost inevitable. The key bits and players may change, but I truly believe a ‘push-to-repo’ style of science is an inevitable future. The key is to realize that even journals like Nature and Science play an important, if over-lauded, role, taking on editorial risk to highlight the sexiest (and least probable) findings. The real question is who will become the key players in shaping our new information economy. Will today’s major publishers die as Blockbuster did – too tied into their own profit schemes to mobilize – or will they be Netflix, adapting to the beat of progress? By segregating the quality and impact functions of publication, we’ll ultimately arrive at a far more efficient and effective science. The question is how, and when.

note: feel free to point out in the comments examples of how this is already becoming the case (some are already doing this). 30 years is a really, really conservative estimate :) 


[VIDEO] Mind-wandering, meta-cognition, and the function of consciousness

Hey everyone! I recently did an interview for Neuro.TV covering some of my past and current research on mind-wandering, meta-cognition, and conscious awareness. The discussion is quite long and covers a diversity of topics, so I thought I’d give a little overview here, with links to specific times.

For the first 15 minutes, we focus on general research in meta-cognition, covering topics like the functional and evolutionary significance of metacognition:

We then move on to a specific discussion of mind-wandering, around 16:00:

I like our discussion, as we quickly get beyond the overly simplistic idea of ‘mind-wandering’ as mere attentional failure, reviewing the many ways it can drive or support meta-cognitive awareness. We also briefly discuss the ‘default mode network’ and the (misleading) idea that there are ‘task positive’ and ‘task negative’ networks in the brain, around 19:00:

Lots of interesting discussion there, in which I try to roughly synthesize some of the overlap and ambiguity between mind-wandering, meta-cognition, and their neural correlates.

Around 36:00 we start discussing my experiment on mind-wandering variability and error awareness:

A great experience all in all, and hopefully an interesting video for some! Be sure to support the Kickstarter for the next season of Neuro.TV!

JF also has a detailed annotation of the episode on the BrainFacts blog:

0:07 – Introduction
0:50 – What is cognition?
4:45 – Metacognition and its relation to confidence.
10:49 – What is the difference between cognition and metacognition?
14:07 – Confidence in our memories; does it qualify as metacognition?
18:34 – Technical challenges in studying mind-wandering scientifically and related brain areas.
25:00 – Overlap between the brain regions involved in social interactions and those known as the default-mode network.
29:17 – Why does cognition evolve?
35:51 – Task-unrelated thoughts and errors in performance.
50:53 – Tricks to focus on tasks while allowing some amount of mind-wandering.

Monitoring the mind: clues for a link between meta-cognition and self-generated thought

Jonny Smallwood, one of my PhD mentors, just posted an interesting overview of some of his recent work on mind-wandering and metacognition (including our Frontiers paper). Check it out!

The Mind Wanders

It is a relatively common experience to lose track of what one is doing: We may stop following what someone is saying during conversation, enter a room and realise we have forgotten why we came in, or lose the thread of our own thoughts leaving us with a sense that we had reached a moment of insight that is now lost forever. One important influence on making sure that we can stay on target to achieve our goals is the capacity for meta-cognition, or the ability to accurately assess our own cognitive experience. Meta cognition is important because it allows us the opportunity to correct for errors if and when they occur. I have recently become interested in this capacity for accurately assessing the contents of thought and along with two different groups of collaborators have begun to explore its neural basis.

We were interested in whether meta-cognition is a…

View original post 1,192 more words