#MethodsWeDontReport – brief thought on Jason Mitchell versus the replicators

This morning Jason Mitchell self-published an interesting essay arguing that replication attempts are essentially worthless. At first I was struck simply by the fact that what would obviously become a topic of heated debate was self-published rather than pushed through the long slog of a traditional academic medium. Score one for self-publication, I suppose. Jason’s argument is essentially that null results don’t yield anything of value and that we should improve the way science is conducted and reported rather than publicise our nulls. I found particularly interesting his short list of things he sees as critical to experimental results that nevertheless go unreported:

These experimental events, and countless more like them, go unreported in our method section for the simple fact that they are part of the shared, tacit know-how of competent researchers in my field; we also fail to report that the experimenters wore clothes and refrained from smoking throughout the session.  Someone without full possession of such know-how—perhaps because he is globally incompetent, or new to science, or even just new to neuroimaging specifically—could well be expected to bungle one or more of these important, yet unstated, experimental details.

While I don’t agree with the overall logic or conclusion of Jason’s argument (I particularly like Chris Said’s Bayesian response), I do think it raises some important, or at least interesting, points for discussion. For example, I agree that there are loads of potentially important things that go on in the lab, particularly with human subjects and large scanners, that aren’t reported. I’m not sure to what extent that stuff can or should be reported, and I think that’s one of the interesting and under-examined topics in the larger debate. I lean towards the stance that we should report just about anything we can – but of course publication pressures and tacit norms mean most of it won’t be published. And probably at least some of it doesn’t need to be? But which things exactly? And how do we go about reporting things like how we respond to random participant questions about our hypothesis?

To find out, I’d love to see a list of things you can’t or don’t regularly report, using the #methodswedontreport hashtag. Quite a few are starting to show up; most are funny or outright snarky (as seems to be the general mood of the response to Jason’s post), but a few describe pretty common lab occurrences and are even thought-provoking in terms of their potentially serious experimental side effects. Surely we don’t want to report all of these ‘tacit’ skills in our burgeoning method sections; the question is which ones need to be reported, and why they are important in the first place.

Could a papester button irreversibly break down the research paywall?

A friend just sent me the link to the Aaron Swartz Memorial JSTOR liberator. We started talking about it, and the conversation led to a pretty interesting idea.

As soon as I saw this it clicked: we need papester. We need a simple browser plugin that can recognize, download, and re-upload any research document automatically (think Zotero) to BitTorrent – this was Aaron’s original idea, just crowdsourced. Each paper would automatically be turned into a torrent with an associated magnet link. The plugin would interact with a lightweight torrent client, using a set fraction of your bandwidth (say 5%) to constantly seed back any files in your (Zotero) library folder. It would also devote part of that bandwidth to seeding missing papers – first working through a queue of DOIs that other users had searched for, then through any missing paper in reverse chronological order – so that over time every paper would end up on BitTorrent. The magnet links would be indexed by Google; any search engine could then find them, and the plugin would show the PDF download link.
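To make the moving parts concrete, here’s a minimal sketch of what the plugin’s core loop might look like. This is purely illustrative: the `SeedClient` and `DoiQueue` interfaces and the helpers `extractDoi` and `fetchFromLibrary` are hypothetical stand-ins for a PDF detector, a torrent client, and a shared request queue, not any existing API.

```typescript
// Hypothetical sketch of the papester plugin's core loop.
// None of these types map to a real library; they stand in for
// a PDF detector, a torrent client, and a shared DOI queue.

interface SeedClient {
  seed(file: Blob, name: string): Promise<string>; // returns a magnet link
  setUploadLimit(bytesPerSec: number): void;
}

interface DoiQueue {
  popRequested(): Promise<string | null>; // next DOI someone searched for
  publish(doi: string, magnet: string): Promise<void>;
}

// Assume ~5% of a 10 Mbit/s uplink is donated to the swarm.
const UPLOAD_BUDGET = 0.05 * (10_000_000 / 8);

async function onPdfDownloaded(
  file: Blob,
  sourceUrl: string,
  client: SeedClient,
  queue: DoiQueue,
): Promise<void> {
  const doi = extractDoi(sourceUrl); // hypothetical: parse DOI from URL/metadata
  if (!doi) return; // not recognizably a paper; ignore

  // Re-seed the paper and register its magnet link so any
  // search engine or tracker can find it by DOI.
  client.setUploadLimit(UPLOAD_BUDGET);
  const magnet = await client.seed(file, `${doi}.pdf`);
  await queue.publish(doi, magnet);
}

// Background task: spend idle bandwidth filling requested gaps first,
// then work backwards through the rest of the literature.
async function seedMissingPapers(client: SeedClient, queue: DoiQueue) {
  const doi = await queue.popRequested();
  if (doi) {
    // fetchFromLibrary is hypothetical: check the local Zotero folder.
    const file = fetchFromLibrary(doi);
    if (file) await onPdfDownloaded(file, `doi:${doi}`, client, queue);
  }
}

declare function extractDoi(url: string): string | null;
declare function fetchFromLibrary(doi: string): Blob | null;
```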

Once this system is in place, a pirate-bay/reddit mash-up could help sort the magnet links as a metadata-rich papester torrent tracker. Users could post comments and reviews, which would themselves be subject to karma. Over time a sorting algorithm could give greater weight to reviews from authors who consistently review unretracted papers, creating a kind of front page where “hot” would give you the latest research and “lasting” would give you timeless classics. Separating the sorting mechanism – which could essentially be any tracker – from the rating/metadata system ensures that neither can be easily brought down. If they wished, users could compile independent trackers for particular topics or highly rated papers, form review committees, and request new experiments to address flagged issues in existing articles. In this way we would not only ensure an everlasting, loss-protected research database, but irreversibly push academic publishing into an open-access and democratic review system. Students and people without access to scientific knowledge could easily find forgotten classics and the latest buzz with a simple sort. We need a “research-reddit” rating layer – why not solve Open Access and peer review in one go?
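For the “hot” front page, something like reddit’s published time-decayed ranking would do, with one twist: each vote gets scaled by the reviewer’s karma. A rough sketch – the karma-weighting scheme here is my own invention, not part of any existing tracker:

```typescript
// Time-decayed "hot" score in the spirit of reddit's ranking,
// with votes weighted by each reviewer's accumulated karma.

interface Review {
  vote: 1 | -1;          // up or down on the paper
  reviewerKarma: number; // earned from past reviews of unretracted papers
}

const EPOCH = Date.UTC(2013, 0, 1); // arbitrary reference point
const DECAY_SECONDS = 45_000;       // same decay constant reddit uses

function hotScore(reviews: Review[], postedMs: number): number {
  // Karma-weighted net score: a trusted reviewer's vote counts for more.
  const s = reviews.reduce(
    (sum, r) => sum + r.vote * Math.log10(1 + Math.max(r.reviewerKarma, 0)),
    0,
  );
  const order = Math.log10(Math.max(Math.abs(s), 1));
  const sign = s > 0 ? 1 : s < 0 ? -1 : 0;
  const ageSeconds = (postedMs - EPOCH) / 1000;
  return sign * order + ageSeconds / DECAY_SECONDS;
}
```

A “lasting” sort would simply drop the age term and rank by the karma-weighted score alone.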

Is this feasible? There are about 50 million papers in existence [1]. If we estimate about 500 kilobytes on average per paper, that’s 25 million MB of data, or 25 terabytes. While that may sound like a lot, remember that most torrent trackers already list far more data than this, and that available bandwidth increases annually. If we can archive a ROM of every videogame ever created, why not papers? The entire collection of magnet links could take up as little as 1 GB, making it easy to periodically back up the archive, keep the system resilient to take-downs, and re-seed less-known or less-sought-after papers. Just imagine it: all of our knowledge stored safely in a completely open collection, backed by the power of the swarm, organized by reviews, comments, and ratings, accessible to all. It would revolutionize the way we learn and share knowledge.
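The back-of-envelope math, for anyone who wants to tweak the assumptions:

```typescript
// Back-of-envelope storage estimate (decimal units).
const papers = 50e6;          // ~50 million articles in existence [1]
const bytesPerPaper = 500e3;  // ~500 KB per PDF, a rough average
const totalBytes = papers * bytesPerPaper; // 2.5e13 bytes
console.log(totalBytes / 1e12);            // => 25 terabytes
```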

Of course there would be ruthless resistance to this sort of thing from publishers. It would be important to take steps to protect yourself, perhaps through Tor. The small size of the files would make strong encryption cheap. When universities inevitably moved to block uploads, tags could be used to quickly upload acquired files later from a public wifi hotspot. There are other benefits as well: currently there are untold numbers of classic papers available online in reference only. What incentive is there for libraries to continue scanning these? A papester-backed uploader karma system could help bring thousands of these documents irreversibly into the fold. Even if publishers found some way to stifle the system, as with Napster the damage would be done. Just as we were pushed irrevocably towards new forms of music consumption – direct download, streaming, donate-to-listen – big publishers would be forced toward an open-access model to recover costs. Finally, such a system might move us closer to a self-publishing arXiv model. If you couldn’t afford open-access fees, you could self-publish your own PDF to the system. User reviews and ratings could serve as a first layer of feedback for improving the article. The idea or data – with your name behind it – would be out there fast and free.

edit:

Another cool feature would be a DOI search. When a user searches for a paper that isn’t available, papester would automatically add that paper to a request queue.
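In the same hypothetical terms as the sketch above, the search side might look like this – `index` and `requestQueue` are stand-ins, not real services:

```typescript
// Hypothetical search handler: serve the magnet link if we have it,
// otherwise queue the DOI so seeders can fill the gap.

async function searchByDoi(
  doi: string,
  index: Map<string, string>,                      // DOI -> magnet link
  requestQueue: { push(doi: string): Promise<void> },
): Promise<string | null> {
  const magnet = index.get(doi);
  if (magnet) return magnet;
  await requestQueue.push(doi); // seeders drain this queue first
  return null;
}
```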

edit2/disclaimer:

This is a thought experiment about an illegal solution and its possible consequences and benefits. Do with it what you will, but recognize the gap between the theoretical and the actual!

1. Arif Jinha (2010). “Article 50 million: an estimate of the number of scholarly articles in existence.” Learned Publishing, 23(3), 258–263. DOI: 10.1087/20100308. A free pre-print is available from the author.