The Collaboration Project

The Collaborative Society

Cameron Neylon

View video interview here: http://www.collaborativesociety.org/2016/04/05/camron-neylon/

My name is Cameron Neylon. My background is in Biophysics and Molecular Biology, so I'm a scientist by training. I've recently become Advocacy Director here at PLoS [Public Library of Science]. The reason I'm here is because I've always been interested in how technology can facilitate and connect and improve the way we do research, improve the way we do science, particularly in connecting what we do to where it can be used. The distinction between sort of just throwing things over the fence and hoping that someone might read them versus actually trying to configure them so they can find the places where they can be used, exploited and turned into real results.

So, PLoS is a non-profit organization. Its main activity is as a scientific publisher. It publishes several journals and a lot of articles. But it also has a history as an advocacy organization. It started its life, in fact, as a petition, an open letter that over 30,000 people signed demanding access to the outputs of publicly funded research.

What is the key problem?

In the research space, the incentives we have at the moment are quite perverse. They're actually set up in a way that discourages people from effectively communicating the research they do. And it discourages them from doing their research effectively because that's not the important thing. The key, to me, is: can we think about setting up incentives, incentive structures, that support the kind of behavior that we think is the right thing? But the real problem is our institutions of research, our culture of research, isn't even industrial, it's medieval. The process by which a person becomes a researcher is to join a guild. You create an apprentice work, which is your PhD, and then you become a journeyman. You have to travel from place to place to try and find a patron. The patrons are the journals or the publishers in which you place your work and by whose supposed quality you are judged.

So the traditional way of doing this, and it's a way that's been applied in scientific research, certainly, for about 30, 40, 50 years, is that a researcher will write an article, a document that describes their experiments, their analysis and their conclusions. That would be sent to a journal and that journal would then send that article out to be reviewed by a number of that researcher's peers, members of the community, experienced scientists. And usually we're talking about two or three people here. They will then look at this article and decide whether they think it's correct, what the weaknesses are, ask for improvements. And you go through this cycle of asking for improvements, the authors maybe doing more experiments or maybe just redoing the analysis or maybe just making stronger arguments.

And that process takes time. And it has some specific weaknesses. One weakness is that there are relatively few people involved so you don't have the full range of potential expertise you'd like to bring to ask those questions. There might be a problem that is quite specific and quite difficult to spot unless you really know this particular domain and if a piece of research involves three or four or five or six different domains, as much research does today, and you've only got two or three reviewers, then you've already got a problem in understanding the spread. It takes time and it takes months, possibly years, to go through this process. And the other perhaps really important problem, particularly in terms of a networked society and networked information, is that it's a very binary judgement. You choose whether or not to publish -- the whole thing. And that doesn't address the question of whether some part of it is worthwhile, some portion of it is useful.

A paper that might get rejected because the conclusions aren't supported by the analysis might be a paper I need because of the data or the methodology. And if I have to wait two years to not see that paper, then there's a real problem. So that doesn't answer the question of if there are problems, there are weaknesses. And we have a system that was good and it was probably the best we could do if you had information on paper being sent around by postal services and having to be physically printed on, have a limited number of pages, it costs money and it's expensive so you have to choose not to print some things.

So that problem goes away with the web. We no longer have to choose not to print things because printing things, putting things online, making millions of copies, is dirt cheap. It's effectively free. But that doesn't solve the problem of quality review. Is this research up to scratch? Was the methodology done correctly? Is the data analysis up to the right standards? Is it relevant? Is it important? Is it something I should spend my time reading? Those are all important questions, but they're all different questions. And so what we could do, what, again, the capacity of the network gives us, is the ability to put these things online and then let many people address those questions to the piece of work.

We know that works in some contexts. We know that there are places like Stack Exchange, a software question-and-answer site, and other places where that kind of review, that kind of quality assessment, is done effectively and is done publicly and quickly. Most Stack Overflow questions get answered in two hours and the ranking of answers is achievable in days, not the months that we wait for peer review. So we know these systems can work, but of course we also have a lot of traditional, cultural systems in place in the research community and it's difficult to change those. And there are lots of assumptions that we have about what works and what doesn't work that are based in this old, analog world.

Could online collaboration be a method?

There are wonderful examples of citizen science where, by putting a problem online and letting anyone who wants to engage with that problem help to solve it, you can not only turn something into a game and have lots of people turning a wheel or clicking a mouse and helping contribute to the problem, though that's really valuable in itself, but beyond that, these people in many cases become really engaged with the problem and come up with solutions that weren't seen by the professional researchers, or problems that weren't seen by the professional researchers. So by placing information online, you give people the opportunity both to engage with it, to understand it, to use it for themselves, and also for other people to build the systems that will let wide ranges of people engage with that research and contribute to it.

One of the interesting things around what works, experiments that have worked and haven't worked in terms of research online, is that the ones that have worked have had the characteristic of not trying to move too far. And very often they specifically address the questions of what kind of rewards and results people will get and how that fits into the normal set of things. So, one of the most successful examples of an online research project is the Polymath Project developed by Tim Gowers, and it was an incredibly successful experiment in bringing together a large number of mathematicians to solve a very difficult problem in a way and at a pace which Tim himself, and Tim is one of the world's greatest living mathematicians, didn't think was possible. He felt it would have taken him 12 months to 18 months to figure out whether the approach he was suggesting was viable.

And this disparate group of people solved the problem, in a different way, in six weeks. So it showed that it was possible, but one of the things that was really critical about the success of the project and the engagement of the project, one was that it didn't take that long. One was that it was fairly tightly contained. It didn't take people away from where they work for too long. But one of the critical things was that, right at the front, right at the beginning it said 'when we've done this we will write a paper.' Actually, 'we'll write several papers.' And they did. And the people who were involved got their names on the paper. And so it was a very conventional form of reward. It was something that could move fast, it could be more efficient, it could be more effective.

If you look at something like Galaxy Zoo, that project involved something that was immediately appealing, very low friction, and enabled people to engage very quickly. The pictures of galaxies are attractive, beautiful, and the things you can do, the way you feel you're contributing, is immediate. And that project generated a lot of papers. So for the researchers it was worth the risk. They did it because it was cool, but the reward was a lot of research papers that came out the back end of it. So it's difficult to move quickly with a large group of people into a space where the rewards are different. It's a real problem. And to exploit the network you have to have scale. You have to have many people involved. You have to bring the chance of discovering the person with the right expertise to near certainty. And the question of how we can configure our research so that we enable the connections to be made that go outside of our immediate sphere is one of the most interesting questions about the technology of how we communicate research.

When did the possibilities of the Internet start to interest you?

I grew up in the late '70s, early '80s. Just young enough to have a computer as a child, a Commodore 64. And I remember getting my first e-mail account in, whatever it was, I guess 1991, 1992, starting off as a university student. And it not being very useful because who else had an e-mail account? I moved to do my PhD, started in 1995, and somewhere in that transition from '94 to '95 e-mail became the primary way of communicating, at least within the sort of research community, and pretty much everyone you might try and contact would have an e-mail address. But journals and information were still mostly just print. There were a few things starting to appear online. But 1995 was interesting because it was the year that the first full genome sequence became available. The paper was actually published the year after, but the sequence, I think, was available in 1995. And it was the sequence of the E. coli bacterium. And I remember because I was doing molecular biology and we were trying to clone genes out of E. coli.

This was an amazing thing. This was going to make our lives so much easier. And we tried to download it. And, again, we were using NCSA Mosaic and we had to manually increase the memory allocation of the browser because this text file was five megabytes. That was 1995. By 2001, when the human genome sequence, billions of bases, became available, the web had gone from this kind of slightly broken tool which you could use to download a few things here and there, maybe get a journal article, to the place where everything was, where all the information was. Where you knew as a scientist every piece of important information was supposed to be available from your desk. Every person was supposed to be available from your e-mail client. And somewhere in there, in that process from 1995 to 2001 when I guess most of us weren't really looking, we went from a network which was very highly connected, but which was difficult to use, the tools weren't very good and the capacity of the tools was not very high, to a point where everyone could connect, everyone had e-mail, everyone was on the web in one form or another. And the tools, this was still before Facebook and still before the social web, but the tools to make it easy to put stuff online were there.

And what changed was the scale of the network. It's the connectivity, that's the core thing. The thing that changes us is the assumption of connectivity and indeed what can sometimes make us arrogant is that assumption that the connectivity is there even when it isn't. But the ability to operate in a way where you can connect with people globally with relevant expertise, high levels of expertise, is what really changes things for me. I work from home and my community, the people who I engage with who relate to the work I do, is online. There are many important people in my life who I've never physically met. And they're at my fingertips whenever I need them. And that's something that just would not have been possible 10, 15 years ago. In the middle of the '90s it was, yes you had your communities, but they were smaller. They were less widespread, less diverse. And I think there is a qualitative change in our capacity to do things today. If I have a problem, a deep, specific, technical problem, and I throw out a random question on Twitter, I'll probably get the right answer back within about 10 minutes. And that changes our capacities as human beings to actually do things, in my view.

How do you see the future development of the web?

I would hesitate to suggest I know much about what the web is going to look like in 5 years, certainly not in 10 years' time. I think there are some trends you can pull out. Some of the key things are that there is more information going online and there is more instrumentation of systems. People have been talking about the Internet of Things for a while. I think we're just about turning the corner to the point where, in certain places, the world is instrumented to the point where we will have data that will change the way we think about how we interact with the world. I think the web will be more immersive and we'll see less distinction between the physical and the online world. Some people are exploring that through visual overlay in heads-up displays and those kinds of things. That's interesting. It's kind of appealing. I think there are some interesting questions to work through about how we set boundaries between the physical outside world and our internal world. What happens when we start expressing our internal digital world onto the outside physical world? I think we've got some stuff to work through there, and I don't know quite how that's going to work out. And we've clearly got some really serious questions as a society around privacy and what it means. What it means to be private. What it means for a piece of information to be private. And those two things are quite different. What it means to aggregate private information. These are questions to which I don't have answers.

The other big trend that I see, and I think it's really interesting, is one of people wanting to give up very personal information. It's one of the really interesting trends I've observed in the research space, as we get more and more concerned as researchers and administrators around the privacy of clinical trials and that kind of information, and concerned about protecting the reasonable rights of trial participants and people involved in medical experiments. At the same time, there are huge patient advocacy movements growing up that are focused on people who care very deeply; their motivation for being involved in these experiments is to ensure that the maximum possibility for people to benefit is there. You've got people with really serious diseases managing them and describing how they feel, how bad their diarrhea is, on a day-to-day basis because they see the value in aggregating the data that will help someone else. And these are terminal diseases. But by giving this information away, they will help someone to manage, on a day-to-day basis, their own condition in the future. That notion of donation of my personal privacy to benefit someone else I think is a really interesting one.

And that's spreading beyond the scope of just people who have disease into people who want to contribute to the greater development, to the provision of data. Jeff Hammerbacher's famous phrase 'the unreasonable power of data.' If you have huge amounts of data you can do things that you hadn't really thought about before. And we don't have those huge amounts of data for some of the most important personal questions, around personal health, around nutrition, around how drugs affect individuals, around our genetics and disease prevalence. So people are choosing to make that information available and people are building the legal and infrastructure frameworks to make it possible. And what happens when we have a critical mass of personal, medical, genetic data online as a commons, online as a free, public resource. That's a really interesting question because it blows away all of our assumptions about why we want privacy because we have to create a space where we're not going to be private to enable us to manage and predict and control things that are very private.

I think there's a whole series of intersecting spaces there. As data becomes more prevalent, data becomes just the assumption that we know where every car in San Francisco is. And how fast it's traveling. And what its emissions are. And how that's affecting a person walking down the street. That changes the way we manage our interaction with society. And so it's going to be very interesting to watch those two, sort of, spaces interact.

I don't think the web necessarily makes us better people. It's the challenge of human character. It's not going to change that much. So when I talk about things being instrumented and having emissions data, I use the word 'data' advisedly. It's not information. It's the raw pieces that we might use to generate information. And it's certainly a long way from wisdom. So we've got a lot of WIP stages to work through before we get to actually understanding how our own responses to the information and knowledge we have actually affect our behavior, and being reflexive about how we interact with the world.

CollaborativeSociety.org is a site which explores the thinking of researchers, academicians and thought-leaders on the topic of collaboration, among other things. Thanks to Alfred Birkegaard and Katja Carlsen for providing the video content. The contribution of The Collaboration Project is these transcripts.