A Versioning experiment
Updating Previously Published Articles: A Versioning Experiment of IEEE Software
The last few years have seen a major increase in interest in, and availability of, more open, accessible and collaborative scientific publication. This phenomenon is characterized by the free online accessibility of a growing number of articles and publications, by the diffusion of such articles on the web, which makes them readily available to readers, and by the existence of collaborative writing, on platforms like Wikipedia and other wikis but not only there. It can also take the form of continuous commenting, i.e. commenting on articles after their publication and updating them along the way.
This article describes an ongoing experiment in the collaborative extension and updating of previously published articles. We first give a brief overview of what “open science” means nowadays, followed by a more complete view of the technical and social context in which such a process could exist; we then describe the experiment itself in detail. Finally, we formulate our expectations for the experiment. In the updated version of this article, to come, we will add observations, results and conclusions.
1. Open Science
- What we mean
This experiment takes place within the current movement toward more open science. That means making articles more openly available, a movement that now seems irreversible: European Union member states … agreed on an ambitious new open access (OA) target. All scientific papers should be freely available by 2020. A wider view is to make the whole process of scientific publication more open to contributions and commenting.
To a lesser extent, because this is less central and not yet as widespread, it also means making openly available all data, methodological details and protocols (Fosteropenscience.eu), as well as labs and experiments, impact studies and even policy decisions.
Another aspect is the accessibility of knowledge creation: the fact that anybody can contribute to the creation of knowledge, not only recognized Ph.D. holders and members of official labs and institutions. This is sometimes called citizen science. On the one hand, it is the simple recognition of a matter of fact: “amateurs” have always contributed to the observation and data collection phases of the scientific process. On the other hand, it is a political view, meaning that the opinion and contribution of anybody could be of value and should be welcomed, so that science is not steered only by the parties that are able or willing to finance it. It remains to be seen how important this aspect could be in software, even though it is obvious that practitioners produce a large part of our knowledge in the field, and always have. More on this view can be found in Towards Bottom-Up, Stakeholder-Driven Research Funding.
Finally, Fecher & Friesike describe five different views of Open Science :
- Infrastructure, concerned with technological support;
- Public, concerned with accessibility of knowledge creation;
- Measurement, concerned with impact measurement;
- Democratic, concerned with access to knowledge and
- Pragmatic, concerned with collaborative research.
Needless to say, this experiment is related first to the pragmatic school, but also to the infrastructure and measurement schools.
Some of the main references explaining the opportunity of open and collaborative scientific activity are Open Access, by Peter Suber, and Reinventing Discovery, by Michael Nielsen.
- Similarities with open source software development.
It may be useful to keep in mind the Open Source movement in software. Despite the range of objectives, and even political views, found in open software, its practical aspects seem appropriate for describing the open science phenomenon and may help us establish better practices, if any practice at all. Of particular interest to this experiment is the basic development process in open source software: a more or less mature piece of software is opened for anyone to contribute to. A Verification & Validation mechanism is put in place so that only acceptable additions and modifications are made to the software. As with wikis, quality is also assured by the possibility for anyone to review the software. In open source software development, it is expected that given enough participants, the quality of the software will improve, the saying being “given enough eyeballs, all bugs are shallow”. Could those practices work and be useful for scientific publication?
- The expected benefits of such practices are numerous.
Some benefits of the practices described above are financial, if the articles can be accessed openly for free rather than through costly subscriptions to publishers. The usual argument is that since a large part of science is publicly funded, having to buy the results amounts to paying twice for the same thing.
Other benefits may be scientific: opening the articles, but also the data and other relevant information, lets more of the scientific community in each field access and reuse the information faster.
The quality of contents could also be judged more widely on intrinsic value rather than on the reputation of the authors, given some appropriate process, of course. If the same practices can be applied to scientific publishing, then error corrections, data, replicated experiments, etc. could be added directly to the original “articles” instead of only being published in various, difficult to reconcile places, or not being published at all due to the lack of recognition for such additional information.
Another potential benefit of openly available data and protocols would be to facilitate the reproducibility of studies, a recognized weakness in today's scientific world, as illustrated by a recent article in Nature.
2. Existing practices in collaborative science
The types of collaborative science of interest for this project are continuous commenting, specific initiatives and scientific wikis. This is different from continuous publishing, where articles are added to electronic issues as they get approved through the refereeing process. IEEE already uses and promotes this model.
Continuous commenting is the practice, seemingly made possible by social media, whereby readers of a scientific article comment on it beyond the prepublication refereeing process.
An example of an advanced platform for supporting the open science process is the Open Science Framework: https://osf.io/
- Collaborative science initiatives
Probably the best known book on collaborative science initiatives is Reinventing Discovery: The New Era of Networked Science, published by Michael Nielsen in 2011. He describes a number of collaborative initiatives, ranging from:
- Polymath.org, where mathematicians work together on solving important math problems, to
- Galaxyzoo.org, where people are invited to classify galaxies.
Other initiatives include the work done by researchers worldwide and private pharmaceutical companies on the drug PZQ, using a typical open-source software process. Beyond the success of finding the correct product, the authors insist on how the process accelerated the findings by bringing in the ideas of a large number of experts who did not know each other before.
- Scientific wikis.
Scientific wikis cover a wide range of practices, from open publishing through a wiki to collaborative writing and updating of articles on specialized topics in a scientific field. For instance, this portal for open science:
- Potential advantages of collaborative scientific wikis
- Wikis keep track of all contributions, small or important.
- That includes identification of the authors of all updates. In Wikipedia, many contributions are associated only with IP addresses, but in a context of scientific publication, we can expect authors to seek recognition, and so to agree to register and have a personal identifier associated with all their contributions.
- Simple, well known technology!
- Problems with collaborative publishing.
Some potential problems of collaborative publishing are the following:
- Validation of the contents. The process would still call upon peers to provide quality assurance. Instead of a small number of peers reviewing and approving the text before publication, the “whole” community of peers would, in theory, provide continuous reviewing of the contents in a collaborative, wiki-type production of articles. As mentioned above, the process is well known and respected in Open Source Software communities.
- Identification of contributions. The mediawiki engine keeps track of all individual contributions. By asking contributors to register in the wiki, we can then associate each contribution with an individual.
- Compiling individual contributions to wikis. Our own group and others have done some work on the identification and compilation of contributions in wikis.
- Recognition. Recognition of this type of contribution is well beyond the scope of this experiment. All we can contribute is the compilation of each individual's contributions.
3. This experiment
The experiment seeks to promote, observe and analyze the collective updating of published articles. The experiment should provide lessons regarding versioning. Versioning, the process of republishing articles after appropriate updates, is almost nonexistent.
The purpose of this experiment is to see if it is currently possible, given the right context, to collectively create an updated version of some articles. We have set up a favorable context in order to facilitate meeting this objective, the rationale being that if it is not possible under the best of circumstances, the idea may have to be postponed.
- Criteria for choosing the articles
- Likely to be updated and/or completed.
- Likely to generate sufficient interest in a large enough community that substantial contributions can be hoped for.
Considering the criteria for choosing the articles, the availability of the EIC and Contributor from IEEE-Software, and the expertise of the members of Grisou in the field of software engineering, IEEE-Software seemed like a natural choice.
- Transforming the articles into wiki articles
In order to facilitate the modification of articles, the wiki format seems best. It has largely proven its potential and is well known worldwide. The mediawiki format has been chosen precisely because of the phenomenal success of Wikipedia, which makes it the best known of such wiki tools.
- Rules for contributing
We have chosen to require contributors to identify themselves and log in to make any modification. This is due to our need to study the details of the contributions closely. It also seems more than likely that in a routine versioning process, contributors would want to have their work recognized. Finally, having people log in is a common way to keep robots and other virtual creatures from messing with our work.
- Role of the original authors
Some thought was given to the role of the original authors of the articles in the experiment. Should they be involved at all? If so, would they be declared editors of the articles? Should that give them a veto right on proposed modifications?
It was decided for this experiment, mostly for practical reasons, to proceed as follows: the original authors are invited to participate in the editing, like anybody else. Robert Dupuis will be the main editor. His role is to prevent any ridiculous input from being added to the articles, but also to be the first arbitrator of any conflict between the views of contributors. In such a case, if a disagreement cannot be resolved, both views will be included as far as possible. He will also protect the integrity and consistency of the article.
The original authors will be involved, along with other members of the editorial committee of the experiment, in deciding whether or not the final state of each article is to be declared a “new version”.
- Compiling contributions
Grisou has been developing software to compile and report the contributions of individuals to wikis. It can scan Wikipedia, for instance, and compile all contributions made by any individual.
In Wikipedia, many of the contributions are identified by an IP address. This should not be a problem in the case of this experiment, or of any larger scale collaborative scientific article publication, since contributors would most probably want their work to be recognized.
In this case, we ask contributors to register, first to avoid vandalism, but mostly to be able to describe and analyze contributions properly.
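As an illustration of what compiling contributions can involve, here is a minimal sketch. It is not the Grisou software, whose internals are not described here. It assumes that the revision history of an article is available as a list of records ordered oldest first, each with a `user` name and the page `size` in bytes after the edit; the function name `compile_contributions` and the sample data are hypothetical.

```python
from collections import defaultdict

# Hypothetical revision records, oldest first. The field names are modeled
# on typical wiki revision histories; the data is invented for illustration.
revisions = [
    {"user": "Alice", "size": 1000},
    {"user": "Bob", "size": 1250},
    {"user": "Alice", "size": 1200},
    {"user": "Carol", "size": 1500},
]


def compile_contributions(revisions):
    """Summarize, per registered user, the number of edits and bytes changed.

    The size change of each revision is measured against the previous one.
    The first revision is counted as adding its full size.
    """
    totals = defaultdict(
        lambda: {"edits": 0, "bytes_added": 0, "bytes_removed": 0}
    )
    previous_size = 0
    for rev in revisions:
        delta = rev["size"] - previous_size
        entry = totals[rev["user"]]
        entry["edits"] += 1
        if delta >= 0:
            entry["bytes_added"] += delta
        else:
            entry["bytes_removed"] += -delta
        previous_size = rev["size"]
    return dict(totals)


report = compile_contributions(revisions)
for user, stats in sorted(report.items()):
    print(user, stats)
```

Byte counts are only a rough proxy for the importance of a contribution: a small correction can matter more than a long addition. In this experiment, the first revision of each wiki article would be the originally published text, so it would probably be excluded from the per-contributor totals.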
- Collaborators on this experiment
This experiment is done within the efforts of the IEEE-Computer Society to explore innovative publishing paradigms. Since the Society “shall promote cooperation and exchange of technical information among its members”, it seems appropriate for it to be at the forefront of such practices.
The effort was initiated in 2014 by the VP of the Publications Board, Jean-Luc Gaudiot, now 2017 President of the Society. It is now conducted under the 2016 VP of the Publications Board, David S. Ebert, and the 2017 VP, Greg Byrd. Other important contributors within the Computer Society are Diomidis Spinellis, Editor in Chief of IEEE-Software, and Xabier Larrucea, member of the IEEE Software initiatives team, along with Robert Dupuis, member of the versioning committee. The Grisou research group from UQAM (grisou.uqam.ca), whose members have worked on similar projects for the last few years, is the operating body leading the experiment. Its members are listed above. Daniel LeBlanc, CEO of Ctrlweb.ca, a Montreal web design and development company, has provided the technical support necessary for the project.
4. Expected results
Above all, the experiment seeks to provide a first estimate of the feasibility of collaborative versioning of articles. Since this is new, we expect to learn a lot about the what, when, who, and how of collaborative versioning. That is why the context, the choice of articles, the time frames and the advertisement of the experiment among groups interested in each article have been put in place.
Measuring the success of such an experiment is not easy. If the process results in one or more new “versions” of articles, as accepted by consensus of the editorial team, we could declare that this part of the experiment is a success.
In this experiment, demonstrating that versioning is feasible is linked to a proof of concept of using a wiki to produce the new versions.
On another level, if enough lessons were learned on the process so that other experiments can be designed, that would be another type of success.
- Observing the contribution.
We will systematically observe the whole of the event, based mainly on the following aspects:
- Who edits what?
- Where? What parts of the articles generate more interest?
- What types of modifications are made? Additions, modifications and deletions?
- Who modifies whose text? Pretty soon, a network of contributors could be established and studied.
- Who changes the structure, the basic blocks of an article?
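The question about types of modifications can be made concrete with a small sketch. It assumes that two successive revisions of an article are available as plain text, compares them word by word with Python's standard `difflib` module, and counts words added, deleted and modified. The function name `classify_changes` and the three categories are illustrative choices, not the method used in the experiment.

```python
import difflib


def classify_changes(old_text, new_text):
    """Count words added, deleted and modified between two revisions.

    A "modified" span is one where old words were replaced by new ones;
    it is counted using the longer of the two sides.
    """
    old_words = old_text.split()
    new_words = new_text.split()
    counts = {"added": 0, "deleted": 0, "modified": 0}
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":
            counts["added"] += j2 - j1
        elif tag == "delete":
            counts["deleted"] += i2 - i1
        elif tag == "replace":
            counts["modified"] += max(i2 - i1, j2 - j1)
    return counts


print(classify_changes("open science is good", "open science is great"))
```

Applied to every pair of consecutive revisions, and combined with the registered author of each revision, counts like these could feed the analysis of who edits what. Answering who modifies whose text would additionally require tracking the authorship of each passage, which this sketch does not attempt.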
- Defining what is a new version.
We hope this first experiment will help define what a new version of an article is. This notion seems intuitively quite simple, but it is very difficult to define objectively, and no such definition seems to exist. Could it be based on the percentage of the text that is modified? Could it be through a consensus of a group of observers, an editorial committee of sorts? Should the original authors be part of such a consensus? They have some interest in approving such a thing, since the new version would be considered, at some level, as a new publication.
Could a new version be declared partly on the basis of time, after a predetermined period? Or when activity slows down below a predetermined or consensual level?
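To make the percentage-based criterion concrete, here is a minimal sketch of one possible measure. It assumes both texts are available as plain text, compares them word by word with Python's `difflib`, and reports one minus the similarity ratio as the fraction changed. The function names and the 30% threshold are hypothetical placeholders; no such threshold has been defined for this experiment.

```python
import difflib


def fraction_changed(original, revised):
    """Return a value in [0, 1]: 0 for identical texts, 1 for nothing shared.

    Uses SequenceMatcher.ratio(), i.e. 2 * matches / total number of words.
    """
    matcher = difflib.SequenceMatcher(
        a=original.split(), b=revised.split(), autojunk=False
    )
    return 1.0 - matcher.ratio()


def is_new_version(original, revised, threshold=0.3):
    """Hypothetical rule: declare a new version above an arbitrary threshold."""
    return fraction_changed(original, revised) >= threshold


print(fraction_changed("a b c d", "a b x y"))
```

Such a measure captures only the quantity of change, not its significance, which is one reason a consensus of an editorial committee may still be needed.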
- How will the contributions be recognized? Could the new version just be considered a new publication? Should it also be referenced as a group: V1.0, V2.0, etc.?
5. Longer term
The follow-up to this experiment could consist both in deepening the exploration and in widening it to other publications.
- Follow-up: a second experiment based on the lessons learned here. Pursuing the concept of versioning within the IEEE-Computer Society publications at large. Exploring the technical and methodological aspects of versioning. How could the versions be referred to? What recognition would be given to the contributors? How could the concept be applied to conference articles? One potential experiment could be the following: on site at the conference, participants modify an article previously submitted and accepted for the conference.
- Other possibilities include reproducing the experiment as follows:
- Within IEEE-Software, either with articles previously published or brand new ones;
- Within another publication in a similar field or not, according to the availability of participants;
- A longer-term potential benefit of this versioning practice could be to facilitate and welcome the publication of follow-up and replication studies. Such contributions are rare right now and are not even welcome in most journals. If it were possible to add a chapter to an existing article, to add replication results, etc., it could be beneficial both to the original authors, who would see their article made more important by additional results, and to the new authors, who could benefit from the potential renown of the original article.
- P. Knoth and N. Pontika - Own work, CC BY 3.0, https://en.wikipedia.org/w/index.php?curid=50529549
- Sören Auer & Holger Braun-Thürmann, Towards Bottom-Up, Stakeholder-Driven Research Funding, available at: http://www.informatik.uni-leipzig.de/~auer/publication/OpenScience.pdf
- B. Fecher & S. Friesike, “Open Science: One Term, Five Schools of Thought”, available at: http://book.openingscience.org/basics_background/open_science_one_term_five_schools_of_thought.html
- P. Suber, Open Access, available for download at https://mitpress.mit.edu/books/open-access.
- M. Nielsen, Reinventing Discovery: The New Era of Networked Science, Princeton University Press, Princeton, 2011.
- M. Baker, “Is There a Reproducibility Crisis?”, Nature, Vol. 533, May 2016, pp. 452-454.
- M. Woelfle, P. Olliaro and M. H. Todd, “Open Science is a Research Accelerator”, Nature Chemistry, October 2011, http://www.nature.com/nchem/journal/v3/n10/full/nchem.1149.html