TOPIC EXTRACTION CHALLENGE

History

This topic extraction challenge seeks to broaden the base for a collective investigation of the performative nature of topic extraction approaches: to what extent do the structures that emerge from applying topic extraction approaches represent thematic structures in science, and to what extent are they artifacts produced by the methods used? A first step in this direction was a collaborative effort by Kevin Boyack, Wolfgang Glänzel, Jochen Gläser, Frank Havemann, Michael Heinz, Rob Koopman, Carl Lagoze, Ismael Rafols, Andrea Scharnhorst, Bart Thijs, Nees Jan van Eck, Theresa Velden, Ludo Waltman, Shenghui Wang, and Shiyan Yan. From 2014 to 2016, we used a data set of 111,616 articles in Astronomy and Astrophysics from the Web of Science to compare the results of our respective topic extraction approaches. This work is to be published as a special issue of the journal Scientometrics (forthcoming).

Going back further, this collaboration emerged from a project by Frank Havemann, Michael Heinz, and Jochen Gläser that started in 2009. It was funded by the German Ministry of Education and Research and focused on measuring the epistemic diversity of research. To measure the epistemic diversity of a research field, the field must be delineated and the topics within it identified. The project found that, in contrast to many other purposes of field delineation and topic identification, measurements of diversity are highly sensitive to variation in the delineation of fields or topics. Consequently, the discussion between the project and its advisory board (members: Kevin Boyack, Wolfgang Glänzel, Ismael Rafols, Andrea Scharnhorst, Michel Zitt) soon focused on the state of the art in field delineation and topic identification. Together, they invited further colleagues to a series of workshops on approaches to topic identification, and at some point the idea emerged to learn more about our respective approaches by applying them to the same data set and comparing the outcomes.