We invite you to join the topic extraction challenge and learn about the state of the art in topic extraction in bibliometrics through a systematic comparison of the topic extraction approaches used by various groups in the field and beyond. Over the last two years, six research teams worked together to compare their approaches to identifying thematic structures in the Astronomy and Astrophysics literature, using a shared set of bibliographic data on 111,616 journal articles. The outcomes of this comparative exercise are published in a forthcoming special issue of Scientometrics. Now that Clarivate Analytics has kindly agreed to make this data set available to interested researchers in the bibliometrics community, we propose extending this comparative approach.
The challenge is not to develop the best partitioning of the data set. We believe this to be impossible because there is no single best solution, for two reasons. First, the structure of a body of knowledge is in the eye of the beholder: more than one valid thematic structure can be constructed, depending on the perspective applied to the knowledge. Second, topical structures are reconstructed for specific purposes, so at most there might be a best method for a given purpose. Therefore, we challenge you to use this opportunity to learn as much as possible about your own approach and the reasons why it produced a particular solution, and to find out how that solution differs from those produced by other approaches. We challenge you to discuss comparatively the advantages and disadvantages of approaches to topic identification and thus to contribute to a cumulative body of knowledge on the suitability of data models and algorithms for identifying topics.
How to obtain the data set is described here. Submitted solutions will be published on this website (topic-challenge.info) and can be downloaded for comparison. We will seek to make further tools for comparison available in the near future. If there are enough participants, we plan to run sessions on the comparative exercise at upcoming ISSI conferences and at dedicated workshops. We hope that many of you will take up the challenge and thus contribute to cumulative progress in bibliometrics.
Kevin Boyack, Wolfgang Glänzel, Jochen Gläser, Frank Havemann, Andrea Scharnhorst, Bart Thijs, Nees Jan van Eck, Theresa Velden, Ludo Waltman