The data set can be obtained from Clarivate Analytics by sending an email to Jason Rollins, email@example.com. By using the data, you agree to the following license:
While participating in the Web of Science comparative topic identification exercise, you will be provided with access to the Clarivate Analytics Web of Science comparative topic identification exercise dataset. You may access and use this dataset from March 1, 2017 through December 31, 2018 only for the exercise above, subject to the “Clarivate Analytics Terms”, including the ‘Web of Science: Custom Data Set Product Terms’ in the ‘Product / Service Terms’, available on our Terms of Business site http://clarivate.com/tob/. By accessing and/or using our data, you are legally bound by and hereby consent to these terms. If you do not agree to these terms, then you may not access or use our data. Any extension or further use of our data beyond December 31, 2018 is strictly prohibited unless you receive prior written permission from Clarivate Analytics.
You can submit your solution to the topic extraction challenge by sending the solution file in CSV format to Theresa Velden, firstname.lastname@example.org. To be accepted, the solution file needs to be formatted as described here and accompanied by two additional files: one that describes the solution and one that documents how the solution was generated.
Please provide a file solution.csv in which each row refers to a document in the data set, identified by the document's UT number (UT = Web of Science Unique Article Identifier). The second entry in a row specifies the topic the document has been assigned to. If applicable, a third entry specifies the strength of this assignment. For clarity, please include a header row with the column names in the file.
Note: If your solution includes the strength of assignment to a topic, please explain the permissible range of values and their interpretation in the documentation. If a solution allows for topic overlap and a document has been assigned to several topics, each of these assignments is to be listed on a separate line. If a document has not been assigned to any topic, the ClusterID is left empty.
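As a minimal sketch of the row layout described above, the following Python snippet writes topic assignments with a header row, one line per assignment, and an empty ClusterID for unassigned documents. The column names `UT` and `Strength`, the helper name `write_solution`, and the sample identifiers are assumptions made for illustration; only `ClusterID` is named in the text.

```python
import csv
import io


def write_solution(assignments, out):
    """Write (ut, cluster_id, strength) tuples as CSV rows with a header.

    cluster_id is None for documents not assigned to any topic;
    strength is None if no assignment strength is reported.
    """
    writer = csv.writer(out, lineterminator="\n")
    # Header names are assumed; the instructions only ask for a header row.
    writer.writerow(["UT", "ClusterID", "Strength"])
    for ut, cluster_id, strength in assignments:
        writer.writerow([
            ut,
            "" if cluster_id is None else cluster_id,
            "" if strength is None else strength,
        ])


# Hypothetical sample data, not taken from the Astro Data Set.
sample = [
    ("UT_EXAMPLE_1", 3, 0.8),        # overlapping topics: one line per assignment
    ("UT_EXAMPLE_1", 7, 0.2),
    ("UT_EXAMPLE_2", None, None),    # not assigned to any topic
]

buffer = io.StringIO()
write_solution(sample, buffer)
solution_text = buffer.getvalue()
```

To produce the actual file, the same function could be called with a file handle opened as `open("solution.csv", "w", newline="")` instead of the in-memory buffer.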
Please provide a file solution.txt with the following information:
Note: The question of which documents are covered and not covered by a solution can be quite involved. Documents are sometimes excluded during preprocessing, during data modeling, or even later in the process, due to assumptions made, e.g., about a reasonable topic size. If your solution covers fewer than 100% of the documents in the original data set, please share your insights on how many documents were excluded from your solution, at which step, and for what reasons.
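One possible way to gather these numbers is to tally exclusions per processing step and derive the overall coverage from them. The sketch below is only an illustration of such bookkeeping; the function name `coverage_report`, the step names, and the counts are all hypothetical.

```python
def coverage_report(total_documents, excluded_by_step):
    """Return (covered_count, covered_percent) given per-step exclusion counts.

    excluded_by_step maps a processing step name to the number of
    documents dropped at that step.
    """
    excluded = sum(excluded_by_step.values())
    covered = total_documents - excluded
    percent = 100.0 * covered / total_documents
    return covered, percent


# Invented numbers for illustration only.
exclusions = {
    "preprocessing": 50,        # e.g. records lacking the data the method needs
    "topic size filter": 25,    # e.g. clusters below an assumed minimum size
}
covered, percent = coverage_report(1000, exclusions)
```

Reporting the per-step dictionary alongside the overall percentage keeps the reason for each exclusion visible.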
Please provide a file approach.txt that describes the approach you used, in particular:
Further, please write a few paragraphs to describe the background for your approach, e.g. what considerations went into its design or selection of algorithms used, and for what purposes you are using its results.
Note: You can find an example of how to describe topic extraction approaches in a systematic manner in section 3 and table 2 of: Velden, Boyack, Gläser, Koopman, Scharnhorst & Wang. (forthcoming) "Comparison of Topic Extraction Approaches and Their Results" Special Issue of Scientometrics [preprint].
You are invited to present your solution to the topic extraction challenge and to participate in the discussion of how to compare and evaluate solutions through a variety of venues.
We are planning a special session on the topic extraction challenge at the upcoming ISSI conference, from 16-20 October 2017 in Wuhan, China. Please note that the paper submission deadline for the conference is April 10, 2017. We invite you to contribute by submitting a paper on the comparison of topic extraction approaches and results. In particular, we encourage you to submit work-in-progress papers that present your solution to the topic extraction challenge using the Astro Data Set.
We are looking for authors who would like to discuss their own topic extraction results for the Astro Data Set, share what they learned through the exercise about their own approach and how it compares to other approaches, discuss methods for comparing topic extraction approaches, or provide insights on the challenges of evaluating results given a multiplicity of potential ground truths and purposes of topic extraction. Please contact us if interested in contributing.
You are welcome to subscribe to our mailing list. This way you will be notified when new solutions to the topic extraction challenge get added to the website, or further opportunities to engage with the topic extraction challenge arise. To be added to the mailing list, please contact us, providing your name and email address.