Automatic Keyphrase Extraction from Text: A Walk-through

August 29, 9:00 am - 12:15 pm (CEST)

Speakers: Eirini Papagiannopoulou, Ricardo Campos, and Grigorios Tsoumakas

Tutorial website: https://intelligence-csd-auth-gr.github.io/KE_Tutorial/

Keyphrases are multipurpose knowledge gems, which makes keyphrase extraction a very important document processing task. They constitute a concise summary of a document that is extremely useful both for human inspection and for machine consumption, in support of a number of tasks in the fields of Natural Language Processing, Machine Learning and Information Retrieval. In this regard, the aim of this tutorial is two-fold: (a) to provide an overview of the automatic keyphrase extraction task and (b) to familiarize participants with the keyphrase extraction process. Specifically, we will provide a well-structured review of the existing work, offer insights on the different evaluation approaches, highlight open issues such as the need for evaluation approaches that take into account the semantic similarity between predicted and gold keyphrases, present a comparative experimental study of popular techniques, and familiarize the audience with the keyphrase extraction process via a demo presentation using a Jupyter notebook, which will be available to the audience during the practical part of the tutorial. We expect the tutorial to help newcomers and veterans alike navigate the large amount of prior art and grasp its evolution.

Eirini Papagiannopoulou is a final-year Ph.D. student at the School of Informatics of the Aristotle University of Thessaloniki (AUTH) in Greece. Her research is in the field of text mining and natural language processing. She also holds a BSc in Informatics from the University of Ioannina, Greece (2011) and an MSc in Informatics from AUTH (2013). She has participated in international and private sector funded R&D projects and has published 2 international journal papers and 3 conference papers. Since September 2018 she has also been working as a Research Associate at CyRIC (Cyprus Research & Innovation Center Ltd) in the context of RISE (Marie Skłodowska-Curie action). Her research interests include Data Mining, Machine Learning, Natural Language Processing and Semantic Technologies.

Ricardo Campos is an assistant professor at the ICT Departmental Unit of the Polytechnic Institute of Tomar and a lecturer at the Porto Business School, where he teaches in the Business Intelligence and Analytics Post-Graduate Programme. He is an integrated researcher of LIAAD-INESC TEC, the Artificial Intelligence and Decision Support Lab of U. Porto, and a collaborator of Ci2.ipt, the Smart Cities Research Center of the Polytechnic of Tomar. He holds a PhD in Computer Science from the University of Porto (U. Porto). His PhD on temporal information retrieval led him to win the Fraunhofer Portugal Challenge 2013 and to be distinguished as an “outstanding” researcher by the INESC TEC research lab. He has over 10 years of research experience in Information Retrieval and Natural Language Processing. In 2018, he received the best short paper award at ECIR’18 and the 1st prize of the Arquivo.pt Award for the project Conta-me Histórias. In 2019, he received the Best Demo Presentation and the Recognized Reviewer Award at ECIR’19, and was nominated outstanding reviewer of the NAACL-HLT’19 conference. He is an editorial board member of the Information Processing & Management Journal (Elsevier), has co-chaired international conferences and workshops, and is a regular member of the scientific committee of several international conferences.

Grigorios Tsoumakas is an Assistant Professor of Machine Learning and Knowledge Discovery at the School of Informatics of the Aristotle University of Thessaloniki (AUTH) in Greece. He received a degree in Computer Science from AUTH in 1999, an MSc in Artificial Intelligence from the University of Edinburgh, United Kingdom, in 2000, and a PhD in Computer Science from AUTH in 2005. His research expertise focuses on supervised learning techniques (ensemble methods, multi-target prediction) and text mining (semantic indexing, sentiment analysis, topic modeling). He has published more than 100 research papers and, according to Google Scholar, has more than 10,000 citations and an h-index of 42. Dr. Tsoumakas is a senior member of the ACM, an action editor of the Data Mining and Knowledge Discovery journal, and a member of the editorial board of the Frontiers of Computer Science journal. He is an advocate of applied research that matters and has worked as a machine learning and data mining developer, researcher and consultant in several national and private sector funded R&D projects.