KEYWORD EXTRACTION AND TOPIC MODELLING
Three types of information have been extracted from the open-ended questions in order to summarize users' answers:
- A word cloud of the most frequently used words. Although it is based only on term frequencies, in many cases it conveys well the terminology used and the areas of action indicated.
- Keywords, identified by a TextRank algorithm, which convey the most important terms in the answers. Keywords have been limited to single nouns.
- Topic modeling, a technique based in this case on Latent Dirichlet Allocation, which clusters the terms into groups that convey the different "concepts" expressed.
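The first two outputs can be illustrated with a minimal sketch. It is not the code used for this analysis: the answers, the stop-word list and the function names are hypothetical, and the TextRank variant shown (PageRank over a word co-occurrence graph) skips the part-of-speech filtering that would be needed to restrict keywords to single nouns.

```python
from collections import Counter, defaultdict
import re

# Hypothetical answers, invented for illustration only.
answers = [
    "Better training for teachers and more digital tools in schools",
    "More funding for schools and training for young teachers",
    "Digital skills training should start in primary schools",
]

# Tiny hypothetical stop-word list; a real analysis would use a fuller one.
STOPWORDS = {"for", "and", "in", "more", "should", "the", "a", "of", "to"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def term_frequencies(docs):
    """Raw term counts, the quantity a word cloud is sized by."""
    return Counter(tok for doc in docs for tok in tokenize(doc))


def textrank(docs, window=2, damping=0.85, iterations=50):
    """Score terms by PageRank over an undirected co-occurrence graph."""
    neighbours = defaultdict(set)
    for doc in docs:
        tokens = tokenize(doc)
        for i, word in enumerate(tokens):
            # Link each word to the next `window` words in the same answer.
            for other in tokens[i + 1 : i + 1 + window]:
                if other != word:
                    neighbours[word].add(other)
                    neighbours[other].add(word)
    scores = {w: 1.0 for w in neighbours}
    for _ in range(iterations):
        scores = {
            w: (1 - damping)
            + damping * sum(scores[n] / len(neighbours[n]) for n in neighbours[w])
            for w in neighbours
        }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


frequencies = term_frequencies(answers)
keywords = textrank(answers)
print(frequencies.most_common(3))
print([word for word, _ in keywords[:3]])
```

The frequency counts depend only on how often a term occurs, whereas the TextRank scores also reward terms that co-occur with many other well-connected terms, which is why the two rankings can differ.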
The topic modeling output has been provided through an interactive visualization, including:
- A map of the distances between the identified topics, shown on the left side. Clicking a topic displays its most significant terms on the right side. Topics can also be explored using the panel at the top.
- The most significant terms for each topic, shown on the right side. For each term, the frequency actually observed in the answers is shown together with its estimated frequency within the topic.
- Clicking a term shows interactively the topics in which it appears, with each topic's size proportional to the importance of that term in the topic.
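As a rough illustration of the quantities described above, the sketch below derives an estimated within-topic frequency and a per-topic share for a term from a topic-term probability matrix. All numbers and names (`phi`, `topic_tokens`, `observed`) are hypothetical, and the formulas are one plausible reading of the description, not necessarily what the visualization computes internally.

```python
# Hypothetical fitted model: rows are topics, columns are vocabulary terms.
vocabulary = ["training", "schools", "funding", "digital"]

# phi[k][j]: estimated probability of term j in topic k (each row sums to 1).
phi = [
    [0.50, 0.30, 0.10, 0.10],
    [0.10, 0.20, 0.40, 0.30],
]

# Hypothetical number of tokens assigned to each topic.
topic_tokens = [60, 40]

# Hypothetical observed counts of each term across all answers.
observed = {"training": 35, "schools": 25, "funding": 22, "digital": 18}


def estimated_frequency(topic, term):
    """Expected number of occurrences of `term` attributed to `topic`."""
    return phi[topic][vocabulary.index(term)] * topic_tokens[topic]


def topic_shares(term):
    """Share of a term's estimated occurrences that falls in each topic."""
    per_topic = [estimated_frequency(k, term) for k in range(len(phi))]
    total = sum(per_topic)
    return [value / total for value in per_topic]


term = "training"
print(observed[term], estimated_frequency(0, term))
print(topic_shares(term))
```

Under these assumptions, comparing `observed[term]` with `estimated_frequency(topic, term)` shows how much of a term's usage a single topic accounts for, and `topic_shares(term)` gives the relative sizes the topics would take when that term is selected.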