Social Semantic Web Paper Summaries 6: Sentiment Analysis In Social Networks: A Machine Learning Perspective by Elisabetta Fersini
In 2018, I took a master’s course at Bogazici University called Social Semantic Web (CMPE 58H), taught by Suzan Üsküdarlı (https://twitter.com/uskudarli).
I wrote summaries for a few papers there. Today, I decided to share them with you, both to share my understanding of these papers and to show my approach to reading papers. The summary of the sixth paper is below.
Title: Sentiment Analysis In Social Networks: A Machine Learning Perspective
Citation: Elisabetta Fersini
She is a computer scientist with a passion for research on Machine Learning and Natural Language Processing. Her articles mainly focus on Sentiment Analysis and on Machine Learning for Emotion Recognition.
The paper is an overview of recent developments in sentiment analysis in social networks from the perspective of machine learning methods. The author describes the challenges of classifying expressions on social networks (expressions meaning sentences etc.) and then discusses various models and their performance. At the end of the paper, the author mentions application areas and estimates the direction of future research.
The rise of online social networks has spurred research on sentiment analysis.
Key Elements of Polarity Classification on Social Networks
- The messages on SNs are short yet rich in meaning, which makes sentiment analysis harder. This can be dealt with by also analysing the hashtags and images that accompany the text content.
- Content is noisy: it contains typos, does not conform to grammar rules, etc.
- The context evolves over time, which makes it harder to define a context for a message.
- Explicit and Implicit information such as gender and age could be used to improve the performance of the analysis.
- Multilingualism: There are many languages to deal with aside from English.
- Relationships: Using network relationship data to infer features of the users could improve performance. Homophily is an important concept here. It hypothesizes that social differences could be inferred from the network distance between users.
Natural Language and Relationship usage on Polarity Classification
1- Using NL Only
Parts of speech: The linguistic role of a word (being an adjective etc.) carries importance. Sentiment shifters such as “not” and “don’t” also mean a lot for machine learning models.
Paralinguistic Content: Emoticons, initialisms (such as OMG), onomatopoeic expressions (such as wow), word lengthening (italic and bold text), capital letters and hashtags are all important.
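To make these cues more concrete, here is a minimal sketch of how they could be counted as features for a classifier. It is not the method from the paper: the word lists, the function name `paralinguistic_features` and the example message are all made up for illustration, and I am assuming that word lengthening means repeated letters, as in "sooo".

```python
import re

# Hypothetical, tiny word lists chosen for illustration only.
EMOTICONS = {":)", ":-)", ":D", ":(", ":-(", ";)"}
INITIALISMS = {"omg", "lol", "wtf"}
SHIFTERS = {"not", "no", "never", "don't", "doesn't", "didn't", "can't"}


def paralinguistic_features(message: str) -> dict:
    """Count simple surface cues in a short social network message."""
    tokens = message.split()
    # Strip surrounding punctuation so "OMG!" still matches "omg".
    words = [t.strip(".,!?").lower() for t in tokens]
    return {
        "emoticons": sum(t in EMOTICONS for t in tokens),
        "initialisms": sum(w in INITIALISMS for w in words),
        "shifters": sum(w in SHIFTERS for w in words),
        # A letter repeated three or more times in a row, as in "sooo".
        "lengthened": sum(bool(re.search(r"([a-z])\1{2,}", w)) for w in words),
        # Fully upper-case words with at least two letters.
        "all_caps": sum(t.isalpha() and t.isupper() and len(t) > 1 for t in tokens),
        "hashtags": sum(t.startswith("#") and len(t) > 1 for t in tokens),
    }


features = paralinguistic_features("OMG this is sooo GOOD :) #happy but I don't care")
print(features)
```

The resulting dictionary could be appended to ordinary word features before training. Note that a token such as "OMG" is counted both as an initialism and as an all-caps word in this sketch.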
Supervised learning provides high performance; however, human-labeled data is needed.
Semi-supervised learning can be used on both small and big datasets. There are lexicon-based and corpus-based approaches, as well as self-training and co-training models.
Unsupervised learning methods mostly use lexicon-based approaches and generative models.
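As a rough illustration of the lexicon-based idea, the sketch below labels a message without any training data by summing word scores and letting a sentiment shifter flip the next sentiment word. The lexicon, the negation list and the function name `lexicon_polarity` are hypothetical and far too small for real use; they are not taken from the paper.

```python
# Hypothetical toy lexicon; a real system would use a published sentiment lexicon.
LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "awful": -1, "hate": -1}
NEGATIONS = {"not", "no", "never", "don't", "didn't", "isn't"}


def lexicon_polarity(message: str) -> str:
    """Sum word scores, flipping the first sentiment word that follows a shifter."""
    score = 0
    negate = False
    for raw in message.lower().split():
        word = raw.strip(".,!?")
        if word in NEGATIONS:
            negate = True
            continue
        value = LEXICON.get(word, 0)
        if value != 0:
            score += -value if negate else value
            negate = False  # the shifter is consumed by this sentiment word
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for text in ["I love this phone", "I don't love this phone", "not bad at all"]:
    print(text, "->", lexicon_polarity(text))
```

The negation handling here is deliberately naive: a shifter stays active until the next word found in the lexicon, however far away that word is.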
2- Using Both NL and Relationships
Combining content and relationships could be useful for analysing implicit opinions. However, most of the recent research uses classical statistical methods, which assume the users are identical.
Status Homophily: This is the similarity of users in terms of their social and economic position. Wealth and education level are examples.
Value Homophily: This is the similarity of users in terms of their values. Schwartz values could be given as an example :)
Assuming agreement between neighbouring users is a problem, since this does not hold in the real world. To better capture agreement between users, recent research focuses on Value Homophily rather than Status Homophily.
Jiang et al. proposed a probabilistic model that uses the neighbours' values as a sandpaper, i.e. as a smoothing step, before analysing the sentiment.
To reduce the need for human labelling, much recent research has taken advantage of techniques that assume the polarities of two contents by the same user are likely to be similar. The same is assumed for the contents of two neighbours.
Even though Status Homophily is used in several places in the literature, the use of Value Homophily is scarce.
One approach of this kind is to initialize sentiment labels with hashtags and to propagate the labels to the neighbours.
Here, too, the use of Value Homophily is scarce.
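The hashtag initialisation and propagation idea mentioned above can be sketched roughly as follows. This is my own simplified, majority-vote version, not the algorithm from any of the surveyed papers. The seed hashtags, the user names, the graph and the function names `seed_labels` and `propagate` are all hypothetical.

```python
from collections import Counter

# Hypothetical hashtag seeds; real work would derive these from data.
HASHTAG_POLARITY = {"#love": "positive", "#fail": "negative"}


def seed_labels(posts: dict) -> dict:
    """Give a user a label when one of their hashtags is a known seed."""
    labels = {}
    for user, text in posts.items():
        for token in text.lower().split():
            if token in HASHTAG_POLARITY:
                labels[user] = HASHTAG_POLARITY[token]
                break
    return labels


def propagate(graph: dict, seeds: dict, rounds: int = 10) -> dict:
    """An unlabeled user takes the majority label of already labeled neighbours."""
    labels = dict(seeds)
    for _ in range(rounds):
        updates = {}
        for user, neighbours in graph.items():
            if user in labels:
                continue  # seeds and earlier decisions stay fixed
            votes = Counter(labels[n] for n in neighbours if n in labels)
            if votes:
                ranked = votes.most_common(2)
                # Skip ties so the result does not depend on iteration order.
                if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
                    updates[user] = ranked[0][0]
        if not updates:
            break
        labels.update(updates)
    return labels


# Toy data: two seeded users and an undirected friendship graph.
posts = {
    "ann": "best day ever #love",
    "bob": "train cancelled again #fail",
    "cat": "just arrived",
    "dan": "no comment",
    "eve": "hello",
}
graph = {
    "ann": ["cat"],
    "bob": ["dan"],
    "cat": ["ann", "eve"],
    "dan": ["bob"],
    "eve": ["cat"],
}
result = propagate(graph, seed_labels(posts))
print(result)
```

This sketch bakes in exactly the assumption criticised earlier, namely that neighbours agree with each other, so it should be read as an illustration of the mechanism rather than as a recommendation.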
Application Areas
- Analysing consumer and public opinions.
- Summarizing user reviews.
- Gauging the opinions of voters (used by Obama).
- Augmenting recommendation systems.
- Estimating financial market behaviour.
As in many overview articles, the author gives a briefing on the area of sentiment analysis in social networks, then examines various papers and combines their results to estimate the topics of future research and to draw a conclusion.
The author concludes that the growth of sentiment analysis on social networks is rooted in three facts:
- The vast number of application areas.
- The many challenging research problems that sentiment analysis contains.
- The Big Data that is already being collected.
The author also states that treating the texts as the only source of information when analysing sentiment in social network data is a fallacy. Social network relationships are also of the utmost importance for such analyses.
I mostly find overview papers useful, since they provide general knowledge of their field and benefit many people besides the academics who are already interested in the topic. The problems I found in this paper were the lack of explanations for some terms that are fundamental to understanding it, the use of hard language, and the sparse detail on application areas.
It is a beneficial paper overall; however, the language was a bit hard to understand, which I find a deficiency for an overview paper.