Monday, July 31, 2017

A Hybrid Approach to the Sentiment Analysis Problem at the Sentence Level [Problem Statement]

5.1 Are there other paths besides Supervised Machine Learning to address the Sentiment Analysis problem?

If we take a closer look at the Sentiment Analysis/Opinion Mining (SA/OM) field and think about classification techniques, the first thing that comes to mind is the use of Machine Learning (ML) approaches. Traditionally, ML algorithms are grouped into unsupervised, semi-supervised and supervised ones. As is well known, the supervised approach relies heavily on training, which requires adequate and large annotated datasets. This is a drawback. Such data are not always available, so we would like to avoid depending on prior training data where possible and explore a path that does not require such a massive annotation effort.

As a consequence, supervised machine learning does not look like a technique we would want to pursue. Ideally, in the context of SA/OM, an unsupervised strategy would instead “measure how far a word is inclined towards positive and negative” [224].
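
One well-known unsupervised way to measure a word's inclination is Turney's PMI-based semantic orientation (SO-PMI). It compares how often a word co-occurs with a positive reference word and with a negative one. The sketch below illustrates that idea only and is not necessarily the approach described in [224]. Several details are assumptions: the toy corpus, the seed words "excellent" and "poor", counting co-occurrence at the sentence level, and the 0.01 smoothing constant.

```python
import math

def so_pmi(word, sentences, pos_seed="excellent", neg_seed="poor", smoothing=0.01):
    """Semantic orientation of `word`, in the spirit of Turney's SO-PMI.

    Assumption: "co-occurrence" means appearing in the same sentence,
    whereas Turney counted search-engine hits with a NEAR operator.
    """
    token_sets = [set(s.lower().split()) for s in sentences]

    def hits(*terms):
        # Number of sentences that contain every one of the given terms.
        return sum(1 for tokens in token_sets if all(t in tokens for t in terms))

    # Smoothing avoids division by zero and log(0) for unseen pairs.
    near_pos = hits(word, pos_seed) + smoothing
    near_neg = hits(word, neg_seed) + smoothing
    pos_total = hits(pos_seed) + smoothing
    neg_total = hits(neg_seed) + smoothing
    # > 0: the word leans towards the positive seed; < 0: towards the negative one.
    return math.log2((near_pos * neg_total) / (near_neg * pos_total))

# Hypothetical toy corpus, invented for illustration.
corpus = [
    "the service was excellent and friendly",
    "friendly staff and excellent food",
    "the room was dirty and poor",
    "poor value and dirty carpets",
    "excellent location",
    "poor wifi",
]
print(so_pmi("friendly", corpus))  # positive value
print(so_pmi("dirty", corpus))     # negative value
```

No annotated training data is needed here, only raw text and two seed words. That is the property that makes this family of measures attractive for the path described above.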

Ultimately, SA/OM is essentially an NLP problem. Its emphasis is on determining when a sentence expresses an opinion (as opposed to a fact) and on extracting the polarity of that opinion (usually Positive or Negative). Kanaga [112], discussing ideas presented by Lotfi Zadeh in [269], says: “The semantics of natural languages and information analysis is best handled by the epistemic facet of Fuzzy Logic. In the epistemic facet, natural language is viewed as a system for describing perceptions and an important branch of the same is possibility theory and computational theory of perceptions”. Is it therefore worth taking a fresh look at Fuzzy Sets/Logic as a potentially effective tool in SA/OM? The path we would like to pursue will use linguistic semantic rules, lexicon-based approaches and fuzzy sets as the fundamental components of a hybrid approach to SA/OM. Based on the information, references and discussions presented in previous sections, we believe the following concepts could become cornerstones of a research direction that differs from the most commonly followed paths.

1. In Sentiment Analysis, the most widely used approach is text classification (see Figure 8.1), which also accounts for most of the published research. It relies heavily on Machine Learning techniques, especially Support Vector Machines (SVM) and Naïve Bayes.

2. Fuzzy Sets and Fuzzy Logic have also been used, but to a lesser extent, and the literature on them is less abundant than that for (1) above.

3. Balahur-Dobrescu [21] raises one of the main objections to the use of Fuzzy Logic/Sets in Sentiment Analysis: “we can show that while the fuzzy models of emotion perform well for a series of cases that fit the described patterns, they remain weak at the time of acquiring, combining and using new information”. However, we believe that some of these shortcomings can be minimised by combining fuzzy methods with semantic rules and linguistic techniques. For progress on acquiring new information, see, for example, Kruse et al. [122] (using neuro-fuzzy modelling) and Hüllermeier [104] (learning fuzzy rules).

4. With the advent of SentiWordNet [79] and SenticNet [42, 45], solid sentiment lexicons with built-in updating capabilities have become a reality.

5. Hatzivassiloglou et al. [92, 93] proposed a methodology for predicting the semantic orientation of adjectives that could be extended to nouns, adverbs and verbs. Predicting the semantic orientation of certain parts of speech appears to be a strong aid in suggesting the semantic orientation of sentences and documents.

6. Grammatical dependencies may play a significant role in properly understanding a sentence. As stated in [200], “In any sentence, words are arranged in a proper sequence to communicate information. The complete meaning of a sentence is not only determined by the meaning of words, but also by the pattern in which words are arranged”.

7. Supervised machine learning has proven to be a strong classification tool. However, it depends enormously on the training data, and we are attempting to move towards a system that depends less on pre-existing annotated data. We would like to rely more on the richness of fuzzy sets as a modelling apparatus, as well as on semantic rules, syntactic analysis and aggregation techniques (see Chapters 12, 13 and 14).
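
To make item 5 more concrete, the sketch below illustrates the conjunction-based intuition behind the methodology of Hatzivassiloglou et al. Adjectives joined by "and" tend to share semantic orientation, while adjectives joined by "but" tend to have opposite orientations. This is a deliberately simplified, hypothetical illustration. The conjunction triples and the single seed word are invented, and labels are spread by plain breadth-first search. The original work is more elaborate: it learns which links indicate same or different orientation and then clusters the adjectives.

```python
from collections import deque

# Assumed link semantics: "and" suggests same orientation, "but" opposite.
CONJUNCTION_SIGN = {"and": 1, "but": -1}

def propagate_orientation(conjunctions, seeds):
    """Spread +1/-1 orientation labels over a graph of conjoined adjectives.

    conjunctions: (adjective, conjunction, adjective) triples from a corpus
    seeds: a few adjectives with known orientation, e.g. {"good": 1}
    """
    graph = {}
    for left, conj, right in conjunctions:
        sign = CONJUNCTION_SIGN.get(conj)
        if sign is None:
            continue  # ignore conjunctions with no assumed orientation link
        graph.setdefault(left, []).append((right, sign))
        graph.setdefault(right, []).append((left, sign))

    labels = dict(seeds)
    queue = deque(seeds)
    while queue:  # breadth-first search outward from the seed words
        word = queue.popleft()
        for neighbour, sign in graph.get(word, []):
            if neighbour not in labels:  # the first label to arrive wins
                labels[neighbour] = labels[word] * sign
                queue.append(neighbour)
    return labels

# Hypothetical conjunction triples, invented for illustration.
triples = [
    ("simple", "and", "elegant"),
    ("elegant", "but", "expensive"),
    ("expensive", "and", "noisy"),
    ("clumsy", "and", "noisy"),
]
print(propagate_orientation(triples, {"simple": 1}))
# {'simple': 1, 'elegant': 1, 'expensive': -1, 'noisy': -1, 'clumsy': -1}
```

Resolving conflicts by first arrival is the simplest possible choice. When an "and" path and a "but" path connect the same two words, a fuller treatment would weigh the conflicting evidence instead of keeping whichever label arrives first.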

Research questions:

1. Are lexicon-based methods capable of delivering precision similar to that of Supervised Machine Learning techniques when determining polarity and subjectivity in Sentiment Analysis?

2. Are fuzzy methods adequate to support subjectivity determination and to model polarity in Sentiment Analysis by introducing gradualness (graduality) through the application of fuzzy sets?

3. Are semantic rules a good mechanism for computing the semantic orientation of both words and sentences? Will there be synergy among all these elements? Currently, most research has been conducted using Supervised Machine Learning methods (mostly SVM, Naïve Bayes and others).

Focus:

1. Building a sentiment lexicon: creating a sentiment lexicon containing sentiment-conveying terms (words), their part-of-speech tags and a polarity score for each term (see Section 14.1.1).

2. Devising the necessary logic to evaluate the polarity of sentences (using a sentiment lexicon-based approach): designing the algorithms that will produce a classification output from a set of rules and the previously defined sentiment lexicon (see Sections 14.1.2 and 14.1.2.1).

3. Designing a process to identify the intensity/graduality of sentences: creating a mechanism that identifies the intensity/graduality of a given sentence (see Section 14.1.3).
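
The three focus items above can be illustrated together in one miniature sketch. It combines a lexicon keyed by word and part-of-speech tag, a sentence-level polarity rule, and fuzzy sets for intensity. It is an illustration under our own assumptions, not the method developed in the thesis. The lexicon entries and scores, the Universal POS tags, the single negation rule, and the three triangular fuzzy sets (weak, moderate, strong) are all hypothetical.

```python
# Hypothetical miniature lexicon: (word, Universal POS tag) -> polarity in [-1, 1].
LEXICON = {
    ("good", "ADJ"): 0.6,
    ("excellent", "ADJ"): 0.9,
    ("bad", "ADJ"): -0.6,
    ("terrible", "ADJ"): -0.9,
    ("love", "VERB"): 0.7,
}
NEGATORS = {"not", "never", "no"}

# Assumed fuzzy sets for intensity: (left foot, peak, right foot) triangles on [0, 1].
INTENSITY_SETS = {
    "weak": (-0.5, 0.0, 0.5),
    "moderate": (0.0, 0.5, 1.0),
    "strong": (0.5, 1.0, 1.5),
}

def sentence_score(tagged):
    """Sum the lexicon scores of a POS-tagged sentence.

    Single illustrative rule: a negator immediately before a term flips its sign.
    """
    total = 0.0
    for i, (word, tag) in enumerate(tagged):
        score = LEXICON.get((word.lower(), tag), 0.0)
        if score and i > 0 and tagged[i - 1][0].lower() in NEGATORS:
            score = -score
        total += score
    return total

def polarity(score):
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def triangular(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def intensity(score):
    """Degree to which the score's magnitude is weak, moderate or strong."""
    x = min(abs(score), 1.0)  # clamp the magnitude into [0, 1]
    return {name: triangular(x, *abc) for name, abc in INTENSITY_SETS.items()}

# Sentences tagged by hand here; a real system would use a POS tagger.
s1 = [("The", "DET"), ("food", "NOUN"), ("was", "AUX"), ("not", "PART"), ("good", "ADJ")]
s2 = [("I", "PRON"), ("love", "VERB"), ("this", "DET"), ("excellent", "ADJ"), ("phone", "NOUN")]
for sentence in (s1, s2):
    score = sentence_score(sentence)
    print(polarity(score), intensity(score))
```

Fuzzy memberships allow a graded answer where a crisp cut-off would not. For example, the first sentence comes out mostly moderate and slightly strong in intensity, which is the kind of gradualness raised in research question 2.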

https://dora.dmu.ac.uk/bitstream/handle/2086/14363/OrestesAppel-Final-Version%28July-2017%29.pdf?sequence=1&isAllowed=y
