.
1.3 Research problem
Inspired by the challenges discussed in the previous section, the overall research problem in this study is to design and develop a systematic and comprehensive framework for document-level Email sentiment analysis. The aim is to effectively analyse and classify sentiments from Email data according to the framework shown in Figure 1.2. The framework consists of four major phases—preprocessing, feature generation, document vectorisation and sentiment analysis—and contains four main functions:
• Noise handling
• Sentiment sequence discovery
• Sentiment classification
• Quantitative evaluation
To break down the aforementioned framework into more specific tasks, four research aims are defined according to the main components of the framework:
• Preprocessing: To investigate preprocessing methods that reduce the impact of unstructured and noisy data and of data scarcity.
• Feature generation: To investigate the effectiveness of sentiment sequence and multi-topic features for Email sentiment determination, and effective methods for generating these features.
• Document vectorisation: To investigate document vectorisation methods that capture sentiment sequence and multi-topic features, so that Email documents can be effectively modelled and represented as numeric vectors.
• Sentiment analysis: To investigate effective sentiment sequence discovery and sentiment classification methods.
.
The high-level research question derived from the main research problem is: how can the special characteristics of Email, including noise, sentiment sequence and multiple topics, be incorporated into the sentiment analysis process to build a robust and effective framework for Email sentiment classification? Several sub-questions are identified that should lead to concrete technical approaches to achieving each aim:
.
Briefly, Research Question 2 is addressed through a study on sentiment sequence clustering, discussed in detail in Chapter 4. Research Question 3 is addressed through a study on sequence-encoded neural sentiment classification, discussed in detail in Chapter 5. Research Question 4 is addressed through a study on multi-topic neural sentiment classification (Chapter 6). Research Question 1 is addressed by experiments that compare the preprocessed and original data in the second and third studies (Chapters 5 and 6). Research hypotheses associated with the research aims and questions are discussed in Section 2.5, following a thorough review of the literature and a summary of existing research gaps.
.
1.5 Thesis significance
The main significance of the research is the design and development of a systematic and comprehensive framework for document-level sentiment analysis of Email data. The framework fulfils four tasks (noise handling, sentiment sequence discovery, sentiment polarity classification and quantitative evaluation) through three studies on 1) sentiment sequence clustering, 2) sequence-encoded neural sentiment classification and 3) multi-topic neural sentiment classification. This research further contributes to the literature on Email sentiment analysis by investigating the effectiveness of Email data preprocessing and augmentation methods in addressing the issues of data scarcity and imbalanced class distributions.
.
https://researchonline.jcu.edu.au/65310/1/JCU_65310_liu_s_thesis_2020.pdf
.