Tuesday, August 30, 2016

A Framework and practical implementation for sentiment analysis and aspect exploration [problem statement]


 


1.2 Problem statement and research questions


The explosion of Web 2.0 has not only brought us a huge volume of opinionated data recorded in digital form, but has also given us a great opportunity to understand public sentiment by analysing these large-scale data. However, user-generated data is a double-edged sword: the larger the data, the more difficult it is to extract useful information. One survey reports that Facebook generates 250 million posts per hour, while Twitter users generate 21 million tweets per hour (George, 2015). The review website TripAdvisor receives more than 255 reviews every minute, and nearly 2,600 new topics are posted every day. So far TripAdvisor has accumulated over 385 million reviews and opinions from users around the world (TripAdvisor, 2016). Faced with data on this scale, studies have revealed that more than half of online customers encounter frustration during online shopping. The volume makes it difficult for a potential customer to read the reviews and make an informed decision. Around 30% of online customers have felt confused and overwhelmed by the amount of information, since websites contain a large amount of spam or duplicate content (Horrigan, 2008; Niven, 2012).


Although we are in the era of Web 2.0, flooded with large amounts of data every day, companies and organizations also struggle to deal with opinionated data effectively. A survey shows that three quarters of 2,100 organizations do not have a clear idea of what their most valuable customers think about them, and nearly 31% of them find it difficult to measure customers’ opinions (Michael, 2012). Clearly, they do not lack sources of customer opinion; rather, the overwhelming size of the opinionated data and the complexity of dealing with subjectivity make it difficult for organizations to extract useful information.


The need to deal with these unstructured opinionated data has naturally led to the rise of research in the field of sentiment analysis. Sentiment analysis has been one of the most active research areas in natural language processing (NLP) since 2002 (see Section 2.3). The main task of sentiment analysis is to automatically determine the semantic orientation (SO) of a given document (Turney, 2002; Pang and Lee, 2008). Semantic orientation refers to a measure of opinion and subjectivity that indicates the polarity (positive, negative or neutral) and strength of words, phrases, sentences or documents (Hatzivassiloglou and McKeown, 1997; Turney, 2002; Liu, 2010). Current research on sentiment analysis is dominated by two basic approaches. The first is the machine learning approach, which aims to build text classifiers from labelled instances of text by selecting the right text features and algorithms (see Section 2.5.2). The other is the semantic orientation approach, which calculates the overall polarity of a text from the semantic orientation of its words or phrases (see Section 2.5.1). Since the latter approach utilizes lexical resources such as lists of opinion-bearing words, lexicons and dictionaries, it is also referred to as the lexicon-based approach (Peng and Park, 2004; Ding et al., 2008; Na et al., 2009; Taboada et al., 2011; Molina-González et al., 2015). Thus, in this thesis, the terms ‘semantic orientation approach’ and ‘lexicon-based approach’ are used interchangeably.
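
To make the lexicon-based idea concrete, here is a minimal sketch in Python. It is not the method used in the thesis: the lexicon entries, their scores, the negator list and the one-word negation rule are all assumptions chosen for illustration. The sketch sums the scores of the words in a text and maps the total to a polarity label.

```python
import re

# Hypothetical toy lexicon: word -> semantic orientation score (assumed values).
LEXICON = {
    "good": 1.0, "great": 2.0, "excellent": 3.0,
    "bad": -1.0, "poor": -2.0, "terrible": -3.0,
}
# Assumed negators; a real system would handle negation scope more carefully.
NEGATORS = {"not", "no", "never"}


def semantic_orientation(text):
    """Sum word scores, flipping the sign of a word that directly follows a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    for i, token in enumerate(tokens):
        score = LEXICON.get(token, 0.0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            score = -score
        total += score
    return total


def polarity(text):
    """Map the overall semantic orientation to a polarity label."""
    so = semantic_orientation(text)
    if so > 0:
        return "positive"
    if so < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(polarity("The room was great and the staff were excellent"))
    print(polarity("The breakfast was not good"))
```

The magnitude of the summed score stands in for the strength of the opinion, and the sign for its polarity. A text with no lexicon words is labelled neutral.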


Many sentiment analysis tools and applications have been developed to mine the opinions in user-generated content on the Web. However, their performance remains poor due to the complexity of natural language (Sobkowicz et al., 2012; Mohammad et al., 2013; Maynard, 2016). In essence, sentiment analysis is still a natural language processing (NLP) problem, dealing with natural language documents, which are also called unstructured data (Liu, 2012). Prior research shows that sentiment analysis is more difficult than traditional topic-based text classification (Pang and Lee, 2008). Although various approaches to sentiment analysis have been proposed, it is still difficult to deal with some linguistic phenomena, such as negation and mixed-opinion text. This leads to low accuracy in sentiment classification (Vinodhini and Chandrasekaran, 2012; Park et al., 2015; Khan et al., 2016). Besides, determining only the polarity of an opinion is insufficient, since an opinion without a target is of limited use. The task of extracting opinions and their targets simultaneously is called aspect-level sentiment analysis in the research literature and is more difficult to achieve (Liu, 2012). Current studies show that methods for aspect-level sentiment analysis are limited (see Section 2.3.3).


Given the existing real-world problems in dealing with big data and the current research gaps (see Section 2.4.4 for more details), the research presented in this thesis addresses the following two research questions:

1) How can online product reviews be automatically and accurately classified with respect to their sentiments?

2) How can the aspects of the sentiments expressed in online product reviews be detected effectively?


The first research question concerns the need to manage the large amount of online reviews automatically and to improve the performance of sentiment classification. The second research question underlines the importance of identifying the targets of opinions, with the goal of helping individuals make informed purchasing decisions and giving manufacturers insight into how to improve their products or services.


1.3 Aim and objectives 


The aim of this thesis is to explore an effective way to conduct fine-grained sentiment analysis by improving the performance of sentiment classification and extracting the aspects related to the sentiments. To meet this aim, the research pursues three objectives.


The first objective is to handle text that contains both positively and negatively oriented opinions, because most real-world data shows that positive and negative sentiments co-occur in the same document. In addition, the aspects of the opinions (the attributes of the entity that a review is about) can vary, and it is therefore essential to separate mixed-opinion reviews.


Secondly, following the semantic orientation approach to sentiment analysis (see Section 2.5.1), a domain sentiment lexicon needs to be constructed and used to determine the polarity of a document. The sentiment lexicon contains words together with their sentiment inclinations. Words can be used differently across domains and may even show opposite sentiment orientations from one domain to another. Thus the sentiment lexicon used for sentiment analysis is the key to obtaining more accurate results.
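
The point about domain-dependent orientation can be illustrated with a short Python sketch. It only shows why one word may need different scores in different domains; it is not the lexicon construction procedure of the thesis. The words, scores, domain names and the `build_domain_lexicon` helper are all hypothetical.

```python
# Hypothetical general-purpose scores (assumed values for illustration).
GENERAL_LEXICON = {"unpredictable": -1.0, "long": 0.0, "great": 2.0}

# Hypothetical domain-specific corrections: the same word can flip orientation.
DOMAIN_OVERRIDES = {
    "movie": {"unpredictable": 1.0},   # an unpredictable plot is usually praise
    "battery": {"long": 1.0},          # long battery life is desirable
}


def build_domain_lexicon(domain):
    """Return a copy of the general lexicon with domain-specific scores applied."""
    lexicon = dict(GENERAL_LEXICON)
    lexicon.update(DOMAIN_OVERRIDES.get(domain, {}))
    return lexicon


if __name__ == "__main__":
    print(build_domain_lexicon("movie")["unpredictable"])
    print(build_domain_lexicon("car")["unpredictable"])
```

A domain without overrides simply falls back to the general scores, which is where a general-purpose lexicon would be expected to lose accuracy.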


Furthermore, online product reviews include a variety of aspects (see Section 2.3.3). Therefore, the third objective is to extract the aspects of the products within a review, instead of predefining them, and then identify the sentiments about them.
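
As a rough illustration of extracting aspects from the reviews rather than predefining them, the Python sketch below treats frequent content words as candidate aspects and attaches to each the opinion score of the sentences it appears in. This is a deliberately crude stand-in, not the approach of the thesis: it uses no part-of-speech tagging, and the opinion words, stopword list and `min_count` threshold are assumptions.

```python
import re
from collections import Counter, defaultdict

# Hypothetical opinion words and stopwords (assumed for illustration).
OPINION_WORDS = {"great": 1, "clean": 1, "friendly": 1,
                 "noisy": -1, "slow": -1, "dirty": -1}
STOPWORDS = {"the", "a", "an", "was", "were", "is", "are", "and", "but",
             "very", "it", "this", "of", "to", "in"}


def sentences(text):
    """Split lower-cased text into sentences on ., ! and ?."""
    return [s for s in re.split(r"[.!?]+", text.lower()) if s.strip()]


def tokens(sentence):
    return re.findall(r"[a-z]+", sentence)


def extract_aspects(reviews, min_count=2):
    """Candidate aspects: content words that occur at least min_count times."""
    counts = Counter()
    for review in reviews:
        for sent in sentences(review):
            counts.update(t for t in tokens(sent)
                          if t not in STOPWORDS and t not in OPINION_WORDS)
    return {word for word, count in counts.items() if count >= min_count}


def aspect_sentiment(reviews, aspects):
    """Add each sentence's opinion score to every aspect mentioned in it."""
    scores = defaultdict(int)
    for review in reviews:
        for sent in sentences(review):
            toks = tokens(sent)
            sentence_score = sum(OPINION_WORDS.get(t, 0) for t in toks)
            for aspect in set(toks) & aspects:
                scores[aspect] += sentence_score
    return dict(scores)


if __name__ == "__main__":
    reviews = ["The room was clean. The wifi was slow.",
               "Great room but the wifi was slow!",
               "Friendly staff."]
    aspects = extract_aspects(reviews)
    print(aspect_sentiment(reviews, aspects))
```

In the second review the positive and negative words sit in one sentence and cancel out, so neither aspect receives a score from it. This is the kind of mixed-opinion text that motivates the first objective.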


Achieving these three objectives should lead to the coherent sentiment analysis framework proposed in this research (see Chapter 3), which aims to improve the performance of sentiment classification and provide in-depth aspect-level analysis.


https://www.research.manchester.ac.uk/portal/files/55559300/FULL_TEXT.PDF


Friday, July 8, 2016

Making Sense of Pattern Grading


One pattern, three sizes
A base size 12 pattern (left) can be graded up to a size 16 (center) using the cut-and-spread method, and similarly graded down to a size 6 (right) by cutting and overlapping along specified cut lines.

Methods of grading
There are three basic methods of grading: cut and spread, pattern shifting, and computer grading. No one method is technically superior and all are equally capable of producing a correct grade.
Cut-and-spread method
Cut-and-spread method: The easiest method, and the basis of the other two, is to cut the pattern and spread the pieces by a specific amount to grade up, or overlap them to grade down. No special training or tools are required, just scissors, a pencil, tape, and a ruler that divides 1 in. into 64ths.

Pattern shifting
Pattern shifting: Pattern shifting is the process of increasing the overall dimensions of a pattern by moving it a measured distance up and down and left and right (using a specially designed ruler) and redrawing the outline, to produce the same results as the cut-and-spread method.


The most recent development, computer grading, is the fastest method, but tends to be an investment only larger manufacturers can afford. However, sophisticated home computer software is becoming affordable.

woodward "epidemiology: study design and data analysis"


Searches for woodward "epidemiology: study design and data analysis" with the keywords "badongo or depositfiles or easy-share or filefactory or gogobox or hotfile or linkbucks or mediafire or megaupload or sendspace or uploadbox or zshare" do not work anymore.

Interpreting Epidemiologic Evidence: Strategies for Study Design and Analysis: David A. Savitz: 9780195108408



Wednesday, March 23, 2016

Topic modeling in sentiment analysis: A systematic review

 

Abstract

With the expansion and acceptance of the World Wide Web, sentiment analysis has become an increasingly popular research area in information retrieval and web data analysis.

Due to the huge amount of user-generated content on blogs, forums, social media, etc., sentiment analysis has attracted researchers in both academia and industry, since it deals with the extraction of opinions and sentiments.

In this paper, we present a review of topic modeling, especially LDA-based techniques, in sentiment analysis.

We present a detailed analysis of diverse approaches and techniques, and compare the accuracy of the different systems.

The results of different approaches have been summarized, analyzed and presented in a sophisticated fashion. 

This is an effort to explore different topic modeling techniques for sentiment analysis and to provide a comprehensive comparison among them.

References


Liu, B., Sentiment Analysis and Opinion Mining, Synthesis Lectures on Human Language Technologies, 5(1), pp. 1-167, 2012.


Pang, B. & Lee, L., Opinion Mining and Sentiment Analysis, Foundations and Trends in Information Retrieval, 2(1-2), pp. 1-135, 2008.


Hu, M. & Liu, B., Mining Opinion Features in Customer Reviews, in Proceedings of the Nineteenth National Conference on Artificial Intelligence (AAAI-04), San Jose, USA, vol. 4, pp. 755-760, July 2004.


Hu, M. & Liu, B., Mining and Summarizing Customer Reviews, in Proceedings of the tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-04), Washington, USA, pp. 168-177, ACM, Aug. 2004.


Zhang, L. & Liu, B., Aspect and Entity Extraction for Opinion Mining, Data Mining and Knowledge Discovery for Big Data, pp. 1-40, Springer Berlin Heidelberg, 2014.


Kitchenham, B. A. & Mendes, E., A Comparison of Cross-Company and Within-Company Effort Estimation Models for Web Applications, in Proceedings of the 8th International Conference on Empirical Assessment in Software Engineering (EASE-04), Edinburgh, Scotland, UK, pp. 47-55, May 2004.


Hofmann, T., Unsupervised Learning by Probabilistic Latent Semantic Analysis, Machine Learning, 42(1-2), pp. 177-196, 2001.


Blei, D. M., Ng, A. Y. & Jordan, M. I., Latent Dirichlet Allocation, The Journal of Machine Learning Research, 3, pp. 993-1022, 2003.


Fang, L. & Huang, M., Fine Granular Aspect Analysis Using Latent Structural Models, in Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Jeju, South Korea: Short Papers-Volume 2, pp. 333-337, Association for Computational Linguistics, July 2012.


Lin, Z., Jin, X., Xu, X., Wang, W., Cheng, X. & Wang, Y., A Cross-Lingual Joint Aspect/Sentiment Model for Sentiment Analysis, in Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management (CIKM-14), Shanghai, China, pp. 1089-1098, ACM, Nov. 2014.


Xueke, X., Xueqi, C., Songbo, T., Yue, L. & Huawei, S., Aspect-Level Opinion Mining of Online Customer Reviews, China Communications, 10(3), pp. 25-41, 2013.


Zhai, Z., Liu, B., Xu, H. & Jia, P., Constrained LDA for Grouping Product Features in Opinion Mining, Advances in knowledge discovery and data mining, pp. 448-459, Springer, 2011.


Moghaddam, S. & Ester, M., ILDA: Interdependent LDA Model for Learning Latent Aspects and their Ratings from Online Product Reviews, in Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR-11), Beijing, China, pp. 665-674, ACM, July 2011.


Brody, S. & Elhadad, N., An Unsupervised Aspect-Sentiment Model for Online Reviews, in Human Language Technologies: in Proceedings of the 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT-10), Los Angeles, USA, pp. 804-812, Association for Computational Linguistics, June 2010.


Zhao, W.X., Jiang, J., Yan, H. & Li, X., Jointly Modeling Aspects and Opinions with a Maxent-LDA Hybrid, in Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP-10), Massachusetts, USA, pp. 56-65, Association for Computational Linguistics, Oct. 2010.


Jo, Y. & Oh, A. H., Aspect and Sentiment Unification Model for Online Review Analysis, in Proceedings of the Fourth ACM International Conference on Web Search and Data Mining (WSDM-11), Hong Kong, pp. 815-824, ACM, Feb. 2011.


Xu, X., Tan, S., Liu, Y., Cheng, X. & Lin, Z., Towards Jointly Extracting Aspects and Aspect-Specific Sentiment Knowledge, in Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM-12), Maui, Hawaii, USA, pp. 1895-1899, ACM, Oct. 2012.


Mukherjee, A. & Liu, B., Aspect Extraction through Semi-Supervised Modeling, in Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Jeju, South Korea: Long Papers-Volume 1, pp. 339-348, Association for Computational Linguistics, July 2012.


Kim, S., Zhang, J., Chen, Z., Oh, A. H. & Liu, S., A Hierarchical Aspect-Sentiment Model for Online Reviews, in Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence (AAAI-13), Washington, USA, July 2013.


Bagheri, A., Saraee, M. & De Jong, F., ADM-LDA: An Aspect Detection Model Based on Topic Modelling Using the Structure of Review Sentences, Journal of Information Science, 40(5), pp. 621-636, 2014.


Gruber, A., Weiss, Y. & Rosen-Zvi, M., Hidden Topic Markov Models, in Proceedings of the 11th International Conference on Artificial Intelligence and Statistics (AISTATS-07), San Juan, Puerto Rico, pp. 163-170, Mar. 2007.


Wang, T., Cai, Y., Leung, H.-f., Lau, R. Y., Li, Q. & Min, H., Product Aspect Extraction Supervised with Online Domain Knowledge, Knowledge-Based Systems, 71, pp. 86-100, 2014.


Chen, Z., Mukherjee, A., Liu, B., Hsu, M., Castellanos, M. & Ghosh, R., Leveraging Multi-Domain Prior Knowledge in Topic Models, in Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence (IJCAI-13), Beijing, China, pp. 2071-2077, AAAI Press, Aug. 2013.


Chen, Z., Mukherjee, A., Liu, B., Hsu, M., Castellanos, M. & Ghosh, R., Discovering Coherent Topics Using General Knowledge, in Proceedings of the 22nd ACM international conference on Conference on information & knowledge management (CIKM-13), San Francisco, USA, pp. 209-218, ACM, Oct. 2013.


Chen, Z., Mukherjee, A., Liu, B., Hsu, M., Castellanos, M. & Ghosh, R., Exploiting Domain Knowledge in Aspect Extraction, in Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP-13), Seattle, USA, pp. 1655-1667, Oct. 2013.


Chen, Z., Mukherjee, A. & Liu, B., Aspect Extraction with Automated Prior Knowledge Learning, in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL-14), Baltimore, USA, pp. 347-358, June 2014.


Han, J., Cheng, H., Xin, D. & Yan, X., Frequent Pattern Mining: Current status and Future Directions, Data Mining and Knowledge Discovery, 15(1), pp. 55-86, 2007.


Rosen-Zvi, M., Chemudugunta, C., Griffiths, T., Smyth, P. & Steyvers, M., Learning Author-Topic Models from Text Corpora, ACM Transactions on Information Systems (TOIS), 28(1), pp. 1-38, 2010.


Chen, Z. & Liu, B., Topic Modeling Using Topics from Many Domains, Lifelong Learning and Big Data, in Proceedings of the 31st International Conference on Machine Learning (ICML-14), Beijing, China, pp. 703-711, June 2014.


Chen, Z. & Liu, B., Mining Topics in Documents: Standing on the Shoulders of Big Data, in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-14), New York City, USA, pp. 1116-1125, ACM, Aug. 2014.





DOI: http://dx.doi.org/10.5614%2Fitbj.ict.res.appl.2016.10.1.6

http://journals.itb.ac.id/index.php/jictra/article/view/1442