Tuesday, June 16, 2020

Correlation feature selection

.
The correlation feature selection (CFS) measure evaluates subsets of features on the basis of the following hypothesis: "Good feature subsets contain features highly correlated with the classification, yet uncorrelated to each other".[33][34]
.
https://en.wikipedia.org/wiki/Feature_selection#Correlation_feature_selection
.
.
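As a rough illustration of that hypothesis, the sketch below scores a candidate feature subset with a CFS-style merit: average feature-class correlation in the numerator, average feature-feature correlation in the denominator. It uses Pearson correlation via NumPy; the toy data and function name are my own choices, not part of the cited definition.

import numpy as np

def cfs_merit(X, y):
    """CFS-style merit for a candidate subset X (n_samples x k) and labels y:
    rewards high feature-class correlation, penalizes feature-feature correlation."""
    k = X.shape[1]
    # mean absolute correlation between each feature and the class
    r_cf = np.mean([abs(np.corrcoef(X[:, i], y)[0, 1]) for i in range(k)])
    # mean absolute pairwise correlation between the features themselves
    if k > 1:
        pairs = [abs(np.corrcoef(X[:, i], X[:, j])[0, 1])
                 for i in range(k) for j in range(i + 1, k)]
        r_ff = np.mean(pairs)
    else:
        r_ff = 0.0
    return (k * r_cf) / np.sqrt(k + k * (k - 1) * r_ff)

# Example: compare two candidate subsets by merit (toy data)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
print(cfs_merit(X[:, :2], y))   # informative, weakly inter-correlated features
print(cfs_merit(X[:, 2:], y))   # uninformative features, lower merit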

Semantic analysis

.
Semantic analysis describes the process of understanding natural language (the way that humans communicate) based on meaning and context. ... It analyzes context in the surrounding text and it analyzes the text structure to accurately disambiguate the proper meaning of words that have more than one definition.
.
https://expertsystem.com/natural-language-process-semantic-analysis-definition/
.

.
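To make the disambiguation part concrete, here is a small sketch using NLTK's classical Lesk algorithm, which picks a WordNet sense for an ambiguous word by overlapping dictionary glosses with the surrounding context. This is only one simple approach to the analysis described above, and it assumes the NLTK wordnet and punkt data packages have been downloaded.

# Word-sense disambiguation sketch with NLTK's Lesk algorithm
from nltk.tokenize import word_tokenize
from nltk.wsd import lesk

sentence = "I went to the bank to deposit my money"
sense = lesk(word_tokenize(sentence), "bank")
print(sense, "-", sense.definition())   # whichever WordNet sense Lesk selects here

sentence2 = "We sat on the bank of the river"
sense2 = lesk(word_tokenize(sentence2), "bank")
print(sense2, "-", sense2.definition())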

Word embedding

.
Word embedding is the collective name for a set of language modeling and feature learning techniques in natural language processing (NLP) where words or phrases from the vocabulary are mapped to vectors of real numbers. Conceptually it involves a mathematical embedding from a space with many dimensions per word to a continuous vector space with a much lower dimension.

Methods to generate this mapping include neural networks,[1] dimensionality reduction on the word co-occurrence matrix,[2][3][4] probabilistic models,[5] explainable knowledge base method,[6] and explicit representation in terms of the context in which words appear.[7]

Word and phrase embeddings, when used as the underlying input representation, have been shown to boost the performance in NLP tasks such as syntactic parsing[8] and sentiment analysis.[9]
.
https://en.wikipedia.org/wiki/Word_embedding
.
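A minimal sketch of the co-occurrence-matrix route mentioned above: count how often words appear near each other, then apply a truncated SVD to obtain dense, low-dimensional word vectors. The corpus, window size, and dimensionality are toy assumptions.

import numpy as np

# Toy corpus; in practice this would be a large text collection.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]
tokens = [doc.split() for doc in corpus]
vocab = sorted({w for doc in tokens for w in doc})
index = {w: i for i, w in enumerate(vocab)}

# Symmetric word-word co-occurrence counts within a +/-2-token window.
window = 2
C = np.zeros((len(vocab), len(vocab)))
for doc in tokens:
    for i, w in enumerate(doc):
        for j in range(max(0, i - window), min(len(doc), i + window + 1)):
            if i != j:
                C[index[w], index[doc[j]]] += 1

# Truncated SVD of the co-occurrence matrix gives dense word vectors.
U, s, Vt = np.linalg.svd(C, full_matrices=False)
dim = 2
embeddings = U[:, :dim] * s[:dim]        # one low-dimensional vector per word

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(embeddings[index["cat"]], embeddings[index["dog"]]))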

Support vector machine

.
In machine learning, support-vector machines (SVMs, also support-vector networks[1]) are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis.

Given a set of training examples, each marked as belonging to one or the other of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier (although methods such as Platt scaling exist to use SVM in a probabilistic classification setting).

An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible.

New examples are then mapped into that same space and predicted to belong to a category based on the side of the gap on which they fall.
In addition to performing linear classification, SVMs can efficiently perform a non-linear classification using what is called the kernel trick, implicitly mapping their inputs into high-dimensional feature spaces.
When data are unlabelled, supervised learning is not possible, and an unsupervised learning approach is required, which attempts to find natural clustering of the data into groups, and then maps new data to these formed groups.

The support-vector clustering[2] algorithm, created by Hava Siegelmann and Vladimir Vapnik, applies the statistics of support vectors, developed in the support vector machines algorithm, to categorize unlabeled data, and is one of the most widely used clustering algorithms in industrial applications.

.
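A brief scikit-learn sketch of these ideas, assuming scikit-learn is available: a binary SVM with a non-linear (RBF) kernel, with probability=True enabling Platt scaling as mentioned above. The dataset and hyperparameters are illustrative, not prescriptive.

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two-class, non-linearly-separable toy data
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# kernel="rbf" applies the kernel trick; probability=True fits Platt scaling
clf = SVC(kernel="rbf", C=1.0, gamma="scale", probability=True)
clf.fit(X_train, y_train)

print("accuracy:", clf.score(X_test, y_test))
print("class probabilities for one point:", clf.predict_proba(X_test[:1]))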

Sentiment Classification

.
1.
Given a set of evaluative documents D, it determines whether each document d ∈ D expresses a positive or negative opinion (or sentiment) on an object. Learn more in: Opinion Mining and Information Retrieval: Techniques for E-Commerce
2.
Classifying the polarity of a given text in the document, sentence, or feature/aspect level; e.g., emotional states. Learn more in: Sentiment Analysis in Supply Chain Management
.
https://www.igi-global.com/dictionary/sentiment-classification/26512
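A hedged sketch of document-level polarity classification in the sense of definition 1: TF-IDF features feeding a linear SVM. The handful of training reviews is purely illustrative; real systems train on large labelled corpora.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

docs = [
    "great product, works perfectly",
    "absolutely loved it",
    "terrible quality, very disappointed",
    "waste of money, do not buy",
]
labels = ["positive", "positive", "negative", "negative"]

# Turn raw text into TF-IDF vectors, then fit a linear SVM classifier
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(docs, labels)

print(model.predict(["disappointed with the quality"]))  # expected: negative
print(model.predict(["works great, love it"]))            # expected: positive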

Text mining

.
According to Hotho et al. (2005), we can distinguish three different perspectives on text mining, namely text mining as information extraction, text mining as text data mining, and text mining as a KDD (Knowledge Discovery in Databases) process.[1]

Text mining is "the discovery by computer of new, previously unknown information, by automatically extracting information from different written resources."[2] 

Written resources can be websites, books, emails, reviews, and articles.

Text mining, also referred to as text data mining, roughly equivalent to text analytics, is the process of deriving high-quality information from text.

High-quality information is typically derived through the devising of patterns and trends through means such as statistical pattern learning.

Text mining usually involves the process of structuring the input text (usually parsing, along with the addition of some derived linguistic features and the removal of others, and subsequent insertion into a database), deriving patterns within the structured data, and finally evaluation and interpretation of the output. 

'High quality' in text mining usually refers to some combination of relevance, novelty, and interest.

Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling (i.e., learning relations between named entities).
Text analysis involves information retrieval, lexical analysis to study word frequency distributions, pattern recognition, tagging/annotation, information extraction, and data mining techniques including link and association analysis, visualization, and predictive analytics.

The overarching goal is, essentially, to turn text into data for analysis, via application of natural language processing (NLP), different types of algorithms and analytical methods. 

An important phase of this process is the interpretation of the gathered information.
A typical application is to scan a set of documents written in a natural language and either model the document set for predictive classification purposes or populate a database or search index with the information extracted. 

The document is the basic element when starting with text mining. Here, we define a document as a unit of textual data, which normally exists in many types of collections.[3]
.
https://en.wikipedia.org/wiki/Text_mining
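As a small illustration of the structuring step described above (assuming scikit-learn is available), the sketch below parses raw documents into a term-document matrix and derives a very simple pattern, the most frequent terms across the collection. The documents are toy examples.

from sklearn.feature_extraction.text import CountVectorizer

documents = [
    "Customers complained about late delivery and poor packaging.",
    "Delivery was fast and the packaging was excellent.",
    "Poor customer service and late responses to emails.",
]

# Structure the input text: each document becomes a row of term counts
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(documents)

# Derive a simple pattern: overall term frequencies across the collection
counts = X.sum(axis=0).A1
terms = vectorizer.get_feature_names_out()
for term, count in sorted(zip(terms, counts), key=lambda t: -t[1])[:5]:
    print(term, count)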

singular value decomposition

.
In linear algebra, the singular value decomposition (SVD) is a factorization of a real or complex matrix that generalizes the eigendecomposition of a square normal matrix to any m × n matrix via an extension of the polar decomposition.


Mathematical applications of the SVD include computing the pseudoinverse, matrix approximation, and determining the rank, range, and null space of a matrix. The SVD is also extremely useful in all areas of science, engineering, and statistics, such as signal processing, least squares fitting of data, and process control.


Animated illustration of the SVD of a 2D, real shearing matrix M: first the unit disc in blue together with the two canonical unit vectors, then the action of M, which distorts the disc to an ellipse. The SVD decomposes M into three simple transformations: an initial rotation V*, a scaling Σ along the coordinate axes, and a final rotation U. The lengths σ1 and σ2 of the semi-axes of the ellipse are the singular values of M, namely Σ1,1 and Σ2,2.

.
https://en.wikipedia.org/wiki/Singular_value_decomposition
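A short numerical sketch with NumPy: compute the SVD of a small matrix, check the reconstruction, and use the factors for the rank and pseudoinverse applications mentioned above. The example matrix is arbitrary.

import numpy as np

A = np.array([[3.0, 2.0, 2.0],
              [2.0, 3.0, -2.0]])

# SVD: A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print("singular values:", s)
print("reconstruction ok:", np.allclose(A, U @ np.diag(s) @ Vt))

# Rank = number of nonzero singular values
rank = np.sum(s > 1e-10)
print("rank:", rank)

# Pseudoinverse via SVD: A+ = V @ diag(1/s) @ U^T (for nonzero singular values)
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T
print("matches np.linalg.pinv:", np.allclose(A_pinv, np.linalg.pinv(A)))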

Sentiment analysis

.
Sentiment analysis (also known as opinion mining or emotion AI) refers to the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information.

Sentiment analysis is widely applied to voice of the customer materials such as reviews and survey responses, online and social media, and healthcare materials for applications that range from marketing to customer service to clinical medicine.
.
https://en.wikipedia.org/wiki/Sentiment_analysis

Feature selection

.
In machine learning and statistics, feature selection, also known as variable selection, attribute selection or variable subset selection, is the process of selecting a subset of relevant features (variables, predictors) for use in model construction.

Feature selection techniques are used for several reasons:

  • simplification of models to make them easier to interpret by researchers/users,[1]
  • shorter training times,
  • to avoid the curse of dimensionality,
  • enhanced generalization by reducing overfitting[2] (formally, reduction of variance[1])


The central premise when using a feature selection technique is that the data contains some features that are either redundant or irrelevant, and can thus be removed without incurring much loss of information.[2] Redundant and irrelevant are two distinct notions, since one relevant feature may be redundant in the presence of another relevant feature with which it is strongly correlated.[3]

Feature selection techniques should be distinguished from feature extraction. Feature extraction creates new features from functions of the original features, whereas feature selection returns a subset of the features. Feature selection techniques are often used in domains where there are many features and comparatively few samples (or data points). Archetypal cases for the application of feature selection include the analysis of written texts and DNA microarray data, where there are many thousands of features, and a few tens to hundreds of samples.
.
https://en.wikipedia.org/wiki/Feature_selection
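One concrete filter-style example, assuming scikit-learn: keep the k features with the highest ANOVA F-score against the class. This is just one of many selection strategies, and the dataset is illustrative.

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Score each feature against the class and keep the two best
selector = SelectKBest(score_func=f_classif, k=2)
X_reduced = selector.fit_transform(X, y)

print("original shape:", X.shape)          # (150, 4)
print("reduced shape:", X_reduced.shape)   # (150, 2)
print("selected feature indices:", selector.get_support(indices=True))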

Latent semantic analysis

.
Latent semantic analysis (LSA) is a technique in natural language processing, in particular distributional semantics, of analyzing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms. 

LSA assumes that words that are close in meaning will occur in similar pieces of text (the distributional hypothesis). 

A matrix containing word counts per document (rows represent unique words and columns represent each document) is constructed from a large piece of text and a mathematical technique called singular value decomposition (SVD) is used to reduce the number of rows while preserving the similarity structure among columns. 

Documents are then compared by taking the cosine of the angle between the two vectors (or the dot product between the normalizations of the two vectors) formed by any two columns. 

Values close to 1 represent very similar documents while values close to 0 represent very dissimilar documents.[1]

An information retrieval technique using latent semantic structure was patented in 1988 (US Patent 4,839,853, now expired) by Scott Deerwester, Susan Dumais, George Furnas, Richard Harshman, Thomas Landauer, Karen Lochbaum and Lynn Streeter. In the context of its application to information retrieval, it is sometimes called latent semantic indexing (LSI).[2]
.
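A compact sketch of the pipeline described above: a word-by-document count matrix, a truncated SVD, and cosine comparison of the reduced document vectors. The documents and the number of retained dimensions are toy choices, and scikit-learn is used only to build the count matrix.

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the patient was given medicine for the infection",
    "doctors prescribed medicine to treat the infection",
    "the stock market fell sharply after the announcement",
]

# Word-by-document count matrix: rows = words, columns = documents
counts = CountVectorizer().fit_transform(docs)
A = counts.T.toarray().astype(float)

# Truncated SVD keeps only the k largest singular values
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
doc_vectors = (np.diag(s[:k]) @ Vt[:k]).T   # one k-dimensional vector per document

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print("doc0 vs doc1:", cosine(doc_vectors[0], doc_vectors[1]))  # related topics
print("doc0 vs doc2:", cosine(doc_vectors[0], doc_vectors[2]))  # unrelated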

textual features/textual conventions


Structural components and elements that combine to construct meaning and achieve purpose, and are recognisable as characterising particular text types (see language features).

Sequential minimal optimization

.
Sequential minimal optimization (SMO) is an algorithm for solving the quadratic programming (QP) problem that arises during the training of support-vector machines (SVM).

It was invented by John Platt in 1998 at Microsoft Research.[1]

SMO is widely used for training support vector machines and is implemented by the popular LIBSVM tool.[2][3]

The publication of the SMO algorithm in 1998 has generated a lot of excitement in the SVM community, as previously available methods for SVM training were much more complex and required expensive third-party QP solvers.[4]
.
https://en.wikipedia.org/wiki/Sequential_minimal_optimization
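For intuition, below is a condensed version of the simplified SMO loop often used for teaching, not Platt's full working-set heuristics or the LIBSVM implementation: pick a pair of multipliers violating the KKT conditions and solve the two-variable subproblem analytically. A linear kernel is assumed and labels must be +/-1.

import numpy as np

def smo_train(X, y, C=1.0, tol=1e-3, max_passes=5, seed=0):
    """Simplified SMO for a linear-kernel SVM. Labels y must be +/-1."""
    rng = np.random.default_rng(seed)
    n = len(y)
    K = X @ X.T                      # linear kernel matrix
    alpha, b, passes = np.zeros(n), 0.0, 0
    while passes < max_passes:
        changed = 0
        for i in range(n):
            Ei = (alpha * y) @ K[:, i] + b - y[i]
            if (y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0):
                j = (i + 1 + rng.integers(n - 1)) % n      # any index j != i
                Ej = (alpha * y) @ K[:, j] + b - y[j]
                ai_old, aj_old = alpha[i], alpha[j]
                # Box constraints on alpha_j from 0 <= alpha <= C and sum(alpha*y) fixed
                if y[i] != y[j]:
                    L, H = max(0, aj_old - ai_old), min(C, C + aj_old - ai_old)
                else:
                    L, H = max(0, ai_old + aj_old - C), min(C, ai_old + aj_old)
                eta = 2 * K[i, j] - K[i, i] - K[j, j]      # second derivative along the constraint
                if L == H or eta >= 0:
                    continue
                # Analytic solution of the two-variable QP, clipped to the box
                alpha[j] = np.clip(aj_old - y[j] * (Ei - Ej) / eta, L, H)
                if abs(alpha[j] - aj_old) < 1e-5:
                    continue
                alpha[i] = ai_old + y[i] * y[j] * (aj_old - alpha[j])
                # Update the threshold b from the two candidate values
                b1 = b - Ei - y[i] * (alpha[i] - ai_old) * K[i, i] - y[j] * (alpha[j] - aj_old) * K[i, j]
                b2 = b - Ej - y[i] * (alpha[i] - ai_old) * K[i, j] - y[j] * (alpha[j] - aj_old) * K[j, j]
                b = b1 if 0 < alpha[i] < C else b2 if 0 < alpha[j] < C else (b1 + b2) / 2
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    w = (alpha * y) @ X              # recover the primal weight vector
    return w, b

# Tiny linearly separable example with labels in {-1, +1}
X = np.array([[2.0, 2.0], [2.5, 3.0], [-2.0, -1.5], [-3.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = smo_train(X, y)
print(np.sign(X @ w + b))            # should match y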

Matthews correlation coefficient

.
The Matthews correlation coefficient (MCC) is used in machine learning as a measure of the quality of binary (two-class) classifications, introduced by biochemist Brian W. Matthews in 1975.[1] Although the MCC is equivalent to Karl Pearson's phi coefficient,[2] which was developed decades earlier, the term MCC is widely used in the field of bioinformatics.
.
https://en.wikipedia.org/wiki/Matthews_correlation_coefficient
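For reference, the MCC can be computed directly from the confusion-matrix counts of a binary classifier; the small sketch below does so, and the same value is available from sklearn.metrics.matthews_corrcoef. The example counts are made up.

import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Example: 90 true positives, 80 true negatives, 10 false positives, 20 false negatives
print(round(mcc(90, 80, 10, 20), 3))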