In this paper we address the problem of automatically learning
to classify the sentiment of short messages/reviews by
exploiting information derived from meta-level features, i.e.,
features derived primarily from the original bag-of-words
representation. We propose new meta-level features especially
designed for the sentiment analysis of short messages
such as: (i)
information derived from the sentiment distribution
among the k nearest neighbors of a given short test
document x, (ii) the distribution of distances of x to its
neighbors and (iii) the document polarity of these neighbors
given by unsupervised lexicon-based methods. Our approach
is also capable of exploiting information from the neighborhood
of document x regarding (highly noisy) data obtained
from 1.6 million Twitter messages with emoticons. The set
of proposed features is capable of transforming the original
feature space into a new one, potentially smaller and more
informed. Experiments performed with a substantial number
of datasets (nineteen) demonstrate that the effectiveness
of the proposed sentiment-based meta-level features is not
only superior to the traditional bag-of-words representation
(by up to 16%) but is also superior in most cases to state-of-the-art
meta-level features previously proposed in the literature
for text classification tasks, which do not take into account
some idiosyncrasies of sentiment analysis. Our proposal is
also largely superior to the best lexicon-based methods as
well as to supervised combinations of them. In fact, the
proposed approach is the only one to produce the best results
in all tested datasets in all scenarios.
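
To make the three feature groups (i) to (iii) concrete, the following is a minimal sketch of how such meta-level features could be computed for one test message. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the cosine distance over raw term counts, the whitespace tokenizer, the toy lexicon, and the names bow, cosine_distance, lexicon_polarity and meta_features are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words term counts for a whitespace-tokenized message."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """1 - cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 - dot / norm if norm else 1.0


def lexicon_polarity(text, lexicon):
    """Unsupervised polarity: sum of word scores from a (toy) lexicon."""
    return sum(lexicon.get(t, 0.0) for t in text.lower().split())


def meta_features(x, train_texts, train_labels, classes, lexicon, k=3):
    """Build a meta-level feature vector for test message x."""
    x_vec = bow(x)
    # The k training messages closest to x (assumed distance: cosine).
    neighbors = sorted(
        (cosine_distance(x_vec, bow(t)), y, t)
        for t, y in zip(train_texts, train_labels)
    )[:k]
    counts = Counter(y for _, y, _ in neighbors)
    # (i) sentiment distribution among the k nearest neighbors
    sentiment_dist = [counts[c] / len(neighbors) for c in classes]
    # (ii) distances from x to its neighbors, nearest first
    distances = [d for d, _, _ in neighbors]
    # (iii) lexicon-based polarity of each neighbor
    polarities = [lexicon_polarity(t, lexicon) for _, _, t in neighbors]
    return sentiment_dist + distances + polarities


# Toy data, invented purely for illustration.
train_texts = ["good movie", "bad movie", "great good fun", "awful bad"]
train_labels = ["pos", "neg", "pos", "neg"]
lexicon = {"good": 1.0, "great": 1.0, "bad": -1.0, "awful": -1.0}

feats = meta_features(
    "good fun movie", train_texts, train_labels, ["neg", "pos"], lexicon, k=2
)
```

In this sketch the resulting vector has 2k plus one entry per class, regardless of vocabulary size, which illustrates why the transformed feature space can be much smaller than the original bag-of-words space.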