Embeddings Transformations for Sentiment Lexicon Enrichment

Martin Boyanov
4 min read · Sep 9, 2018

There exists a vector v, such that translating a negative word n by v leads to the vicinity of the antonym of n.

Examples:

  • bad + v ≈ good
  • boring + v ≈ interesting
  • stupid + v ≈ smart

Introduction

Sentiment Analysis (aka Polarity Detection) is the task of deciding whether a given text is positive, negative or neutral. I was recently tasked with building a system to perform this analysis for a specific domain.

We set ourselves the goal of doing this without annotated examples, i.e. without relying on supervised learning at first. I looked into classical approaches that focus on domain knowledge, and into how unsupervised learning (word vectors) could be used to automatically grow our sentiment lexicon.

Sentiment Lexicon

A sentiment lexicon is a simple beast: a mapping from each word to its polarity. The polarity score can be categorical (positive, neutral, negative) or numerical (e.g. on a scale from -5 to 5).

[Figure: Categorical vs Numerical Sentiment Lexicons]

Sentiment analysis then reduces to some aggregation over the scores of the words in the text.
Lexicons can be created either manually by domain experts or algorithmically via various statistical measures. In the rest of the article we will focus on enriching an existing sentiment lexicon via transformations in word embedding space.
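To make this concrete, here is a minimal sketch of a numerical lexicon and a naive aggregation over it. The words and scores are made up for illustration; a real lexicon is of course much larger.

```python
# Minimal sketch: a tiny numerical sentiment lexicon and a naive
# aggregation step. The entries are made up for illustration only.
LEXICON = {
    "great": 3, "good": 2, "interesting": 2,
    "boring": -2, "bad": -2, "terrible": -3,
}

def score_text(text: str) -> float:
    """Average the polarity scores of the known words in the text."""
    tokens = text.lower().split()
    scores = [LEXICON[t] for t in tokens if t in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

print(score_text("the plot was boring but the acting was great"))  # 0.5 -> positive
```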

Word Embeddings

Word embeddings are representations of words in a high-dimensional vector space. Each word is associated with a vector, and semantically related words end up close to each other in embedding space.

Word embeddings have been around for a while, but it was the 2013 paper “Efficient Estimation of Word Representations in Vector Space” that brought them into the spotlight. Embeddings are now a standard component of most deep learning models for NLP.

Word vectors can be derived via various algorithms. Most of them rely on the distributional hypothesis, which states that words that are used and occur in the same contexts tend to purport similar meanings. The most popular embedding algorithms are:

  • Continuous Bag of Words
  • Skipgram Model
  • GloVe
  • FastText

The cool thing about word embeddings is that they encode semantics and it is even possible to carry out arithmetical operations which preserve the semantic structure. The most famous example is that “king is to queen as man is to woman”:
king - man + woman ≈ queen
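With gensim, this analogy can be reproduced by adding and subtracting the vectors and querying for the nearest neighbours of the result (again assuming the glove-wiki-gigaword-100 vectors from the sketch above):

```python
# Sketch: king - man + woman should land near "queen".
import gensim.downloader as api

kv = api.load("glove-wiki-gigaword-100")

# most_similar adds the "positive" vectors, subtracts the "negative" ones,
# and returns the nearest neighbours of the resulting point.
print(kv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```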

We shall leverage this property to enrich our sentiment lexicon.

Lexicon Enrichment

Lexicon enrichment will be achieved via two operations:

  • Search for synonyms by looking at the most similar vectors to known positive or negative words.
    [Figure: Nearest neighbours of the word “great” in embeddings space]
  • Search for antonyms by looking at the most similar vectors after translating by the neg2pos vector v or the pos2neg vector -v. Both lookups are sketched in code after this list.
    [Figure: Nearest neighbours of the word “great” after translating by the pos2neg vector -v]
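A minimal sketch of the two lookups, assuming kv is a loaded KeyedVectors object and v is the neg2pos translation vector (its estimation is sketched after the step list in the next section):

```python
def synonym_candidates(kv, word, topn=10):
    """Operation 1: nearest neighbours of the word itself."""
    return kv.most_similar(word, topn=topn)

def antonym_candidates(kv, v, word, topn=10, direction="neg2pos"):
    """Operation 2: nearest neighbours after translating by v (or -v)."""
    shift = v if direction == "neg2pos" else -v
    return kv.similar_by_vector(kv[word] + shift, topn=topn)
```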

The steps needed to achieve the lexicon enrichment are listed below; a condensed code sketch of them follows the list.

  • Load the GloVe embeddings.
  • Find the vector v by taking the mean offset over a small set of predefined antonym pairs.
  • Define the neg2pos and pos2neg functions as simple translations by the vectors v and -v.
  • Filter out synonyms. As it turns out, translating by the neg2pos vector leads to a more positive context, but the result is still in the vicinity of the original word, and thus of its closest words/synonyms. I’ve proposed a simple way to filter them out: if a word in the new positive context is also present in the original negative context, but its similarity score has decreased, then presumably it is a synonym of the negative word and should be ignored when searching for antonyms.
  • Examine the results.
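Putting the steps together, here is a condensed sketch. The seed antonym pairs are illustrative rather than the exact list I used, and the filtering follows the heuristic described above.

```python
import numpy as np
import gensim.downloader as api

kv = api.load("glove-wiki-gigaword-100")

# 1. Estimate the neg2pos vector v as the mean offset over seed antonym pairs.
SEED_PAIRS = [("bad", "good"), ("boring", "interesting"), ("stupid", "smart"),
              ("ugly", "beautiful"), ("weak", "strong")]
v = np.mean([kv[pos] - kv[neg] for neg, pos in SEED_PAIRS], axis=0)

# 2. neg2pos / pos2neg: translate by v or -v and inspect the neighbourhood.
def neg2pos(word, topn=10):
    return kv.similar_by_vector(kv[word] + v, topn=topn)

def pos2neg(word, topn=10):
    return kv.similar_by_vector(kv[word] - v, topn=topn)

# 3. Filter out synonyms of the start word: a candidate that also appears among
#    the start word's own neighbours, but with a lower score after translation,
#    is treated as a synonym and dropped.
def antonym_candidates(word, translate=neg2pos, topn=10):
    original = dict(kv.most_similar(word, topn=50))
    return [(w, score) for w, score in translate(word, topn=topn)
            if w != word and not (w in original and score < original[w])]

print(antonym_candidates("bad"))   # expected to surface words like "good"
```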

The tables below show how the transformations work for some examples, in both the positive and negative directions. The start_word column lists the word from which the transformation was initiated. The closest column shows the words closest to the start_word in embedding space; in the common case these are its synonyms. The translated column shows the words closest to the (start_word + v) point in embedding space; the theory is that these should be the antonyms of the start_word. Unfortunately, they still contain some of its synonyms. The last column, filtered, shows the results with the synonyms filtered out via the technique proposed above.

[Table: neg2pos examples]
[Table: pos2neg examples]

Caveats

My initial experiments seem to work best with adjectives. It’s possible that the antonym vector for nouns is different.

Future Work

In the near future I plan to examine some nice decompositions of the polarity bearing word vectors and to introduce a density based method to discover and score polarity for words in the embeddings vocabulary.

Martin Boyanov

Data Scientist passionate about NLP and Graph Modeling