Alexander Kirillov edited bf_Abstract_The_word2vec_software__.tex  almost 8 years ago

Commit id: 3ff15e100aeb07a3a0240fcca312701c47006d67


{\bf Abstract}  The goal of the formalization proposed in this paper is to bring together, as closely as possible, the linguistic problem of the synonym conception and the computational linguistic methods, which are generally based on empirical, intuitively unjustified factors. Using the word vector representation, we propose a geometric approach to the mathematical modeling of a synset. The word embedding in Euclidean space is obtained by means of neural networks (Skip-gram, CBOW), developed and implemented as the word2vec software by T. Mikolov. The word2vec software of Tomas Mikolov and colleagues has gained a lot of traction lately, and provides state-of-the-art word embeddings. The learning models behind the software are described in two research papers [1, 2].
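
As a brief illustration of the learning models described in [1, 2] (the notation $\vec{v}_w$, $\vec{v}_c$ for the embedding vectors and $C$ for the set of context words is introduced here only for the sake of the sketch), the Skip-gram model scores a context word $c$ for a focus word $w$ with a softmax over inner products of the embedding vectors:
% Skip-gram softmax: probability of observing context word c near focus word w
\begin{equation}
  p(c \mid w) \;=\; \frac{\exp(\vec{v}_c \cdot \vec{v}_w)}{\sum_{c' \in C} \exp(\vec{v}_{c'} \cdot \vec{v}_w)}.
\end{equation}
The CBOW model is analogous, with the focus word predicted from the average of the context word vectors, so that, roughly speaking, semantically related words end up close to each other in the Euclidean space of embeddings.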

We figure out the rationale behind the equations. We also show how to take the similarity between features into account when computing the similarity of objects in the Vector Space Model (VSM), both for machine learning algorithms and for other classes of methods that involve similarity between objects. Unlike LSA, we assume that the similarity between features is known (say, from a synonym dictionary) and does not need to be learned from the data. We call the proposed similarity measure soft similarity. Similarity between features is common, for example, in natural language processing.
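
A minimal sketch of such a soft similarity measure (the precise definition proposed in the paper may differ; the notation $a_i$, $b_j$ for the feature weights of two objects and $s_{ij}$ for the known similarity between features $i$ and $j$ is chosen here only for illustration) is a soft analogue of the cosine measure:
% Soft cosine: cosine similarity weighted by a feature-similarity matrix s_{ij}
\begin{equation}
  \mathrm{soft\_cos}(a, b) \;=\;
  \frac{\sum_{i,j} s_{ij}\, a_i\, b_j}
       {\sqrt{\sum_{i,j} s_{ij}\, a_i\, a_j}\;\sqrt{\sum_{i,j} s_{ij}\, b_i\, b_j}}.
\end{equation}
For $s_{ij} = 1$ when $i = j$ and $s_{ij} = 0$ otherwise, the measure reduces to the ordinary cosine similarity, so the usual ``hard'' VSM similarity is recovered as a special case.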