How to find semantic similarity using gensim and word2vec in python

posted on 2024-11-07 20:02

I have a list of words in my Python program. I need to iterate through this list, find the semantically similar words, and put them into another list. I have been trying to do this using gensim with word2vec but could not find a proper solution. This is what I have implemented so far. I need help with how to iterate through the list of words in the variable sentences, find the semantically similar words, and save them in another list.

import logging

import numpy as np
from scipy import spatial
from gensim.models import Word2Vec

# the asker's own modules (not shown in the question)
import textPreprocessing, frequentWords, summarizer

sentences = summarizer.sorteddict

logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)
# gensim 4.x parameter names (in gensim < 4.0 these were iter= and size=)
model = Word2Vec(sentences, epochs=10, min_count=5, vector_size=300, workers=4)

solution

If you don't care about proper clusters, you can use this code:

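# words is your list of query words; model.most_similar was removed in gensim 4.0, so go through model.wv
similar = [[item[0] for item in model.wv.most_similar(word, topn=5)] for word in words]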
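Words that are missing from the model's vocabulary make most_similar raise a KeyError, so a slightly more defensive variant may be useful. This is a minimal sketch, not part of the original answer: similar_words is a hypothetical helper name, and wv is assumed to behave like gensim's KeyedVectors (model.wv), i.e. it supports `word in wv` and `wv.most_similar(word, topn=...)`.

```python
def similar_words(wv, words, topn=5):
    """Map each in-vocabulary word to the names of its topn nearest neighbours.

    wv is assumed to behave like gensim's KeyedVectors (model.wv).
    """
    result = {}
    for word in words:
        if word not in wv:  # skip unknown words instead of raising KeyError
            continue
        # most_similar yields (neighbour, similarity) pairs; keep the names only
        result[word] = [neighbour for neighbour, _ in wv.most_similar(word, topn=topn)]
    return result
```

With a trained model this would be called as similar_words(model.wv, words). Returning a dict rather than a list of lists keeps each result tied to its query word even when some words are skipped.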

If you really want to cluster the words, here are a few notes:

There can be several such clusters.
The number of clusters depends on a hyperparameter, a similarity threshold. When the threshold is too low, all of the words count as similar and end up in the same cluster; when it's too high, none of them do.
Words can be naturally included transitively into a cluster, i.e. A is similar to B and B is similar to C, so all three should be in the same cluster. This means you'll have to implement some sort of graph traversal algorithm.
The performance greatly depends on the training corpus: only if it's large enough will gensim's word2vec be able to capture proper similarity. Gensim hyperparameters and text pre-processing thus also matter.
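Treating the threshold as a minimum cosine similarity, a minimal sketch on made-up numbers shows how it controls which pairs get linked. The scores below are invented for illustration and are not model output; linked_pairs is a hypothetical helper name.

```python
# Hypothetical cosine similarities between word pairs (invented numbers).
pair_similarity = {
    ("car", "automobile"): 0.93,
    ("car", "truck"): 0.81,
    ("truck", "lorry"): 0.88,
    ("car", "banana"): 0.12,
}

def linked_pairs(pair_similarity, threshold):
    """Return the pairs whose similarity exceeds the threshold, sorted."""
    return sorted(pair for pair, sim in pair_similarity.items() if sim > threshold)

# Print which pairs survive at a few different thresholds.
for threshold in (0.95, 0.85, 0.5, 0.1):
    print(threshold, linked_pairs(pair_similarity, threshold))
```

At 0.95 no pair is linked, while at 0.1 every pair is, including car and banana, so a usable value has to be found by trial on your own corpus.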

Here's a naive and probably not very efficient algorithm that identifies clusters:

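from gensim.models import Word2Vec

# gensim 4.x parameter names (in gensim < 4.0 these were iter= and size=)
model = Word2Vec(sentences, epochs=10, min_count=5, vector_size=300, workers=4)
vocab = list(model.wv.key_to_index)  # was model.wv.vocab in gensim < 4.0

threshold = 0.9  # minimum cosine similarity for two words to share a cluster
clusters = {}    # word -> set of all words in its cluster
for word in vocab:
  # most_similar returns (word, cosine similarity) pairs, most similar first
  for similar_word, similarity in model.wv.most_similar(word, topn=5):
    if similarity > threshold:
      cluster1 = clusters.get(word, set())
      cluster2 = clusters.get(similar_word, set())
      joined = set.union(cluster1, cluster2, {word, similar_word})
      # repoint every member, so words merged earlier also see the joined cluster
      for member in joined:
        clusters[member] = joined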
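The transitive grouping can also be written as an explicit graph traversal: treat every sufficiently similar pair as an edge and collect the connected components. The following is a minimal sketch using breadth-first search, not the original answer's method; connected_clusters is a hypothetical helper name and the example links are invented.

```python
from collections import defaultdict, deque

def connected_clusters(pairs):
    """Group words into clusters given an iterable of (word, word) links."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)

    seen = set()
    clusters = []
    for start in graph:
        if start in seen:
            continue
        # breadth-first search collects everything reachable from start
        component = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            word = queue.popleft()
            component.add(word)
            for neighbour in graph[word]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        clusters.append(component)
    return clusters

# Invented links: A~B and B~C put all three in one cluster; D~E form another.
print(connected_clusters([("A", "B"), ("B", "C"), ("D", "E")]))
```

With a real model, the pairs could be built from each word and those of its most_similar neighbours whose similarity exceeds the threshold. Unlike a word-to-set dictionary, this returns each cluster exactly once.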

Author：qs

link：http://www.pythonblackhole.com/blog/article/246855/ef7e2588421937579fb6/

source：python black hole net
