Posted on 2024-11-07 20:02
I have a list of words in my Python program. I need to iterate through this list, find the semantically similar words, and put them into another list. I have been trying to do this using gensim with word2vec but could not find a proper solution. This is what I have implemented so far. I need help with how to iterate through the list of words in the variable sentences, find the semantically similar words, and save them in another list.
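import logging

import gensim
import numpy as np
from gensim.models import Word2Vec, word2vec
from scipy import spatial

# Project-local modules from the asker's code base
import textPreprocessing, frequentWords, summarizer

# Word2Vec expects an iterable of tokenized sentences (lists of words)
sentences = summarizer.sorteddict
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)
# Parameter names are from gensim 3.x; gensim 4.x renamed iter to epochs and size to vector_size
model = word2vec.Word2Vec(sentences, iter=10, min_count=5, size=300, workers=4)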
If you don't care about proper clusters, you can use this code:
similar = [[item[0] for item in model.wv.most_similar(word, topn=5)] for word in words]
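One caveat with the one-liner above: most_similar raises a KeyError for any word that is not in the model's vocabulary, which includes words dropped by min_count. Below is a minimal sketch of a guarded version that also keeps track of which input word each result belongs to. The function name similar_words and its parameters are hypothetical; wv is assumed to be the trained model's model.wv object, or anything else that supports "word in wv" and most_similar(word, topn=...).

```python
def similar_words(wv, words, topn=5):
    """Map each in-vocabulary word to the list of its topn most similar words.

    wv is assumed to behave like gensim's KeyedVectors (e.g. model.wv).
    Words missing from the vocabulary are skipped instead of raising KeyError.
    """
    similar = {}
    for word in words:
        if word not in wv:
            # Out-of-vocabulary word (e.g. filtered out by min_count): skip it
            continue
        # most_similar yields (word, similarity) pairs; keep only the words
        similar[word] = [other for other, _score in wv.most_similar(word, topn=topn)]
    return similar
```

With a trained model this would be called as similar_words(model.wv, words). Returning a dict rather than a list of lists is a design choice here, so that skipped words do not shift the results out of alignment with the input list.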
If you really want to cluster the words, here are a few notes:
If A is similar to B and B is similar to C, then all three should be in the same cluster. This means you'll have to implement some sort of graph traversal algorithm.
Here's a naive and probably not very efficient algorithm that identifies clusters:
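model = gensim.models.word2vec.Word2Vec(sentences, iter=10, min_count=5, size=300, workers=4)
vocab = model.wv.vocab.keys()  # gensim 3.x; in gensim 4.x use model.wv.key_to_index
threshold = 0.9
clusters = {}  # maps each word to the set of words in its cluster
for word in vocab:
    # most_similar returns (word, cosine similarity) pairs, most similar first
    for similar_word, similarity in model.wv.most_similar(word, topn=5):
        if similarity > threshold:
            cluster1 = clusters.get(word, set())
            cluster2 = clusters.get(similar_word, set())
            joined = set.union(cluster1, cluster2, {word, similar_word})
            # Point every member at the merged set, not just the two words compared
            for member in joined:
                clusters[member] = joined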
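The transitive grouping described above (A similar to B, B similar to C, so all three belong together) is the connected-components problem on a graph whose edges are the word pairs above the threshold. Here is a minimal sketch using a union-find structure instead of merging sets by hand. It is an alternative technique, not the code from the answer, and it assumes the (word, other_word, similarity) triples have already been collected, for example from most_similar. The names cluster_words and edges are hypothetical, and the sample similarities are hand-written for illustration.

```python
from collections import defaultdict


def cluster_words(edges, threshold=0.9):
    """Group words into clusters, linking pairs whose similarity exceeds threshold.

    edges: iterable of (word, other_word, similarity) tuples.
    Returns a list of sets, one per cluster of two or more words.
    """
    parent = {}

    def find(word):
        # Walk up to the root representative, shortening the path as we go
        parent.setdefault(word, word)
        while parent[word] != word:
            parent[word] = parent[parent[word]]
            word = parent[word]
        return word

    for word, other, similarity in edges:
        if similarity > threshold:
            # Merge the two clusters by pointing one root at the other
            parent[find(word)] = find(other)

    # Collect the members of each cluster under its root
    groups = defaultdict(set)
    for word in parent:
        groups[find(word)].add(word)
    return list(groups.values())


# Hand-written similarity triples, for illustration only
edges = [
    ("car", "auto", 0.95),
    ("auto", "vehicle", 0.93),
    ("apple", "pear", 0.91),
    ("car", "apple", 0.20),
]
clusters = cluster_words(edges)
```

In this sample, "car" and "vehicle" end up in the same cluster even though they are never compared directly, because both are linked to "auto". Words that never pass the threshold with any neighbour are left out of the result.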
Author:qs
link:http://www.pythonblackhole.com/blog/article/246855/ef7e2588421937579fb6/
source:python black hole net