I'm trying to fit the model on a corpus of 5352 documents, following the tutorial notebook. After running model.fit() I can plot my results as a graph, see the topic distributions per document, and get valid output from model.clustering_query. However, when running model.clusters I get a dict with N topics as keys, but every value is an empty list:
Any known reasons as to why this may happen, or do I need to provide more info? I've installed graph-tool on my Windows system through Docker.
Edit:
After looking further into the source code for model.clusters, I see that the problem is that one of the arrays (p_td_d) contains NaN values; recoding the NaNs to 0 solved my problem. The NaNs seem to originate in the model.get_groups() method, though I haven't had time to debug that yet.
def clusters(self, l=0, n=10):
    '''
    Get n 'most common' documents from each document cluster.
    most common refers to largest contribution in group membership vector.
    For the non-overlapping case, each document belongs to one and only one group with prob 1.
    '''
    # dict_groups = self.groups[l]
    dict_groups = self.get_groups(l=l)
    Bd = dict_groups['Bd']
    p_td_d = dict_groups['p_td_d']
    p_td_d = np.nan_to_num(p_td_d)  # <----- This solved my issue (NaN becomes 0.0 by default)
    docs = self.documents
    ## loop over all document-groups
    dict_group_docs = {}
    for td in range(Bd):
        p_d_ = p_td_d[td, :]
        ind_d_ = np.argsort(p_d_)[::-1]
        list_docs_td = []
        for i in ind_d_[:n]:
            if p_d_[i] > 0:
                list_docs_td += [(docs[i], p_d_[i])]
            else:
                break
        dict_group_docs[td] = list_docs_td
    return dict_group_docs
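A possible explanation for the empty lists, sketched on a made-up membership vector: np.argsort places NaN last, so after the [::-1] reversal any NaN comes first, the comparison NaN > 0 is False, and the loop breaks before collecting anything. The helper top_docs below is hypothetical; it only mimics the inner loop of clusters() and is not part of sbmtm.

```python
import numpy as np

def top_docs(p_d_, n=10):
    """Hypothetical stand-in for the inner loop of clusters(), for one group."""
    ind_d_ = np.argsort(p_d_)[::-1]  # NaN sorts last, so it is first after reversing
    out = []
    for i in ind_d_[:n]:
        if p_d_[i] > 0:              # NaN > 0 evaluates to False
            out.append((int(i), float(p_d_[i])))
        else:
            break                    # a leading NaN stops the loop immediately
    return out

# Toy values, not taken from the real corpus.
p = np.array([0.2, np.nan, 0.7, 0.0])

with_nan = top_docs(p)                 # empty: the NaN entry is visited first
cleaned = top_docs(np.nan_to_num(p))   # NaN replaced by 0.0 before ranking
```

If this is what happens, a single NaN in a group's row would be enough to empty that group's list, which matches the symptom.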
The problem appears to be related to these warnings:
/home/user/sbmtm.py:547: RuntimeWarning: invalid value encountered in true_divide
p_td_d = (n_db/np.sum(n_db,axis=1)[:,np.newaxis]).T
/home/user/sbmtm.py:553: RuntimeWarning: invalid value encountered in true_divide
p_tw_d = (n_dbw/np.sum(n_dbw,axis=1)[:,np.newaxis]).T
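NumPy raises "invalid value encountered" for 0/0, so one plausible cause is that some row of n_db (and n_dbw) sums to zero. That is an assumption and has not been checked against the real data. Under that assumption, a minimal sketch of a normalization that leaves all-zero rows at zero instead of producing NaN could look like this; normalize_rows is a hypothetical helper, not an sbmtm function, and the counts are made up.

```python
import numpy as np

def normalize_rows(counts):
    """Divide each row by its sum; rows that sum to zero stay all-zero."""
    counts = np.asarray(counts, dtype=float)
    row_sums = counts.sum(axis=1, keepdims=True)
    out = np.zeros_like(counts)
    # Only divide where the row sum is positive; other entries keep the 0.0 from `out`.
    np.divide(counts, row_sums, out=out, where=row_sums > 0)
    return out

# Toy counts: the middle row sums to zero, which would give 0/0 = NaN
# with the plain division used in the warning lines.
n_db = np.array([[2, 2],
                 [0, 0],
                 [1, 3]])

p_td_d = normalize_rows(n_db).T
```

Compared with calling np.nan_to_num afterwards, this avoids the RuntimeWarning at the source, but it treats the empty rows the same way (as zero membership everywhere).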