Problem: current kNNWithMeans implementation doesn't allow for classical item-item kNN approach #135

The current kNNWithMeans implementation computes the similarity measure on the original ratings, and then uses mean values to compute a prediction.

According to:

For the purpose of item-item similarity computation, we should use the "adjusted cosine" similarity: taking into account not the raw ratings, but rather subtracting each user's average rating from that user's ratings, before computing item similarities.

How to address the problem:
A possible approach would be to add an "adjusted cosine" similarity measure. But if we compute it independently of kNNWithMeans, we will compute the mean values twice, independently for the similarity measure and for the predictions, which seems computationally inefficient. So I would suggest instead incorporating the mean adjustment into the prediction algorithm itself.
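(For reference, a sketch of the adjusted cosine similarity between items described above; the notation is assumed here: $r_{ui}$ is the rating of user $u$ for item $i$, $\bar{r}_u$ is user $u$'s mean rating, and $U_{ij}$ is the set of users who rated both items:)

```latex
\mathrm{sim}(i, j) =
  \frac{\sum_{u \in U_{ij}} (r_{ui} - \bar{r}_u)(r_{uj} - \bar{r}_u)}
       {\sqrt{\sum_{u \in U_{ij}} (r_{ui} - \bar{r}_u)^2}\,
        \sqrt{\sum_{u \in U_{ij}} (r_{uj} - \bar{r}_u)^2}}
```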
Comments
So basically that's the same as using Pearson, but instead of centering with the row average (for row-row similarity) and with the column average (for column-column), we center with the column average for row-row similarity and with the row average for column-column?
Sure
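(To make the centering schemes above concrete, a minimal NumPy sketch, assuming a dense users-by-items matrix with no missing entries, which is a simplification:)

```python
import numpy as np

# Toy ratings: rows = users, columns = items (all entries observed).
R = np.array([[5., 3., 4.],
              [4., 1., 5.],
              [2., 2., 3.]])

row_centered = R - R.mean(axis=1, keepdims=True)  # subtract each user's mean
col_centered = R - R.mean(axis=0, keepdims=True)  # subtract each item's mean

# Pearson:         user-user sims from row_centered, item-item sims from col_centered.
# Adjusted cosine: user-user sims from col_centered, item-item sims from row_centered.
```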
No, we should have the choice between the similarity measures, regardless of the choice of algorithm. If a user wants to use an "inappropriate" measure for whatever reason, she should be able to do it.
I am not sure whether we understood each other correctly. My proposal:
The answer is "yes".
Sorry, I thought about it some more, and I was wrong here:
Adjusted cosine considers different means from KNNWithMeans (they use orthogonal means, as you have pointed out), so there is no need to pass data from the algorithm to the similarity measure, and they could be completely decoupled.
Yeah, I agree. Let's just implement the new similarity measure then. If some improvement can be done, there will always be room for it.
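(For illustration, a sketch of such a decoupled similarity; the function below is hypothetical, not Surprise's actual similarity API:)

```python
import numpy as np

def adjusted_cosine(R):
    """Item-item adjusted cosine on a users-by-items matrix (NaN = missing).

    The user means are computed inside the similarity itself, so nothing
    needs to be passed in from the prediction algorithm.
    """
    user_means = np.nanmean(R, axis=1, keepdims=True)
    C = np.nan_to_num(R - user_means)   # center by user mean; missing -> 0
    norms = np.linalg.norm(C, axis=0)
    norms[norms == 0.0] = 1.0           # guard against items with no ratings
    return (C.T @ C) / np.outer(norms, norms)
```

(Note that the norms here run over each item's full centered rating vector, i.e. the "whole vector" variant discussed below, rather than the common-support one.)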
Nicolas,
Thus, I have a couple of proposals regarding the "adjusted cosine" implementation:
For the 1st bullet: actually, whole vectors (rather than only mutually rated items) can be used with a "usual" cosine as well. Example: "Recommender Systems: The Textbook", section 2.3.1.1, formula (2.6). BTW, I suggest implementing all of this as separate scoring functions, rather than parameters:
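(A sketch of the two variants in question; both helper names are made up for illustration:)

```python
import numpy as np

def cosine_full(x, y):
    """Cosine over the whole vectors: missing ratings (NaN) count as 0."""
    x, y = np.nan_to_num(x), np.nan_to_num(y)
    return (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))

def cosine_common(x, y):
    """Cosine restricted to positions rated in both vectors."""
    mask = ~np.isnan(x) & ~np.isnan(y)
    x, y = x[mask], y[mask]
    return (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
```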
Hi everyone, and thank you for this useful project. I have also encountered the same issues trying to implement item-item CF with adjusted cosine, using KNNBasic and passing the user-mean-centered ratings matrix as the train set instead of the original ratings matrix.
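(For other readers, a sketch of this workaround using Surprise's public API; the toy DataFrame and its column names are hypothetical:)

```python
import pandas as pd
from surprise import Dataset, KNNBasic, Reader

# Hypothetical ratings frame with columns: user, item, rating.
df = pd.DataFrame({'user':   [1, 1, 2, 2, 3],
                   'item':   ['a', 'b', 'a', 'c', 'b'],
                   'rating': [5.0, 3.0, 4.0, 1.0, 2.0]})

# Center each rating by its user's mean before handing the data to Surprise.
df['centered'] = df['rating'] - df.groupby('user')['rating'].transform('mean')

reader = Reader(rating_scale=(df['centered'].min(), df['centered'].max()))
data = Dataset.load_from_df(df[['user', 'item', 'centered']], reader)
trainset = data.build_full_trainset()

# Plain cosine on user-mean-centered ratings = adjusted cosine on raw ratings.
algo = KNNBasic(sim_options={'name': 'cosine', 'user_based': False})
algo.fit(trainset)
# Caveat: predictions come back on the centered scale, so each user's mean
# has to be added back by hand.
```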
I see. Basically, any similarity metric could use either complete vectors of ratings, or just commonly rated items / common users, as currently implemented. I chose the latter because my intuition is that with the former, we're comparing extremely sparse vectors, and choosing a value of zero for non-existing ratings is completely arbitrary. That being said, the current version also has major flaws (basically, we're comparing similarities which do not have the same support, so that does not make sense). So if using the whole vectors is commonly done and if it's efficient in practice, then there's no reason not to implement it. A good way for allowing this new version would be to add a … However :) , …
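(A toy illustration of the support problem, with made-up numbers:)

```python
import numpy as np

def cosine(x, y):
    return (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))

# Items a and b are co-rated by a single user; items a and c by four users.
sim_ab = cosine(np.array([4.0]), np.array([4.0]))                        # -> 1.0
sim_ac = cosine(np.array([4., 2., 5., 3.]), np.array([4., 1., 5., 3.]))  # -> ~0.99

# sim_ab "wins" despite resting on one rating, so ranking neighbors by these
# values compares quantities with very different support.
```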
@NicolasHug But before committing, I need your agreement on the following design issue:
It seems that the 2nd approach is perfect, but I'm probably missing something. Could you confirm?
Yes, there's no problem in creating different … But once again, to keep the commit history clean, the implementation of …
I added … Regarding adding documentation:
Hi,
@FrancescaCristo89 Thank you for the info!
I'll close this issue as it's been stagnating, and it's actually concerned with two different problems: supporting adjusted cosine, and computing similarities based on common ratings only (or not). I've opened #163 and #164 as replacements (with reference to this issue). Separating these two issues will hopefully be clearer and easier to follow.
Nicolas