ksafford/rsa
aboveAveScore.pig: Loads files in the format id, {(word1, score1), . . . (wordN, scoreN)}, finds the average score for each word, and generates a relation containing only the words with above-average scores for each id. Output is stored as id, {(id, word1) . . . (id, wordN)} in /<outpath>/above_ave_score/part-r-xxxxx

To Run: pig -p files="<files to read in>" -p outpath="<path to output>" aboveAveScore.pig

topNwords.pig: Loads files in the format id, {(word1, score1), . . . (wordN, scoreN)}, finds the average score for each word, and generates a relation with the top N scoring words. Output is stored as id, {(word1, score1) . . . (wordN, scoreN)} in /<outpath>/top-N-words/part-r-xxxxx

To Run: pig -p files="<files to read in>" -p outpath="<path to output>" -p N="<number of top words>" topNwords.pig

sumTfidfScore.pig: Joins a list of (user id, word, score) records with a shorter list of user ids. For the subset of users on the shorter list, calculates the sum of all the scores per word. That is, if a word appears on two users' lists with scores of 1.2 and 1.5, the output will have that word scored as 2.7.

To Run: pig -p term=freq="<path to term freq>" -p user_id="<path to user id>" -p outpath="<outpath>" sumTfidfScore.pig
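
The per-word summation described for sumTfidfScore.pig can be illustrated with a small Python sketch. This is not the Pig script itself, only a guess at the logic it describes: the function name sum_scores_for_users, the sample user ids, and the sample words are all hypothetical, and the real script reads its inputs from the paths passed on the command line.

```python
from collections import defaultdict


def sum_scores_for_users(term_scores, selected_user_ids):
    """Sum scores per word, counting only users on the shorter id list.

    term_scores: iterable of (user_id, word, score) tuples (assumed layout).
    selected_user_ids: iterable of user ids to keep (the "shorter list").
    """
    selected = set(selected_user_ids)
    totals = defaultdict(float)
    for user_id, word, score in term_scores:
        # Equivalent of the join: drop rows whose user is not selected.
        if user_id in selected:
            totals[word] += score
    return dict(totals)


# Hypothetical sample data: "hadoop" appears for two selected users.
term_scores = [
    ("u1", "hadoop", 1.2),
    ("u2", "hadoop", 1.5),
    ("u3", "hadoop", 9.0),  # u3 is not on the shorter list, so it is ignored
    ("u1", "pig", 0.4),
]
totals = sum_scores_for_users(term_scores, ["u1", "u2"])
for word, total in sorted(totals.items()):
    print(word, round(total, 6))
```

With this sample, the two selected scores for "hadoop" (1.2 and 1.5) add up to roughly 2.7, matching the example in the description, while the score from the unselected user is left out.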