Skip to content

ksafford/rsa

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 

Repository files navigation

aboveAveScore.pig:

 Loads files with a format id, {(word1, score1), . . . (wordN, scoreN)}, finds the average score for each word,
 and generates a relation with only the words with above average scores for each id.
 Output is stored as id, {(id, word1) . . . (id, wordN)}
 in /<outpath>/above_ave_score/part-r-xxxxx

 To Run:
 pig -p files="<files to read in>" -p outpath="<path to output>" aboveAveScore.pig

topNwords.pig:

 Loads files with a format id, {(word1, score1), . . . (wordN, scoreN)}, finds the
 average score for each word, and generates a relation with the top N scoring words.
 Output is stored as id, {(word1, score1) . . . (wordN, scoreN)}
 in /<outpath>/top-N-words/part-r-xxxxx

 To Run:
 pig -p files="<files to read in>" -p outpath="<path to output>" -p N="<number of top words>" topNwords.pig


sumTfidfScore.pig:

 Joins a list of user id, word, and score with a shorter list of user id.
 For the subset of users on the smaller list, calculates the
 sum of all the scores per word. That is, if a word appears on two user
 list with a score of 1.2 and 1.5, the output will have that word scored
 with 2.7

 To Run:
 pig -p term=freq="<path to term freq>" -p user_id="<path to user id>" -p outpath="<outpath>" sumTfidfScore.pig

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published