rishanki/correlation-matrix_Pyspark_RDD

correlation-matrix_Pyspark_RDD

The following code generates a correlation matrix for any number N of variables in a PySpark environment. You only need to fill in the mandatory settings:

#INPUT-TABLE-DETAILS:
#Give the name of the input schema (if you have none, remove this line and adjust the code accordingly)
input_table_schema="XXXXX"
#Give the name of the input table
input_table_name=" XXXXX"
#Give the name of the target
target_name='target'

#CHECK FOR CORRELATION MATRIX
#Give a list of variables
Var= ['entity','region_name','state','cost_amt','cdv', 'offer_play_mix_name','target']

Note: if your data is not on a server, you can simply give the path to your data instead. The line df=spark.sql("select * from " +input_table_schema+"."+input_table_name) initialises the input DataFrame through Spark SQL; change it to match your own database path.
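As a minimal sketch of the two loading options described in the note, the query can be built with or without a schema, and a file path can be used instead of a table. The helper names `build_query` and `load_input`, the `path` parameter, and the assumption that the file is a CSV with a header row are illustrative only and not part of the repository code.

```python
def build_query(table_name, schema=None, columns=None):
    """Return a SELECT statement for the input table.

    `schema` is optional; `columns` defaults to all columns.
    """
    qualified = f"{schema}.{table_name}" if schema else table_name
    selected = ", ".join(columns) if columns else "*"
    return f"select {selected} from {qualified}"


def load_input(spark, table_name=None, schema=None, path=None):
    """Load the input DataFrame from a file path or from a table.

    `spark` is an existing SparkSession supplied by the caller.
    """
    if path is not None:
        # Assumption: the file is a CSV with a header row.
        return spark.read.csv(path, header=True, inferSchema=True)
    return spark.sql(build_query(table_name, schema))
```

With the settings above, `load_input(spark, input_table_name, input_table_schema)` would issue the same kind of query as the original line, while `load_input(spark, path="...")` would read from a file.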

The rest of the code runs as is and produces the matrix.
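For orientation, one way a correlation matrix can be computed over an RDD is with `Statistics.corr` from `pyspark.mllib.stat`. This is a sketch under stated assumptions, not necessarily the method the repository uses. It assumes every listed column is numeric, so string columns such as 'entity' or 'state' would need to be encoded first, and rows containing nulls are dropped. The function names `correlation_matrix` and `label_matrix` are hypothetical.

```python
def correlation_matrix(df, columns, method="pearson"):
    """Compute the correlation matrix of numeric `columns` in `df` via an RDD."""
    # Imported here so the labelling helper below works without Spark installed.
    from pyspark.mllib.linalg import Vectors
    from pyspark.mllib.stat import Statistics

    # Assumption: every column is numeric; rows with nulls are dropped.
    rows = df.select(*columns).na.drop().rdd
    vectors = rows.map(lambda row: Vectors.dense([float(v) for v in row]))
    # Returns a square array with one row and one column per variable.
    return Statistics.corr(vectors, method=method)


def label_matrix(columns, matrix):
    """Attach variable names to a square matrix as a nested dict."""
    return {
        row_name: {
            col_name: float(matrix[i][j])
            for j, col_name in enumerate(columns)
        }
        for i, row_name in enumerate(columns)
    }
```

Calling `label_matrix(Var, correlation_matrix(df, Var))` would then give a lookup such as `result['cost_amt']['target']`, provided all entries of `Var` are numeric columns.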