# big-data-put

## Preparing

### Run cluster

```shell
gcloud dataproc clusters create ${CLUSTER_NAME} \
    --enable-component-gateway \
    --region ${REGION} \
    --master-machine-type n1-standard-2 \
    --master-boot-disk-size 50 \
    --num-workers 2 \
    --worker-machine-type n1-standard-2 \
    --worker-boot-disk-size 50 \
    --image-version 2.1-debian11 \
    --optional-components ZEPPELIN \
    --project ${PROJECT_ID} \
    --max-age=3h
```

### Delete cluster

```shell
gcloud dataproc clusters delete ${CLUSTER_NAME} --region ${REGION} --project ${PROJECT_ID}
```

### Export the bucket name

```shell
export BUCKET_NAME=YOUR_BUCKET_NAME
```

## Run MapReduce

```shell
mapred streaming -files mapper.py,combiner.py,reducer.py \
    -input gs://${BUCKET_NAME}/projekt1/input/datasource1 \
    -mapper mapper.py \
    -combiner combiner.py \
    -reducer reducer.py \
    -output gs://${BUCKET_NAME}/projekt1/mr_output
```
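The streaming job above pipes input records through `mapper.py`, `combiner.py`, and `reducer.py`, none of which are shown in this README. A minimal word-count-style sketch of what such a mapper/reducer pair could look like (the logic and names below are illustrative assumptions, not the repository's actual scripts):

```python
#!/usr/bin/env python3
"""Illustrative Hadoop Streaming mapper/reducer pair (word count).

Hadoop Streaming feeds records to each script on stdin and expects
tab-separated key/value pairs on stdout; between the map and reduce
phases the framework sorts pairs by key.
"""
import sys
from itertools import groupby


def map_line(line):
    # Mapper step: emit one (word, 1) pair per whitespace-separated token.
    return [(word.lower(), 1) for word in line.split()]


def reduce_pairs(pairs):
    # Reducer step: pairs arrive sorted by key, so consecutive runs of
    # the same key can be summed with groupby.
    return {key: sum(count for _, count in group)
            for key, group in groupby(pairs, key=lambda kv: kv[0])}


if __name__ == "__main__":
    # Running as mapper.py: print "word\t1" for every token on stdin.
    for line in sys.stdin:
        for key, value in map_line(line):
            print(f"{key}\t{value}")
```

Each script only needs to read stdin and write stdout, which is all `mapred streaming` requires. Because summation is associative, a combiner can reuse the same reduction logic to pre-aggregate on each mapper node.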

## Run Hive

```shell
beeline -u jdbc:hive2://localhost:10000/default \
    -f hive.hql \
    --hivevar input_dir4=gs://${BUCKET_NAME}/projekt1/input/datasource4 \
    --hivevar input_dir3=gs://${BUCKET_NAME}/projekt1/mr_output \
    --hivevar output_dir6=gs://${BUCKET_NAME}/projekt1/hive_output
```
