
vagrant-elk version 1.0.0

A VM configured with Vagrant for exploring Spark, Elasticsearch and Kibana.

1. Vagrant-up and running.

If you're new to Vagrant, see the official Vagrant getting started guide.

With Vagrant installed, clone this repository, then initialise and provision the VM by running: vagrant up
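The full sequence from a fresh checkout, assuming Vagrant and a provider such as VirtualBox are already installed (the clone URL follows from the repository name):

git clone https://github.com/kyle-jarvis/vagrant-elk.git
cd vagrant-elk
vagrant up
vagrant ssh   # log in to the provisioned guest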

2. Provisioning.

Elasticsearch

Running on port 9200. Query the status of the Elasticsearch cluster with, e.g.:

curl -XGET 127.0.0.1:9200/_cat/health?v

from the guest machine.
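Other _cat endpoints work the same way; for example, to list indices and their document counts:

curl -XGET 127.0.0.1:9200/_cat/indices?v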

Kibana

Running on port 5601, forwarded to port 5602 on the host machine. Access the Kibana UI at 127.0.0.1:5602 in the host machine's browser.
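To verify Kibana is up without opening a browser, you can also query its status endpoint from the host (a quick sanity check, not part of the provisioning):

curl 127.0.0.1:5602/api/status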

Spark

Spark 2.4.6 is installed during the provisioning process. During installation the synced folder (this project's root, the guest machine's /vagrant/) is first checked for spark-2.4.6-bin-hadoop2.7.tgz, downloadable from the Apache Spark archive. If the file is not found, a default mirror is used to fetch it with wget; this can be slow, so it's recommended to download the file manually and place it in the project root.
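For example, fetching the archive into the project root on the host before running vagrant up (the URL below is the standard Apache archive location for this release):

curl -O https://archive.apache.org/dist/spark/spark-2.4.6/spark-2.4.6-bin-hadoop2.7.tgz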

An additional JAR file is downloaded as part of the provisioning to facilitate Elasticsearch integration with Hadoop (Spark, for the purposes of this repository).
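For jobs submitted by hand rather than through the demo entry point, a connector JAR like this is typically attached with --jars; a sketch, where the JAR path and script name are illustrative rather than the ones pinned by the provisioning scripts:

spark-submit --jars /path/to/elasticsearch-hadoop.jar my_job.py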

3. Demo scripts

Included in this repository are a number of scripts demonstrating some of the functionality of the VM. As part of the provisioning process, these scripts are installed into a virtual environment and made accessible through a common entry point, used as shown below:

cd /vagrant/examples

pipenv shell

Run the command:

spark-es-demo --help

You'll see there are a few options that you can try:

(1) spark-es-demo make-spark-dataframe

(2) spark-es-demo spark-to-es

(3) spark-es-demo search-es-index

These commands illustrate, respectively, how to create a Spark DataFrame using the pyspark API, how to index the contents of a Spark DataFrame directly into an Elasticsearch cluster, and how to query an Elasticsearch cluster using the requests library.
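In outline, demos (1) and (2) do something like the following (a minimal sketch against the public pyspark and ES-Hadoop APIs; the repository's actual demo code, index name and options may differ):

from pyspark.sql import SparkSession

# (1) Create a session and a small DataFrame.
spark = SparkSession.builder.appName("spark-es-demo").getOrCreate()
df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])

# (2) Index the DataFrame into Elasticsearch via the ES-Hadoop connector
# (requires the connector JAR on the Spark classpath).
(df.write
    .format("org.elasticsearch.spark.sql")
    .option("es.nodes", "127.0.0.1")
    .option("es.port", "9200")
    .mode("append")
    .save("demo-index"))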

Running (3) with the --match-all flag will submit a match-all search to the specified ES index; otherwise the user is prompted for input to compose a basic match query.
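For reference, a basic match query like the one demo (3) composes can also be sent directly with requests (a sketch; the index and field names here are placeholders):

import requests

# With --match-all, the body would be {"query": {"match_all": {}}} instead.
query = {"query": {"match": {"name": "alice"}}}
resp = requests.get("http://127.0.0.1:9200/demo-index/_search", json=query)
print(resp.json()["hits"])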
