Throughout this lecture, we will make use of Jupyter notebooks to explain and illustrate underlying concepts. Some of these notebooks are also required for the assignment sheets.
In order to execute the notebooks, we basically only need a Python installation and a couple of Python packages. In addition, a few of the notebooks require third-party software like PostgreSQL or Neo4j. Therefore, we provide you with a virtual machine that already has everything installed.
Unfortunately, some of you have encountered problems installing or using the virtual machine.
However, you do not necessarily need the virtual machine! For the assignment sheets, you will only need Python and the corresponding Python packages (no third-party software). Thus, if you are unable to use the virtual machine for whatever reason, you can also install the required software manually. In the following, we briefly show how this can be done (the shell commands assume macOS as the operating system).
-
Install Python 3 for your operating system. On macOS, we recommend the use of Homebrew. The virtual machine uses Python 3.10.12. To be on the safe side, we also recommend the use of Python 3.10. Apparently, Python 3.11 has a problem with the installation of at least one required package:
$ brew install python@3.10
-
The bigdataengineering repository contains the notebooks required for the lecture. Clone the repository to a folder of your choice on your machine:
$ git clone --recursive https://github.com/BigDataAnalyticsGroup/bigdataengineering.git
-
Afterward, go into the bigdataengineering folder:
$ cd bigdataengineering
-
Create a virtual environment. A guide for the creation of virtual environments (on different platforms) can be found here:
$ python3.10 -m venv .
-
Once the virtual environment is created, it can be activated using the following command:
$ source bin/activate
Your terminal should now indicate that the virtual environment is activated, i.e., by the prefix (notebooks).
-
Upgrade the package installer pip for Python and install the Python packages:
(notebooks) $ pip install --upgrade pip
(notebooks) $ pip install -r requirements.txt
Now you should have all the Python packages installed and be ready to use the Jupyter notebooks.
-
Start the Jupyter server. Always make sure to activate the virtual environment before using Jupyter; otherwise, it does not work:
(notebooks) $ jupyter notebook
This should automatically open Jupyter in your browser. If not, copy the shown URL and paste it in your browser.
-
After you have finished working on the notebooks, you can stop the Jupyter server by pressing Ctrl-C in your terminal and confirming with y and Enter (or by pressing Ctrl-C two times). Afterwards, the virtual environment can be deactivated as follows:
(notebooks) $ deactivate
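If something does not work after following the steps above, a quick sanity check can help narrow down the cause. The following is only a minimal sketch, not part of the repository: the helper names is_python_310 and venv_is_active are hypothetical, and the check relies on the VIRTUAL_ENV variable that the activate script of a venv-created environment exports.

```shell
#!/usr/bin/env bash
# Sanity checks for the manual setup. Both helper functions are
# hypothetical names introduced for this sketch.

# Succeeds if the given version string belongs to the Python 3.10 series.
is_python_310() {
    case "$1" in
        3.10|3.10.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeeds if a virtual environment is active in this shell
# (the activate script exports VIRTUAL_ENV).
venv_is_active() {
    [ -n "${VIRTUAL_ENV:-}" ]
}

# Ask the interpreter that `python3` resolves to for its version.
version="$(python3 -c 'import platform; print(platform.python_version())' 2>/dev/null || echo unknown)"
if is_python_310 "$version"; then
    echo "Python version OK: $version"
else
    echo "Warning: expected Python 3.10.x, found $version"
fi

if venv_is_active; then
    echo "Virtual environment active: $VIRTUAL_ENV"
else
    echo "No virtual environment active; run 'source bin/activate' first"
fi
```

Run inside the activated environment, both checks should report success; a version warning usually means that python3 resolves to a different installation than the one used to create the environment.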
Note that after you have successfully installed all required Python packages for the first time, your workflow for working on the notebooks consists only of steps 3, 5, 7, and 8, i.e., you navigate to the bigdataengineering folder, activate the virtual environment, start the Jupyter server, work on the notebooks, stop the Jupyter server, and deactivate the virtual environment again.
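The recurring workflow (steps 3, 5, 7, and 8) can be wrapped in a small shell function. This is a sketch under assumptions: start_notebooks and REPO_DIR are hypothetical names, and the default path assumes the repository was cloned into your home directory. Depending on your shell, the final deactivate may not run automatically after you stop the server with Ctrl-C; in that case, run it manually.

```shell
# Hypothetical helper for the day-to-day workflow.
# REPO_DIR is an assumption: adjust it to wherever you cloned the repository.
REPO_DIR="${REPO_DIR:-$HOME/bigdataengineering}"

start_notebooks() {
    # Step 3: go into the repository folder.
    cd "$REPO_DIR" || return 1

    # Step 5: activate the virtual environment created in this folder.
    if [ ! -f bin/activate ]; then
        echo "No virtual environment found in $REPO_DIR" >&2
        return 1
    fi
    source bin/activate

    # Step 7: start the Jupyter server (blocks until the server is stopped).
    jupyter notebook

    # Step 8: deactivate the virtual environment once the server has exited.
    deactivate
}
```

After adding the definition to your shell profile (or sourcing it from a file), calling start_notebooks in a terminal replaces typing the individual commands.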