This repository was archived by the owner on Sep 26, 2019. It is now read-only.
yosukefk edited this page Jan 12, 2019 · 6 revisions

Welcome to the finn_preproc wiki!

I am collecting here miscellaneous notes on using Docker.

docker run and docker exec

docker concepts

You may want to spend half an hour or so familiarizing yourself with Docker's design. I suggest the getting started page on the Docker website ( [TODO] link here ). Below is what I think is the bare minimum you need to know.

  • docker image
    You either download an image or build it on your local machine (for the FINN preprocessor, you build it with docker build). The image is the template for a Docker container. For the FINN preprocessor, it holds the instructions that set up the environment (written in the Dockerfile) and the processing code (mostly in code_anaconda). No data or database is included.

  • docker container
    [TODO]

  • run, start, stop, exec, delete
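    run creates a container from an image and starts it; start and stop resume and halt an existing container. Use exec to run a command inside a running container. To delete a container, use docker rm (an image can be deleted as well, with docker rmi). [TODO]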

  • Kitematic GUI tool
    A nice, fairly intuitive graphical counterpart to docker ps and docker inspect, but I am not sure it is as powerful. [TODO]
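The container lifecycle listed above can be sketched as a handful of shell functions. This is only an illustration: the image name and container name (both finn here) are assumptions, and the real run command for the FINN preprocessor needs more options (at least -v, see the Storage section) that are left out. The docker subcommands themselves (build, run, start, stop, exec, rm, rmi) are standard.

```shell
# Sketch of the container lifecycle. FINN_IMAGE and FINN_NAME are assumed
# names for illustration, not necessarily what the project uses.
FINN_IMAGE=finn
FINN_NAME=finn

# Build the image from the Dockerfile in the current directory.
finn_build()  { docker build -t "$FINN_IMAGE" . ; }
# Create a new container from the image and start it in the background.
finn_create() { docker run -d --name "$FINN_NAME" "$FINN_IMAGE" ; }
# Halt and resume the existing container (its contents are kept).
finn_stop()   { docker stop "$FINN_NAME" ; }
finn_start()  { docker start "$FINN_NAME" ; }
# Run a command inside the running container, e.g. finn_exec ls /
finn_exec()   { docker exec "$FINN_NAME" "$@" ; }
# Delete the (stopped) container, then optionally the image itself.
finn_remove() { docker rm "$FINN_NAME" ; }
finn_rmi()    { docker rmi "$FINN_IMAGE" ; }
```

Nothing runs until a function is called. A typical session would be finn_build once, finn_create once, then finn_stop and finn_start as needed.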

Storage

See also the wiki page dedicated to managing storage.

-v (--volume) option

The -v (--volume) option lets Docker use a directory on the host machine as if it were part of the Docker container. At this point the FINN preprocessor relies on --volume to access the code that runs FINN, and there is currently no way around it: the preprocessor will not run unless -v works. I found this tricky to get right, at least on Windows.

If you are using Docker Toolbox (meant for older operating systems), you may have to use a directory somewhere under C:/Users; no other drive mounted on the system can be used. If you are using Docker Desktop, you have to explicitly allow the drives (C: etc.) that can be shared with Docker. On Linux, Max did not mention anything in particular, so I guess the directory can be anywhere on the system, as long as the user has read/write/execute permission? ( [TODO] ask Max)
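As a rough sketch of what a -v mount looks like on the command line: the host directory, the path inside the container (/home/finn), and the image and container name (finn) are all assumptions made for illustration, not the project's documented command. The general form is -v HOST_DIR:CONTAINER_DIR.

```shell
# Hypothetical example of a bind mount with -v. CONTAINER_DIR and the name
# "finn" are assumptions; substitute whatever the project actually uses.
CONTAINER_DIR=/home/finn

# $1: absolute path of the host directory holding the code
# (with Docker Toolbox on Windows, somewhere under C:/Users).
finn_run_with_code() {
    docker run -d --name finn -v "$1:$CONTAINER_DIR" finn
}

# Quick check that the mount worked: the host files should be listed.
finn_check_mount() {
    docker exec finn ls "$CONTAINER_DIR"
}
```

If finn_check_mount shows an empty directory, the mount most likely failed without an error message, which is where checking the drive-sharing settings described above comes in.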

Where intermediate data goes

MODIS land cover data (lct and vcf) are downloaded as HDF files, preprocessed into TIF format, then imported into the database. The files live in subdirectories of code_anaconda (downloads for the HDF files, proc_XXX for intermediate files). They can be deleted once they have been loaded into the database.

The database itself is hidden somewhere inside the Docker container (an instance of the Docker image). Inside the container it is located at /var/lib/postgres/*. It is recommended to use --volume for this directory, so that the data persists independently of the container. But as of 2018-12-30 we could not get this to work with Docker on Windows. We will keep working on it ( [TODO] ) ( [TODO] refer to the developer of kartoza/postgis and Alex Urquhart)

The reason we want /var/lib/postgres/* to exist on the host's filesystem, not inside the Docker container, is recovery: when something goes wrong with the container or the computer and you want to start over but pick up where you left off, I think it should be possible to fire up the database from those files, as long as we stick with the same version of PostgreSQL ( [TODO] confirm this). So for now, the alternative approach would be to manually copy out /var/lib/postgres/*, create a new container, and see if the copy can be used to bootstrap the FINN preprocessor ( [TODO] check if this works).
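The manual copy mentioned above could look something like the sketch below. It is untested, like the approach itself: it uses docker cp (a standard subcommand that copies files between a container and the host), takes the directory path as written in the text, and assumes the container is named finn. The destination directory is a made-up default. Restoring the copy into a new container is not shown.

```shell
# Untested sketch: copy the database directory out of the container.
# The container name "finn" and the default destination are assumptions;
# the source path is the one given in the text above.
finn_backup_pgdata() {
    dest="${1:-./pgdata_backup}"
    # Stop the container first so the database files are not changing
    # while they are copied.
    docker stop finn
    docker cp finn:/var/lib/postgres "$dest"
}
```

Call it as finn_backup_pgdata /path/on/host, then docker start finn to resume work.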

Interactive use

run arbitrary system command

From a Jupyter notebook, prefix a command with ! to run it as a system command. For example, !echo hello runs the system command echo with the argument hello.

bash (Interactive use)

If you need a command prompt (for debugging etc.) and a one-line !cmd from Jupyter is not enough, do the following.

docker exec -it -e DEBIAN_FRONTEND=dialog finn /bin/bash

This runs the command /bin/bash in the Docker container finn. It is like opening another terminal on the same machine that serves the Jupyter Notebook interface you normally use for the FINN preprocessor. You can go behind the scenes and see what is going on, provided that you are familiar with Linux. Setting DEBIAN_FRONTEND=dialog makes the shell respond to control characters as expected (e.g. Ctrl-Z to suspend).

psql

You could install PostgreSQL on the host computer and run psql with the appropriate host (either localhost or the machine IP assigned by Docker) and port (5432). The username is docker, the password is docker, and the database is finn.

Alternatively, start a bash terminal in the Docker container (see the item above) and use psql from there. The command to get a psql terminal is psql -U docker -W -d finn -h localhost -p 5432. You may have to su postgres first, because your default login in bash is root.
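Both ways of reaching psql can be written down as small shell functions. The connection settings (user docker, database finn, port 5432) come from the text above; the function names are made up, and the second function assumes the container is named finn and combines docker exec with the psql command rather than opening bash first.

```shell
# Option 1: psql installed on the host. $1 is the database host:
# localhost by default, or the machine IP that Docker assigned.
finn_psql_host() {
    psql -U docker -W -d finn -h "${1:-localhost}" -p 5432
}

# Option 2: run psql inside the container (assumed to be named "finn"),
# so PostgreSQL does not need to be installed on the host.
finn_psql_container() {
    docker exec -it finn psql -U docker -W -d finn -h localhost -p 5432
}
```

With -W, psql always prompts for the password, which is docker.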

QGIS (or ArcMap)

To visualize the contents of the PostGIS database, installing QGIS on the host computer is highly recommended. You can get it from the QGIS website. I found Alex Urquhart's blog very easy to follow. There is also the README of kartoza/postgis, on which the FINN preprocessor Docker image is built.

ArcMap may work with a PostGIS database, but I have not tried it seriously. It is probably harder than QGIS, which works almost out of the box.

docker random tips

  • What is a container made of?
    When you have forgotten how a container was set up, use docker inspect <container_name> or docker container inspect <container_name>. You can find which volumes are used, which environment variables are set, etc.

  • Docker doesn't start!!
    There can be a few reasons.

    • First, Docker is generally slow to start. Wait long enough, maybe 10 minutes, before you try to fix anything.
    • If you have changed the password for your machine, Docker may still remember the password you used earlier. Docker Desktop has a factory-reset kind of button; try pressing it. (I am not sure whether it wipes the containers/volumes. I will test.)
    • Docker Toolbox for Windows runs on top of Oracle VM VirtualBox. If the VM's storage gets full (its size is preset), the VM won't start and neither can Docker. I will look for a workaround and also a way to prevent this.
  • Do I really need docker?
    Docker makes it easier to set up the environment (PostgreSQL with the right options and plugins, Python with the necessary libraries, etc.). So it is great when it works. But it does add an extra layer to worry about. We have run the code without Docker on Linux and on Windows. So getting PostgreSQL for the OS you are using, plus Anaconda Python with the necessary libraries, may be enough to let the code work (I will test at some point, and get rid of system-specific commands).
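Going back to the docker inspect tip in the list above: the full output is long, so it can help to pull out just the parts you care about with the --format option, which takes a Go template. A small sketch, with made-up function names and finn assumed as the default container name:

```shell
# Show only the volumes/mounts of a container (default: "finn"), as JSON.
finn_mounts() {
    docker inspect --format '{{json .Mounts}}' "${1:-finn}"
}

# Show only the environment variables set in the container, as JSON.
finn_env() {
    docker inspect --format '{{json .Config.Env}}' "${1:-finn}"
}
```

For example, finn_mounts tells you which host directory was passed to -v when the container was created.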