Updated folders and config files
ENate committed Sep 25, 2023
1 parent b55847e commit 188b86f
Showing 42 changed files with 180 additions and 59 deletions.
10 changes: 10 additions & 0 deletions .env
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
AWS_ACCESS_KEY_ID=admin
AWS_SECRET_ACCESS_KEY=sample_key
AWS_REGION=us-east-1
AWS_BUCKET_NAME=mlflow
MYSQL_DATABASE=mlflow
MYSQL_USER=mlflow_user
MYSQL_PASSWORD=mlflow_password
MYSQL_ROOT_PASSWORD=toor
MLFLOW_S3_ENDPOINT_URL=http://localhost:9000
MLFLOW_TRACKING_URI=http://localhost:5000
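To reuse these values from Python outside of Docker Compose, a minimal sketch of a `.env` reader could look like the following. The helper name `parse_env` is hypothetical and not part of this repository; it only handles plain `KEY=VALUE` lines, which is all this file contains.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


# Two entries copied from the .env file above.
sample = """\
AWS_BUCKET_NAME=mlflow
MLFLOW_TRACKING_URI=http://localhost:5000
"""
settings = parse_env(sample)
print(settings["MLFLOW_TRACKING_URI"])
```

Quoted values, `export` prefixes and variable expansion are deliberately left out of this sketch.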
2 changes: 2 additions & 0 deletions .gitignore
@@ -0,0 +1,2 @@
.settings
.vscode
1 change: 1 addition & 0 deletions .gitpod.Dockerfile
@@ -26,6 +26,7 @@ RUN pip install --upgrade pip
RUN sudo apt-get install -y protobuf-compiler python-pil python-lxml

# Install tensorflow ranking and datasets
RUN pip install tensorflow
RUN pip install -q tensorflow-ranking && pip install -q --upgrade tensorflow-datasets
RUN pip install --upgrade tensorflow-hub

3 changes: 1 addition & 2 deletions .idea/vcs.xml

19 changes: 1 addition & 18 deletions .vscode/settings.json
@@ -2,24 +2,7 @@
"editor.fontFamily": "'Droid Sans Mono', 'monospace'",
"terminal.integrated.letterSpacing": 1,
"workbench.colorCustomizations": {
"activityBar.activeBackground": "#64d25b",
"activityBar.activeBorder": "#0e321f",
"activityBar.background": "#64d25b",
"activityBar.foreground": "#15202b",
"activityBar.inactiveForeground": "#15202b99",
"activityBarBadge.background": "#6971d6",
"activityBarBadge.foreground": "#e7e7e7",
"sash.hoverBorder": "#64d25b",
"statusBar.background": "#41c436",
"statusBar.foreground": "#15202b",
"statusBarItem.hoverBackground": "#349c2b",
"statusBarItem.remoteBackground": "#41c436",
"statusBarItem.remoteForeground": "#15202b",
"titleBar.activeBackground": "#41c436",
"titleBar.activeForeground": "#15202b",
"titleBar.inactiveBackground": "#41c43699",
"titleBar.inactiveForeground": "#15202b99",
"commandCenter.border": "#15202b99"
"activityBar.activeBorder": "#0e321f"
},
"peacock.remoteColor": "#41c436"
}
53 changes: 34 additions & 19 deletions README.md
@@ -1,18 +1,30 @@
# Main Contents
### Introduction

The main parts of this repository consist of the following folders:
The repository provides a theoretical and practical guide on how to prepare environments, and how to train and apply several machine learning models to problems in different settings. We begin with how to prepare a training environment using the most popular technology stacks. Most of the implementations and examples discussed in the repository are done with the proposed tools. However, this is not a recommendation for a particular tool but a matter of choice, convenience and performance. In specific cases, I will mention why a particular tool may be suitable in a given scenario. For now, I will begin by listing the main tools and then discuss the contents of the repository.

- `misc_folders` - containing deep neural network and epidemiological models
In detail, we begin with the contents of the `misc_folders` folder, which consists of supporting files. The `misc_folders` folder contains the following files:
* A deep neural network architecture drawing file.
* Epidemiological models, implemented in Python, that study `n=2` disease strains in a given population. The aim is to analyze the effect of `n=2` disease strains consisting of different variations. The work solves the case of `n=2` disease strains affecting a given population and was submitted in partial fulfillment of the requirements for the Postgraduate Diploma at the African Institute for Mathematical Sciences, Cape Town, South Africa.
* A Python program implemented to simulate a molecule in the nucleus of an atom.
### Technology Stack
- Training of the models discussed in this repository is done using
  * TensorFlow
  * PyTorch
  * Flax - a flexible API built on JAX
  * Other Python-based deep learning frameworks and libraries
  * Use of other languages will be highlighted where necessary.
- Observability ([as discussed here](https://grafana.com/grafana/dashboards/16110-fastapi-observability/)) using Grafana, Tempo, Loki and Prometheus


### Contents

The main folders are outlined below:

- `supervised` - containing deep neural network models and applications
- `unsupervised` - containing models without label data
- `reinforcement` - describing implementation of models with agents
- `Quantum` - discusses concepts in quantum computing, algorithms and deep learning
### supervised
- containing deep neural network models and applications
### unsupervised
In this folder, we discuss examples of models trained without output labels. In relation to this, we also present examples of problems where non-classical training approaches are used. Note that the main difference between unsupervised and supervised learning models is the absence of output labels associated with the data for the problem. Hence, we must determine whether output labels are available in the training data before choosing the training algorithm.
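As a minimal sketch of that check, the snippet below inspects a list of labels and reports which setting applies. The function name `learning_setting` and the convention that a missing label is represented by `None` are assumptions made for illustration, not something defined in this repository.

```python
from typing import Optional, Sequence


def learning_setting(labels: Sequence[Optional[object]]) -> str:
    """Classify a data set by how many of its samples carry an output label."""
    labeled = sum(label is not None for label in labels)
    if labeled == 0:
        return "unsupervised"       # no output labels at all
    if labeled == len(labels):
        return "supervised"         # every sample has an output label
    return "partially labeled"      # a mix; needs a case-by-case decision


print(learning_setting([None, None, None]))
print(learning_setting(["cat", "dog", "cat"]))
```

Running such a check before choosing a training algorithm makes the decision explicit instead of implicit in the data loading code.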
### reinforcement
- describing implementation of models with agents
### Quantum
- discusses concepts in quantum computing, algorithms and deep learning


## Observability
@@ -25,10 +37,13 @@ We use the example from [this repo](https://github.com/blueswen/fastapi-observab
References will be made to progress in the different models implemented in the folders contained in this repository. For instance,
we will present the links to the papers, tutorials and other forms of publications associated with the topics covered in this repository.

## Tech Stack
- Training is based on Python and related packages such as
  * TensorFlow
  * Flax - a flexible library built on JAX
  * Transformers
- Observability ([from here](https://grafana.com/grafana/dashboards/16110-fastapi-observability/)) using Grafana, Tempo, Loki and Prometheus
### misc_folders
- containing deep neural network and epidemiological models
In detail, we begin with the contents of the `misc_folders` folder, which consists of supporting files. The `misc_folders` folder contains the following files:
* A deep neural network architecture drawing file.
* Epidemiological models, implemented in Python, that study `n=2` disease strains in a given population. The aim is to analyze the effect of `n=2` disease strains consisting of different variations. The work solves the case of `n=2` disease strains affecting a given population and was submitted in partial fulfillment of the requirements for the Postgraduate Diploma at the African Institute for Mathematical Sciences, Cape Town, South Africa.
* A Python program implemented to simulate a molecule in the nucleus of an atom.


## Structure of the repository
79 changes: 79 additions & 0 deletions docker-compose.yaml
@@ -0,0 +1,79 @@
version: "3.9"
services:
  s3:
    image: minio/minio
    restart: unless-stopped
    ports:
      - "9000:9000"
      - "9001:9001"
    environment:
      - MINIO_ROOT_USER=${AWS_ACCESS_KEY_ID}
      - MINIO_ROOT_PASSWORD=${AWS_SECRET_ACCESS_KEY}
    command: server /data --console-address ":9001"
    networks:
      - internal
      - public
    volumes:
      - minio_volume:/data
  db:
    image: mysql/mysql-server:5.7.28
    restart: unless-stopped
    container_name: mlflow_db
    expose:
      - "3306"
    environment:
      - MYSQL_DATABASE=${MYSQL_DATABASE}
      - MYSQL_USER=${MYSQL_USER}
      - MYSQL_PASSWORD=${MYSQL_PASSWORD}
      - MYSQL_ROOT_PASSWORD=${MYSQL_ROOT_PASSWORD}
    volumes:
      - db_volume:/var/lib/mysql
    networks:
      - internal
  mlflow:
    container_name: tracker_mlflow
    image: tracker_ml
    restart: unless-stopped
    build:
      # NOTE: `context` must be a directory; adjust this path if ./Dockerfile is a file.
      context: ./Dockerfile
      dockerfile: Dockerfile
    ports:
      - "5000:5000"
    environment:
      - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
      - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
      - AWS_DEFAULT_REGION=${AWS_REGION}
      - MLFLOW_S3_ENDPOINT_URL=http://s3:9000
    networks:
      - public
      - internal
    # The flags below belong to `mlflow server`, so the executable is `mlflow` (the commit had `Dockerfile` here).
    entrypoint: mlflow server --backend-store-uri mysql+pymysql://${MYSQL_USER}:${MYSQL_PASSWORD}@db:3306/${MYSQL_DATABASE} --default-artifact-root s3://${AWS_BUCKET_NAME}/ --artifacts-destination s3://${AWS_BUCKET_NAME}/ -h 0.0.0.0
    depends_on:
      wait-for-db:
        condition: service_completed_successfully
  create_s3_buckets:
    image: minio/mc
    depends_on:
      - "s3"
    entrypoint: >
      /bin/sh -c "
      until (/usr/bin/mc alias set minio http://s3:9000 '${AWS_ACCESS_KEY_ID}' '${AWS_SECRET_ACCESS_KEY}') do echo '...waiting...' && sleep 1; done;
      /usr/bin/mc mb minio/${AWS_BUCKET_NAME};
      exit 0;
      "
    networks:
      - internal
  wait-for-db:
    image: atkrad/wait4x
    depends_on:
      - db
    command: tcp db:3306 -t 90s -i 250ms
    networks:
      - internal
networks:
  internal:
  public:
    driver: bridge
volumes:
  db_volume:
  minio_volume:
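The `mlflow` service assembles its backend store URI by interpolating values from `.env`. A minimal Python sketch of the same substitution is shown below, using the values from the `.env` file in this commit. The helper `backend_store_uri` is hypothetical; `quote_plus` is added here as a precaution for passwords containing URL-reserved characters, which Compose interpolation does not escape.

```python
from urllib.parse import quote_plus


def backend_store_uri(user: str, password: str, database: str,
                      host: str = "db", port: int = 3306) -> str:
    """Build a SQLAlchemy-style MySQL URI like the one passed to the tracking server."""
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Values taken from the .env file; "db" is the Compose service name of the database.
print(backend_store_uri("mlflow_user", "mlflow_password", "mlflow"))
```

Inside the Compose network the host is the service name `db`; from the host machine the database is not reachable, because the service only uses `expose` and publishes no port.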
2 changes: 1 addition & 1 deletion etc/dashboards.yaml
@@ -1,6 +1,6 @@
apiVersion: 1
providers:
- name: 'FastAPI Observability'
- name: 'Application Observability'
orgId: 1
folder: ''
type: 'file'
1 change: 1 addition & 0 deletions infra/docker/airflow/Dockerfile
@@ -0,0 +1 @@
FROM apache/airflow
File renamed without changes.
File renamed without changes.
File renamed without changes.
14 changes: 14 additions & 0 deletions supervised/generative-ai/README.md
@@ -0,0 +1,14 @@
### Generative Adversarial Networks (GANs)

### Introduction
This supervised deep learning method is based on a generator feed-forward neural network and a discriminator. Formulated on ideas linked to game theory, it pits two competing networks against each other, with the discriminator outputting the probability that a sample comes from the data set.

### Features of GANs
- Two competing agents whose objectives are to work towards opposing goals.
- This implies each participating agent continues to come up with strategies to deceive the other.
- This method is associated with game-theoretic minimax methods.
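For reference, the minimax game between the two networks is commonly written as follows. The symbols are notation introduced here, not taken from this README: $G$ is the generator, $D$ the discriminator, $p_{\text{data}}$ the data distribution and $p_z$ the noise distribution fed to the generator.

```latex
\min_{G} \max_{D} V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}}\left[\log\left(1 - D(G(z))\right)\right]
```

The discriminator tries to make $V$ large by scoring real samples near 1 and generated samples near 0, while the generator tries to make $V$ small by producing samples that the discriminator scores highly.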

In order to understand the foundation, implementation and application of GANs, we provide a basic description of the model, then provide concrete examples of how GANs can be applied to a real-life problem. Before delving into these steps, we would like to describe a simple example of a typical scenario that mirrors how GAN models work.

### Example Description
Consider a situation involving two agents in real life: a police officer and a criminal. As stated in the example here, if the criminal is a counterfeiter who keeps trying to come up with ways to evade detection, the police officer will in turn come up with much better ways to provide security.
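In this analogy the police officer plays the discriminator and the counterfeiter plays the generator. As a rough sketch, the quantity they compete over can be estimated from the officer's confidence scores: V = mean(log D(real)) + mean(log(1 - D(fake))). The function name `gan_value` and the sample scores are assumptions chosen for illustration only.

```python
import math


def gan_value(d_real: list[float], d_fake: list[float]) -> float:
    """Sample estimate of V = E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: discriminator scores (probability of "genuine") on real samples.
    d_fake: discriminator scores on generated (counterfeit) samples.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# A sharp officer: genuine notes score high, counterfeits score low.
sharp = gan_value(d_real=[0.9, 0.95], d_fake=[0.1, 0.05])
# A fooled officer: every note scores 0.5, i.e. a coin flip.
fooled = gan_value(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])

# The discriminator wants V large; the generator wants V small.
print(sharp > fooled)
```

When the officer can no longer tell the two apart, every score is 0.5 and the value settles at -2 log 2, the point the counterfeiter is pushing towards.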
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
File renamed without changes.
File renamed without changes.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
29 changes: 29 additions & 0 deletions supervised/transformers/README.md
@@ -0,0 +1,29 @@
## Concepts

-----------------------------------------------
We discuss the advent of transformers and their application to various machine learning problems. To begin, we highlight the evolution of prior deep learning architectures and their limitations in training.
### Tools and Tech Stack
- Python 3.10+
- Observability via Grafana (UI), Loki (logs), Tempo (traces) and Prometheus (metrics).
- PyTorch (for some examples)
- Examples implemented using MLflow for monitoring and used in training pipelines.
- TensorFlow - a library for training machine learning models
- Flax - a flexible neural network library built on JAX
### Models
The following models are implemented in this folder:

### In the beginning...
As we know, deep neural networks (DNNs) exhibit universal approximation abilities in prediction and classification problems. However, their limitations in translation tasks, image processing and similar problems have been widely encountered and discussed in the literature. Hence, improvements on DNNs have resulted in other types of architectures, for instance recurrent neural networks (RNNs), with Long Short-Term Memory networks (LSTMs) as a special case, convolutional neural networks (CNNs) and more. Even though these architectures have shown remarkable results in translation, image processing, segmentation and speech recognition tasks, they are limited in a number of applications.
Language models are supervised learning models used for text- and document-based learning.

### Life before BERT
- Bidirectional Encoder Representations from Transformers (BERT)
- Use of machine language translation
- Attention-based models via the `Attention Is All You Need` paper
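The core operation of the attention-based models mentioned above is scaled dot-product attention, `softmax(Q K^T / sqrt(d_k)) V`. Below is a minimal pure-Python sketch for small lists of vectors; the function names are chosen here for illustration, and a real implementation would use a tensor library such as the ones listed in the tech stack.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Return one output vector per query: a weighted average of the values."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output dimension at a time.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


# With identical keys the weights are uniform, so the output is the mean value.
result = scaled_dot_product_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 1.0], [1.0, 1.0]],
    values=[[2.0], [4.0]],
)
print(result)
```

Multi-head attention, masking and learned projections are omitted; this shows only the weighting step.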
### BERT Model
- Based on the introduction of optimal training to the model

### Life After BERT
- Usage of `simplified` training architecture
- Removed bottlenecks in training
- Added more simplified attention based parts in training
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
14 changes: 0 additions & 14 deletions transformers/README.md

This file was deleted.

12 changes: 7 additions & 5 deletions unsupervised/README.md
@@ -1,14 +1,16 @@
# Contents of the folder
### Contents

--------------------------

## Algorithms
### Algorithms

* Identify different data collection techniques
* Data types, preparation and analysis.
* Apply use cases to clustering, principal component analysis (PCA) etc.
* Apply unsupervised learning algorithms to prepare different types of data sets prior to training.
* Apply use cases to clustering, principal component analysis (PCA)
* Introduce methods to deal with existing algorithms.
* Apply unsupervised learning algorithms for the preparation of various data sets in different formats
prior to training.
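As an illustration of the clustering use case listed above, here is a minimal pure-Python sketch of k-means. The function name `kmeans`, the choice to initialise centroids from the first `k` points, and the sample data are assumptions for this sketch; a library implementation would normally be preferred in practice.

```python
def kmeans(points, k, iterations=10):
    """Cluster points into k groups; returns (centroids, assignments)."""
    # Deterministic initialisation: use the first k points as centroids.
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            assignments[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments


data = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0)]
centroids, assignments = kmeans(data, k=2)
print(assignments)
```

No labels are used anywhere: the grouping comes only from distances between the points.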

### Machine Learning Folder
### Main Tasks and implementation

* Contains code, examples and explanations on how to apply different types of ML methods to several problems.
