Merged

67 commits
- 79b33cf Fix error catching for recursive tree (Jan 10, 2025)
- 57bc785 fix default values (Jan 13, 2025)
- 605ddd3 run black (Jan 13, 2025)
- d71418c add tree priority (sjnarmstrong, Jan 14, 2025)
- 2d352fe Merge pull request #15 from capitec/feature/fix-default-value (sjnarmstrong, Jan 14, 2025)
- 0bc66cf Merge pull request #14 from capitec/feature/catch-error-recursive-tree (sjnarmstrong, Jan 14, 2025)
- 2934a2b Merge remote-tracking branch 'origin/dev' into feature/tree-priority (sjnarmstrong, Jan 14, 2025)
- ea4b1c6 black formatting (sjnarmstrong, Jan 14, 2025)
- 37a7d68 Merge pull request #16 from capitec/feature/tree-priority (sjnarmstrong, Jan 14, 2025)
- 3d26328 Started treelite tree (sjnarmstrong, Jan 14, 2025)
- 396a90c Treelite seems to be working and added some tests (sjnarmstrong, Jan 15, 2025)
- 0c89c07 Initialize project using Create React App (sjnarmstrong, Nov 25, 2024)
- 839fbc1 initial commit (sjnarmstrong, Nov 27, 2024)
- 084a093 failed custom minimap extension (sjnarmstrong, Nov 27, 2024)
- 97cebf5 initial sidebar (sjnarmstrong, Nov 27, 2024)
- 453e633 added side bar (sjnarmstrong, Nov 28, 2024)
- d6f4d48 WIP: playing around with curves and nodes (sjnarmstrong, Nov 28, 2024)
- e29a413 Got vertical flow working (sjnarmstrong, Nov 29, 2024)
- 267c33d changed to tree (sjnarmstrong, Dec 3, 2024)
- 55e4eb3 flattened nodes (sjnarmstrong, Dec 4, 2024)
- 8957c4b started using shadcn (sjnarmstrong, Dec 4, 2024)
- 4121d2e use uuid and different proxy (sjnarmstrong, Dec 5, 2024)
- 48caf24 Before changeing layout again to connections nested (sjnarmstrong, Dec 11, 2024)
- f59002a Changed layout again for nested conditions (sjnarmstrong, Dec 11, 2024)
- 6133165 Change to hex (sjnarmstrong, Dec 11, 2024)
- c5f669c Update some colors (sjnarmstrong, Dec 11, 2024)
- a1118fe Add buttons on nodes (sjnarmstrong, Dec 12, 2024)
- dcc6d85 Remove buttons off nodes (sjnarmstrong, Dec 12, 2024)
- 0498449 Before move to treelite (sjnarmstrong, Dec 31, 2024)
- 90f71e8 added treelite config attempting to replace rete with reactflow (sjnarmstrong, Jan 2, 2025)
- cc17847 Moved over to react-flow (sjnarmstrong, Jan 2, 2025)
- a47c51b before reworking some of the add logic (sjnarmstrong, Jan 3, 2025)
- bad65fa adds and reconnects working well now (sjnarmstrong, Jan 3, 2025)
- 43fd21a all edit boxes working (sjnarmstrong, Jan 3, 2025)
- fea59bb removing cycles seems to be mostly working i was able to get multiple… (sjnarmstrong, Jan 3, 2025)
- ab911e9 added format code and added more sidebar options going to do refactor… (sjnarmstrong, Jan 6, 2025)
- 538263e got formatting working really messy still (sjnarmstrong, Jan 7, 2025)
- 5426091 added node orders. subgraphs a bit oddly linked (sjnarmstrong, Jan 7, 2025)
- 4e5dfbf output editor looking good (sjnarmstrong, Jan 7, 2025)
- 2c935d0 added some more safety and checks (sjnarmstrong, Jan 7, 2025)
- 352a573 Better output table and nodes dont require labels to be manually updated (sjnarmstrong, Jan 8, 2025)
- fbb1b26 implemented save (sjnarmstrong, Jan 8, 2025)
- 2018e48 Make use of file uploads rather than server (sjnarmstrong, Jan 16, 2025)
- 36c3be6 Testing with config formats (sjnarmstrong, Jan 16, 2025)
- 60c6842 Cleanup testing and backend (sjnarmstrong, Jan 16, 2025)
- cdefd91 Removal of unneeded files for merge into main repo (sjnarmstrong, Jan 16, 2025)
- df87306 added gitignore (sjnarmstrong, Jan 16, 2025)
- 1889381 Add note button (sjnarmstrong, Jan 16, 2025)
- e6e760d Add in some readmes, work on inference a bit (sjnarmstrong, Jan 22, 2025)
- 725760c Additional changes and documentation (sjnarmstrong, Feb 21, 2025)
- c0ec8b6 Remove some outputs (sjnarmstrong, Feb 21, 2025)
- fd8b763 Dtree viz updates (sjnarmstrong, Feb 21, 2025)
- a1ccf87 Doc changes (sjnarmstrong, Mar 3, 2025)
- 6d066b3 fix craco build (sjnarmstrong, Apr 25, 2025)
- 5b340e5 Add the ui dockerfile (sjnarmstrong, Apr 25, 2025)
- ac589ce 🦠 (sjnarmstrong, Apr 25, 2025)
- 8f79147 make nginx more friendly for non-root (sjnarmstrong, Apr 25, 2025)
- fc10222 update dockerfile (sjnarmstrong, Apr 25, 2025)
- a23e8da fix the nginx serving (sjnarmstrong, Apr 29, 2025)
- 234a074 fist stab at range config editor (sjnarmstrong, May 7, 2025)
- 5f76eba Enable multiple branches (sjnarmstrong, May 9, 2025)
- 78fa51c Add default for default left (sjnarmstrong, Jun 20, 2025)
- 68f40e2 Formatting (sjnarmstrong, Jun 20, 2025)
- 054614d new version of formatter (sjnarmstrong, Jun 20, 2025)
- 7fccbaf add treelite (sjnarmstrong, Jun 20, 2025)
- f7f8b75 Fix tests (sjnarmstrong, Jun 20, 2025)
- f3366bd formatting (sjnarmstrong, Jun 20, 2025)
2 changes: 1 addition & 1 deletion .gitignore
```diff
@@ -165,4 +165,4 @@ cython_debug/
 # General
 .DS_Store

-.vscode
+.vscode
```
35 changes: 35 additions & 0 deletions docker/ui.Dockerfile
@@ -0,0 +1,35 @@
```dockerfile
FROM node:18-alpine AS builder

WORKDIR /app

COPY package.json package-lock.json ./
RUN npm ci

COPY src/ ./src/
COPY public/ ./public/
COPY tsconfig*.json ./*.js ./*.json ./

RUN npm run build
# This is useful for testing
# RUN npm i serve
# EXPOSE 3000
# CMD ["npx", "serve", "-s", "build"]

FROM nginx:stable-alpine

COPY --from=builder /app/build /usr/share/nginx/html

RUN addgroup -g 1000 -S appgroup && \
    adduser -u 1000 -S appuser -G appgroup && \
    mkdir -p /var/cache/nginx/client_temp && \
    mkdir -p /tmp/nginx && \
    chown -R 1000:1000 /var/cache/nginx && \
    chown -R 1000:1000 /usr/share/nginx/html && \
    chown -R 1000:1000 /tmp/nginx

COPY nginx.conf /etc/nginx/nginx.conf

USER 1000

EXPOSE 8000
CMD ["nginx", "-g", "daemon off;"]
```
3 changes: 0 additions & 3 deletions docs/getting_started/contributing.md

This file was deleted.

204 changes: 204 additions & 0 deletions docs/getting_started/contributing/architecure/index.md
@@ -0,0 +1,204 @@
## Spockflow Architecture Documentation - Landing Page


### **Directory Structure**

To better understand Spockflow’s architecture, let’s explore the key folders and their responsibilities within the package:

---

#### **1. `spockflow/`**
This is the core of the Spockflow framework, containing all of its primary modules and components.

---

##### **core.py**
- The main entry point for Hamilton integration.
- Defines a custom decorator for injecting Spockflow logic into Hamilton’s DAG framework.
- Expands Hamilton subdags with configurable components and generates nodes from these components.
- Calls `initialize_spock_module` to inject the Spockflow functionality into a given module, allowing for the automatic generation of Hamilton nodes.

Example of Hamilton subdag integration:
```python
@subdag(
    feature_modules,
    inputs={"path": source("source_path")},
    config={},
)
def feature_engineering(feature_df: pd.DataFrame) -> pd.DataFrame:
    return feature_df
```
- In Spockflow, the `initialize_spock_module` decorator ensures that subdags are expanded and executed according to the framework's configuration.
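At a high level, the job of `initialize_spock_module` is to expand configured components into plain functions and attach them to a module so that Hamilton's crawler can discover them. The following self-contained toy illustrates only that idea; the real API, its arguments, and the helper `make_node_fn` are assumptions, not Spockflow's actual implementation:

```python
import types


def make_node_fn(name: str, value: int):
    """Build a function that module-crawling would later pick up as a node."""
    def node_fn() -> int:
        return value
    node_fn.__name__ = name
    return node_fn


def initialize_toy_module(module: types.ModuleType, config: dict) -> None:
    # Mirror of the idea behind initialize_spock_module: turn each configured
    # component into a plain function and inject it into the target module.
    for name, value in config.items():
        setattr(module, name, make_node_fn(name, value))


# Usage: build a throwaway module and inject two "nodes" into it.
mod = types.ModuleType("toy_pipeline")
initialize_toy_module(mod, {"threshold": 10, "max_depth": 3})
print(mod.threshold(), mod.max_depth())  # -> 10 3
```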

---

##### **nodes.py**
- Contains the definition of `VariableNode`, the core class responsible for transforming configuration-driven logic into executable Hamilton nodes.
- Handles utilities such as `CloneVariableNode` (for duplicating nodes) and `AliasedVariableNode` (for renaming nodes without re-executing).
- Uses Pydantic classes to serialize and deserialize configuration, making it easier to manage node definitions and configurations.
- The `generate_nodes` function within `VariableNode` handles the actual creation of subnodes, ensuring that each node can be expanded within a Hamilton DAG.

```python
def _generate_nodes(self, ...):
    ...
    node_functions = inspect.getmembers(
        compiled_variable_node, predicate=self._does_define_node
    )
    ...
```
This method identifies and expands functions within a module as Hamilton nodes, ensuring that subcomponents can be injected into larger data pipelines.
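The discovery step can be reproduced in isolation: `inspect.getmembers` with a predicate returns exactly the members that satisfy it. Below is a minimal stand-in for `_does_define_node`; the marker-attribute check is an assumption about how the real predicate works, used here only to show the mechanism:

```python
import inspect


def _does_define_node(member) -> bool:
    # Stand-in predicate: treat any callable tagged with a marker as a node.
    return callable(member) and getattr(member, "_is_node", False)


class CompiledExample:
    def helper(self):  # no marker: ignored by discovery
        pass

    def conditions_met(self):
        pass
    conditions_met._is_node = True

    def get_results(self):
        pass
    get_results._is_node = True


# getmembers returns (name, member) pairs sorted by name.
node_functions = inspect.getmembers(CompiledExample(), predicate=_does_define_node)
print([name for name, _ in node_functions])  # -> ['conditions_met', 'get_results']
```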

---

##### **_serializable.py**
- Provides utilities to help with the serialization and deserialization of data, particularly for handling Pandas DataFrames and Series.
- Ensures that data passed through Spockflow nodes can be properly transformed and maintained across different steps of the pipeline.
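As a concrete illustration of the kind of round-trip such utilities must support, pandas' own `orient="split"` JSON form keeps the index alongside the values. This is a generic pandas sketch, not the actual `_serializable.py` implementation:

```python
import io

import pandas as pd

s = pd.Series([0.2, 0.5, 0.9], index=["low", "mid", "high"], name="score")

# Serialize: "split" keeps name, index, and data as separate JSON keys.
payload = s.to_json(orient="split")

# Deserialize back into an equivalent Series.
restored = pd.read_json(io.StringIO(payload), typ="series", orient="split")

assert list(restored) == [0.2, 0.5, 0.9]
assert list(restored.index) == ["low", "mid", "high"]
```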

---

#### **2. `components/`**
Contains all the core components and decision-oriented modules in Spockflow, including:
- **Decision Trees**: Build decision trees to enforce rules for data enrichment and transformations.
- **Scorecards**: Create scoring systems for evaluating data based on multiple parameters.
- **Decision Tables**: Define mappings of input values to outputs based on set conditions.

Each of these components is built as reusable modules that can be configured and inserted into your data flows.
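To make the decision-table idea concrete, here is a minimal hand-rolled version of the concept: an ordered set of (condition, output) rows where the first matching condition wins. This is not Spockflow's actual component, just the underlying pattern:

```python
from typing import Callable, Dict, List, Tuple

# A decision table: ordered (condition, output) rows plus a default output.
Rows = List[Tuple[Callable[[Dict], bool], str]]


def decide(rows: Rows, default: str, record: Dict) -> str:
    """Return the output of the first row whose condition matches the record."""
    for condition, output in rows:
        if condition(record):
            return output
    return default


rows: Rows = [
    (lambda r: r["income"] < 1000, "decline"),
    (lambda r: r["score"] >= 650, "approve"),
]

print(decide(rows, "refer", {"income": 5000, "score": 700}))  # -> approve
print(decide(rows, "refer", {"income": 500, "score": 700}))   # -> decline
print(decide(rows, "refer", {"income": 5000, "score": 600}))  # -> refer
```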

---

#### **3. `inference/`**
- Contains logic and tools to serve models via endpoints compatible with services like AWS SageMaker.
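SageMaker-style serving reduces to two callables: a health check and a handler that maps a request body to a response body. The framework-free sketch below shows only that contract; the rule inside `invocations` is hypothetical and stands in for running a compiled Spockflow pipeline:

```python
import json


def ping() -> int:
    """Health-check endpoint: return 200 when the model is ready to serve."""
    return 200


def invocations(body: bytes) -> bytes:
    """Score a JSON payload; a stand-in for executing the compiled DAG."""
    record = json.loads(body)
    # Hypothetical rule in place of a real Spockflow pipeline run.
    decision = "approve" if record.get("score", 0) >= 650 else "decline"
    return json.dumps({"decision": decision}).encode()


assert ping() == 200
print(invocations(b'{"score": 700}'))  # -> b'{"decision": "approve"}'
```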

---


### **How to Define a Custom Node in Spockflow**

In Spockflow, custom nodes allow users to extend the framework's functionality by creating new components that integrate seamlessly into the Hamilton DAG-based architecture. A custom node is a class that inherits from `VariableNode` and defines its own behavior for node creation, input handling, and execution.

Here, we'll define a custom `Tree` node as an example of how to create a custom decision-making process using Spockflow’s infrastructure.

#### **Step 1: Define the Custom Node Class**

To create a custom node, you need to subclass `VariableNode` and define several key components, such as input fields, the `compile()` method, and custom logic for handling inputs and outputs.

```python
class Tree(VariableNode):
    # This is used in visualisation by Hamilton
    doc: str = "This executes a user-defined decision tree"

    # Define fields using Pydantic (these can be any fields for configuration)
    execution_conditions: typing.List[str]
    execution_outputs: typing.List[str]

    # The compile function needs to be provided. By default, it will just return self.
    def compile(self):
        # This step may involve transforming or processing the input data
        # into a usable format
        from .compiled import CompiledNumpyTree

        return CompiledNumpyTree(self)
```

- `execution_conditions` and `execution_outputs` are lists of strings that define the conditions and outputs associated with the decision tree.
- The `compile()` function is responsible for transforming the raw input data into a format that can be used by the Hamilton DAG. In this case, it initializes a `CompiledNumpyTree`.

#### **Step 2: Define a Compiled Representation for the Node**

To optimize how the node’s logic is executed, we can define a compiled version of the node, such as `CompiledNumpyTree`. This compiled version will contain the logic to handle the execution and manage inputs dynamically.

```python
class CompiledNumpyTree:
    def __init__(self, tree: Tree) -> None:
        # This constructor will process and configure the tree into an executable form
        self.tree = tree
        # Additional processing logic for the tree can go here

    def _get_inputs(self, function: typing.Callable):
        # Returns the expected input types for the node
        node_input_types = {o: pd.DataFrame for o in self.tree.execution_outputs}
        node_input_types.update(
            {
                c: typing.Union[np.ndarray, pd.Series]
                for c in self.tree.execution_conditions
            }
        )
        return node_input_types
```

- The `CompiledNumpyTree` class is responsible for transforming the raw `Tree` object into an optimized version that can be used in a Hamilton DAG.
- The `_get_inputs` function dynamically determines the input types required for this node’s execution.

#### **Step 3: Define Node Functions with `@creates_node`**

Next, we define the various operations that make up the logic of our custom `Tree` node. These operations are implemented as functions within the `Tree` class and are decorated with `@creates_node`. The `@creates_node` decorator tells Spockflow to treat these methods as subnodes within the Hamilton DAG.

```python
import typing

import numpy as np
import pandas as pd

from spockflow.nodes import creates_node


class Tree(VariableNode):
    # Other fields and compile method defined previously

    @creates_node(kwarg_input_generator="_get_inputs")  # Generates node inputs dynamically
    def format_inputs(
        self, **kwargs: typing.Union[pd.DataFrame, pd.Series]
    ) -> TFormatData:
        # Process inputs and return transformed data
        pass

    @creates_node()  # Defines a subnode for conditions met
    def conditions_met(self, format_inputs: TFormatData) -> np.ndarray:
        # Logic for evaluating conditions based on inputs
        pass

    @creates_node()  # Defines a subnode for prioritizing conditions
    def prioritized_conditions(self, conditions_met: np.ndarray) -> np.ndarray:
        # Logic for prioritizing conditions
        pass

    @creates_node()  # Defines a subnode for generating condition names
    def condition_names(self, format_inputs: TFormatData) -> typing.List[str]:
        # Logic to generate the names of the conditions
        pass

    @creates_node()  # Defines a subnode for the final decision logic
    def all(
        self,
        format_inputs: TFormatData,
        conditions_met: np.ndarray,
    ) -> pd.DataFrame:
        # Logic for making a decision based on inputs and conditions
        pass

    @creates_node(is_namespaced=False)  # This node is created outside the namespace
    def get_results(
        self,
        format_inputs: TFormatData,
        prioritized_conditions: np.ndarray,
    ) -> pd.DataFrame:
        # Final output of the decision tree process
        pass
```

- **`@creates_node`**: This decorator defines the function as a subnode in the DAG.
- The `kwarg_input_generator="_get_inputs"` argument is used to specify how to dynamically determine the input types for this node.
- Each method, such as `format_inputs()`, `conditions_met()`, etc., corresponds to a specific operation in the decision tree.

Instantiating the tree above:

```python
# Example Tree node instance
example_tree = Tree(execution_conditions=["a", "b"], execution_outputs=["c", "d"])
```

produces the following DAG:
![](./tree.drawio.svg)

- The above relationships represent the connections between nodes, where each `@creates_node` function becomes part of the Hamilton DAG.
- The `example_tree.format_inputs` node takes inputs `a`, `b`, `c`, and `d`, and feeds them into subsequent nodes like `conditions_met`, `prioritized_conditions`, and others.
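The automatic linking works because parameter names double as upstream node names. That wiring rule can be reproduced with `inspect.signature` alone; the sketch below is a simplified model of what Hamilton does, ignoring namespacing and type checking:

```python
import inspect


# Toy subnode functions: each parameter name refers to an upstream node.
def format_inputs(a, b): ...
def conditions_met(format_inputs): ...
def prioritized_conditions(conditions_met): ...
def get_results(format_inputs, prioritized_conditions): ...


def edges(*functions):
    """Derive DAG edges (upstream -> node) from each function's parameter names."""
    out = []
    for fn in functions:
        for param in inspect.signature(fn).parameters:
            out.append((param, fn.__name__))
    return out


for upstream, node in edges(
    format_inputs, conditions_met, prioritized_conditions, get_results
):
    print(f"{upstream} -> {node}")
```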

---

### **Summary of Custom Node Creation Steps**

1. **Define the Node Class**: Inherit from `VariableNode` and specify fields such as conditions and outputs.
2. **Compile the Node**: Provide a `compile()` method to transform the node into an optimized executable form (e.g., `CompiledNumpyTree`).
3. **Define Operations as Subnodes**: Use `@creates_node` to define methods that represent different parts of the decision-making process.
4. **Establish Dependencies**: The created subnodes will automatically link based on their input/output relationships, forming a complete DAG.

By following these steps, you can define complex, decision-oriented nodes in Spockflow and integrate them seamlessly into a Hamilton-based data pipeline.