Please see the set of transform project conventions for details on general project conventions, transform configuration, testing, and IDE setup.
This transform serves as a template for transform writers. It performs no transformation on the input (i.e., it is a no-operation transform) and simply copies the input parquet files to the output directory. It shows the basics of creating a simple 1:1 table transform, and it implements a single configuration value to show how transform configuration is implemented.
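The shape of such a 1:1 no-op transform can be sketched as follows. This is a minimal, standard-library-only illustration and not the actual NOOPTransform code: the class name `SketchNoopTransform`, the `sleep_sec` key, the `nrows` metadata entry, and the use of a list of row dictionaries in place of a real table are all assumptions made for the example.

```python
import time
from typing import Any

# Hypothetical stand-in for a table: a list of row dictionaries.
Table = list[dict[str, Any]]


class SketchNoopTransform:
    """Illustrative no-op transform; not the real NOOPTransform class."""

    def __init__(self, config: dict[str, Any]):
        # A single configuration value, read with a default.
        self.sleep_sec = config.get("sleep_sec", 1)

    def transform(self, table: Table) -> tuple[list[Table], dict[str, Any]]:
        # Optionally sleep to simulate the time a real transform would take.
        if self.sleep_sec > 0:
            time.sleep(self.sleep_sec)
        # 1:1 transform: one input table yields exactly one output table,
        # unchanged, plus a small metadata dictionary.
        return [table], {"nrows": len(table)}


# Usage: the output is the same object that was passed in.
rows = [{"id": 1, "text": "a"}, {"id": 2, "text": "b"}]
noop = SketchNoopTransform({"sleep_sec": 0})
tables, metadata = noop.transform(rows)
print(tables[0] is rows, metadata)
```

Returning a list of tables, even though a no-op always returns exactly one, keeps the same signature usable for transforms that split or drop tables.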
The noop transform simply copies the input, so the output format is the same as the input.
Output column name | Data type | Description |
---|---|---|
same as input | same as input | same as input |
... | ... | ... |
The transform can be initialized with the following parameters, found in NOOPTransform:
Parameter | Default | Description |
---|---|---|
noop_sleep_sec | 1 | Number of seconds to sleep while inside the transform() method. This may be useful to simulate transform timings and to limit I/O bandwidth use. |
noop_pwd | None | Specifies a dummy password that is not included in metadata. Provided as an example of metadata that should not be included in logging. |
When running the transform with a launcher (i.e., TransformLauncher), the above are available as command-line options in addition to the options provided by the launcher.
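As a rough illustration of how the two parameters above could surface as command-line options, and how a value such as noop_pwd might be kept out of logged metadata, consider the sketch below. It uses only `argparse` from the standard library; the helper names `add_noop_arguments` and `loggable_params` are hypothetical and do not reflect how TransformLauncher is actually implemented.

```python
import argparse
from typing import Any


def add_noop_arguments(parser: argparse.ArgumentParser) -> None:
    # Hypothetical helper: registers the transform's parameters as CLI options.
    parser.add_argument("--noop_sleep_sec", type=int, default=1,
                        help="seconds to sleep inside transform()")
    parser.add_argument("--noop_pwd", type=str, default=None,
                        help="dummy password, kept out of logged metadata")


def loggable_params(params: dict[str, Any],
                    secret_keys: tuple[str, ...] = ("noop_pwd",)) -> dict[str, Any]:
    # Return a copy of the parameters with sensitive keys removed,
    # so that only this copy is ever written to logs or metadata.
    return {k: v for k, v in params.items() if k not in secret_keys}


parser = argparse.ArgumentParser(description="no-op transform options (sketch)")
add_noop_arguments(parser)
args = parser.parse_args(["--noop_sleep_sec", "10", "--noop_pwd", "secret"])
params = vars(args)

# The transform still receives the full parameters; logging sees the filtered view.
print(loggable_params(params))  # prints {'noop_sleep_sec': 10}
```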
First, we need a Python environment containing the Noop transform. We can create the virtual environment in the project:
make venv
source venv/bin/activate
or by installing the DPK transform wheel:
python -m venv venv
source venv/bin/activate
pip install data-prep-transforms
Now that we have a virtual environment containing the transform, we invoke the transform from the CLI using the runtime parameters and those from the transform itself (i.e., the table above). For example, to run the transform in the Python runtime:
make venv
source venv/bin/activate
python -m dpk_noop.runtime --noop_sleep_sec 10 \
--data_local '{ "input_folder": "test-data/input", "output_folder": "output" }'
deactivate
or in the Ray runtime using a local Ray cluster:
...
python -m dpk_noop.ray.runtime --run_locally True --noop_sleep_sec 10 \
--data_local '{ "input_folder": "test-data/input", "output_folder": "output" }'
...
or in the Spark runtime:
...
python -m dpk_noop.spark.runtime --noop_sleep_sec 10 \
--data_local '{ "input_folder": "test-data/input", "output_folder": "output" }'
...
To see the results of the transform, list the output directory:
ls output
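The `--data_local` value in the commands above is a JSON object, which is easy to misquote by hand. The sketch below assembles the same Python-runtime command programmatically so that the JSON and shell quoting are generated rather than typed. The function name `build_noop_command` is hypothetical; the module name and options are those shown in the commands above. The sketch only prints the command, because running it requires an environment with the transform installed.

```python
import json
import shlex
import sys


def build_noop_command(input_folder: str, output_folder: str,
                       sleep_sec: int = 10,
                       module: str = "dpk_noop.runtime") -> list[str]:
    # Serialize the folder configuration as the JSON expected by --data_local.
    data_local = json.dumps({"input_folder": input_folder,
                             "output_folder": output_folder})
    return [sys.executable, "-m", module,
            "--noop_sleep_sec", str(sleep_sec),
            "--data_local", data_local]


cmd = build_noop_command("test-data/input", "output")
# shlex.join quotes each argument so the line can be pasted into a POSIX shell.
print(shlex.join(cmd))
```

To actually run the transform from Python, the resulting list could be passed to `subprocess.run(cmd, check=True)` inside the activated virtual environment.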
To use the transform image to transform your data, please refer to the running images quickstart, substituting the name of this transform image and runtime as appropriate.