Commit 0fb543b

davitbzh and bubriks authored
[FSTORE-1008] enable interacting with java client to hopsworks (#344)
Co-authored-by: davitbzh <[email protected]> Co-authored-by: Ralf <[email protected]>
1 parent a700919 commit 0fb543b

4 files changed, +82 -13 lines changed

docs/user_guides/client_installation/index.md

Lines changed: 13 additions & 0 deletions
@@ -69,6 +69,19 @@ The HSFS library is available on the Hopsworks' Maven repository. If you are usi
6969

7070
The library has different builds targeting different environments:
7171

72+
### HSFS Java
73+
74+
The `artifactId` for the HSFS Java build is `hsfs`. If you are using Maven as your build tool, you can add the following dependency:
75+
76+
```xml
77+
<dependency>
78+
    <groupId>com.logicalclocks</groupId>
79+
    <artifactId>hsfs</artifactId>
80+
    <version>${hsfs.version}</version>
81+
</dependency>
82+
```
83+
84+
7285
### Spark
7386

7487
The `artifactId` for the Spark build is `hsfs-spark-spark{spark.version}`, if you are using Maven as build tool, you can add the following dependency:

docs/user_guides/fs/compute_engines.md

Lines changed: 17 additions & 13 deletions
@@ -4,12 +4,13 @@ In order to execute a feature pipeline to write to the Feature Store, as well as
44
Hopsworks Feature Store APIs are built around dataframes, that means feature data is inserted into the Feature Store from a Dataframe and likewise when reading data from the Feature Store, it is returned
55
as a Dataframe.
66

7-
As such, Hopsworks supports three computational engines:
7+
As such, Hopsworks supports five computational engines:
88

99
1. [Apache Spark](https://spark.apache.org): Spark Dataframes and Spark Structured Streaming Dataframes are supported, both from Python environments (PySpark) and from Scala environments.
1010
2. [Python](https://www.python.org/): For pure Python environments without dependencies on Spark, Hopsworks supports [Pandas Dataframes](https://pandas.pydata.org/) and [Polars Dataframes](https://pola.rs/).
1111
3. [Apache Flink](https://flink.apache.org): Flink Data Streams are currently supported as an experimental feature from Java/Scala environments.
12-
3. [Apache Beam](https://beam.apache.org/) *experimental*: Beam Data Streams are currently supported as an experimental feature from Java/Scala environments.
12+
4. [Apache Beam](https://beam.apache.org/) *experimental*: Beam Data Streams are currently supported as an experimental feature from Java/Scala environments.
13+
5. [Java](https://www.java.com): For pure Java environments without dependencies on Spark, Hopsworks supports writing feature data from a `List` of POJOs.
1314

1415
Hopsworks supports running [compute on the platform itself](../../concepts/dev/inside.md) in the form of [Jobs](../projects/jobs/pyspark_job.md) or in [Jupyter Notebooks](../projects/jupyter/python_notebook.md).
1516
Alternatively, you can also connect to Hopsworks using Python or Spark from [external environments](../../concepts/dev/outside.md), given that there is network connectivity.
@@ -18,17 +19,16 @@ Alternatively, you can also connect to Hopsworks using Python or Spark from [ext
1819

1920
Hopsworks is aiming to provide functional parity between the computational engines, however, there are certain Hopsworks functionalities which are exclusive to the engines.
2021

21-
| Functionality | Method | Spark | Python | Flink | Beam | Comment |
22-
| ----------------------------------------------------------------- | ------ | ----- | ------ | ------ | ------ | ------- |
23-
| Feature Group Creation from dataframes | [`FeatureGroup.create_feature_group()`](https://docs.hopsworks.ai/hopsworks-api/{{{ hopsworks_version }}}/generated/api/feature_group_api/#create_feature_group) | :white_check_mark: | :white_check_mark: | - | - | Currently Flink/Beam doesn't support registering feature group metadata. Thus it needs to be pre-registered before you can write real time features computed by Flink/Beam.|
24-
| Training Dataset Creation from dataframes | [`TrainingDataset.save()`](https://docs.hopsworks.ai/hopsworks-api/{{{ hopsworks_version }}}/generated/api/training_dataset_api/#save) | :white_check_mark: | - | - | - | Functionality was deprecated in version 3.0 |
25-
| Data validation using Great Expectations for streaming dataframes | [`FeatureGroup.validate()`](https://docs.hopsworks.ai/hopsworks-api/{{{ hopsworks_version }}}/generated/api/feature_group_api/#validate) [`FeatureGroup.insert_stream()`](https://docs.hopsworks.ai/hopsworks-api/{{{ hopsworks_version }}}/generated/api/feature_group_api/#insert_stream) | - | - | - | - | `insert_stream` does not perform any data validation even when a expectation suite is attached. |
26-
| Stream ingestion | [`FeatureGroup.insert_stream()`](https://docs.hopsworks.ai/hopsworks-api/{{{ hopsworks_version }}}/generated/api/feature_group_api/#insert_stream) | :white_check_mark: | - | :white_check_mark: | :white_check_mark: | Python/Pandas/Polars has currently no notion of streaming. |
27-
| Stream ingestion | [`FeatureGroup.insert_stream()`](https://docs.hopsworks.ai/hopsworks-api/{{{ hopsworks_version }}}/generated/api/feature_group_api/#insert_stream) | :white_check_mark: | - | :white_check_mark: | :white_check_mark: | Python/Pandas/Polars has currently no notion of streaming. |
28-
| Reading from Streaming Storage Connectors | [`KafkaConnector.read_stream()`](https://docs.hopsworks.ai/hopsworks-api/{{{ hopsworks_version }}}/generated/api/storage_connector_api/#read_stream) | :white_check_mark: | - | - | - | Python/Pandas/Polars has currently no notion of streaming. For Flink/Beam only write operations are supported |
29-
| Reading training data from external storage other than S3 | [`FeatureView.get_training_data()`](https://docs.hopsworks.ai/hopsworks-api/{{{ hopsworks_version }}}/generated/api/feature_view_api/#get_training_data) | :white_check_mark: | - | - | - | Reading training data that was written to external storage using a Storage Connector other than S3 can currently not be read using HSFS APIs, instead you will have to use the storage's native client. |
30-
| Reading External Feature Groups into Dataframe | [`ExternalFeatureGroup.read()`](https://docs.hopsworks.ai/hopsworks-api/{{{ hopsworks_version }}}/generated/api/external_feature_group_api/#read) | :white_check_mark: | - | - | - | Reading an External Feature Group directly into a Pandas/Polars Dataframe is not supported, however, you can use the [Query API](https://docs.hopsworks.ai/hopsworks-api/{{{ hopsworks_version }}}/generated/api/query_api/) to create Feature Views/Training Data containing External Feature Groups. |
31-
| Read Queries containing External Feature Groups into Dataframe | [`Query.read()`](https://docs.hopsworks.ai/hopsworks-api/{{{ hopsworks_version }}}/generated/api/query_api/#read) | :white_check_mark: | - | - | - | Reading a Query containing an External Feature Group directly into a Pandas/Polars Dataframe is not supported, however, you can use the Query to create Feature Views/Training Data and write the data to a Storage Connector, from where you can read up the data into a Pandas/Polars Dataframe. |
22+
| Functionality | Method | Spark | Python | Flink | Beam | Java | Comment |
23+
| ----------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------ | ------------------ | ---------------------- | ------------------ | ------------------ |------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
24+
| Feature Group Creation from dataframes | [`FeatureGroup.create_feature_group()`](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/feature_group_api/#create_feature_group) | :white_check_mark: | :white_check_mark: | - | - | - | Currently Flink/Beam/Java do not support registering feature group metadata, so the feature group needs to be pre-registered before you can write real-time features computed by Flink/Beam/Java. |
25+
| Training Dataset Creation from dataframes | [`TrainingDataset.save()`](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/training_dataset_api/#save) | :white_check_mark: | - | - | - | - | Functionality was deprecated in version 3.0 |
26+
| Data validation using Great Expectations for streaming dataframes | [`FeatureGroup.validate()`](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/feature_group_api/#validate) <br/> [`FeatureGroup.insert_stream()`](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/feature_group_api/#insert_stream) | - | - | - | - | - | `insert_stream` does not perform any data validation even when an expectation suite is attached. |
27+
| Stream ingestion | [`FeatureGroup.insert_stream()`](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/feature_group_api/#insert_stream) | :white_check_mark: | - | :white_check_mark: | :white_check_mark: | :white_check_mark: | Python/Pandas/Polars currently have no notion of streaming. |
28+
| Reading from Streaming Storage Connectors | [`KafkaConnector.read_stream()`](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/storage_connector_api/#read_stream) | :white_check_mark: | - | - | - | - | Python/Pandas/Polars currently have no notion of streaming. For Flink/Beam/Java, only write operations are supported. |
29+
| Reading training data from external storage other than S3 | [`FeatureView.get_training_data()`](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/feature_view_api/#get_training_data) | :white_check_mark: | - | - | - | - | Training data that was written to external storage using a Storage Connector other than S3 cannot currently be read using HSFS APIs. Instead, you will have to use the storage's native client. |
30+
| Reading External Feature Groups into Dataframe | [`ExternalFeatureGroup.read()`](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/external_feature_group_api/#read) | :white_check_mark: | - | - | - | - | Reading an External Feature Group directly into a Pandas/Polars Dataframe is not supported. However, you can use the [Query API](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/query_api/) to create Feature Views/Training Data containing External Feature Groups. |
31+
| Read Queries containing External Feature Groups into Dataframe | [`Query.read()`](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/query_api/#read) | :white_check_mark: | - | - | - | - | Reading a Query containing an External Feature Group directly into a Pandas/Polars Dataframe is not supported. However, you can use the Query to create Feature Views/Training Data and write the data to a Storage Connector, from where you can read the data into a Pandas/Polars Dataframe. |
3232

3333
## Python
3434

@@ -77,3 +77,7 @@ Apache Beam integration with Hopsworks feature store was only tested using Dataf
7777

7878
For more details head over to the [Getting Started Guide](https://github.com/logicalclocks/hopsworks-tutorials/tree/master/integrations/java/beam).
7979

80+
## Java
81+
It is also possible to interact with the Hopsworks Feature Store from pure Java environments without dependencies on Spark, Flink, or Beam.
82+
83+
For more details head over to the [Getting Started Guide](https://github.com/logicalclocks/hopsworks-tutorials/tree/master/java).
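
The Java engine writes feature data from a `List` of POJOs. As a rough sketch of what such a list could look like, assume a hypothetical `Transaction` class with the fields `id` and `amount`. Neither the class nor the field names come from HSFS, and the call that hands the list to a feature group is deliberately left out; the linked guide covers that part.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: shows the shape of a "List of POJOs", not an HSFS API.
class PojoRowsSketch {

    // Hypothetical POJO: one instance corresponds to one feature row.
    static class Transaction {
        private final long id;
        private final double amount;

        Transaction(long id, double amount) {
            this.id = id;
            this.amount = amount;
        }

        long getId() {
            return id;
        }

        double getAmount() {
            return amount;
        }
    }

    // Builds the list of rows that would later be written to a feature group.
    static List<Transaction> buildRows() {
        List<Transaction> rows = new ArrayList<>();
        rows.add(new Transaction(1L, 42.5));
        rows.add(new Transaction(2L, 12.0));
        return rows;
    }

    public static void main(String[] args) {
        List<Transaction> rows = buildRows();
        System.out.println(rows.size() + " rows prepared");
    }
}
```

Whether the POJO fields have to match the feature names of a pre-registered feature group is an assumption worth checking against the guide before relying on this layout.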

docs/user_guides/integrations/index.md

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@
33
Hopsworks is an open platform aiming to be accessible from a variety of tools. Learn in this section how to connect to Hopsworks from
44

55
- [Python, AWS SageMaker, Google Colab, Kubeflow](python)
6+
- [Java](java)
67
- [Databricks](databricks/networking)
78
- [AWS EMR](emr/emr_configuration)
89
- [Azure HDInsight](hdinsight)

docs/user_guides/integrations/java.md

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
1+
---
2+
description: Documentation on how to connect to Hopsworks from a Java client.
3+
---
4+
5+
# Java client
6+
7+
This guide explains step by step how to connect to Hopsworks from a Java client.
8+
9+
10+
## Generate an API key
11+
12+
For instructions on how to generate an API key, follow this [user guide](../projects/api_key/create_api_key.md). For the Java client to work correctly, make sure you add the following scopes to your API key:
13+
14+
1. featurestore
15+
2. project
16+
3. job
17+
4. kafka
18+
19+
## Connecting to the Feature Store
20+
21+
You are now ready to connect to the Hopsworks Feature Store from a Java client:
22+
23+
```Java
24+
// Import the necessary classes
25+
import com.logicalclocks.hsfs.FeatureStore;
26+
import com.logicalclocks.hsfs.FeatureView;
27+
import com.logicalclocks.hsfs.HopsworksConnection;
import java.util.HashMap;
import java.util.List;
28+
29+
// Establish a connection with Hopsworks
30+
HopsworksConnection hopsworksConnection = HopsworksConnection.builder()
31+
    .host("my_instance") // DNS of your Feature Store instance
32+
    .port(443) // Port to reach your Hopsworks instance, defaults to 443
33+
    .project("my_project") // Name of your Hopsworks Feature Store project
34+
    .apiKeyValue("api_key") // The API key to authenticate with the feature store
35+
    .hostnameVerification(false) // Disable for self-signed certificates
36+
    .build();
37+
38+
// Get the feature store handle
39+
FeatureStore fs = hopsworksConnection.getFeatureStore();
40+
41+
// Get the feature view handle (fvName and fvVersion are placeholders for your feature view's name and version)
42+
FeatureView fv = fs.getFeatureView(fvName, fvVersion);
43+
44+
// Get a single feature vector for the entry whose "id" is 100
45+
List<Object> singleVector = fv.getFeatureVector(new HashMap<String, Object>() {{
46+
    put("id", 100);
47+
}});
48+
```
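
The double-brace `HashMap` in the snippet above creates an anonymous subclass at the call site. A minimal alternative sketch builds the lookup key with a small helper instead. It assumes that `getFeatureVector` accepts any `Map<String, Object>`, which the call above suggests but which is not confirmed here; `EntityKeySketch` and `entityKey` are hypothetical names, not part of HSFS.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative helper: builds the key map for a single-entry lookup.
class EntityKeySketch {

    // Returns a mutable map holding exactly one key/value pair.
    static Map<String, Object> entityKey(String name, Object value) {
        Map<String, Object> key = new HashMap<>();
        key.put(name, value);
        return key;
    }

    public static void main(String[] args) {
        Map<String, Object> key = entityKey("id", 100);
        System.out.println(key);
    }
}
```

Under that assumption, the resulting map could be passed in place of the inline `HashMap`, for example as `fv.getFeatureVector(EntityKeySketch.entityKey("id", 100))`.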
49+
50+
## Next Steps
51+
For more information on how to interact with the Hopsworks Feature Store from a Java client, follow this [tutorial](https://github.com/logicalclocks/hopsworks-tutorials/tree/java_engine/java).
