Joyce is a highly scalable event-driven Cloud Native Data Hub.
Ok! Wait, what? Joyce lets you ingest data from (almost) any source and automatically expose the ingested data as standard APIs (REST, event notification). To tell Joyce which data to pick from the incoming stream and what the APIs will look like, you describe the expected behaviour with a DSL based on json-schema.
From a high level perspective Joyce performs 4 tasks:
- acquire content produced from different sources.
- transform the raw content with a DSL (a schema).
- store it somewhere (to a sink).
- serve the result of this process with an automatic REST API.
Documentation is available here
cd joyce-compose
docker-compose up -d

This will start up:
- a single node kafka instance persisted under the data directory
- a single node zookeeper instance persisted under the data directory
- a single node mongodb instance persisted under the data directory
- AKHQ to monitor kafka topics exposed at localhost:6680
- joyce-import-gateway exposing its API at localhost:6651
- joyce-mongodb-sink to store processed content to mongodb
- joyce-api exposed at localhost:6650 to consume processed content.
First of all we have to store a schema that tells the system how to project the content we import inside Joyce.
A schema is an enhanced json-schema with keywords that tell Joyce how to transform/project content.
For complete documentation on schemas, go here
You can write a schema in json or yaml.
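To make the idea of projection concrete, here is a minimal Python sketch of how keywords such as `$path`, `$fixed` and `$script` might map a source record to an output document. It is not Joyce's implementation: the `project` function is hypothetical, only flat `$.key` paths are handled, and the field names are borrowed from the example schema used in this guide (the email fields are left out).

```python
# Hypothetical sketch of schema projection; NOT Joyce's real engine.
source = {
    "user_id": 1337,
    "first_name": "Jon",
    "last_name": "Snow",
    "state": "Westeros",
}

# A subset of the example schema's properties, written as a Python dict.
schema_properties = {
    "code": {"type": "integer", "$path": "$.user_id"},
    "name": {"type": "string", "$path": "$.first_name"},
    "surname": {"type": "string", "$path": "$.last_name"},
    "full_name": {
        "type": "string",
        "$script": {
            "language": "python",
            "code": "'_'.join([source['first_name'].upper(), source['last_name'].upper()])",
        },
    },
    "kind": {"type": "string", "$fixed": "SimpleUser"},
}


def project(properties, source):
    """Build an output document by applying one keyword per property."""
    result = {}
    for field, spec in properties.items():
        if "$fixed" in spec:
            # Constant value, independent of the source record.
            result[field] = spec["$fixed"]
        elif "$path" in spec:
            # Assumption: only flat paths of the form "$.key" are supported here.
            result[field] = source[spec["$path"].removeprefix("$.")]
        elif "$script" in spec:
            # eval() is for illustration only; never run untrusted code this way.
            result[field] = eval(spec["$script"]["code"], {}, {"source": source})
    return result


projected = project(schema_properties, source)
print(projected)
```

Fields of the source that no property refers to, such as `state`, simply do not appear in the projected document.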
Let's try to save one.
cat > import-user.yaml <<- "EOF"
$schema: https://joyce.sourcesense.com/v1/schema
$metadata:
  subtype: import
  namespace: default
  name: user
  description: A test schema
  development: true
  uid: code
  collection: users
type: object
properties:
  code:
    type: integer
    $path: $.user_id
  name:
    type: string
    $path: $.first_name
  surname:
    type: string
    $path: $.last_name
  full_name:
    type: string
    $script:
      language: python
      code: "'_'.join([source['first_name'].upper(), source['last_name'].upper()])"
  email:
    type: string
  email_checks:
    type: object
    $rest:
      url: "https://api.eva.pingutil.com/email?email={{email}}"
      method: GET
      headers:
        Content-Type: application/json
      vars:
        email: "$.email"
      extract: "$.data"
    properties:
      valid:
        type: boolean
        $path: $.valid_syntax
      disposable:
        type: boolean
      spam:
        type: boolean
  kind:
    type: string
    $fixed: "SimpleUser"
EOF

Now we have to save the schema to the import-gateway component:
curl -X POST -H "Content-Type: application/x-yaml" --data-binary @import-user.yaml http://localhost:6651/api/schema

Now your schema is ready to be used by the API. You can check it by going to http://localhost:6651/api/schema/import/default/user.
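The same check can be scripted. The sketch below builds the schema URL from its subtype, namespace and name, following the pattern of the URL above, and fetches it with Python's standard library. The helper names `schema_url` and `fetch_schema` are hypothetical, and the sketch assumes the endpoint answers with a JSON body.

```python
import json
import urllib.request

# Address of the import-gateway in the local compose setup.
IMPORT_GATEWAY = "http://localhost:6651"


def schema_url(subtype, namespace, name, base=IMPORT_GATEWAY):
    """Return the URL under which a stored schema can be inspected."""
    return f"{base}/api/schema/{subtype}/{namespace}/{name}"


def fetch_schema(subtype, namespace, name):
    """Fetch a stored schema; assumes the response body is JSON."""
    url = schema_url(subtype, namespace, name)
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)
```

With the stack running, `fetch_schema("import", "default", "user")` should return the schema saved in the previous step.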
If you go to http://localhost:6650/docs you'll see a swagger interface with no resources. That's because resources derive from schemas and must be configured to be exposed.
Create a file schemas.json with this content:
cat > schemas.json <<- "EOF"
{
  "schemas": {
    "test-users": {
      "source": "http://import-gateway:6651/api/schema/import/default/user"
    }
  }
}
EOF

Edit the docker compose file to expose schemas.json as a volume:
  rest:
    image: sourcesense/joyce-rest:latest
    ports:
      - "6650:6650"
    environment:
      - MONGO_URI=mongodb://user:password@mongodb:27017/joyce
+     - SCHEMAS_SOURCE=/opt/schemas.json
+   volumes:
+     - ./schemas.json:/opt/schemas.json
    links:
      - mongodb
      - import-gateway

Now restart the api to load the schema:
docker-compose stop rest
docker-compose up -d rest

Check swagger again at http://localhost:6650/docs and you'll see your resource.
Now you are ready to store content to the import-gateway:
curl -0 -v "http://localhost:6651/api/import" \
  -H 'Content-Type: application/json; charset=utf-8' \
  -H "accept: application/json; charset=utf-8" \
  -H "X-Joyce-Schema-Id: joyce://schema/import/default.user" \
  --data-binary @- << EOF
{
  "user_id": 1337,
  "first_name": "Jon",
  "last_name": "Snow",
  "email": "[email protected]",
  "state": "Westeros"
}
EOF

Your content should be transformed soon and can be retrieved using the API:
curl http://localhost:6650/api/test-users

If anything goes wrong, notifications of errors and successes during processing are published on the joyce_notification topic on kafka. You can inspect them easily using AKHQ at localhost:6680.
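The import and retrieval steps can also be driven from a script. The following is a minimal sketch using only Python's standard library. The URLs, headers and schema id are the ones used in the curl commands in this guide, while the function names are hypothetical and the sketch assumes that the REST API replies with JSON.

```python
import json
import urllib.request

IMPORT_URL = "http://localhost:6651/api/import"
REST_URL = "http://localhost:6650/api/test-users"
SCHEMA_ID = "joyce://schema/import/default.user"


def build_import_request(record):
    """Prepare the POST request that sends one record to the import-gateway."""
    return urllib.request.Request(
        IMPORT_URL,
        data=json.dumps(record).encode("utf-8"),
        headers={
            "Content-Type": "application/json; charset=utf-8",
            "Accept": "application/json; charset=utf-8",
            "X-Joyce-Schema-Id": SCHEMA_ID,
        },
        method="POST",
    )


def import_record(record):
    """Send the record and return the HTTP status code of the response."""
    with urllib.request.urlopen(build_import_request(record), timeout=10) as response:
        return response.status


def list_users():
    """Read the processed content back; assumes the body is JSON."""
    with urllib.request.urlopen(REST_URL, timeout=10) as response:
        return json.load(response)
```

With the stack running, call `import_record(...)` with a record shaped like the JSON body above, then `list_users()` once processing has finished. Because processing is asynchronous, the record may not be visible immediately.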