In order to learn more about microservices, I figured...
What better way than with a Mike Rowe Service?
🙊😜
A demo/sample project using Spring Boot, Kafka, REST, SQL, NoSQL, GraphQL, Gradle multimodule builds, etc., in a microservice architecture.
This is not affiliated with Mike Rowe, Dirty Jobs, 'The Way I Heard It', or any other Mike-Rowe-based entity/product in ANY way.
- 🤓 This repo and its contents exist solely for the purpose of 🔬🧠 learning/playing with a few technologies and concepts (in no particular order):
- Microservice Architecture (see: Architecture)
- Spring Boot (throughout)
- REST APIs (see: Client API)
- SQL with Spring Boot JPA (details TBD, maybe Postgres? Might also remove this module.)
- NoSQL (in this case, MongoDB)
- GraphQL (see: GraphQL Adapter)
- Elasticsearch (see: Transcript Service)
- Kafka (Streams) (see: Sentiment Analysis Stream Processor)
- a monorepo
- Gradle multi-module builds, and more.
- 🚧 This is nowhere near complete – not even in the "do an end-to-end 'hello world' test" sense.
- 🤨 There is a disconnect between current code structure and the diagram below. Very much a work-in-progress!
- 🐣 There is no plan to fully implement each of the modules/services.
- 🥸 Data will be canned and mocked in most cases.
- 🐳 For now, this uses `docker-compose`.
- There is a tentative future plan to deploy this with K8s since I'd like to get more hands-on with that too, but no ETA on that.
- 🚨 This was created to learn about these technologies/concepts. ⚠️ DO NOT consider this a reference for how to do anything the "right" way! ⚠️
- 🙈 Apologies in advance for the lack of tests and Javadocs; I plan to add both. The plan was to get an outline with stubbed modules in place first.
- 👋 Please don't hesitate to reach out with suggestions etc.
Essentially, having lacked any production microservice experience prior to this repo's inception, I wanted a "playground" in which to explore/learn a bunch of concepts and technologies.
This is my "thinking out loud" diagram for what this could look like.
This is NOT FINAL, nor is it a reflection of the code as it currently exists.
There are several notes embedded in the diagram which describe nuances, considerations, or theoretical "what if" ideas.
All timestamps are either Unix epoch millis (converter), as indicated by the variable/field name, or ISO-8601 (ex: 2012-04-23T18:25:43.511Z).
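For illustration, converting between the two forms takes nothing beyond standard `java.time`:

```java
import java.time.Instant;

// Unix epoch millis <-> ISO-8601 (the two timestamp forms used in the diagram)
long epochMillis = 1335205543511L;
Instant instant = Instant.ofEpochMilli(epochMillis);
System.out.println(instant);                // 2012-04-23T18:25:43.511Z
System.out.println(instant.toEpochMilli()); // 1335205543511
```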
These are subject to change; this is a preliminary structure based on some early plans that has yet to be adapted to a (slightly) more well-considered design.
├── README.md
├── build.gradle
├── client-api
├── doc
│ └── architecture-summary.png
├── docker
│ └── mapped-volumes
├── docker-compose.yml
├── libs
│ ├── common
│ ├── db-adapter (removed)
│ ├── graphql-adapter
│ ├── kafka-adapter
│ ├── model
│ └── mongo-adapter
├── services
│ ├── imdb-service
│ ├── notification-service
│ ├── podcast-service
│ ├── transcript-service
│ └── twitter-service
├── settings.gradle
└── stream-processors
└── sentiment-processor
There is a client API that is the "main" REST API for accessing this Mike Rowe "Service" backend.
The exact endpoints are TBD for now, but some ideas include:
- Get the link to the latest podcast that mentions `<topic>`.
- Retrieve the last tweet from Mike that has more than `<#>` likes.
- Get the highest-rated episode of Dirty Jobs since `<date>`.
When deployed locally, the endpoints will be exposed at: localhost:8080/api/query.
☝️ TIP: You can discover the defined endpoints with the following command:
egrep -R --include "*.java" '@RequestMapping|@GetMapping|@PostMapping|@PutMapping|@PatchMapping|@DeleteMapping'
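As a rough sketch of what one of these endpoints might look like (the path, parameter, and `PodcastService`/`PodcastEpisode` types here are illustrative assumptions, not the actual implementation):

```java
import org.springframework.web.bind.annotation.*;

// Hypothetical controller sketch -- endpoint shape and names are illustrative only.
@RestController
@RequestMapping("/api/query")
public class QueryController {

    private final PodcastService podcastService; // assumed service-layer interface

    public QueryController(PodcastService podcastService) {
        this.podcastService = podcastService;
    }

    // e.g. GET localhost:8080/api/query/podcast/latest?topic=Colorado
    @GetMapping("/podcast/latest")
    public PodcastEpisode latestPodcastMentioning(@RequestParam String topic) {
        return podcastService.findLatestMentioning(topic); // canned data for now
    }
}
```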
This library contains common utilities/tools that are shared between modules. These can be included with:
api project(":common")
This library contains POJO structures that serve as a common data model between services.
api project(":model")
The data model uses code generation courtesy of the Netflix DGS CodeGen plugin for GraphQL. This lets us leverage the widely used and flexible GraphQL style of schema definition, and it allows the same POJOs to be shared between Kafka, GraphQL, and other modules.
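As a hand-written approximation (not actual generated output), a schema type like `PodcastEpisode` comes out as a plain POJO that any module can use:

```java
// Approximation of a DGS-CodeGen-generated POJO; real output differs in detail.
public class PodcastEpisode {

    private Integer episodeNumber;
    private String title;
    private String link;

    public Integer getEpisodeNumber() { return episodeNumber; }
    public void setEpisodeNumber(Integer episodeNumber) { this.episodeNumber = episodeNumber; }

    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }

    public String getLink() { return link; }
    public void setLink(String link) { this.link = link; }
}
```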
This includes the Data Model library.
api project(":mongo-adapter")
An API wrapper around MongoDB that makes it easier for services to use MongoDB without each implementing its own database integration. This would also make it much easier to replace Mongo with an alternative solution at some point.
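A minimal sketch of how a service might use the adapter, assuming it wraps Spring Data MongoDB (the document and repository names are hypothetical):

```java
import org.springframework.data.annotation.Id;
import org.springframework.data.mongodb.core.mapping.Document;
import org.springframework.data.mongodb.repository.MongoRepository;

// Hypothetical document a service might persist through the adapter.
@Document(collection = "podcast_episodes")
class PodcastEpisodeDocument {
    @Id
    private String id;
    private Integer episodeNumber;
    private String title;
}

// Spring Data derives the query implementation from the method name.
interface PodcastEpisodeRepository extends MongoRepository<PodcastEpisodeDocument, String> {
    PodcastEpisodeDocument findByEpisodeNumber(Integer episodeNumber);
}
```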
This includes the Data Model and Common libraries.
Services can include this with:
api project(":kafka-adapter")
This library simplifies integrating with Kafka or Kafka Streams as a producer or consumer of Kafka data.
In order to leverage the common Kafka configuration, topics list, etc., this annotation must be present on the class that is annotated with `@Configuration`:
@PropertySource("classpath:kafka-application.properties")
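In context, that looks something like this (the class name is arbitrary):

```java
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.PropertySource;

// Pulls the shared Kafka properties (topics list, etc.) from the
// kafka-adapter library in alongside this module's own configuration.
@Configuration
@PropertySource("classpath:kafka-application.properties")
public class KafkaConfig {
}
```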
Kafka topics are defined centrally in `libs/kafka-adapter/src/main/resources/kafka-application.properties`. For example:
#
# Topics
#
kafka.topic.notification=notification
kafka.topic.transcript=transcript
kafka.topic.sentiment=sentiment
kafka.topic.media=media
kafka.topic.news=news
kafka.topic.social=social
kafka.topic.company=company
kafka.topic.person=person
kafka.topic.subject=subject
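A service can then reference a topic by property name instead of hard-coding the string. A minimal sketch using Spring's `@Value` and `KafkaTemplate` (the producer wiring itself is assumed to come from the adapter):

```java
import org.springframework.beans.factory.annotation.Value;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Component;

@Component
public class TranscriptPublisher {

    private final KafkaTemplate<String, String> kafkaTemplate;

    // Resolved from kafka-application.properties: kafka.topic.transcript=transcript
    @Value("${kafka.topic.transcript}")
    private String transcriptTopic;

    public TranscriptPublisher(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    public void publish(String key, String transcriptJson) {
        kafkaTemplate.send(transcriptTopic, key, transcriptJson);
    }
}
```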
This includes the Kafka Adapter library (which, in turn, includes the Data Model and Common libraries).
api project(":graphql-adapter")
There are numerous resources for GraphQL with Java and Spring Boot. I seem to have approached this at an inflection point between pre-official-Spring-Boot-GraphQL and post: right now, the official support requires a pre-release version of Spring Boot. At the time of writing, I have opted for the "old" approach.
Note that the Data Model module houses the POJOs generated by the DGS CodeGen plugin.
This project leverages the Netflix DGS GraphQL library; there is a great Getting Started guide on their site.
The official GraphQL Schema resources are really useful.
The GraphQL schema is defined in `libs/model/src/main/resources/schema/schema.graphqls`. This schema defines the data structures and query API for the GraphQL functionality.
Because the POJOs are generated into the data model while the GraphQL logic lives in the GraphQL module, the auto-generated and example code other than the data model POJOs may require manual manipulation/relocation, so that part of the build process isn't as streamlined. Thankfully, that logic rarely changes.
For example, the query APIs look like this:
type Query {
latestPodcastMentioningTopic(topic: String!): PodcastEpisode
mostPopularPodcastTopics(numMostPopular: Int): [Topic!]!
podcastTranscriptByEpisodeNumber(episodeNumber: Int!): Transcript
podcastByEpisodeNumber(episodeNumber: Int!): PodcastEpisode
televisionTranscript(showName: String!, seasonNumber: Int!, episodeNumber: Int!): Transcript
televisionEpisode(showName: String!, seasonNumber: Int!, episodeNumber: Int!): TelevisionEpisode
mostPopularTelevisionEpisode(showName: String!, seasonNumber: Int): TelevisionEpisode
mostPopularTweetSince(numDays: Int): SocialMediaPost
mostRecentTweetWithNumLikes(numLikes: Int): SocialMediaPost
mostPopularSocialMediaPostSince(numDays: Int): SocialMediaPost
mostPopularMovies(numMovies: Int): [Movie!]
}
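With DGS, resolving one of these queries is a small annotated component. A sketch for the first query above (the canned return value is a placeholder, per the project's mocked-data approach):

```java
import com.netflix.graphql.dgs.DgsComponent;
import com.netflix.graphql.dgs.DgsQuery;
import com.netflix.graphql.dgs.InputArgument;

@DgsComponent
public class PodcastDataFetcher {

    // Resolves the latestPodcastMentioningTopic query from the schema above.
    @DgsQuery
    public PodcastEpisode latestPodcastMentioningTopic(@InputArgument String topic) {
        return new PodcastEpisode(); // canned/mocked data for now
    }
}
```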
There are three primary choices, and things are especially confusing with Spring Boot's GraphQL starter being in a pre-release phase at the time of writing. I won't summarize it here, but there are some great resources covering the library options – I especially liked this one by Soham Dasgupta. This one from codingconcepts.com is also helpful.
- graphql-java (How to GraphQL)
- graphql-java-kickstart (GraphQL Java Kickstart Guide)
- netflix-graphql-dgs (Getting Started)
- Here is some additional information on using DGS GraphQL + Spring Boot.
- DGS CodeGen plugin is used to enable schema-first.
I have decided to take a "schema-first" approach, which means defining the `.graphqls` schema first and then generating POJOs from it.
For POJO generation, this project uses the DGS CodeGen Plugin. There is a guide for Getting Started with DGS CodeGen that is worth review too.
These are services which process incoming data and requests in various ways in order to produce results, filter data, generate transcripts, and more.
Retrieves data about TV and movies.
This was called the "IMDB" service, but is likely going to use more "open" alternatives like The Movie Database (TMDB) or Open Movie Database (OMDB).
Based on the current rules/configuration, this service publishes notifications for certain events.
For example, send a push notification or SMS each time a podcast episode is posted or each time any of Mike Rowe's content mentions the state of Colorado.
Right now, this is a placeholder. Eventually it could become the service interface for several providers behind a curtain: Twilio (SMS), email (... maybe?), IFTTT, SimplePush, AWS SNS, and others.
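One way that curtain might look: a small interface owned by this service, with an implementation per provider (all names here are hypothetical):

```java
// Hypothetical provider abstraction for the notification service.
public interface NotificationSender {
    void send(String recipient, String message);
}

// One implementation per backing provider (Twilio, SNS, SimplePush, ...).
class SmsNotificationSender implements NotificationSender {
    @Override
    public void send(String recipient, String message) {
        // A real implementation would call the Twilio (or similar) API; mocked for now.
        System.out.printf("SMS to %s: %s%n", recipient, message);
    }
}
```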
Retrieves episodes of Mike Rowe's podcast The Way I Heard It.
Uses RSS processing with the Spring Integration Feed Adapter (which uses ROME under the hood).
Feeds media to the Transcript Service where a transcript does not yet exist in the cache.
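A rough sketch of the feed polling with Spring Integration's Java DSL (the feed URL, poll interval, and handler are placeholders; the actual wiring may differ):

```java
import java.net.MalformedURLException;
import java.net.URL;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.IntegrationFlows;
import org.springframework.integration.feed.dsl.Feed;

@Configuration
public class PodcastFeedConfig {

    @Bean
    public IntegrationFlow podcastFeedFlow() throws MalformedURLException {
        // Polls the RSS feed; each message payload is a ROME SyndEntry.
        return IntegrationFlows
                .from(Feed.inboundAdapter(new URL("https://example.com/podcast/rss"), "podcastFeed"),
                        e -> e.poller(p -> p.fixedDelay(60_000)))
                .handle(message -> System.out.println("New entry: " + message.getPayload()))
                .get();
    }
}
```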
Brainstorm: extract topics, people, etc. and queue up OSINT "jobs" on each.
Retrieves existing transcripts or fetches them using a service like Descript. Note that right now, this just generates mock data in lieu of actually integrating with Descript or similar.
Uses Elasticsearch to store transcripts.
To start a simple Elasticsearch container (stand-alone, independent of this project):
docker run -d --name elasticsearch -p 9200:9200 -e "discovery.type=single-node" elasticsearch:7.6.2
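A minimal Spring Data Elasticsearch sketch of how transcripts might be stored and searched (the index, fields, and repository method are hypothetical):

```java
import java.util.List;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

// Hypothetical transcript document indexed in Elasticsearch.
@Document(indexName = "transcripts")
class TranscriptDocument {
    @Id
    private String id;
    private Integer episodeNumber;
    private String text;
}

// Derived query: full-text match against the transcript body.
interface TranscriptRepository extends ElasticsearchRepository<TranscriptDocument, String> {
    List<TranscriptDocument> findByTextContaining(String phrase);
}
```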
This is a service that leverages the Twitter4J library to monitor Mike Rowe's Twitter account, as well as topics related to Mike Rowe. Data is ingested and published to a Kafka topic.
Monitored topics can be configured in the properties file for this service.
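A sketch of what the Twitter4J streaming side might look like (the tracked terms and console handling are illustrative; the real service publishes to Kafka instead):

```java
import twitter4j.FilterQuery;
import twitter4j.Status;
import twitter4j.StatusAdapter;
import twitter4j.TwitterStream;
import twitter4j.TwitterStreamFactory;

public class TwitterIngest {

    public static void main(String[] args) {
        // Credentials are resolved from twitter4j.properties / env vars per Twitter4J convention.
        TwitterStream stream = new TwitterStreamFactory().getInstance();
        stream.addListener(new StatusAdapter() {
            @Override
            public void onStatus(Status status) {
                // The real service would publish this to a Kafka topic.
                System.out.println("@" + status.getUser().getScreenName() + ": " + status.getText());
            }
        });
        // Track the account/topics of interest.
        stream.filter(new FilterQuery().track("Mike Rowe", "The Way I Heard It"));
    }
}
```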
Stream processors use Kafka Streams to process data.
Placeholder/future.
The general idea is to create a Kafka Streams processor that takes content mentioning Mike Rowe and generates a sentiment score.
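A minimal topology sketch of that idea (the topic names match the central list above; `scoreSentiment` is a stand-in for whatever model/library ends up doing the scoring):

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

public class SentimentTopology {

    // Stand-in for a real sentiment model or library.
    static double scoreSentiment(String text) {
        return text.toLowerCase().contains("great") ? 1.0 : 0.0;
    }

    public static void buildTopology(StreamsBuilder builder) {
        // Read raw social content, score it, and write the result downstream.
        builder.stream("social", Consumed.with(Serdes.String(), Serdes.String()))
                .mapValues(SentimentTopology::scoreSentiment)
                .to("sentiment", Produced.with(Serdes.String(), Serdes.Double()));
    }
}
```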
The "TODO" list here is endless, but a few focal points include:
- Create the Kafka adapter and remove redundant code from various modules.
- Figure out how to share properties files; right now each module has an `application.properties` of its own.
  - Lots of resources for this one, including:
- Testcontainers setup (example)
- Use Secrets to capture things like the Twitter API tokens.