Meta: Inter-Course-Phase communication concept #130

Open
Mtze opened this issue Jan 29, 2025 · 4 comments

@Mtze
Member

Mtze commented Jan 29, 2025

Exchanging data between phases is currently limited to data stored within the core's database (course participation metadata). This is fine for "small" data chunks related to students. In #126 we updated this model to distinguish between student-accessible data and restricted data. For "simple" (client-only) course phases this is sufficient. However, for complex phases with their own microservice (maybe even their own access control mechanisms), this solution is not enough.

We envision a way that course phase implementations can directly exchange data - without an intermediate store in the core.

The course phase configuration interface already shows "required" and "provided" data: data that must be provided by one of the previous phases, and data that may be used by a following phase, respectively.

With the extension that services may provide data through their own API, we need an abstraction / SDK that:

  1. Allows specifying which data is required
  2. Allows specifying which data is provided
  3. Handles dynamic data resolution
    • For data that the core can provide from its own database, the value is prefilled
    • For data that is stored in a different phase implementation, a callable API is provided
    • The SDK needs to call the respective APIs and return the required data transparently to the phase implementation
    • If data resolution fails, we need to handle that
  4. Handles API authorization by forwarding the user's token to the core
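
The resolution behaviour from point 3 could be sketched roughly as follows. This is only an illustration under assumptions: the names `InputSource`, `resolveAll`, and `fetchValue` are hypothetical and not part of any existing PROMPT SDK, and a real SDK would likely issue asynchronous HTTP calls instead of the synchronous callbacks used here to keep the example small.

```typescript
// Hypothetical shapes for illustration; none of these names come from the PROMPT code base.
type InputSource =
  | { kind: "available"; value: unknown } // prefilled by the core
  | { kind: "needsResolving"; fetchValue: () => unknown }; // stored in another phase

interface ResolutionResult {
  values: Record<string, unknown>; // successfully resolved inputs
  failed: string[]; // names of inputs whose resolution threw
}

function resolveAll(inputs: Record<string, InputSource>): ResolutionResult {
  const values: Record<string, unknown> = {};
  const failed: string[] = [];
  for (const key of Object.keys(inputs)) {
    const source = inputs[key];
    if (source.kind === "available") {
      // The core already delivered the value, nothing to call.
      values[key] = source.value;
      continue;
    }
    try {
      values[key] = source.fetchValue();
    } catch (err) {
      // Resolution failed: record the name so the phase can decide how to react.
      failed.push(key);
    }
  }
  return { values, failed };
}
```

Collecting failures in a list instead of aborting on the first error lets the phase implementation decide whether a missing input is fatal.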

Sub Problems

For this to work we need a few things

  • A versioned API standard that specifies

    • what data a phase implementation requires
    • what data a phase implementation provides
    • which data is provided by the core already - "Is available"
    • which data needs to be resolved by the SDK - "needs resolving"
  • Implementations

    • Go SDK
    • TypeScript SDK

Future Expansion

  • A core API that allows phases to register special Keycloak roles that are required for the respective phase implementation
@Mtze Mtze added Component: Core Issues related to the prompt core enhancement New feature or request labels Jan 29, 2025
@niclasheun
Contributor

Suggested Implementation

This document describes the current and proposed design for the interfaces between course phase modules. The goal is to establish a clear contract for data provided (outputs) and required (inputs) by each course phase, with the flexibility to resolve data either locally or remotely.

1. Current Implementation

Currently, the course phase type table records which data a course phase type exposes and which data it requires (the only exception is the application phase, where the data export is controlled by the application questions).
Inside the course configurator, the lecturer can connect course phases and establish "data connections". This builds a meta_data_graph, which determines where a course phase gets its input data from.
The UI automatically displays whether the requirements are satisfied.

The format looks as follows:

Provided Output Metadata

Definition:

ProvidedOutputMetaData: [{ name: string, type: string }]

Example:

[
  { "name": "ApplicationScore", "type": "integer" },
  { "name": "availableDevices", "type": "string" }
]

Required Input Metadata

Definition:

RequiredInputMetaData: [{ name: string, type: string, alternativeNames?: string[] }]

Example:

[
  { "name": "ApplicationScore", "type": "integer", "alternativeNames": ["interviewScore"] }
]
The `alternativeNames` property allows handling different naming conventions for the same input type.
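
As a rough sketch of how such a requirement could be matched against the outputs of a previous phase: the function name `findProvider` is hypothetical, and the exact, case-sensitive comparison of names and types is an assumption, since the issue does not specify the matching rules.

```typescript
interface ProvidedOutput {
  name: string;
  type: string;
}

interface RequiredInput {
  name: string;
  type: string;
  alternativeNames?: string[];
}

// Returns the first provided item whose name equals the required name or one of
// its alternative names and whose type matches; undefined if nothing fits.
// Assumption: names and types are compared exactly (case-sensitive).
function findProvider(
  required: RequiredInput,
  provided: ProvidedOutput[],
): ProvidedOutput | undefined {
  const accepted = [required.name].concat(required.alternativeNames || []);
  for (const candidate of provided) {
    if (candidate.type === required.type && accepted.indexOf(candidate.name) !== -1) {
      return candidate;
    }
  }
  return undefined;
}
```

With the example above, a phase that only provides `interviewScore` of type `integer` would still satisfy the `ApplicationScore` requirement.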

Data Access:
Currently, data resolution is handled completely by the core. If a lecturer/editor requests the participations of a course phase, the core resolves the meta_data_graph and copies the required metadata from the previous phases into prevData. This, however, introduces the limitation that all data passed between phases must be stored in the core.

2. Suggested Additions

We now want to adapt the current system to allow dynamic resolution and data storage outside the core, while still enabling later phases to access the data.

Therefore, we first need to adjust how the metadata requirements are stored.

A. Required Input Metadata

  • Status: Remains unchanged.
  • Responsibility: The SDK will handle data resolution transparently, so course phases do not need to be aware of the data source. Hence, this should not change any behavior of incoming data.

B. Provided Output Metadata Enhancements

For the providedOutputMetaData, we want to let a course phase type define a resolution. The course phase type shall thereby be able to specify whether the exposed data is stored inside the core (local) or at another endpoint (remote).
Hence, I propose adding a new property, resolution, to the existing providedOutputMetaData items.
Old Format (see above):

[ {name, type } ]

New Format:

[ {name, type, resolution } ]

Resolution Options

The new resolution field shall support two different types of resolution.

  • Local Resolution:
{ "location": "local" }

-> Data is copied from either restricted_data or student_readable_data.

  • Remote Resolution:
{ "location": "remote", "api": { "v1": "RemoteAddressV1", "v2": "RemoteAddressV2", ... } }
  • Versioning Guidelines:
    • The API version schema shall have the fixed names v1, v2, v3, ...
    • Single Version: If only one version is provided, it is used automatically.
    • Multiple Versions:
      • An APIVersion field is expected in the course phase’s restricted_data (CoursePhase is a specific implementation/usage of the course phase type. Hence it needs to specify which API version it depends on)
      • If no APIVersion is provided, default to v1.
      • This approach allows new course phases to use a specified API version while older phases continue using their current version.
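
The versioning guidelines could translate into a selection function like the one below. This is a minimal sketch: `selectEndpoint` is a hypothetical name, and returning `undefined` when the requested version is not offered is an assumption, because the guidelines do not say what should happen in that case.

```typescript
// Maps version names ("v1", "v2", ...) to remote addresses.
type RemoteApi = Record<string, string>;

// apiVersion is the value read from the course phase's restricted_data, if any.
function selectEndpoint(api: RemoteApi, apiVersion?: string): string | undefined {
  const versions = Object.keys(api);
  if (versions.length === 1) {
    // Single version: it is used automatically.
    return api[versions[0]];
  }
  // Multiple versions: use the configured one, defaulting to v1.
  const wanted = apiVersion || "v1";
  // Assumption: an unknown version yields undefined instead of a fallback.
  return Object.prototype.hasOwnProperty.call(api, wanted) ? api[wanted] : undefined;
}
```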

2. Limitations

  • Even if different endpoints for various objects are specified, only one version number is maintained for the entire course phase.
  • Responsibility: The course phase type maintainer must ensure that API endpoints are consistent with the versioning strategy.
  • Extensibility:
    • Different versions make it easy to add or remove exposed objects. For example, if an object is no longer exposed in a new API version, its "api" field only contains remotes for the versions that still expose it (i.e., it does not contain "v2").

3. Suggested API Specification

Endpoint Structure

  • URL Format:
<v1APIRoute>/<coursePhaseID>

Response Format

[
  { "coursePhaseParticipationID": "<uuid>", "objectName": "<object>" }
]

Retrieves the exposed data objects for all participants in a course phase.

Example

Course Phase Type Specification (ProvidedOutputMetaData)

[
  {
    "name": "seatAssignment",
    "type": "string",
    "resolution": {
      "location": "remote",
      "api": {
        "v1": "https://interviewComponent.de/v1/seatAssignment",
        "v2": "https://interviewComponent.de/v2/seatAssignment"
      }
    }
  },
  {
    "name": "tutor",
    "type": "string",
    "resolution": { "location": "local" }
  }
]

CoursePhase - RestrictedData (stores the current API Version)

{
  "...": "...",
  "apiVersion": "v2"
}

Core Request for CoursePhase Participations

Changes

  • Currently, it just returns the course phase participations (already with the resolved data)
  • Now it adds a field that lists which variables need to be resolved externally
    • This is implemented as a separate top-level field because it is the same for all participations
{
  "participations": [
    { /* coursePhaseParticipation objects with prevData (resolved locally) */ }
  ],
  "requiresResolution": [
    { "name": "seatAssignment", "location": "https://interviewComponent.de/v1/seatAssignment" }
  ]
}

Provided Endpoint by the Course Module

  • Endpoint:
https://interviewComponent.de/v1/seatAssignment/<coursePhaseID>
  • Returned Data:
    This returns all coursePhaseParticipations that belong to the requested coursePhaseID.
[
  { "coursePhaseParticipationID": "<uuid>", "objectName": "<object>" }
]

Tasks of the SDK

The SDK has to perform the following tasks:

  • Retrieve the course participations from the core
    • If nothing to resolve -> return data
  • If data to resolve -> retrieve the data from the specified endpoints
  • Problem: CoursePhaseParticipations (from the previousPhases) must be mapped to CourseParticipations (used to identify the current courseParticipation)
    • retrieve the mapping from course_phase_participation to course_participation from the core
    • Copy retrieved data into the prevData of each coursePhaseParticipation based on the mapping retrieved from the core
    • -> return the data
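
The copy step at the end of this list could look roughly like the sketch below. All names (`Participation`, `mergeRemoteData`, `prevPhaseToCourse`) are illustrative assumptions. In particular, it assumes that each participation returned by the core carries its `courseParticipationID` and that the mapping has already been fetched from the core.

```typescript
// Illustrative shapes; not an existing PROMPT API.
interface Participation {
  id: string; // coursePhaseParticipationID in the current phase
  courseParticipationID: string; // assumption: delivered by the core
  prevData: Record<string, unknown>;
}

// One row of the remote response: { coursePhaseParticipationID, <objectName>: ... }
type RemoteRow = Record<string, unknown>;

function mergeRemoteData(
  participations: Participation[],
  objectName: string,
  remoteRows: RemoteRow[],
  // previous-phase coursePhaseParticipationID -> courseParticipationID
  prevPhaseToCourse: Record<string, string>,
): Participation[] {
  // Re-key the remote values by courseParticipationID.
  const valueByCourse: Record<string, unknown> = {};
  for (const row of remoteRows) {
    const courseID = prevPhaseToCourse[String(row["coursePhaseParticipationID"])];
    if (courseID !== undefined) {
      valueByCourse[courseID] = row[objectName];
    }
  }
  // Copy the value into prevData without mutating the input objects.
  return participations.map((p) => {
    if (!Object.prototype.hasOwnProperty.call(valueByCourse, p.courseParticipationID)) {
      return p;
    }
    return {
      ...p,
      prevData: { ...p.prevData, [objectName]: valueByCourse[p.courseParticipationID] },
    };
  });
}
```

Participations without a matching remote row are returned unchanged, so the caller can detect missing data by checking `prevData`.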

Open Issues / Questions

  • Where should we track which API version a course phase depends on?
    • The core needs to know this in order to know which data is exposed by the course phase (for correctness of the course phase editor).
    • The course phase type does the actual implementation. Hence, doesn't it also need a mapping between coursePhaseID and version?

@niclasheun
Contributor

niclasheun commented Feb 2, 2025

@Mtze let us discuss this approach and next steps tomorrow in our meeting.

Additional Question:

  • Benefits of packaging resolution in the SDK
    • Less traffic and responsibility in the core.
  • Benefits of adding resolution support in the core
    • We keep a REST-based endpoint and remain completely compatible with all programming languages.

@niclasheun
Contributor

After re-discussing, we propose a new schema design.

Instead of "required_input_meta_data" and "provided_output_meta_data", we rely fully on an API spec. The core shall be able to maintain different versions of the API specs.

A suggested API spec could look like:

Required API Spec

openapi: 3.0.3
info:
  title: Intro Course - Required Input
  version: 1.0.0

##################################################
# 1) REQUIRED CORE DATA (JSONB)
##################################################
x-requiredCoreData:
  # We’re using JSON Schema to declare that we need "applicationScore"
  # or "interviewScore" from the core’s JSONB.
  type: object
  anyOf:
    - required: ["applicationScore"]
    - required: ["interviewScore"]
  properties:
    applicationScore:
      type: number
    interviewScore:
      type: number

##################################################
# 2) ENDPOINTS THIS PHASE EXPECTS TO CALL
##################################################
# Unlike a typical OpenAPI paths section (where we define our own endpoints),
# here we define the endpoints we EXPECT to be available from prior phases or the core.
# Note: We’re listing them under `paths` for convenience, but this is effectively 
# "what we need to call," not "what we provide."
paths:
  /developerProfiles/{coursePhaseID}:
    get:
      summary: Get all developer profiles of the students
      description: >
        This is an endpoint we expect to be available from a previous phase
        or the core system. We'll call it to retrieve developer profiles.
      parameters:
        - name: coursePhaseID
          in: path
          required: true
          schema:
            type: string
      responses:
        '200':
          description: "Successful retrieval of profiles"
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/DeveloperProfileList"

components:
  schemas:
    DeveloperProfileList:
      type: array
      items:
        $ref: "#/components/schemas/DeveloperProfile"

    DeveloperProfile:
      type: object
      properties:
        appleID:
          type: string
        gitLabID:
          type: string
      required:
        - appleID
        - gitLabID
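
To illustrate what the `anyOf` block in `x-requiredCoreData` expresses, here is a small hand-rolled check. It is not a JSON Schema validator and only covers the `anyOf: [{ required: [...] }]` pattern with numeric properties used in this spec. The function name `satisfiesAnyOfRequired` is hypothetical.

```typescript
// Returns true if at least one alternative has all of its keys present
// in the core data as numbers (both properties in this spec are numbers).
function satisfiesAnyOfRequired(
  data: Record<string, unknown>,
  alternatives: string[][],
): boolean {
  return alternatives.some((keys) =>
    keys.every(
      (key) =>
        Object.prototype.hasOwnProperty.call(data, key) && typeof data[key] === "number",
    ),
  );
}

// The two alternatives declared in the spec above.
const requiredCoreDataAlternatives = [["applicationScore"], ["interviewScore"]];
```

A production implementation would more likely hand the schema to an existing JSON Schema validator instead of re-implementing the rules.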

Provided API Spec

openapi: 3.0.3
info:
  title: Intro Course - Provided Output
  version: 1.0.0
  
  
##########################################################
# 1) OUTPUT PROVIDED IN CORE (Course Phase Participation)
# -> lives in restrictedData or studentReadableData
##########################################################
x-providedCoreData:
  type: object
  properties:
    proficiencyLevel:
      type: string
    tutorPerformanceReview:
      type: object
      properties:
        timestamp:
          type: string
          format: date-time
        review:
          type: string
      required: ["timestamp"]
  required: ["proficiencyLevel"]

##################################################
# 2) ENDPOINTS PROVIDED BY THIS PHASE
##################################################
paths:
  /skillSurvey/{coursePhaseID}:
    get:
      summary: Perform an action in the Intro Course phase
      parameters:
        - name: coursePhaseID
          in: path
          required: true
          schema:
            type: string
      responses:
        '200':
          description: "Action performed"
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/SkillSurvey"

  /health:
    get:
      summary: Health check
      responses:
        '200':
          description: "OK"

components:
  schemas:
    SkillSurvey:
      type: array
      items:
        $ref: "#/components/schemas/SkillResponse"
        
    SkillResponse:
      type: object
      properties:
        coursePhaseParticipationID:
          type: string
        skills:
          type: string



Responsibility of Core

  • In the Course Editor, the OpenAPI specs are compared and tested against each other
  • The Course Phase Restricted Data stores which API version is used for the course phase
  • New resolution endpoint:
    • The new endpoint returns all incoming data connections (the API spec of each incoming phase together with its API)
    • The SDK/phase is then responsible for retrieving the data.
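
The comparison of specs in the Course Editor could, in a very simplified form, check that everything a phase requires is guaranteed by the providing phase. The sketch below covers only `type`, `properties`, and `required`. `ObjectSchema` and `isCompatible` are hypothetical names, and requiring exact type equality (so `integer` does not satisfy `number`) is an assumption.

```typescript
// A reduced view of a JSON Schema object; real specs contain more keywords.
interface ObjectSchema {
  type: string;
  properties?: Record<string, ObjectSchema>;
  required?: string[];
}

// True if every property the consumer requires is guaranteed by the provider
// (listed in the provider's `required`) with a compatible schema.
function isCompatible(provided: ObjectSchema, required: ObjectSchema): boolean {
  if (provided.type !== required.type) return false;
  const needed = required.required || [];
  const guaranteed = provided.required || [];
  const requiredProps = required.properties || {};
  const providedProps = provided.properties || {};
  for (const key of needed) {
    if (guaranteed.indexOf(key) === -1) return false;
    const wanted = requiredProps[key];
    const offered = providedProps[key];
    if (wanted !== undefined && (offered === undefined || !isCompatible(offered, wanted))) {
      return false;
    }
  }
  return true;
}
```

Applied to the provided spec above, a consumer requiring `proficiencyLevel` would pass, while one requiring `tutorPerformanceReview` would fail because that property is optional on the provider side.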

@niclasheun
Contributor

Final Design Proposal: Course Phase DTO Integration & Metadata Graph Enhancement

After further discussion and review, we have refined our approach. This proposal introduces new database tables to better separate the concerns of course phase metadata and the DTO definitions. It also standardizes our endpoint conventions and clarifies the SDK integration strategy.


1. Database Schema Changes

1.1 Course Phase Type

The course_phase_type table is simplified to only include basic properties of the phase. The provided/required metadata is now handled in separate tables.

Schema:

Column | Type            | Description
-------------------------------------------------------------------
id     | UUID or Integer | Unique identifier of the course phase type
name   | String          | Name of the course phase type
url    | String          | Base URL for the course phase type

1.2 Provided Output Table

This table defines which DTOs each course phase exposes. Each course phase type can declare multiple output DTOs, and each DTO definition adheres to OpenAPI conventions.

Schema:

Column               | Type    | Description
-------------------------------------------------------------------------------------
id                   | UUID    | Unique identifier for the provided output definition
course_phase_type_id | UUID    | Foreign key to the corresponding course phase type
name                 | String  | Name of the DTO (e.g., Score, DeveloperProfile)
version_number       | Integer | Version of the DTO specification
endpoint_url         | String  | The endpoint fragment for this DTO (e.g., /developerProfile/); by convention, the value core represents data stored in the core in restricted_data or student_readable_data
specification        | JSON    | OpenAPI schema definition for the DTO

Example Data:

id       | course_phase_type_id | dtoName          | version_number | endpoint_url       | specification
-------------------------------------------------------------------------------------------------------
uuid-123 | <applicationPhaseID> | Score            | 1              | core               | { "type": "object", "properties": { "score": { "type": "number" } }, "required": ["score"] }
uuid-456 | <introCourseID>      | DeveloperProfile | 1              | /developerProfile/ | { "type": "object", "properties": { "profile": { "type": "object" } }, "required": ["profile"] }

Endpoint Conventions for Provided DTOs

  • List DTOs for a Course Phase:

    GET <coursePhaseTypeURL>/coursePhase/{coursePhaseID}/{definedEndpoint}/
    

    Response: An array of objects conforming to the DTO’s OpenAPI schema.
    Example Response:

    [
      {
        "coursePhaseParticipationID": "<participation-id>",
        "dtoName": { /* data matching the specified schema */ }
      },
      ...
    ]
  • Retrieve a Specific DTO for a Participation:

    GET <coursePhaseTypeURL>/coursePhase/{coursePhaseID}/{definedEndpoint}/{courseParticipationID}
    

    Response: A single object conforming to the DTO’s OpenAPI schema.
    Example Response:

    {
      "dtoName": { /* data matching the specified schema */ }
    }
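
A client could derive both URLs from the table entries along these lines. The helper names `listUrl` and `singleUrl` are hypothetical, and treating the `core` endpoint value as "no remote URL" is an assumption based on the convention described for `endpoint_url`.

```typescript
// By convention, this endpoint_url value marks data stored in the core.
const CORE_ENDPOINT = "core";

// GET <coursePhaseTypeURL>/coursePhase/{coursePhaseID}/{definedEndpoint}/
function listUrl(
  coursePhaseTypeURL: string,
  coursePhaseID: string,
  definedEndpoint: string,
): string | undefined {
  if (definedEndpoint === CORE_ENDPOINT) return undefined; // nothing to call remotely
  const base = coursePhaseTypeURL.replace(/\/+$/, ""); // drop trailing slashes
  const fragment = definedEndpoint.replace(/^\/+|\/+$/g, ""); // drop surrounding slashes
  return [base, "coursePhase", encodeURIComponent(coursePhaseID), fragment].join("/") + "/";
}

// GET <coursePhaseTypeURL>/coursePhase/{coursePhaseID}/{definedEndpoint}/{courseParticipationID}
function singleUrl(
  coursePhaseTypeURL: string,
  coursePhaseID: string,
  definedEndpoint: string,
  courseParticipationID: string,
): string | undefined {
  const list = listUrl(coursePhaseTypeURL, coursePhaseID, definedEndpoint);
  return list === undefined ? undefined : list + encodeURIComponent(courseParticipationID);
}
```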

1.3 Required Input Table

This table specifies the input DTOs that a course phase requires. Inputs can be simple strings or complex objects, defined according to the OpenAPI schema.

Schema:

Column               | Type   | Description
-------------------------------------------------------------------------------------
id                   | UUID   | Unique identifier for the required input definition
course_phase_type_id | UUID   | Foreign key to the course phase type that requires the input
dtoName              | String | Name of the input (e.g., seatAssignment)
specification        | JSON   | OpenAPI schema for the required input

1.4 Metadata Graph

The metadata graph represents an N:M relationship between course phases. Because a course phase type can now export multiple DTOs, the mapping schema is extended to capture DTO-specific relationships.

Schema:

Column                   | Type | Description
-------------------------------------------------------------------------------------
from_course_phase_id     | UUID | ID of the source course phase
from_course_phase_DTO_id | UUID | DTO identifier from the source phase (indicating the type)
to_course_phase_id       | UUID | ID of the destination course phase
to_course_phase_DTO_id   | UUID | DTO identifier from the destination phase

Note:
Since a course may contain multiple phases of the same type, both the course_phase_id and the course_phase_DTO_id are required to unambiguously identify the relationship.
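
As an illustration of how the graph rows could be used, the sketch below checks which required inputs of a phase are not yet connected. The camelCase field names and the function `unsatisfiedInputs` are assumptions made for this example. Only the four columns come from the table above.

```typescript
// One row of the metadata graph (column names converted to camelCase for the example).
interface MetaDataEdge {
  fromCoursePhaseID: string;
  fromCoursePhaseDtoID: string;
  toCoursePhaseID: string;
  toCoursePhaseDtoID: string;
}

// Returns the required DTO IDs of the given phase that have no incoming edge yet.
function unsatisfiedInputs(
  graph: MetaDataEdge[],
  coursePhaseID: string,
  requiredDtoIDs: string[],
): string[] {
  const connected = graph
    .filter((edge) => edge.toCoursePhaseID === coursePhaseID)
    .map((edge) => edge.toCoursePhaseDtoID);
  return requiredDtoIDs.filter((dtoID) => connected.indexOf(dtoID) === -1);
}
```

Filtering on both the phase ID and the DTO ID mirrors the note above: two phases of the same type share DTO IDs, so the phase ID is needed to tell them apart.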


2. OpenAPI Specifications for DTOs

Each DTO’s specification must adhere to OpenAPI standards. Below are examples of DTO schema definitions:

Example: Score DTO

{
  "type": "object",
  "properties": {
    "score": {
      "type": "number",
      "description": "The score achieved in this phase."
    }
  },
  "required": ["score"]
}

Example: DeveloperProfile DTO

{
  "type": "object",
  "properties": {
    "appleID": {
      "type": "string",
      "description": "The email address used for the Apple ID"
    },
    "gitLabUserName": {
      "type": "string"
    }
  }
}
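
On the consuming side, a TypeScript phase could mirror these two schemas with interfaces and runtime guards. This is a hand-written sketch, not generated from the spec. The names `ScoreDTO`, `DeveloperProfileDTO`, and the guard functions are assumptions, and both profile fields are treated as optional because the schema above lists no required properties.

```typescript
interface ScoreDTO {
  score: number;
}

interface DeveloperProfileDTO {
  appleID?: string;
  gitLabUserName?: string;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Matches the Score schema: an object with a required numeric "score".
function isScoreDTO(value: unknown): value is ScoreDTO {
  return isRecord(value) && typeof value["score"] === "number";
}

// Matches the DeveloperProfile schema: both fields optional, but strings if present.
function isDeveloperProfileDTO(value: unknown): value is DeveloperProfileDTO {
  if (!isRecord(value)) return false;
  for (const key of ["appleID", "gitLabUserName"]) {
    const field = value[key];
    if (field !== undefined && typeof field !== "string") return false;
  }
  return true;
}
```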

3. SDK Integration and Data Flow

3.1 Data Flow Overview

The following steps describe how data flows between the course phases and the core system, and how the SDK integrates with this process:

  1. Fetching Course Phase Participations:
    The SDK requests participations for a specific course phase from the core. The core returns a JSON payload that includes:

    • Participations: An array of coursePhaseParticipation objects, each containing preliminary data (e.g., prevData) that has already been resolved.
    • RequiresResolution: A list of DTOs that need further resolution, including the name and the URL endpoint where the SDK can retrieve the complete data. The SDK can then decide on its own whether it needs to resolve the data for none, some, or all students.

    Example Response:

    {
      "participations": [
        {
          "id": "participation-uuid-1",
          /* participation-data */
          "prevData": { /* core-resolved data */ }
        },
        {
          "id": "participation-uuid-2",
          "prevData": { /* core-resolved data */ }
        }
      ],
      "requiresResolution": [
        {
          "dtoName": "seatAssignment",
          "baseURL": "https://interviewComponent.de/v1/",
          "coursePhaseID": "<uuid>"
        }
      ]
    }
  2. Resolving Additional DTOs:
    The SDK then uses the provided endpoints to retrieve additional data that is not stored directly in the core. This data will conform to the pre-defined DTO schemas.

  3. Mapping Participation IDs:
    We assume that each phase works on coursePhaseParticipationIDs. The core can associate these IDs with a courseParticipationID.
    Hence, for a course phase to resolve data from a previous course phase, it has to match the coursePhaseParticipationIDs of both phases with each other via the courseParticipationID. The core must provide an endpoint to do so.

3.2 SDK Responsibilities

  • Data Aggregation:
    Merge core-provided participations with additional DTO data obtained via the provided endpoints.
  • ID Mapping Resolution:
    Call the mapping endpoint to correctly associate coursePhaseParticipationIDs from the previous phases with the coursePhaseParticipationIDs of the current phase.
  • Error Handling:
    Gracefully handle missing data, unresolved DTOs, or mapping failures.
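
The ID mapping and its failure case could be handled roughly as sketched here. The shape of the mapping endpoint's response is not defined in this proposal, so `IdPair`, `MappingResult`, and `mapPreviousToCurrent` are assumptions. The sketch expects the core to return pairs of `coursePhaseParticipationID` and `courseParticipationID` for both phases.

```typescript
// Assumed row shape of the core's mapping endpoint.
interface IdPair {
  coursePhaseParticipationID: string;
  courseParticipationID: string;
}

interface MappingResult {
  // previous-phase coursePhaseParticipationID -> current-phase coursePhaseParticipationID
  prevToCurrent: Record<string, string>;
  // previous-phase IDs without a counterpart in the current phase
  unmapped: string[];
}

function mapPreviousToCurrent(previous: IdPair[], current: IdPair[]): MappingResult {
  // Index the current phase by the shared courseParticipationID.
  const currentByCourse: Record<string, string> = {};
  for (const pair of current) {
    currentByCourse[pair.courseParticipationID] = pair.coursePhaseParticipationID;
  }
  const prevToCurrent: Record<string, string> = {};
  const unmapped: string[] = [];
  for (const pair of previous) {
    const target = currentByCourse[pair.courseParticipationID];
    if (target === undefined) {
      // Mapping failure: report it instead of silently dropping the entry.
      unmapped.push(pair.coursePhaseParticipationID);
    } else {
      prevToCurrent[pair.coursePhaseParticipationID] = target;
    }
  }
  return { prevToCurrent, unmapped };
}
```

Returning the unmapped IDs alongside the mapping leaves the decision to the phase: it can ignore students who dropped out between phases or surface the gap to the lecturer.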

4. Next Steps / Implementation Plan

TODO

@niclasheun niclasheun self-assigned this Feb 10, 2025
@niclasheun niclasheun moved this to In Progress in PROMPT Feb 10, 2025
@Mtze Mtze moved this from In Progress to In Review in PROMPT Feb 14, 2025