Skip to content

Unnecessary JOIN added when using FILTER_PARAMS in a segment #9941

@Spark-NF

Description

@Spark-NF

Describe the bug

If, immediately after starting Cube, I run a query that triggers a JOIN and uses a segment with a FILTER_PARAMS, future queries using that segment will also have that JOIN even if they don't need it.

If I restart Cube then immediately perform a query with that segment that does not need a JOIN, future queries using that segment will behave correctly (those that need a JOIN have it, those that don't need one don't).

Note:

  • If the query uses filters instead of a segment, it seem to work as expected.
  • If the segment does not use FILTER_PARAMS, it works as expected.

To Reproduce

Note

Make sure to run the queries quickly after starting Cube, as the queries triggered in the background by the scheduler can influence the results.

  1. Start Cube.js
  2. Send a first query:
    {
      "dimensions": ["Tickets.ticketId", "Messages.ticketId"],
      "segments": ["Tickets.closedTickets"],
      "measures": ["Tickets.ticketCount"]
    }
  3. There is a LEFT JOIN between Tickets and Messages as expected.
  4. Send a second query:
    {
      "dimensions": ["Tickets.ticketId"],
      "segments": ["Tickets.closedTickets"],
      "measures": ["Tickets.ticketCount"]
    }
  5. There is still get a LEFT JOIN which is unnecessary and gives worse performance.

Expected behavior

The second query should not have the LEFT JOIN, which is the case if the queries are run in the opposite order.

Minimally reproducible Cube Schema

Note

A full reproducible example repository can be found here: https://github.com/Spark-NF/cubejs-additional-join
It contains the reproduction steps, a docker compose file, this cube config, as well as the outputs of the queries.

cube("Tickets", {
    sql: `
        SELECT
            ticket_id AS ticketId,
            max(ticket_closed_datetime) AS closedDatetime,
        FROM data_tickets
        GROUP BY ticketId
    `,
    segments: {
        closedTickets: {
            sql: `${FILTER_PARAMS.Tickets.closedDatetime.filter(`closedDatetime`)}`,
        },
    },
    measures: {
        ticketCount: {
            type: "count",
            sql: `${CUBE}.ticketId`,
        },
    },
    dimensions: {
        ticketId: {
            sql: `${CUBE}.ticketId`,
            type: "number",
            primaryKey: true,
            shown: true,
        },
    },
    joins: {
        Messages: {
            relationship: "one_to_one",
            sql: `${CUBE}.ticketId = ${Messages}.ticketId`,
        },
    },
});

cube("Messages", {
    sql: `
        SELECT
            ticket_id AS ticketId,
            count(ticket_message_id) AS messagesCount,
        FROM data_messages
        GROUP BY ticketId
    `,
    dimensions: {
        ticketId: {
            sql: `${CUBE}.ticketId`,
            type: "number",
            primaryKey: true,
            shown: true,
        },
    },
});

Version:

Starting 0.35.80, still present on 1.3.62

Additional context

  • The issue seem to be present since fix(schema-compiler): propagate FILTER_PARAMS from view to inner cube's SELECT #8466
  • Adding this.collectAllMemberNames(); in the BaseQuery constructor just before building the join hints seem to solve the issue
  • When the queries are run in the "wrong" order, the cache key ['collectFrom', 'collectJoinHintsFromMembers', 'Tickets.closedTickets'] will contain ['Tickets', 'Tickets', 'Tickets', 'Tickets', 'Tickets', 'Messages', 'Messages'] (for both queries), causing the join in future queries using that segment
  • When the queries are run in the "correct" order, the cache key ['collectFrom', 'collectJoinHintsFromMembers', 'Tickets.closedTickets'] will contain ['Tickets', 'Tickets', 'Tickets', 'Tickets', 'Tickets'] (for both queries)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions