-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Description
Describe the bug
If, immediately after starting Cube, I run a query that triggers a JOIN and uses a segment with a FILTER_PARAMS, future queries using that segment will also have that JOIN even if they don't need it.
If I restart Cube then immediately perform a query with that segment that does not need a JOIN, future queries using that segment will behave correctly (those that need a JOIN have it, those that don't need one don't).
Note:
- If the query uses filters instead of a segment, it seem to work as expected.
- If the segment does not use
FILTER_PARAMS
, it works as expected.
To Reproduce
Note
Make sure to run the queries quickly after starting Cube, as the queries triggered in the background by the scheduler can influence the results.
- Start Cube.js
- Send a first query:
{ "dimensions": ["Tickets.ticketId", "Messages.ticketId"], "segments": ["Tickets.closedTickets"], "measures": ["Tickets.ticketCount"] }
- There is a
LEFT JOIN
betweenTickets
andMessages
as expected. - Send a second query:
{ "dimensions": ["Tickets.ticketId"], "segments": ["Tickets.closedTickets"], "measures": ["Tickets.ticketCount"] }
- There is still get a
LEFT JOIN
which is unnecessary and gives worse performance.
Expected behavior
The second query should not have the LEFT JOIN
, which is the case if the queries are run in the opposite order.
Minimally reproducible Cube Schema
Note
A full reproducible example repository can be found here: https://github.com/Spark-NF/cubejs-additional-join
It contains the reproduction steps, a docker compose file, this cube config, as well as the outputs of the queries.
cube("Tickets", {
sql: `
SELECT
ticket_id AS ticketId,
max(ticket_closed_datetime) AS closedDatetime,
FROM data_tickets
GROUP BY ticketId
`,
segments: {
closedTickets: {
sql: `${FILTER_PARAMS.Tickets.closedDatetime.filter(`closedDatetime`)}`,
},
},
measures: {
ticketCount: {
type: "count",
sql: `${CUBE}.ticketId`,
},
},
dimensions: {
ticketId: {
sql: `${CUBE}.ticketId`,
type: "number",
primaryKey: true,
shown: true,
},
},
joins: {
Messages: {
relationship: "one_to_one",
sql: `${CUBE}.ticketId = ${Messages}.ticketId`,
},
},
});
cube("Messages", {
sql: `
SELECT
ticket_id AS ticketId,
count(ticket_message_id) AS messagesCount,
FROM data_messages
GROUP BY ticketId
`,
dimensions: {
ticketId: {
sql: `${CUBE}.ticketId`,
type: "number",
primaryKey: true,
shown: true,
},
},
});
Version:
Starting 0.35.80, still present on 1.3.62
Additional context
- The issue seem to be present since fix(schema-compiler): propagate FILTER_PARAMS from view to inner cube's SELECT #8466
- Adding
this.collectAllMemberNames();
in theBaseQuery
constructor just before building the join hints seem to solve the issue - When the queries are run in the "wrong" order, the cache key
['collectFrom', 'collectJoinHintsFromMembers', 'Tickets.closedTickets']
will contain['Tickets', 'Tickets', 'Tickets', 'Tickets', 'Tickets', 'Messages', 'Messages']
(for both queries), causing the join in future queries using that segment - When the queries are run in the "correct" order, the cache key
['collectFrom', 'collectJoinHintsFromMembers', 'Tickets.closedTickets']
will contain['Tickets', 'Tickets', 'Tickets', 'Tickets', 'Tickets']
(for both queries)