Filter by joined data from different data source

Hello,
We have web analytics data in BigQuery (unique visits, click events, etc.). As part of those data, there is an ID of items (i.e., Products, Companies,…) that are stored in the different data sources (MongoDB).

We followed this example - https://cube.dev/docs/recipes/joining-multiple-data-sources - and it seems to work well. We can use dimensions from both data sources, and they are joined correctly.

However, filtering query by field from the joined tables from the different databases seems not to work. For example, in BQ, there are blog posts unique visits, and in MongoDB, we have post data, including the author. We can query both visits and post data (joined on authorId that is present on both tables), but we cannot filter query to include only some of the authors (filter "author equals [array_of_ids]). Query ignores it and returns the whole set regardless of set filters.

Is this even possible in current version of CubeJS?

Thank you all,
Milan Z.

Hi @littlewall and thanks for posting your question!
Could you share your schema and config file so we can try to reproduce that or at least take a look, please?

Hello,
thank you for your quick reply. This is where I get stuck.

cube('FileHostingAnalytics', {
    sql: `SELECT * FROM ${getDBName()}.filehosting`,

    dataSource: dataSources.bigQuery,

    preAggregations: {
        fileHostingRollup: {
            type: 'rollup',
            external: true,
            measures: [CUBE.count, CUBE.totalCount],
            timeDimension: CUBE.timestamp,
            dimensions: [CUBE.filehostingId],
            granularity: 'day',
            indexes: {
                categoryIndex: {
                    columns: [CUBE.filehostingId],
                },
            },
        },
        combinedRollup: {
            type: 'rollupJoin',
            measures: [CUBE.count, CUBE.totalCount],
            dimensions: [
                FileHostingData.author,
                CUBE.id,
                CUBE.filehostingId,
            ],
            rollups: [FileHostingData.fileHostingDataRollup, CUBE.fileHostingRollup],
        },
    },

    joins: {
        FileHostingData: {
            relationship: 'belongsTo',
            sql: `${CUBE.filehostingId} = ${FileHostingData.id}`,
        },
    },

    measures: {
        totalCount: {
            type: 'count',
            sql: 'id',
        },
        count: {
            type: 'count',
            drillMembers: [id],
        },
    },

    dimensions: {
        id: {
            sql: 'id',
            type: 'string',
            primaryKey: true,
        },

        filehostingId: {
            sql: 'filehostingId',
            type: 'string',
        },

        timestamp: {
            sql: 'timestamp',
            type: 'time',
        },
    },
});

cube('FileHostingData', {
    sql: 'SELECT id,author FROM `filehostings`',

    dataSource: dataSources.mongobi,

    preAggregations: {
        fileHostingDataRollup: {
            type: 'rollup',
            dimensions: [CUBE.id, CUBE.author],
            indexes: {
                categoryIndex: {
                    columns: [CUBE.id],
                },
            },
        },
    },

    measures: {
        count: {
            type: 'count',
        },
    },

    dimensions: {
        id: {
            sql: 'id',
            type: 'string',
            primaryKey: true,
        },
        author: {
            sql: 'author',
            type: 'string',
        },
    },
});

With this setup, I can do something like:

{
  "dimensions": [
    "FileHostingAnalytics.filehostingId",
    "FileHostingData.author"
  ],
}

And the result seems to be correct:


Those empty cells in some rows are correct (we don’t have data in MongoDB for that ids).

But when I try to add some filters:

{
 "filters": [
    {
      "member": "FileHostingData.author",
      "operator": "equals",
      "values": [
        "R6t8eyP0IXiPJFOsPn_LV"
      ]
    },
  ],
  "dimensions": [
    "FileHostingAnalytics.filehostingId",
    "FileHostingData.author"
  ],
}

The result is the same (expectations are just two rows in the result but got six)
I tried to add the filter “FileHostingData.author” with operator isSet, but It remains the same.

Thank you very much, it’s possible you’ll see something I don’t.

@littlewall
I’ve found the following issue and it seem to be the same one:

Feel free to add your comment and subscribe for updates there.