Multi-tenant with shared data Cube schema set up

I’m working on a reporting feature for an application using CubeJS and the team and I have hit something of a stumbling block. We’re trying to figure out how to design out CubeJS schema based on an existing DB with a few key requirements. Below is a description of the existing DB in CubeJS style syntax, it’s not exactly what we’ve declared in our actual cubes but I hope it will help outline the general relationships between the underlying DB.

cube(`Organization`, {
  joins: {},
});

cube(`Application`, {
  joins: {
    Organization: {
      relationship: `belongsTo`,
      sql: `${CUBE}.org_id = ${Organization}.id`,
    },
  },
});

cube(`Project`, {
  joins: {
    Application: {
      relationship: `belongsTo`,
      sql: `${CUBE}.app_id = ${Application}.id`,
    },
  },
});

cube(`ProjectTask`, {
  joins: {
    Project: {
      relationship: `belongsTo`,
      sql: `${CUBE}.proj_id = ${Project}.id`,
    },
    LibraryTask: {
      relationship: `belongsTo`,
      sql: `${CUBE}.lib_id = ${LibraryTask}.id`,
    },
    TaskStatus: {
      relationship: `belongsTo`,
      sql: `${CUBE}.status_id = ${TaskStatus}.id`,
    },
  },
});

cube(`LibraryTask`, {
  joins: {
    Organization: {
      relationship: `belongsTo`,
      sql: `${CUBE}.org_id = ${Organization}.id or ${CUBE}.org_id is null`,
    },
  },
});

cube(`TaskStatus`, {
  joins: {
    Organization: {
      relationship: `belongsTo`,
      sql: `${CUBE}.org_id = ${Organization}.id or ${CUBE}.org_id is null`,
    },
  },
});

cube(`Activity`, {
  joins: {
    Organization: {
      relationship: `belongsTo`,
      sql: `${CUBE}.org_id = ${Organization}.id`,
    },
    Application: {
      relationship: `belongsTo`,
      sql: `${Activity}.app_id = ${Application}.id or ${CUBE}.app_id is null`,
    },
    Project: {
      relationship: `belongsTo`,
      sql: `${Activity}.proj_id = ${Project}.id or ${CUBE}.proj_id is null`,
    },
  },
});

Some key notes and requirements:

  • The data in the LibraryTasks and TaskStatus is sometimes tied to an Organization but a large portion of it is not. It has org_id set to null to indicate it is generic and can be seen by all organizations.
  • The data in the rest of the cubes is always tied to an Organization and it is very important from a security perspective that we not allow one organization to see another’s data.
  • Some Activities are specific to a Project or Application but some are not.

The issue we’re running into is how to enforce organization separation while also allowing users to query for all standard LibraryTask and TaskStatus. Initially, we had designed our cubes so that when a user queries for lets say all of the projects (in their org), the Project table was the base table in the query to which we would join Organization and add a filter for the user’s org. What we found is that this works for Project and Application but when we try and query for ProjectTask instead of getting all of the tasks for the user’s org we get all of the ProjectTask in all orgs because the join goes through the library cube and not the project cube. If we remove the or null statements from the joins we fix the bleed between the orgs but loose the ability to list all the standard LibraryTask using a query.

If we reverse the join direction and make Organization the base table we solve some of the above problem, but run into an issue when querying for activities. If we try and query for a count of activities connected to a certain project the count is not aggregated over activity but over Organization because that is the base table and the number is the same for every project when it should not be.

We’re hoping for some advice on how to organize the the schema in this case. What makes more sense to have as a base table, Organization or the other tables? We thought about splitting up Activity into three different cubes and having them each have one join but that seems a kind of ugly. We also thought about using pre-aggregations to effectively truncate the tables to only include rows for a specific organization and rows where org_id = null but we’re apprehensive about the amount of code this adds that we need to then maintain. (There’s also a lot more organization specific tables I haven’t listed here.)

Any thoughts and ideas are welcome and let me know if you have any questions. Sorry for the super long post… and thank you for the taking the time to read it all!

Hi, have you seen this https://cube.dev/docs/multitenancy-setup
and this as well https://cube.dev/docs/security/context#using-query-rewrite

You need to find a way to supply Cubejs with a JWT Token that contains user context data like OrganizationId and ProjectId etc.

Then leverage the queryRewrite API to filter the results from Cubejs according the the data in SECURITY_CONTEXT aka the id that you’ve supplied in the JWT.

We’re using something to that effect already. This is a simplified version of how we declare our server:

const server = new CubejsServer({
  queryTransformer: (query, { securityContext: user }) => {
    query.filters.push({
      dimension: "Organization.id",
      operator: "equals",
      values: [user.org, null],
    return query;
  },
});

The complications arise from the ways CubeJS chooses to formulate the joins between the tables the user wants to query and the Organization table. Depending on how we declare the schema there’s either bleeding between the tenants or the results are too restricted and don’t return the data I’d expect to see.

1 Like