Is Cube the answer for querying large data between time ranges?

Hello,
our company is looking to improve our analytics dashboard, since we've started dealing with big data.
While going through new technologies we found Cube.js, and I'm trying to find out whether it's suitable for us or not.

After reading the documentation we started testing with simple queries, and things were pretty good until we started using time ranges; at that point there seemed to be no major difference in performance between Cube.js and a plain Postgres query.
I tried with and without pre-aggregations. I thought pre-aggregation was the best choice for us, since we don't need the data in real time, so we can refresh it every 4-6 hours.

So I would like to share what I did here, to check with you whether I missed something.

I used a table named intent_event with 20+ million rows for testing,

with the dimensions and measures below:

cube('IntentEvent', {
  
  sql: 'SELECT * FROM platform.intent_event',

  preAggregations: {
    main: {
      type: 'originalSql',
      scheduledRefresh: true
    }
  },

  refreshKey: {
    every: '4 hour'
  },

  measures: {
    count: {
      type: 'count',
      drillMembers: [id]
    }
  },

  dimensions: {
    id: {
      sql: 'id',
      type: 'number',
      primaryKey: true
    },
    botId: {
      sql: 'bot_id',
      type: 'number'
    },
    intentId: {
      sql: 'intent_id',
      type: 'number'
    },
    eventType: {
      sql: 'event_type',
      type: 'string'
    },
    eventSource: {
      sql: 'event_source',
      type: 'string'
    },
    eventTime: {
      sql: 'event_time',
      type: 'time'
    }
  },
  dataSource: 'default'
});

and this was the query to get the top triggered intents:

    "query": {
        "measures": [
            "IntentEvent.count"
        ],
        "dimensions": [
            "IntentEvent.intentId"
        ],
        "timeDimensions": [
            {
                "dimension": "IntentEvent.eventTime",
                "dateRange": [
                    "2000-01-01",
                    "2021-12-31"
                ]
            }
        ],
        "filters": [
            {
                "member": "IntentEvent.botId",
                "operator": "equals",
                "values": [
                    "3"
                ]
            }
        ],
        "limit": 10
    }

Any tips or advice would help, thanks.


Hi @Abdulrhman_Al-Harasi :wave:

Thanks for such a detailed question and the thorough testing you've performed.

I think that the originalSql pre-aggregation doesn’t make much sense in your use case and won’t allow you to get a significant performance improvement. I suggest switching to a rollup pre-aggregation with a monthly partition granularity. It should improve the performance significantly.
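For reference, here is a sketch of what that rollup pre-aggregation could look like in the schema from the question. The member references, the daily granularity, and the refresh interval are my assumptions; adjust them to the queries you actually run:

```javascript
cube('IntentEvent', {
  sql: 'SELECT * FROM platform.intent_event',

  preAggregations: {
    // Rollup pre-aggregation partitioned by month, as suggested above.
    // Only the listed measures and dimensions are pre-computed, so the
    // rollup tables stay much smaller than the original 20M-row table,
    // and a date-range query only touches the relevant monthly partitions.
    intentsRollup: {
      type: 'rollup',
      measures: [CUBE.count],
      dimensions: [CUBE.botId, CUBE.intentId],
      timeDimension: CUBE.eventTime,
      granularity: 'day',            // assumption: daily grain is enough
      partitionGranularity: 'month', // one table per month of data
      scheduledRefresh: true,
      refreshKey: {
        every: '4 hour'
      }
    }
  },

  // ...measures and dimensions as in the question...
});
```

With this in place, the top-intents query from the question should be served from the rollup tables instead of scanning the raw intent_event table.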


Sure, I'm going to try that. Meanwhile I have a few questions. I learned that the pre-aggregation option creates tables in the DB when we use it, and I want to know:

  • How can I build the pre-aggregated tables when I start the Cube.js service? It takes a long time to build them when I send my first request, and I want to avoid that; I want everything to be ready as soon as the server is running.
  • Regarding the cube schema above, will it refresh the pre-aggregated table every '4 hours' via the refreshKey property?

For that, please set up the background refresh and a dedicated refresh worker.
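As a sketch, that typically means running two Cube.js instances with different environment variables: one serving API requests and one dedicated to building pre-aggregations in the background. The exact variable names depend on your Cube.js version (older versions used CUBEJS_SCHEDULED_REFRESH_TIMER), so please check the deployment docs:

```shell
# API instance: serves /cubejs-api queries, does not build pre-aggregations
CUBEJS_REFRESH_WORKER=false

# Refresh worker instance: builds and refreshes pre-aggregations in the
# background, so the first user request never has to wait for the build
CUBEJS_REFRESH_WORKER=true
```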

Not exactly. The pre-aggregation refresh interval is set via the refreshKey in the pre-aggregation options: https://cube.dev/docs/pre-aggregations#refresh-key
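In other words (a sketch based on the schema in the question), the refreshKey needs to move inside the pre-aggregation definition; a cube-level refreshKey only controls the in-memory query cache:

```javascript
preAggregations: {
  main: {
    type: 'originalSql',
    scheduledRefresh: true,
    // This refreshKey controls how often the pre-aggregated table is
    // rebuilt. A refreshKey defined at the cube level, as in the original
    // schema, only affects in-memory query caching instead.
    refreshKey: {
      every: '4 hour'
    }
  }
}
```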


Thanks @igorlukanin,
after reading more about OLAP cubes, I realized how powerful they are and how a rollup pre-aggregation with a time partition is very useful in our case.

Using Cube.js helped us receive the data more than 100 times faster than a plain PostgreSQL query.
Just awesome!! Looking forward to using it in our platform :fire: :v:

A quick note: I got confused when I used the refreshKey property. I found out later that there is one for in-memory caching and another one for pre-aggregations. So it would be very helpful if the documentation mentioned something about that.
