Standard error / standard deviation, or aggregation function in `sql`

Hello!

I’m trying to figure out how to get CubeJS to send over standard error or standard deviation for a measure. What’s the best way to go about doing this?

I can do this:

      measures: {
        standardDeviation: {
          sql: `stddev_samp(value)`,
          type: `number`,
        },
      }

But I can’t use it with filters (which are important to my use-case):

      measures: {
        standardDeviation: {
          filters: [{ sql: `${MeasureComponent}."measureId" = 1` }],
          sql: `stddev_samp(value)`,
          type: `number`,
        },
      }

(postgres generates the error Error: column "measure_component.measureId" must appear in the GROUP BY clause or be used in an aggregate function)

Is there another way to do this? My apologies if there is an obvious sql solution here I’m not seeing (but I can’t do the normal sql trick of joining on a subquery, nor did I find a way to get windowing to work).

2 Likes

Hi @llamallamagirl :wave:

Thanks for such a detailed question!

I believe the root cause is that the number type that this measure has is not aggregatable, hence the error. The question is — how do you aggregate two (or more) standard deviation values together? Calculate the average value? Sum them?

You can try changing number to avg (or any other aggregatable measure type) and see if the error persists.

1 Like

Might stddev become a supported Cubejs measure type in the future?

I don’t think it might because Cube.js models data at a lower level (strings, numbers, time intervals, etc.).

Could you please elaborate on your use case? What are you trying to acheive? I’ll do my best to help you.

(I’ve googled about combining, or aggregating, standard deviations and it looks like the formula for that is quite complex. However, it looks like variances can be summed and then deviations calculated as square roots. If you have variations in your data set, you can express the deviations in the data schema, no problem. See: https://stats.stackexchange.com/a/26647)

1 Like

In this particular case, I have Cubejs serving up time series data of heart rate (and other biometric data). What I’d like to do is show confidence interval / error bars / variance / IQR around an aggregate value, such as the variance of daily heart rate. This would inform a visualization like this: Examples - Apache ECharts or in a box plot like this: Examples - Apache ECharts

What I’d most like is for Cubejs to support returning, for example, avg along with a statistical measure of variance, all within the same “measure.” In my mind, it seems like an important capability for something like Cubejs, but then again I’m only thinking in terms of how I’m using it :slight_smile:

Thank you for researching solutions for me! I will try it out.

I think I found something that works. Instead of doing this:

  measures: {
    standardDeviation: {
      filters: [{ sql: `${MeasureComponent}."measureId" = 1` }],
      sql: `stddev_samp(value)`,
      type: `number`,
    },
  }

Which creates sql like this:
select CASE WHEN (measure_component.“measureId” = 1) THEN stddev_samp(value) END […]

I can do this:

  measures: {
    standardDeviation: {
      sql: `stddev_samp(value) filter (where ${MeasureComponent}."measureId" = 1)`
      type: `number`,
    },
  }

Which creates sql like this:
select stddev_samp(value) filter (where measure_component.“measureId” = 1) […]
and does not expect/require the group by.

Here are details on ‘filter’ for postgres and with other databases.

Thanks for the help!

1 Like