Hi all,
I am rather new to PowerBI but I hope some of you can help me with this question.
I am working with an Oracle database holding data on customers' competition participation. Each competition takes place over 6 days, and customers can compete numerous times on the same day.
The data I am working with is structured like this:
Column 1: CompetitionNumber
Column 2: CompetitionDay
Column 3: CustomerID
So as an example, a couple of rows could look like:
CompetitionNumber | CompetitionDay | CustomerID |
1 | 1 | 778856 |
1 | 1 | 778856 |
1 | 2 | 778856 |
1 | 2 | 808012 |
What I am trying to do is get a measure of how many customers participated each day, so I am doing a Group By on CompetitionNumber and CompetitionDay and counting the distinct rows in each group (i.e. the distinct CustomerIDs).
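To make the intended result concrete, here is a minimal Python sketch of the aggregation applied to the sample rows above. It is only an illustration of the logic, not what I actually run in PowerBI; the function name is made up, and I am assuming that "participated" means counting each CustomerID once per competition day.

```python
from collections import defaultdict

# Sample rows from the table above:
# (CompetitionNumber, CompetitionDay, CustomerID)
rows = [
    (1, 1, 778856),
    (1, 1, 778856),
    (1, 2, 778856),
    (1, 2, 808012),
]


def distinct_customers_per_day(rows):
    """Count distinct CustomerIDs for each (CompetitionNumber, CompetitionDay)."""
    seen = defaultdict(set)
    for competition, day, customer in rows:
        # A set keeps each customer once, however often they competed that day.
        seen[(competition, day)].add(customer)
    return {key: len(customers) for key, customers in seen.items()}


result = distinct_customers_per_day(rows)
```

For the four sample rows this gives 1 customer on day 1 and 2 customers on day 2 of competition 1.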
However, I am running into severe PowerBI performance issues: PowerBI slowly fills up my PC's RAM over several hours while doing the Group By, until it crashes. The table I am working with is very large, holding over 100 million rows. In contrast, when I run the same Group By via SQLDeveloper on the Oracle database itself, it takes Oracle around 10 minutes to compute.
As I see it, there are three different ways forward:
- Pull the data out manually using SQLDeveloper and Oracle. I have tried this, but that brings the main downside that my online PowerBI report won't be able to auto refresh itself using my On-premises gateway, so that won't be a long-term solution.
- Find some way to let PowerBI connect to the Oracle database using SQL, not PowerQuery (i.e. to let the Oracle database do the heavy lifting in terms of calculations). Could using an R query accomplish this (with something like ROracle) while still being able to use the on-premises gateway for auto refresh?
- Split up my PowerBI query into several sub-queries using filters on CompetitionNumber, i.e. make one PowerBI query per competition number. Filtering at competition number level gets the data size down to around 6 million rows, which I think would be more manageable for PowerBI. In the end I would then combine the queries into a single table.
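For the third option, this is roughly the logic I have in mind, written as a small Python sketch rather than actual PowerQuery steps. The function names and the extra row for competition 2 are made up for illustration. Since every group belongs to exactly one competition number, I assume the per-competition results can simply be appended without any double counting.

```python
from collections import defaultdict

# Hypothetical sample rows: (CompetitionNumber, CompetitionDay, CustomerID)
rows = [
    (1, 1, 778856),
    (1, 1, 778856),
    (1, 2, 778856),
    (1, 2, 808012),
    (2, 1, 778856),
]


def count_for_competition(rows, competition):
    """Distinct customers per day for a single competition (one sub-query)."""
    per_day = defaultdict(set)
    for comp, day, customer in rows:
        if comp == competition:
            per_day[day].add(customer)
    return [(competition, day, len(c)) for day, c in sorted(per_day.items())]


def combine(rows):
    """Run one sub-query per competition number and append the results."""
    combined = []
    for competition in sorted({comp for comp, _, _ in rows}):
        combined.extend(count_for_competition(rows, competition))
    return combined


summary = combine(rows)
```

Each sub-query only ever has to hold one competition's rows at a time, which is the point of splitting.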
What do you think - is there something else I could try to overcome this obstacle?
Many thanks in advance!