I'm trying to build a model and it's driving me nuts.
A lot of questions here. Documenting my thoughts and trying to paint the picture.
I have 6 dimensions (max 200k records) and 2 fact tables with roughly 10 million records.
I originally denormalised the fact tables to avoid duplicate rows with the same index.
Turns out DirectQuery on this structure, with its auto-generated SQL, executes subqueries with "SELECT *" on the tables relevant to the query, so the wide denormalised tables can be a severe performance obstacle. Even if the chart references only one dimension, the query drags in every column from the fact table and the dimension table. Logical reads are out of control.
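To illustrate the shape of what I'm seeing (table and column names here are invented, and this is heavily simplified):

```sql
-- Rough shape of the auto-generated DirectQuery SQL (hypothetical names).
-- Each table the visual touches arrives as a SELECT * subquery, so every
-- column of the wide fact table comes along even though the visual only
-- needs one key and one measure.
SELECT [basetable1].[Region], SUM([basetable0].[SalesAmount]) AS [Total]
FROM
    (SELECT * FROM [dbo].[FactSalesWide]) AS [basetable0]
    INNER JOIN (SELECT * FROM [dbo].[DimRegion]) AS [basetable1]
        ON [basetable0].[RegionKey] = [basetable1].[RegionKey]
GROUP BY [basetable1].[Region];
```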
Currently I have written views on top of the fact tables to separate them into sub-facts with the 5-6 indexes and 1 or 2 facts each. I also have views on top of the tables that hide the metadata from PBI (only SELECT ... FROM ...).
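Each sub-fact view is just a narrow projection, something along these lines (names invented):

```sql
-- Hypothetical sub-fact view: the 5-6 dimension keys plus one or two
-- measures, so the SELECT * subquery that DirectQuery wraps around it
-- stays narrow.
CREATE VIEW dbo.vFactSales_Amount
AS
SELECT
    DateKey,
    CustomerKey,
    ProductKey,
    StoreKey,
    PromotionKey,
    SalesAmount
FROM dbo.FactSalesWide;
```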
Should I physically split these tables out rather than use views?
Should I normalise the fact tables further, duplicating the index columns and splitting the fact columns out vertically?
I then upgraded to the Azure SQL Database Premium tier (for the SSD storage), changed the fact tables to memory-optimized, and added clustered columnstore indexes.
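For context, the fact tables are now defined roughly along these lines (names invented, most measure columns omitted; the surrogate key is only there because durable memory-optimized tables require a primary key):

```sql
-- Simplified sketch of one wide fact table: memory-optimized, with an
-- inline clustered columnstore index. Column names are invented.
CREATE TABLE dbo.FactSalesWide
(
    FactKey      BIGINT IDENTITY(1,1) NOT NULL PRIMARY KEY NONCLUSTERED,
    DateKey      INT NOT NULL,
    CustomerKey  INT NOT NULL,
    ProductKey   INT NOT NULL,
    StoreKey     INT NOT NULL,
    PromotionKey INT NOT NULL,
    SalesAmount  DECIMAL(19, 4) NOT NULL,
    -- ...plus the rest of the measure columns...
    INDEX ccix_FactSalesWide CLUSTERED COLUMNSTORE
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
```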
Still runs slow. What do I do next?
Upgrading the Azure SQL Database tier any further seems like overkill; including the persistence layer I have less than 10 GB of data.
SSAS Tabular is not an option because I can't buy hardware, and the idea is to use Power BI Embedded.
How do I diagnose what the problem is?
Do I need Application Insights?
The CPU time on the queries is low, but the elapsed time is 10-12x the CPU time.
I don't know if that's in the normal range.
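In case it's relevant to the diagnosis question: is something like this standard plan-cache DMV query the right way to see where the elapsed time is going?

```sql
-- Compare CPU time vs elapsed time per cached statement; a large gap
-- between the two usually means waiting (I/O, blocking, network) rather
-- than CPU-bound work. Times in the DMV are microseconds, hence / 1000.
SELECT TOP (20)
    qs.execution_count,
    qs.total_worker_time  / 1000 AS total_cpu_ms,
    qs.total_elapsed_time / 1000 AS total_elapsed_ms,
    qs.total_logical_reads,
    st.text AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_elapsed_time DESC;
```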
Do I need to reorganize/defrag/maintain the clustered columnstore indexes on the fact tables?
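(For that columnstore question, I assume the health check is something like the query below, looking at rowgroup state and deleted rows, but I don't know what the thresholds are, or whether ALTER INDEX ... REORGANIZE even applies to memory-optimized tables.)

```sql
-- Rowgroup health per table: lots of OPEN (delta) rowgroups or a high
-- deleted_rows count would, I assume, suggest maintenance is due.
SELECT
    OBJECT_NAME(rg.object_id) AS table_name,
    rg.state_desc,
    COUNT(*)                  AS row_groups,
    SUM(rg.total_rows)        AS total_rows,
    SUM(rg.deleted_rows)      AS deleted_rows
FROM sys.dm_db_column_store_row_group_physical_stats AS rg
GROUP BY rg.object_id, rg.state_desc;
```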
Also, how much processing is done on the website vs the database?
Do I need a minimum amount of website hardware?
Any help much appreciated.