When loading the data I need to keep only the latest record.
I don't have a date to rely on to determine the latest record, but I do have a status that I can sort on in a roughly chronological order.
I thought I could:
- Associate a statusID with each status, making sure that the lower the ID, the later the event
- Sort by statusID
- Remove duplicates
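
A minimal sketch of the steps above, for illustration only. The sample rows, the status names and the Status column are hypothetical; only RequestID and statusID come from my actual data.

```powerquery
let
    // Hypothetical sample data: several status rows per RequestID
    Source = #table(
        type table [RequestID = Int64.Type, Status = text, statusID = Int64.Type],
        {
            {1, "Opened", 3},
            {1, "Closed", 1},
            {2, "Opened", 3},
            {2, "In progress", 2}
        }
    ),
    // Sort so that the lowest statusID (the latest event) comes first
    Sorted = Table.Sort(Source, {{"statusID", Order.Ascending}}),
    // Remove duplicates on RequestID, hoping the first sorted row is the one kept
    Deduplicated = Table.Distinct(Sorted, {"RequestID"})
in
    Deduplicated
```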
Unfortunately, it seems the sort order is not always respected when the duplicates are removed.
I read that we should use the Table.Buffer() function before removing duplicates, but that this could create performance issues.
It does indeed create performance issues. It works on a small scale, but when I use it on the entire dataset the load process does not complete.
Now I’m looking into using Table.SelectRows with a condition that would only return the row with the smallest statusID for each RequestID.
Something along the lines of:
#"Latest Request" = Table.SelectRows(#"Previous Step", each (List.Max([RequestID]," statusID "))),
But that returns a type incompatibility error.
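
To make the intended result concrete, here is a rough sketch of "the row with the smallest statusID for each RequestID" written with Table.Group and Table.Min instead of Table.SelectRows. This is a swapped-in technique, not a fix of the line above; the sample data is hypothetical, and I have not confirmed how it performs on the full dataset.

```powerquery
let
    // Hypothetical sample data; only RequestID and statusID are real column names
    Source = #table(
        type table [RequestID = Int64.Type, Status = text, statusID = Int64.Type],
        {
            {1, "Opened", 3},
            {1, "Closed", 1},
            {2, "Opened", 3},
            {2, "In progress", 2}
        }
    ),
    // For each RequestID, keep the record with the smallest statusID
    Grouped = Table.Group(
        Source,
        {"RequestID"},
        {{"Latest", each Table.Min(_, "statusID"), type record}}
    ),
    // Rebuild a table from the kept records (column types are not preserved here)
    #"Latest Request" = Table.FromRecords(Grouped[Latest])
in
    #"Latest Request"
```

Because each group is reduced to a single record directly, this version does not depend on a prior sort being preserved.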