Channel: Desktop topics

How to remove duplicates based on sort order

When loading the data, I need to keep only the latest record for each RequestID.

I don't have a date to rely on to determine the latest record, but I do have a status that lets me sort the records in a roughly chronological order.

I thought I could:

  • Associate a statusID with each status, making sure that the lower the ID, the later the event
  • Sort by statusID
  • Remove duplicates

Unfortunately, it seems the sort order is not always respected when duplicates are removed.

I read that we should use the Table.Buffer() function before removing duplicates, but that this can create performance issues.

It does indeed create performance issues. It works on a small scale, but when I use it on the entire dataset the load process does not complete.
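For reference, the sort, buffer, and remove-duplicates pattern described above can be sketched as follows. This is a minimal illustration on a small hypothetical inline table; only the column names RequestID and statusID come from the question, and the sample values are assumptions.

```m
let
    // Hypothetical sample data standing in for the real source
    Source = #table(
        {"RequestID", "statusID"},
        {{1, 3}, {1, 1}, {2, 2}, {2, 3}}
    ),
    // Lowest statusID first, i.e. latest event first
    Sorted = Table.Sort(Source, {{"statusID", Order.Ascending}}),
    // Buffering holds the sorted rows in memory so the order is fixed
    // before duplicates are removed; this is the step that gets
    // expensive on a large dataset
    Buffered = Table.Buffer(Sorted),
    // Keep one row per RequestID
    Deduplicated = Table.Distinct(Buffered, {"RequestID"})
in
    Deduplicated
```

Because Table.Buffer loads the whole table into memory, the cost grows with the size of the dataset, which is consistent with the load not completing on the full data.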

 

Now I’m looking into using Table.SelectRows with a condition that would only return the row with the smallest statusID for each RequestID.

Something along the lines of:

  

 #"Latest Request" = Table.SelectRows(#"Previous Step", each (List.Max([RequestID]," statusID "))),

 

But that returns a type incompatibility error.
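A likely reason for the error: Table.SelectRows expects a function that returns true or false for each row, and inside `each` the expression [RequestID] is the single value of the current row rather than a list, so List.Max cannot be applied to it. One possible alternative, offered as a sketch rather than a tested solution for this dataset, is to group by RequestID and take the row with the smallest statusID from each group with Table.Min. The sample table and the Status column below are hypothetical.

```m
let
    // Hypothetical sample data standing in for #"Previous Step"
    Source = #table(
        {"RequestID", "statusID", "Status"},
        {
            {1, 3, "Submitted"},
            {1, 1, "Closed"},
            {2, 2, "In progress"},
            {2, 3, "Submitted"}
        }
    ),
    // For each RequestID, keep the record with the smallest statusID
    Grouped = Table.Group(
        Source,
        {"RequestID"},
        {{"LatestRow", each Table.Min(_, "statusID"), type record}}
    ),
    // Turn the list of records back into a table
    #"Latest Request" = Table.FromRecords(Grouped[LatestRow])
in
    #"Latest Request"
```

This avoids relying on sort order, so no Table.Buffer call is needed. Column types may need to be reapplied after Table.FromRecords, and whether the grouping performs acceptably on the full dataset would have to be checked against the real source.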

