Hello all,
I'm hoping for some help or advice on how I might resolve this problem.
Basically, I need to track and summarise the week-on-week changes in status for a fairly large number of unique 'things'.
I currently do not have direct access to the datasets involved - they come from an external database in the form of weekly extracts.
Each week's dataset is a CSV file that currently contains roughly 75,000 to 85,000 rows and consists of:
- Unique identifier for each 'thing' (9-digit number)
- Primary Status (Text String)
- 'Sub'-status (Text String)
- Date that Primary Status most recently changed
- 'Week commencing' date of the week the dataset refers to
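Since each 'thing' has a unique ID, one way to start is to load each weekly extract into a dictionary keyed by that ID, so that lookups between weeks are cheap. The sketch below assumes hypothetical column headers (id, primary_status, sub_status, status_changed, week_commencing), because the real extract's headers aren't stated; rename them to match the actual file. The sample rows are made up.

```python
import csv
import io
from typing import Dict, TextIO, Tuple

# Hypothetical column names: adjust to the real extract's headers.
ID_COL = "id"
PRIMARY_COL = "primary_status"
SUB_COL = "sub_status"


def load_week(f: TextIO) -> Dict[str, Tuple[str, str]]:
    """Read one weekly CSV extract into {thing_id: (primary_status, sub_status)}."""
    week: Dict[str, Tuple[str, str]] = {}
    for row in csv.DictReader(f):
        # Keep the ID as a string so any leading zeros in the 9-digit number survive.
        thing_id = row[ID_COL].strip()
        # If an ID appears more than once in a week, the last row wins.
        week[thing_id] = (row[PRIMARY_COL].strip(), row[SUB_COL].strip())
    return week


if __name__ == "__main__":
    # Made-up sample rows; a real file would be opened with
    # open(path, newline="", encoding="utf-8") instead.
    sample = io.StringIO(
        "id,primary_status,sub_status,status_changed,week_commencing\n"
        "000000001,Open,Awaiting review,2024-01-02,2024-01-08\n"
        "000000002,Closed,Complete,2024-01-05,2024-01-08\n"
    )
    print(load_week(sample))
```

At around 80,000 rows per week, a dictionary like this should fit comfortably in memory.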
What I want to be able to do is determine the status changes from one week to the next, categorise each change, then produce summarised counts of those categories.
The categories I want to use are:
- 'No change' (same status in both weeks) - ideally this should exclude any instance where a 'thing' has no data in either of the two weeks being compared
- 'New' ('thing' appears in the later week but not the earlier)
- 'Removed' ('thing' appears in earlier week but not the later)
- 'Changed' (there has been an actual status change between one week and the next) - this should also report the new status one way or another.
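The four categories above map onto a comparison over the union of IDs from two weeks: an ID missing from the earlier week is 'New', one missing from the later week is 'Removed', and only IDs present in both can be 'No change' or 'Changed', which gives the exclusion asked for under 'No change' for free. Below is a minimal sketch, assuming each week is held as {thing_id: (primary_status, sub_status)}; the function name compare_weeks is hypothetical, and it compares only the primary status (an assumption; compare the whole tuple if sub-status changes should count too).

```python
from collections import Counter
from typing import Dict, List, Tuple

Status = Tuple[str, str]  # (primary_status, sub_status)


def compare_weeks(
    earlier: Dict[str, Status], later: Dict[str, Status]
) -> Tuple[List[Tuple[str, str, str]], Counter]:
    """Categorise each thing as 'No change', 'New', 'Removed' or 'Changed'.

    Returns per-thing rows (thing_id, category, primary status reported)
    and a Counter of categories. Only the primary status is compared.
    """
    rows: List[Tuple[str, str, str]] = []
    for thing_id in sorted(earlier.keys() | later.keys()):
        before = earlier.get(thing_id)
        after = later.get(thing_id)
        if before is None:
            rows.append((thing_id, "New", after[0]))
        elif after is None:
            rows.append((thing_id, "Removed", before[0]))
        elif before[0] == after[0]:
            rows.append((thing_id, "No change", after[0]))
        else:
            # For changes, report the new (later-week) primary status.
            rows.append((thing_id, "Changed", after[0]))
    counts = Counter(category for _, category, _ in rows)
    return rows, counts


if __name__ == "__main__":
    # Made-up sample data.
    week1 = {"000000001": ("Open", "A"), "000000002": ("Open", "B"), "000000003": ("Closed", "C")}
    week2 = {"000000001": ("Open", "A"), "000000002": ("Closed", "C"), "000000004": ("Open", "A")}
    rows, counts = compare_weeks(week1, week2)
    print(dict(counts))
    # 'Changed' things broken down by their new status.
    print(Counter(status for _, cat, status in rows if cat == "Changed"))
```

The per-thing rows keep the new status for each 'Changed' entry, so they can be saved alongside the counts if individual changes need checking later.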
Because the extracts arrive weekly, merging tables is likely to be impractical.
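One way to avoid building a merged all-weeks table is to walk through the extracts in date order and only ever hold two consecutive weeks in memory, emitting one small summary per pair. The sketch below assumes each week has already been reduced to (week_commencing, {thing_id: primary_status}) and that the weeks arrive in date order (for example, if file names start with an ISO date, sorting them would give that order). The function name weekly_summary is hypothetical, and it uses set arithmetic on the dictionary keys to get counts only.

```python
from typing import Dict, Iterable, Iterator, Tuple


def weekly_summary(
    extracts: Iterable[Tuple[str, Dict[str, str]]]
) -> Iterator[Tuple[str, str, int]]:
    """Yield (week_commencing, category, count) for each consecutive pair of weeks.

    `extracts` must be in date order; only the previous week is kept in memory.
    """
    it = iter(extracts)
    try:
        _, prev = next(it)
    except StopIteration:
        return  # no weeks at all, nothing to compare
    for week, curr in it:
        common = prev.keys() & curr.keys()  # things present in both weeks
        changed = sum(1 for k in common if prev[k] != curr[k])
        yield week, "New", len(curr.keys() - prev.keys())
        yield week, "Removed", len(prev.keys() - curr.keys())
        yield week, "No change", len(common) - changed
        yield week, "Changed", changed
        prev = curr  # drop the older week before loading the next one


if __name__ == "__main__":
    # Made-up sample weeks.
    weeks = [
        ("2024-01-01", {"1": "Open", "2": "Open", "3": "Closed"}),
        ("2024-01-08", {"1": "Open", "2": "Closed", "4": "Open"}),
        ("2024-01-15", {"1": "Closed", "2": "Closed", "4": "Open"}),
    ]
    for row in weekly_summary(weeks):
        print(row)
```

Because the output is in long format (week, category, count), each new week's summary rows could simply be appended to a running summary file, so only the latest extract and the one before it are needed each time.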
Any advice or assistance gratefully accepted.
Thanks in advance.