I'm creating a histogram in a Power BI R visual using ggplot2. The three columns I'm feeding the visual are:
- reportDate (type Date)
- id (type text)
- class (type text; can only be "Wired" or "Wireless")
The data source contains one row for every connection made by any of the devices whose IDs are stored in the id column. I want a histogram that shows, for each ID, the number of distinct days on which it had at least one connection. This is the code:
```r
##### Below added by Power BI; cannot be modified #####
# Create dataframe
# dataset <- data.frame(reportDate, id, class)
# Remove duplicated rows
# dataset <- unique(dataset)
##### Above added by Power BI; cannot be modified #####

library(ggplot2)
library(plyr)

# Group by ID and count how many distinct reportDates each one has
DateCount <- ddply(dataset, .(id), summarise, Count = length(unique(reportDate)))

# Plot
ggplot(DateCount, aes(x = Count)) +
  geom_histogram(binwidth = 1, color = "black",
                 fill = rgb(1, 184, 170, maxColorValue = 255)) +
  theme_bw() +
  xlab("Days Seen")
```
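To show what the grouping step computes, here is a minimal base-R sketch that uses a small, made-up `dataset` (all values are hypothetical). It uses `tapply` in place of the `ddply` call, so it runs without plyr. Before counting, it repeats Power BI's `unique()` preamble.

```r
# Hypothetical toy data: one row per connection (values are made up)
dataset <- data.frame(
  reportDate = as.Date(c("2020-01-01", "2020-01-01", "2020-01-02",
                         "2020-01-01", "2020-01-03", "2020-01-03")),
  id    = c("A", "A", "A", "B", "B", "C"),
  class = c("Wired", "Wired", "Wired", "Wireless", "Wireless", "Wired"),
  stringsAsFactors = FALSE
)

# Mirror Power BI's preamble: drop exact duplicate rows
dataset <- unique(dataset)

# Count distinct reportDates per id (base-R stand-in for the ddply call)
counts <- tapply(dataset$reportDate, dataset$id, function(d) length(unique(d)))
DateCount <- data.frame(id = names(counts), Count = as.integer(counts),
                        stringsAsFactors = FALSE)
print(DateCount)
```

In this toy data, the duplicate connection for A on 2020-01-01 is dropped, so A and B each get a count of 2 and C gets 1.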
This works well, and produces the following result (I added a slicer so I can filter by a particular class):
Now let's say I add a measure to my report that's just a string:
Measure = "test"
If I add this measure to the R visual, my intuition is that a column called Measure will be added to the dataset data frame and the other columns will not change at all, since Measure just contains "test" in every row. However, once I add Measure, the visual changes drastically:
Why is my intuition wrong? When I have the visual print the number of rows in dataset, the two graphs disagree: there are 8740 rows without Measure and 14341 rows with it. Why are these values different?
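The intuition can be checked on the R side alone. The minimal sketch below uses made-up data, and the names `base` and `withMeasure` are hypothetical. It adds a constant column, mimicking `Measure = "test"`, and then applies the same `unique()` step as Power BI's preamble. The row count stays the same. This only models the R code, not how Power BI builds the table. Still, it suggests the jump from 8740 to 14341 rows is already present in the data Power BI passes to the script.

```r
# Hypothetical stand-in for the table Power BI passes in (values made up)
base <- data.frame(
  reportDate = as.Date(c("2020-01-01", "2020-01-02", "2020-01-02")),
  id    = c("A", "A", "B"),
  class = c("Wired", "Wired", "Wireless"),
  stringsAsFactors = FALSE
)

# Same rows plus a constant column, mimicking Measure = "test"
withMeasure <- base
withMeasure$Measure <- "test"

# A constant column cannot make two rows distinct, so unique() keeps the same count
nBase    <- nrow(unique(base))
nMeasure <- nrow(unique(withMeasure))
cat("without Measure:", nBase, " with Measure:", nMeasure, "\n")
```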
Thanks!