To do this, I made a function distanceWalked, which sets the distance of the first row to 0 and, for every later row, looks up the distance from the previous row's section to the current one in dmatrix.
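distanceWalked <- function(data) {
  # The first row has no previous section, so nothing was walked yet
  data$distance[1] <- 0
  if (nrow(data) > 1) {
    for (i in 2:nrow(data)) {
      # Look up the distance from the previous section to the current one
      data$distance[i] <- dmatrix[data$section[i - 1], data$section[i]]
    }
  }
  return(data)
}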
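As a quick sanity check, here is a minimal sketch of the function on toy data. The 3x3 dmatrix and the toy table are made up for illustration and are not from your data; I use a plain data.frame here only to keep the example free of dependencies.

```r
# Hypothetical distance matrix between sections 1..3 (made-up values)
dmatrix <- matrix(c(0, 2, 5,
                    2, 0, 3,
                    5, 3, 0), nrow = 3, byrow = TRUE)

distanceWalked <- function(data) {
  # The first row has no previous section, so nothing was walked yet
  data$distance[1] <- 0
  if (nrow(data) > 1) {
    for (i in 2:nrow(data)) {
      # Distance from the previous section to the current one
      data$distance[i] <- dmatrix[data$section[i - 1], data$section[i]]
    }
  }
  return(data)
}

# Toy visit sequence: section 1, then 3, then 2 (already in time order)
toy <- data.frame(section = c(1, 3, 2))
result <- distanceWalked(toy)
print(result)
```

The function reads dmatrix from the global environment, so it has to exist before distanceWalked is called.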
I then made a new data table that holds every unique combination of day and id:
library(data.table)
unique_combos <- unique(data.table(date = dt$day, id = dt$id))
Then I ran a for loop that subsets the data for each combination, orders it chronologically by the st column, runs distanceWalked on it, and appends the result to a new data table:
new_data <- data.table()
for (i in 1:nrow(unique_combos)) {
  dt_sub <- dt[dt$day == unique_combos$date[i] & dt$id == unique_combos$id[i]]
  setorder(dt_sub, st)
  dt_sub <- distanceWalked(dt_sub)
  new_data <- rbind(new_data, dt_sub)
}
I then used the dplyr package to sum the distance for each unique combination of day and id:
library(dplyr)
final_data <- new_data %>%
  group_by(day, id) %>%
  summarize(total_distance = sum(distance))
It should yield something like this:
    day  id total_distance
1 02/28 104              3
2 05/14 104              0
3 02/26 104              0
This might take a while to complete for 5 million rows, but it should get you where you need to go!
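If the loop turns out to be too slow, the same calculation can probably be done without looping at all. Below is a rough sketch, not something I have run on your data: it assumes the same column names as above (day, id, st, section) and that section indexes the rows and columns of dmatrix. The toy dt, the dmatrix values, and the prev_section helper column are made up for illustration.

```r
library(data.table)

# Hypothetical distance matrix between sections 1..3 (made-up values)
dmatrix <- matrix(c(0, 2, 5,
                    2, 0, 3,
                    5, 3, 0), nrow = 3, byrow = TRUE)

# Hypothetical visits: one id on two days, deliberately out of time order
dt <- data.table(
  day     = c("02/28", "02/28", "02/28", "05/14"),
  id      = c(104, 104, 104, 104),
  st      = c(3, 1, 2, 1),
  section = c(2, 1, 3, 1)
)

# Sort once so rows are chronological within each day/id group
setorder(dt, day, id, st)

# Previous section within each group (NA for the first row of a group)
dt[, prev_section := shift(section), by = .(day, id)]

# First rows get 0; every other row is looked up in dmatrix in one step
# by indexing with a two-column (row, column) matrix
dt[, distance := 0]
dt[!is.na(prev_section), distance := dmatrix[cbind(prev_section, section)]]

# Total distance per day and id
final_data <- dt[, .(total_distance = sum(distance)), by = .(day, id)]
print(final_data)
```

The idea is that sorting once and using shift() by group replaces the per-combination subsetting, and the matrix lookup replaces the inner loop, so nothing grows row by row with rbind.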