
The Inka Dink Transit Time Estimator

Predicting DPO package transit times to Lima

Like any good consumerist American family, we order a lot of stuff from the internet. We're also fortunate to have loving family and friends who send us packages from the US. How do we get all this stuff to Lima? Through our friends at the DPO! The Diplomatic Post Office is a service which magically (via Miami) transports our mail from a US address to the Embassy here in Lima, where my mom picks it up. Sonja and I really like opening up the packages because they usually contain really good stuff; it is like having a bunch of mini birthdays throughout the year. As soon as we know something is coming we're always like: "when's it gonna get here?", but the USPS estimate is always wrong, because it doesn't seem to account for what happens after the package reaches Miami.

My dad and I set out to predict when an item in transit might arrive. We analyzed historical shipment data and fit four different regression models that estimate how long a package will take to arrive based on size, sender type, origin, day of week, and holiday proximity. You can use the tool I built, the Inka Dink Transit Time Estimator, below, and then read on to understand how it works.

Low sample size: Only 11 shipments have been logged. The feature-based models (Poisson GLM, Linear Regression, Random Forest) have ~15 parameters, which is more than the number of observations, so there are too few degrees of freedom left to estimate them reliably. These models will overfit the training data, and some parameters are aliased because the data cannot tell them apart at this sample count. As more samples are added this problem will shrink. Send more mail.

Inka Dink Transit Time Estimator

Select features below to get a predicted transit time from each model. Leave a field blank to average over all values of that feature. The range below each estimate is a 95% prediction interval.

[Interactive estimator: shows a consensus estimate (median) as an arrival in days and an arrival date, plus one estimate in days for each model: Poisson, Poisson GLM, Linear, and Random Forest.]
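One plausible reading of the consensus estimate is the median of the four per-model predictions, with the arrival date found by adding that many days to the ship date. Here is a minimal Python sketch under that assumption. The function name and the rounding rule are hypothetical, and the per-model values come from the 2026-04-06 row of the raw data (actual transit of 11 days plus each model's difference).

```python
from datetime import date, timedelta
from statistics import median

def consensus_estimate(ship_date, model_days):
    """Median of the per-model estimates, plus the implied arrival date.

    Assumption: the arrival date rounds the median to whole days.
    """
    days = median(model_days.values())
    arrival = ship_date + timedelta(days=round(days))
    return days, arrival

# Predictions for the package shipped 2026-04-06 (11d actual, plus each model's error).
model_days = {
    "Poisson": 10.0,
    "Poisson GLM": 9.5,
    "Linear": 9.5,
    "Random Forest": 9.9,
}

days, arrival = consensus_estimate(date(2026, 4, 6), model_days)
print(f"Consensus: {days:.1f} days, arriving {arrival.isoformat()}")
```

With an even number of models, the median is the average of the two middle estimates, so a single wild model cannot drag the consensus very far.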

Data Overview

We began tracking shipment times when we arrived in Lima in early March, including who sent the package, where it came from, when it was sent and when it arrived. Raw data is shown at the bottom of this page.

Shipments: 11
Avg Transit Days: 10
Range: 7–19 days
Origin Regions: 3

Regression Models

I fit four progressively more complex models to the data using five features: package size (Mail/S/M/L), sender type (Commercial/Personal), sender region (Northeast, South, Midwest, West), day of week shipped, and whether a US or Peru public holiday falls between the ship date and 9 days after (close to the typical transit time). Each model adds more flexibility to capture patterns in transit times.
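The two date-based features can be derived from the ship date alone. This is a rough Python sketch of how that might look, not the code behind the tool: the holiday list is a tiny illustrative sample rather than a full US/Peru calendar, the function names are made up, and treating both ends of the 9-day window as inclusive is an assumption.

```python
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Illustrative, incomplete holiday list (assumption, not the real calendar used).
HOLIDAYS = {
    date(2026, 5, 1),   # Labour Day (Peru)
    date(2026, 5, 25),  # Memorial Day (US)
}

WINDOW_DAYS = 9  # window length given in the text

def day_of_week(ship_date):
    """Name of the weekday the package was shipped on."""
    return DAY_NAMES[ship_date.weekday()]  # weekday(): Monday == 0

def holiday_in_window(ship_date, holidays=HOLIDAYS, window=WINDOW_DAYS):
    """True if any holiday falls from the ship date to `window` days later (inclusive)."""
    end = ship_date + timedelta(days=window)
    return any(ship_date <= h <= end for h in holidays)

for shipped in (date(2026, 4, 6), date(2026, 4, 25)):
    print(shipped, day_of_week(shipped), holiday_in_window(shipped))
```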

We measured model performance using RMSE (root mean squared error), which expresses prediction error in days, so lower is better. Note: Random Forest reports out-of-bag error (honest), while the others report in-sample error (optimistic). This means each shipment in the Random Forest assessment is predicted only by trees that did not train on it (the bootstrap sampling that provides this is part of the RF algorithm). The other models use all samples in their construction and don't have this built-in assessment, so the comparison isn't perfect: the Random Forest number is the more honest one.

| Model | Description | RMSE (days) |
|---|---|---|
| Poisson | Baseline model assuming transit days follow a Poisson distribution. Predicts the same value for all shipments. | 3.22 |
| Poisson GLM | Poisson regression with features. Coefficients are multiplicative (log link): exp(β) gives the rate ratio. | 0.64 |
| Linear | Additive model where each feature contributes independently to predicted transit time. | 0.64 |
| Random Forest | Ensemble of decision trees that can capture nonlinear relationships and feature interactions. | 2.79 |
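The in-sample RMSE values can be checked by hand from the model prediction columns of the raw data at the bottom of this page (predicted minus actual, in days). A small Python sketch, with the errors typed in from that table:

```python
import math

# Prediction errors (predicted minus actual) in the row order of the raw data.
errors = {
    "Poisson":       [-2, -1, 1, 1, 2, 2, 3, 2, 2, -1, -9],
    "Poisson GLM":   [0, -1.5, 0, 0, 0, 0, 0, 0, 1.5, 0, 0],
    "Linear":        [0, -1.5, 0, 0, 0, 0, 0, 0, 1.5, 0, 0],
    "Random Forest": [0.3, -1.1, -0.1, 0, 0.4, 0.3, 1.2, 0.3, 1.9, -0.2, -3.9],
}

def rmse(errs):
    """Root mean squared error: square, average, then take the square root."""
    return math.sqrt(sum(e * e for e in errs) / len(errs))

for model, errs in errors.items():
    print(f"{model:14s} RMSE = {rmse(errs):.2f} days")
```

This reproduces 3.22 for the baseline and 0.64 for the GLM and linear models. The Random Forest column gives about 1.41, which is lower than the 2.79 reported above; that would make sense if the table shows full-forest predictions while 2.79 is the out-of-bag figure, though that is my reading rather than something the table states.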

Poisson (Intercept-Only)

This is a feature-free baseline that models transit days as count data drawn from a Poisson distribution with a single rate parameter, λ = 10 days (the sample mean, which is the maximum-likelihood estimate of λ). It predicts the same value for all shipments regardless of features. The Poisson PMF overlay on the histogram above, and replicated below, shows this model's predicted distribution.
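To make the baseline concrete, here is a short Python sketch that estimates λ as the mean transit time, evaluates the Poisson PMF, and reads a central 95% range off the cumulative distribution. The quantile approach is an assumption about how an interval could be built for this model, not a description of what the tool actually does.

```python
import math

# Transit days from the raw data table.
transit_days = [12, 11, 9, 9, 8, 8, 7, 8, 8, 11, 19]

# For an intercept-only Poisson model, the best-fit rate is the sample mean.
lam = sum(transit_days) / len(transit_days)

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def poisson_quantile(p, lam):
    """Smallest k whose cumulative probability is at least p."""
    k = 0
    cdf = poisson_pmf(0, lam)
    while cdf < p:
        k += 1
        cdf += poisson_pmf(k, lam)
    return k

interval = (poisson_quantile(0.025, lam), poisson_quantile(0.975, lam))

print(f"lambda = {lam:.1f} days")
print(f"P(exactly 10 days) = {poisson_pmf(10, lam):.3f}")
print(f"central 95% range: {interval[0]} to {interval[1]} days")
```

Under these assumptions the range is 4 to 17 days, so the one 19-day shipment in the data sits outside it.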

Poisson GLM Coefficients

The Poisson Generalized Linear Model (GLM) adds features via a log link function: $\log(E(Y \mid \mathbf{x})) = \alpha + \boldsymbol{\beta}^\top \mathbf{x}$. The "Rate Ratio" column shows exp(β) for each feature, which is the multiplicative effect on transit days. A rate ratio of 1.15 means "15% longer," while 0.95 means "5% shorter." The multiplicative form naturally prevents negative predictions, and the Poisson model assumes the variance equals the mean.

We can use the Akaike Information Criterion (AIC) to compare this GLM with the intercept-only Poisson model: AIC = 65.8 for the GLM vs. 57.2 for the intercept-only model. AIC penalizes model complexity, and the GLM's higher AIC suggests the added features don't improve fit enough to justify the extra parameters with this small sample (n=11). The simpler intercept-only model is therefore preferred by this metric despite the GLM's lower RMSE, which is a sign that the GLM is likely overfitting.
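Both AIC values can be reproduced from the raw data with AIC = 2k − 2·log-likelihood. The Python sketch below takes the GLM's fitted values as actual transit plus the GLM error from the raw data table, and it assumes k = 10 for the GLM (the intercept plus the nine non-reference rows in the coefficient table) and k = 1 for the intercept-only model.

```python
import math

transit_days = [12, 11, 9, 9, 8, 8, 7, 8, 8, 11, 19]

# Fitted means: a constant 10 for the baseline, actual + error for the GLM.
fitted_baseline = [10.0] * len(transit_days)
fitted_glm = [12, 9.5, 9, 9, 8, 8, 7, 8, 9.5, 11, 19]

def poisson_log_lik(ys, mus):
    """Sum of log P(Y = y | mean mu); lgamma(y + 1) equals log(y!)."""
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1)
               for y, mu in zip(ys, mus))

def aic(log_lik, k):
    """Akaike Information Criterion for a model with k estimated parameters."""
    return 2 * k - 2 * log_lik

aic_baseline = aic(poisson_log_lik(transit_days, fitted_baseline), k=1)
aic_glm = aic(poisson_log_lik(transit_days, fitted_glm), k=10)  # k is an assumption

print(f"intercept-only AIC = {aic_baseline:.1f}")
print(f"GLM AIC            = {aic_glm:.1f}")
```

The GLM fits better (its log-likelihood is higher by almost 5), but it pays 2 × 9 = 18 points of extra penalty, which is why its AIC ends up worse.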

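| Group | Feature | β | Rate Ratio | p-value |
|---|---|---|---|---|
| Baseline | (Intercept) | +2.7054 | ×14.961 | 0.022 |
| Package Size | Mail (ref) | 0 | ×1 | |
| Package Size | S | -0.5082 | ×0.602 | 0.6218 |
| Package Size | M | -0.3747 | ×0.687 | 0.6112 |
| Package Size | L | -0.3747 | ×0.687 | 0.674 |
| Sender Type | Commercial (ref) | 0 | ×1 | |
| Sender Type | Personal | -0.0795 | ×0.924 | 0.9037 |
| Sender Region | Northeast (ref) | 0 | ×1 | |
| Sender Region | Midwest | 0 | ×1 | 1 |
| Sender Region | South | -0.2513 | ×0.778 | 0.618 |
| Day of Week | Monday (ref) | 0 | ×1 | |
| Day of Week | Thursday | 0 | ×1 | 1 |
| Day of Week | Friday | +0.2336 | ×1.263 | 0.5264 |
| Day of Week | Saturday | +0.3185 | ×1.375 | 0.4931 |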
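To turn these coefficients into a prediction, add up the β values for the chosen levels and exponentiate, which is the same as multiplying the intercept's rate by each rate ratio. A minimal Python sketch using the numbers from the table above (the dictionary layout and function name are just for illustration):

```python
import math

INTERCEPT = 2.7054

# β per feature level; reference levels are 0 by definition.
BETA = {
    "size":   {"Mail": 0.0, "S": -0.5082, "M": -0.3747, "L": -0.3747},
    "sender": {"Commercial": 0.0, "Personal": -0.0795},
    "region": {"Northeast": 0.0, "Midwest": 0.0, "South": -0.2513},
    "day":    {"Monday": 0.0, "Thursday": 0.0, "Friday": 0.2336, "Saturday": 0.3185},
}

def predict_days(size, sender, region, day):
    """Expected transit days under the log link: exp(alpha + sum of betas)."""
    log_mean = (INTERCEPT + BETA["size"][size] + BETA["sender"][sender]
                + BETA["region"][region] + BETA["day"][day])
    return math.exp(log_mean)

# A medium personal package from the Northeast shipped on a Monday.
print(f"{predict_days('M', 'Personal', 'Northeast', 'Monday'):.1f} days")
# A personal letter from the Midwest shipped on a Saturday.
print(f"{predict_days('Mail', 'Personal', 'Midwest', 'Saturday'):.1f} days")
```

These come out to about 9.5 and 19.0 days, matching the GLM predictions implied by the raw data table for the Monday Northeast packages and the 19-day letter.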

Linear Model Coefficients

Ordinary least-squares (OLS) regression models transit days as a linear combination of features: $E(Y \mid \mathbf{x}) = \alpha + \boldsymbol{\beta}^\top \mathbf{x}$. Each coefficient $\beta_j$ represents the additive change in expected transit days relative to the reference level (marked ref). R² = 0.961 (adjusted 0.605), meaning the model explains about 96% of the in-sample variance in transit times.

The p-value tests the null hypothesis $H_0\colon \beta_j = 0$ that a feature has no effect. Coefficients with p < 0.05 are bolded below; these are the features where we can reject the null at the 5% significance level (none reach that threshold in the current fit). With n=11 observations and 15 parameters, the gap between R² and adjusted R² (0.961 vs. 0.605) suggests some overfitting. Adjusted R² penalizes additional features, so adding features that barely improve R² makes the adjusted R² go down.
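The R² figures can be recomputed from the raw data. In the Python sketch below, p = 9 is an assumption: it is the number of non-reference rows in the coefficient table, and it is the value that reproduces the reported adjusted R². That leaves only one residual degree of freedom, which is why the penalty is so large.

```python
transit_days = [12, 11, 9, 9, 8, 8, 7, 8, 8, 11, 19]
# Linear model errors (predicted minus actual) from the raw data table.
linear_errors = [0, -1.5, 0, 0, 0, 0, 0, 0, 1.5, 0, 0]

n = len(transit_days)
p = 9  # estimated slopes, excluding the intercept (assumption, see above)

mean_days = sum(transit_days) / n
ss_total = sum((y - mean_days) ** 2 for y in transit_days)   # variation around the mean
ss_resid = sum(e ** 2 for e in linear_errors)                # variation left unexplained

r2 = 1 - ss_resid / ss_total
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")
print(f"residual degrees of freedom = {n - p - 1}")
```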

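| Group | Feature | Effect (days) | p-value |
|---|---|---|---|
| Baseline | (Intercept) | +16.5 | 0.2716 |
| Package Size | Mail (ref) | 0 | |
| Package Size | S | -7.5 | 0.4565 |
| Package Size | M | -6.5 | 0.4159 |
| Package Size | L | -6.5 | 0.4643 |
| Sender Type | Commercial (ref) | 0 | |
| Sender Type | Personal | -0.5 | 0.9202 |
| Sender Region | Northeast (ref) | 0 | |
| Sender Region | Midwest | 0 | 1 |
| Sender Region | South | -2 | 0.6257 |
| Day of Week | Monday (ref) | 0 | |
| Day of Week | Thursday | 0 | 1 |
| Day of Week | Friday | +2.5 | 0.5122 |
| Day of Week | Saturday | +3 | 0.5 |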

Random Forest

Random forest (500 trees via ranger) builds an ensemble of decision trees, each trained on a bootstrap sample. Each node in these trees splits the data on a single feature, eventually resulting in a "decision" on expected transit time. The predictions of all 500 trees are averaged to produce the final prediction. It can capture nonlinear relationships and feature interactions that the other models cannot. For example, this is the only model that could conceivably recognize that deliveries don't happen on Saturday or Sunday, so a package shipped on a Wednesday could never take 10 or 11 days (it would land on a weekend); the forest can represent a non-continuous jump for this case. Importance is measured by how much each feature reduces prediction error across the ensemble (impurity-based importance).
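The out-of-bag idea behind the Random Forest's RMSE can be shown without growing any trees. In this Python sketch each "tree" is replaced by a stand-in learner that just predicts the mean of its bootstrap sample, so it illustrates only the bootstrap and out-of-bag bookkeeping, not ranger's tree-growing algorithm. The seed and variable names are arbitrary.

```python
import math
import random

transit_days = [12, 11, 9, 9, 8, 8, 7, 8, 8, 11, 19]
n = len(transit_days)
n_trees = 500

# Chance that one shipment is left out of one bootstrap sample of size n.
p_oob = (1 - 1 / n) ** n

rng = random.Random(0)
oob_sum = [0.0] * n   # running sum of out-of-bag predictions per shipment
oob_cnt = [0] * n     # number of "trees" for which the shipment was out of bag

for _ in range(n_trees):
    in_bag = [rng.randrange(n) for _ in range(n)]        # sample with replacement
    in_bag_set = set(in_bag)
    fit = sum(transit_days[i] for i in in_bag) / n        # stand-in for a fitted tree
    for i in range(n):
        if i not in in_bag_set:                           # only score unseen shipments
            oob_sum[i] += fit
            oob_cnt[i] += 1

oob_pred = [s / c for s, c in zip(oob_sum, oob_cnt)]
oob_rmse = math.sqrt(sum((p - y) ** 2 for p, y in zip(oob_pred, transit_days)) / n)
oob_fraction = sum(oob_cnt) / (n * n_trees)

print(f"theoretical out-of-bag chance: {p_oob:.3f}")
print(f"observed out-of-bag fraction:  {oob_fraction:.3f}")
print(f"out-of-bag RMSE (stand-in learner): {oob_rmse:.2f} days")
```

With 11 shipments, each one is left out of roughly 35% of the bootstrap samples, so every shipment gets scored by plenty of models that never saw it.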

Raw Data

All 11 datapoints, sorted by arrival date.

The first six columns are the raw data; the last four are model predictions, shown as the difference from the actual transit time. + indicates the model predicted a longer ship time.

| Shipped | Arrived | Transit | Size | Type | Region | Poisson (intercept-only) | Poisson GLM | Linear Regression | Random Forest |
|---|---|---|---|---|---|---|---|---|---|
| 2026-03-27 | 2026-04-08 | 12d | M | Personal | Northeast | -2d | 0d | 0d | +0.3d |
| 2026-04-06 | 2026-04-17 | 11d | M | Personal | Northeast | -1d | -1.5d | -1.5d | -1.1d |
| 2026-04-08 | 2026-04-17 | 9d | S | Commercial | Midwest | +1d | 0d | 0d | -0.1d |
| 2026-04-08 | 2026-04-17 | 9d | S | Commercial | Northeast | +1d | 0d | 0d | 0d |
| 2026-04-09 | 2026-04-17 | 8d | M | Commercial | South | +2d | 0d | 0d | +0.4d |
| 2026-04-09 | 2026-04-17 | 8d | L | Commercial | South | +2d | 0d | 0d | +0.3d |
| 2026-04-13 | 2026-04-20 | 7d | S | Commercial | South | +3d | 0d | 0d | +1.2d |
| 2026-04-13 | 2026-04-21 | 8d | M | Commercial | South | +2d | 0d | 0d | +0.3d |
| 2026-05-04 | 2026-05-12 | 8d | M | Personal | Northeast | +2d | +1.5d | +1.5d | +1.9d |
| 2026-05-02 | 2026-05-13 | 11d | M | Commercial | South | -1d | 0d | 0d | -0.2d |
| 2026-04-25 | 2026-05-14 | 19d | Mail | Personal | Midwest | -9d | 0d | 0d | -3.9d |