library(lpsugar)
library(ROI)
#> ROI: R Optimization Infrastructure
#> Registered solver plugins: nlminb, highs.
#> Default solver: auto.This vignette focuses on different regression techniques that can be solved using linear models.
We start by generating the data we’ll use.
set.seed(123)
n <- 50
k <- 3
true_beta <- c(2, 3, -5)
x <- matrix(rpois(n*k, 6), nrow = n, ncol = k)
x[, 1] <- 1
e <- rnorm(n)
y <- (x %*% true_beta) + e
head(x)
#> [,1] [,2] [,3]
#> [1,] 1 2 6
#> [2,] 1 5 5
#> [3,] 1 8 6
#> [4,] 1 3 10
#> [5,] 1 6 6
#> [6,] 1 4 9
head(y)
#> [,1]
#> [1,] -20.974429
#> [2,] -8.284773
#> [3,] -5.220718
#> [4,] -38.818697
#> [5,] -10.138891
#> [6,] -30.994236Least Absolute Deviation
The objective of Least Absolute Deviation (LAD) regression is to minimize the absolute residuals.
The absolute value function is not linear, so we have to separate each into:
Then the problem can be written like this:
This works. The objective function attempts to push both and down, but the constraints ensure that:
If .
If .
Let’s write the problem in lpsugar.
n <- nrow(x)
k <- ncol(x)
lad <- lp_problem() |>
lp_var(beta[1:k, 1]) |>
lp_var(e_pos[1:n], lower = 0) |>
lp_var(e_neg[1:n], lower = 0) |>
lp_min(sum(e_pos + e_neg)) |>
lp_alias(yhat = x %*% beta) |>
lp_con(
pos = e_pos >= y - yhat,
neg = e_neg >= yhat - y
) |>
lp_solve()
cbind(
true_beta = true_beta,
lad_beta = lad$variables$beta[, 1] |> round(2)
)
#> true_beta lad_beta
#> 1 2 1.83
#> 2 3 2.98
#> 3 -5 -4.97