argmin

The term argmin stands for “argument of the minimum”.
It identifies the input values (arguments) that minimize a given function, rather than the minimum value itself.

Min vs. Argmin

  • $\min f(x)$ → the smallest output value of the function.
    Example: for $f(x) = x^2$, $\min f(x) = 0$.

  • $\arg\min f(x)$ → the input $x$ that achieves this minimum.
    Example: for $f(x) = x^2$, $\arg\min f(x) = 0$ (since $x=0$ minimizes the function).
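The distinction above maps directly onto numpy's `min` and `argmin`. A minimal sketch, evaluating $f(x) = x^2$ on a grid of candidate inputs:

```python
import numpy as np

# Evaluate f(x) = x^2 on a grid of candidate inputs.
xs = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
fs = xs ** 2

min_value = fs.min()            # smallest output value of f
argmin_input = xs[fs.argmin()]  # input that achieves that minimum

print(min_value)     # 0.0
print(argmin_input)  # 0.0
```

Note that `np.argmin` returns the *index* of the minimum, so we index back into `xs` to recover the input value itself.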

Formal Definition

For a function $f(x)$:

\[\arg\min_{x \in S} f(x) = \{ x^* \in S \mid f(x^*) \leq f(x) \ \text{for all} \ x \in S \}\]

This means $\arg\min$ returns the set of points $x^*$ in the domain $S$ where $f(x)$ attains its smallest value.
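The set-valued definition can be implemented directly for a finite domain. A small helper (a sketch, with a hypothetical name `argmin_set` and a tolerance for floating-point comparison):

```python
def argmin_set(f, S, tol=1e-12):
    """Return the set of points in the finite domain S where f attains its minimum."""
    values = {x: f(x) for x in S}
    m = min(values.values())
    return {x for x, v in values.items() if abs(v - m) <= tol}

# f(x) = x^2 on a finite domain; both -1 and 1 achieve the minimum value 1.
print(argmin_set(lambda x: x * x, {-2, -1, 1, 2}))  # {1, -1}
```

Returning a set rather than a single point matches the formal definition: the minimizer need not be unique.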

Example

Function:

\[f(x) = (x - 3)^2 + 2\]
  • Minimum value: $\min f(x) = 2$ (when $x = 3$)
  • Argmin: $\arg\min f(x) = 3$

At $x = 3$:

\[f(3) = (3 - 3)^2 + 2 = 2\]
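A numerical check of this example, searching a grid that contains $x = 3$ exactly:

```python
import numpy as np

# f(x) = (x - 3)^2 + 2 evaluated on a grid; 0.5 steps are exact in binary,
# so x = 3.0 is on the grid and the minimum is found exactly.
f = lambda x: (x - 3) ** 2 + 2
xs = np.arange(0.0, 6.5, 0.5)
fs = f(xs)

x_star = xs[np.argmin(fs)]
print(x_star, fs.min())  # 3.0 2.0
```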

Multiple Minimizers

If multiple inputs yield the same minimum, $\arg\min$ returns all such points.

Example:

\[g(x) = \cos(x)\]

Minimum value: $-1$

\[\arg\min g(x) = \{ (2k+1)\pi \mid k \in \mathbb{Z} \}\]

(all odd multiples of $\pi$)
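A grid search recovers these minimizers numerically. This sketch scans $[-10, 10]$, which contains the odd multiples $-3\pi$, $-\pi$, $\pi$, and $3\pi$:

```python
import numpy as np

# cos attains its minimum -1 at every odd multiple of pi; collect all grid
# points whose value is within a small tolerance of -1.
xs = np.linspace(-10, 10, 20001)
gs = np.cos(xs)

near_min = xs[np.isclose(gs, -1.0, atol=1e-6)]
# dividing by pi and rounding reveals which multiples of pi were found
print(np.round(near_min / np.pi))  # values cluster at -3, -1, 1, 3
```

Because the minimum is attained at many points, the result is a set of grid points clustered around each minimizer, not a single value.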

Domain Restriction

$\arg\min$ can be restricted to a subset $S$ of the domain.

Example: for $h(x) = x^2$ with

\[S = \{ 2, 3, 4 \}\]

\[\arg\min_{x \in S} h(x) = 2\]

since $2^2 = 4 < 3^2 = 9 < 4^2 = 16$.
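For a finite restricted domain, Python's built-in `min` with a `key` function computes exactly this: it returns the element of $S$ (the input), not the value $h(x)$:

```python
# argmin over a finite subset S of the domain: min with a key function
# returns the minimizing element of S, not the minimum value h(x).
S = {2, 3, 4}
h = lambda x: x ** 2

print(min(S, key=h))  # 2
```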

Multivariable Functions

$\arg\min$ works similarly for functions with multiple inputs.

Example:

\[f(x, y) = x^2 + y^2\]

\[\arg\min f(x, y) = (0, 0)\]
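In higher dimensions, a grid-based argmin needs one extra step: `np.argmin` returns a flat index, which `np.unravel_index` converts back to grid coordinates. A sketch for $f(x, y) = x^2 + y^2$:

```python
import numpy as np

# f(x, y) = x^2 + y^2 evaluated on a 2-D grid containing (0, 0).
xs = np.arange(-2.0, 2.5, 0.5)
X, Y = np.meshgrid(xs, xs)
F = X ** 2 + Y ** 2

# argmin of a 2-D array: flat index -> (row, col) -> the input pair (x, y)
i, j = np.unravel_index(np.argmin(F), F.shape)
print(X[i, j], Y[i, j])  # 0.0 0.0
```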

Why Use Argmin?

  • In optimization, machine learning, and statistics, $\arg\min$ identifies the optimal parameters (e.g., weights in regression, cluster centers in k-means).
  • It answers: “Which inputs lead to the best solution?” rather than just “What is the best value?”

Argmin in Machine Learning

In mathematics and machine learning, the symbol $\arg\min$ — short for “argument of the minimum” — is used to describe the input values that minimize a given function. Rather than returning the smallest output value (which $\min$ does), $\arg\min$ tells us where that smallest value occurs. For example, for $f(x) = x^2$, we have $\min f(x) = 0$, but $\arg\min f(x) = 0$ because $x = 0$ is the point that achieves this minimum. In higher dimensions, $\arg\min$ works the same way: for $f(x, y) = x^2 + y^2$, the minimum occurs at $(0, 0)$.

This concept is central to optimization problems in machine learning. One way to define the problem of model fitting or training is to find parameter values $\theta$ that minimize the empirical risk on the training set:

\[\hat{\theta} = \arg\min_{\theta} L(\theta) = \arg\min_{\theta} \frac{1}{N} \sum_{n=1}^N \ell(y_n, f(x_n; \theta))\]
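A toy instance of this formula, under the illustrative assumption of a linear model $f(x; \theta) = \theta x$ with squared loss (for this special case the argmin over $\theta$ has a closed form, so no iterative optimizer is needed):

```python
import numpy as np

# Toy ERM: fit theta in f(x; theta) = theta * x by minimizing the mean
# squared error on a synthetic training set generated with theta = 2.0.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + 0.1 * rng.normal(size=100)

def empirical_risk(theta):
    return np.mean((y - theta * x) ** 2)

# Closed-form argmin of the empirical risk for 1-D least squares.
theta_hat = np.dot(x, y) / np.dot(x, x)
print(theta_hat)  # close to the true parameter 2.0
```

In practice the empirical risk rarely has a closed-form minimizer, and the argmin is approximated iteratively, e.g. by gradient descent.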

This approach is called empirical risk minimization (ERM). However, our real objective is not just to do well on the training data, but to minimize the expected loss on future unseen data:

\[\theta^* = \arg\min_{\theta} \mathbb{E}_{(x, y) \sim p_{\text{data}}} \big[ \ell(y, f(x; \theta)) \big]\]

Because the true data distribution $p_{\text{data}}$ is unknown, we use the training set as an approximation, but must take care to avoid overfitting. Regularization, validation, and early stopping are common tools to ensure good generalization.
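To make the regularization point concrete, here is a sketch using an assumed ridge (squared) penalty on the toy linear model: adding $\lambda \theta^2$ to the empirical risk shifts the argmin toward zero, trading training fit for stability on unseen data.

```python
import numpy as np

# Assumed example: ridge-penalized ERM for f(x; theta) = theta * x.
# Objective: mean((y - theta*x)^2) + lam * theta^2.
rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 2.0 * x + 0.5 * rng.normal(size=50)

def ridge_argmin(lam):
    # Closed-form minimizer of the penalized objective (derivative = 0).
    return np.dot(x, y) / (np.dot(x, x) + lam * len(x))

# A larger penalty shrinks the argmin toward zero.
print(abs(ridge_argmin(10.0)) < abs(ridge_argmin(0.0)))  # True
```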
