Generalized Inner Product · Mathematics for ML and AI

General form of the inner product

An inner product on $\mathbb{R}^n$ can be defined via a symmetric positive definite matrix $\mathbf{A}$.

The generalized definition

$$\langle \mathbf{x}, \mathbf{y} \rangle = \mathbf{x}^\top \mathbf{A}\, \mathbf{y}$$

The standard inner product (dot product) is the special case $\mathbf{A} = \mathbf{I}$ (identity matrix); in $\mathbb{R}^2$ it reads $\mathbf{x} \cdot \mathbf{y} = x_1 y_1 + x_2 y_2$. Equation 3.19 picks a specific $\mathbf{A} \neq \mathbf{I}$ and thereby yields a different, equally valid inner product.
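The definition $\mathbf{x}^\top \mathbf{A}\,\mathbf{y}$ can be sketched in a few lines of plain Python. The function name `inner_product` and the sample vectors are illustrative assumptions, not part of the source; the function does not check that `A` is symmetric positive definite.

```python
def inner_product(x, y, A):
    """Return x^T A y for vectors x, y (lists of length n) and an n x n matrix A (list of rows)."""
    n = len(x)
    # Compute the column vector A y first, then take the dot product with x.
    Ay = [sum(A[i][j] * y[j] for j in range(n)) for i in range(n)]
    return sum(x[i] * Ay[i] for i in range(n))

# With A = I the generalized inner product reduces to the dot product.
I2 = [[1.0, 0.0], [0.0, 1.0]]
x = [3.0, 4.0]  # illustrative values
y = [1.0, 2.0]
dot_via_identity = inner_product(x, y, I2)  # x1*y1 + x2*y2 = 3 + 8 = 11
```

Passing a different symmetric positive definite matrix as `A` gives a different inner product on the same vectors.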

Equation 3.19: the concrete example

The matrix $\mathbf{A}$ is:

$$\mathbf{A} = \begin{pmatrix} 1 & -\tfrac{1}{2} \\[4pt] -\tfrac{1}{2} & 1 \end{pmatrix}$$

Substituting into $\mathbf{x}^\top \mathbf{A}\, \mathbf{y}$ gives the right-hand side of 3.19:

$$\langle \mathbf{x}, \mathbf{y} \rangle = x_1 y_1 - \tfrac{1}{2}\bigl(x_1 y_2 + x_2 y_1\bigr) + x_2 y_2$$

The difference from the dot product is the mixed term $-\tfrac{1}{2}(x_1 y_2 + x_2 y_1)$. The matrix $\mathbf{A}$ is symmetric and positive definite (its eigenvalues are $\tfrac{3}{2}$ and $\tfrac{1}{2}$, both positive), so this is a valid inner product.
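Symmetry and positive definiteness of this $\mathbf{A}$ can be checked numerically. The sketch below assumes a symmetric $2\times2$ matrix and uses the closed-form eigenvalues $\tfrac{a+d}{2} \pm \sqrt{\bigl(\tfrac{a-d}{2}\bigr)^2 + b^2}$; the helper names are hypothetical.

```python
import math

A = [[1.0, -0.5], [-0.5, 1.0]]

def is_symmetric(M):
    """True if M equals its transpose."""
    n = len(M)
    return all(M[i][j] == M[j][i] for i in range(n) for j in range(n))

def eigenvalues_sym_2x2(M):
    """Eigenvalues of a symmetric 2x2 matrix [[a, b], [b, d]], smallest first."""
    a, b, d = M[0][0], M[0][1], M[1][1]
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return mean - radius, mean + radius

lam_min, lam_max = eigenvalues_sym_2x2(A)  # 0.5 and 1.5
# A symmetric matrix is positive definite exactly when all eigenvalues are positive.
is_spd = is_symmetric(A) and lam_min > 0
```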

Step by step: x⊤ A y

Because matrix multiplication is associative, $(\mathbf{x}^\top \mathbf{A})\,\mathbf{y} = \mathbf{x}^\top (\mathbf{A}\,\mathbf{y})$. The parentheses can be placed freely; the factors keep their order.

Step 1: $x^\top \cdot A$ (row × matrix)

$\mathbf{x}^\top$ is a row vector ($1\times2$), $\mathbf{A}$ a $2\times2$ matrix. The result is again a row vector ($1\times2$):

$$\mathbf{x}^\top \mathbf{A} = \begin{pmatrix} x_1 - \tfrac{1}{2}x_2 & -\tfrac{1}{2}x_1 + x_2 \end{pmatrix}$$

Step 2: $A \cdot y$ (matrix × column)

$\mathbf{A}$ is $2\times2$, $\mathbf{y}$ a column vector ($2\times1$). The result is again a column vector ($2\times1$):

$$\mathbf{A}\,\mathbf{y} = \begin{pmatrix} y_1 - \tfrac{1}{2}y_2 \\[4pt] -\tfrac{1}{2}y_1 + y_2 \end{pmatrix}$$

Step 3: the final inner product

Steps 1 and 2 are two alternative routes, not factors to be multiplied with each other. Either multiply the row vector from Step 1 by the column vector $\mathbf{y}$, or multiply the row vector $\mathbf{x}^\top$ by the column vector from Step 2. Both give the same scalar:

$$(\mathbf{x}^\top\mathbf{A})\,\mathbf{y} = \mathbf{x}^\top(\mathbf{A}\,\mathbf{y}) = x_1 y_1 - \tfrac{1}{2}\bigl(x_1 y_2 + x_2 y_1\bigr) + x_2 y_2$$
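The two bracketings can be compared numerically. The following sketch uses the matrix from 3.19 and arbitrarily chosen illustrative vectors (an assumption, not values from the source) and evaluates both routes as well as the expanded formula.

```python
A = [[1.0, -0.5], [-0.5, 1.0]]
x = [2.0, 3.0]  # illustrative values
y = [4.0, 1.0]

# Step 1: row vector x^T A (1x2); entry j is the dot product of x with column j of A.
xTA = [x[0] * A[0][j] + x[1] * A[1][j] for j in range(2)]

# Step 2: column vector A y (2x1); entry i is the dot product of row i of A with y.
Ay = [A[i][0] * y[0] + A[i][1] * y[1] for i in range(2)]

# Step 3: both bracketings collapse to the same scalar.
left = xTA[0] * y[0] + xTA[1] * y[1]   # (x^T A) y
right = x[0] * Ay[0] + x[1] * Ay[1]    # x^T (A y)

# Expanded right-hand side of 3.19 for comparison.
closed_form = x[0] * y[0] - 0.5 * (x[0] * y[1] + x[1] * y[0]) + x[1] * y[1]
```

For these values all three expressions evaluate to 4.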

Dimensions at a glance

Expression	Shape	Result
$\mathbf{x}^\top$	$1\times2$ (row)	–
$\mathbf{A}$	$2\times2$ (matrix)	–
$\mathbf{y}$	$2\times1$ (column)	–
$\mathbf{x}^\top \cdot \mathbf{A}$	$(1\times2) \cdot (2\times2)$	$1\times2$ (row)
$(\mathbf{x}^\top\mathbf{A}) \cdot \mathbf{y}$	$(1\times2) \cdot (2\times1)$	$1\times1$ (scalar)
$\mathbf{A} \cdot \mathbf{y}$	$(2\times2) \cdot (2\times1)$	$2\times1$ (column)
$\mathbf{x}^\top \cdot (\mathbf{A}\,\mathbf{y})$	$(1\times2) \cdot (2\times1)$	$1\times1$ (scalar)

Equation 3.20: a concrete numerical example

We plug in the concrete vector $\mathbf{x} = \mathbf{y} = (1, 1)^\top$ and compare its length under both inner products. With $\mathbf{y} = \mathbf{x}$, the mixed term of 3.19 becomes $-\tfrac{1}{2}(x_1 x_2 + x_2 x_1) = -x_1 x_2$, and this is the only difference from the dot product.

Norm under the new inner product (3.19)

$$\langle \mathbf{x}, \mathbf{x} \rangle = 1^2 - \underbrace{1\cdot 1}_{\text{mixed term}} + 1^2 = 1 \quad\Longrightarrow\quad \lVert \mathbf{x} \rVert_{\mathbf{A}} = \sqrt{1} = 1$$

Standard dot product for comparison:

$$\langle \mathbf{x}, \mathbf{x} \rangle = 1^2 + 1^2 = 2 \;\Rightarrow\; \lVert \mathbf{x} \rVert = \sqrt{2} \approx 1.41$$

Under 3.19 the same vector has length $1$ instead of $\sqrt{2}$, so it is shorter. The mixed term $-x_1 x_2$ contributes $-1$ here and pulls the squared norm down from $2$ to $1$.
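The two lengths of $(1, 1)^\top$ can be reproduced with a short sketch. The helper name `quad_form` is an assumption for illustration; it evaluates $\mathbf{x}^\top \mathbf{A}\,\mathbf{x}$ as a double sum.

```python
import math

def quad_form(x, A):
    """Return x^T A x, the squared norm induced by A."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

A = [[1.0, -0.5], [-0.5, 1.0]]   # matrix from 3.19
I2 = [[1.0, 0.0], [0.0, 1.0]]    # identity, i.e. the dot product
x = [1.0, 1.0]

norm_A = math.sqrt(quad_form(x, A))     # sqrt(1 - 1 + 1) = 1
norm_std = math.sqrt(quad_form(x, I2))  # sqrt(1 + 1) = sqrt(2)
```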

Why is it shorter?

For $\langle \mathbf{x}, \mathbf{x} \rangle$, set $\mathbf{y} = \mathbf{x}$:

$$\langle \mathbf{x}, \mathbf{x} \rangle = x_1^2 - x_1 x_2 + x_2^2 = \bigl(x_1^2 + x_2^2\bigr) - x_1 x_2$$

The sign of $x_1 x_2$ decides whether the norm grows or shrinks relative to the standard norm:

Case	Mixed term	Norm vs. standard
$x_1 x_2 > 0$ (same sign)	is subtracted	smaller (shorter)
$x_1 x_2 < 0$ (opposite signs)	is added	larger (longer)
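Both cases can be illustrated with one vector each. The vectors below are assumed examples, and the function names are hypothetical; the squared norms are compared because the square root preserves the ordering.

```python
def sq_norm_A(x1, x2):
    """Squared norm under the inner product 3.19: x1^2 - x1*x2 + x2^2."""
    return x1 ** 2 - x1 * x2 + x2 ** 2

def sq_norm_std(x1, x2):
    """Squared norm under the dot product: x1^2 + x2^2."""
    return x1 ** 2 + x2 ** 2

same_sign = (1.0, 1.0)       # x1 * x2 > 0
opposite_sign = (1.0, -1.0)  # x1 * x2 < 0

# Same sign: the mixed term is subtracted, so the vector gets shorter (1 < 2).
shorter = sq_norm_A(*same_sign) < sq_norm_std(*same_sign)
# Opposite signs: the mixed term is added, so the vector gets longer (3 > 2).
longer = sq_norm_A(*opposite_sign) > sq_norm_std(*opposite_sign)
```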

Key terms

Associativity

$(\mathbf{x}^\top\mathbf{A})\,\mathbf{y} = \mathbf{x}^\top(\mathbf{A}\,\mathbf{y})$: the parentheses can be placed freely, while the order of the factors stays the same.

Commutativity

Does not hold for matrices: in general, $\mathbf{A}\mathbf{B} \neq \mathbf{B}\mathbf{A}$.
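A small counterexample makes the failure of commutativity concrete. The second matrix `B` is a hypothetical choice for illustration and does not appear in the source; `matmul` is a minimal helper, not a library function.

```python
def matmul(P, Q):
    """Multiply matrices given as lists of rows."""
    rows, inner, cols = len(P), len(Q), len(Q[0])
    return [[sum(P[i][t] * Q[t][j] for t in range(inner)) for j in range(cols)]
            for i in range(rows)]

A = [[1.0, -0.5], [-0.5, 1.0]]  # matrix from 3.19
B = [[1.0, 2.0], [0.0, 1.0]]    # hypothetical second matrix

AB = matmul(A, B)  # [[1.0, 1.5], [-0.5, 0.0]]
BA = matmul(B, A)  # [[0.0, 1.5], [-0.5, 1.0]]
products_differ = AB != BA
```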

Column vectors

By convention, $\mathbf{x}$ and $\mathbf{y}$ are columns ($n\times1$). The transpose $\mathbf{x}^\top$ turns a column into a row ($1\times n$).

Positive definite

$\langle \mathbf{x}, \mathbf{x} \rangle > 0$ for all $\mathbf{x} \neq \mathbf{0}$. Together with symmetry and bilinearity, this is a requirement for a valid inner product.

Source: Deisenroth, Faisal & Ong, Mathematics for Machine Learning, Cambridge University Press 2020, Chapter 3 (mml-book.com).