Strong error analysis for stochastic gradient descent optimization algorithms

Arnulf Jentzen; Benno Kuckuck; Ariel Neufeld; Philippe Von Wurstemberger

doi:10.1093/imanum/drz055

Corresponding author: Ariel Neufeld

Research output: Contribution to journal › Article › peer-review

23 Citations (Scopus)

Abstract

Stochastic gradient descent (SGD) optimization algorithms are key ingredients in a series of machine learning applications. In this article we perform a rigorous strong error analysis for SGD optimization algorithms. In particular, we prove for every arbitrarily small $\varepsilon \in (0,\infty)$ and every arbitrarily large $p \in (0,\infty)$ that the considered SGD optimization algorithm converges in the strong $L^p$-sense with order $1/2-\varepsilon$ to the global minimum of the objective function of the considered stochastic optimization problem under standard convexity-type assumptions on the objective function and relaxed assumptions on the moments of the stochastic errors appearing in the employed SGD optimization algorithm. The key ideas in our convergence proof are, first, to employ techniques from the theory of Lyapunov-type functions for dynamical systems to develop a general convergence machinery for SGD optimization algorithms based on such functions, then, to apply this general machinery to concrete Lyapunov-type functions with polynomial structures and, thereafter, to perform an induction argument along the powers appearing in the Lyapunov-type functions in order to achieve for every arbitrarily large $p \in (0,\infty)$ strong $L^p$-convergence rates.
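To make the stated rate concrete: convergence in the strong $L^p$-sense with order $1/2-\varepsilon$ means that there is a constant $C$ (depending on $\varepsilon$ and $p$) with $(\mathbb{E}[|\Theta_n - \vartheta|^p])^{1/p} \le C\, n^{\varepsilon - 1/2}$ for all $n$, where $\Theta_n$ denotes the $n$-th SGD iterate and $\vartheta$ the global minimum (notation chosen here for illustration). The following Python sketch estimates this quantity by Monte Carlo on a one-dimensional toy problem and fits the decay order on a log-log scale. Everything in it is an illustrative assumption rather than the authors' setting or code: the quadratic objective with Gaussian data, the learning rates $\gamma_n = \eta/n$, and the hypothetical helper names `strong_lp_errors` and `fitted_order`. An empirical order near 1/2 for each $p$ is consistent with the abstract's statement but is no substitute for the proof.

```python
import numpy as np


def strong_lp_errors(checkpoints, p_values, lam=2.0, eta=1.0, theta_star=1.0,
                     n_paths=20_000, seed=0):
    """Monte Carlo estimate of (E[|Theta_n - theta_star|^p])^(1/p) at each n in checkpoints.

    Hypothetical toy problem (an assumption, not the paper's setting): minimise
    f(theta) = E[lam / 2 * (theta - X)^2] with X ~ N(theta_star, 1), whose unique
    global minimum is theta_star. SGD draws one fresh sample X_n per step and
    uses the learning rate gamma_n = eta / n.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(n_paths)          # Theta_0 = 0 on every Monte Carlo path
    wanted = set(checkpoints)
    errors = {}                        # n -> {p: estimated strong L^p error}
    for n in range(1, max(checkpoints) + 1):
        x = rng.normal(theta_star, 1.0, size=n_paths)  # fresh sample X_n per path
        grad = lam * (theta - x)       # gradient of lam/2 * (theta - x)^2 in theta
        theta = theta - (eta / n) * grad  # SGD step with gamma_n = eta / n
        if n in wanted:
            abs_err = np.abs(theta - theta_star)
            # Empirical p-th moment of the error, then its p-th root.
            errors[n] = {p: float(np.mean(abs_err ** p) ** (1.0 / p))
                         for p in p_values}
    return errors


def fitted_order(errors, p):
    """Empirical convergence order: minus the least-squares slope of log(error) vs log(n)."""
    ns = sorted(errors)
    log_n = np.log(np.array(ns, dtype=float))
    log_err = np.log(np.array([errors[n][p] for n in ns]))
    slope, _intercept = np.polyfit(log_n, log_err, 1)
    return -slope
```

For example, `errs = strong_lp_errors([64, 128, 256, 512, 1024, 2048], p_values=(1, 2, 4))` followed by `fitted_order(errs, p)` should give values near 0.5 for each `p` in this toy case. The step-size constant matters: for this quadratic with $\gamma_n = \eta/n$, the classical $n^{-1/2}$ rate requires $\lambda\eta > 1/2$ (here `lam * eta`), which the defaults satisfy; a smaller product yields a slower decay.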

Original language	English
Pages (from-to)	455-492
Number of pages	38
Journal	IMA Journal of Numerical Analysis
Volume	41
Issue number	1
DOIs	https://doi.org/10.1093/imanum/drz055
Publication status	Published - Jan 1 2021
Externally published	Yes

Bibliographical note

Publisher Copyright:
© 2020 The Author(s) 2018. Published by Oxford University Press on behalf of the Institute of Mathematics and its Applications. All rights reserved.

ASJC Scopus Subject Areas

General Mathematics
Computational Mathematics
Applied Mathematics

Keywords

Stochastic approximation algorithms
Stochastic gradient descent
Strong error analysis

Cite this

@article{78a9c6a81fa04d669358ca1de4374725,

title = "Strong error analysis for stochastic gradient descent optimization algorithms",

abstract = "Stochastic gradient descent (SGD) optimization algorithms are key ingredients in a series of machine learning applications. In this article we perform a rigorous strong error analysis for SGD optimization algorithms. In particular, we prove for every arbitrarily small $\varepsilon \in (0,\infty)$ and every arbitrarily large $p \in (0,\infty)$ that the considered SGD optimization algorithm converges in the strong $L^p$-sense with order $1/2-\varepsilon$ to the global minimum of the objective function of the considered stochastic optimization problem under standard convexity-type assumptions on the objective function and relaxed assumptions on the moments of the stochastic errors appearing in the employed SGD optimization algorithm. The key ideas in our convergence proof are, first, to employ techniques from the theory of Lyapunov-type functions for dynamical systems to develop a general convergence machinery for SGD optimization algorithms based on such functions, then, to apply this general machinery to concrete Lyapunov-type functions with polynomial structures and, thereafter, to perform an induction argument along the powers appearing in the Lyapunov-type functions in order to achieve for every arbitrarily large $p \in (0,\infty)$ strong $L^p$-convergence rates.",

keywords = "Stochastic approximation algorithms, Stochastic gradient descent, Strong error analysis",

author = "Arnulf Jentzen and Benno Kuckuck and Ariel Neufeld and {Von Wurstemberger}, Philippe",

year = "2021",

month = jan,

day = "1",

doi = "10.1093/imanum/drz055",

language = "English",

volume = "41",

pages = "455--492",

journal = "IMA Journal of Numerical Analysis",

issn = "0272-4979",

publisher = "Oxford University Press",

number = "1",

}

TY  - JOUR
T1  - Strong error analysis for stochastic gradient descent optimization algorithms
AU  - Jentzen, Arnulf
AU  - Kuckuck, Benno
AU  - Neufeld, Ariel
AU  - Von Wurstemberger, Philippe
PY  - 2021/1/1
Y1  - 2021/1/1
N2  - Stochastic gradient descent (SGD) optimization algorithms are key ingredients in a series of machine learning applications. In this article we perform a rigorous strong error analysis for SGD optimization algorithms. In particular, we prove for every arbitrarily small \varepsilon \in (0,\infty) and every arbitrarily large p \in (0,\infty) that the considered SGD optimization algorithm converges in the strong L^p-sense with order 1/2-\varepsilon to the global minimum of the objective function of the considered stochastic optimization problem under standard convexity-type assumptions on the objective function and relaxed assumptions on the moments of the stochastic errors appearing in the employed SGD optimization algorithm. The key ideas in our convergence proof are, first, to employ techniques from the theory of Lyapunov-type functions for dynamical systems to develop a general convergence machinery for SGD optimization algorithms based on such functions, then, to apply this general machinery to concrete Lyapunov-type functions with polynomial structures and, thereafter, to perform an induction argument along the powers appearing in the Lyapunov-type functions in order to achieve for every arbitrarily large p \in (0,\infty) strong L^p-convergence rates.
AB  - Stochastic gradient descent (SGD) optimization algorithms are key ingredients in a series of machine learning applications. In this article we perform a rigorous strong error analysis for SGD optimization algorithms. In particular, we prove for every arbitrarily small \varepsilon \in (0,\infty) and every arbitrarily large p \in (0,\infty) that the considered SGD optimization algorithm converges in the strong L^p-sense with order 1/2-\varepsilon to the global minimum of the objective function of the considered stochastic optimization problem under standard convexity-type assumptions on the objective function and relaxed assumptions on the moments of the stochastic errors appearing in the employed SGD optimization algorithm. The key ideas in our convergence proof are, first, to employ techniques from the theory of Lyapunov-type functions for dynamical systems to develop a general convergence machinery for SGD optimization algorithms based on such functions, then, to apply this general machinery to concrete Lyapunov-type functions with polynomial structures and, thereafter, to perform an induction argument along the powers appearing in the Lyapunov-type functions in order to achieve for every arbitrarily large p \in (0,\infty) strong L^p-convergence rates.
KW  - Stochastic approximation algorithms
KW  - Stochastic gradient descent
KW  - Strong error analysis
UR  - http://www.scopus.com/inward/record.url?scp=85118186225&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85118186225&partnerID=8YFLogxK
U2  - 10.1093/imanum/drz055
DO  - 10.1093/imanum/drz055
M3  - Article
AN  - SCOPUS:85118186225
SN  - 0272-4979
VL  - 41
SP  - 455
EP  - 492
JO  - IMA Journal of Numerical Analysis
JF  - IMA Journal of Numerical Analysis
IS  - 1
ER  -