This problem for the AI4OPT challenge was suggested by Prof. Alexander Gasnikov.

Curator: Alexandr Lobanov (lobbsasha@mail.ru)

Problem

Let

$$ F(x)=\frac1m\sum_{i=1}^m f_i(x) $$

be a (possibly client‑heterogeneous) objective in a federated setting with a central server and $ m $ clients. Assume smoothness of each $ f_i $ and the interpolation/overparameterization condition: there exists a common minimizer $ x^\star $ such that, for each client $ i $, the stochastic gradient oracle is unbiased and its variance vanishes at the solution (“state‑dependent noise”):

$$ \mathbb{E}[g_i(x,\xi)] = \nabla f_i(x), \qquad \mathbb{E}\left[\|g_i(x,\xi)-\nabla f_i(x)\|^2\right] \le \nu_i^2(x), \qquad \text{with } \nu_i(x^\star)=0 \text{ (or small)}. $$
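
For concreteness, a standard illustration (not part of the original statement) where this condition holds exactly is overparameterized least squares on client $ i $, with the oracle sampling one local data point uniformly:

$$ f_i(x)=\frac{1}{n_i}\sum_{j=1}^{n_i}\bigl(a_{ij}^\top x-b_{ij}\bigr)^2, \qquad g_i(x,\xi)=2\bigl(a_{i\xi}^\top x-b_{i\xi}\bigr)a_{i\xi}, \quad \xi\sim\mathrm{Unif}\{1,\dots,n_i\}. $$

If the model interpolates all clients' data, i.e. $ a_{ij}^\top x^\star=b_{ij} $ for every $ i,j $, then $ g_i(x^\star,\xi)=0 $ almost surely, so the oracle noise vanishes at $ x^\star $ exactly as required.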

In centralized optimization, such assumptions enable accelerated rates with minibatching, zero‑order, or adaptive steps.
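
As a point of reference (these are the classical deterministic complexities, not a new claim about the stochastic setting), for $ L $-smooth, $ \mu $-strongly convex objectives accelerated gradient methods need

$$ N_{\mathrm{AGD}} = O\!\left(\sqrt{\tfrac{L}{\mu}}\,\log\tfrac{1}{\varepsilon}\right) \qquad \text{vs.} \qquad N_{\mathrm{GD}} = O\!\left(\tfrac{L}{\mu}\,\log\tfrac{1}{\varepsilon}\right) $$

iterations; the centralized works cited below concern when stochastic methods with state-dependent noise can match the accelerated (left-hand) complexity despite noisy oracles.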

Question. To what extent do these acceleration guarantees transfer to federated learning under client sampling, partial participation, and communication constraints?

More concretely, for convex/$ \mu $-strongly convex $ F $ (or under a PL condition), determine whether there exist federated algorithms (first‑order, zero‑order, and/or adaptive) that achieve centralized‑style accelerated convergence in communication rounds, e.g.

  • strongly convex: rounds $ R = \tilde{O}\bigl(\sqrt{\kappa}\,\log(1/\varepsilon)\bigr) $,
  • convex: error $ O(1/R^2) $,

while keeping total oracle calls and communicated bits near‑optimal, or prove lower bounds ruling this out under standard FL constraints (non‑IID data, local steps, compression).
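
A minimal simulation skeleton of this setting is sketched below, assuming a synthetic overparameterized least-squares task where every client's data is consistent with a common $ x^\star $ (so each stochastic oracle's noise vanishes at the optimum), uniform client sampling, and a heuristic momentum-style server step. The function names and hyperparameters are illustrative choices only, not a claimed accelerated federated algorithm.

```python
# Illustrative simulation skeleton only (an assumption-laden sketch, not a known
# accelerated FL method): overparameterized least squares split across m clients
# that share an interpolating minimizer x_star, a stochastic per-client oracle
# whose noise vanishes at x_star, partial participation, and a momentum-style
# server update. All names and hyperparameters below are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)
m, d, n_i = 20, 20, 5                        # clients, dimension, samples per client
x_star = rng.normal(size=d)                  # common minimizer (interpolation regime)
A = [rng.normal(size=(n_i, d)) for _ in range(m)]
b = [A_i @ x_star for A_i in A]              # labels consistent with x_star

def client_stoch_grad(i, x):
    """Unbiased stochastic gradient of f_i; equals 0 at x_star (state-dependent noise)."""
    j = rng.integers(n_i)
    return 2.0 * (A[i][j] @ x - b[i][j]) * A[i][j]

def F(x):
    return np.mean([np.mean((A_i @ x - b_i) ** 2) for A_i, b_i in zip(A, b)])

# Conservative stepsize from the worst per-sample smoothness constant 2*||a||^2.
L_max = max(2.0 * float(a @ a) for A_i in A for a in A_i)
eta, beta, clients_per_round, rounds = 1.0 / L_max, 0.5, 5, 500

x = np.zeros(d)
x_prev = x.copy()
for r in range(rounds):                      # one iteration = one communication round
    y = x + beta * (x - x_prev)              # server extrapolation point sent to clients
    sampled = rng.choice(m, size=clients_per_round, replace=False)
    g = np.mean([client_stoch_grad(i, y) for i in sampled], axis=0)
    x_prev, x = x, y - eta * g               # server step on the aggregated gradient
    if (r + 1) % 100 == 0:
        print(f"round {r + 1:4d}   F(x) = {F(x):.3e}")
```

This skeleton only exercises the problem setup (vanishing-at-optimum noise, partial participation, one vector exchanged per round per sampled client); whether any such scheme provably attains accelerated round complexity is exactly the open question.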

Deliverables may include:

  • an algorithm with a provable round complexity under the assumptions above;
  • matching lower bounds;
  • empirical evidence on synthetic and standard FL workloads demonstrating communication‑efficient acceleration in the interpolation regime.

Status and known results

  • Centralized, state‑dependent noise (interpolation).

    Recent work establishes accelerated rates for stochastic approximation when the noise depends on the state and vanishes near $ x^\star $.

    Ilandarideva et al. (2023).

  • Zero‑order & saddle‑point under interpolation.

    Accelerated and gradient‑free methods have been extended to black‑box optimization and certain saddle‑point settings in the overparameterized regime.

    Lobanov & Gasnikov (2023); Statkevich et al. (2024).

  • Adaptive stepsizes with inexact oracles.

    AdaGrad‑type procedures exhibit universality and can be combined with acceleration/variance‑reduction ideas under inexact, state‑dependent noise.

    Rodomanov, Jiang & Stich (2024).

  • Minibatching + interpolation (single‑machine).

    Near‑optimal rates for stochastic optimization with minibatching in the interpolation setting are known in the centralized, single‑machine case.

    Woodworth & Srebro (NeurIPS 2021).

  • Federated learning: gaps.

    Federated online/bandit convex optimization has been analyzed, but existing analyses do not cover the interpolation/overparameterized case with vanishing‑at‑optimum noise, nor do they establish accelerated round complexity under these assumptions. Closing this gap is open.

    Patel et al. (ICML 2023).

References

  • S. Ilandarideva et al. Accelerated stochastic approximation with state‑dependent noise. arXiv:2307.01497 (2023).
  • A. Lobanov, A. Gasnikov. Accelerated zero‑order SGD for black‑box optimization under “overparameterization”. In: International Conference on Optimization and Applications, Springer, 2023, pp. 72–83.
  • E. Statkevich et al. Gradient‑free algorithm for saddle point problems under overparametrization. Chaos, Solitons & Fractals 185 (2024), 115048.
  • A. Rodomanov, X. Jiang, S. Stich. Universality of AdaGrad Stepsizes for Stochastic Optimization: Inexact Oracle, Acceleration and Variance Reduction. arXiv:2406.06398 (2024).
  • B. E. Woodworth, N. Srebro. An even more optimal stochastic optimization algorithm: minibatching and interpolation learning. NeurIPS 34 (2021), 7333–7345.
  • K. K. Patel et al. Federated online and bandit convex optimization. ICML (PMLR), 2023, 27439–27460.