The Workshop on Functional Inference and Machine Intelligence (FIMI) is an international workshop on machine learning and statistics, with a particular focus on theory, methods, and practice. It consists of invited talks and poster sessions. The topics include (but are not limited to):

  • Machine Learning Methods
  • Deep Learning
  • Kernel Methods
  • Probabilistic Methods

The workshop will be held virtually (via Online Conf and Zoom). All times are in Japan Standard Time (GMT+9).

To participate, please register below.

Registration (closed)

If you have already registered, please go to Online Conf.

Previous Workshops: 2016, 2017, 2018, 2019, 2020, 2021

Invited Speakers

  • Murat A. Erdogdu (University of Toronto)
    Analysis of Langevin Monte Carlo from Poincare to Log-Sobolev
  • Kenji Fukumizu (The Institute of Statistical Mathematics)
    A Scaling Law for Synthetic-to-real Transfer Learning
  • Octavian Ganea (Massachusetts Institute of Technology)
    Euclidean Deep Learning Models for 3D Structures and Interactions of Molecules
  • Arthur Gretton (University College London)
    Causal modelling with distribution embeddings: treatment effects, counterfactuals, mediation, and proxies
  • Hideitsu Hino (The Institute of Statistical Mathematics)
    Symplectic integrator via contact geometry for Nesterov-type ODE
  • Yuichi Ike (University of Tokyo)
    Topological loss functions and topological representation learning
  • Masaaki Imaizumi (The University of Tokyo)
    Stability of Deep Network Estimator for Nonparametric Regression with Adversarial Training
  • Masahiro Kato (University of Tokyo, CyberAgent, Inc.)
    Recent Findings on Density-Ratio Approaches in Machine Learning
  • Tengyuan Liang (The University of Chicago)
    Universal Prediction Band, Semi-Definite Programming and Variance Interpolation
  • Song Liu (University of Bristol)
    \(f\)-divergence and Loss Functions in ROC Curve
  • Atsushi Nitanda (Kyushu Institute of Technology)
    Convex Analysis of the Mean Field Langevin Dynamics
  • Taiji Suzuki (University of Tokyo)
    Effect of feature learning ability of neural networks for high dimensional input
  • Seiya Tokui (Preferred Networks / The University of Tokyo)
    Disentanglement Analysis with Partial Information Decomposition
  • Takaharu Yaguchi (Kobe University)
    Geometric Deep Energy-Based Models for Physics
  • Makoto Yamada (Kyoto University)
    Selective inference with Kernels


Tuesday 29th March.

09:55-10:00 Opening

Tengyuan Liang (The University of Chicago)
Title: Universal Prediction Band, Semi-Definite Programming and Variance Interpolation

We propose a computationally efficient method to construct nonparametric, heteroscedastic prediction bands for uncertainty quantification, with or without any user-specified predictive model. Our approach provides an alternative to the now-standard conformal prediction for uncertainty quantification, with novel theoretical insights and computational advantages. The data-adaptive prediction band is universally applicable with minimal distributional assumptions, has strong non-asymptotic coverage properties, and is easy to implement using standard convex programs. Our approach can be viewed as a novel variance interpolation with confidence and further leverages techniques from semi-definite programming and sum-of-squares optimization. The theoretical and numerical performance of the proposed approach to uncertainty quantification is analyzed.

Masaaki Imaizumi (The University of Tokyo)
Title: Stability of Deep Network Estimator for Nonparametric Regression with Adversarial Training

We study the stability of deep neural network estimators for the nonparametric regression problem trained with adversarial training. Several studies show that deep neural networks give estimators for nonparametric problems that theoretically outperform conventional estimators in specific settings. A limitation of deep network estimators is stability: their convergence is measured only by a restricted class of norms. In this study, we consider adversarial training for deep networks and develop an estimator for the nonparametric regression problem. We investigate its efficiency via a minimax optimization scheme and derive several convergence rates under different norms. We also discuss an application based on the result.
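As a rough illustration of the setting (a toy linear-model sketch, not the estimator analyzed in the talk), adversarial training replaces each input with a worst-case perturbation inside a small ball before taking a gradient step on the loss:

```python
import numpy as np

# Toy sketch of adversarial training for regression (linear model, FGSM-style
# inner step); this is an illustration of the general scheme, not the talk's estimator.
rng = np.random.default_rng(0)
n, d, eps, lr = 500, 5, 0.1, 0.1
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.1 * rng.normal(size=n)

w = np.zeros(d)
for _ in range(500):
    r = X @ w - y
    # Inner maximization: the squared loss grows fastest along sign(dL/dx) = sign(r * w),
    # so perturb each input to the corner of the eps-infinity-ball in that direction.
    X_adv = X + eps * np.sign(r[:, None] * w[None, :])
    r_adv = X_adv @ w - y
    w -= lr * (X_adv.T @ r_adv) / n  # outer minimization on the perturbed loss

print(np.linalg.norm(w - w_true))  # small: the adversarially trained w stays close to w_true
```

The inner step is the standard fast-gradient-sign approximation of the worst-case perturbation; for this well-conditioned linear problem it only slightly shrinks the recovered weights.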
12:15-13:45 Lunch break

Taiji Suzuki (University of Tokyo)
Title: Effect of feature learning ability of neural networks for high dimensional input

In this talk, we discuss the benefit of adaptivity or the feature learning ability of deep learning especially for high dimensional (or even infinite dimensional) input. This talk consists of two parts:
 (1) learning ability of neural network for infinite dimensional input with anisotropic smoothness, and
 (2) feature learning by one step gradient descent in high dimensions.
In the first half, I will discuss the adaptivity of deep learning to the smoothness of the true function. Although the standard nonparametric convergence is exponentially affected by the input dimensionality, we prove that deep learning can avoid the curse of dimensionality when the true function has anisotropic smoothness, that is, it has different smoothness towards different directions. This yields superior performance of DNN compared to linear estimators including kernel methods. Interestingly, this argument can be extended to infinite dimensional input. In the second half, we analyze how gradient descent captures informative features and improves the generalization performance in a two-layer neural network. We show that the internal layer's feature "aligns" to the true function in the first few gradient steps, and precisely characterize the benefit of this alignment in the high-dimensional regime. We show that the ridge estimator on trained features has improved performance, and under large step size, the learned kernel may outperform many random features and rotationally invariant kernel models. This demonstrates that even one gradient step can lead to considerable advantage over the initial features.

Makoto Yamada (Kyoto University)
Title: Selective inference with Kernels

Finding a set of statistically significant features from complex data (e.g., nonlinear and/or multi-dimensional output data) is important for scientific discovery and has many practical applications, including biomarker discovery. In this talk, I introduce kernel-based selective inference frameworks that can find a set of statistically significant features from nonlinearly related data without splitting the data between selection and inference. Specifically, I introduce selective variants of hypothesis testing frameworks based on post-selection inference: a two-sample test with the Maximum Mean Discrepancy (MMD), an independence test with the Hilbert-Schmidt Independence Criterion (HSIC), and a goodness-of-fit test with the Kernel Stein Discrepancy (KSD). For example, for the selective independence test, we propose the hsicInf algorithm, which can handle nonlinearity and/or multi-variate/multi-class outputs through kernels. Then, I show applications of kernel-based selective inference algorithms and discuss potential future work. The talk will be an overview of our recent ICML, NeurIPS, and AISTATS publications.
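For reference, the (non-selective) MMD statistic underlying the two-sample test can be estimated in a few lines; the selective-inference machinery of the talk is built on top of statistics like this. A sketch with a Gaussian kernel (the kernel choice and bandwidth here are arbitrary):

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Gaussian kernel matrix k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2_biased(X, Y, sigma=1.0):
    """Biased estimate of squared MMD between samples X ~ P and Y ~ Q."""
    return (gaussian_kernel(X, X, sigma).mean()
            - 2 * gaussian_kernel(X, Y, sigma).mean()
            + gaussian_kernel(Y, Y, sigma).mean())

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(300, 1))
Y_same = rng.normal(0.0, 1.0, size=(300, 1))
Y_shift = rng.normal(1.5, 1.0, size=(300, 1))
print(mmd2_biased(X, Y_same))   # close to 0: same distribution
print(mmd2_biased(X, Y_shift))  # clearly positive: distributions differ
```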

Seiya Tokui (Preferred Networks / The University of Tokyo)
Title: Disentanglement Analysis with Partial Information Decomposition

Disentangled representation learning is an approach to recovering the underlying factors of variation in data. While the concept is intuitive, it is far from obvious to measure how a learned representation disentangles the factors. Recently, several metrics have been proposed, which compare how each variable explains a generative factor. These metrics, however, may fail to detect entanglement that involves more than two variables, e.g., representations that duplicate and rotate generative factors in high dimensional spaces. In this talk, we introduce a framework and a metric to analyze information sharing in a multivariate representation with Partial Information Decomposition. The framework enables us to understand disentanglement in terms of uniqueness, redundancy, and synergy. We design entanglement attacks to inject multi-variable entanglement to representations and show that our framework correctly responds to entanglement. We show, through experiments on variational autoencoders, that models with similar disentanglement scores have a variety of characteristics in entanglement, for each of which a distinct strategy may be required to obtain a disentangled representation.

Arthur Gretton (University College London)
Title: Causal modelling with distribution embeddings: treatment effects, counterfactuals, mediation, and proxies

A fundamental causal modelling task is to predict the effect of an intervention (or treatment) \(D=d\) on outcome \(Y\) in the presence of observed covariates \(X\). We can obtain an average treatment effect by marginalising our estimate \(\gamma(X,D)\) of the conditional mean \(E(Y|X,D)\) over \(P(X)\). More complex causal questions require taking conditional expectations. For instance, the average treatment effect on the treated (ATT) addresses a counterfactual: what is the outcome of an intervention \(d'\) on a subpopulation that received treatment \(d\)? In this case, we must marginalise \(\gamma\) over the conditional distribution \(P(X \mid d)\), which becomes challenging for continuous multivariate \(d\). Many additional causal questions require us to marginalise over conditional distributions, including Conditional ATE, mediation analysis, dynamic treatment effects, and correction for unobserved confounders using proxy variables. We address these questions in the nonparametric setting using kernel methods, which apply for very general treatments \(D\) and covariates \(X\) (learned NN features may also be used). We perform marginalization over conditional distributions using conditional mean embeddings, in a generalization of two-stage least-squares regression. We provide strong statistical guarantees under general smoothness assumptions, and a straightforward and robust implementation (a few lines of code). The method is mostly demonstrated by addressing causal modelling questions arising from the US Job Corps program for Disadvantaged Youth.
20:00-21:30 Poster Session

Wednesday 30th March.


Murat A. Erdogdu (University of Toronto)
Title: Analysis of Langevin Monte Carlo from Poincare to Log-Sobolev

We study sampling from a target distribution \(e^{-f}\) using the Langevin Monte Carlo (LMC) algorithm. For any potential function \(f\) whose tails behave like \(|x|^\alpha\) for \(\alpha \in [1,2]\) and whose gradient is \(\beta\)-Hölder continuous, we derive the sufficient number of steps to reach the \(\varepsilon\)-neighborhood of a \(d\)-dimensional target distribution as a function of \(\alpha\) and \(\beta\) in Rényi divergence. Our result is the first convergence guarantee for LMC under a functional inequality interpolating between the Poincaré and log-Sobolev settings (also covering the edge cases).
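For intuition, the LMC iteration itself is a single noisy gradient step, \(x_{k+1} = x_k - \eta \nabla f(x_k) + \sqrt{2\eta}\,\xi_k\). A minimal sketch for a standard Gaussian target (where \(f(x) = \|x\|^2/2\), so the smoothness conditions above hold trivially; step size and chain length are arbitrary choices):

```python
import numpy as np

def lmc_sample(grad_f, x0, eta=0.01, n_steps=5000, rng=None):
    """Unadjusted Langevin Monte Carlo: x <- x - eta * grad_f(x) + sqrt(2*eta) * noise."""
    if rng is None:
        rng = np.random.default_rng(0)
    x = np.array(x0, dtype=float)
    samples = []
    for _ in range(n_steps):
        x = x - eta * grad_f(x) + np.sqrt(2 * eta) * rng.standard_normal(x.shape)
        samples.append(x.copy())
    return np.array(samples)

# Target: standard Gaussian e^{-f} with f(x) = ||x||^2 / 2, so grad f(x) = x.
samples = lmc_sample(lambda x: x, x0=np.zeros(2), eta=0.05, n_steps=20000)
burned = samples[5000:]  # discard burn-in
print(burned.mean(axis=0))  # ~ [0, 0]
print(burned.var(axis=0))   # ~ [1, 1], up to O(eta) discretization bias
```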

Atsushi Nitanda (Kyushu Institute of Technology)
Title: Convex Analysis of the Mean Field Langevin Dynamics

As an example of a nonlinear Fokker-Planck equation, the mean field Langevin dynamics has recently attracted attention due to its connection to (noisy) gradient descent on infinitely wide neural networks in the mean field regime; hence the convergence of the dynamics is of great theoretical interest. In this talk, we give a concise convergence rate analysis of the mean field Langevin dynamics with respect to the (regularized) objective function in both continuous- and discrete-time settings (the continuous-time result was also given in the concurrent and independent work [Chizat (2022)]). The key ingredient of our proof is a proximal Gibbs distribution associated with the dynamics, which, in combination with techniques in [Vempala and Wibisono (2019)], allows us to develop a simple convergence theory parallel to classical results in convex optimization. Furthermore, we reveal that the proximal Gibbs distribution connects to the duality gap in the empirical risk minimization setting, which enables efficient empirical evaluation of the algorithm's convergence.
12:15-15:00 Lunch break

Yuichi Ike (University of Tokyo)
Title: Topological loss functions and topological representation learning

Topological data analysis (TDA) is the branch of data science that aims to extract the topological information of given data. It compactly encodes the topological features into persistence diagrams, which are multisets in the two-dimensional space. In connection with machine learning, many techniques have been developed to incorporate persistence diagrams into loss functions for controlling the topology of parameters. In this talk, I discuss several recent developments of TDA-based loss functions and a theoretical guarantee for the convergence of such functions with respect to stochastic subgradient descent. I also talk about some attempts to construct a neural network to estimate (vectorization of) persistence diagrams from data.

Masahiro Kato (University of Tokyo, CyberAgent, Inc.)
Title: Recent Findings on Density-Ratio Approaches in Machine Learning

Approaches using density ratio functions play an important role in various areas of machine learning, including divergence among probability measures, anomaly detection, causal inference, and multi-armed bandit problems. In this talk, I present our recent theoretical and methodological findings related to density ratios. First, I report that the maximum likelihood estimation of density ratios can be interpreted as the computation of integral probability metrics (IPMs). Based on this finding, we propose the density-ratio metrics (DRMs) as new divergences, which bridge the Kullback-Leibler divergence and integral probability metrics. This finding also gives some insights into the estimation procedure of density ratios, such as the necessity of smoothness penalties. Next, I introduce a new causal inference method by applying density ratio estimation. In particular, I focus on nonparametric structural model estimation under conditional moment restrictions and find that we can solve this problem through approximation of the conditional moment restrictions using an estimated density ratio. Finally, I introduce a new density-ratio-based perspective on the best-arm identification (BAI) problem. In this study, we propose novel large deviation principles and develop an asymptotically optimal BAI strategy.
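As background for the density-ratio theme (a standard textbook construction, not necessarily the talk's method), a ratio \(p/q\) can be estimated without estimating either density via probabilistic classification: a logistic regression separating samples of \(p\) from samples of \(q\) recovers the log-ratio as its logit (for equal sample sizes). A minimal 1-D sketch:

```python
import numpy as np

def fit_logistic_ratio(Xp, Xq, n_iter=5000, lr=0.3):
    """Estimate p/q by logistic regression: separate samples from p (label 1)
    and q (label 0); for equal sample sizes, p(x)/q(x) = exp(logit(x))."""
    X = np.concatenate([Xp, Xq])
    y = np.concatenate([np.ones(len(Xp)), np.zeros(len(Xq))])
    w, b = 0.0, 0.0
    for _ in range(n_iter):
        z = w * X + b
        g = 1.0 / (1.0 + np.exp(-z)) - y  # gradient of the logistic loss wrt the logit
        w -= lr * float((g * X).mean())
        b -= lr * float(g.mean())
    return lambda x: np.exp(w * x + b)

rng = np.random.default_rng(0)
Xp = rng.normal(1.0, 1.0, 2000)   # samples from p = N(1, 1)
Xq = rng.normal(-1.0, 1.0, 2000)  # samples from q = N(-1, 1)
ratio = fit_logistic_ratio(Xp, Xq)
# True log-ratio is 2x, so the estimated ratio should be >> 1 at x = 1 and << 1 at x = -1.
print(ratio(1.0), ratio(-1.0))
```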

Song Liu (University of Bristol)
Title: \(f\)-divergence and Loss Functions in ROC Curve

Given two data distributions and a test score function, the Receiver Operating Characteristic (ROC) curve shows how well such a score separates two distributions. However, can the ROC curve be used as a measure of discrepancy between two distributions? This paper shows that when the data likelihood ratio is used as the test score, the arc length of the ROC curve gives rise to a novel \(f\)-divergence measuring the differences between two data distributions. Approximating this arc length using a variational objective and empirical samples leads to empirical risk minimization with previously unknown loss functions. We provide a Lagrangian dual objective and introduce kernel models into the estimation problem. We study the non-parametric convergence rate of this estimator and show that, under mild smoothness conditions on the real arctangent density ratio function, the rate of convergence is \(O_p(n^{-\beta/4})\), where \(\beta \in (0,1]\) depends on the smoothness.
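The divergence in the abstract can be probed empirically: with the likelihood ratio as the score (monotone in \(x\) for two unit-variance Gaussians, so \(x\) itself is an optimal score), the ROC arc length lands strictly between \(\sqrt{2}\) (identical distributions) and \(2\) (perfect separation). A sketch, where the threshold grid and sample sizes are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
pos = rng.normal(1.0, 1.0, 2000)  # scores under P = N(1, 1)
neg = rng.normal(0.0, 1.0, 2000)  # scores under Q = N(0, 1)

# Coarse quantile grid of thresholds, so each ROC segment moves diagonally
# rather than in axis-aligned unit steps.
thresholds = np.quantile(np.concatenate([pos, neg]), np.linspace(0.0, 1.0, 200))
tpr = (pos[None, :] >= thresholds[:, None]).mean(axis=1)
fpr = (neg[None, :] >= thresholds[:, None]).mean(axis=1)

# Arc length of the empirical ROC curve: sum of Euclidean segment lengths.
arc = np.sqrt(np.diff(fpr) ** 2 + np.diff(tpr) ** 2).sum()
print(arc)  # strictly between sqrt(2) (no separation) and 2 (perfect separation)
```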
20:00-21:30 Poster Session

Thursday 31st March.


Octavian Ganea (Massachusetts Institute of Technology)
Title: Euclidean Deep Learning Models for 3D Structures and Interactions of Molecules

Understanding 3D structures and interactions of biological nano-machines, such as proteins or drug-like molecules, is crucial for assisting drug and therapeutics discovery. A core problem is molecular docking, i.e., determining how two proteins or a protein and a drug-molecule attach and create a molecular complex. Having access to very fast computational docking tools would enable applications such as fast virtual search for drugs inhibiting disease proteins, in silico molecular design, or rapid drug side-effect prediction. However, existing computer models follow a very time-consuming strategy of sampling a large number (e.g., millions) of molecular complex candidates, followed by scoring, ranking, and fine-tuning steps. In this talk, I will show that geometry and deep learning (DL) can significantly reduce the enormous search space associated with the docking and molecular conformation problems. I will present my recent DL architectures, EquiDock and EquiBind, that perform a direct shot prediction of the molecular complex, and GeoMol, that models molecular flexibility. I will argue that the governing laws of geometry, physics, or chemistry that naturally constrain these 3D structures should be incorporated in DL solutions in a mathematically meaningful way. I will explain our key modeling concepts such as \(SE(3)\)-equivariant graph matching networks, attention keypoint sets, optimal transport for binding pocket prediction, and torsion angle neural networks. These approaches reduce the inference runtimes of open-source or commercial software from tens of minutes or hours to a few seconds, while being competitive or better in terms of quality. Finally, I will highlight a number of exciting on-going and future efforts in the space of artificial intelligence for structural biology and chemistry.

Yang Li (University of Tokyo)
Title: Toward robust non-rigid reconstruction with deep learning

4D reconstruction of non-rigidly deforming scenes has numerous applications in computer vision, virtual/augmented reality, robotics, etc. With the latest advancements of consumer-level depth sensors, such as Microsoft Kinect, Intel RealSense, and even smartphone-mounted cameras, non-rigid reconstruction using a single RGB-D camera has gained momentum. However, due to the high complexity and non-convexity of the problem and the limitation of range sensors, a robust reconstruction system for generic non-rigidly deforming scenes remains a challenge. We propose learning based methods that improve the robustness of non-rigid reconstruction under unconstrained environments:
 1) a learning-based optimization to alleviate the non-convexity of non-rigid tracking.
 2) a divide and conquer strategy that can handle the non-rigid reconstruction of complex scenes with both foreground and background objects.
 3) a completion approach to jointly recover the occluded structure and motion from partial RGB-D sensor observation.
 4) a robust feature matching approach that provides reliable landmarks for global motion localization.
Empirical experiments demonstrate the robustness and advantages of our approaches over existing methods.
12:15-15:00 Lunch break

Takaharu Yaguchi (Kobe University)
Title: Geometric Deep Energy-Based Models for Physics

Many physical phenomena can be described by energy-based models such as the Hamilton equation and phase-field models, which admit the energy conservation or dissipation laws. Recently, methods to construct such models from observed data using deep learning have been attracting much attention. In this talk, we introduce discrete-time deep-learning models that preserve the energy conservation or dissipation laws. In addition, most existing methods for the Hamilton equation assume that the data are represented in canonical coordinates. However, this coordinate system is generally unknown, and this has been an obstacle for application to real problems. We also introduce a geometric approach to address this problem.

Hideitsu Hino (The Institute of Statistical Mathematics)
Title: Symplectic integrator via contact geometry for Nesterov-type ODE

We derive an explicit stable integrator based on symplectic and contact geometries for a non-autonomous ordinary differential equation (ODE) that arose in improving the convergence rate of Nesterov's accelerated gradient method. The previously investigated non-autonomous ODE is shown to be expressible as a contact Hamiltonian system. Then, by developing and applying a symplectization of the non-autonomous contact Hamiltonian vector field expressing the non-autonomous ODE, a symplectic integrator is derived. Because the proposed symplectic integrators preserve hidden symplectic and contact structures in the ODE, they should be more stable than the standard Runge-Kutta method. Numerical experiments demonstrate that, as expected, the second-order symplectic integrator is stable and high convergence rates are achieved. This work is done in collaboration with Prof. Shin-itiro Goto.
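For intuition about why structure preservation matters (a generic toy example, not the contact-geometric integrator of the talk): on the harmonic oscillator, a symplectic leapfrog scheme keeps the Hamiltonian bounded for all time, while explicit Euler's energy drifts without bound.

```python
import numpy as np

def euler(q, p, dt, n):
    """Explicit Euler for the harmonic oscillator H = (p^2 + q^2) / 2 (not symplectic)."""
    for _ in range(n):
        q, p = q + dt * p, p - dt * q
    return q, p

def leapfrog(q, p, dt, n):
    """Leapfrog (Stoermer-Verlet), a second-order symplectic integrator."""
    for _ in range(n):
        p -= 0.5 * dt * q  # half kick
        q += dt * p        # drift
        p -= 0.5 * dt * q  # half kick
    return q, p

H = lambda q, p: 0.5 * (q ** 2 + p ** 2)
q0, p0, dt, n = 1.0, 0.0, 0.05, 4000
print(abs(H(*euler(q0, p0, dt, n)) - H(q0, p0)))     # large: Euler's energy grows
print(abs(H(*leapfrog(q0, p0, dt, n)) - H(q0, p0)))  # small: energy error stays O(dt^2)
```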

Kenji Fukumizu (The Institute of Statistical Mathematics)
Title: A Scaling Law for Synthetic-to-real Transfer Learning

Synthetic-to-real transfer learning is a framework in which a synthetically generated dataset is used to pre-train a model to improve its performance on real vision tasks. The most significant advantage of using synthetic images is that the ground-truth labels are automatically available, enabling unlimited expansion of the data size without human cost. However, synthetic data may have a huge domain gap, in which case increasing the data size does not improve the performance. How can we know that? In this study, we derive a simple scaling law that predicts the performance from the amount of pre-training data. By estimating the parameters of the law, we can judge whether we should increase the data or change the setting of image synthesis. Further, we analyze the theory of transfer learning by considering learning dynamics and confirm that the derived generalization bound is consistent with our empirical findings. We empirically validated our scaling law on various experimental settings of benchmark tasks, model sizes, and complexities of synthetic images.
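As a toy illustration of the idea (using a simplified pure power law; the paper's actual functional form and parameters may differ), the exponent of a scaling law can be estimated by ordinary least squares on the log-log scale:

```python
import numpy as np

# Synthetic pre-training curve: error(n) = a * n^(-b), with n = pre-training data size.
a_true, b_true = 5.0, 0.5
n = np.array([1e3, 3e3, 1e4, 3e4, 1e5])
rng = np.random.default_rng(0)
err = a_true * n ** (-b_true) * np.exp(0.02 * rng.standard_normal(n.size))  # small noise

# Fit on the log-log scale: log err = log a - b * log n (ordinary least squares).
slope, intercept = np.polyfit(np.log(n), np.log(err), 1)
a_hat, b_hat = np.exp(intercept), -slope
print(a_hat, b_hat)  # close to (5.0, 0.5)
```

Once the fitted curve is trusted, extrapolating it to larger n indicates whether collecting more synthetic data is worthwhile or whether the curve has effectively flattened.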
19:00-21:00 Social Networking (@gather.town)

Poster session

FIMI2022 includes poster sessions. As the workshop is held online, presentations are not restricted to the traditional poster format, although the session keeps its name; instead, we use pre-recorded videos and live Q&As. The pre-recorded videos are now available on the online platform Online Conf.

Poster presenters:

  • Ian Gallagher; Andrew Jones; Patrick Rubin-Delanchy (University of Bristol)
    Spectral embedding for dynamic networks with stability guarantees
  • Pierre Glaser; Michael Arbel; Arthur Gretton (University College London)
    KALE Flow: A Relaxed KL Gradient Flow for Probabilities with Disjoint Support
  • Annie Gray; Nick Whiteley; Patrick Rubin-Delanchy (University of Bristol)
    Matrix factorisation and the interpretation of geodesic distance
  • Hikaru Ibayashi (University of Southern California); Masaaki Imaizumi (University of Tokyo)
    Exponential escape efficiency of SGD from sharp minima in non-stationary regime
  • Yuri Kinoshita; Taiji Suzuki (The University of Tokyo)
    Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization
  • Yoh-ichi Mototake; Kenji Fukumizu (ISM)
    Extracting interpretable physical information from time series data using deep neural networks
  • Hironori Murase; Kenji Fukumizu (ISM)
    Anomaly Detection by Generating Pseudo Anomalous Data via Latent Variables
  • Naoki Nishikawa; Taiji Suzuki (University of Tokyo)
    Mean-Field Two Layer Neural Network with Infinite Dimensional Input and Its Optimization
  • Akifumi Okuno (ISM, RIKEN AIP); Keisuke Yano (ISM)
    A generalization gap estimation for overparameterized models via the Langevin functional variance
  • Dominic Owens; Haeran Cho (University of Bristol)
    High-Dimensional Data Segmentation Under a Sparse Regression Model
  • Antonin Schrab (University College London); Arthur Gretton; Benjamin Guedj; Ilmun Kim; Melisande Albert; Beatrice Laurent
    MMD Aggregated Two-Sample Test & KSD Aggregated Goodness-of-fit Test
  • Jack Simons; Song Liu; Mark Beaumont (University of Bristol)
    Variational Likelihood-Free Gradient Descent
  • Yuto Tanimoto; Kenji Fukumizu (ISM)
    Multi-armed bandits with rebounding rewards
  • Shoji Toyota (ISM)
    Invariance Learning based on Label Hierarchy
  • Mingxuan Yi (University of Bristol)
    Sliced Wasserstein Variational Inference
  • Daniel James Williams; Song Liu (University of Bristol)
    Kernelised Stein Discrepancies for Truncated Probability Estimation
  • Pengzhou Abel Wu; Kenji Fukumizu (ISM)

Organizers

  • Masaaki Imaizumi, The University of Tokyo
  • Taiji Suzuki, The University of Tokyo
  • Kenji Fukumizu, The Institute of Statistical Mathematics
  • Tatsuya Harada, The University of Tokyo
  • Song Liu, University of Bristol
  • Hideto Nakashima, The Institute of Statistical Mathematics


This workshop is supported by the following institutions and grants:

  • Research Center for Statistical Machine Learning, The Institute of Statistical Mathematics
  • Japan Science and Technology Agency, CREST
    • "Innovation of Deep Structured Models with Representation of Mathematical Intelligence" in "Creating information utilization platform by integrating mathematical and information sciences, and development to society"
  • The University of Tokyo
Access Information

We use the virtual online conference platform Online Conf. If you have not registered yet, please register here.

Contact: fimi2022org@gmail.com