Award Abstract # 2023505
Foundations of Data Science Institute
NSF Org: DMS, Division Of Mathematical Sciences
Recipient: REGENTS OF THE UNIVERSITY OF CALIFORNIA, THE
Initial Amendment Date: August 31, 2020
Latest Amendment Date: August 16, 2024
Award Number: 2023505
Award Instrument: Continuing Grant
Program Manager: Stacey Levine, slevine@nsf.gov, (703)292-2948
  DMS, Division Of Mathematical Sciences
  MPS, Directorate for Mathematical and Physical Sciences
Start Date: September 1, 2020
End Date: August 31, 2026 (Estimated)
Total Intended Award Amount: $5,900,308.00
Total Awarded Amount to Date: $5,900,308.00
Funds Obligated to Date:
  FY 2020 = $1,180,062.00
  FY 2021 = $1,180,062.00
  FY 2022 = $1,380,062.00
  FY 2023 = $980,062.00
  FY 2024 = $1,180,060.00
History of Investigator:
  - Peter Bartlett (Principal Investigator), bartlett@stat.berkeley.edu
  - Bin Yu (Co-Principal Investigator)
  - Michael Jordan (Co-Principal Investigator)
  - Martin Wainwright (Co-Principal Investigator)
  - Josh Hug (Co-Principal Investigator)
Recipient Sponsored Research Office: University of California-Berkeley, 1608 4TH ST STE 201, BERKELEY, CA, US 94710-1749, (510)643-3891
Sponsor Congressional District: 12
Primary Place of Performance: University of California-Berkeley, Sponsored Projects Office, Berkeley, CA, US 94710-1749
Primary Place of Performance Congressional District: 12
Unique Entity Identifier (UEI): GS3YEVSS12N6
Parent UEI:
NSF Program(s): TRIPODS Transdisciplinary Rese
Primary Program Source:
  01002021DB NSF RESEARCH & RELATED ACTIVIT
  01002122DB NSF RESEARCH & RELATED ACTIVIT
  01002223DB NSF RESEARCH & RELATED ACTIVIT
  01002324DB NSF RESEARCH & RELATED ACTIVIT
  01002425DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 048Z, 075Z, 079Z
Program Element Code(s): 041Y00
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.049, 47.070
ABSTRACT

The Foundations of Data Science Institute (FODSI) brings together a large and diverse team of researchers and educators from UC Berkeley, MIT, Boston University, Bryn Mawr College, Harvard University, Howard University, and Northeastern University, with the aim of advancing the theoretical foundations for the field of data science. Data science has emerged as a central science for the 21st century, a widespread approach to science and technology that exploits the explosion in the availability of data to allow empirical investigations at unprecedented scale and scope. It now plays a central role in diverse domains across all of science, commerce and industry. The development of theoretical foundations for principled approaches to data science is particularly challenging because it requires progress across the full breadth of scientific issues that arise in the rich and complex processes by which data can be used to make decisions. These issues include the specification of the goals of data analysis, the development of models that aim to capture the way in which data may have arisen, the crafting of algorithms that are responsive to the models and goals, an understanding of the impact of misspecifications of these models and goals, an understanding of the effects of interactions, interventions and feedback mechanisms that affect the data and the interpretation of the results, concern about the uncertainty of these results, an understanding of the impact of other decision-makers with competing goals, and concern about the economic, social, and ethical implications of automated data analysis and decision-making. To address these challenges, FODSI brings together experts from many cognate academic disciplines, including computer science, statistics, mathematics, electrical engineering, and economics. 
Institute research outcomes have strong potential to directly impact the many application domains for data science in industry, commerce, science and society, facilitated by mechanisms that directly involve a stream of institute-trained personnel in industrial partners' projects, and by public activities designed to nurture substantive interactions between foundational and use-inspired research communities in data science. The institute also aims to educate and mentor future leaders in data science, through the further development of a pioneering undergraduate program in data science, and by training a diverse cohort of graduate students and postdocs with an innovative approach that emphasizes strong mentorship, flexibility, and breadth of collaboration opportunities. In addition, the institute plans to host an annual summer school that will deliver core curriculum and a taste of foundational research to a diverse group of advanced undergraduates, graduate students, and postdocs. It aims to broaden participation and increase diversity in the data science workforce, bringing the excitement of data science to under-represented groups at the high school level, and targeting diverse participation in the institute's public activities. And it will act as a nexus for research and education in the foundations of data science, by convening public events, such as summer schools and research workshops and other collaborative research opportunities, and by providing models for education, human resource development, and broadening participation.
The scientific focus of the institute will encompass the full range of issues that arise in data science -- modeling issues, inferential issues, computational issues, and societal issues -- and the challenges that emerge from the conflicts between their competing requirements. Its research agenda is organized around eight themes. Three of these themes focus on key challenges arising from the rich variety of interactions between a decision maker and its environment, not only the classical view of data that is processed in a batch or a stream, but also sequential interactions with feedback (the control perspective), experimental interactions designed to answer "what if" questions (the causality perspective), and strategic interactions involving other actors with conflicting goals (the economic perspective). The other research themes focus on opportunities for major impacts across disciplinary boundaries: on elucidating the algorithmic landscape of statistical problems, and in particular the computational complexity of statistical estimation problems; on sketching, sampling, and sub-linear time algorithms designed to address issues of scalability in data science problems; on exploiting statistical methodology in the service of algorithms; and on using breakthroughs in applied mathematics to address computational and inferential challenges. Intellectual contributions to societal issues in data science will feature throughout this set of themes. The institute will exploit strong connections with its scientific and industrial partners to ensure that these research directions enjoy a rich engagement with a broad range of commercial, technological and scientific application domains. Its sequence of research workshops and a collaborative research program will serve the broader research community by nurturing additional research in these key challenge areas.
The institute will be led by a steering committee that will seek the help of an external advisory board to prioritize its research themes and activities throughout its lifetime. Its educational programs will include curriculum development from K-12 through undergraduate, a graduate level visit program, and a postdoc training model, aimed at empowering the next generation of leaders to fluidly work across conventional disciplinary boundaries while being mindful of the full set of scientific issues. The institute will undertake a multi-pronged effort to recruit, engage and support the full range of groups traditionally under-represented in mathematics, computer science and statistics.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Agarwal, Abhineet and Kenney, Ana M. and Tan, Yan Shuo and Tang, Tiffany M. and Yu, Bin
"MDI+: A Flexible Random Forest-Based Feature Importance Framework"
arXiv.org, 2023

Agarwal, Abhineet and Tan, Yan Shuo and Ronen, Omer and Singh, Chandan and Yu, Bin
"Hierarchical Shrinkage: Improving the Accuracy and Interpretability of Tree-Based Methods"
Proceedings of Machine Learning Research, 2022

Angelopoulos, Anastasios N and Kohli, Amit P and Bates, Stephen and Jordan, Michael I and Malik, Jitendra and Alshaabi, Thayer and Upadhyayula, Srigokul and Romano, Yaniv
"Image-to-Image Regression with Distribution-Free Uncertainty Quantification and Applications in Imaging"
International Conference on Machine Learning, 2022

Assos, Angelos and Attias, Idan and Dagan, Yuval and Daskalakis, Constantinos and Fishelson, Maxwell
"Online Learning and Solving Infinite Games with an ERM Oracle"
COLT 2023, 2023

Bartlett, Peter L. and Bubeck, Sebastien and Cherapanamjeri, Yeshwanth
"Adversarial Examples in Multi-Layer Random ReLU Networks"
Advances in Neural Information Processing Systems, v.34, 2021

Bartlett, Peter L. and Indyk, Piotr and Wagner, Tal
"Generalization Bounds for Data-Driven Numerical Linear Algebra"
Proceedings of the 35th Conference on Learning Theory (COLT2022), 2022

Bartlett, Peter L and Long, Philip M and Bousquet, Olivier
"The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines and Drifting Towards Wide Minima"
Journal of Machine Learning Research, v.24, 2023

Bartlett, Peter L. and Montanari, Andrea and Rakhlin, Alexander
"Deep learning: a statistical viewpoint"
Acta Numerica, 2021

Braverman, Mark and Derakhshan, Mahsa and Lovett, Antonio Molina
"Max-Weight Online Stochastic Matching: Improved Approximations Against the Online Benchmark"
23rd ACM Conference on Economics and Computation, 2022
https://doi.org/10.1145/3490486.3538315

Chatterji, Niladri and Pacchiano, Aldo and Bartlett, Peter L. and Jordan, Michael I.
"On the Theory of Reinforcement Learning with Once-per-Episode Feedback"
Advances in Neural Information Processing Systems, v.34, 2021

Chatterji, Niladri S. and Bartlett, Peter L. and Long, Philip M.
"Oracle lower bounds for stochastic gradient sampling algorithms"
Bernoulli, v.28, 2022

Chatterji, Niladri S. and Long, Philip M. and Bartlett, Peter L.
"The interplay between implicit bias and benign overfitting in two-layer linear networks"
Journal of Machine Learning Research, 2022

Chatterji, Niladri S. and Long, Philip M. and Bartlett, Peter L.
"When does gradient descent with logistic loss find interpolating two-layer networks?"
Journal of Machine Learning Research, v.22, 2021

Cherapanamjeri, Yeshwanth and Daskalakis, Constantinos and Ilyas, Andrew and Zampetakis, Manolis
"Estimation of Standard Auction Models"
23rd ACM Conference on Economics and Computation, 2022
https://doi.org/10.1145/3490486.3538284

Cherapanamjeri, Yeshwanth and Tripuraneni, Nilesh and Bartlett, Peter L. and Jordan, Michael I.
"Optimal Mean Estimation without a Variance"
Proceedings of the 35th Conference on Learning Theory (COLT2022), 2022

Dagan, Yuval and Daskalakis, Constantinos and Fishelson, Maxwell and Golowich, Noah
"From External to Swap Regret 2.0: An Efficient Reduction for Large Action Spaces"
2024

Daras, Giannis and Dagan, Yuval and Dimakis, Alex and Daskalakis, Constantinos
"Consistent Diffusion Models: Mitigating Sampling Drift by Learning to be Consistent"
2023

Daras, Giannis and Shah, Kulin and Dagan, Yuval and Gollakota, Aravind and Dimakis, Alex and Klivans, Adam R
"Ambient Diffusion: Learning Clean Distributions from Corrupted Data"
2023

Daskalakis, Constantinos and Skoulakis, Stratis and Zampetakis, Manolis
"The Complexity of Constrained Min-Max Optimization"
Proceedings of the Annual ACM Symposium on Theory of Computing, v.53, 2021
https://doi.org/10.1145/3406325.3451125

Daskalakis, Constantinos and Stefanou, Patroklos and Yao, Rui and Zampetakis, Manolis
"Efficient Truncated Linear Regression with Unknown Noise Variance"
Advances in Neural Information Processing Systems, 2021

Deng, Yuyang and Qiao, Mingda
"Collaborative Learning with Different Labeling Functions"
2024

Duncan, James and Kapoor, Rush and Agarwal, Abhineet and Singh, Chandan and Yu, Bin
"VeridicalFlow: a Python package for building trustworthy data science pipelines with PCS"
Journal of Open Source Software, v.7, 2022
https://doi.org/10.21105/joss.03895

Fannjiang, Clara and Bates, Stephen and Angelopoulos, Anastasios N. and Listgarten, Jennifer and Jordan, Michael I.
"Conformal prediction for the design problem"
Proceedings of the National Academy of Sciences of the United States of America, 2022

Frei, Spencer and Chatterji, Niladri and Bartlett, Peter L.
"Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data"
Proceedings of the 35th Conference on Learning Theory (COLT2022), 2022

Frei, Spencer and Vardi, Gal and Bartlett, Peter L and Srebro, Nathan
"The Double-Edged Sword of Implicit Bias: Generalization vs. Robustness in ReLU Networks"
Advances in Neural Information Processing Systems, v.36, 2023

Frei, Spencer and Vardi, Gal and Bartlett, Peter L. and Srebro, Nathan and Hu, Wei
"Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data"
Proceedings of ICLR 2023, 2023

Goel, Gautam and Bartlett, Peter
"Can a Transformer Represent a Kalman Filter?"
2024

Guo, Chenghao and Chen, Xi and Vlatakis-Gkaragkounis, Emmanouil V and Yannakakis, Mihalis
"Smoothed Complexity of SWAP in Local Graph Partitioning"
2024

Guo, W.
"No-Regret Learning in Partially-Informed Auctions"
International Conference on Machine Learning, 2022

Guo, Wenshuo and Jordan, Michael and Zampetakis, Emmanouil
"Robust Learning of Optimal Auctions"
Advances in Neural Information Processing Systems, 2021

Ha, Wooseok and Singh, Chandan and Lanusse, Francois and Upadhyayula, Srigokul and Yu, Bin
"Adaptive wavelet distillation from neural networks through interpretations"
Advances in Neural Information Processing Systems, 2021

Hayou, Soufiane and Ghosh, Nikhil and Yu, Bin
"LoRA+: Efficient Low Rank Adaptation of Large Models"
2024

Hayou, Soufiane and Ghosh, Nikhil and Yu, Bin
"The Impact of Initialization on LoRA Finetuning Dynamics"
2024

Hsu, Aliyah and Cherapanamjeri, Yeshwanth and Park, Briton and Naumann, Tristan and Odisho, Anobel and Yu, Bin
"Diagnosing Transformers: Illuminating Feature Spaces for Clinical Decision-making"
2024

Hsu, Aliyah R. and Cherapanamjeri, Yeshwanth and Park, Briton and Naumann, Tristan and Odisho, Anobel Y. and Yu, Bin
"An investigation into the effects of pre-training data distributions for pathology report classification"
arXiv.org, 2023

Kandiros, Vardis and Daskalakis, Constantinos and Dagan, Yuval and Choo, Davin
"Learning and Testing Latent-Tree Ising Models Efficiently"
COLT 2023, 2023

Li, Aaron and Netzorg, Robin and Cheng, Zhihan and Zhang, Zhuoqin and Yu, Bin
"Improving Prototypical Visual Explanations with Reward Reweighing, Reselection, and Retraining"
2024

Liu, Z. and Lu, M. and Wang, Z. and Jordan, M. I. and Yang, Z.
"Welfare Maximization in Competitive Equilibrium: Reinforcement Learning for Markov Exchange Economy"
International Conference on Machine Learning, 2022

Mallinar, Neil and Zane, Austin and Frei, Spencer and Yu, Bin
"Minimum-Norm Interpolation Under Covariate Shift"
2024

Jordan, Michael I. and Kornowski, Guy and Lin, Tianyi and Shamir, Ohad and Zampetakis, Manolis
"Deterministic Nonsmooth Nonconvex Optimization"
Conference on Learning Theory, 2023

Jordan, Michael I. and Lin, Tianyi and Vlatakis-Gkaragkounis, Emmanouil V.
"First-Order Algorithms for Min-Max Optimization in Geodesic Metric Spaces"
Advances in Neural Information Processing Systems, 2023

Mou, Wenlong and Flammarion, Nicolas and Wainwright, Martin J. and Bartlett, Peter L.
"Improved bounds for discretization of Langevin diffusions: Near-optimal rates without convexity"
Bernoulli, v.28, 2022
https://doi.org/10.3150/21-BEJ1343

Mou, Wenlong and Ho, Nhat and Wainwright, Martin and Bartlett, Peter L and Jordan, Michael
"A Diffusion Process Perspective on Posterior Contraction Rates for Parameters"
SIAM Journal on Mathematics of Data Science, v.6, 2024
https://doi.org/10.1137/22M1516038

Mou, Wenlong and Pananjady, Ashwin and Wainwright, Martin J. and Bartlett, Peter L.
"Optimal and instance-dependent guarantees for Markovian linear stochastic approximation"
Proceedings of the 35th Conference on Learning Theory (COLT2022), 2022

Muehlebach, Michael and Jordan, Michael I
"On Constraints in First-Order Optimization: A View from Non-Smooth Dynamical Systems"
arXiv.org, 2021

Nasseri, Keyan and Singh, Chandan and Duncan, James and Kornblith, Aaron and Yu, Bin
"Group Probability-Weighted Tree Sums for Interpretable Modeling of Heterogeneous Data"
arXiv.org, 2022

Haghtalab, Nika and Jordan, Michael I. and Zhao, Eric
"A Unifying Perspective on Multi-Calibration: Unleashing Game Dynamics for Multi-Objective Learning"
arXiv.org, 2023

Ghosh, Nikhil and Mei, Song
"The Three Stages of Learning Dynamics in High-dimensional Kernel Methods"
arXiv.org, 2021

Pacchiano, Aldo and Bartlett, Peter L. and Jordan, Michael I.
"An Instance-Dependent Analysis for the Cooperative Multi-Player Multi-Armed Bandit"
Proceedings of the 34th International Conference on Algorithmic Learning Theory, 2023

Pacchiano, Aldo and Ghavamzadeh, Mohammad and Bartlett, Peter L. and Jiang, Heinrich
"Stochastic Bandits with Linear Constraints"
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, v.130, 2021

Pacchiano, Aldo and Lee, Jonathan and Bartlett, Peter L. and Nachum, Ofir
"Near Optimal Policy Optimization via REPS"
Advances in Neural Information Processing Systems, v.34, 2021

Perdomo, Juan and Krishnamurthy, Akshay and Bartlett, Peter L. and Kakade, Sham
"A Sharp Characterization of Linear Estimators for Offline Policy Evaluation"
Journal of Machine Learning Research, 2023

Perdomo, Juan and Simchowitz, Max and Agarwal, Alekh and Bartlett, Peter L.
"Towards a Dimension-Free Understanding of Adaptive Linear Control"
Proceedings of the 34th Conference on Learning Theory (COLT2021), 2021

Qiao, Mingda and Zheng, Letian
"On the Distance from Calibration in Sequential Prediction"
2024

Ronen, Omer and Humayun, Ahmed and Balestriero, Randall and Baraniuk, Richard and Yu, Bin
"ScaLES: Scalable Latent Exploration Score for Pre-Trained Generative Networks"
2024

Sakos, Iosif and Vlatakis-Gkaragkounis, Emmanouil-Vasileios and Mertikopoulos, Panayotis and Piliouras, Georgios
"Exploiting hidden structures in non-convex games for convergence to Nash equilibrium"
Advances in Neural Information Processing Systems, 2023

Shen, Dennis and Ding, Peng and Sekhon, Jasjeet and Yu, Bin
"Same Root Different Leaves: Time Series and Cross-Sectional Methods in Panel Data"
arXiv.org, 2022

Singh, Chandan and Ha, Wooseok and Yu, Bin
"Interpreting and Improving Deep-Learning Models with Reality Checks"
Lecture Notes in Computer Science, 2022
https://doi.org/10.1007/978-3-031-04083-2_12

Sun, Liwen and Agarwal, Abhineet and Kornblith, Aaron and Yu, Bin and Xiong, Chenyan
"ED-Copilot: Reduce Emergency Department Wait Time with Language Model Diagnostic Assistance"
2024

Tan, Yan Shuo and Agarwal, Abhineet and Yu, Bin
"A cautionary tale on fitting decision trees to data from additive models: generalization lower bounds"
Proceedings of the International Workshop on Artificial Intelligence and Statistics, 2022

Tan, Yan Shuo and Ronen, Omer and Saarinen, Theo and Yu, Bin
"The Computational Curse of Big Data for Bayesian Additive Regression Trees: A Hitting Time Analysis"
2024

Tan, Yan Shuo and Singh, Chandan and Nasseri, Keyan and Agarwal, Abhineet and Duncan, James and Ronen, Omer and Epland, Matthew and Kornblith, Aaron and Yu, Bin
"Fast Interpretable Greedy-Tree Sums (FIGS)"
arXiv.org, 2023

Vasconcelos, F and Vlatakis-Gkaragkounis, E-V and Mertikopoulos, P and Piliouras, G and Jordan, M I
"A Quadratic Speedup in Finding Nash Equilibria of Quantum Zero-Sum Games"
2023

Vlatakis-Gkaragkounis, Emmanouil V and Giannou, Angeliki and Chen, Yudong and Xie, Qiaomin
"Stochastic Methods in Variational Inequalities: Ergodicity, Bias and Refinements"
2024

Wei, Alexander and Hu, Wei and Steinhardt, Jacob
"More Than a Toy: Random Matrix Models Predict How Real-World Neural Representations Generalize"
International Conference on Machine Learning, 2022

Wu, J. and Zhang, Z. and Feng, Z. and Wang, Z. and Yang, Z. and Jordan, M. I. and Xu, H.
"Markov Persuasion Processes and Reinforcement Learning"
ACM Conference on Economics and Computation, 2022

Cai, Yang and Jordan, Michael I. and Lin, Tianyi and Oikonomou, Argyris and Vlatakis-Gkaragkounis, Emmanouil V.
"Curvature-Independent Last-Iterate Convergence for Games on Riemannian Manifolds"
arXiv.org, 2023

Zanette, Andrea
"When is Realizability Sufficient for Off-Policy Reinforcement Learning?"
ICML 2023, 2023

Zanette, Andrea and Brunskill, Emma and Wainwright, Martin J.
"Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning"
NEURIPS Conference 2021, 2021

Zanette, Andrea and Wainwright, Martin
"Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning"
International Conference on Machine Learning, 2022

Zanette, Andrea and Wainwright, Martin J.
"Bellman Residual Orthogonalization for Offline Reinforcement Learning"
arXiv.org, 2022

Zhang, Ruiqi and Frei, Spencer and Bartlett, Peter L
"Trained Transformers Learn Linear Models In-Context"
Journal of Machine Learning Research, v.25, 2024

Zrnic, Tijana and Mazumdar, Eric and Sastry, Shankar and Jordan, Michael I
"Who Leads and Who Follows in Strategic Classification?"
arXiv.org, 2021