Award Abstract # 2023505
Foundations of Data Science Institute
NSF Org: DMS, Division Of Mathematical Sciences
Recipient: REGENTS OF THE UNIVERSITY OF CALIFORNIA, THE
Initial Amendment Date: August 31, 2020
Latest Amendment Date: August 21, 2023
Award Number: 2023505
Award Instrument: Continuing Grant
Program Manager: Tracy Kimbrel, tkimbrel@nsf.gov, (703)292-0000
  DMS, Division Of Mathematical Sciences
  MPS, Direct For Mathematical & Physical Scien
Start Date: September 1, 2020
End Date: August 31, 2025 (Estimated)
Total Intended Award Amount: $5,900,308.00
Total Awarded Amount to Date: $4,720,248.00
Funds Obligated to Date:
  FY 2020 = $1,180,062.00
  FY 2021 = $1,180,062.00
  FY 2022 = $1,380,062.00
  FY 2023 = $980,062.00
History of Investigator:
  - Peter Bartlett (Principal Investigator), bartlett@stat.berkeley.edu
  - Josh Hug (Co-Principal Investigator)
  - Bin Yu (Co-Principal Investigator)
  - Michael Jordan (Co-Principal Investigator)
  - Martin Wainwright (Co-Principal Investigator)
Recipient Sponsored Research Office: University of California-Berkeley, 1608 4TH ST STE 201, BERKELEY, CA, US, 94710-1749, (510)643-3891
Sponsor Congressional District: 12
Primary Place of Performance: University of California-Berkeley, Sponsored Projects Office, Berkeley, CA, US, 94710-1749
Primary Place of Performance Congressional District: 12
Unique Entity Identifier (UEI): GS3YEVSS12N6
Parent UEI:
NSF Program(s): TRIPODS Transdisciplinary Rese
Primary Program Source:
  01002021DB NSF RESEARCH & RELATED ACTIVIT
  01002122DB NSF RESEARCH & RELATED ACTIVIT
  01002324DB NSF RESEARCH & RELATED ACTIVIT
  01002223DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 048Z, 075Z, 079Z
Program Element Code(s): 041Y00
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.049, 47.070
ABSTRACT
The Foundations of Data Science Institute (FODSI) brings together a large and diverse team of researchers and educators from UC Berkeley, MIT, Boston University, Bryn Mawr College, Harvard University, Howard University, and Northeastern University, with the aim of advancing the theoretical foundations for the field of data science. Data science has emerged as a central science for the 21st century, a widespread approach to science and technology that exploits the explosion in the availability of data to allow empirical investigations at unprecedented scale and scope. It now plays a central role in diverse domains across all of science, commerce and industry. The development of theoretical foundations for principled approaches to data science is particularly challenging because it requires progress across the full breadth of scientific issues that arise in the rich and complex processes by which data can be used to make decisions. These issues include the specification of the goals of data analysis, the development of models that aim to capture the way in which data may have arisen, the crafting of algorithms that are responsive to the models and goals, an understanding of the impact of misspecifications of these models and goals, an understanding of the effects of interactions, interventions and feedback mechanisms that affect the data and the interpretation of the results, concern about the uncertainty of these results, an understanding of the impact of other decision-makers with competing goals, and concern about the economic, social, and ethical implications of automated data analysis and decision-making. To address these challenges, FODSI brings together experts from many cognate academic disciplines, including computer science, statistics, mathematics, electrical engineering, and economics. 
Institute research outcomes have strong potential to directly impact the many application domains for data science in industry, commerce, science and society, facilitated by mechanisms that directly involve a stream of institute-trained personnel in industrial partners' projects, and by public activities designed to nurture substantive interactions between foundational and use-inspired research communities in data science. The institute also aims to educate and mentor future leaders in data science, through the further development of a pioneering undergraduate program in data science, and by training a diverse cohort of graduate students and postdocs with an innovative approach that emphasizes strong mentorship, flexibility, and breadth of collaboration opportunities. In addition, the institute plans to host an annual summer school that will deliver core curriculum and a taste of foundational research to a diverse group of advanced undergraduates, graduate students, and postdocs. It aims to broaden participation and increase diversity in the data science workforce, bringing the excitement of data science to under-represented groups at the high school level, and targeting diverse participation in the institute's public activities. And it will act as a nexus for research and education in the foundations of data science, by convening public events, such as summer schools and research workshops and other collaborative research opportunities, and by providing models for education, human resource development, and broadening participation.
The scientific focus of the institute will encompass the full range of issues that arise in data science -- modeling issues, inferential issues, computational issues, and societal issues -- and the challenges that emerge from the conflicts between their competing requirements. Its research agenda is organized around eight themes. Three of these themes focus on key challenges arising from the rich variety of interactions between a decision maker and its environment: not only the classical view of data that is processed in a batch or a stream, but also sequential interactions with feedback (the control perspective), experimental interactions designed to answer "what if" questions (the causality perspective), and strategic interactions involving other actors with conflicting goals (the economic perspective). The other research themes focus on opportunities for major impacts across disciplinary boundaries: on elucidating the algorithmic landscape of statistical problems, and in particular the computational complexity of statistical estimation problems; on sketching, sampling, and sub-linear time algorithms designed to address issues of scalability in data science problems; on exploiting statistical methodology in the service of algorithms; and on using breakthroughs in applied mathematics to address computational and inferential challenges. Intellectual contributions to societal issues in data science will feature throughout this set of themes. The institute will exploit strong connections with its scientific and industrial partners to ensure that these research directions enjoy a rich engagement with a broad range of commercial, technological and scientific application domains. Its sequence of research workshops and a collaborative research program will serve the broader research community by nurturing additional research in these key challenge areas.
The institute will be led by a steering committee that will seek the help of an external advisory board to prioritize its research themes and activities throughout its lifetime. Its educational programs will include curriculum development from K-12 through undergraduate, a graduate level visit program, and a postdoc training model, aimed at empowering the next generation of leaders to fluidly work across conventional disciplinary boundaries while being mindful of the full set of scientific issues. The institute will undertake a multi-pronged effort to recruit, engage and support the full range of groups traditionally under-represented in mathematics, computer science and statistics.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
Bartlett, Peter L. and Bubeck, Sebastien and Cherapanamjeri, Yeshwanth "Adversarial Examples in Multi-Layer Random ReLU Networks" Advances in Neural Information Processing Systems, v.34, 2021
Chatterji, Niladri S. and Long, Philip M. and Bartlett, Peter L. "When does gradient descent with logistic loss find interpolating two-layer networks?" Journal of machine learning research, v.22, 2021
Wei, Alexander and Hu, Wei and Steinhardt, Jacob "More Than a Toy: Random Matrix Models Predict How Real-World Neural Representations Generalize" International Conference on Machine Learning, 2022
Chatterji, Niladri S. and Bartlett, Peter L. and Long, Philip M. "Oracle lower bounds for stochastic gradient sampling algorithms" Bernoulli, v.28, 2022
Bartlett, Peter L. and Montanari, Andrea and Rakhlin, Alexander "Deep learning: a statistical viewpoint" Acta numerica, 2021
Daskalakis, Constantinos and Stefanou, Patroklos and Yao, Rui and Zampetakis, Manolis "Efficient Truncated Linear Regression with Unknown Noise Variance" Advances in neural information processing systems, 2021
Pacchiano, Aldo and Lee, Jonathan and Bartlett, Peter L. and Nachum, Ofir "Near Optimal Policy Optimization via REPS" Advances in neural information processing systems, v.34, 2021
Zrnic, Tijana and Mazumdar, Eric and Sastry, Shankar and Jordan, Michael I "Who Leads and Who Follows in Strategic Classification?" arXiv.org, 2021
Fannjiang, Clara and Bates, Stephen and Angelopoulos, Anastasios N. and Listgarten, Jennifer and Jordan, Michael I. "Conformal prediction for the design problem" Proceedings of the National Academy of Sciences of the United States of America, 2022
Liu, Z. and Lu, M. and Wang, Z. and Jordan, M. I. and Yang, Z. "Welfare Maximization in Competitive Equilibrium: Reinforcement Learning for Markov Exchange Economy" International Conference on Machine Learning, 2022
Angelopoulos, Anastasios N and Kohli, Amit P and Bates, Stephen and Jordan, Michael I and Malik, Jitendra and Alshaabi, Thayer and Upadhyayula, Srigokul and Romano, Yaniv "Image-to-Image Regression with Distribution-Free Uncertainty Quantification and Applications in Imaging" International Conference on Machine Learning, 2022
Wu, J. and Zhang, Z. and Feng, Z. and Wang, Z. and Yang, Z. and Jordan, M. I. and Xu, H. "Markov Persuasion Processes and Reinforcement Learning" ACM Conference on Economics and Computation, 2022
Mou, Wenlong and Flammarion, Nicolas and Wainwright, Martin J. and Bartlett, Peter L. "Improved bounds for discretization of Langevin diffusions: Near-optimal rates without convexity" Bernoulli, v.28, 2022 https://doi.org/10.3150/21-BEJ1343
Yang Cai and Michael I. Jordan and Tianyi Lin and Argyris Oikonomou and Emmanouil V. Vlatakis-Gkaragkounis "Curvature-Independent Last-Iterate Convergence for Games on Riemannian Manifolds" arXiv.org, 2023
Michael I. Jordan and Tianyi Lin and Emmanouil V. Vlatakis-Gkaragkounis "First-Order Algorithms for Min-Max Optimization in Geodesic Metric Spaces" Advances in neural information processing systems, 2023
Zanette, Andrea and Wainwright, Martin J. "Bellman Residual Orthogonalization for Offline Reinforcement Learning" arXiv.org, 2022
Perdomo, Juan and Simchowitz, Max and Agarwal, Alekh and Bartlett, Peter L. "Towards a Dimension-Free Understanding of Adaptive Linear Control" Proceedings of the 34th Conference on Learning Theory (COLT2021), 2021
Shen, Dennis and Ding, Peng and Sekhon, Jasjeet and Yu, Bin "Same Root Different Leaves: Time Series and Cross-Sectional Methods in Panel Data" arXiv.org, 2022
Nikhil Ghosh, Song Mei "The Three Stages of Learning Dynamics in High-dimensional Kernel Methods" arXiv.org, 2021
Tan, Yan Shuo and Agarwal, Abhineet and Yu, Bin "A cautionary tale on fitting decision trees to data from additive models: generalization lower bounds" Proceedings of the International Workshop on Artificial Intelligence and Statistics, 2022
Assos, Angelos and Attias, Idan and Dagan, Yuval and Daskalakis, Constantinos and Fishelson, Maxwell "Online Learning and Solving Infinite Games with an ERM Oracle" COLT 2023, 2023
Pacchiano, Aldo and Bartlett, Peter L. and Jordan, Michael I. "An Instance-Dependent Analysis for the Cooperative Multi-Player Multi-Armed Bandit" Proceedings of the 34th International Conference on Algorithmic Learning Theory, 2023
Agarwal, Abhineet and Tan, Yan Shuo and Ronen, Omer and Singh, Chandan and Yu, Bin "Hierarchical Shrinkage: Improving the Accuracy and Interpretability of Tree-Based Methods" Proceedings of Machine Learning Research, 2022
Duncan, James and Kapoor, Rush and Agarwal, Abhineet and Singh, Chandan and Yu, Bin "VeridicalFlow: a Python package for building trustworthy data science pipelines with PCS" Journal of Open Source Software, v.7, 2022 https://doi.org/10.21105/joss.03895
Hsu, Aliyah R. and Cherapanamjeri, Yeshwanth and Park, Briton and Naumann, Tristan and Odisho, Anobel Y. and Yu, Bin "An investigation into the effects of pre-training data distributions for pathology report classification" arXiv.org, 2023
Kandiros, Vardis and Daskalakis, Constantinos and Dagan, Yuval and Choo, Davin "Learning and Testing Latent-Tree Ising Models Efficiently" COLT 2023, 2023
Nasseri, Keyan and Singh, Chandan and Duncan, James and Kornblith, Aaron and Yu, Bin "Group Probability-Weighted Tree Sums for Interpretable Modeling of Heterogeneous Data" arXiv.org, 2022
Bartlett, Peter L. and Indyk, Piotr and Wagner, Tal "Generalization Bounds for Data-Driven Numerical Linear Algebra" Proceedings of the 35th Conference on Learning Theory (COLT2022), 2022
Michael I. Jordan and Guy Kornowski and Tianyi Lin and Ohad Shamir and Manolis Zampetakis "Deterministic Nonsmooth Nonconvex Optimization" Conference on Learning Theory, 2023
Bartlett, Peter L. and Montanari, Andrea and Rakhlin, Alexander "Deep learning: a statistical viewpoint" Acta Numerica, v.30, 2021 https://doi.org/10.1017/S0962492921000027
Guo, Wenshuo and Jordan, Michael and Zampetakis, Emmanouil "Robust Learning of Optimal Auctions" Advances in neural information processing systems, 2021
Muehlebach, Michael and Jordan, Michael I "On Constraints in First-Order Optimization: A View from Non-Smooth Dynamical Systems" arXiv.org, 2021
Agarwal, Abhineet and Kenney, Ana M. and Tan, Yan Shuo and Tang, Tiffany M. and Yu, Bin "MDI+: A Flexible Random Forest-Based Feature Importance Framework" arXiv.org, 2023
Bhatia, Kush and Bartlett, Peter L. and Dragan, Anca D. and Steinhardt, Jacob "Agnostic Learning with Unknown Utilities" Leibniz international proceedings in informatics, v.185, 2021 https://doi.org/10.4230/LIPIcs.ITCS.2021.55
Chatterji, Niladri S. and Long, Philip M. and Bartlett, Peter L. "The interplay between implicit bias and benign overfitting in two-layer linear networks" Journal of machine learning research, 2022
Tan, Yan Shuo and Singh, Chandan and Nasseri, Keyan and Agarwal, Abhineet and Duncan, James and Ronen, Omer and Epland, Matthew and Kornblith, Aaron and Yu, Bin "Fast Interpretable Greedy-Tree Sums (FIGS)" arXiv.org, 2023
Ha, Wooseok and Singh, Chandan and Lanusse, Francois and Upadhyayula, Srigokul and Yu, Bin "Adaptive wavelet distillation from neural networks through interpretations" Advances in neural information processing systems, 2021
Mou, Wenlong and Pananjady, Ashwin and Wainwright, Martin J. and Bartlett, Peter L. "Optimal and instance-dependent guarantees for Markovian linear stochastic approximation" Proceedings of the 35th Conference on Learning Theory (COLT2022), 2022
Chatterji, Niladri and Pacchiano, Aldo and Bartlett, Peter L. and Jordan, Michael I. "On the Theory of Reinforcement Learning with Once-per-Episode Feedback" Advances in neural information processing systems, v.34, 2021
Frei, Spencer and Chatterji, Niladri and Bartlett, Peter L. "Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data" Proceedings of the 35th Conference on Learning Theory (COLT2022), 2022
Nika Haghtalab and Michael I. Jordan and Eric Zhao "A Unifying Perspective on Multi-Calibration: Unleashing Game Dynamics for Multi-Objective Learning" arXiv.org, 2023
Guo, W. "No-Regret Learning in Partially-Informed Auctions" International Conference on Machine Learning, 2022
Frei, Spencer and Vardi, Gal and Bartlett, Peter L. and Srebro, Nathan and Hu, Wei "Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data" Proceedings of ICLR 2023, 2023
Singh, Chandan and Ha, Wooseok and Yu, Bin "Interpreting and Improving Deep-Learning Models with Reality Checks" Lecture notes in computer science, 2022 https://doi.org/10.1007/978-3-031-04083-2_12
Cherapanamjeri, Yeshwanth and Tripuraneni, Nilesh and Bartlett, Peter L. and Jordan, Michael I. "Optimal Mean Estimation without a Variance" Proceedings of the 35th Conference on Learning Theory (COLT2022), 2022
Zanette, Andrea and Brunskill, Emma and Wainwright, Martin J. "Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning" NEURIPS Conference 2021, 2021
Pacchiano, Aldo and Ghavamzadeh, Mohammad and Bartlett, Peter L. and Jiang, Heinrich "Stochastic Bandits with Linear Constraints" Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, v.130, 2021
Zanette, Andrea and Wainwright, Martin "Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning" International Conference on Machine Learning, 2022
Perdomo, Juan and Krishnamurthy, Akshay and Bartlett, Peter L. and Kakade, Sham "A Sharp Characterization of Linear Estimators for Offline Policy Evaluation" Journal of machine learning research, 2023
Zanette, Andrea "When is Realizability Sufficient for Off-Policy Reinforcement Learning?" ICML 2023, 2023