NSF Award Search: Award # 2023505 - Foundations of Data Science Institute

Award Abstract # 2023505

Foundations of Data Science Institute

NSF Org:	DMS Division Of Mathematical Sciences
Recipient:	REGENTS OF THE UNIVERSITY OF CALIFORNIA, THE
Initial Amendment Date:	August 31, 2020
Latest Amendment Date:	August 16, 2024
Award Number:	2023505
Award Instrument:	Continuing Grant
Program Manager:	Stacey Levine, slevine@nsf.gov, (703)292-2948; DMS Division Of Mathematical Sciences; MPS Directorate for Mathematical and Physical Sciences
Start Date:	September 1, 2020
End Date:	August 31, 2026 (Estimated)
Total Intended Award Amount:	$5,900,308.00
Total Awarded Amount to Date:	$5,900,308.00
Funds Obligated to Date:	FY 2020 = $1,180,062.00; FY 2021 = $1,180,062.00; FY 2022 = $1,380,062.00; FY 2023 = $980,062.00; FY 2024 = $1,180,060.00
History of Investigator:	Peter Bartlett (Principal Investigator), bartlett@stat.berkeley.edu; Bin Yu (Co-Principal Investigator); Michael Jordan (Co-Principal Investigator); Martin Wainwright (Co-Principal Investigator); Josh Hug (Co-Principal Investigator)
Recipient Sponsored Research Office:	University of California-Berkeley 1608 4TH ST STE 201 BERKELEY CA US 94710-1749 (510)643-3891
Sponsor Congressional District:	12
Primary Place of Performance:	University of California-Berkeley Sponsored Projects Office Berkeley CA US 94710-1749
Primary Place of Performance Congressional District:	12
Unique Entity Identifier (UEI):	GS3YEVSS12N6
Parent UEI:
NSF Program(s):	TRIPODS Transdisciplinary Rese
Primary Program Source:	01002021DB NSF RESEARCH & RELATED ACTIVIT; 01002122DB NSF RESEARCH & RELATED ACTIVIT; 01002223DB NSF RESEARCH & RELATED ACTIVIT; 01002324DB NSF RESEARCH & RELATED ACTIVIT; 01002425DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):	048Z, 075Z, 079Z
Program Element Code(s):	041Y00
Award Agency Code:	4900
Fund Agency Code:	4900
Assistance Listing Number(s):	47.049, 47.070

ABSTRACT

The Foundations of Data Science Institute (FODSI) brings together a large and diverse team of researchers and educators from UC Berkeley, MIT, Boston University, Bryn Mawr College, Harvard University, Howard University, and Northeastern University, with the aim of advancing the theoretical foundations for the field of data science. Data science has emerged as a central science for the 21st century, a widespread approach to science and technology that exploits the explosion in the availability of data to allow empirical investigations at unprecedented scale and scope. It now plays a central role in diverse domains across all of science, commerce and industry. The development of theoretical foundations for principled approaches to data science is particularly challenging because it requires progress across the full breadth of scientific issues that arise in the rich and complex processes by which data can be used to make decisions. These issues include the specification of the goals of data analysis, the development of models that aim to capture the way in which data may have arisen, the crafting of algorithms that are responsive to the models and goals, an understanding of the impact of misspecifications of these models and goals, an understanding of the effects of interactions, interventions and feedback mechanisms that affect the data and the interpretation of the results, concern about the uncertainty of these results, an understanding of the impact of other decision-makers with competing goals, and concern about the economic, social, and ethical implications of automated data analysis and decision-making. To address these challenges, FODSI brings together experts from many cognate academic disciplines, including computer science, statistics, mathematics, electrical engineering, and economics. 
Institute research outcomes have strong potential to directly impact the many application domains for data science in industry, commerce, science and society, facilitated by mechanisms that directly involve a stream of institute-trained personnel in industrial partners' projects, and by public activities designed to nurture substantive interactions between foundational and use-inspired research communities in data science. The institute also aims to educate and mentor future leaders in data science, through the further development of a pioneering undergraduate program in data science, and by training a diverse cohort of graduate students and postdocs with an innovative approach that emphasizes strong mentorship, flexibility, and breadth of collaboration opportunities. In addition, the institute plans to host an annual summer school that will deliver core curriculum and a taste of foundational research to a diverse group of advanced undergraduates, graduate students, and postdocs. It aims to broaden participation and increase diversity in the data science workforce, bringing the excitement of data science to under-represented groups at the high school level, and targeting diverse participation in the institute's public activities. And it will act as a nexus for research and education in the foundations of data science, by convening public events, such as summer schools and research workshops and other collaborative research opportunities, and by providing models for education, human resource development, and broadening participation.

The scientific focus of the institute will encompass the full range of issues that arise in data science -- modeling issues, inferential issues, computational issues, and societal issues -- and the challenges that emerge from the conflicts between their competing requirements. Its research agenda is organized around eight themes. Three of these themes focus on key challenges arising from the rich variety of interactions between a decision maker and its environment, not only the classical view of data that is processed in a batch or a stream, but also sequential interactions with feedback (the control perspective), experimental interactions designed to answer "what if" questions (the causality perspective), and strategic interactions involving other actors with conflicting goals (the economic perspective). The other research themes focus on opportunities for major impacts across disciplinary boundaries: on elucidating the algorithmic landscape of statistical problems, and in particular the computational complexity of statistical estimation problems; on sketching, sampling, and sub-linear time algorithms designed to address issues of scalability in data science problems; on exploiting statistical methodology in the service of algorithms; and on using breakthroughs in applied mathematics to address computational and inferential challenges. Intellectual contributions to societal issues in data science will feature throughout this set of themes. The institute will exploit strong connections with its scientific and industrial partners to ensure that these research directions enjoy a rich engagement with a broad range of commercial, technological and scientific application domains. Its sequence of research workshops and a collaborative research program will serve the broader research community by nurturing additional research in these key challenge areas.
The institute will be led by a steering committee that will seek the help of an external advisory board to prioritize its research themes and activities throughout its lifetime. Its educational programs will include curriculum development from K-12 through undergraduate, a graduate level visit program, and a postdoc training model, aimed at empowering the next generation of leaders to fluidly work across conventional disciplinary boundaries while being mindful of the full set of scientific issues. The institute will undertake a multi-pronged effort to recruit, engage and support the full range of groups traditionally under-represented in mathematics, computer science and statistics.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

(Showing: 1 - 10 of 76)

Agarwal, Abhineet and Kenney, Ana M. and Tan, Yan Shuo and Tang, Tiffany M. and Yu, Bin "MDI+: A Flexible Random Forest-Based Feature Importance Framework" arXiv.org, 2023

Agarwal, Abhineet and Tan, Yan Shuo and Ronen, Omer and Singh, Chandan and Yu, Bin "Hierarchical Shrinkage: Improving the Accuracy and Interpretability of Tree-Based Methods" Proceedings of Machine Learning Research, 2022

Angelopoulos, Anastasios N and Kohli, Amit P and Bates, Stephen and Jordan, Michael I and Malik, Jitendra and Alshaabi, Thayer and Upadhyayula, Srigokul and Romano, Yaniv "Image-to-Image Regression with Distribution-Free Uncertainty Quantification and Applications in Imaging" International Conference on Machine Learning, 2022

Assos, Angelos and Attias, Idan and Dagan, Yuval and Daskalakis, Constantinos and Fishelson, Maxwell "Online Learning and Solving Infinite Games with an ERM Oracle" COLT 2023, 2023

Bartlett, Peter L. and Bubeck, Sebastien and Cherapanamjeri, Yeshwanth "Adversarial Examples in Multi-Layer Random ReLU Networks" Advances in Neural Information Processing Systems, v.34, 2021

Bartlett, Peter L. and Indyk, Piotr and Wagner, Tal "Generalization Bounds for Data-Driven Numerical Linear Algebra" Proceedings of the 35th Conference on Learning Theory (COLT2022), 2022

Bartlett, Peter L and Long, Philip M and Bousquet, Olivier "The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines and Drifting Towards Wide Minima" Journal of Machine Learning Research, v.24, 2023

Bartlett, Peter L. and Montanari, Andrea and Rakhlin, Alexander "Deep learning: a statistical viewpoint" Acta Numerica, v.30, 2021 https://doi.org/10.1017/S0962492921000027

Bhatia, Kush and Bartlett, Peter L. and Dragan, Anca D. and Steinhardt, Jacob "Agnostic Learning with Unknown Utilities" Leibniz International Proceedings in Informatics, v.185, 2021 https://doi.org/10.4230/LIPIcs.ITCS.2021.55

Bhatia, Kush and Ma, Yi-An and Dragan, Anca D. and Bartlett, Peter L. and Jordan, Michael I. "Bayesian Robustness: A Nonasymptotic Viewpoint" Journal of the American Statistical Association, 2023 https://doi.org/10.1080/01621459.2023.2174121

Please report errors in award information by writing to: awardsearch@nsf.gov.
