Award Abstract # 2033569
A1: A Multi-Scale Open Knowledge Network for Biomedicine

NSF Org: ITE
Innovation and Technology Ecosystems
Recipient: REGENTS OF THE UNIVERSITY OF CALIFORNIA, SAN FRANCISCO, THE
Initial Amendment Date: August 14, 2020
Latest Amendment Date: September 15, 2022
Award Number: 2033569
Award Instrument: Cooperative Agreement
Program Manager: Jemin George
jgeorge@nsf.gov
 (703)292-2251
ITE
 Innovation and Technology Ecosystems
TIP
 Dir for Tech, Innovation, & Partnerships
Start Date: September 1, 2020
End Date: August 31, 2023 (Estimated)
Total Intended Award Amount: $4,999,998.00
Total Awarded Amount to Date: $5,299,996.00
Funds Obligated to Date: FY 2020 = $2,999,998.00
FY 2021 = $2,000,000.00
FY 2022 = $299,998.00
History of Investigator:
  • Sergio Baranzini (Principal Investigator)
    sergio.baranzini@ucsf.edu
  • Sui Huang (Co-Principal Investigator)
  • Sharat Israni (Co-Principal Investigator)
  • James Brase (Co-Principal Investigator)
Recipient Sponsored Research Office: University of California-San Francisco
1855 FOLSOM ST STE 425
SAN FRANCISCO
CA  US  94103-4249
(415)476-2977
Sponsor Congressional District: 11
Primary Place of Performance: The Regents of the University of California, San Francisco
675 Nelson Rising Lane
San Francisco
CA  US  94143-0003
Primary Place of Performance Congressional District: 11
Unique Entity Identifier (UEI): KMH5K9V7S518
Parent UEI: KMH5K9V7S518
NSF Program(s): CA-HDR: Convergence Accelerator,
Convergence Accelerator Research
Primary Program Source: 01002223DB NSF RESEARCH & RELATED ACTIVITIES
01001920DB NSF RESEARCH & RELATED ACTIVITIES
01002021DB NSF RESEARCH & RELATED ACTIVITIES
01002122DB NSF RESEARCH & RELATED ACTIVITIES
Program Reference Code(s):
Program Element Code(s): 095Y00, 131Y00
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.083, 47.084

ABSTRACT

The NSF Convergence Accelerator supports use-inspired, team-based, multidisciplinary efforts that address challenges of national importance and will produce deliverables of value to society in the near future.

This project will create an organization of data called a knowledge network that will allow doctors, researchers, the pharmaceutical industry, and citizen scientists to understand and explore biomedicine much more effectively. It will connect vast amounts of data in a way that allows important new questions to be asked, helping to uncover the roots of biological processes, identify cures for diseases, recognize drugs that could be relevant to previously unexplored conditions, and much more. The platform will enable and support biomedical applications created by third parties. The adoption of those tools has the potential for significant societal impact: reducing healthcare costs and health disparities and accelerating the development of therapeutics, ultimately improving the quality of life for every American.

Healthcare accounts for almost one-fifth of US GDP. Health disparities, major public health issues, the complexity of drug discovery, and overall costs continue to grow dramatically. The mechanisms underlying human health are so complex that the human brain cannot integrate the ever-growing body of knowledge relevant to treating patients or discovering therapies. This hampers the generation of new knowledge, particularly in the biomedical sciences, with direct implications for human health. The goal of this project, a biomedical open knowledge network (OKN), is to integrate billions of biomedical concepts into a knowledge engine that will enable doctors, drug developers, researchers, and citizen scientists to produce biologically meaningful answers to biomedical questions rapidly and cheaply. This OKN will incorporate billions of factual relationships among biomedical concepts, allowing users ranging from specialists to generalists to explore biomedicine in its full breadth.

The team supported by this project is part of a group pioneering the paradigm of knowledge networks in biomedicine. The effort brings together partners with expertise in search tools (including Google) and graph theory (from Lawrence Livermore National Laboratory), collaborates with the National Center for Advancing Translational Sciences' Biomedical Data Translator (part of the National Institutes of Health), and works with other academic and nonprofit institutions (the Institute for Systems Biology, Indiana University, UC San Diego, and Stanford). This ambitious convergence research effort spans expertise across biomedicine and data science, bringing together doctors, researchers, epistemologists, database specialists, computer scientists, and statisticians.

During Phase I of the Convergence Accelerator Program, the team developed and made available a fully functional biomedical knowledge network, the Scalable Precision Medicine Open Knowledge Engine (SPOKE, spoke.rbvi.ucsf.edu). This success increases the likelihood that the team will produce deliverables in this Phase II project that have a positive impact on society.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Nelson, Charlotte A and Bove, Riley and Butte, Atul J and Baranzini, Sergio E "Embedding electronic health records onto a knowledge network recognizes prodromal features of multiple sclerosis and predicts diagnosis" Journal of the American Medical Informatics Association, v.29, 2021 https://doi.org/10.1093/jamia/ocab270
Fecho, Karamarie and Thessen, Anne E. and Baranzini, Sergio E. and Bizon, Chris and Hadlock, Jennifer J. and Huang, Sui and Roper, Ryan T. and Southall, Noel and Ta, Casey and Watkins, Paul B. and Williams, Mark D. and Xu, Hao and Byrd, William and Dančík "Progress toward a universal biomedical data translator" Clinical and Translational Science, v.15, 2022 https://doi.org/10.1111/cts.13301
Baranzini, Sergio E. and Börner, Katy and Morris, John and Nelson, Charlotte A. and Soman, Karthik and Schleimer, Erica and Keiser, Michael and Musen, Mark and Pearce, Roger and Reza, Tahsin and Smith, Brett and Herr, II, Bruce W. and Oskotsky, Boris and "A biomedical open knowledge network harnesses the power of AI to understand deep human biology" AI Magazine, v.43, 2022 https://doi.org/10.1002/aaai.12037
Nelson, Charlotte A. and Acuna, Ana Uriarte and Paul, Amber M. and Scott, Ryan T. and Butte, Atul J. and Cekanaviciute, Egle and Baranzini, Sergio E. and Costes, Sylvain V. "Knowledge Network Embedding of Transcriptomic Data from Spaceflown Mice Uncovers Signs and Symptoms Associated with Terrestrial Diseases" Life, v.11, 2021 https://doi.org/10.3390/life11010042

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The human brain cannot possibly integrate the vast and rapidly growing amount of information that modern societies have amassed. Our knowledge network (SPOKE) integrates hundreds of millions of biomedical concepts into a knowledge engine that enables doctors, drug developers, and citizen scientists to connect the dots and produce biologically meaningful answers to biomedical questions.

Over the course of the project period, SPOKE grew in size by a factor of 15 and now contains 40 million nodes representing biomedical and health-related concepts and 100 million edges representing connections among those concepts. The team has integrated more than 50 independent data sources into SPOKE, expanding its capabilities for network exploration in the areas of molecular pathways and of environmental factors that may affect health.

During the award period, the team expanded development of a simple knowledge network visualizer (the SPOKE neighborhood explorer) to support interactive exploration. Most importantly, a commercial entity, Mate Bioservices, was spun off from the core SPOKE team in partnership with UCSF Innovation Ventures. Mate Bioservices is developing a comprehensive network interface with customizable analysis tools to meet the needs of a variety of users, including biomedical scientists, drug developers, clinicians, and citizen scientists. The minimum viable product for this interface, BRIC (Biomedical Researchers’ Intellectual Companion), will be made available by Q1 2024.

Recognizing the significant ethical, legal, and social implications of developing and using such powerful tools for biomedical knowledge exploration, the team produced a risk assessment report addressing usage risks, responsibilities for mitigating negative impacts, the potential mosaic effect, validity, trustworthiness, and generalizability.

As part of its research activities, the team characterized the baseline global features of SPOKE and continues to monitor these features to ensure that structural artifacts do not emerge. The SPOKE team also partnered with Oak Ridge National Laboratory to achieve exaflop-scale computing, work that was recognized as a finalist for the Gordon Bell Prize. The team has also demonstrated the utility of the network and the validity of a network-based analysis approach for generating hypotheses and providing prognostic information about patients. In March 2020, the team integrated the newly published SARS-CoV-2 interactome into SPOKE to explore potential pathways of viral activity, identifying pathways and predicting promising therapeutic measures that were later confirmed experimentally in the literature.

We have also shown the utility of the network when observed data are overlaid on it: signature patterns emerge as feature weights are modeled across the network. For example, our team recently published an algorithm that embeds millions of deidentified electronic health records (EHRs) into SPOKE to produce high-dimensional, knowledge-guided patient health signatures, which were then used as features in a machine learning model to classify patients at risk of developing a chronic disease. The model predicted a diagnosis of multiple sclerosis (MS) 3 years before it was made. The method outperformed predictions using EHRs alone, and the biological drivers of the classifiers provided insight into the underpinnings of prodromal MS.
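The idea of turning a patient record into a network-wide signature can be illustrated with a small sketch. For illustration only, we assume a personalized-PageRank-style propagation (a random walk with restart) in which the concepts recorded in a patient's EHR act as restart nodes; the published algorithm's details (edge weighting, damping, normalization) may differ. The toy graph, node names, and the functions `personalized_pagerank` and `patient_signature` are hypothetical and are not part of SPOKE.

```python
from typing import Dict, List, Sequence

# Hypothetical toy knowledge graph: each node maps to the nodes it links to.
# Node names are illustrative only and are not taken from SPOKE.
TOY_GRAPH: Dict[str, List[str]] = {
    "Disease:X": ["Gene:G1", "Symptom:fatigue"],
    "Gene:G1": ["Disease:X"],
    "Symptom:fatigue": ["Disease:X", "Lab:L1"],
    "Lab:L1": ["Symptom:fatigue"],
    "Drug:D1": [],  # isolated node: nothing links to it
}


def personalized_pagerank(
    graph: Dict[str, List[str]],
    seeds: Sequence[str],
    damping: float = 0.85,  # assumed value, a common PageRank default
    iters: int = 200,
    tol: float = 1e-12,
) -> Dict[str, float]:
    """Random walk with restart from a patient's seed concepts.

    At each step the walker follows a random outgoing edge with probability
    `damping`, or restarts at a uniformly chosen seed node otherwise.
    Returns one score per node; the scores sum to 1.
    """
    seed_set = set(seeds)
    if not seed_set:
        raise ValueError("at least one seed concept is required")
    missing = seed_set - set(graph)
    if missing:
        raise ValueError(f"seeds not in graph: {sorted(missing)}")

    nodes = sorted(graph)
    restart = {n: (1.0 / len(seed_set) if n in seed_set else 0.0) for n in nodes}
    score = dict(restart)
    for _ in range(iters):
        # Restart mass goes back to the seeds on every step.
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            out = graph[n]
            # Dangling nodes (no outgoing edges) send their mass back to the seeds.
            targets = out if out else list(seed_set)
            share = damping * score[n] / len(targets)
            for m in targets:
                nxt[m] += share
        delta = sum(abs(nxt[n] - score[n]) for n in nodes)
        score = nxt
        if delta < tol:
            break
    return score


def patient_signature(graph: Dict[str, List[str]], ehr_concepts: Sequence[str]) -> List[float]:
    """Fixed-order feature vector (one entry per node) that a classifier could consume."""
    scores = personalized_pagerank(graph, ehr_concepts)
    return [scores[n] for n in sorted(graph)]


if __name__ == "__main__":
    # A hypothetical patient whose record mentions fatigue and an abnormal lab value.
    print(patient_signature(TOY_GRAPH, ["Symptom:fatigue", "Lab:L1"]))
```

Each patient's vector has the same length and ordering, so vectors from many patients can be stacked into a feature matrix for any standard classifier; the choice of classifier is left open here.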

As a notable outcome of this award, we recently incorporated Mate Bioservices, a company that commercializes applications of SPOKE. Mate Bioservices secured an exclusive license for SPOKE from UCSF and filed a patent application for SPOKE and our proprietary algorithms. Mate has already hired a Chief Executive Officer, who is actively seeking customers and investment from venture capital as well as through the NSF SBIR funding mechanism. Mate already earns revenue from contracts with industry and academia, and the team is currently negotiating with potential investors to raise capital for expansion.

In summary, throughout our participation in the NSF Convergence Accelerator program, we have developed and made available a fully functional biomedical knowledge network (KN). We established a Governance Committee for SPOKE to guide expansion of the KN with additional knowledge sources, and we developed an open-access network visualization tool. We have completed a number of analyses and published results validating the network’s quality and utility. We secured rights to commercialize SPOKE-powered products and established a company to support commercialization of four products powered by the KN.

Last Modified: 12/16/2023
Modified by: Sergio E Baranzini
