Biomedical Data Management / Workshop at VLDB 2026

A community of biomedical informatics and data management researchers and practitioners working together to identify emerging problem areas, develop novel solutions, and accelerate the pace of innovation in healthcare.


Speakers

Nils Gehlenborg
Associate Professor / Department of Biomedical Informatics / Harvard University
His research group develops visual interfaces and computational techniques that enable scientists and clinicians to efficiently interact with biomedical data. They developed the HubMAP data portal for discovering, visualizing and downloading standardized multi-modal spatial and single-cell data from healthy human tissues.
Tim Poterba
Software Engineer and Entrepreneur / Former Co-founder and CTO of E9 Genomics
Spent over 8 years as a software engineer at the Neale Lab of the Broad Institute, where he was a lead developer of Hail, a scalable data analytics framework for genomic data that provides distributed processing of declarative queries over relational, bioinformatics-aware data structures.
Jeff Fried
Director of Platform Strategy & Innovation / InterSystems
Shapes scalable data platforms and interoperability solutions used in demanding enterprise and healthcare environments. He has over 25 years of experience leading product and platform strategy at major technology firms including Microsoft, holds 15 patents, authored more than 50 technical papers, and co-authored three books.

Program

Where: VLDB 2026 in Boston, USA.
When: August 31st or September 4th (TBD).

The workshop program will feature invited and proposed talks, panel discussions, guided breakout sessions and networking opportunities. Details TBD.

Contributing

As an inaugural workshop, our goal is to hear as many voices as possible, providing the attendees with a thorough overview of the current state of the field. To complement our selection of esteemed speakers, we encourage the community to submit proposals for talks that will (1) highlight key problems and/or potential solutions, and, optionally, (2) outline visions of collaborative projects. Hence, we will accept two types of submissions:

Review process

All submissions will undergo double-blind peer review by a program committee consisting of both biomedical researchers and data management researchers. Each submission will be assigned to at least two reviewers with relevant expertise. The review criteria will focus on the clarity of the problem statement, the relevance to biomedical data management, the potential for cross-disciplinary collaboration, and the feasibility of the proposed project (for Project Talk Proposals). We will also consider the novelty and potential impact of the proposed ideas.

About

Vision

This interdisciplinary workshop will focus on data management techniques, tools, and systems with direct applications to the unique challenges in biomedical research and healthcare. Our goal is to build a lasting community centered around these topics and spark fruitful collaborations.

Biomedical research and healthcare are increasingly data-driven, yet practitioners regularly struggle with data management challenges.

We are witnessing a proliferation of data collection technologies (e.g., electronic health records, high-throughput sequencing, medical imaging), the growing adoption of computational methods for data analysis, and the recognition that data is crucial for unlocking new scientific insights and improving patient care. Furthermore, the scale of the data being collected is growing exponentially, driven by the decreasing costs of data acquisition technologies and the increasing digitization of healthcare systems. However, the people who produce, analyze, and interpret biomedical data (e.g., clinicians, digital health specialists, bioinformaticians) often lack the expertise and tools to effectively manage and analyze these multi-modal datasets at scale. As a result, scientific progress is often held back by overwhelming data management challenges.

Data management researchers are well-equipped to tackle many of these challenges, and are actively seeking new research directions.

Data management researchers have spent decades developing techniques, tools, and systems for managing large-scale data. They possess valuable expertise in areas such as data integration, data quality, data governance, and scalable analytics. However, as reflected in several points raised during panel discussions at SIGMOD and VLDB 2025, there is a growing push for data management researchers to actively pursue new, high-impact application domains whose requirements can inspire fundamentally new system designs, benchmarks, and end-to-end deployments. Biomedical data management is one such domain, presenting unique research opportunities that can directly improve human lives.

Bringing these two communities together can unlock the potential for novel solutions and rapidly accelerate scientific progress.

However, this is by no means a trivial task. One noteworthy challenge is that data management researchers often lack exposure to real-world biomedical datasets (due to privacy restrictions), as well as the domain knowledge needed to understand the specific challenges and requirements of biomedical data management. Conversely, biomedical researchers are often unaware of the latest advances in data management research and of how these techniques can be applied to their specific problems. As a result, there is a gap between the capabilities of existing data management systems and the needs of biomedical researchers and clinicians. Our goal is to bridge this gap by bringing together experts from both communities to identify pressing biomedical data management challenges and to explore collaborations that can lead to novel data management solutions tailored to the biomedical domain.

Target audience

This workshop brings together two key groups, each with a deep understanding of its own domain of expertise and an interest in collaborating with the other on projects that have the potential to advance both fields:

Motivating scenario

To illustrate the kinds of challenges that this workshop aims to address, consider the scenario of a molecular tumor board (MTB): a meeting in which experts from multiple disciplines (e.g., oncologists, molecular biologists, pathologists, surgeons, genetic counselors) discuss a complex patient case and converge on a treatment plan. Decisions must be made quickly, transparently, and with a clear trail of supporting evidence. Such evidence is usually gathered by integrating highly multimodal data: clinical data (diagnoses, medications, adverse events, patient and family history), high-throughput omics data (genome, transcriptome, proteome, genetic variations), histopathological lab results, and tissue imaging, which often originate from different institutions.

Given the number of patients, the uniqueness of each case, and the limited availability of the practitioners involved, the board operates under tight time constraints and high stakes. A typical patient's case is discussed for only a few minutes, whereas preparing it can take several hours of manual data preparation and analysis.

In practice, tumor board preparation often turns into an ad-hoc integration exercise across siloed systems and heterogeneous formats. This setting exposes a set of recurring data management challenges: data assembly across scales, statistical data quality assessment, integration across multiple external data sources, intuitive data access, and scalability. A well-designed biomedical data management system could substantially reduce these frictions by providing a secure, patient-centric, multimodal view that is assembled reliably and quickly. This would transform tumor board preparation from a tedious, error-prone integration task into a reproducible workflow.

Topics of interest

This workshop will feature presentations and discussions related to the following topics:

Specific goals

Organizers

Bojan Karlaš
Postdoc / Harvard University (affiliated with HMS, MGB, DFCI, and the Broad)
Works on developing interpretable deep learning pipelines for extracting clinically meaningful insights from pathology images. He obtained his PhD at ETH Zurich, working on data management systems for ML with a particular focus on data debugging.
Gerardo Vitagliano
Postdoc / Data Systems Group / MIT CSAIL
Builds interactive and user-friendly data systems, allowing domain experts to analyze large-scale multimodal datasets. His research involves active collaborations with clinicians and biomedical researchers to impact real-world healthcare.
Benjamin M. Gyori
Associate Professor / Northeastern University
Works on large-scale data integration and knowledge assembly in biomedicine. His research combines computational systems modeling, ML, NLP, and human–machine interaction to improve our understanding of complex human biology.
Ulf Leser
Full Professor / Humboldt-Universität zu Berlin
Developed new tools for management, integration, and analysis of biomedical data. Interested in biomedical data management, text mining, infrastructures for large-scale scientific data analysis, and statistical bioinformatics, with a focus on cancer research.