Federated Learning in Health AI Part 1: The Imperative for Federated Learning
Why is federated learning important in Health AI? Discover the insights in this blog.
Deep Learning, a subfield of Machine Learning (ML), has seen a dramatic resurgence in recent years. This resurgence has been largely driven by exponential increases in computational power and the availability of massive new datasets. Healthcare and medicine stand to benefit immensely from deep learning because of the sheer volume of data being generated as well as the increasing proliferation of medical devices and digital record systems. Deep learning has demonstrated potential in a wide range of clinical tasks.
The applications of deep learning in healthcare range from CNN-based architectures for skin lesion classification from dermoscopic images to large transformer-based architectures for histopathological whole-slide image analysis. Will it replace doctors?
Not likely. A deep learning model for medical images has no semantic understanding: it doesn't grasp medical concepts, only matrices of numbers. It is safe to say that deep learning is a tool to aid medical professionals, not replace them.
The need for such tools arises from practical challenges in healthcare. Professionals are often confronted with large volumes of patient data, time-consuming diagnostic processes, and increasing demands for accuracy and efficacy. AI systems can analyze vast datasets rapidly, highlight patterns that may not be immediately visible to human observers, and support decision-making in ways that complement clinical expertise. It is only natural to imagine the applications of deep learning expanding ever further in healthcare to solve more and more problems. AI seems to be the way forward. However, there are a few caveats, and probably the most crucial of them all is data.
The Fuel of AI Solutions: Data
The reliability of AI models is strongly tied to the volume and diversity of training data. It is safe to herald data as the “fuel” of modern AI systems. Large datasets improve statistical reliability, while diversity ensures generalization across patient demographics, clinical practices, and data acquisition protocols.
Healthcare data are also increasingly becoming available from diverse sources, including clinical institutions, individual patients, insurance companies, pharmaceutical industries, and others. Therefore, it is reasonable to conclude that a large volume of healthcare data is being generated. While this conclusion may be somewhat debatable in the context of smaller economies, it holds even more strongly for countries such as the United States. This large volume should, in theory, be positive for developing reliable AI solutions.
Difficulties with a Central Health Data Repository
However, in healthcare, the data are often fragmented across institutions. Consequently, models trained on isolated data sources often under-perform when deployed in different clinical settings. A natural solution is to centralize healthcare data into a single repository for AI training. This could help ensure that the datasets are sufficiently large and diverse, as required for building robust AI systems. Nevertheless, this approach raises significant concerns:
Health data are highly sensitive and protected under regulations such as HIPAA and GDPR.
Centralized data transfer and storage increase the attack surface and therefore the risk of security breaches.
Institutions may be reluctant to share proprietary or patient data, leading to potential conflicts of interest regarding ownership and trust.
Healthcare data are highly variable in terms of formats, clinical practices, acquisition protocols, and documentation standards, which complicates integration.
Towards a Solution for Healthcare Data Fragmentation
An alternative approach is to avoid moving the data at all. Instead, the model itself could be sent to the different institutions, trained locally on their data, and then updated collectively without requiring direct data sharing. In this way, the learning process can benefit from the richness of distributed datasets while respecting privacy and institutional boundaries.
If such systems can also tolerate a degree of heterogeneity across sites, they may enable the development of robust AI solutions without the need for centralization. Thankfully, this paradigm of AI development exists: it is called Federated Learning.
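The paradigm described above can be illustrated with a minimal sketch of federated averaging (FedAvg), one common aggregation scheme. The three "hospital sites", their synthetic datasets, and the linear model here are all hypothetical stand-ins for illustration; the key point is that only model weights travel between sites and the server, never the patient data itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1, epochs=5):
    """Train locally on one site's private data; return updated weights."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        w -= lr * grad
    return w

def fedavg(site_weights, site_sizes):
    """Server step: average site weights, weighted by local dataset size."""
    total = sum(site_sizes)
    return sum(w * (n / total) for w, n in zip(site_weights, site_sizes))

# Simulated private datasets at three sites; they are never pooled.
true_w = np.array([2.0, -1.0])
sites = []
for n in (40, 60, 100):
    X = rng.normal(size=(n, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    sites.append((X, y))

global_w = np.zeros(2)
for _round in range(20):  # communication rounds
    updates = [local_update(global_w, X, y) for X, y in sites]
    global_w = fedavg(updates, [len(y) for _, y in sites])

print(global_w)  # converges close to true_w without centralizing any data
```

Real deployments add secure aggregation, heterogeneity-aware algorithms, and much richer models, but the communication pattern stays the same: local training, weight exchange, server-side averaging.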
We will cover various aspects of Federated Learning in Health AI in our next blog. In the meantime, you can visit federated.withgoogle.com for more information on Federated Learning. You can also check out the PriFed Symposium, an initiative by HAINet, which brings together stakeholders to discuss Federated Learning for health data.
Bibek Niroula is an AI researcher at the Multimodal Learning Lab, NAAMII. He is a computer engineer whose research interests include AI in Healthcare, Federated Learning, Multimodal Learning, and the intersections among them. He also holds an ISC2 Certification in Cybersecurity and has a strong interest in supporting others in navigating the digital space safely.