Kassaye Yitbarek Yigzaw, Johan Gustav Bellika (Supervisor), Gunnar Hartvigsen (Supervisor), Anders Andersen (Supervisor), Fred Godtlibsen (Supervisor), Stein Olav Skrøvseth (Supervisor), Gro Karine Rosvold Berntsen (Supervisor), Towards Practical Privacy-Preserving Distributed Statistical Computation of Health Data, UiT The Arctic University of Norway, Tromsø, 2017.
Healthcare providers and patients are collecting large amounts of detailed electronic health data. These data have enormous potential for scientific discoveries that can help improve the effectiveness, efficiency, and quality of care of healthcare systems. However, the use of health data should promote the public good while protecting the privacy of the stakeholders (i.e., patients and healthcare providers). This goal is particularly challenging when the data are distributed across several healthcare providers and patients. Several secure protocols have been proposed for statistical analysis of distributed health data. However, these protocols have seen very limited practical use, which indicates the need for efficient and scalable frameworks that are suitable for practical use. This thesis discusses the design and evaluation of a framework for privacy-preserving statistical analysis of health data horizontally partitioned across healthcare providers (i.e., each provider holds different records described by the same set of attributes). The proposed framework is scalable and allows easy integration of a wide variety of functions. The framework also contains a secure deduplication protocol for preprocessing distributed data before statistical analysis. The deduplication protocol was experimentally evaluated in situ among three actual microbiology laboratories and in vitro among 20 simulated microbiology laboratories. The evaluation results showed that the framework is scalable and efficient for practical use. This thesis also discusses the design and evaluation of a framework that supports privacy-preserving statistical analysis of questionnaire data distributed across patients. The framework contains a set of secure protocols for performing several statistical analyses. The framework was evaluated both theoretically and experimentally. The experimental evaluations were performed on a real questionnaire dataset collected for a medical study and on a simulated questionnaire dataset.
The evaluation results showed that this framework is also scalable and efficient for practical use.