Christopher A. Cassa, PhD

Research Fellow


Christopher Cassa, Ph.D., a graduate of the Harvard-MIT Division of Health Sciences and Technology, is a research fellow at the Children’s Hospital Informatics Program at Children’s Hospital Boston and at Harvard Medical School. Using quantitative approaches, he has conducted genome-wide investigations of the scientific and clinical validity of genetic variants. More recently, Dr. Cassa has been closely involved in The Gene Partnership at Children’s Hospital Boston, where he has applied his computational expertise to interpreting and communicating variants identified in research cohorts. In these studies, he has developed algorithms for whole-genome interpretation that identify a patient’s most likely pathogenic and clinically significant variants, and he has separately assessed the clinical and syntactic validity of previously published databases of disease-associated variants.

Dr. Cassa has also researched a wide range of medical privacy and identifiability issues. He has helped develop two anonymization techniques for geographic data and has investigated the re-identification potential of geographic data shared in textual and map form. His most recent work examines the extent to which the genotypes of a research proband's family members can be inferred, and how readily research datasets can be used to identify family members and familial phenotypes.
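As a minimal illustration of why a relative's genetic data can be disclosive (not the model used in this research), the sketch below computes the genotype distribution of a proband's child at a single biallelic SNP, given the proband's genotype. It assumes simple Mendelian transmission and treats the other, untyped parent as a random draw from a population in Hardy-Weinberg equilibrium with alternate-allele frequency `alt_freq`; the function name and parameters are hypothetical.

```python
from __future__ import annotations


def child_genotype_distribution(parent_alt_count: int, alt_freq: float) -> dict[int, float]:
    """Hypothetical helper: P(child carries 0, 1 or 2 alt alleles) at one biallelic SNP.

    parent_alt_count: known genotype of one parent (0, 1 or 2 copies of the alt allele).
    alt_freq: population alt-allele frequency, used for the untyped parent under
    Hardy-Weinberg assumptions (random mating, no new mutation).
    """
    if parent_alt_count not in (0, 1, 2):
        raise ValueError("parent_alt_count must be 0, 1 or 2")
    if not 0.0 <= alt_freq <= 1.0:
        raise ValueError("alt_freq must be in [0, 1]")
    a = parent_alt_count / 2  # probability the typed parent transmits the alt allele
    q = alt_freq              # probability a random untyped parent transmits the alt allele
    return {
        0: (1 - a) * (1 - q),
        1: a * (1 - q) + (1 - a) * q,
        2: a * q,
    }
```

For a heterozygous proband and an allele frequency of 0.5, the child's distribution is 0.25, 0.5 and 0.25 for zero, one and two alternate alleles; a homozygous proband guarantees that the child carries at least one copy of the proband's allele.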


The Gene Partnership: Return of Research Results

The Gene Partnership is a new compact between patients and researchers. The study gives researchers access to rich clinical phenotype and genotype data from patients, and it provides interested patients with clinically significant, valid findings from research performed on their data. We have designed and built infrastructure for returning research results, through which patients can be notified of genetic findings relevant to them, provided that the finding comes from a study with Institutional Review Board (IRB) approval and that the message has been formulated by an adjudicating ethical review body, the Informed Cohort Oversight Board (ICOB). This system reconnects research subjects and researchers in a way that respects privacy and research oversight while maximizing both the public and the individual benefit of biorepository research. With National Library of Medicine (NLM) support (1-RC1-LM010470-01), we have developed a system, comprising a pipeline and a web application, that enables investigators to report research discoveries and detailed genetic variant annotations to an oversight committee for review. Using this application, the oversight committee can decide whether study findings are suitable to communicate to participants and, if so, manage and personalize the messages that study participants will receive.

The targeted messaging mechanism of The Gene Partnership (TGP) uses personally controlled health record (PCHR) infrastructure to direct messages only to the relevant subjects while keeping their identities hidden from researchers (preserving anonymity) and letting subjects decide whether to receive or act on these messages (preserving autonomy). The PCHR tools in TGP also let subjects keep adding to the research database, refine their phenotypic data, and contribute additional biomaterials. In this way, TGP builds an ongoing, increasingly refined research cohort while allowing patients and subjects to benefit personally from their participation.
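The routing idea can be sketched in a few lines of Python. This is a hypothetical model, not the TGP or PCHR code: `Finding`, `Subject`, `RecordBroker` and `deliver` are illustrative names, and a subject's choice is reduced to a single opt-in flag. Investigators submit findings keyed only by research pseudonyms; a broker on the health-record side, the only component holding the pseudonym-to-subject mapping, releases a message only after IRB and ICOB approval and only to subjects who have opted in.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Finding:
    """A finding an investigator submits for return (hypothetical structure)."""
    study_id: str
    pseudonyms: list[str]          # research IDs only; no names or record numbers
    message: str
    irb_approved: bool
    icob_approved: bool = False    # set by the oversight board after review


@dataclass
class Subject:
    name: str
    wants_results: bool            # the subject's own choice to receive findings
    inbox: list[str] = field(default_factory=list)


class RecordBroker:
    """Hypothetical PCHR-side service; only it can map pseudonyms to subjects."""

    def __init__(self, registry: dict[str, Subject]) -> None:
        self._registry = registry  # never exposed to investigators

    def deliver(self, finding: Finding) -> int:
        """Route an approved message to opted-in subjects; report only a count."""
        if not (finding.irb_approved and finding.icob_approved):
            raise PermissionError("finding lacks IRB or ICOB approval")
        delivered = 0
        for pid in finding.pseudonyms:
            subject = self._registry.get(pid)
            # Unknown pseudonyms and subjects who opted out are silently skipped.
            if subject is not None and subject.wants_results:
                subject.inbox.append(finding.message)
                delivered += 1
        return delivered
```

Returning only a delivery count keeps identities on the broker side; a real system would also need to judge whether even that count could be disclosive in a small cohort.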

Medical Privacy and Identifiability

Electronic transmission of protected health information has become pervasive in research, clinical, and public health investigations, posing substantial risks to patient privacy. From clinical genetic screening to the publication of research data, these activities can disclose patients' identities, medical conditions, and hereditary information. To enable an era of personalized medicine, many research studies are attempting to correlate individual clinical outcomes with genomic data, leading to thousands of new investigations. Many of these studies depend on participants who are willing to share their genotypic and clinical data with investigators, which in turn requires methods and policies that preserve privacy when such data are disclosed.

We explore quantitative models that help research participants, patients, and investigators understand these complex privacy risks when disclosing medical data. This modeling is intended to improve informed consent and risk assessment for both demographic and medical data, each of which has distinct, domain-specific scenarios. First, we examine the de-identification and anonymization of geospatial datasets containing patient home addresses, using mathematical skewing algorithms as well as a linear programming approach. Next, we consider the re-identification potential of geospatial data, which is commonly shared both as text and as printed maps in journals and in public health practice. We also explore methods to quantify the anonymity these techniques provide. Finally, we discuss disclosure risk for genomic data, investigating both the re-identification risk posed by SNPs and mutations and the impact of disclosure on family members.
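One simple form of geographic skewing, and one way to quantify the anonymity it affords, can be sketched as follows. This is an illustrative assumption rather than the exact algorithms referred to above: each point is displaced by independent Gaussian noise with standard deviation `sigma`, and anonymity is measured as a spatial k, the rank of the true address when all known addresses are sorted by distance to the masked point. Coordinates are assumed to be projected (for example, in metres), and the function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]  # projected x, y coordinates (assumed metres)


def gaussian_skew(point: Point, sigma: float, rng: random.Random) -> Point:
    """Hypothetical masking step: add zero-mean Gaussian noise to each axis.

    Projected coordinates are assumed so that sigma means the same distance
    in both directions (this would not hold for raw latitude/longitude).
    """
    x, y = point
    return (x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))


def spatial_k_anonymity(true_point: Point, masked_point: Point, addresses: List[Point]) -> int:
    """Count known addresses at least as close to the masked point as the true one.

    Ties are counted, so this is the worst-case rank of the true address when
    candidates are sorted by distance, i.e. how many guesses a nearest-first
    adversary may need. `addresses` is assumed to include the true address, so k >= 1.
    """
    radius = math.dist(true_point, masked_point)
    return sum(1 for a in addresses if math.dist(a, masked_point) <= radius)
```

A larger `sigma` generally raises k but also degrades the spatial accuracy that surveillance or research analyses rely on; varying `sigma` by region, for instance with local population density, is one way to balance the two.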


Research Fellow, Harvard Medical School

Lecturer, Massachusetts Institute of Technology


1 Autumn St #543 Informatics
Boston, MA 02215