PrivGen: Privacy-preserving sharing and processing of genetic data

Labex CominLabs & Labex Genmed Project

1 Context and objectives

Cloud computing has emerged as a successful paradigm allowing individuals and companies to flexibly store and process large amounts of data without the need to purchase and maintain their own networks and computer systems. However, cloud applications are subject to new security and privacy risks concerning the disclosure, ownership, and integrity of data. These risks are mainly due to the fact that data owners have to store their data at external providers that may in turn transmit data to third-party service providers. This often leads to loss of ownership over data, risks of disclosure at third-party sites or during transmission, and the risk of improper access to security-critical and private data by unauthorized parties. Moreover, it may be possible to combine data from several sources, thus worsening these security and privacy problems.

These problems are particularly important in the context of the sharing and processing of genetic data. Recent technological advances in molecular biology allow the human genome to be deciphered more accurately and exploited at the scale of hundreds or thousands of individuals. To make the best use of sequencing data, it is necessary to share them with the scientific community. It is by collecting and comparing these data that we will better understand the genetic architecture of human diseases [AR15, Top15]. PRIVGEN focuses on shared Genome-Wide Association Studies, whose purpose is to establish whether a gene is involved in a disease [MCA+12]. In such a context, privacy issues are a major concern. Several studies have shown that genetic data are not anonymous and that they can be used to identify the individuals involved in a study [CMM12]. In the case of a network of servers hosting genomic data sets, or beacons, researchers from Stanford demonstrated that it is possible to identify a genome (an individual) or a relative's genome in a database of 1000 genomes with 5000 simple requests [SB15a]. Thus, if a malicious person knows the genome of an individual, they can query different databases and identify those in which the genome is present. If they also gain access to the associated phenotype or pathology information, they will learn about the diseases the individual may suffer from. Beyond the fact that a genome uniquely identifies its owner [SB15b], it is also partly shared with relatives and can reveal information about their health or behavior. Data producers may also have questions about what is really done with their data, and whether they are exploited for the purposes originally foreseen [EN14]. Knowing the origin of the data (authenticity) and that they have not been modified (integrity) contributes to the confidence users can have in the data they manipulate.
Notice also that, as part of health-care records, genomic data are very valuable to hackers [Sch18]. Proper recognition of the scientists at the origin of the studies that produced a data set is also important: their work should be referenced. Genetic applications thus require a large set of security and privacy properties to be satisfied. Access to a secure platform that allows genetic data to be pooled and provides high-performance computation is a key issue for the genetic community internationally.

A large set of mechanisms has been developed to ensure such properties, including a wide range of cryptographic mechanisms (symmetric and asymmetric, homomorphic, attribute-based cryptography, …), watermarking schemes, fragmentation-based techniques, schemes that force the local execution of security-critical computations, etc. However, the way these mechanisms are currently used to satisfy the security and privacy properties of cloud applications is unsatisfactory, for two fundamental reasons. First, cloud applications have to satisfy a large number of different security and privacy properties at once, thus requiring several different mechanisms that may interact with one another. Second, they typically consist of computations at different sites that are executed on behalf of multiple stakeholders. For these reasons, a cloud application requires compositions of several security and privacy mechanisms that are applied to compositions of complex computations. Currently, however, no systematic means exists for composing such mechanisms and using them in cloud applications. Beyond this, there is also a need for new multipurpose security mechanisms able to satisfy several security objectives simultaneously (e.g. confidentiality, privacy, traceability). Much work remains to be done in this direction in order to merge different security mechanisms into a single digital content protection tool.

In order to address the problems motivated above, PRIVGEN objectives have been organized into three challenges:

• Mechanisms for a continuous digital content protection – The objective is to merge different security mechanisms into one data protection tool for continuous and multipurpose security protection.

• Composition of security and privacy-protection mechanisms – The concern here is to provide a development approach for secure and privacy-preserving distributed genetic applications.

• Distributed processing and sharing of genetic data – The aim of this challenge is to provide a method and platform for sharing a minimum set of relevant genomic information while maintaining privacy.

2 Mechanisms for a continuous digital content protection

The objective of this challenge is to provide new security mechanisms for data by merging different data protection solutions in order to provide a continuous and multipurpose security protection for externalized data.

Two main contributions were foreseen: i) the identification of the constraints and limits of current security tools regarding the deployment of complex computations in outsourced environments; ii) the development of joint security mechanisms that are compliant with, and configurable by, a security service composition language.

2.1 Limits of current security tools

In PRIVGEN, we focus on securing Genome-Wide Association Studies (GWAS) [WCVS19]. A GWAS is an observational study that tests whether a set of genetic variants in the genomes of different individuals (e.g. single nucleotide polymorphisms (SNPs) located in a genomic region) is associated with a disease. Such a study can be seen as a statistical analysis and is usually performed between two parties: (1) a Genomic Research Unit (GRU) with sequencing data on a sample of individuals affected by a disease D (cases) and (2) a Genomic Research Center (GRC) with sequencing data on a sample of healthy individuals (controls).
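To make the statistical nature of such a case-control comparison concrete, the following minimal sketch computes a Pearson chi-square statistic on a 2x2 table of allele counts for a single variant. It is only an illustration of a basic association test on clear data, not one of the algorithms secured in PRIVGEN; the function name and the counts are hypothetical.

```python
from math import erfc, sqrt


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 degree of freedom) on a 2x2 allele-count table.

    Rows are cases/controls, columns are alternate/reference allele counts.
    Every row and column total is assumed to be non-zero.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]

    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            stat += (table[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: 30 alternate alleles out of 100 in cases,
# 10 out of 100 in controls.
stat, p_value = allelic_chi_square(30, 70, 10, 90)
print(f"chi2 = {stat:.2f}, p = {p_value:.2e}")
```

In a privacy-preserving setting, the difficulty is that the two rows of the table are held by different parties (GRU and GRC), so the counts cannot simply be pooled in the clear.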

Securing externalized genetic association studies does not simply mean securing the storage and transmission of genomic data [KJLM08, GAAM+17]. Indeed, a party involved in such a study may not want other parties to access its data or to learn the objective and conclusions of the study, all of which are highly valuable assets. Under the honest-but-curious adversarial model considered in our work, the trust a party can place in a cloud service provider and in other parties is limited. Thus, the data analysis algorithm itself and the way it is shared between parties have to be secured. Different methods have been proposed to perform privacy-preserving GWAS. They can be differentiated by the cryptographic techniques they rely on: Differential Privacy (DP) [THHA15], Homomorphic Encryption (HE) [LYS15, BMA+18], Secure Multiparty Computation (SMC) [Blo19] and Secure Hardware (SH) [SAM+19]. Notice that none of them considers the case where all users' data are externalized.

On this basis and in order to refine the security needs for genetic data in outsourced environments as well as to identify the limits of the above solutions we worked on two aspects:

• Securing a simple case-control association based on logistic regression. We decided to work on this algorithm first to let the non-geneticist PRIVGEN members enter the domain, second because it had not been secured, and last because it had not yet been externalized. To do so, we opted for HE and SMC. We chose not to use DP because it introduces noise into the data that can interfere with GWAS analysis. To make the solution more efficient, we propose an original data packing strategy that reduces both communication and computation complexity, as it allows data to be processed in parallel. A conference paper is currently in preparation [DRT+20].

• Securing neural-network-based machine learning in this context. We succeeded in building the first Secure Multi-Layer Perceptron (SMLP) able to learn on encrypted data without disclosing the resulting model [BCGC18]. More precisely, this SMLP: i) can be trained by a cloud server on homomorphically encrypted data; ii) has all its parameters homomorphically encrypted, thus giving no clues to the cloud; and iii) can be used to classify new encrypted data, returning the classification result in encrypted form. It does not require extra communication between the server and the user. It is based on the Rectified Linear Unit (ReLU) activation function, which we secure without approximation, unlike current SMLP solutions.
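The general idea behind data packing, mentioned in the first item above, can be sketched as follows. Several small values are placed in disjoint bit ranges ("slots") of one large plaintext, so that a single addition, which is the operation an additively homomorphic scheme offers on ciphertexts, processes all slots at once. This is a generic illustration on plain integers and not the packing strategy of [DRT+20]; the slot width and function names are assumptions.

```python
SLOT_BITS = 16  # assumed slot width: every slot value must stay below 2**16


def pack(values, slot_bits=SLOT_BITS):
    """Place each value in its own bit range of a single integer."""
    packed = 0
    for i, value in enumerate(values):
        if not 0 <= value < (1 << slot_bits):
            raise ValueError("value does not fit in one slot")
        packed |= value << (i * slot_bits)
    return packed


def unpack(packed, count, slot_bits=SLOT_BITS):
    """Recover `count` slot values from a packed integer."""
    mask = (1 << slot_bits) - 1
    return [(packed >> (i * slot_bits)) & mask for i in range(count)]


# One integer addition adds the three slots in parallel. Plain integers
# stand in here for the plaintexts hidden inside ciphertexts.
a = pack([3, 10, 7])
b = pack([1, 2, 5])
print(unpack(a + b, 3))  # [4, 12, 12]
```

The result is only correct as long as no slot overflows into its neighbour, which is why the slot width has to be chosen from an upper bound on the accumulated values.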

More generally, all the above solutions suffer from the limits of HE in terms of efficiency (GWAS on clear data can already take several days [MB09]) and accuracy (non-linear functions have to be approximated). We remove most of the computational overhead induced by HE by proposing a new privacy-preserving GWAS framework in which the GRC acts as a proxy (see Sec. 4.2). We also introduce a previously unconsidered privacy constraint.

2.2 Joint-security mechanisms

When outsourcing data to the cloud, users may have questions about the integrity, confidentiality and traceability of their data. At the same time, the cloud service provider may want to protect the data under its responsibility. We developed different crypto-watermarking and watermarking solutions to address these problems.

Under the hypothesis that genetic data are stored homomorphically encrypted in relational databases, we proposed a crypto-watermarking tool [NCB+18] that allows the cloud to watermark such a database dynamically in order to protect the integrity of the data. The scheme is dynamic in the sense that data can be updated by their respective owners (addition, deletion and modification of data) without having to re-watermark the whole database. Moreover, the scheme allows the cloud to identify which elements of the database have been modified (through storage errors or malicious modifications). Notice that with our scheme the watermarking process does not modify the owners' data.

To complete this work from the user/client point of view, we worked on specific watermarking modulations for genetic data externalized for GWAS. It is important to notice that, although data-hiding schemes exist for genetic data, they mainly focus on cellular DNA for steganographic purposes [WHCS19], copyright protection [RKY19] or data storage [BGH+16]. We are the first to provide watermarking schemes for genetic data used in GWAS. Such genomic data are collected from one or several individuals and stored in a VCF file [VCF19]. Our solutions, one robust and one reversible, have been designed to ensure that the distortion induced by the watermarking process does not interfere with the GWAS algorithms studied in PRIVGEN. Both are derived from database watermarking [CC17], owing to the similarity between VCF files and relational databases. The robust one [RMEG19] can be used for traitor tracing and copyright protection. The reversible scheme is based on an extension of the histogram shifting modulation [GRD+20].
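For readers unfamiliar with histogram shifting, the sketch below shows the classical form of the modulation on a list of integers, which could stand for a hypothetical numeric attribute of a VCF-like table. It is not the extension proposed in [GRD+20]: it only illustrates why the modulation is reversible, i.e. why the original values are recovered exactly once the message has been extracted. Function names are illustrative.

```python
from collections import Counter


def hs_embed(values, bits):
    """Embed bits by shifting the histogram to the right of its peak."""
    peak = Counter(values).most_common(1)[0][0]
    if len(bits) > values.count(peak):
        raise ValueError("message longer than embedding capacity")
    remaining = iter(bits)
    marked = []
    for v in values:
        if v > peak:
            marked.append(v + 1)  # shift to free the bin peak + 1
        elif v == peak:
            marked.append(v + next(remaining, 0))  # carrier: +0 or +1
        else:
            marked.append(v)  # values below the peak are untouched
    return marked, peak


def hs_extract(marked, peak, n_bits):
    """Read the embedded bits and restore the original values."""
    bits, restored = [], []
    for v in marked:
        if v == peak:
            bits.append(0)
            restored.append(peak)
        elif v == peak + 1:
            bits.append(1)
            restored.append(peak)
        elif v > peak + 1:
            restored.append(v - 1)  # undo the shift
        else:
            restored.append(v)
    return bits[:n_bits], restored


original = [2, 3, 3, 5, 3, 4, 3, 1]
marked, peak = hs_embed(original, [1, 0, 1])
bits, restored = hs_extract(marked, peak, 3)
print(marked)    # [2, 4, 3, 6, 4, 5, 3, 1]
print(bits)      # [1, 0, 1]
print(restored)  # [2, 3, 3, 5, 3, 4, 3, 1]
```

The peak value and the message length act as side information that the extractor must know; the capacity equals the number of occurrences of the peak value.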

3 Compositional secure and privacy-aware genetic analyses

The objective of this challenge is to provide a compositional development approach for secure and privacy-preserving distributed genetic applications. Two main contributions were targeted: i) a practical model for such applications and ii) corresponding programmatic means.

3.1 Limitations of current development approaches for genetic applications

Geneticists and engineers from the biomedical field rely intensively on cluster infrastructures to run their experiments and analyses with high performance. Tools such as the Nextflow and SnakeMake workflow engines already exist for the definition of dataflows and workflows that can interface with cluster architectures. However, the increasing need for distributed data and computations as part of large collaborative studies points to significant limitations of these tools. Indeed, projects like GWA studies or I-CAN [BGS+19] increasingly require large amounts of data that are analyzed using distributed computations performed on heterogeneous software infrastructures with different security and privacy policies. Furthermore, different partners provide different computational and storage facilities, such as local clusters and federated raids. In this context, analyses can no longer be defined in terms of simple dataflows or workflows such as those provided by the commonly used Nextflow and SnakeMake tools and development environments.
Support is missing, in particular, for storing data separately so that data anonymity is preserved while data identification remains possible. Support is also lacking for the traceability of data, its encryption and the relocation of data as needed by distributed analyses. From a pragmatic viewpoint, current tools only support medical workflows and cannot represent high-level architectural abstractions, such as different cloud domains.

As part of the PRIVGEN project, we have provided two contributions that respond to the above limitations:

• An architecture-based method for the high-level definition of genetic analyses in terms of compositions of different types of cloud infrastructures as well as organization-owned computing and data storage infrastructures.

• An object-oriented framework supporting compositions of a large range of security and privacy-preservation operators.

3.2 Architecture-based distributed processing and sharing of genetic data

We have defined a method that enables medical and bioinformatics researchers to compose high-level architectures for genetic analyses based on a set of architectural elements, such as (private, public, hybrid) clouds and computational and storage infrastructures that are provided by clouds or individual organizations.
These elements can be connected into (logical or physical) architectures that enable large-scale distributed cooperations to be represented. These cooperations satisfy security and privacy properties that are composed in terms of the properties of the basic architectural elements (hospitals, clouds) and their composition properties.

3.3 Compositional secure and privacy-aware distributed analyses

We have complemented the high-level architectures with a program-level framework that provides advanced security and privacy operators as well as corresponding composition means. These operators include advanced encryption mechanisms, such as homomorphic and attribute-based encryption, database fragmentation mechanisms (useful, for instance, to prevent data re-identification), relocation of data to more secure (e.g. local) contexts, and several watermarking operators.
These operators can be used to implement security and privacy policies that have been statically defined as part of the architectures introduced in Sec. 3.2.
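As an illustration of one such operator, the sketch below shows a simple vertical fragmentation: identifying fields and genetic fields are stored in two separate fragments, and the only link between them is a table kept by the data owner. It is a minimal sketch written in Python for brevity, not the operator of the framework itself; the field names and function names are hypothetical, and records are assumed not to contain a field called "key".

```python
import secrets


def fragment(records, identifying_fields):
    """Split records into an identity fragment and a genetic fragment.

    Each fragment row gets its own random key; the mapping between the
    two keys is returned separately and is meant to stay with the owner.
    """
    identity_fragment, genetic_fragment, link_table = [], [], {}
    for record in records:
        id_key, gen_key = secrets.token_hex(8), secrets.token_hex(8)
        identity_fragment.append(
            {"key": id_key,
             **{f: v for f, v in record.items() if f in identifying_fields}})
        genetic_fragment.append(
            {"key": gen_key,
             **{f: v for f, v in record.items() if f not in identifying_fields}})
        link_table[id_key] = gen_key
    return identity_fragment, genetic_fragment, link_table


def defragment(identity_fragment, genetic_fragment, link_table):
    """Rebuild the original records; only possible with the link table."""
    genetic_by_key = {row["key"]: row for row in genetic_fragment}
    merged = []
    for row in identity_fragment:
        genetic_row = genetic_by_key[link_table[row["key"]]]
        record = {f: v for f, v in row.items() if f != "key"}
        record.update({f: v for f, v in genetic_row.items() if f != "key"})
        merged.append(record)
    return merged


records = [
    {"name": "patient A", "genotype": "0/1", "phenotype": "case"},
    {"name": "patient B", "genotype": "0/0", "phenotype": "control"},
]
identities, genetics, links = fragment(records, {"name"})
print(defragment(identities, genetics, links) == records)  # True
```

The two fragments can then be placed on different infrastructures (for instance two cloud domains), while the link table is relocated to a more trusted, local context.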

4 Distributed processing and sharing of genetic data

The objective of this last challenge was to provide a method and platform to perform association testing against a reference panel of healthy individuals that are compared against patients. We went further with the integration of two platforms.

4.1 Platform: Architecture-based distributed processing and sharing of genetic data

We have implemented the framework introduced in Sec. 3.3 in Java. This implementation eases the use of fragmentation and a/symmetric encryption by wrapping them within an API that classifies algorithms by their properties. It also provides the first integration of watermarking operators for genetic algorithms.
The framework also handles abstractions commonly used in medicine and bioinformatics, such as VCF files and attribute files (like FAM files), with a view to generating contingency tables and to encrypting, fragmenting and watermarking the attribute file. We also support SGX-based hardware-level security and privacy.
Once the distributed architecture is known and the workflow defined and implemented, the question of reconfiguring outsourced data and/or remote processing comes into play. We have devised a corresponding framework for the reconfiguration process.
Finally, we have defined and implemented two case studies that illustrate the different components of our approach to large-scale distributed genetic analyses.
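How a contingency table can be derived from these file abstractions is sketched below. The example is written in Python rather than in the Java of the framework and does not reproduce its API. It assumes tab-separated VCF lines with a GT entry in the FORMAT column and the usual six-column FAM layout (family ID, individual ID, father, mother, sex, phenotype, with 2 for cases and 1 for controls); all function names are hypothetical.

```python
import re


def read_phenotypes(fam_lines):
    """Map each individual ID (column 2) to its phenotype code (column 6)."""
    phenotypes = {}
    for line in fam_lines:
        fields = line.split()
        if len(fields) >= 6:
            phenotypes[fields[1]] = fields[5]
    return phenotypes


def contingency_table(vcf_header, vcf_record, phenotypes):
    """Count alternate/reference alleles in cases and controls for one variant."""
    samples = vcf_header.rstrip("\n").split("\t")[9:]
    fields = vcf_record.rstrip("\n").split("\t")
    gt_index = fields[8].split(":").index("GT")
    groups = {"2": "case", "1": "control"}
    table = {"case": {"alt": 0, "ref": 0}, "control": {"alt": 0, "ref": 0}}
    for sample, call in zip(samples, fields[9:]):
        group = groups.get(phenotypes.get(sample))
        if group is None:
            continue  # unknown or missing phenotype
        for allele in re.split(r"[/|]", call.split(":")[gt_index]):
            if allele == ".":
                continue  # missing allele call
            table[group]["ref" if allele == "0" else "alt"] += 1
    return table


fam = ["F1 S1 0 0 1 2", "F2 S2 0 0 2 2", "F3 S3 0 0 1 1"]
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3"
record = "1\t100\t.\tA\tG\t.\tPASS\t.\tGT:DP\t0/1:12\t1|1:9\t0/0:15"
print(contingency_table(header, record, read_phenotypes(fam)))
# {'case': {'alt': 3, 'ref': 1}, 'control': {'alt': 0, 'ref': 2}}
```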

4.2 Platform PRIVAS: a tool to perform Privacy Preserving Association Studies

PRIVAS is a new privacy-preserving framework and a concrete platform for performing rare-variant case-control association tests with information provided by two parties: a Genomic Research Unit (GRU) with sequencing data from individuals affected by a disease D (cases), and a Genomic Research Center (GRC) with sequencing data from healthy individuals (controls). To search for genes containing rare variants involved in D, the GRU needs to compare all cases against all controls using association tests (i.e. a genome-wide association study). The main originality of our proposal [TRD+19][RTD+20] is twofold: i) it positions the GRC as a proxy between the GRU and the server, making it possible to use classical cryptographic tools (secret-key-based cryptographic hashing and PGP) to securely conduct association tests with no increase in computational complexity, unlike current state-of-the-art proposals; ii) it satisfies a new important constraint: the identity of the GRU should remain unknown to the server, as this knowledge can give the server clues about the GRU's data. This framework is generic to different GWAS algorithms (WSS, , SCAT and CAST) and has been integrated as the PRIVAS platform on the supercomputer Datarmor of IFREMER (http://lysine.univ-brest.fr/team/index.php/tools/privas/).
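The role that secret-key-based hashing can play in such a setting is sketched below. It is not a description of the PRIVAS protocol, whose message flow is not detailed here. It only illustrates the general idea that, if the GRU and GRC pseudonymize variant identifiers with a key unknown to the server, the server can still match variants and count carriers, in the spirit of a collapsing test, without learning which variants are involved. The key, the identifier format and the function names are assumptions.

```python
import hashlib
import hmac


def pseudonymize(variant_id: str, key: bytes) -> str:
    """Keyed hash (HMAC-SHA256) of a variant identifier."""
    return hmac.new(key, variant_id.encode(), hashlib.sha256).hexdigest()


def hash_individuals(individuals, key):
    """Replace each individual's set of variants by a set of digests."""
    return [{pseudonymize(v, key) for v in carried} for carried in individuals]


def count_carriers(hashed_individuals, hashed_gene_variants):
    """Server side: individuals carrying at least one variant of the gene."""
    return sum(1 for carried in hashed_individuals
               if carried & hashed_gene_variants)


# Hypothetical key shared by GRU and GRC, never given to the server.
key = b"shared-secret-between-GRU-and-GRC"

cases = [{"1:100:A:G"}, {"1:200:C:T", "1:300:G:A"}, set()]
controls = [{"1:300:G:A"}, set(), set()]
gene_variants = {"1:100:A:G", "1:200:C:T"}  # rare variants of one gene

hashed_gene = {pseudonymize(v, key) for v in gene_variants}
print(count_carriers(hash_individuals(cases, key), hashed_gene))     # 2
print(count_carriers(hash_individuals(controls, key), hashed_gene))  # 0
```

Because HMAC is deterministic for a given key, equal variants yield equal digests on both sides, which is what allows matching. Without the key, the server cannot recompute digests for candidate variants.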

5 Publications, talks, demonstrator and valorization

5.1 Accepted publications

• Fatima-zahra Boujdad, Alban Gaignard, Mario Südholt, Wilmer Garzón-Alfonso, Luis Daniel Benavides Navarro, and Richard Redon. On distributed collaboration for biomedical analyses. In Workshop on Clusters, Clouds and Grids for Life Sciences 2019, pages 1–10, Larnaca, Cyprus, May 2019. IEEE.

• Reda Bellafqira, Gouenou Coatrieux, Emmanuelle Genin, and Michel Cozic. Secure multilayer perceptron based on homomorphic encryption. In Digital Forensics and Watermarking – 17th International Workshop, IWDW 2018, Jeju Island, Korea, October 22-24, 2018, Proceedings, pages 322–336. Springer, 2018.

• David Niyitegeka, Gouenou Coatrieux, Reda Bellafqira, Emmanuelle Génin, and Javier Franco-Contreras. Dynamic watermarking-based integrity protection of homomorphically encrypted databases – application to outsourced genetic data. In Digital Forensics and Watermarking – 17th International Workshop, IWDW 2018, Jeju Island, Korea, October 22-24, 2018, Proceedings, pages 151–166, 2018.

• Fatima-zahra Boujdad and Mario Südholt. Constructive Privacy for Shared Genetic Data. In CLOSER 2018 – 8th International Conference on Cloud Computing and Services Science, Proceedings of CLOSER 2018, pages 1–8, Funchal, Madeira, Portugal, March 2018.

• Javier Franco Contreras and Gouenou Coatrieux. Protection of relational databases by means of watermarking: Recent advances and challenges. Security in computing and communications, pages 101–123, 2017.

5.2 Submitted publications and publications under preparation

• Boujdad Fatima-Zahra, Niyitegeka David, Bellafqira Reda, Genin Emmanuelle, Coatrieux Gouenou, and Südholt Mario. A hybrid cloud architecture for performing privacy-preserving genome-wide association studies. 14th International Conference on Cloud Computing Security and Applications, under preparation, 2020

• Bellafqira Reda, Ludwig Thomas, Niyitegeka David, Genin Emmanuelle, and Coatrieux Gouenou. Privacy-preserving genome-wide association study for rare mutations – a secure framework for externalized statistical analysis. IEEE Access, submitted, 2020.

• Niyitegeka David, Bellafqira Reda, Ludwig Thomas, Genin Emmanuelle, and Coatrieux Gouenou. Secure collapsing test based on fully homomorphic encryption. In European Conference on Genetic Programming, under preparation. Springer, 2020.

• Wilmer Garzon, Daniel Benavides, Mario Südholt. A survey of analytics tools for distributed biomedical analyses. Journal of Biomedical Informatics. Under preparation.

• Coatrieux Gouenou, Bellafqira Reda, Niyitegeka David, Ludwig Thomas, and Genin Emmanuelle. Lossless watermarking for genomic data. IEEE Transactions on Information Forensics and Security, under preparation, 2020.

• Bellafqira Reda, El Ghadi Musab, Genin Emmanuelle, and Coatrieux Gouenou. Robust watermarking for genetic data traceability in externalized GWAS frameworks. IEEE/ACM Transactions on Computational Biology and Bioinformatics, under preparation, 2019.

5.3 Invited talks and posters

• T. Ludwig, Invited talk, PrivAS : a tool to perform Privacy Preserving Association Studies, 11ème réunion annuelle de l’Institut Thématique Multi-Organismes Technologies pour la santé, 2-3 octobre 2019.

• E. Génin, Invited talk, FranceGenRef – Results and Perspectives, Third GENMED workshop, Institut Pasteur, Paris, 2018.

• G. Coatrieux, Invited talk, Security of Medical and of Genetic Data, Inserm Workshop, Use of next generation sequencing data in the study of human diseases: statistical methods and applications, 2017.

• M. Südholt, Invited talk: “Compositional security and privacy for biomedical analyses using shared genetic data”, Seminar IBF Biosphere, CHU Nantes, 2018.

• M. Südholt, Invited talk: “Sharing and processing of genetic data”, Seminar “IA and Health”, IMT, Paris, 2018.

• All PRIVGEN Members, Poster, PRIVGEN: Privacy-preserving sharing and processing of genetic data, 11ème réunion annuelle de l’Institut Thématique Multi-Organismes Technologies pour la santé, 2-3 octobre 2019.

5.4 Demonstrators

• PRIVAS – a tool to perform Privacy Preserving Association Studies, http://lysine.univ-brest.fr/team/index.php/tools/privas/. This solution aims at giving access to a secure platform for genetic research units internationally.

• COSHED – Constructive Security for Shared Data for Biomedical Analyses. Demo and case studies to be published.

References

[AR15] Samuel J Aronson and Heidi L Rehm. Building the foundation for genomics in precision medicine. Nature, 526(7573):336, 2015.

[BCGC18] Reda Bellafqira, Gouenou Coatrieux, Emmanuelle Genin, and Michel Cozic. Secure multilayer perceptron based on homomorphic encryption. In International Workshop on Digital Watermarking, pages 322–336. Springer, 2018.

[BGH+16] Meinolf Blawat, Klaus Gaedke, Ingo Huetter, Xiao-Ming Chen, Brian Turczyk, Samuel Inverso, Benjamin W Pruitt, and George M Church. Forward error correction for DNA data storage. Procedia Computer Science, 80:1011–1022, 2016.

[BGS+19] Fatima-zahra Boujdad, Alban Gaignard, Mario Südholt, Wilmer Garzón-Alfonso, Luis Daniel Benavides Navarro, and Richard Redon. On distributed collaboration for biomedical analyses. In Workshop on Clusters, Clouds and Grids for Life Sciences 2019, pages 1–10, Larnaca, Cyprus, May 2019. IEEE.

[Blo19] Jonathan M Bloom. Secure multi-party linear regression at plaintext speed. arXiv preprint arXiv:1901.09531, 2019.

[BMA+18] Charlotte Bonte, Eleftheria Makri, Amin Ardeshirdavani, Jaak Simm, Yves Moreau, and Frederik Vercauteren. Privacy-preserving genome-wide association study is practical. IACR Cryptology ePrint Archive, 2018:955, 2018.

[BS18] Fatima-zahra Boujdad and Mario Südholt. Constructive Privacy for Shared Genetic Data. In CLOSER 2018 – 8th International Conference on Cloud Computing and Services Science, Proceedings of CLOSER 2018, pages 1–8, Funchal, Madeira, Portugal, March 2018.

[CC17] Javier Franco Contreras and Gouenou Coatrieux. Protection of relational databases by means of watermarking: Recent advances and challenges. Security in computing and communications, pages 101–123, 2017.

[CMM12] Christopher A Cassa, Rachel A Miller, and Kenneth D Mandl. A novel, privacy-preserving cryptographic approach for sharing sequencing data. Journal of the American Medical Informatics Association, 20(1):69–76, 2012.

[DRT+20] Niyitegeka David, Bellafqira Reda, Ludwig Thomas, Genin Emmanuelle, and Coatrieux Gouenou. Secure collapsing test based on fully homomorphic encryption. In European Conference on Genetic Programming, under preparation. Springer, 2020.

[EN14] Yaniv Erlich and Arvind Narayanan. Routes for breaching and protecting genetic privacy. Nature Reviews Genetics, 15:409, 2014.

[GAAM+17] Reza Ghasemi, Md Momin Al Aziz, Noman Mohammed, Massoud Hadian Dehkordi, and Xiaoqian Jiang. Private and efficient query processing on outsourced genomic databases. IEEE journal of biomedical and health informatics, 21(5):1466–1472, 2017.

[GRD+20] Coatrieux Gouenou, Bellafqira Reda, Niyitegeka David, Ludwig Thomas, and Genin Emmanuelle. Lossless watermarking for genomic data. IEEE Transactions on Information Forensics and Security, under preparation, 2020.

[KJLM08] Murat Kantarcioglu, Wei Jiang, Ying Liu, and Bradley Malin. A cryptographic approach to securely share and query genomic sequences. IEEE Transactions on information technology in biomedicine, 12(5):606–617, 2008.

[LYS15] Wen-Jie Lu, Yoshiji Yamada, and Jun Sakuma. Privacy-preserving genome-wide association studies on cloud environment using fully homomorphic encryption. In BMC medical informatics and decision making, volume 15, page S1. BioMed Central, 2015.

[MB09] Bo Eskerod Madsen and Sharon R. Browning. A groupwise association test for rare mutations using a weighted sum statistic. PLOS Genetics, 5(2):1–11, 02 2009.

[MCA+12] Nicola Milia, Alessandra Congiu, Paolo Anagnostou, Francesco Montinaro, Marco Capocasa, Emanuele Sanna, and Giovanni Destro Bisol. Mine, yours, ours? sharing data on human genetic variation. PloS one, 7(6):e37552, 2012.

[NCB+18] David Niyitegeka, Gouenou Coatrieux, Reda Bellafqira, Emmanuelle Génin, and Javier Franco-Contreras. Dynamic watermarking-based integrity protection of homomorphically encrypted databases – application to outsourced genetic data. In Digital Forensics and Watermarking – 17th International Workshop, IWDW 2018, Jeju Island, Korea, October 22-24, 2018, Proceedings, pages 151–166, 2018.

[RKY19] Mohammad Saidur Rahman, Ibrahim Khalil, and Xun Yi. A lossless DNA data hiding approach for data authenticity in mobile cloud based healthcare systems. International Journal of Information Management, 45:276–288, 2019.

[RMEG19] Bellafqira Reda, El Ghadi Musab, Genin Emmanuelle, and Coatrieux Gouenou. Robust watermarking for genetic data traceability in externalized GWAS frameworks. IEEE/ACM Transactions on Computational Biology and Bioinformatics, under preparation, 2019.

[RTD+20] Bellafqira Reda, Ludwig Thomas, Niyitegeka David, Genin Emmanuelle, and Coatrieux Gouenou. Privacy-preserving genome-wide association study for rare mutations – a secure framework for externalized statistical analysis. IEEE Access, submitted, 2020.

[SAM+19] Md Nazmus Sadat, Md Momin Al Aziz, Noman Mohammed, Feng Chen, Xiaoqian Jiang, and Shuang Wang. Safety: Secure GWAS in federated environment through a hybrid solution. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 16(1):93–102, 2019.

[SB15a] Suyash S Shringarpure and Carlos D Bustamante. Privacy risks from genomic data-sharing beacons. The American Journal of Human Genetics, 97(5):631–646, 2015.

[SB15b] Suyash S Shringarpure and Carlos D Bustamante. Privacy risks from genomic data-sharing beacons. The American Journal of Human Genetics, 97(5):631–646, 2015.

[Sch18] Jennifer Schlesinger. Dark web is fertile ground for stolen medical records, 2018.

[THHA15] Florian Tramèr, Zhicong Huang, Jean-Pierre Hubaux, and Erman Ayday. Differential privacy with bounded priors: reconciling utility and privacy in genome-wide association studies. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pages 1286–1297. ACM, 2015.

[Top15] Eric J Topol. The big medical data miss: challenges in establishing an open medical resource. Nature Reviews Genetics, 16(5):253, 2015.

[TRD+19] Ludwig Thomas, Bellafqira Reda, Niyitegeka David, Salas Daniel, Perseil Isabelle, Coatrieux Gouenou, and Génin Emmanuelle. Privas: a tool to perform privacy-preserving association studies. In JOBIM, 2019.

[VCF19] The variant call format specification: VCFv4.3 and BCFv2.2, 2019.

[WCVS19] Maggie Haitian Wang, Heather J Cordell, and Kristel Van Steen. Statistical methods for genome-wide association studies. In Seminars in cancer biology, volume 55, pages 53–60. Elsevier, 2019.

[WHCS19] Yanfeng Wang, Qinqin Han, Guangzhao Cui, and Junwei Sun. Hiding messages based on DNA sequence and recombinant DNA technique. IEEE Transactions on Nanotechnology, 18:299–307, 2019.