GeneImp is an R package that implements genotype imputation to a dense reference panel given genotype likelihoods computed from ultralow coverage sequencing.
In this setting, data have a high level of missingness or uncertainty and are thus more amenable to a probabilistic representation. Most existing imputation algorithms are not well suited to this situation: they rely on pre-phasing for computational efficiency, and without definite genotype calls the pre-phasing task becomes computationally expensive.
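To make the "probabilistic representation" concrete, the sketch below converts phred-scaled genotype likelihoods, as stored in the standard VCF PL field for a biallelic site (genotype order 0/0, 0/1, 1/1), into normalised genotype probabilities. This is only an illustration of the idea: the function name pl_to_prob is hypothetical, and this page does not state which VCF field GeneImp reads or how it processes it internally.

```r
# Convert phred-scaled genotype likelihoods (VCF "PL" values for one site)
# into genotype probabilities that sum to 1, assuming a flat prior.
# pl_to_prob is a hypothetical helper, not part of GeneImp.
pl_to_prob <- function(pl) {
  lik <- 10^(-pl / 10)  # PL = -10 * log10(likelihood), scaled so the best genotype is 0
  lik / sum(lik)
}

# A site with informative reads: the heterozygous genotype dominates.
pl_to_prob(c(30, 0, 40))

# A site with no reads at all: every genotype is equally likely.
pl_to_prob(c(0, 0, 0))
```

With ultralow coverage sequencing, many sites look like the second case or only slightly better, which is why hard genotype calls are not a good summary of the data.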
GeneImp does not require pre-phasing and is computationally tractable for whole-genome imputation. It does not explicitly model recombination; instead, it capitalises on the existence of very large reference panels and assumes that the reference haplotypes can adequately represent the target haplotypes, unaltered, over short regions.
GeneImp imputation is based on a sliding window, where each window corresponds to a short region of the genome. It is performed chromosome by chromosome and independently for each target individual, so it can be trivially parallelised to run on systems of any size, subject to RAM availability.
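As a rough illustration of what a sliding window over a chromosome looks like, the sketch below computes start and end marker indices for overlapping windows. It is a generic example, not GeneImp's implementation: the function make_windows and the width and step values are hypothetical and are not GeneImp defaults.

```r
# Start/end marker indices of overlapping windows covering n_markers markers.
# make_windows, width and step are hypothetical; they are not GeneImp settings.
make_windows <- function(n_markers, width = 200L, step = 100L) {
  last_start <- max(1L, n_markers - width + 1L)
  starts <- seq.int(1L, last_start, by = step)
  # Add a final window if the regular grid would leave the last markers uncovered.
  if (tail(starts, 1) != last_start) starts <- c(starts, last_start)
  data.frame(start = starts, end = pmin(starts + width - 1L, n_markers))
}

make_windows(450L)
```

Because each window only looks at a short region, the work for one window depends on nothing outside it other than the reference panel for that region.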
At the moment the package only supports files in Variant Call Format (VCF).
To run GeneImp, it’s enough to specify the target and reference file names, in which case the default settings (which we expect to be a good starting point) for all other parameters will apply:
imputevcf(vcfname, ref.vcfname)
GeneImp requires the most memory during the generation of the big.matrix object that stores the reference panel. This operation needs to be done only once: in subsequent runs, GeneImp will only need to load the existing file-backed object (the .bin and .desc files generated during the first read).
If you are working on a cluster with limited memory on each of the nodes, you may want to consider generating the file-backed objects somewhere else, then copying the generated .bin and .desc files into the directory specified by temp.dir (by default the current directory).
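The copying step can be done with any file tool; a minimal base R sketch is shown here. The helper stage_backing_files and its from.dir argument are hypothetical, and it assumes that the only .bin and .desc files in the source directory are the ones generated for the reference panel.

```r
# Copy the file-backed reference panel files (.bin and .desc) from the
# directory where they were generated into the directory used as temp.dir.
# stage_backing_files is a hypothetical helper, not part of GeneImp.
stage_backing_files <- function(from.dir, temp.dir = ".") {
  files <- list.files(from.dir, pattern = "\\.(bin|desc)$", full.names = TRUE)
  if (length(files) == 0) stop("no .bin or .desc files found in ", from.dir)
  dir.create(temp.dir, showWarnings = FALSE, recursive = TRUE)
  # overwrite = FALSE: files already present in temp.dir are left untouched
  copied <- file.copy(files, temp.dir, overwrite = FALSE)
  setNames(copied, basename(files))
}

# Example call (paths are placeholders):
# stage_backing_files("/path/to/generated/files", temp.dir = "/path/to/temp.dir")
```

The return value is a named logical vector, so a FALSE entry shows which file was not copied (for example, because it already existed in the destination).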
GeneImp is already parallelised over samples, as each sample is imputed independently of all others. If you have spare computational resources, you can increase parallelism by splitting the target data into chromosomes. We do not recommend splitting chromosomes into chunks, as imputation at the edges of each chunk will not be as good.
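One way to organise a per-chromosome run is sketched below. The wrapper impute_by_chromosome and the file naming patterns are hypothetical; the imputation function is passed in as an argument (in practice this would be imputevcf) so that the sketch itself does not depend on the package being loaded. On a cluster, submitting one job per chromosome is likely a better fit than running chromosomes side by side in a single R session, since each run already uses parallelism over samples and needs its own memory.

```r
library(parallel)

# Call an imputation function once per chromosome.
# impute: a function taking (vcfname, ref.vcfname), e.g. GeneImp's imputevcf.
# The file name patterns are placeholders for your own per-chromosome VCFs.
impute_by_chromosome <- function(impute, chroms = 1:22,
                                 target.pattern = "target.chr%s.vcf",
                                 ref.pattern = "reference.chr%s.vcf",
                                 cores = 1L) {
  jobs <- data.frame(chrom = chroms,
                     vcfname = sprintf(target.pattern, chroms),
                     ref.vcfname = sprintf(ref.pattern, chroms),
                     stringsAsFactors = FALSE)
  run <- function(i) impute(jobs$vcfname[i], jobs$ref.vcfname[i])
  # mc.cores > 1 relies on forking and is not available on Windows
  results <- mclapply(seq_len(nrow(jobs)), run, mc.cores = cores)
  names(results) <- jobs$chrom
  results
}

# Example call, assuming GeneImp is loaded:
# impute_by_chromosome(imputevcf, chroms = 20:22)
```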
Even further parallelism can be obtained by assigning samples to different target files, which would then need to be merged manually at the completion of the imputation.
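Deciding which samples go into which target file is simple bookkeeping, as the sketch below shows. The helper split_samples is hypothetical; it only groups sample IDs, and writing the per-batch VCF files and merging the imputed output afterwards still has to be done with a VCF tool of your choice.

```r
# Assign sample IDs to n.batches contiguous groups of near-equal size.
# split_samples is a hypothetical helper, not part of GeneImp.
split_samples <- function(sample.ids, n.batches) {
  batch <- sort(rep_len(seq_len(n.batches), length(sample.ids)))
  split(sample.ids, batch)
}

batches <- split_samples(paste0("S", 1:5), n.batches = 2)
batches
```

Each element of the returned list can be written to a text file with writeLines() and used as the sample list when subsetting the target VCF.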
The GeneImp package contains an object file (impute.o) pre-compiled for some Linux 64-bit systems (see below for licensing information related to it). If you encounter compilation errors or require further support, contact Pharmatics.
Current version: GeneImp Version 1.3 (19 March 2018)
The R portion of the package is Copyright (C) 2015-2018 Athina Spiliopoulou, Marco Colombo, Paul McKeigue (University of Edinburgh) and is released under the GNU General Public License version 2 as published by the Free Software Foundation (see https://www.gnu.org/licenses/gpl-2.0.html for full text).
File impute.o is governed by the following license:
Copyright (c) [2015-2016] Pharmatics Limited All Rights Reserved.
PROPRIETARY AND CONFIDENTIAL
NOTICE: All information contained herein is, and remains the property
of Pharmatics Limited. The intellectual and technical concepts contained
herein are proprietary to Pharmatics Limited, may be covered by patents
and patent applications, and are protected by trade secret or copyright law.
Copying and exploitation of this software is permitted strictly for
academic non-commercial research and teaching purposes only. Without
limitation, any reproduction, modification, sub-licensing,
redistribution of this software, or any commercial use of any part of
this software including corporate or for-profit research or as the basis
of commercial product or service provision, is strictly forbidden unless
a license is obtained from Pharmatics Limited by contacting
info@pharmaticsltd.com.
THE CODE AND INFORMATION ARE PROVIDED "AS IS" WITHOUT WARRANTY OF ANY
KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE
IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR PARTICULAR PURPOSE.
IN NO EVENT SHALL PHARMATICS LIMITED BE LIABLE TO ANY PARTY FOR DIRECT,
INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST
PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION,
EVEN IF PHARMATICS LIMITED HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
DAMAGE.