Create a Reference Panel

Deprecated Documentation

⚠️ This documentation is deprecated and is no longer maintained. The latest documentation can be found at Michigan Imputation Server 2.

This tutorial will help you to create your own reference panel and integrate it into Michigan Imputation Server.

Required Software

  • To create the m3vcf files for imputation, please use Minimac3.
  • To create the bcf files for phasing, please use bcftools and tabix.
  • To create the legend files for QC, please use vcftools or bcftools.

Folder Structure

We recommend the following folder structure:

my-ref-panel
├── cloudgene.yaml
├── bcfs
|   ├── chr1.bcf
    ├── chr1.bcf.csi
|   ├── ...
    ├── chr22.bcf
|   └── chr22.bcf.csi
├── legends
|   ├── chr1.legend.gz
|   ├── ...
|   └── chr22.legend.gz
├── m3vcfs
|   ├── chr1.m3vcf.gz
|   ├── ...
|   └── chr22.m3vcf.gz
├── map
|   └── genetic_map_hg19_withX.txt.gz
└── README.md

Init (build GRCh37/hg19)

Create a new folder and add a cloudgene.yaml file.

name:  My Reference Panel name
id: unique-id
description: a short description
category: RefPanel
version: 1.0.0
website: http://my-reference-panel.com

properties:
  hdfs: ${hdfs_app_folder}/m3vcfs/chr$chr.m3vcf.gz
  legend: ${local_app_folder}/legends/chr$chr.legend.gz
  mapEagle: ${hdfs_app_folder}/map/genetic_map_hg19_withX.txt.gz
  refEagle: ${hdfs_app_folder}/bcfs/chr$chr.bcf
  build: hg19
  samples:
    all: 2504
    mixed: -1
  populations:
    all: ALL
    mixed: Other/Mixed

installation:

  - import:
      source: ${local_app_folder}/bcfs
      target: ${hdfs_app_folder}/bcfs

  - import:
      source: ${local_app_folder}/m3vcfs
      target: ${hdfs_app_folder}/m3vcfs

  - import:
      source: ${local_app_folder}/map
      target: ${hdfs_app_folder}/map

Adaptions for build GRCh38/hg38

For a reference panel build 38, the following options must be added to properties in the cloudgene.yaml file:

    mapMinimac: ${app_hdfs_folder}/map/geneticMapFile.b38.map.txt   
    mapEagle: ${app_hdfs_folder}/map/genetic_map_hg38_withX.txt.gz
    build: hg38

Prepare VCF files

Michigan Imputaiton Server requires each chromosome in a seperated file. Chromosome X must be split into three parts: chrX.PAR1, chrX.PAR2 and chrX.nonPAR. Use bcftools to split by region:

bcftools view <vcf-input> -r <region> -o <vcf-out> -O z

Chromosome X regions GRCh37/hg19

Use the following regions for the -r option:

X:60001-2699520 (chrX.PAR1)
X:2699521-154931043 (chrX.nonPAR)
X:154931044-155260560 (chrX.PAR2)

Chromosome X regions GRCh38/hg38

chrX:10001-2781479 (chrX.PAR1)
chrX:2781480-155701382 (chrX.nonPAR)
chrX:155701383-156030895 (chrX.PAR2)

Create bcf files

BCF files are required for phasing with eagle.

for chr in `seq 1 22` X.nonPAR X.PAR1 X.PAR2
do
    bcftools view chr${chr}.vcf.gz -O b -o chr${chr}.bcf
    bcftools index chr${chr}.bcf
done

Create m3vcf files

m3vcf files are used to store large reference panels in a compact way. Learn more about the file format here. For GRCh38/hg38, --mychromosome must be added, since chromosomes are coded as chr1 - chr22.

for chr in `seq 1 22` X.nonPAR X.PAR1 X.PAR2
do
    Minimac3 --refHaps chr${chr}.vcf.gz --processReference --prefix m3vcfs/chr${chr} --rsid
done

Create legend files

A legend file is a tab-delimited file consisting of 5 columns (id, position, a0, a1, all.aaf).

echo "id position a0 a1 all.aaf" > header
    for chr in `seq 1 22` X
    do
    bcftools query -f '%CHROM %POS %REF %ALT %AC %AN\n' chr${chr}.bcf |  awk -F" " 'BEGIN { OFS = " " } {print $1":"$2 " " $2 " " $3 " "$4  " "  $5/$6}' | cat header - | bgzip > chr${chr}.legend.gz 
done

or in case AC / AN is not defined:

echo "id position a0 a1 all.aaf" > header
for chr in `seq 1 22`
do
    vcftools --gzvcf chr${chr}.vcf.gz --freq --out chr${chr}
    sed 's/:/\t/g' chr${chr}.frq | sed 1d | awk '{print $1":"$2" "$2" "$5" "$7" "$8}' > chr${chr}.legend
    cat header chr${chr}.legend | bgzip > chr${chr}.legend.gz
    rm chr${chr}.legend
done

Reference genetic maps

The genetic maps for eagle (hg19/hg38) can be found here.

Integrate your new reference panel

The created folder structure must be compressed to a zip archive and can now be integrated into Michigan Imputation Server. Please see here to start a Docker container and integrate the panel. A full working zip archive for Hapmap can be found here.