Removing host sequences in microbiome datasets

Last updated on Dec 2, 2019

Removing host reads before assembly reduces the time spent on assembly, and it is possible whenever a host reference genome is available. The “dehosting” process takes a few steps.

1. Bowtie2 mapping to the host

Mapping all reads to the host genome identifies which reads need to be eliminated.

a. Create bowtie2 index database (host_DB) from host reference genome

bowtie2-build host_genome.fna host_DB

b. Map reads against the host sequence database with bowtie2, keeping both mapped and unmapped reads (paired-end reads)

bowtie2 -x host_DB -1 SAMPLE_r1.fastq -2 SAMPLE_r2.fastq -S SAMPLE_mapped_and_unmapped.sam

c. Convert file `.sam` to `.bam`

samtools view -bS SAMPLE_mapped_and_unmapped.sam > SAMPLE_mapped_and_unmapped.bam
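Steps b and c can also be chained so that the intermediate `.sam` file is never written to disk. The following is a minimal sketch, not part of the original recipe: the function name `map_to_host_bam` is hypothetical, and it assumes bowtie2 writes SAM to standard output when `-S` is omitted and that samtools reads it from `-`.

```shell
# Hypothetical helper: map paired-end reads to the host index and
# write a BAM file directly, without an intermediate SAM file.
map_to_host_bam() {
    local index=$1 r1=$2 r2=$3 out_bam=$4
    # bowtie2 prints SAM on stdout when -S is not given;
    # samtools view -b converts the stream from stdin (-) to BAM.
    bowtie2 -x "$index" -1 "$r1" -2 "$r2" \
        | samtools view -b - > "$out_bam"
}

# Example call (left commented so the block can be sourced safely):
# map_to_host_bam host_DB SAMPLE_r1.fastq SAMPLE_r2.fastq SAMPLE_mapped_and_unmapped.bam
```

The resulting BAM should hold the same alignments as the two-step version, so the filtering step that follows can be applied to it unchanged.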

2. Filter required unmapped reads

a. SAMtools SAM-flag filter: get unmapped pairs (both ends unmapped)

samtools view -b -f 12 -F 256 SAMPLE_mapped_and_unmapped.bam > SAMPLE_bothEndsUnmapped.bam

-f 12 Extract only (-f) alignments with both reads unmapped: read unmapped (4) + mate unmapped (8)
-F 256 Do not (-F) extract alignments which are: not primary alignment (256)
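To see how the two flag options interact, the bit test can be reproduced with shell arithmetic. This is only an illustrative sketch: the function name `passes_filter` is hypothetical, and the FLAG values in the loop are hand-picked examples, not output from a real dataset.

```shell
# Succeeds if a SAM FLAG value would pass "samtools view -f 12 -F 256".
passes_filter() {
    local flag=$1
    # -f 12: both bit 4 (read unmapped) and bit 8 (mate unmapped) must be set
    # -F 256: bit 256 (not primary alignment) must be clear
    [ $(( flag & 12 )) -eq 12 ] && [ $(( flag & 256 )) -eq 0 ]
}

# 77 and 141: first/second read of a pair with both ends unmapped
# 73: read mapped, mate unmapped; 99: properly paired and mapped
# 333: 77 plus the "not primary alignment" bit
for flag in 77 141 73 99 333; do
    if passes_filter "$flag"; then
        echo "$flag kept"
    else
        echo "$flag dropped"
    fi
done
# prints: 77 kept, 141 kept, 73 dropped, 99 dropped, 333 dropped (one per line)
```

Pairs where only one end is unmapped (such as FLAG 73) are dropped by this filter, which is why the result contains only pairs with both ends unmapped.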

3. Split paired-end reads into separate fastq files (.._r1 and .._r2)

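a. Sort bam file by read name (-n) to have paired reads next to each other as required by bedtools. Current samtools releases take the output file via -o; the legacy form took a bare output prefix instead.

samtools sort -n SAMPLE_bothEndsUnmapped.bam -o SAMPLE_bothEndsUnmapped_sorted.bam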

b. Convert bam to fastq

bedtools bamtofastq -i SAMPLE_bothEndsUnmapped_sorted.bam -fq SAMPLE_host_removed_r1.fastq -fq2 SAMPLE_host_removed_r2.fastq
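After the conversion, a quick sanity check is to confirm that both output files contain the same number of reads. The sketch below is an optional addition, not part of the original recipe: the function names `count_fastq_reads` and `check_pairs` are hypothetical, and it assumes standard four-line FASTQ records (no wrapped sequence lines).

```shell
# Count reads in a FASTQ file, assuming exactly 4 lines per record.
count_fastq_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# Compare read counts of the two mate files; returns non-zero on mismatch.
check_pairs() {
    local n1 n2
    n1=$(count_fastq_reads "$1")
    n2=$(count_fastq_reads "$2")
    if [ "$n1" -eq "$n2" ]; then
        echo "OK: $n1 read pairs"
    else
        echo "MISMATCH: $n1 vs $n2 reads" >&2
        return 1
    fi
}

# Example call (left commented so the block can be sourced safely):
# check_pairs SAMPLE_host_removed_r1.fastq SAMPLE_host_removed_r2.fastq
```

A mismatch usually means the BAM file was not sorted by read name before conversion, or that unpaired records slipped through the filter.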

bowtie2 metagenomics

Andres S. Espindola

Assistant Professor
