Genomic ranges overlap and intersection

2022 May 23 Bioconductor Genomic Ranges R


A simple tutorial of genomic ranges overlap summary and intersection with Bioconductor package GenomicRanges.

library(GenomicRanges)   # for GRanges overlaps
library(rtracklayer)     # for import/export

# load files to GRanges objects
# A_gr <- import.bed("A.bed")
# B_gr <- import.bed("B.bed")

# Toy examples:
A_gr <- GenomicRanges::GRanges(seqnames = "chr1", 
                               IRanges(start = c(1, 200), end = c(100, 300)),
                               strand = "+")

B_gr <- GenomicRanges::GRanges(seqnames = "chr1", 
                               IRanges(start = 40, end = 120),
                               strand = "-")

The overlap of A and B can be created with a pair-wide match table.

mtch_tbl <- findOverlaps(query = A_gr,          # e.g. peaks
                         subject = B_gr,        # e.g. genes
                         maxgap = -1L,          # defualt, or define a gap
                         ignore.strand = T)     # ignore.strand defualt is FALSE, to fix the strandness, but if the strand is "*", the results will not account the strand information

This table is useful for stats, and for pairing analysis.

A_match_num <- sum(countQueryHits(mtch_tbl) > 0)
B_match_num <- sum(countSubjectHits(mtch_tbl) > 0)

GRanges object is indexible, and this example gives the A intervals that overlap with B.

A_gr[countQueryHits(mtch_tbl) > 0]

To find their intersections is a different problem. The intersect() function allows to find the overlapped parts of A and B, but the defualt fixes the strandness.

intersect(A_gr, B_gr)
## GRanges object with 0 ranges and 0 metadata columns:
##    seqnames    ranges strand
##       <Rle> <IRanges>  <Rle>
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths

intersect() can also take additional arguments

intersect(A_gr, B_gr, ignore.strand = T) 
## GRanges object with 1 range and 0 metadata columns:
##       seqnames    ranges strand
##          <Rle> <IRanges>  <Rle>
##   [1]     chr1    40-100      *
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths

Since it is also dispatched by GenomicRanges

methods("intersect")[grep("GenomicRanges", methods("intersect"))]
## [1] "intersect,GenomicRanges,GenomicRanges-method"
## [2] "intersect,GenomicRanges,Vector-method"       
## [3] "intersect,Vector,GenomicRanges-method"

To write the intersection object, it can be:

A_B_intersect_gr <- intersect(A_gr, B_gr, ignore.strand = T)
A_B_intersect_gr$name <- "A_and_B"
A_B_intersect_gr$score <- 0

export.bed(A_B_intersect_gr, "A_B_intersect.bed")