Genomic ranges overlap and intersection
2022 May 23
Bioconductor
Genomic Ranges
R
A simple tutorial of genomic ranges overlap summary and intersection with Bioconductor
package GenomicRanges
.
library(GenomicRanges) # for GRanges overlaps
library(rtracklayer) # for import/export
# load files to GRanges objects
# A_gr <- import.bed("A.bed")
# B_gr <- import.bed("B.bed")
# Toy examples:
A_gr <- GenomicRanges::GRanges(seqnames = "chr1",
IRanges(start = c(1, 200), end = c(100, 300)),
strand = "+")
B_gr <- GenomicRanges::GRanges(seqnames = "chr1",
IRanges(start = 40, end = 120),
strand = "-")
The overlap of A and B can be created with a pair-wide match table.
mtch_tbl <- findOverlaps(query = A_gr, # e.g. peaks
subject = B_gr, # e.g. genes
maxgap = -1L, # defualt, or define a gap
ignore.strand = T) # ignore.strand defualt is FALSE, to fix the strandness, but if the strand is "*", the results will not account the strand information
This table is useful for stats, and for pairing analysis.
A_match_num <- sum(countQueryHits(mtch_tbl) > 0)
B_match_num <- sum(countSubjectHits(mtch_tbl) > 0)
GRanges object is indexible, and this example gives the A intervals that overlap with B.
A_gr[countQueryHits(mtch_tbl) > 0]
To find their intersections is a different problem. The intersect()
function allows to find the overlapped parts of A and B, but the defualt fixes the strandness.
intersect(A_gr, B_gr)
## GRanges object with 0 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
intersect()
can also take additional arguments
intersect(A_gr, B_gr, ignore.strand = T)
## GRanges object with 1 range and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr1 40-100 *
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
Since it is also dispatched by GenomicRanges
methods("intersect")[grep("GenomicRanges", methods("intersect"))]
## [1] "intersect,GenomicRanges,GenomicRanges-method"
## [2] "intersect,GenomicRanges,Vector-method"
## [3] "intersect,Vector,GenomicRanges-method"
To write the intersection object, it can be:
A_B_intersect_gr <- intersect(A_gr, B_gr, ignore.strand = T)
A_B_intersect_gr$name <- "A_and_B"
A_B_intersect_gr$score <- 0
export.bed(A_B_intersect_gr, "A_B_intersect.bed")