TF class annotation
2023 Nov 6
Transcription factor
Transcription
Data
Transcription factor classification
Many resources provide TF class annotation based on the amino acid sequence of the DNA-binding domain, for example TFClass, TRANSFAC and JASPAR.
Here, I process the KEGG version of the TF classification (originally from TRANSFAC) into a concise table. An example script follows:
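# Create the output directory if it does not exist, then download the KEGG hierarchy file
dir.create("data", showWarnings = FALSE)
download.file("https://www.genome.jp/kegg-bin/download_htext?htext=ko03000&format=htext&filedir=",
              destfile = "data/ko03000.keg")
# Read all lines; given a path, readLines() opens and closes the file itself
lin <- readLines("data/ko03000.keg")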
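# Labels of the current hierarchy levels, updated while scanning the file
.type <- NULL
.family <- NULL
.class <- NULL
out_tab <- data.frame()
for (i in seq_along(lin)) {
  tmp <- lin[i]
  # The first character of each line encodes its level (A, B, C or D)
  init <- substring(tmp, 1, 1)
  if (init == "A") {
    # Type: the text wrapped in <b>...</b>
    .type <- gsub(".*<b>(.*)</b>.*", "\\1", tmp)
    next
  }
  if (init == "B") {
    # Family: everything after the level letter and padding
    .family <- gsub("B\\s*(.*)", "\\1", tmp)
    next
  }
  if (init == "C") {
    # Class: everything after the level letter and padding
    .class <- gsub("C\\s*(.*)", "\\1", tmp)
    next
  }
  # D lines hold the TF entries; skip any that appear before the first A line
  if (!is.null(.type) && init == "D") {
    # KEGG identifier: the token before the double space
    .KEGG <- gsub("D\\s*(.*)\\s\\s.*", "\\1", tmp)
    # TF name: the last whitespace-delimited token before the semicolon
    .TF <- gsub("D\\s*.*\\s(.*);.*", "\\1", tmp)
    # Description: the text after the semicolon
    .description <- gsub("D.*;\\s(.*)", "\\1", tmp)
    out_tab <- rbind(out_tab,
                     data.frame(TF = .TF,
                                Family = .family,
                                Class = .class,
                                Type = .type,
                                Description = .description,
                                KEGG = .KEGG,
                                stringsAsFactors = FALSE))
  }
}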
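To check the parsing logic without downloading anything, here is a minimal sketch that applies the same regular expressions to a few sample lines. The sample lines and the helper name `parse_keg` are made up for illustration (placeholders, not real KEGG entries). The sketch assumes the layout the script above relies on: `A` lines carry the type in `<b>` tags, `B` and `C` lines carry family and class, and `D` lines look like `identifier, two spaces, name; description`. Instead of a loop, it carries the most recent `A`/`B`/`C` label forward with `cumsum()`.

```r
# Hypothetical sample lines in the assumed layout (placeholders, not real KEGG entries)
sample_lines <- c(
  "A<b>Example type</b>",
  "B  Example family",
  "C    Example class",
  "D      K99991  TFA; example factor A",
  "D      K99992  TFB; example factor B"
)

# Hypothetical helper: a vectorised variant of the loop, using the same regexes
parse_keg <- function(lines) {
  level <- substring(lines, 1, 1)

  # Carry the most recent header label forward to every following line;
  # lines before the first header of that level get NA
  carry <- function(is_header, labels) {
    idx <- cumsum(is_header)
    c(NA_character_, labels)[idx + 1]
  }

  type   <- carry(level == "A", gsub(".*<b>(.*)</b>.*", "\\1", lines[level == "A"]))
  family <- carry(level == "B", gsub("B\\s*(.*)", "\\1", lines[level == "B"]))
  cls    <- carry(level == "C", gsub("C\\s*(.*)", "\\1", lines[level == "C"]))

  # Keep D lines that come after the first A line, as in the loop version
  keep  <- level == "D" & !is.na(type)
  entry <- lines[keep]

  data.frame(
    TF          = gsub("D\\s*.*\\s(.*);.*", "\\1", entry),
    Family      = family[keep],
    Class       = cls[keep],
    Type        = type[keep],
    Description = gsub("D.*;\\s(.*)", "\\1", entry),
    KEGG        = gsub("D\\s*(.*)\\s\\s.*", "\\1", entry),
    stringsAsFactors = FALSE
  )
}

res <- parse_keg(sample_lines)
print(res)
```

Running `parse_keg(lin)` on the downloaded file should give the same columns as `out_tab`, provided the real file follows the assumed layout. Names containing spaces would be cut to their last token by the `TF` pattern, so it is worth spot-checking a few rows.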