TF class annotation

2023 Nov 6 Transcription factor Transcription Data

Transcription factor classification

Many resources provide TF class annotation based on the amino acid sequence of the DNA-binding domain, for example TFClass, TRANSFAC and JASPAR.

Here, I process the KEGG version of the TF classification (originally from TRANSFAC) into a concise table with one row per entry. An example R script is attached below:

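# download.file() does not create missing folders, so make sure "data/" exists
dir.create("data", showWarnings = FALSE)
download.file("https://www.genome.jp/kegg-bin/download_htext?htext=ko03000&format=htext&filedir=", 
              destfile = "data/ko03000.keg")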

# readLines() opens and closes the file itself when given a path
lin <- readLines("data/ko03000.keg")

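# Labels of the enclosing A, B and C levels, updated while reading.
# NA (not NULL) keeps the data.frame() columns in place if a label is missing.
.type <- NULL
.family <- NA_character_
.class <- NA_character_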

out_tab <- data.frame()

for (i in seq_along(lin)) {
  tmp <- lin[i]
  init <- substring(tmp, 1, 1)
  
  if (init == "A") {
    .type <- gsub(".*<b>(.*)</b>.*", "\\1", tmp)
    next
  }
  
  if (init == "B") {
    .family <- gsub("B\\s*(.*)", "\\1", tmp)
    .class <- NA_character_  # new B block: do not carry over the previous C label
    next
  }
  
  if (init == "C") {
    .class <- gsub("C\\s*(.*)", "\\1", tmp)
    next
  }
  
  # D lines hold one entry each; use scalar && inside if()
  if (!is.null(.type) && init == "D") {
    .KEGG <- gsub("D\\s*(.*)\\s\\s.*", "\\1", tmp)
    .TF <- gsub("D\\s*.*\\s(.*);.*", "\\1", tmp)
    .description <- gsub("D.*;\\s(.*)", "\\1", tmp)
    out_tab <- rbind(out_tab,
                     data.frame(TF = .TF,
                                Family = .family,
                                Class = .class,
                                Type = .type,
                                Description = .description,
                                KEGG = .KEGG
                                ))
  }
}
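
As far as I can tell, the greedy pattern used for .TF in the loop keeps only the text after the last space before the semicolon, so a comma-separated list of symbols would be reduced to its final entry. The following is a minimal sketch of a stricter alternative: one anchored pattern applied with regexec() and regmatches(), which captures the KO number, everything up to the first semicolon, and the description. It is not part of the original script. The input lines are invented to mimic the htext layout (the KO numbers, symbols and group names are made up, not taken from KEGG), and the names pat, d_lines and parsed are only for illustration.

```r
# Invented lines that mimic the htext layout; these are NOT real KEGG entries.
lin <- c(
  "A<b>Example type</b>",
  "B  Example group",
  "C    Example subgroup",
  "D      K99991  TFX1; hypothetical factor 1",
  "D      K99992  TFX2, TFX2B; hypothetical factor 2"
)

# Keep only the entry lines.
d_lines <- lin[startsWith(lin, "D")]

# One anchored pattern: KO number, symbol(s) up to the first ";", description.
pat <- "^D\\s+(K[0-9]+)\\s+([^;]+);\\s*(.*)$"
m <- regmatches(d_lines, regexec(pat, d_lines))

# Each element of m holds the full match followed by the three captured groups;
# drop the full match and bind the groups into a table.
parsed <- as.data.frame(do.call(rbind, m)[, -1, drop = FALSE],
                        stringsAsFactors = FALSE)
names(parsed) <- c("KEGG", "TF", "Description")
print(parsed)
```

Lines that do not match the pattern produce an empty result and would silently drop out of the table, so on the real file it is worth comparing nrow(parsed) with length(d_lines) before relying on the output.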