2023 Nov 6 Transcription factor Transcription Data

Many resources provide TF class annotation based on the amino acid sequence of the DNA-binding domain, for example TFClass, TRANSFAC and JASPAR.

Here, I process the KEGG version of the TF classification (originally from TRANSFAC) into a concise table with one row per TF. An example script is attached below:
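To make the file layout concrete first, here is a minimal sketch of the idea on a few made-up lines. It assumes the htext layout in which the first character of each line (A, B, C or D) gives the hierarchy level and each D line carries a KEGG ID, a TF symbol and a description. The sample lines, the `cur_*` variables and `demo_tab` are hypothetical names used only for illustration, and the single `regexec()` pattern is my simplification, not the pattern set used in the script.

```r
# Made-up lines that imitate the A/B/C/D layout of a KEGG htext file.
# They are placeholders for illustration, not real entries from ko03000.keg.
lin <- c(
  "A<b>Example type</b>",
  "B  Example family",
  "C    Example class",
  "D      K00000  TFX; hypothetical transcription factor X",
  "D      K00001  TFY; hypothetical transcription factor Y"
)

# Most recently seen value at each level of the hierarchy
cur_type <- cur_family <- cur_class <- NA_character_
rows <- list()

for (tmp in lin) {
  init <- substring(tmp, 1, 1)
  if (init == "A") {
    # Keep only the text between the bold tags
    cur_type <- sub("^A<b>(.*)</b>.*$", "\\1", tmp)
  } else if (init == "B") {
    cur_family <- sub("^B\\s*", "", tmp)
  } else if (init == "C") {
    cur_class <- sub("^C\\s*", "", tmp)
  } else if (init == "D") {
    # m holds: full match, KEGG ID, TF symbol, description
    # (assumes a non-empty description after the semicolon)
    m <- regmatches(tmp, regexec("^D\\s+(\\S+)\\s+(\\S[^;]*);\\s*(\\S.*)$", tmp))[[1]]
    rows[[length(rows) + 1]] <- data.frame(
      TF = m[3], Family = cur_family, Class = cur_class,
      Type = cur_type, Description = m[4], KEGG = m[2],
      stringsAsFactors = FALSE
    )
  }
}

# Combine the collected rows once, instead of growing a data frame in the loop
demo_tab <- do.call(rbind, rows)
print(demo_tab)
```

Each D line inherits the type, family and class seen most recently above it, which is the same walk down the hierarchy that the script below performs on the real file.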

              destfile = "data/ko03000.keg")

# Open the downloaded file and read every line into a character vector
con <- file("data/ko03000.keg", open="r")
lin <- readLines(con)

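# Current position in the hierarchy, updated whenever an A, B or C line is seen
.type <- NULL
.family <- NULL
.class <- NULL

# Result table, one row per D line (one TF entry)
out_tab <- data.frame()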

for (i in seq_along(lin)) {
  tmp <- lin[i]
  init <- substring(tmp, 1, 1)
  if (init == "A") {
    # Top level: keep the text inside the <b>...</b> tags
    .type <- gsub(".*<b>(.*)</b>.*", "\\1", tmp)
  }
  if (init == "B") {
    .family <- gsub("B\\s*(.*)", "\\1", tmp)
  }
  if (init == "C") {
    .class <- gsub("C\\s*(.*)", "\\1", tmp)
  }
  if (!is.null(.type) && init == "D") {
    # A D line holds the KEGG ID, the TF symbol and, after the semicolon, a description
    .KEGG <- gsub("D\\s*(.*)\\s\\s.*", "\\1", tmp)
    .TF <- gsub("D\\s*.*\\s(.*);.*", "\\1", tmp)
    .description <- gsub("D.*;\\s(.*)", "\\1", tmp)
    out_tab <- rbind(out_tab,
                     data.frame(TF = .TF,
                                Family = .family,
                                Class = .class,
                                Type = .type,
                                Description = .description,
                                KEGG = .KEGG