TSSG/TSSW

Softberry Inc, http://www.softberry.com/

Overview

TSSG and TSSW are known as some of the most accurate eukaryotic Pol II promoter prediction programs. Both rely on an analysis that combines functional characteristics of the sequence with a database of regulatory motifs. TSSG is currently known as the most accurate stand-alone promoter prediction program: it correctly predicts about 50-60% of promoters, and 80-85% of the promoters it predicts can be considered genuine. TSSW is somewhat less accurate.
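The two TSSG figures above are the usual sensitivity (the share of real promoters that are predicted) and positive predictive value (the share of predictions that are real promoters). The minimal Python sketch below shows how both are computed from counts. The counts are hypothetical, chosen only to fall inside the quoted ranges, and do not come from any published benchmark.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of real promoters that the program predicts."""
    return true_positives / (true_positives + false_negatives)


def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """Fraction of predicted promoters that are real promoters."""
    return true_positives / (true_positives + false_positives)


# Hypothetical counts, for illustration only: 100 known promoters,
# 55 of them found, plus 11 predictions that match no known promoter.
tp, fn, fp = 55, 45, 11
print(f"sensitivity = {sensitivity(tp, fn):.2f}")             # → sensitivity = 0.55
print(f"PPV         = {positive_predictive_value(tp, fp):.2f}")  # → PPV         = 0.83
```

The two measures can move independently: a program that predicts more promoters usually gains sensitivity at the cost of positive predictive value.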

Features

  • TSSG is the most accurate mammalian promoter prediction program. The table below, taken from Liu and States (2002) Genome Research 12:462-469, shows the results of promoter searches by different promoter prediction programs on genes with known mRNAs. It shows that TSSG produces the fewest false positive predictions.

TSSG.jpg

  • TSSG uses the promoter.dat file.
  • TSSG makes roughly one false positive prediction per 5000 bp.
  • The table below compares the accuracy of TSSG and Prestridge's algorithm in locating the TSS in 10 test genes.

TSSG1.jpg
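A rate of about one false positive per 5000 bp gives a quick estimate of how many spurious TSSG predictions to expect in a sequence of a given length. The sketch below assumes false positives are spread evenly along the sequence, which is a simplification. The function and constant names are hypothetical.

```python
# Approximate TSSG false positive rate quoted above: ~1 per 5000 bp.
FALSE_POSITIVE_SPACING_BP = 5000


def expected_false_positives(sequence_length_bp: int,
                             spacing_bp: int = FALSE_POSITIVE_SPACING_BP) -> float:
    """Back-of-the-envelope count of false positives, assuming an even spread."""
    return sequence_length_bp / spacing_bp


# HSCALCAC, the 7637 bp sequence used in the example outputs below.
print(f"{expected_false_positives(7637):.2f}")       # → 1.53
# A 1 Mb genomic region.
print(f"{expected_false_positives(1_000_000):.0f}")  # → 200
```

Such an estimate says nothing about which individual predictions are false. It only indicates how many to expect on average.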

References

Solovyev V.V., Salamov A.A. (1997) 

The Gene-Finder computer tools for analysis of human and model organisms genome sequences. 

In Proceedings of the Fifth International Conference on Intelligent Systems for Molecular Biology (eds. Rawling C., Clark D., Altman R., Hunter L., Lengauer T., Wodak S.), Halkidiki, Greece, AAAI Press, pp. 294-302.

Solovyev V.V. (2001) 

Statistical approaches in Eukaryotic gene prediction. 

In Handbook of Statistical Genetics (eds. Balding D. et al.), John Wiley & Sons, Ltd., pp. 83-127.

Solovyev V.V., Shahmuradov I.A. (2003)

PromH: Promoters identification using orthologous genomic sequences. 

Nucleic Acids Res. 31(13):3540-3545. 

TSSG/TSSW output

Field descriptions

  • First line - name of your sequence;
  • Second and third lines - length of the submitted sequence and the LDF (linear discriminant function) threshold;
  • Fourth line - number of predicted promoter regions;
  • Next lines - positions of the predicted sites, their LDF weights, and the TATA box position (if found).

Pos. gives the first nucleotide of the transcript (the TSS position).

The functional motifs found in each predicted region are then listed; (+) or (-) indicates the direct or complementary chain.

In TSSG output, identifiers of the form S... denote particular motifs from the Ghosh database.
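A motif marked (-) lies on the complementary chain. Assuming the listed sequence is the motif as stored in the database, on the submitted (direct) strand it therefore reads as the reverse complement of the listed sequence. Below is a small Python helper for this conversion, which also handles the IUPAC ambiguity codes (R, K, N, and so on) seen in motifs such as AARKGA. `reverse_complement` is our own name, not part of the Softberry tools.

```python
# IUPAC nucleotide codes and their complements; S, W and N map to themselves.
_COMPLEMENT = str.maketrans(
    "ACGTRYKMSWBDHVNacgtrykmswbdhvn",
    "TGCAYRMKSWVHDBNtgcayrmkswvhdbn",
)


def reverse_complement(motif: str) -> str:
    """Return the reverse complement of a nucleotide motif, keeping letter case."""
    return motif.translate(_COMPLEMENT)[::-1]


print(reverse_complement("AACCAAT"))  # → ATTGGTT
print(reverse_complement("AARKGA"))   # → TCMYTT
```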

TSSG output

HSCALCAC     7637 bp    DNA             PRI       14-MAR-1995
 Length of sequence-      7637
 Threshold for LDF-  4.00
     1 promoter(s)  were predicted
 Pos.:   1820 LDF- 16.65 TATA box predicted at   1804
 Transcription factor binding sites:
for promoter at position -    1820
  1764 (-) S00098       AACCAAT
  1608 (-) S01152       AAGTGA
  1741 (+) S01153       AARKGA
  1608 (-) S01153       AARKGA
  1657 (+) S01090       AATGA
  1617 (-) S01027       ACGCCC
  1577 (+) S00534       ACGTCA
  1580 (-) S00534       ACGTCA
  1580 (-) S01257       ACGTCAT
..............................

TSSW output

HSCALCAC     7637 bp    DNA             PRI       14-MAR-1995    
     Length of sequence-      7637
 Threshold for LDF-  4.00
     2 promoter(s)  were predicted
 Pos.:   1834 LDF- 11.08 TATA box predicted at   1804
 Pos.:   7031 LDF-  4.64 TATA box predicted at   7001
 Transcription factor binding sites:
for promoter at position -    1834
  1752 (+) CHICK$ACRA   CCGCCC
  1762 (-) HS$BAC_03    CCAAT
  1764 (-) RAT$ALBU_2   AACCAAT
  1757 (-) HS$APOE_08   GGGCGG
  1575 (+) HS$ACHGON_   TGACGTCA
  1582 (-) HS$ACHGON_   TGACGTCA
  1758 (+) MOUSE$A21C   ATTGG
  1745 (+) MOUSE$A21C   gcccagccctcccATTGGtggagacg
  1609 (+) Y$CYC1_09    ctcatttggcgagcGTTGGt
  1724 (+) AD$E2L_04    TGACgcA
  1577 (+) AD$E4_16     ACGTCA
  1580 (-) AD$E4_16     ACGTCA
  1580 (-) AD$E4_18     ACGTCAT
  1655 (+) HS$EGFR_15   TCAAT
..............................
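The layout described in the field descriptions is regular enough to read with a few regular expressions. The Python sketch below is written only against the TSSG and TSSW examples on this page. The class and function names are our own, and the patterns are assumptions about the format that may need adjusting for other versions of the output. A promoter reported without a TATA box is accepted.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Site:
    position: int
    strand: str      # "+" = direct chain, "-" = complementary chain
    motif_id: str    # e.g. S00098 (TSSG) or HS$BAC_03 (TSSW)
    sequence: str


@dataclass
class Promoter:
    tss: int                     # first nucleotide of the transcript
    ldf: float                   # LDF weight of the prediction
    tata: Optional[int] = None   # TATA box position, if one was reported
    sites: List[Site] = field(default_factory=list)


_LENGTH = re.compile(r"Length of sequence-\s*(\d+)")
_THRESHOLD = re.compile(r"Threshold for LDF-\s*(-?\d+(?:\.\d+)?)")
_PROMOTER = re.compile(
    r"Pos\.:\s*(\d+)\s+LDF-\s*(-?\d+(?:\.\d+)?)"
    r"(?:\s+TATA box predicted at\s+(\d+))?"
)
_SITES_FOR = re.compile(r"for promoter at position -\s*(\d+)")
_SITE = re.compile(r"^\s*(\d+)\s+\(([+-])\)\s+(\S+)\s+(\S+)\s*$")


def parse_tss_output(text: str) -> dict:
    """Parse TSSG/TSSW text output into a length, a threshold and promoters."""
    length: Optional[int] = None
    threshold: Optional[float] = None
    promoters: List[Promoter] = []
    current: Optional[Promoter] = None  # promoter whose sites are being read
    for line in text.splitlines():
        if (m := _LENGTH.search(line)):
            length = int(m.group(1))
        elif (m := _THRESHOLD.search(line)):
            threshold = float(m.group(1))
        elif (m := _PROMOTER.search(line)):
            tata = int(m.group(3)) if m.group(3) else None
            promoters.append(Promoter(int(m.group(1)), float(m.group(2)), tata))
        elif (m := _SITES_FOR.search(line)):
            tss = int(m.group(1))
            current = next((p for p in promoters if p.tss == tss), None)
        elif current is not None and (m := _SITE.match(line)):
            current.sites.append(
                Site(int(m.group(1)), m.group(2), m.group(3), m.group(4)))
    return {"length": length, "threshold": threshold, "promoters": promoters}


# Trimmed copy of the TSSW example output above.
SAMPLE = """\
 Length of sequence-      7637
 Threshold for LDF-  4.00
     2 promoter(s)  were predicted
 Pos.:   1834 LDF- 11.08 TATA box predicted at   1804
 Pos.:   7031 LDF-  4.64 TATA box predicted at   7001
 Transcription factor binding sites:
for promoter at position -    1834
  1752 (+) CHICK$ACRA   CCGCCC
  1762 (-) HS$BAC_03    CCAAT
"""

report = parse_tss_output(SAMPLE)
for p in report["promoters"]:
    print(p.tss, p.ldf, p.tata, len(p.sites))
# 1834 11.08 1804 2
# 7031 4.64 7001 0
```

Binding sites are attached to the promoter named in the preceding "for promoter at position" line. The dotted lines that end truncated listings match no pattern and are skipped.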

TSSP

  • TSSP is similar to TSSG/TSSW, but differs in that it uses the RegSite DB (plant database).

    • RegSite DB was developed by Softberry.

  • References
    Zhang W. et al. (2005) Bioinformatics 21(14):3074-3081
    
    Cis-regulatory element based targeted gene finding: genome-wide identification of abscisic acid- and abiotic stress-responsive genes in Arabidopsis thaliana 
    
    ... sites (TSSs). To predict TSSs, we combined an A.thaliana cDNA database and a software, TSSP (SoftBerry, http://www.softberry.com). As ... 
  • TSSP output
    • Field descriptions
      First line - program name and date of the run;
      Second line - name of your sequence;
      Third and fourth lines - length of the submitted sequence and the LDF thresholds (one for TATA+ promoters, one for TATA-/enhancers);
      Fifth line - number of predicted promoter/enhancer regions;
      Next lines - positions of the predicted promoters/enhancers, their LDF weights, and the TATA box position (if found)
      
      Pos gives the first nucleotide of the transcript (the TSS position).
      The functional motifs found in each predicted region are then listed; (+) or (-) indicates the direct or complementary chain. Fields such as "RSP00004 tagaCACGTaga" give a motif identifier from the Softberry RegSite Plant database followed by the similar sequence found.
    • output
      tssp  Wed Jul 10 02:52:32 EDT 2002
      >gi|1902902|dbj|AB001920.1| Oryza sativa (japonica cultivar-group) gene for phos
       Length of sequence-      5871
       Thresholds for TATA+ promoters -  0.02, for TATA-/enhancers -  0.04
           2 promoter/enhancer(s) are predicted
       Promoter Pos:   1522 LDF-  0.13 TATA box at   1488    18.93
       Enhancer Pos:   1597 LDF-  0.12
       Transcription factor binding sites/RegSite DB:
      for promoter at position -    1522
        1468 (-) RSP00004     tagaCACGTaga
        1459 (+) RSP00010     cACGTG
        1456 (+) RSP00011     ctccACGTGgt
        1461 (+) RSP00016     caTGCAC
        1468 (-) RSP00016     caTGCAC
        1256 (-) RSP00026     gcttttgaTGACtTcaaacac
        1460 (+) RSP00065     ACGTGgcgc
        1460 (+) RSP00066     ACGTGccgc
        1459 (+) RSP00069     tACGTG
        1341 (+) RSP00071     GACGTC
        1346 (-) RSP00071     GACGTC
        1452 (-) RSP00096     GGTTT
        1432 (+) RSP00129     CACGAC
        1281 (+) RSP00148     CGACG
        1284 (+) RSP00148     CGACG
        1315 (+) RSP00148     CGACG
        1335 (+) RSP00148     CGACG
        1340 (+) RSP00148     CGACG
        1365 (+) RSP00148     CGACG
        1434 (+) RSP00148     CGACG
        1458 (+) RSP00148     CGACG
        1347 (-) RSP00148     CGACG
        1474 (+) RSP00162     ACACccGagctaaccacaac
        1348 (+) RSP00241     CGGTCA
        1387 (+) RSP00339     RTTTTTR
        1264 (-) RSP00397     AGTGGCGG
        1268 (+) RSP00422     ACCGAC
        1459 (+) RSP00423     GACGTG
        1464 (-) RSP00424     CACGTC
        1369 (-) RSP00431     rdygRCRGTTRs
        1278 (-) RSP00432     cVacGGTaGGTgg
        1249 (-) RSP00436     TTGACT
        1260 (+) RSP00463     atttcatggCCGACctgcttttt
        1260 (+) RSP00464     acttgatggCCGACctctttttt
        1260 (+) RSP00465     aatatactaCCGACcatgagttct
        1265 (+) RSP00466     actaCCGACatgagttccaaaaagc
        1440 (+) RSP00469     GNGGTG
        1260 (-) RSP00469     GNGGTG
        1440 (+) RSP00470     GTGGNG
        1263 (-) RSP00470     GTGGNG
        1257 (-) RSP00470     GTGGNG
        1390 (+) RSP00477     TTTAA
        1385 (+) RSP00508     gcaTTTTTatca
        1502 (-) RSP00508     gcaTTTTTatca
        1469 (+) RSP00518     tccctACACgcGtcacaattc
        1465 (+) RSP00519     caattcaggACACgtGccctcttca
        1474 (+) RSP00521     ACACccG
        1474 (+) RSP00523     ACACgcG
        1474 (+) RSP00524     ACACgtG
      for promoter at position -    1597
        1468 (-) RSP00004     tagaCACGTaga
        1459 (+) RSP00010     cACGTG
        1456 (+) RSP00011     ctccACGTGgt
        1461 (+) RSP00016     caTGCAC
        1468 (-) RSP00016     caTGCAC
        1460 (+) RSP00065     ACGTGgcgc
        1460 (+) RSP00066     ACGTGccgc
        1459 (+) RSP00069     tACGTG
        1341 (+) RSP00071     GACGTC
        1346 (-) RSP00071     GACGTC
        1452 (-) RSP00096     GGTTT
        1432 (+) RSP00129     CACGAC
        1315 (+) RSP00148     CGACG
        1335 (+) RSP00148     CGACG
        1340 (+) RSP00148     CGACG
        1365 (+) RSP00148     CGACG
        1434 (+) RSP00148     CGACG
        1458 (+) RSP00148     CGACG
        1347 (-) RSP00148     CGACG
        1474 (+) RSP00162     ACACccGagctaaccacaac
      
      
      ..............................
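TSSP output differs from TSSG/TSSW output mainly in its header: it reports two LDF thresholds and labels each prediction as a promoter or an enhancer. Below is a minimal Python sketch of a parser, written only against the TSSP example above. The function name and the returned structure are our own. The number printed after the TATA box position is not explained on this page, so it is kept as an unlabelled value.

```python
import re
from typing import Dict, List, Optional, Tuple

_THRESHOLDS = re.compile(
    r"Thresholds for TATA\+ promoters -\s*(-?\d+(?:\.\d+)?),"
    r"\s*for TATA-/enhancers -\s*(-?\d+(?:\.\d+)?)"
)
_PREDICTION = re.compile(
    r"(Promoter|Enhancer) Pos:\s*(\d+)\s+LDF-\s*(-?\d+(?:\.\d+)?)"
    r"(?:\s+TATA box at\s+(\d+)\s+(-?\d+(?:\.\d+)?))?"
)
_REGION = re.compile(r"for promoter at position -\s*(\d+)")
_SITE = re.compile(r"^\s*(\d+)\s+\(([+-])\)\s+(RSP\d+)\s+(\S+)\s*$")

Hit = Tuple[int, str, str, str]  # (position, strand, RegSite id, sequence)


def parse_tssp_output(text: str) -> dict:
    """Parse TSSP text output: thresholds, predictions and RegSite hits."""
    thresholds: Optional[Tuple[float, float]] = None  # (TATA+, TATA-/enhancer)
    predictions: List[dict] = []
    sites: Dict[int, List[Hit]] = {}  # keyed by the region's position
    region: Optional[int] = None
    for line in text.splitlines():
        if (m := _THRESHOLDS.search(line)):
            thresholds = (float(m.group(1)), float(m.group(2)))
        elif (m := _PREDICTION.search(line)):
            predictions.append({
                "kind": m.group(1).lower(),  # "promoter" or "enhancer"
                "position": int(m.group(2)),
                "ldf": float(m.group(3)),
                "tata": int(m.group(4)) if m.group(4) else None,
                # Unlabelled number printed after the TATA box position.
                "tata_value": float(m.group(5)) if m.group(5) else None,
            })
        elif (m := _REGION.search(line)):
            region = int(m.group(1))
            sites.setdefault(region, [])
        elif region is not None and (m := _SITE.match(line)):
            sites[region].append(
                (int(m.group(1)), m.group(2), m.group(3), m.group(4)))
    return {"thresholds": thresholds, "predictions": predictions, "sites": sites}
```

In the example, the sites for the enhancer at 1597 appear under a "for promoter at position" line, so this sketch keys sites by position rather than by prediction type.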
      


TSSG_TSSW_TSSP (last edited 2012-03-17 17:24:18 by localhost)