TSSG/TSSW
Softberry Inc, http://www.softberry.com/
Overview
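TSSG and TSSW are reported to be among the most accurate eukaryotic Pol II promoter prediction programs. Both analyze a sequence by combining its characteristic features with functional motifs taken from a regulatory motif database. TSSG is currently considered the most accurate stand-alone promoter prediction program. It correctly predicts about 50-60% of promoters, and about 80-85% of the promoters it predicts are true promoters. TSSW is somewhat less accurate.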
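The two accuracy figures above measure different things. The first is the fraction of real promoters that are found (sensitivity). The second is the fraction of predictions that are real promoters (positive predictive value, sometimes called specificity in the gene-prediction literature). The following minimal sketch shows how both are computed from prediction counts. The counts are hypothetical and were chosen only so that the results fall inside the quoted ranges. They are not taken from any published benchmark.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of real promoters that the program predicts."""
    return true_pos / (true_pos + false_neg)


def positive_predictive_value(true_pos: int, false_pos: int) -> float:
    """Fraction of predicted promoters that are real promoters."""
    return true_pos / (true_pos + false_pos)


# Hypothetical counts: 100 real promoters, 55 of them found,
# plus 10 predictions that do not correspond to any real promoter.
tp, fn, fp = 55, 45, 10
print(f"sensitivity = {sensitivity(tp, fn):.2f}")                # 0.55
print(f"PPV         = {positive_predictive_value(tp, fp):.2f}")  # 0.85
```

A program can raise one of these measures at the expense of the other, for example by lowering its score threshold. For that reason, both numbers are quoted together.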
Features
- TSSG is reported to be the most accurate mammalian promoter prediction program. The table below, taken from Liu and States (2002), Genome Research 12:462-469, summarizes promoter searches run with different promoter prediction programs on genes with known mRNAs. According to these results, TSSG produces the fewest false positive predictions.
- TSSG uses the promoter.dat file.
- TSSG makes roughly one false positive prediction per 5000 bp.
- The table below shows how accurately TSSG and Prestridge's algorithm locate the TSS on 10 test genes.
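The rate of about one false positive per 5000 bp gives a quick estimate of how many spurious promoter calls to expect in a sequence of a given length. The sketch below assumes that false positives are spread uniformly along the sequence. This is a simplifying assumption for illustration, and `expected_false_positives` is a hypothetical helper name.

```python
BP_PER_FALSE_POSITIVE = 5000  # roughly one false positive per 5000 bp (TSSG)


def expected_false_positives(seq_length_bp: int,
                             bp_per_fp: int = BP_PER_FALSE_POSITIVE) -> float:
    """Expected number of false-positive promoter calls, assuming a
    uniform false-positive rate along the sequence (illustrative only)."""
    return seq_length_bp / bp_per_fp


# The 7637 bp HSCALCAC sequence used in the sample outputs:
print(round(expected_false_positives(7637), 2))  # 1.53
# A 1 Mb genomic region:
print(expected_false_positives(1_000_000))       # 200.0
```

On long genomic regions, even a low per-base rate therefore yields many spurious calls. Predictions are best cross-checked against independent evidence such as known mRNAs.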
References
Solovyev V.V., Salamov A.A. (1997) The Gene-Finder computer tools for analysis of human and model organisms genome sequences. In Proceedings of the Fifth International Conference on Intelligent Systems for Molecular Biology (eds. Rawling C., Clark D., Altman R., Hunter L., Lengauer T., Wodak S.), Halkidiki, Greece, AAAI Press, pp. 294-302.
Solovyev V.V. (2001) Statistical approaches in eukaryotic gene prediction. In Handbook of Statistical Genetics (eds. Balding D. et al.), John Wiley & Sons, Ltd., pp. 83-127.
Solovyev V.V., Shahmuradov I.A. (2003) PromH: Promoters identification using orthologous genomic sequences. Nucleic Acids Res. 31(13):3540-3545.
TSSG/TSSW output
Field descriptions
- First line: name of your sequence.
- Second and third lines: length of the submitted sequence and the LDF (linear discriminant function) threshold.
- Fourth line: number of predicted promoter regions.
- Next lines: positions of the predicted sites, their weights (LDF scores) and the TATA box position, if one was found. Each position is the first nucleotide of the transcript (the TSS position).
- After that, the functional motifs are listed for each predicted region. (+) or (-) indicates the direct or the complementary chain.
- In TSSG output, identifiers of the form S... are motif identifiers from the Ghosh database.
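The header and promoter lines can also be read programmatically. The following is a minimal sketch that assumes the line layout shown in the sample outputs. The regular expressions are written against those samples rather than a published format specification. `parse_summary` and `Promoter` are hypothetical names and are not part of any Softberry tool.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Promoter:
    tss: int                    # first nucleotide of the transcript
    ldf: float                  # LDF score ("weight")
    tata: Optional[int] = None  # TATA box position, if one was predicted


PROMOTER_RE = re.compile(
    r"Pos\.:\s*(\d+)\s+LDF-\s*([\d.]+)(?:\s+TATA box predicted at\s+(\d+))?"
)


def parse_summary(text: str):
    """Return (sequence length, LDF threshold, list of Promoter) from
    TSSG/TSSW-style output; the layout is inferred from sample output."""
    length = int(re.search(r"Length of sequence-\s*(\d+)", text).group(1))
    threshold = float(re.search(r"Threshold for LDF-\s*([\d.]+)", text).group(1))
    promoters: List[Promoter] = []
    for m in PROMOTER_RE.finditer(text):
        tata = int(m.group(3)) if m.group(3) else None
        promoters.append(Promoter(int(m.group(1)), float(m.group(2)), tata))
    return length, threshold, promoters


# Header lines copied from the TSSW sample output for HSCALCAC.
sample = """HSCALCAC 7637 bp DNA PRI 14-MAR-1995
Length of sequence- 7637
Threshold for LDF- 4.00
2 promoter(s) were predicted
Pos.: 1834 LDF- 11.08 TATA box predicted at 1804
Pos.: 7031 LDF- 4.64 TATA box predicted at 7001
"""
length, threshold, promoters = parse_summary(sample)
for p in promoters:
    # Reported promoters are expected to score at or above the threshold.
    print(p.tss, p.ldf, p.tata, p.ldf >= threshold)
```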
TSSG output
HSCALCAC 7637 bp DNA PRI 14-MAR-1995
Length of sequence- 7637
Threshold for LDF- 4.00
1 promoter(s) were predicted
Pos.: 1820 LDF- 16.65 TATA box predicted at 1804
Transcription factor binding sites:
for promoter at position - 1820
1764 (-) S00098 AACCAAT
1608 (-) S01152 AAGTGA
1741 (+) S01153 AARKGA
1608 (-) S01153 AARKGA
1657 (+) S01090 AATGA
1617 (-) S01027 ACGCCC
1577 (+) S00534 ACGTCA
1580 (-) S00534 ACGTCA
1580 (-) S01257 ACGTCAT
..............................
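Each binding-site line has four fields: position, strand, motif identifier and motif sequence. The following minimal parsing sketch also reports each site's offset from the predicted TSS at 1820. It assumes that the listed positions use the same numbering as the TSS, so negative offsets lie upstream. The regular expression matches the sample lines only, and `Site` and `parse_sites` are hypothetical names.

```python
import re
from typing import Iterable, List, NamedTuple


class Site(NamedTuple):
    pos: int       # position as printed in the output
    strand: str    # "+" = direct chain, "-" = complementary chain
    motif_id: str  # motif identifier, e.g. S00098
    seq: str       # motif sequence as printed


SITE_RE = re.compile(r"^\s*(\d+)\s+\(([+-])\)\s+(\S+)\s+(\S+)\s*$")


def parse_sites(lines: Iterable[str]) -> List[Site]:
    """Keep only lines shaped like '1764 (-) S00098 AACCAAT'."""
    sites = []
    for line in lines:
        m = SITE_RE.match(line)
        if m:
            sites.append(Site(int(m.group(1)), m.group(2), m.group(3), m.group(4)))
    return sites


block = """for promoter at position - 1820
1764 (-) S00098 AACCAAT
1608 (-) S01152 AAGTGA
1577 (+) S00534 ACGTCA"""
tss = 1820
for s in parse_sites(block.splitlines()):
    print(s.motif_id, s.strand, s.pos - tss)  # offset relative to the TSS
```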
TSSW output
HSCALCAC 7637 bp DNA PRI 14-MAR-1995
Length of sequence- 7637
Threshold for LDF- 4.00
2 promoter(s) were predicted
Pos.: 1834 LDF- 11.08 TATA box predicted at 1804
Pos.: 7031 LDF- 4.64 TATA box predicted at 7001
Transcription factor binding sites:
for promoter at position - 1834
1752 (+) CHICK$ACRA CCGCCC
1762 (-) HS$BAC_03 CCAAT
1764 (-) RAT$ALBU_2 AACCAAT
1757 (-) HS$APOE_08 GGGCGG
1575 (+) HS$ACHGON_ TGACGTCA
1582 (-) HS$ACHGON_ TGACGTCA
1758 (+) MOUSE$A21C ATTGG
1745 (+) MOUSE$A21C gcccagccctcccATTGGtggagacg
1609 (+) Y$CYC1_09 ctcatttggcgagcGTTGGt
1724 (+) AD$E2L_04 TGACgcA
1577 (+) AD$E4_16 ACGTCA
1580 (-) AD$E4_16 ACGTCA
1580 (-) AD$E4_18 ACGTCAT
1655 (+) HS$EGFR_15 TCAAT
..............................
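The (+)/(-) label also affects how a site's position should be read. Overlapping sites in the TSSW sample suggest a convention: a (+) site starts at the printed position, while a (-) site is printed at its 5' end on the complementary chain, which is the highest direct-chain coordinate of the site. This convention is inferred from the example and is not documented. For instance, ATTGG (+) at 1758 and CCAAT (-) at 1762 cover the same five bases, because CCAAT is the reverse complement of ATTGG. Similarly, CCGCCC (+) at 1752 and GGGCGG (-) at 1757 cover the same six bases. The following sketch converts both kinds of sites to direct-chain intervals. `plus_strand_interval` and `reverse_complement` are hypothetical helpers.

```python
def plus_strand_interval(pos: int, strand: str, site: str) -> tuple:
    """(start, end) of a reported site on the direct chain.

    Inferred convention (not documented by Softberry): a (+) site starts
    at `pos`; a (-) site is reported at its highest direct-chain coordinate.
    """
    n = len(site)
    if strand == "+":
        return (pos, pos + n - 1)
    if strand == "-":
        return (pos - n + 1, pos)
    raise ValueError(f"unknown strand: {strand!r}")


COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a plain A/C/G/T sequence (case preserved)."""
    return seq.translate(COMPLEMENT)[::-1]


# Pairs of sites from the TSSW sample output:
print(plus_strand_interval(1758, "+", "ATTGG"))   # (1758, 1762)
print(plus_strand_interval(1762, "-", "CCAAT"))   # (1758, 1762)
print(reverse_complement("CCAAT"))                # ATTGG
# TGACGTCA is its own reverse complement, so the same site is
# reported on both chains (1575 (+) and 1582 (-)).
print(reverse_complement("TGACGTCA"))             # TGACGTCA
```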
TSSP
TSSP is similar to TSSG/TSSW. The difference is that it uses the RegSite DB (plant database).
RegSite DB is a database developed by Softberry.
- References
Zhang W. et al. (2005) Cis-regulatory element based targeted gene finding: genome-wide identification of abscisic acid- and abiotic stress-responsive genes in Arabidopsis thaliana. Bioinformatics 21(14):3074-3081. To predict transcription start sites (TSSs), the authors combined an A. thaliana cDNA database with TSSP (SoftBerry, http://www.softberry.com).
- TSSP output
- Field descriptions
  - First line: name of your sequence.
  - Second and third lines: length of the submitted sequence and the LDF thresholds (one for TATA+ promoters, one for TATA-less promoters and enhancers).
  - Fourth line: number of predicted promoter/enhancer regions.
  - Next lines: positions of the predicted sites, their weights (LDF scores) and the TATA box position, if one was found. Each position is the first nucleotide of the transcript (the TSS position).
  - After that, the functional motifs are listed for each predicted region. (+) or (-) indicates the direct or the complementary chain.
  - An entry such as "RSP00004 tagaCACGTaga" gives a motif identifier from the Softberry RegSite Plant database together with the similar sequence that was found.
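TSSP reports two thresholds and two kinds of hits, promoters and enhancers. The following sketch parses the prediction lines and checks each hit against a threshold. It assumes that a promoter with a reported TATA box is compared with the TATA+ threshold (0.02), and that TATA-less promoters and enhancers are compared with the TATA-/enhancer threshold (0.04). This reading of the threshold line is an assumption. The number printed after the TATA box position (18.93 in the sample) is not described in the output notes and is ignored here. `TsspHit`, `parse_hits` and `threshold_for` are hypothetical names.

```python
import re
from typing import List, NamedTuple, Optional


class TsspHit(NamedTuple):
    kind: str            # "Promoter" or "Enhancer"
    pos: int             # predicted position
    ldf: float           # LDF score
    tata: Optional[int]  # TATA box position, if reported


HIT_RE = re.compile(
    r"(Promoter|Enhancer) Pos:\s*(\d+)\s+LDF-\s*([\d.]+)(?:\s+TATA box at\s+(\d+))?"
)

THRESHOLD_TATA_PLUS = 0.02               # "Thresholds for TATA+ promoters"
THRESHOLD_TATA_MINUS_OR_ENHANCER = 0.04  # "for TATA-/enhancers"


def parse_hits(text: str) -> List[TsspHit]:
    return [TsspHit(m.group(1), int(m.group(2)), float(m.group(3)),
                    int(m.group(4)) if m.group(4) else None)
            for m in HIT_RE.finditer(text)]


def threshold_for(hit: TsspHit) -> float:
    # Assumption: only promoters with a TATA box use the lower threshold.
    if hit.kind == "Promoter" and hit.tata is not None:
        return THRESHOLD_TATA_PLUS
    return THRESHOLD_TATA_MINUS_OR_ENHANCER


sample = """Promoter Pos: 1522 LDF- 0.13 TATA box at 1488 18.93
Enhancer Pos: 1597 LDF- 0.12"""
for hit in parse_hits(sample):
    print(hit.kind, hit.pos, hit.ldf, hit.tata, hit.ldf >= threshold_for(hit))
```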
- output
tssp Wed Jul 10 02:52:32 EDT 2002
>gi|1902902|dbj|AB001920.1| Oryza sativa (japonica cultivar-group) gene for phos
Length of sequence- 5871
Thresholds for TATA+ promoters - 0.02, for TATA-/enhancers - 0.04
2 promoter/enhancer(s) are predicted
Promoter Pos: 1522 LDF- 0.13 TATA box at 1488 18.93
Enhancer Pos: 1597 LDF- 0.12
Transcription factor binding sites/RegSite DB:
for promoter at position - 1522
1468 (-) RSP00004 tagaCACGTaga
1459 (+) RSP00010 cACGTG
1456 (+) RSP00011 ctccACGTGgt
1461 (+) RSP00016 caTGCAC
1468 (-) RSP00016 caTGCAC
1256 (-) RSP00026 gcttttgaTGACtTcaaacac
1460 (+) RSP00065 ACGTGgcgc
1460 (+) RSP00066 ACGTGccgc
1459 (+) RSP00069 tACGTG
1341 (+) RSP00071 GACGTC
1346 (-) RSP00071 GACGTC
1452 (-) RSP00096 GGTTT
1432 (+) RSP00129 CACGAC
1281 (+) RSP00148 CGACG
1284 (+) RSP00148 CGACG
1315 (+) RSP00148 CGACG
1335 (+) RSP00148 CGACG
1340 (+) RSP00148 CGACG
1365 (+) RSP00148 CGACG
1434 (+) RSP00148 CGACG
1458 (+) RSP00148 CGACG
1347 (-) RSP00148 CGACG
1474 (+) RSP00162 ACACccGagctaaccacaac
1348 (+) RSP00241 CGGTCA
1387 (+) RSP00339 RTTTTTR
1264 (-) RSP00397 AGTGGCGG
1268 (+) RSP00422 ACCGAC
1459 (+) RSP00423 GACGTG
1464 (-) RSP00424 CACGTC
1369 (-) RSP00431 rdygRCRGTTRs
1278 (-) RSP00432 cVacGGTaGGTgg
1249 (-) RSP00436 TTGACT
1260 (+) RSP00463 atttcatggCCGACctgcttttt
1260 (+) RSP00464 acttgatggCCGACctctttttt
1260 (+) RSP00465 aatatactaCCGACcatgagttct
1265 (+) RSP00466 actaCCGACatgagttccaaaaagc
1440 (+) RSP00469 GNGGTG
1260 (-) RSP00469 GNGGTG
1440 (+) RSP00470 GTGGNG
1263 (-) RSP00470 GTGGNG
1257 (-) RSP00470 GTGGNG
1390 (+) RSP00477 TTTAA
1385 (+) RSP00508 gcaTTTTTatca
1502 (-) RSP00508 gcaTTTTTatca
1469 (+) RSP00518 tccctACACgcGtcacaattc
1465 (+) RSP00519 caattcaggACACgtGccctcttca
1474 (+) RSP00521 ACACccG
1474 (+) RSP00523 ACACgcG
1474 (+) RSP00524 ACACgtG
for promoter at position - 1597
1468 (-) RSP00004 tagaCACGTaga
1459 (+) RSP00010 cACGTG
1456 (+) RSP00011 ctccACGTGgt
1461 (+) RSP00016 caTGCAC
1468 (-) RSP00016 caTGCAC
1460 (+) RSP00065 ACGTGgcgc
1460 (+) RSP00066 ACGTGccgc
1459 (+) RSP00069 tACGTG
1341 (+) RSP00071 GACGTC
1346 (-) RSP00071 GACGTC
1452 (-) RSP00096 GGTTT
1432 (+) RSP00129 CACGAC
1315 (+) RSP00148 CGACG
1335 (+) RSP00148 CGACG
1340 (+) RSP00148 CGACG
1365 (+) RSP00148 CGACG
1434 (+) RSP00148 CGACG
1458 (+) RSP00148 CGACG
1347 (-) RSP00148 CGACG
1474 (+) RSP00162 ACACccGagctaaccacaac
..............................
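Several motif strings in the TSSP and TSSG output contain letters other than A, C, G and T. Examples include RSP00339 RTTTTTR, RSP00469 GNGGTG and S01153 AARKGA. The sketch below assumes that these letters follow the standard IUPAC nucleotide ambiguity codes (R = A/G, K = G/T, N = any base, and so on). It checks whether a concrete sequence fits such a pattern. The output description does not say what upper versus lower case means in strings like tagaCACGTaga, so case is ignored here. `motif_to_regex` and `matches` are hypothetical helpers.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_regex(motif: str):
    """Compile an IUPAC motif (any letter case) into a regular expression."""
    return re.compile("".join(IUPAC[ch] for ch in motif.upper()))


def matches(motif: str, site: str) -> bool:
    """True if the whole `site` fits the IUPAC `motif`."""
    return motif_to_regex(motif).fullmatch(site.upper()) is not None


print(matches("AARKGA", "AAGTGA"))    # True
print(matches("GNGGTG", "GAGGTG"))    # True
print(matches("RTTTTTR", "ATTTTTC"))  # False
```

The first check mirrors the TSSG sample, where S01152 (AAGTGA) and the degenerate S01153 (AARKGA) are both reported at 1608 (-). The concrete site also fits the degenerate motif.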