A Tomato Sequence-tagged Connector (STC) Database

Lee, S., Mao, L., Main, D., Wood, T., Wing, R. A.
Clemson University Genomics Institute, 100 Jordan Hall, Clemson, SC 29631 USA

To build a tomato STC database for genome sequencing, we are sequencing the ends of BAC clones from a 15x genome-equivalent L. esculentum BAC library (Budiman, 2000). To date, we have generated 4,990 tomato STCs, and 4,310 of them (86.4%) have an average sequence length of 372.4 high-quality bases. All STCs were searched against SwissProt using FASTX (Pearson, 1988) and against all plant sequences downloaded from GenBank using FASTA (Pearson, 1988). With an expectation (E) value cutoff of E < 10⁻⁵, 1,756 sequences (35.19%) showed homology to known sequences. As shown in Fig. 1, about 40% of these 1,756 STCs share sequence similarity with defined gene-related sequences. Various retrotransposons account for another 40% of the STCs with a GenBank match, suggesting that retrotransposons are a major component of the tomato genome. We also found STCs homologous to non-LTR retrotransposons; according to our GenBank search results, this is the first report of such elements in the tomato genome. STCs similar to repetitive elements make up 13% of the matching sequences. The remaining 6%, which we labeled miscellaneous DNA, were homologous to GenBank sequences that are poorly annotated or that represent non-nuclear DNA, such as chloroplast and mitochondrial DNA.

Three retrotransposon polyproteins, namely T17459 (GenBank acc. no.; gypsy-like, tomato), Lere1 (copia-like, tomato; Mao et al., unpublished) and CAA73798.1 (GenBank acc. no.; non-LTR, Beta vulgaris), were used as queries to search all tomato STC sequences with FASTA and TFASTA (Pearson, 1988).
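The E-value screen applied to the SwissProt and GenBank searches can be illustrated with a short sketch. This is not the pipeline used in this study: it assumes the FASTX/FASTA hits have already been parsed into (STC id, subject id, E-value) tuples, and the record layout, the helper names `stcs_with_homology` and `percent`, and the toy hits are all hypothetical. The last two lines only recompute the percentages reported above from the stated counts.

```python
from typing import Iterable, Set, Tuple

# Hypothetical parsed hit record: (stc_id, subject_id, e_value).
# Parsing FASTX/FASTA output into this form is assumed, not shown.
Hit = Tuple[str, str, float]

E_CUTOFF = 1e-5  # the E < 10^-5 cutoff stated in the text


def stcs_with_homology(hits: Iterable[Hit], cutoff: float = E_CUTOFF) -> Set[str]:
    """IDs of STCs with at least one hit whose E-value is below the cutoff."""
    return {stc_id for stc_id, _subject, e_value in hits if e_value < cutoff}


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole` as a percentage, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Made-up toy hits for illustration only (not data from this study).
toy_hits = [
    ("stc_001", "subject_a", 1e-20),
    ("stc_001", "subject_b", 0.5),
    ("stc_002", "subject_c", 1e-3),
    ("stc_003", "subject_d", 2e-6),
]
print(sorted(stcs_with_homology(toy_hits)))  # ['stc_001', 'stc_003']

# Recompute the percentages reported in the text from its counts.
print(percent(4310, 4990))  # 86.37 (reported as 86.4%)
print(percent(1756, 4990))  # 35.19
```

The set counts each STC once even when it has several qualifying hits, which matches reporting 1,756 sequences rather than hits; whether the original analysis merged the SwissProt and GenBank results this way is an assumption.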
In total, these polyprotein queries identified 304 STCs: 195 were homologous to gypsy-like retrotransposons, 92 to copia-like retrotransposons and 17 to non-LTR retrotransposons. Interestingly, the relative proportions of tomato STCs homologous to each retrotransposon type are similar to those found in rice (Fig. 2); in both species, gypsy-like retrotransposons account for more than half of all STCs homologous to retrotransposons (Mao, 2000).

As sequencing has progressed, the proportion of STCs with no homology to GenBank sequences has decreased from 70% in our previous study of 1,205 STCs (Budiman, 2000) to 64%. We expect this proportion to continue to decrease, although slowly, given the large number of retrotransposon sequences expected in the tomato genome. The 4,990 tomato BAC ends and the results of the FASTX and FASTA searches are accessible