~ ~ Developed by people from the same laboratory as Teacher Tang ~ ~
CD-HIT is a very widely used program for clustering and comparing protein or nucleotide sequences. CD-HIT was originally developed by Dr. Weizhong Li at Dr. Adam Godzik's Lab at the Burnham Institute (now Sanford-Burnham Medical Research Institute).
CD-HIT is very fast and can handle extremely large databases. It significantly reduces the computational and manual effort in many sequence analysis tasks, and it helps in understanding the structure of the data and correcting bias within a dataset.
The CD-HIT package includes CD-HIT, CD-HIT-2D, CD-HIT-EST, CD-HIT-EST-2D, CD-HIT-454, CD-HIT-PARA, PSI-CD-HIT, CD-HIT-OTU, CD-HIT-LAP, CD-HIT-DUP and over a dozen scripts.
- CD-HIT (CD-HIT-EST) groups similar proteins (DNAs) into clusters that meet a user-defined similarity threshold.
- CD-HIT-2D (CD-HIT-EST-2D) compares 2 datasets and identifies the sequences in db2 that are similar to db1 above a threshold.
- CD-HIT-454 identifies natural and artificial duplicates from pyrosequencing reads.
- CD-HIT-OTU clusters rRNA tags into OTUs.
- CD-HIT-DUP identifies duplicates from single or paired Illumina reads.
- CD-HIT-LAP identifies overlapping reads.
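To make the first two items more concrete, here is a minimal Python sketch of threshold-based clustering and of a two-dataset comparison. It is only an illustration under stated assumptions: the function names (`identity`, `greedy_cluster`, `similar_to_db1`) and the example sequences are hypothetical, and the position-by-position identity measure is a toy stand-in. The real CD-HIT programs use proper sequence alignment and word-based filtering, so this is not their implementation.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Toy identity: fraction of matching positions over the shorter sequence.

    Assumption for illustration only; this is not how CD-HIT scores similarity.
    """
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / shorter


def greedy_cluster(seqs: Dict[str, str], threshold: float) -> List[List[str]]:
    """Group sequence IDs; the first ID in each cluster is its representative."""
    # Visit the longest sequences first, so each representative is the
    # longest member of its cluster.
    order = sorted(seqs, key=lambda name: len(seqs[name]), reverse=True)
    clusters: List[List[str]] = []
    for name in order:
        for cluster in clusters:
            representative = cluster[0]
            if identity(seqs[representative], seqs[name]) >= threshold:
                cluster.append(name)
                break
        else:
            # No existing representative is similar enough: start a new cluster.
            clusters.append([name])
    return clusters


def similar_to_db1(db1: Dict[str, str], db2: Dict[str, str],
                   threshold: float) -> List[str]:
    """Return the IDs in db2 that match at least one db1 sequence."""
    return [
        name2
        for name2, seq2 in db2.items()
        if any(identity(seq1, seq2) >= threshold for seq1 in db1.values())
    ]


# Hypothetical example data.
example = {
    "a": "MKTAYIAKQR",
    "b": "MKTAYIAKQ",
    "c": "MKTAYLAKQR",
    "d": "GGGGSSSSPP",
}
clusters = greedy_cluster(example, threshold=0.8)
hits = similar_to_db1({"a": "MKTAYIAKQR"},
                      {"x": "MKTAYIAKQR", "y": "GGGGSSSSPP"},
                      threshold=0.8)
print(clusters)
print(hits)
```

In this sketch a sequence joins the first cluster whose representative it matches at or above the threshold, so raising the threshold tends to produce more, smaller clusters.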
The usage of other programs and scripts can be found in the CD-HIT User's Guide.
CD-HIT is currently maintained by Dr. Li's group (http://weizhongli-lab.org/) at the J. Craig Venter Institute. We thank the National Center for Research Resources for its support (Grant # 1R01RR025030, 2008-2011). We thank all the users who reported bugs and gave us suggestions and comments.
Download:
https://github.com/weizhongli/cdhit/releases
Extract
Installation: sudo make
Freemao
Fafu
CD-HIT usage