ChIP-seq Process Report

Source: Internet
Author: User


I. Summary

The experiment aims to understand the basic principles of ChIP-seq. By reproducing the workflow of the paper "Targeting super-enhancer-associated oncogenes in oesophageal squamous cell carcinoma", we learned to download data from the NCBI and EBI databases, became familiar with basic Linux operations, used R for plotting and Python or shell scripts for basic data processing, processed the data with FastQC, Bowtie, MACS, SAMtools, ROSE and other software, and analysed and discussed the results.

II. Materials

1. Hardware platform

Processor: Intel(R) Core(TM) i7-4710MQ CPU @ 2.50GHz

Installed memory (RAM): 16.0 GB

2. System platform

Windows 8.1, Ubuntu

3. Software platform

① Aspera Connect ② FastQC ③ Bowtie ④ MACS 1.4.2 ⑤ IGV ⑥ ROSE

4. Database resources

NCBI database: https://www.ncbi.nlm.nih.gov/

EBI database: http://www.ebi.ac.uk/

5. Research objects

H3K27ac-antibody ChIP-seq data from the TE7 cell line and its blank control group

H3K27ac-antibody ChIP-seq data from the KYSE510 cell line and its blank control group

Background: Esophageal squamous cell carcinoma (OSCC) is an invasive malignancy. Through a high-throughput small-molecule inhibitor screen, the article identified a highly potent anticancer compound, THZ1, a specific CDK7 inhibitor. RNA-seq showed that low doses of THZ1 selectively inhibit the expression of a set of oncogenes, and further characterisation of these THZ1-sensitive genes showed that they are frequently associated with super-enhancers (SEs). ChIP-seq was used to elucidate the mechanism of CDK7 inhibition in OSCC cells.

Highlights of this article: it maps the locations of SEs in OSCC cells and identifies many SE-associated regulatory elements, and it finds that the small molecule THZ1 specifically inhibits SE-associated transcription, showing strong anticancer activity.

Article PMID: 27196599

III. Methods

1. Downloading and installing Aspera

Go to the Downloads page of the Aspera website, select Aspera Connect Server, click the Windows icon, choose version 3.6.2 and click Download.

Figure 1 Downloading Aspera

For installation and configuration under Linux, refer to the blog post:

http://blog.csdn.net/likelet/article/details/8226368

2. Downloading the ChIP-seq data

1) In NCBI's GEO DataSets database, search for GSE76861 and open GSM2039110, GSM2039111, GSM2039112 and GSM2039113 to obtain their SRX accession numbers.

Figure 2 ChIP-seq data

Figure 3 Obtaining the SRA number

2) Search EBI for the accessions and obtain the ascp download addresses.

Figure 4 ASCP Download Address

3) Download with Aspera and decompress

Download the files with the Aspera (ascp) command and decompress them with gunzip. Prefixing a command with nohup and appending & runs it in the background.
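As an illustration of this step, the sketch below assembles an ascp command for one run. It assumes the EBI (ENA) Aspera endpoint era-fasp@fasp.sra.ebi.ac.uk, ENA's /vol1/fastq directory layout, single-end runs stored as <run>.fastq.gz, and the default Aspera Connect key location on Linux. The -QT, -l 300m and -P 33001 settings are common choices, not values taken from this report, and the helper names are hypothetical; compare the generated path with the ascp address EBI actually shows.

```python
"""Assemble an ascp download command for one ENA run (illustrative sketch)."""
from __future__ import annotations

import shlex
from pathlib import Path

ENA_HOST = "era-fasp@fasp.sra.ebi.ac.uk"  # assumed ENA Aspera endpoint


def ena_fastq_path(run: str) -> str:
    """Return the ENA path of a single-end FASTQ file for a run accession.

    ENA groups runs by their first six characters (e.g. SRR310); accessions
    with more than six digits get an extra sub-directory built from the
    digits after the sixth, zero-padded to three characters.
    """
    digits = run[3:]
    subdir = "" if len(digits) <= 6 else digits[6:].zfill(3) + "/"
    return f"/vol1/fastq/{run[:6]}/{subdir}{run}/{run}.fastq.gz"


def ascp_command(run: str, key: Path, dest: str = ".") -> list[str]:
    """Build the ascp argument list (flags are common choices, not the original's)."""
    return [
        "ascp", "-QT",       # fair transfer policy, no encryption
        "-l", "300m",        # cap the transfer rate at 300 Mbit/s
        "-P", "33001",       # SSH port of the Aspera server
        "-i", str(key),      # Aspera Connect private key
        f"{ENA_HOST}:{ena_fastq_path(run)}",
        dest,
    ]


if __name__ == "__main__":
    key = Path.home() / ".aspera" / "connect" / "etc" / "asperaweb_id_dsa.openssh"
    cmd = ascp_command("SRR3101251", key)
    # Run in the background, then decompress the downloaded file.
    print("nohup " + shlex.join(cmd) + " &")
    print("gunzip SRR3101251.fastq.gz")
```

The same two printed commands are repeated for each run accession obtained from EBI.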

3. Quality inspection with FastQC

3.1 Installing FastQC

FastQC is available as an Ubuntu software package.

Installation command: sudo apt-get install fastqc

3.2 Quality inspection with FastQC

FastQC command:

fastqc -o . -t 5 -f fastq SRR3101251.fastq &

-o . writes the results to the current directory

-t 5 runs 5 threads

-f fastq declares the input format as FASTQ; SRR3101251.fastq is the input file

(Run the command four times, once for each of the four FASTQ files.)

4. Mapping reads with Bowtie

4.1 Installing Bowtie

Bowtie is also available as an Ubuntu software package.

Installation command: sudo apt-get install bowtie

4.2 Downloading the human reference genome

In the paper, reads were aligned to the human reference genome GRCh37/hg19.

The Bowtie website provides a pre-built index of the human reference genome hg19.

Figure 5 Bowtie pre-built hg19 index

Then decompress it: unzip hg19.ebwt.zip

4.3 Aligning reads with Bowtie

Bowtie command:
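A minimal sketch of the alignment call, assuming Bowtie 1 with its default alignment settings (the parameters used for this report are not listed), the unzipped hg19.*.ebwt files in the working directory and four threads; the output file name and helper names are hypothetical. -S makes Bowtie write SAM, which the later MACS and SAMtools steps read.

```python
"""Assemble (and optionally run) a Bowtie 1 alignment against the hg19 index."""
from __future__ import annotations

import shlex
import subprocess


def bowtie_command(index: str, fastq: str, sam: str, threads: int = 4) -> list[str]:
    """Return a Bowtie 1 command line that writes SAM output.

    -p sets the number of alignment threads and -S switches the output
    from Bowtie's native format to SAM.
    """
    return ["bowtie", "-p", str(threads), "-S", index, fastq, sam]


def align(index: str, fastq: str, sam: str, threads: int = 4) -> None:
    """Run the alignment, raising CalledProcessError if Bowtie fails."""
    subprocess.run(bowtie_command(index, fastq, sam, threads), check=True)


if __name__ == "__main__":
    # "hg19" is the basename shared by the unzipped hg19.*.ebwt index files.
    print(shlex.join(bowtie_command("hg19", "SRR3101251.fastq", "SRR3101251.sam")))
```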

5. Finding peak enrichment regions with MACS

5.1 Installing MACS14

Download it from the website of Xiaole Shirley Liu's lab: http://liulab.dfci.harvard.edu/MACS/Download.html

After extracting the archive, change into its directory and run:

python setup.py install

5.2 Modelling with MACS to find peak enrichment regions

MACS command:
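A sketch of a MACS 1.4 call for one cell line, assuming SAM input from the Bowtie step and hypothetical file names (TE7_H3K27ac.sam for the H3K27ac sample, TE7_control.sam for its control). -w and -S are included on the assumption that they produced the MACS_wiggle folder and the single afterfiting_all.wig file discussed in the results; -p 1e-5 is MACS 1.4's default cutoff, passed explicitly. The same call with the KYSE510 files would cover the second cell line.

```python
"""Assemble a MACS 1.4 (macs14) peak-calling command for one cell line."""
from __future__ import annotations

import shlex


def macs14_command(treat: str, control: str, name: str,
                   genome: str = "hs", pvalue: str = "1e-5") -> list[str]:
    return [
        "macs14",
        "-t", treat,      # ChIP (H3K27ac) alignments
        "-c", control,    # control alignments
        "-f", "SAM",      # input format produced by bowtie -S
        "-g", genome,     # effective genome size shortcut: hs = human
        "-n", name,       # prefix for NAME_peaks.xls, NAME_peaks.bed, ...
        "-p", pvalue,     # p-value cutoff for peak detection
        "-w",             # write wiggle pileup files (the MACS_wiggle folder)
        "-S",             # one wig file per sample instead of one per chromosome
    ]


if __name__ == "__main__":
    # Hypothetical file names for the TE7 treatment and control SAM files.
    print(shlex.join(macs14_command("TE7_H3K27ac.sam", "TE7_control.sam", "TE7_H3K27ac")))
```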

6. Visualization with IGV

6.1 Data normalisation

A Python program was written to normalise the wig files.

It calculates RPM values for the TE7_H3K27AC and KYSE510_H3K27AC wig files (the treat files in the wig folders generated by MACS).
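The normalisation can be sketched as follows, assuming the MACS wig files use the variableStep layout (header lines followed by "position value" lines) and that the total read count is supplied by the user, for example the number of mapped reads in the treatment SAM file. Each value is rescaled as RPM = reads at the position × 1,000,000 / total reads; the function names are hypothetical.

```python
"""Normalise a MACS variableStep wig file to reads per million (RPM)."""
from __future__ import annotations

import gzip
from typing import Iterable, Iterator


def rpm(count: float, total_reads: int) -> float:
    """RPM = count * 1,000,000 / total_reads (multiply first to limit rounding)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return count * 1_000_000 / total_reads


def normalise_wig_lines(lines: Iterable[str], total_reads: int) -> Iterator[str]:
    """Rescale 'position value' data lines; pass track/variableStep lines through."""
    for line in lines:
        line = line.rstrip("\n")
        fields = line.split()
        if len(fields) == 2 and fields[0].isdigit():
            pos, value = fields
            yield f"{pos}\t{rpm(float(value), total_reads):.4f}"
        else:
            yield line


def normalise_wig_file(src: str, dst: str, total_reads: int) -> None:
    """Read a .wig or .wig.gz file and write the RPM-scaled copy to dst."""
    opener = gzip.open if src.endswith(".gz") else open
    with opener(src, "rt") as fin, open(dst, "w") as fout:
        for out_line in normalise_wig_lines(fin, total_reads):
            fout.write(out_line + "\n")
```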

RPM formula: RPM = (number of reads at a position ÷ total reads on all chromosomes) × 1,000,000

6.2 Converting the format with wigToBigWig
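UCSC's wigToBigWig takes the wig file, a chromosome sizes file and the output name (wigToBigWig in.wig chrom.sizes out.bw); an hg19 sizes file can be produced with UCSC's fetchChromSizes hg19 > hg19.chrom.sizes. The sketch below also clips the wig before conversion, as a precaution on the assumption that wigToBigWig rejects chromosomes missing from the sizes file and windows that run past a chromosome end. It assumes variableStep input; the helper names are hypothetical.

```python
"""Prepare a wig file for UCSC wigToBigWig and build the conversion command."""
from __future__ import annotations

from typing import Iterable, Iterator


def read_chrom_sizes(lines: Iterable[str]) -> dict[str, int]:
    """Parse a two-column chrom.sizes file: chromosome name and length."""
    sizes: dict[str, int] = {}
    for line in lines:
        if line.strip():
            chrom, size = line.split()[:2]
            sizes[chrom] = int(size)
    return sizes


def clip_wig_lines(lines: Iterable[str], sizes: dict[str, int]) -> Iterator[str]:
    """Drop variableStep sections for unknown chromosomes and data lines
    whose window (position .. position + span - 1) runs past the chromosome end."""
    chrom, span = None, 1
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("variableStep"):
            params = dict(f.split("=", 1) for f in line.split()[1:])
            chrom, span = params["chrom"], int(params.get("span", "1"))
            if chrom in sizes:
                yield line
        elif chrom is not None and line[:1].isdigit():
            start = int(line.split()[0])  # wig positions are 1-based
            if chrom in sizes and start + span - 1 <= sizes[chrom]:
                yield line
        else:
            yield line


def wig_to_bigwig_command(wig: str, chrom_sizes: str, bigwig: str) -> list[str]:
    # wigToBigWig usage: wigToBigWig in.wig chrom.sizes out.bw
    return ["wigToBigWig", wig, chrom_sizes, bigwig]
```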

6.3 Installing IGV (Integrative Genomics Viewer) to visualize results

Download the Windows version from the IGV website (http://software.broadinstitute.org/software/igv/download) and install it as prompted.

Double-click igv.jar to open it, or run the .bat file as administrator.

First load the hg19 genome, then load the two normalised bw files.

7. Identifying enhancers with ROSE

7.1 Installing ROSE

ROSE can be downloaded from http://younglab.wi.mit.edu/super_enhancer_code.html; the download includes 2.7 GB of sample data.

7.2 Data preprocessing
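The preprocessing summarised in the results (SAM to BAM, sort, index) can be sketched as below. It assumes SAMtools 1.3 or later, where sort takes -o for the output file, and derives output names from the input name; preprocess is a hypothetical helper. ROSE reads the sorted BAM and expects its .bai index next to it.

```python
"""SAM -> BAM -> sorted BAM -> index, as ROSE needs sorted, indexed BAM files."""
from __future__ import annotations

import subprocess


def samtools_commands(sam: str) -> list[list[str]]:
    """Return the three SAMtools steps for one SAM file (SAMtools >= 1.3 syntax)."""
    stem = sam[:-4] if sam.endswith(".sam") else sam
    bam = f"{stem}.bam"
    sorted_bam = f"{stem}.sorted.bam"
    return [
        ["samtools", "view", "-b", "-S", "-o", bam, sam],  # SAM text -> BAM
        ["samtools", "sort", "-o", sorted_bam, bam],       # sort by coordinate
        ["samtools", "index", sorted_bam],                 # writes sorted_bam + ".bai"
    ]


def preprocess(sam: str) -> None:
    """Run the steps in order, stopping at the first failure."""
    for cmd in samtools_commands(sam):
        subprocess.run(cmd, check=True)
```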

7.3 Running ROSE
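A sketch of this step under stated assumptions: ROSE takes its constituent enhancers as a GFF file with the chromosome in column 1, a unique ID in columns 2 and 9, start and end in columns 4 and 5 and the strand in column 7, so the MACS peaks.bed is converted first; ROSE_main.py is a Python 2 script run from inside the ROSE directory. The 12,500 bp stitching distance is ROSE's default and the 2,500 bp TSS exclusion is a common choice; neither is stated in this report, and all file and function names are hypothetical.

```python
"""Convert MACS peaks.bed to ROSE's GFF input and build the ROSE_main.py call."""
from __future__ import annotations

from typing import Iterable, Iterator


def bed_to_rose_gff(bed_lines: Iterable[str]) -> Iterator[str]:
    """One GFF line per peak: chrom, ID, -, start, end, -, strand '.', -, ID.

    BED starts are 0-based while GFF is 1-based, so start is shifted by one.
    """
    for line in bed_lines:
        if not line.strip() or line.startswith(("track", "#")):
            continue
        chrom, start, end, name = line.split()[:4]
        yield "\t".join([chrom, name, "", str(int(start) + 1), end, "", ".", "", name])


def rose_command(gff: str, treat_bam: str, control_bam: str, outdir: str,
                 genome: str = "HG19") -> list[str]:
    return [
        "python", "ROSE_main.py",
        "-g", genome,       # genome build of the annotation ROSE uses
        "-i", gff,          # constituent enhancers (the MACS peaks)
        "-r", treat_bam,    # sorted, indexed H3K27ac BAM used for ranking
        "-c", control_bam,  # sorted, indexed control BAM
        "-o", outdir,
        "-s", "12500",      # stitching distance in bp
        "-t", "2500",       # exclude peaks within 2,500 bp of a TSS
    ]
```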

7.4 Gene annotation
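Gene annotation can be sketched with ROSE's ROSE_geneMapper.py, assuming its -g (genome) and -i (enhancer table) options; check the script's help for the version in use. The reader function follows the layout described in the results of this report (enhancer ID in column 1, gene names from column J onward) and assumes a tab-separated file with one header row; its name is hypothetical.

```python
"""Call ROSE_geneMapper.py and read gene names from its enhancer-to-gene table."""
from __future__ import annotations

from typing import Iterable


def gene_mapper_command(enhancer_table: str, genome: str = "HG19") -> list[str]:
    # -g: genome build, -i: ROSE enhancer table to annotate
    return ["python", "ROSE_geneMapper.py", "-g", genome, "-i", enhancer_table]


def genes_from_column_j(lines: Iterable[str]) -> dict[str, list[str]]:
    """Map each enhancer ID (column 1) to the non-empty cells from column J on.

    Comment lines starting with '#' are skipped; the first remaining line
    is treated as the header row.
    """
    result: dict[str, list[str]] = {}
    header_seen = False
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        if not header_seen:
            header_seen = True
            continue
        cells = line.rstrip("\n").split("\t")
        result[cells[0]] = [c for c in cells[9:] if c]  # column J is index 9
    return result
```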

7.5 Writing R programs to plot enhancers and their neighbouring genes

Figure 6 TE7.R Program

Figure 7 KYSE510.R Program

IV. Results

1. ChIP-seq data download

ChIP-seq data download and decompression results

Figure 8 ChIP-seq data

2. FastQC quality inspection

Data quality Checks

Figure 9 Quality check files

Figure 10 Quality check results
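To go through the quality reports without opening every HTML page, the PASS/WARN/FAIL flags can be read from the summary.txt file that FastQC stores in each *_fastqc.zip archive. This is a sketch assuming the tab-separated status, module, file-name layout of summary.txt; the function names are hypothetical.

```python
"""Summarise FastQC PASS/WARN/FAIL flags from summary.txt."""
from __future__ import annotations

import zipfile
from collections import defaultdict


def parse_summary(text: str) -> dict[str, list[str]]:
    """Group module names by status; each line is STATUS<TAB>Module<TAB>File."""
    by_status: dict[str, list[str]] = defaultdict(list)
    for line in text.splitlines():
        if line.strip():
            status, module, _filename = line.split("\t")
            by_status[status].append(module)
    return dict(by_status)


def read_summary(zip_path: str) -> dict[str, list[str]]:
    """Read summary.txt straight from a FastQC *_fastqc.zip archive."""
    with zipfile.ZipFile(zip_path) as zf:
        name = next(n for n in zf.namelist() if n.endswith("/summary.txt"))
        return parse_summary(zf.read(name).decode("utf-8"))
```

For example, read_summary("SRR3101251_fastqc.zip") groups the modules of that run by status, so the FAIL and WARN lists can be checked first.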

3. Mapping reads with Bowtie

3.1 Genome files

Figure 11 Human reference genome hg19 index

3.2 Mapping results

Figure 12 Overall mapping results

Figure 13 Generated SAM files
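The overall mapping rate can be recomputed from a generated SAM file using the FLAG field, where bit 0x4 marks an unmapped read. This sketch counts each read once, skipping header lines and any secondary or supplementary records; it assumes SAM files as produced by bowtie -S, and the function name is hypothetical.

```python
"""Count total and mapped reads in a SAM file using the FLAG field."""
from __future__ import annotations

from typing import Iterable


def mapping_stats(sam_lines: Iterable[str]) -> tuple[int, int, float]:
    """Return (total reads, mapped reads, mapping rate in %)."""
    total = mapped = 0
    for line in sam_lines:
        if line.startswith("@"):        # header lines
            continue
        fields = line.split("\t")
        if len(fields) < 11:            # not a complete alignment record
            continue
        flag = int(fields[1])
        if flag & 0x900:                # secondary (0x100) or supplementary (0x800)
            continue
        total += 1
        if not flag & 0x4:              # 0x4 = read unmapped
            mapped += 1
    rate = 100.0 * mapped / total if total else 0.0
    return total, mapped, rate
```

For a real file, pass an open handle, e.g. mapping_stats(open("SRR3101251.sam")).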

4. Finding peak enrichment regions with MACS

4.1 MACS result files

Figure: Results for the TE7 treatment and control groups

Figure: Results for the KYSE510 treatment and control groups

4.2 Interpretation of the MACS results

From left to right, the columns of peaks.xls are: the chromosome name of the peak, the peak start position, the peak end position, the peak length, the summit (peak top) position, the number of reads (tags) in the peak, -10*log10(p-value) (a confidence measure), the fold enrichment of the peak, and the false discovery rate FDR(%) (the smaller, the better).

Figure: peaks.xls file
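Because peaks.xls is a tab-separated text file despite its extension, it can be filtered directly. A sketch, assuming comment lines start with # and that the FDR column is named FDR(%) as MACS 1.4 writes it; the 5% threshold and the function names are hypothetical.

```python
"""Read a MACS 1.4 peaks.xls table and keep peaks below an FDR threshold."""
from __future__ import annotations

from typing import Iterable

FDR_COLUMN = "FDR(%)"  # column name as written by MACS 1.4 (assumed)


def read_peaks_xls(lines: Iterable[str]) -> list[dict[str, str]]:
    """Skip '#' comment and blank lines; the first remaining line is the header."""
    header: list[str] | None = None
    peaks: list[dict[str, str]] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if header is None:
            header = fields
        else:
            peaks.append(dict(zip(header, fields)))
    return peaks


def filter_by_fdr(peaks: list[dict[str, str]], max_fdr: float = 5.0) -> list[dict[str, str]]:
    """Keep peaks whose FDR (in percent) is at most max_fdr."""
    return [p for p in peaks if float(p[FDR_COLUMN]) <= max_fdr]
```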

negative_peaks.xls: when a control group is supplied, MACS calls peaks twice. The first run uses the experimental (treatment) group as treatment and the control group as control; the second run swaps them, treating the control group as the experimental group and the experimental group as the control. negative_peaks.xls holds the peaks from this swapped run, and MACS uses them to estimate the empirical FDR reported in peaks.xls.

Figure: negative_peaks.xls file
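A sketch of the empirical FDR obtained from the two runs, assuming -10*log10(p-value) scores taken from peaks.xls (positive peaks) and negative_peaks.xls (negative peaks). For each positive peak, the number of negative peaks scoring at least as high is divided by the number of positive peaks scoring at least as high, capped at 100%. MACS's own implementation may differ in details such as tie handling; the function name is hypothetical.

```python
"""Empirical FDR from positive and negative (swapped) peak scores."""
from __future__ import annotations

from bisect import bisect_left


def empirical_fdr(pos_scores: list[float], neg_scores: list[float]) -> list[float]:
    """FDR (%) for each positive peak, using -10*log10(p) scores.

    For a peak with score s: FDR = (#negative peaks with score >= s) /
    (#positive peaks with score >= s) * 100, capped at 100.
    """
    pos_sorted = sorted(pos_scores)
    neg_sorted = sorted(neg_scores)
    fdrs = []
    for s in pos_scores:
        n_pos = len(pos_sorted) - bisect_left(pos_sorted, s)  # always >= 1
        n_neg = len(neg_sorted) - bisect_left(neg_sorted, s)
        fdrs.append(min(100.0, 100.0 * n_neg / n_pos))
    return fdrs
```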

The peaks.bed file is a simplified version of peaks.xls; from left to right its columns are: the chromosome name of the peak, the peak start position, the peak end position, the MACS name of the peak, and -10*log10(p-value) (a confidence measure).

Figure: peaks.bed file

summits.bed contains the peak summits; from left to right its columns are: the chromosome name, the summit position, the MACS name of the peak, and the summit height.

Figure: summits.bed file

The MACS_wiggle folder contains a control folder and a treat folder, holding the read pileup of the control and treatment groups in 50 bp steps. In each wig file, the first column is the chromosome position and the second is the total number of reads (tags) counted over the 50 bp window that starts at that position.

Figure: afterfiting_all.wig file in the wiggle folder

The model.r file can be run with R to draw the bimodal (double-peak) model as a PDF.

Figure: model.r file

Figure: TE7 bimodal model

Figure: KYSE510 bimodal model

5. Visualizing peaks with IGV

5.1 Comparing the normalised wig files with the published data

Figure: Overall peak statistics compared with the paper

5.2 Overall visualization of peaks in IGV

Figure: IGV visualization

6. ROSE analysis results

6.1 Data preprocessing results

SAMtools converts the SAM files to BAM files, sorts them and then indexes them.

Figure: BAM files and BAI indexes

6.2 ROSE enhancer classification results

Figure: TE7 enhancer classification results

Figure: KYSE510 enhancer classification results

From left to right, the columns of peaks_AllEnhancers.table.txt are: enhancer region ID, chromosome, enhancer start position, end position, number of peaks stitched into the enhancer, enhancer size, treatment group signal (peak height), control group signal (peak height), enhancer rank, and whether the enhancer is a super-enhancer.

Figure: peaks_AllEnhancers.table.txt file

In peaks_plot_points.png, the y-axis is column G minus column H of peaks_AllEnhancers.table.txt, i.e. the treatment signal after subtracting the control signal, and the x-axis is the rank of all enhancers. Enhancers further to the right of the plot are more likely to be super-enhancers.
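The ranking behind this plot can be sketched as follows. ROSE is described as placing the super-enhancer cutoff where a line with slope 1, after scaling both axes to the same range, is tangent to the ranked signal curve; in raw units that slope is (max - min) / n. This sketch assumes the treatment and control signals are columns G and H of peaks_AllEnhancers.table.txt, clips negative differences to 0, and finds the tangent point by a direct argmin. ROSE's own R code may differ in detail, and the function names are hypothetical.

```python
"""Hockey-stick cutoff separating super-enhancers from typical enhancers."""
from __future__ import annotations


def super_enhancer_cutoff(signals: list[float]) -> float:
    """Return the signal where a line of slope (max - min) / n touches the
    ranked signal curve from below (the tangent point).

    The point with the fewest values below such a line is the one
    minimising value[i] - slope * i, so a simple argmin suffices.
    """
    if not signals:
        raise ValueError("need at least one enhancer signal")
    values = sorted(max(0.0, v) for v in signals)  # negative signals clipped to 0
    n = len(values)
    slope = (values[-1] - values[0]) / n
    best = min(range(n), key=lambda i: values[i] - slope * i)
    return values[best]


def call_super(treat: list[float], control: list[float]) -> list[bool]:
    """Flag enhancers whose treatment-minus-control signal exceeds the cutoff."""
    signal = [max(0.0, t - c) for t, c in zip(treat, control)]
    cutoff = super_enhancer_cutoff(signal)
    return [s > cutoff for s in signal]
```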

Figure: te7_peaks_plot_points.png

Figure: kyse510_peaks_plot_points.png

6.3 Gene annotation results

In AllEnhancers_enhancer_to_gene.txt, the columns from column J onward give the names of the genes closest to each enhancer.

In AllEnhancers_gene_to_enhancer.txt, column 1 is the gene name, followed by the names of the neighbouring peaks.

Figure: AllEnhancers_enhancer_to_gene.txt file
