![]() |
![]() |
|
Methods Library List
Bottom
Tips
Tip Page: ArrayQuest home>manage>Methods_Library_List
Below is the list of methods currently held in the ArrayQuest Methods Library. A key feature of ArrayQuest is that new analysis methods can be added to the Methods Library. Adding a new method to the library requires that it be submitted to the system administrator for approval and implementation. |
| 1 |
| No. | Script Contributors | Data Source, File type |
Method Title, Description, Parameters and Script Code |
| 10 | Saurin D. Jani | MUSC DNA Microarray DB; CEL |
RMA Normalization and Calculation of MAS5.0 Detection Calls for Affymetrix GeneChips: .CEL files only: Expression intensity and A/P call output Description:
Description of Method No. 10
Parameters:
This method performs Robust Multichip Analysis (RMA), generating normalized expression intensities for any number of Affymetrix GeneChip data sets. The method also calculates Affymetrix MAS5.0 detection calls (absence or presence) for all genes. The outputs of this method are an Excel file containing normalized expression intensity values and MAS5.0 detection calls as well as Box plots and histograms of the data before and after normalization. Bioconductor and R packages used in this method:
Input or Requirements:
Script Contributor:Author: Saurin D. Jani, M.S. Address: Cell Biology and Anatomy Department, Medical University of South Carolina, Charleston, SC 29425, USA Email: jani@musc.edu Tel.: 843-792-1340 Output or Result FilesThis method will create the following five output files. One Microsoft Excel file of normalized intensities transformed into log base 2 for all genes, four JPEG files of box plots and histograms of expression intensities before and after normalization and one PDF file called Affymetrix_files_Quality_Control_Report.pdf
Setting Parameters for Method 10
Copyright Notice This page and all pages on this site are copyrighted (2006) by Gary Argraves. This page may not be reproduced in whole or in part without the express permission of the author. -Notes for Paramaeter Code AND Parameter File List Entries-------------------
1. To run this analysis method the user must manually enter sample names
'aliases' into the 2nd line of the "Parameter Code" after the equal
sign (each entry separated by a comma). The order of sample
filename aliases listed in the Parameter Code section must correspond
with the order of actual filenames listed in the File Parameter List.
2. DO NOT MODIFY "-write" lines
3. The user may choose any alias to describe the corresponding filename.
The use of the terms 'control' and 'exp' in the example below
is arbitrary. Simply replace the alias names after the = sign with
alias names of your choice.
Parameter File List Entry:
4. To list the actual filenames below press either the "View Data Sets for
this Project" button OR the "View Data Sets for this Analysis Process"
button above. After pressing one of these buttons, all data sets will
be displayed. You may then select a data set by clicking the 'No.'
buttons next to a given dataset, and it will be added to the parameter
box. If you would like to add individual files from your Dataset, copy
and paste individual file names each including the "-file" extension
(i.e., -file 'filename').
-Notes 'end'
------------------------------------------------------------------------
Parameter Code:
-write 'file:script10_samples.txt'
Group Sample Names (filename alias) = control1, control2, control3, exp1, exp2, exp3
-write 'end'
------------------------------------------------------------------------
File Parameter List:
# List actual filenames: SEE Note #4 above. 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85
----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
1 # !R --vanilla This is the key word to instruct 'pgrd' to launch this script.
2
3 # -------------------------------------------------------------------------------
4 # Method Title : RMA Normalization of Affymetrix Data:.CEL files only:
5 # Expression intensity output
6 # Method
7 # Description : This method performs Robust Multiple Chip Analysis (RMA)
8 # and generates normalized expression intensities for any number
9 # of Affymetrix GeneChip data sets. The outputs of this method are
10 # an Excel file containing normalized expression intensity values
11 # as well as Box plots and histograms of the data before and after
12 # normalization.
13 # -------------------------------------------------------------------------------
14 # This method uses the following R/BioC programs:
15 # Bioconductor
16 # packages : Biobase, affy, affyPLM, annaffy, AnnBuilder, annotate,
17 # simpleaffy, makecdfenv, marray, genefilter, GOstats,
18 # vsn, globaltest, geneplotter, graph, Rgraphviz, RBGL,
19 # prada, siggenes, stam, multtest
20 # For more information on Bioconductor packages:
21 # http://www.bioconductor.org/packages/bioc/stable/src/contrib/html/
22 # R
23 # contributed
24 # packages : cluster, stats, XML
25 # For more information on R contributed packages:
26 # http://cran.r-project.org/src/contrib/PACKAGES.html
27 #
28 # Biocondutor
29 # metadata
30 # packages : Annotation, CDF and Probe packages
31 # http://www.bioconductor.org/packages/data/annotation/stable/src/contrib/html/
32
33 # -------------------------------------------------------------------------------
34 # Author : Suarin Jani, M.S.
35 # Contact : Department of Cell Biology and Anatomy
36 # Medical University of South Carolina
37 # Charleston, SC - 29425, USA
38 # Email : jani@musc.edu
39 # Tel. : (843)-792-1340
40 # -------------------------------------------------------------------------------
41
42 # ####################################################################################
43 # Load Bioconductor Libraries from R library direcotry
44 # ####################################################################################
45 system("scp $centerHost:/home/powerGene/lib/R_script/BioCLibs1.R .");
46 source("BioCLibs1.R");
47 system("rm BioCLibs1.R");
48 # ####################################################################################
49 # Load Error Library from R library direcotry
50 # ####################################################################################
51 system("scp $centerHost:/home/powerGene/lib/R_script/errorLib.R .");
52 # system("cp /saurin/powerGene/PG_Libs/errorLib.R .");
53 source("errorLib.R");
54 system("rm errorLib.R");
55
56 # ####################################################################################
57 # Reading CEL files
58 # ####################################################################################
59 # ReadAffy() function from Bioconductor reads all .CEL files that exist in the
60 # current method's working directory.
61 # Below shown "paramFilenames.txt" file is generated on ArrayQuest which has
62 # all .CEL file names as entered by User in order
63 # to copy those files to the method's working directory.
64
65 my.filenames <- readLines("paramFilenames.txt");
66 parFile <- "paramFilenames.txt";
67
68 # my.filenames <- list.files(,"CEL"); # user can use this commnad for
69 # local analysis where my.filenames has all file names that have ".CEL" extention.
70 f <- as.matrix(my.filenames);
71 n <- nrow(f); # get the number of CEL files in the working directory.
72
73 # Check if CEL file does not exist (i.e.,n == 0), generate error message and
74 # quit R session. errors.txt will be generated as shown below.
75 if (n == 0)
76 { msg <- "Error File";
77 msg <- "This Method requires Affymetrix GeneChip .CEL files. Use .CEL files.";
78 print(msg);
79 Errmsg <- as.matrix(msg);
80 write(Errmsg,"errors.txt");
81 q("no",1); # script will quit from R
82 }
83
84 # Make empty Phenodata and read .CEL files into "abatch" object
85 myf <- my.filenames;
86 # my.pheno <- read.phenoData(sampleNames = my.filenames);
87 my.pheno <- read.AnnotatedDataFrame(parFile,sep = "\n", header = FALSE);
88 abatch <- read.affybatch(filenames = my.filenames,phenoData = my.pheno);
89
90 # ####################################################################################
91 # Input parameter parsing - Starts.
92 # ####################################################################################
93 # Based on input parameters modify existing Pheno DATA Object for microarray
94 # experiment and attach to "abatch" .CEL file object
95 # Below shown "script10_samples.txt" file has all parameter information entered by User
96
97 s1 <- readLines("script10_samples.txt");
98 smp1 <- as.matrix(unlist(strsplit(s1[1], "=",fixed = TRUE)));
99 smp1[2] <- trimWhiteSpace(smp1[2]);
100 smpX <- as.matrix(unlist(strsplit(smp1[2], ",",fixed = TRUE)));
101 # ---------------------------------------------------------
102 smpX;
103 smpC <- as.character(smpX);
104 smpC;
105 # Removes white spaces from sample names
106 for(c in 1:n){ smpX[c] <- trimWhiteSpace(smpX[c]); }
107 # --------------
108 smpX;
109 smpC <- as.character(smpX);
110 smpC;
111 # ---------------------------------------------------------
112 totalfiles <- nrow(smpX);
113
114 # Generate errror file, if input files does not match with sample names provided
115
116 if(totalfiles != n)
117 {
118 msg <- paste(geterror(1),"\n",geterror(2),n,"\n",geterror(5),totalfiles,"\n");
119 print(msg);
120 Errmsg <- as.matrix(msg);
121 write(Errmsg,"errors.txt");
122 q("no"); # script will quit from R
123 }
124
125 sampleNames(abatch)<- as.character(smpC);
126 samples <- sampleNames(abatch);
127 print(sampleNames(abatch));
128 print(samples);
129 abatch;
130
131 # ------------------------------------------------------------------
132 # ---- Adding pheno data ------------
133 # ------------------------------------------------------------------
134 cl <- numeric();
135 cl <- c(cl,rep(1,n));
136 pheno = factor(as.character((cl)));
137 pheno;
138 tmp <- pData(abatch);
139 tmp <- cbind(tmp, cl = pheno);
140 pData(abatch) <- tmp;
141 abatch;
142 pData(abatch);
143
144
145 # ####################################################################################
146 # Input parameter parsing - Ends.
147 # ####################################################################################
148 # After reading .CEL files, annotation information (which Affymetrix GeneChip used) is
149 # extracted from .CEL files and stored in "chiptype" variable
150 chiptype <- annotation(abatch);
151
152
153 # ####################################################################################
154 # Generating Quality control Report of affy batch data, raw data. object is abatch
155 # ####################################################################################
156 # Affy GeneChip type Chicken Genome Array, Affy QCReport is not being generated.
157 # (Work in progress, but for now...below)
158
159 if(chiptype != "chicken")
160 {
161 QCReport(abatch,file = "Affymetrix_CEL_Files_Quality_Control_Report.pdf");
162 }
163
164 # Based on Affymetrix GeneChip, loading Affymetrix Environment Libraries which is
165 # used for annotations of Affymetrix Probe IDs
166 system("scp $centerHost:/home/powerGene/lib/R_script/AffyChipTyAnnotateEnv.R .");
167 source("AffyChipTyAnnotateEnv.R");
168 system("rm AffyChipTyAnnotateEnv.R");
169 # ####################################################################################
170 # Normalization of Affymetrix Raw data using RMA process
171 # ####################################################################################
172 # RMA function performs normalization, background correction and calculating
173 # expression values for affymetrix .CEL "abatch" object.
174
175 myRMA<- rma(abatch);
176 MatrixRMA <- exprs(myRMA); # converting expression object to Matrix
177 class(MatrixRMA); # displaying class of eset object
178 write.exprs(myRMA,file="Expression_data_after_RMA_normalization_for_all_genes.xls");
179
180 # ####################################################################################
181 # Normalization : MAS5 starts
182 # ####################################################################################
183
184 myMAS5<- mas5(abatch); # this function is fast and which reads only CEL file exists in current user dir.
185 MatrixMAS5 <- exprs(myMAS5); # converting this expression object to Matrix for future use
186
187 # ####################################################################################
188 # Generate MAS5 - P (presence) and A (absence) calls
189 # ####################################################################################
190
191 MASPACalls <- mas5calls(abatch);
192 write.exprs(MASPACalls,file="Expression_data_PMA_Calls_After_MAS5_Normalization_for_All_Genes.xls");
193
194 MatrixMASPACalls <- exprs(MASPACalls);
195
196 # ####################################################################################
197 # Generate Combination file of RMA and MAS5 PMA calls : All Genes
198 # ####################################################################################
199
200 CombA <- character();
201 CombB <- character();
202
203 for (i in 1:n)
204 {
205 CombA <- cbind(CombA,as.matrix(as.numeric(MatrixRMA[,i])),as.matrix(as.character(MatrixMASPACalls[,i])));
206 # old code CombB <- c(CombB,rep(colnames(MatrixRMA)[i],2))
207 CombB <- c(CombB,paste(colnames(MatrixRMA)[i],"RMA"),paste(colnames(MatrixRMA)[i],"MAS5"));
208 }
209
210 rownames(CombA) <- rownames(MatrixRMA);
211 colnames(CombA) <- CombB;
212 CombA <- as.matrix(CombA);
213
214 All_rows <- rownames(MatrixRMA);
215 NewC <- as.matrix(All_rows);
216 colnames(NewC) <- c("Probe IDs");
217 NewC;
218
219 NewCombX <- cbind(NewC,CombA);
220 write.table(NewCombX,file="Expression_data_Combination_of_RMA_and_MAS5_PMA_Calls.xls", append = TRUE, quote = TRUE, sep = "\t",eol = "\n", na = "NA", dec = ".", row.names = FALSE,col.names = TRUE, qmethod = c("escape", "double"));
221
222 # End of script
|
| 12 | Saurin D. Jani | MUSC DNA Microarray DB; CEL |
Find differentially expressed genes based on fold-change, p-value and/or FDR parameters: RMA normalization; .CEL files only: ClassA vs ClassB Description: Description of Method No. 12This method is used to analyze Affymetrix DNA microarray data that can be obtained from MUSC DNA microarray database or uploaded to the ArrayQuest web-server into a private database. This method is limited to analysis of a two condition experiment (e.g., control versus experimental or wild-type versus mutant). The method normalizes hybridization data using RMA, finds differentially expressed genes based on fold-change, t-test and FDR thresholds, collects annotations for these genes, performs hierarchical clustering of the gene expression profiles and renders a heat map of the expression profiles. A requirement is that there are two or more replicates for each condition. Bioconductor and R packages used in the development of this method:
Input or Requirements:
Script ContributorAuthor: Saurin D. Jani, M.S. Address: Cell Biology and Anatomy Department, Medical University of South Carolina, Charleston, SC 29425, USA Email: jani@musc.edu Tel.: 843-792-1340 Description of Analysis Steps
OutputThe method outputs the following results:
Setting Parameters for Method 12Parameters for executing this method are regulated by the following lines of text. This text is present in "Method parameter box" at Step 4 (Choose Data Set for Analysis) of the ArrayQuest.Line 1: -write 'file:script12_samples.txt' Line 2: Group A Sample Names (filename aliases) = control1, control2, control3 Line 3: Group B Sample Names (filename aliases) = exp1, exp2, exp3 Line 4: P-value = 0.01 Line 5: False discovery rate = 0.01 Line 6: Fold change = X Line 7: -write 'end'
This page and all pages on this site are copyrighted (2006) by Gary Argraves. This page may not be reproduced in whole or in part without the express permission of the author. -Notes for Paramaeter Code AND Parameter File List Entries----------------
1. To run this analysis method the user must manually enter sample
name aliases into the 2nd and 3rd lines of the "Parameter Code"
after the equal sign (each separated by a comma). The order of
sample filename aliases listed in the Parameter Code section
must correspond with the order of actual filenames listed in
the File Parameter List.
2. DO NOT MODIFY "-write" lines
3. The user may choose any alias to describe the corresponding
filename. The use of the terms 'control' and 'exp' in the
example below is arbitrary. Simply replace the alias names
after the = sign with alias names of your choice.
4. The user must group the actual filenames in the File Parameter
List so that all Group 'A' files are listed before all Group
'B' files.
5. The user may manually modify the filter settings for P-value,
false discovery rate and/or fold change. Insert an X for
those filters that you do not want to apply to the analysis.
Parameter File List Entry:
6. To list the actual filenames below press either the "View Data Sets for
this Project" button OR the "View Data Sets for this Analysis Process"
button above. After pressing one of these buttons, all data sets will
be displayed. You may then select a data set by clicking the 'No.'
buttons next to a given dataset, and it will be added to the parameter
box. If you would like to add individual files from your Dataset, copy
and paste individual file names each including the "-file" extension
(i.e., -file 'filename').
-Notes 'end'
------------------------------------------------------------------------
Parameter Code:
-write 'file:script12_samples.txt'
Group A Sample Names (filename alias) = control1, control2, control3
Group B Sample Names (filename alias) = exp1, exp2, exp3
P-value = 0.01
False discovery rate = 0.01
Fold change = X
-write 'end'
------------------------------------------------------------------------
File Parameter List:
# List the actual filenames below (each on a separate line): SEE Note #6 above. 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85
----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
1 # !R --vanilla this is a key telling 'pgrd' how to launch this script
2
3 # ------------------------------------------------------------------------------
4 # Method Title: Determine the number of differentially expressed genes based on
5 # p-value, fold-change and/or FDR parameters:.CEL files only:
6 # Comprehensive output
7 # Method
8 # Description : This method can be used to analyze data from any two-condition
9 # microarray experiment (e.g., control and experimental or
10 # wild-type and mutant). A requirement is that there are two or mor
11 # replicates for each condition. The algorithm finds differentially
12 # expressed genes, collects annotations for these genes, performs
13 # hierarchical clustering of the gene expression profiles and renders
14 # a heatmap of the expression profiles.
15 # -----------------------+=--------------------------------------------------------
16 # This method uses the following:
17 # Bioconductor
18 # packages : Biobase, affy, affyPLM, annaffy, AnnBuilder, annotate,
19 # simpleaffy, makecdfenv, marray, genefilter, GOstats,
20 # vsn, globaltest, geneplotter, graph, Rgraphviz, RBGL,
21 # prada, siggenes, stam, multtest
22 # For more information on Bioconductor packages:
23 # http://www.bioconductor.org/packages/bioc/stable/src/contrib/html/
24 # R
25 # contributed
26 # packages : cluster, stats, XML
27 # For more information on R contributed packages:
28 # http://cran.r-project.org/src/contrib/PACKAGES.html
29 #
30 # Biocondutor
31 # metadata
32 # packages : Annotation, CDF and Probe packages
33 # http://www.bioconductor.org/packages/data/annotation/stable/src/contrib/html/
34 # -------------------------------------------------------------------------------
35 # Author : Suarin Jani, M.S.
36 # Contact : Department of Cell Biology and Anatomy
37 # Medical University of South Carolina
38 # Charleston, SC - 29425, USA
39 # Email : jani@musc.edu
40 # Tel. : (843)-792-5483
41 # -------------------------------------------------------------------------------
42
43 # Load Bioconductor Libraries from R library direcotry
44 system("scp $centerHost:/home/powerGene/lib/R_script/BioCLibs1.R .");
45 # system("cp /saurin/powerGene/PG_Libs/BioCLibs1.R .");
46 source("BioCLibs1.R");
47 system("rm BioCLibs1.R");
48
49 # Load Error Library from R library direcotry
50
51 system("scp $centerHost:/home/powerGene/lib/R_script/errorLib.R .");
52 # system("cp /saurin/powerGene/PG_Libs/errorLib.R .");
53 source("errorLib.R");
54 system("rm errorLib.R");
55
56 # Pre-processing of microarray data - Starts.
57 # ReadAffy() function from Bioconductor reads all .CEL files that exist in the
58 # current method's working directory.
59 # Below shown "paramFilenames.txt" file is generated on ArrayQuest which has all
60 # .CEL file names as entered by User in order
61 # to copy those files to the method's working directory.
62 my.filenames <- readLines("paramFilenames.txt");
63 parFile <- "paramFilenames.txt";
64
65 # my.filenames <- list.files(,"CEL"); # user can use this commnad for local
66 # analysis where my.filenames has all file names that have ".CEL" extention.
67 f <- as.matrix(my.filenames);
68 n <- nrow(f); # geting number of CEL files in user directroy and storing in to n
69
70 # Check if CEL file does not exist (i.e.,n == 0), generate error message and quit
71 # R session. "errors.txt" will be generated as shown below.
72 if(n == 0)
73 {
74 msg <- "Error File";
75 msg <- "This Method requires Affymetrix GeneChip .CEL files ONLY. Use .CEL files";
76 print(msg);
77 Errmsg <- as.matrix(msg);
78 write(Errmsg,"errors.txt");
79 q("no",1); # script will quit from R
80 }
81
82
83 # Make empty Phenodata and read .CEL files into "abatch" object
84 myf <- my.filenames;
85 # my.pheno <- read.phenoData(sampleNames = my.filenames);
86 my.pheno <- read.AnnotatedDataFrame(parFile,sep = "\n", header = FALSE);
87 abatch <- read.affybatch(filenames = my.filenames,phenoData = my.pheno);
88
89 # Input parameter parsing - Starts.
90 # Based on input parameters modify existing Pheno DATA Object for microarray
91 # experiment and attach to "abatch" .CEL file object
92 # Below shown "script12_samples.txt" file has all parameter information
93 # entered by User
94 #
95 # -write 'file:script12_samples.txt'
96 # Group A Samples Names (filename alias) = control1,control2,control3
97 # Group B Samples Names (filename alias) = exp1,exp2,exp3
98 # P-value = 0.01
99 # False discovery rate = 0.01
100 # Fold change = X
101 # -write 'end'
102
103 s1 <- readLines("script12_samples.txt");
104 smp1 <- as.matrix(unlist(strsplit(s1[1], "=",fixed = TRUE)));
105 smp1[2] <- trimWhiteSpace(smp1[2]);
106 smpX <- as.matrix(unlist(strsplit(smp1[2], ",",fixed = TRUE)));
107 controls <- nrow(smpX);
108 for (i in 1:controls)
109 {
110 smpX[i] <- trimWhiteSpace(smpX[i]);
111 }
112 smpC <- as.character(smpX);
113
114 # parsing experimental sample names
115
116 smp2 <- as.matrix(unlist(strsplit(s1[2], "=",fixed = TRUE)));
117 smp2[2] <- trimWhiteSpace(smp2[2]);
118 smpX <- as.matrix(unlist(strsplit(smp2[2], ",",fixed = TRUE)));
119 experimentals <- nrow(smpX);
120 for (i in 1:experimentals)
121 {
122 smpX[i] <- trimWhiteSpace(smpX[i]);
123 }
124 smpE <- as.character(smpX);
125
126 # Generate errror file if input files does not match with sample names provided
127
128 totalfiles <- controls + experimentals;
129 if(totalfiles != n)
130 {
131 msg <- paste(getErrMsgStr(1),"\n",getMsgStr(2),n,"\n",getMsgStr(3),controls,"\n",getMsgStr(4),experimentals);
132 print(msg);
133 Errmsg <- as.matrix(msg);
134 write(Errmsg,"errors.txt");
135 q("no",1); # script will quit from R
136 }
137
138 # merging all sample names
139 smpall <- c(smpC,smpE);
140
141 sampleNames(abatch)<- as.character(smpall);
142 samples <- sampleNames(abatch);
143
144 # generating Quality control Report of affy batch data, raw data. object is abatch
145
146 # QCReport(abatch,file = "ArrayQuest_Affymetrix_GeneChips_QC_Report.pdf");
147
148 # making mask information
149
150 p1 <- rep(0,controls);
151 p2 <- rep(1,experimentals);
152
153 mask <- c(p1,p2);
154 pheno = factor(as.character((mask)));
155 tmp <- pData(abatch);
156 tmp <- cbind(tmp, pheno1 = pheno);
157 pData(abatch) <- tmp;
158 # parsing number of GeneChips belong to control group and number of GeneChips
159 # belong to experimental group entered by User in parameters file
160 w = 1;
161 x = controls;
162 ty = experimentals;
163 if(x == ty)
164 {
165 y = 1 + ty;
166 }
167 if(x < ty)
168 {
169 y = ty;
170 }
171 if(x > ty)
172 {
173 y = 1 + x;
174 }
175 z = n;
176
177 # parsing p value cutoff,FDR cutoff, and Fold Change cutoff
178
179 pcutoff <- as.character(as.matrix(unlist(strsplit(s1[3], "=",fixed = TRUE))));
180 pcutoff<- trimWhiteSpace(pcutoff[2]);
181
182 FDRcutoff <- as.character(as.matrix(unlist(strsplit(s1[4], "=",fixed = TRUE))));
183 FDRcutoff<- trimWhiteSpace(FDRcutoff[2]);
184
185 FoldChangecutoff <- as.character(as.matrix(unlist(strsplit(s1[5], "=",fixed = TRUE))));
186 FoldChangecutoff<- trimWhiteSpace(FoldChangecutoff[2]);
187
188 if(pcutoff[1] != "X")
189 {
190 pcutoff <- as.numeric(pcutoff);
191 }
192 if(FDRcutoff != "X")
193 {
194 FDRcutoff <- as.numeric(FDRcutoff);
195 }
196 if(FoldChangecutoff != "X")
197 {
198 FoldChangecutoff <- as.numeric(FoldChangecutoff);
199 }
200 # Input parameter parsing - Ends.
201
202 # After reading .CEL files, annotation information (which Affymetrix GeneChip used) is
203 # extracted from .CEL files and store in "chiptype" variable
204
205 chiptype <- annotation(abatch);
206 chiptype
207
208 # Based on Affymetrix GeneChip, loading Affymetrix Environment Libraries which
209 # is used for annotations of Affymetrix Probe IDs
210 system("scp $centerHost:/home/powerGene/lib/R_script/AffyChipTyAnnotateEnv.R .");
211 # system("cp /saurin/powerGene/PG_Libs/AffyChipTyAnnotateEnv.R .");
212 source("AffyChipTyAnnotateEnv.R");
213 system("rm AffyChipTyAnnotateEnv.R");
214
215 # Loading Basic Functions library from ArrayQuest Library directory
216 system("scp $centerHost:/home/powerGene/lib/R_script/BasicFuncs.R .");
217 # system("cp /saurin/powerGene/PG_Libs/BasicFuncs.R .");
218 source("BasicFuncs.R");
219 system("rm BasicFuncs.R");
220
221 # Generating histogram of Affymetrix raw data - Before Normalization
222 # jpeg(filename="Histogram_before_Norm_affydata.jpeg",width=1000,height=1000);
223 # pops <- pData(abatch)[,1] + 1;
224 # hist(abatch,col=pops,type = "l", main = " ");
225 # title(main = " Histogram before normalization - Control vs Experimental " );
226 # dev.off(); # this function saves file names above jpeg function
227 #
228 # Generating boxplot of Affymetrix raw data - Before Normalization
229 # jpeg(filename="Boxplot_Before_Norm_affydata.jpeg",width=1000,height=1000);
230 # boxplot(abatch,col=c(1:n),names = samples);
231 # title(main = " Box plot before normalization - Control vs Experimental" );
232 # dev.off(); # this function saves file names above jpeg function
233
234 # Normalization of Affymetrix Raw data using RMA process - Starts.
235 # RMA function performs normalization, background correction and calculating
236 # expression values for affymetrix .CEL "abatch" object
237
238 myRMA<- rma(abatch);
239 MatrixRMA <- exprs(myRMA); # converting this expression object to Matrix
240 write.exprs(myRMA,file="Expression_data_after_RMA_normalization_for_all_genes.xls");
241
242 # expres2excle() function generates Excel file
243 # Normalization of Affymetrix Raw data using RMA process - Ends.
244
245 # Generate MAS5 - P (presence) and A (absence) calls
246
247 # PAcalls <- mas5calls(abatch);
248 # write.exprs(PAcalls,file="Expression_data_after_MAS5_normalization_for_all_genes.xls");
249
250 # expres2excle() function generates Excel file
251 # abatch.norm <- normalize(abatch);
252 # Above step is for displaying normalization graph to user
253
254 # Generating histogram of Affymetrix raw data - After Normalization
255 # jpeg(filename="Histogram_After_RMANorm_affydata.jpeg",width=1000,height=1000);
256 # pops <- pData(abatch)[,1] + 1;
257 # hist(abatch.norm,col=pops,type = "l", main = " ");
258 # title(main = " Histogram after normalization - Control vs Experimental" );
259 # dev.off(); # this function saves file names above jpeg function
260 #
261 # Generating boxplot of Affymetrix raw data - After Normalization
262 # jpeg(filename="Boxplot_After_Normalization_affydata.jpeg",width=1000,height=1000);
263 # boxplot(abatch.norm,col=c(1:n),names = samples);
264 # title(main = " Box plot after normalization - Control vs Experimental" );
265 # dev.off(); # this function saves file names above jpeg function
266 # Pre-processing of microarray data - Ends.
267
268
269
270 # Calculate t test,Fold Change, False Discovery Rate (FDR - BH method)
271
272 # perform ttest - student's t test
273 # "scoresFUN" function takes normalized expression set and two index values to
274 # perform ttest using equal variance == TRUE
275 #
276
277 scoresFUN <- function(myRMA,a,b)
278 {
279 Index1 <- which(myRMA$pheno1==a)
280 Index2 <- which(myRMA$pheno1==b)
281
282 scores <- esApply(myRMA,1,function(x){
283 tmp <- t.test(x[Index2],x[Index1],var.equal=TRUE)
284 c(mean(tmp$estimate),-diff(tmp$estimate),tmp$statistic,tmp$p.value)
285 })
286 scores <- t(scores) # put genes back in rows
287 colnames(scores) <- c("A","M","t.stat","p.value")
288
289 return(scores);
290
291 } # scoresFUN ends here
292
293 # myGenes <- geneNames(myRMA); # getting probe IDs from expression set
294 myGenes <- featureNames(myRMA); # getting probe IDs from expression set
295
296 # Generating AllGenes.txt file, which contains gene Symbol names from whole GeneChip
297 # AnnCols <- aaf.handler();
298 # AnnCols <- c("Probe","Symbol","Description","Function","GenBank","LocusLink","UniGene");
299 AnnCols <- c("Probe","Symbol");
300 AllGenes <- as.character(myGenes);
301 AnnTable1 <- aafTableAnn(AllGenes,chiptype,AnnCols[2]);
302 # saveText(AnnTable1,file = "AllGenes.txt",header = FALSE);
303
304 allscores <- scoresFUN(myRMA,0,1);
305
306 # calling scoresFUN() function on normalized expression set
307 tmp <- mt.rawp2adjp(allscores[,"p.value"],proc = "BH");
308 # Benjamini and Hochberg p-value adjustment
309 adj.p.values <- tmp$adjp[order(tmp$index),];
310
311 # Calculate FDR
312 allscores <- cbind(allscores,adj.p.values[,-1]);
313 colnames(allscores)[5] <- "FDR";
314
315 # Calculate Mean expression intensity for Group1 or Control Group
316 Temp_G1Means <- as.matrix(rowMeans(2^MatrixRMA[,w:x]));
317 allscores <- cbind(allscores,Temp_G1Means);
318 colnames(allscores)[6] <- "Group1Means";
319
320 # Calculate Mean expression intensity for Group2 or Expremental Group
321 Temp_G2Means <- as.matrix(rowMeans(2^MatrixRMA[,y:z]));
322 allscores <- cbind(allscores,Temp_G2Means);
323 colnames(allscores)[7] <- "Group2Means";
324
325 # Calculate Fold Change
326 sizeG1 <- nrow(Temp_G1Means);
327 sizeG2 <- nrow(Temp_G2Means);
328 Temp_FC <- numeric(length(Temp_G1Means));
329
330 for(f in 1:sizeG1)
331 {
332 fcg1 = as.double(Temp_G1Means[f]/Temp_G2Means[f]);
333 fcg2 = as.double(Temp_G2Means[f]/Temp_G1Means[f]);
334 if(fcg1 >= 1)
335 {
336 Temp_FC[f] <- fcg1;
337 }
338 else
339 {
340 Temp_FC[f] <- fcg2;
341 }
342
343 }
344 # Temp_FC <- as.matrix(as.double(Temp_G2Means/Temp_G1Means));
345 allscores <- cbind(allscores,Temp_FC);
346 colnames(allscores)[8] <- "FoldChange";
347
348 # Calculate Up or Down regulation of gene in terms of intensitiy
349 Temp_updwn <- as.matrix(as.double(Temp_G2Means - Temp_G1Means));
350 allscores <- cbind(allscores,Temp_updwn);
351 colnames(allscores)[9] <- "updwn";
352 # Generate Annotation report for all genes exist on GeneChip
353 # Adding all gene probe IDs to annotation report
354 # myGenes <- geneNames(myRMA);
355 myGenes <- featureNames(myRMA);
356 All_Genes_Annotation <- aafTable("Probe IDs" = myGenes);
357
358 # Adding group means to annotation report
359 Group1means <- as.matrix(allscores[,"Group1Means"]);
360 TGroup1means <- aafTable("Group1 Intensity means (Control)" = Group1means);
361 All_Genes_Annotation <- merge(All_Genes_Annotation,TGroup1means);
362
363 Group2means <- as.matrix(allscores[,"Group2Means"]);
364 TGroup2means <- aafTable("Group2 Intensity means (Experimental)" = Group2means);
365 All_Genes_Annotation <- merge(All_Genes_Annotation,TGroup2means);
366
367 # Adding Fold Change to annotation report
368 FoldChange <- as.matrix(allscores[,"FoldChange"]);
369 TFoldChange <- aafTable("Fold Change" = FoldChange);
370 All_Genes_Annotation <- merge(All_Genes_Annotation,TFoldChange);
371
372 # Adding Up regulation or Down regulation expression intensities of gene
373 updwn <- as.matrix(allscores[,"updwn"]);
374 Tupdwn <- aafTable("Up(+) or Down(-) in Experimental " = updwn);
375 All_Genes_Annotation <- merge(All_Genes_Annotation,Tupdwn);
376
377 # Adding FDR : using BH method to annotation report
378 fdr <- allscores[,"FDR"];
379 Tfdr <- aafTable("FDR - BH method" = fdr);
380 All_Genes_Annotation <- merge(All_Genes_Annotation,Tfdr);
381
382 # Adding t-test: t-statiscits value to annotation report
383 tstat <- allscores[,"t.stat"];
384 TTstat <- aafTable("t-stat" = tstat);
385 All_Genes_Annotation <- merge(All_Genes_Annotation,TTstat);
386
387 # Adding ttest: p-value to annotation report
388 pvalue <- allscores[,"p.value"];
389 Tpvalue <- aafTable("p.value" = pvalue);
390 All_Genes_Annotation <- merge(All_Genes_Annotation,Tpvalue);
391 # whole_cols <- aaf.handler();
392 whole_cols <- c("Probe","Symbol","Description","Function","GenBank","LocusLink","UniGene");
393
394 # Adding gene name,symbol,choromosom location,pathway info. to annotation report
395 All_Annotations <- aafTableAnn(myGenes,chiptype,whole_cols);
396 All_Genes_Annotation <- merge(All_Genes_Annotation,All_Annotations);
397 saveText(All_Genes_Annotation,file = "Annotation_Report_Whole_GeneChip.xls");
398 # Saving annotation report in Microsoft Excel file
399
400 # ####################################################################################
401 # Take below code's comment off, if user wants to get rid of low expressed genes
402 # non specific filtering
403 # f1 <- pOverA(0.25,log2(100));
404 # ff <- filterfun(f1);
405 # selected <- genefilter(myRMA,ff);
406 # sum(selected);
407 # esetSub <- myRMA[selected,];
408 # myRMA <- esetSub;
409 # MatrixRMA <- exprs(myRMA);
410 # ####################################################################################
411
412
413 # Differentially expressed Gene filteration - Starts
414
415 # myGenes <- geneNames(myRMA); # getting all probe IDs from expression set
416 myGenes <- featureNames(myRMA);
417
418 allscores <- scoresFUN(myRMA,0,1); # calling scoresFUN for t-test calculation
419 tmp <- mt.rawp2adjp(allscores[,"p.value"],proc = "BH");
420 # p-value adjustment for FDR calculation
421 adj.p.values <- tmp$adjp[order(tmp$index),];
422
423 # Adding FDR to matrix of expression set
424 allscores <- cbind(allscores,adj.p.values[,-1]);
425 colnames(allscores)[5] <- "FDR";
426
427 # Adding Mean values of Group1 or Controls
428 Temp_G1Means <- as.matrix(rowMeans(2^MatrixRMA[,w:x]));
429 allscores <- cbind(allscores,Temp_G1Means);
430 colnames(allscores)[6] <- "Group1Means";
431
432 # Adding Mean values of Group2 or Experimentals
433 Temp_G2Means <- as.matrix(rowMeans(2^MatrixRMA[,y:z]));
434 allscores <- cbind(allscores,Temp_G2Means);
435 colnames(allscores)[7] <- "Group2Means";
436
437 # Adding Fold Change to expression set matrix
438 sizeG1 <- nrow(Temp_G1Means);
439 sizeG2 <- nrow(Temp_G2Means);
440 rm(Temp_FC);
441 Temp_FC <- numeric(length(Temp_G1Means));
442
443 for(f in 1:sizeG1)
444 {
445 fcg1 = as.double(Temp_G1Means[f]/Temp_G2Means[f]);
446 fcg2 = as.double(Temp_G2Means[f]/Temp_G1Means[f]);
447 if(fcg1 >= 1)
448 {
449 Temp_FC[f] <- fcg1;
450 }
451 else
452 {
453 Temp_FC[f] <- fcg2;
454 }
455
456 }
457 # Temp_FC <- as.matrix(as.double(Temp_G2Means/Temp_G1Means));
458 allscores <- cbind(allscores,Temp_FC);
459 colnames(allscores)[8] <- "FoldChange";
460
461 # Adding Up regulation or Down regulation of gene to expression set
462 Temp_updwn <- as.matrix(as.double(Temp_G2Means - Temp_G1Means));
463 allscores <- cbind(allscores,Temp_updwn);
464 colnames(allscores)[9] <- "updwn";
465
466 # Extract differentially expressed genes based on cutoff value
467 # entered by user in parameter box
468 allscoresExtra <- allscores;
469 myGenes <- rownames(allscoresExtra);
470
471 # Conditional logic which extracts genes based on cutoff value - Starts
472 # if User has specified : t-test P-value and FDR and Fold Change
473 if(pcutoff != "X" && FDRcutoff != "X" && FoldChangecutoff != "X" )
474 {
475 myGenes <- rownames(allscoresExtra);
476 myDEGenestemp <- myGenes[allscores[,"p.value"]<= pcutoff];
477 allscores <- allscoresExtra[myDEGenestemp,]
478
479 myGenes <- rownames(allscores);
480 myDEGenestemp <- myGenes[allscores[,"FDR"]<= FDRcutoff];
481 allscores <- allscoresExtra[myDEGenestemp,]
482
483 myGenes <- rownames(allscores);
484 myDEGenestemp <- myGenes[allscores[,"FoldChange"]>= FoldChangecutoff];
485 allscores <- allscoresExtra[myDEGenestemp,]
486 }
487
488 # if User have specified : t-test P-value and FDR
489 if(pcutoff != "X" && FDRcutoff != "X" && FoldChangecutoff == "X")
490 {
491 myGenes <- rownames(allscoresExtra);
492 myDEGenestemp <- myGenes[allscores[,"p.value"]<= pcutoff];
493 allscores <- allscoresExtra[myDEGenestemp,]
494
495 myGenes <- rownames(allscores);
496 myDEGenestemp <- myGenes[allscores[,"FDR"] <= FDRcutoff];
497 allscores <- allscoresExtra[myDEGenestemp,]
498 }
499
500 # if User has specified : t-test P-value and Fold Change
501 if(pcutoff != "X" && FDRcutoff == "X" && FoldChangecutoff != "X")
502 {
503 myGenes <- rownames(allscoresExtra);
504 myDEGenestemp <- myGenes[allscores[,"p.value"]<= pcutoff];
505 allscores <- allscoresExtra[myDEGenestemp,]
506
507 myGenes <- rownames(allscores);
508 myDEGenestemp <- myGenes[allscores[,"FoldChange"] >= FoldChangecutoff];
509 allscores <- allscoresExtra[myDEGenestemp,]
510 }
511
512 # if User has specified : FDR and Fold Change
513 if(pcutoff == "X" && FDRcutoff != "X" && FoldChangecutoff != "X")
514 {
515 myGenes <- rownames(allscoresExtra);
516 myDEGenestemp <- myGenes[allscores[,"FDR"]<= FDRcutoff];
517 allscores <- allscoresExtra[myDEGenestemp,];
518
519 myGenes <- rownames(allscores);
520 myDEGenestemp <- myGenes[allscores[,"FoldChange"] >= FoldChangecutoff];
521 allscores <- allscoresExtra[myDEGenestemp,]
522 }
523
524 # if User has specified : t-test P-value only
525 if(pcutoff != "X" && FDRcutoff == "X" && FoldChangecutoff == "X")
526 {
527 myGenes <- rownames(allscoresExtra);
528 myDEGenestemp <- myGenes[allscores[,"p.value"]<= pcutoff];
529 allscores <- allscoresExtra[myDEGenestemp,]
530 }
531
532 # if User have specified : FDR only
533 if(pcutoff == "X" && FDRcutoff != "X" && FoldChangecutoff == "X")
534 {
535 myGenes <- rownames(allscoresExtra);
536 myDEGenestemp <- myGenes[allscores[,"FDR"]<= FDRcutoff];
537 allscores <- allscoresExtra[myDEGenestemp,]
538 }
539
540 # if User has specified : Fold Change only
541 if(pcutoff == "X" && FDRcutoff == "X" && FoldChangecutoff != "X")
542 {
543 myGenes <- rownames(allscoresExtra);
544 myDEGenestemp <- myGenes[allscores[,"FoldChange"] >= FoldChangecutoff];
545 allscores <- allscoresExtra[myDEGenestemp,]
546 }
547 # Conditional logic , which extracts genes based on cutoff value - Ends.
548 # myDEGenestemp;
549 myDEGenestemp <- myDEGenestemp;
550
551 esetSub2X <- MatrixRMA[myDEGenestemp,];
552
553 # esetSub2 <- new("exprSet",exprs = esetSub2X); # this is old medthod to get exprsSet..now its not working.
554 # below is new medthod,using Pheno data to get exprsSet.
555 # pheno_esetSub2X <- read.phenoData(NULL, colnames(esetSub2X), FALSE);
556 # esetSub2 <- new("exprSet",exprs = esetSub2X, phenoData=pheno_esetSub2X);
557
558 esetSub2 <- new("ExpressionSet", assayData = assayDataNew(exprs=esetSub2X));
559
560 # making expression set out of differntially expressed genes
561 pData(esetSub2) <- pData(myRMA);
562
563 # if there are no differntially expressed genes extracted from above calculation
564 # generate error file and quit R session.
565 if(nrow(esetSub2X) == 0)
566 {
567 Errmsg <- "Error File";
568 msg <- c("No Genes filtered based on pvalue,FDR and FoldChange cutoffs.","To solve the problem take a look of Excel file called: Annotation_Report_Whole_GeneChip.xls and find cutoff value for P-value, Fold Change or FDR.","Change P-value, Fold Change or FDR values in order to filter genes.");
569 print(msg);
570 Errmsg <- as.matrix(msg);
571 write(Errmsg,"errors.txt");
572 q("no");
573 }
574
575 # ##########
576 # here, I am generating excel file of DE gene expression set
577 # write.exprs(esetSub2,file="Expression_data_Differntially_Expressed_Genes.xls");
578
579 write.table(exprs(esetSub2),file="Expression_data_Differentially_Expressed_Genes.csv", quote=FALSE, sep = ",", col.names = NA, row.names = TRUE);
580
581 # expres2excle() function generates Excel file
582 # #############
583
584 myDEGenes <- myDEGenestemp;
585 a <- as.character(myDEGenes);
586
587 # Generating ChangedGenes.txt file, which contains gene Symbol names from differentially expressed GeneChip
588 AnnCols <- aaf.handler();
589 AllGenes <- as.character(myDEGenes);
590 AnnTable2 <- aafTableAnn(AllGenes,chiptype,AnnCols[2]);
591 # saveText(AnnTable2,file = "ChangedGenes.txt",header = FALSE);
592
593
594 # write(a,file= "Differentially_Expressed_Genes_ProbeIDs.txt");
595 # Write affymetrix probe set IDs if there are differentially expressed genes
596 DEscores <- allscores;
597 # Get unique GeneIDs from Differentially expressed Probes IDs and print to a file
598 DELLids <- as.character(unique(as.matrix(na.omit(getLL(myDEGenes,chiptype)))));
599 # write(DELLids,"Differentially_Expressed_Genes_GeneIDs.txt");
600
601 # Displaying gene expression and annotation gene information in HTML format
602
603 DEGroup1Means <- as.matrix(DEscores[,"Group1Means"]);
604 DEGroup2Means <- as.matrix(DEscores[,"Group2Means"]);
605 DEfc <- as.matrix(DEscores[,"FoldChange"]);
606 DEupdwn <- as.matrix(DEscores[,"updwn"]);
607 DEfdr <- as.matrix(DEscores[,"FDR"]);
608 DEpvalues <- as.matrix(DEscores[,"p.value"]);
609
610 Annotate_Sig_RMAGenes <- as.character(myDEGenes);
611 RMA_anncols <- aaf.handler();
612 RMA_anntable <- aafTableAnn(Annotate_Sig_RMAGenes,chiptype,RMA_anncols);
613
614 RMA_Group1_means_table <- aafTable("Group1 (Control) Mean intensity" = DEGroup1Means);
615 RMA_anntable <- merge(RMA_anntable,RMA_Group1_means_table);
616
617 RMA_Group2_means_table <- aafTable("Group2 (Experimental) Mean intensity" = DEGroup2Means);
618 RMA_anntable <- merge(RMA_anntable,RMA_Group2_means_table);
619
620 RMA_updwn_table <- aafTable("UP regulation(+) or Down regulation(-) in Experimental group:[Exp - Control]" = DEupdwn);
621 RMA_anntable <- merge(RMA_anntable,RMA_updwn_table);
622
623 RMA_FC_table <- aafTable("Fold Change: Experimental/Control " = DEfc);
624 RMA_anntable <- merge(RMA_anntable,RMA_FC_table);
625
626 RMA_FDR_table <- aafTable("FDR: BH method " = DEfdr);
627 RMA_anntable <- merge(RMA_anntable,RMA_FDR_table);
628
629 RMA_pvalues_table <- aafTable("t-test pvalue " = DEpvalues);
630 RMA_anntable <- merge(RMA_anntable,RMA_pvalues_table);
631
632 # saving HTML and excel file of Annotation report of differntially expressed gene
633 saveHTML(RMA_anntable,"Annotation_Report_Significant_Genes.html", title = "Significant Genes Full Annotation Report - Control vs Experimental");
634 saveText (RMA_anntable, "Annotation_Report_Significant_Genes.xls",header = TRUE);
635 # my.filenames <- list.files(,"CEL");
636 # my.filenames <- readLines("paramFilenames.txt");
637 write(" ",file="Annotation_Report_Significant_Genes.html",append=TRUE);
638 write.table(my.filenames,file="Annotation_Report_Significant_Genes.html",append=TRUE,sep = "*",col.names ="Files:",row.names=TRUE);
639
640 # Generate Gene Ontology Report using myGOreports_1() function in Basic Function library
641 myGOreports_1(myDEGenes,chiptype,"GeneOntology_Cellular_Component_Report.xls","GeneOntology_Molecular_Function_Report.xls","GeneOntology_BiologicalProcess_Report.xls");
642
643 # Generation of Graph Definition File(GDF) file for Gene-Gene Interaction Network
644 # below code generates GeneIDs and Expression intensities of Genes in "," deliminated file
645 # GDF file can be used in GUESS software (http://www.hpl.hp.com/research/idl/projects/graphs/)
646 # For Differntially expressed genes
647 # sigxlsd <- read.delim("Annotation_Report_Significant_Genes.xls"); # readin Excel file
648 # AllSigs <- cbind(as.matrix(sigxlsd[,8]),as.matrix(sigxlsd[,14]),as.matrix(sigxlsd[,15]));
649 # colnames(AllSigs) <- c("GeneIDs","Group1","Group2");
650 # AllEset <- new("exprSet",exprs = AllSigs);
651 # write.exprs(AllEset,file="Graph_Expression_Data_of_Differentially_Expressed_Genes.xls");
652
653 # For All Genes located on Affymetrix GeneChip
654 # sigxlsd <- read.delim("Annotation_Report_Whole_GeneChip.xls"); # readin Excel file
655 # AllSigs <- cbind(as.matrix(sigxlsd[,13]),as.matrix(sigxlsd[,2]),as.matrix(sigxlsd[,3]));
656 # colnames(AllSigs) <- c("GeneIDs","Group1","Group2");
657 # AllEset <- new("exprSet",exprs = AllSigs);
658 # write.exprs(AllEset,file="Graph_Expression_Data_of_Whole_GeneChip.xls");
659 #
660 # Generate Pathway Heatmap
661 # All_DE_Pathway <- aafPathway(geneNames(esetSub2), chiptype);
662 All_DE_Pathway <- aafPathway(featureNames(esetSub2), chiptype);
663 unique(sapply(All_DE_Pathway, length));
664 pathlist <- do.call("c", All_DE_Pathway);
665 pathL <- length(pathlist);
666 pathmatrix <- sapply(pathlist, attributes);
667 pathnamesX <- unique(t(pathmatrix));
668 kegg <- as.list(envPath2Probes);
669 pathRow <- as.numeric(nrow(pathnamesX));
670 if(pathL >=1)
671 {
672 for(i in 1:pathRow)
673 {
674 pathID <- pathnamesX[,"id"][i];
675 pathID <- as.character(unlist(pathID));
676
677 pathName <- as.character(unlist(pathnamesX[,"name"][i]));
678 pathName <- trimWhiteSpace(pathName);
679 pathName <- sub("/", "_", pathName);
680 pathName <- sub(" ", "_", pathName);
681 pathName <- sub(" ", "_", pathName);
682 pathName <- sub(" ", "_", pathName);
683 pathName <- sub(" ", "_", pathName);
684 pathName <- sub("-", "_", pathName);
685 pathFile <- paste("Pathway",pathName,".jpeg",sep = "_");
686 pathSub <- paste(pathName,"Legend: Red = High, Blue = Low, White = Medium",sep = " ");
687 pathSub <- paste("Date of Analysis:",date(),pathSub,sep = " ");
688
689 Pathway2Heatmap(pathID,pathFile,pathSub);
690 }
691
692 } # if ends here
693 #
694 # Generate Heatmap of Differntially expressed genes
695 # MapW,MapH and MapMargin parameters as shown below will be changed based on number
696 # of differntially expressed genes in order to generate JPEG file of Heatmap
697 MapW = 1000;
698 MapH = 1000;
699 MapMargin = 50;
700 gNum <- nrow(esetSub2X) ; # gNum is number of differntially expressed genes
701
702 if (gNum <= 50)
703 {
704 MapW = 1000;
705 MapH = 1000;
706 MapMargin = 50;
707 }
708 if (gNum > 50 && gNum <= 100)
709 {
710 MapW = 1500;
711 MapH = 1500;
712 MapMargin = 75;
713 }
714 if (gNum > 100 && gNum <= 150)
715 {
716 MapW = 2000;
717 MapH = 2000;
718 MapMargin = 100;
719 }
720 if (gNum > 150 && gNum <= 200)
721 {
722 MapW = 3000;
723 MapH = 3000;
724 MapMargin = 160;
725 }
726 if (gNum > 200 && gNum <= 400)
727 {
728 MapW = 4000;
729 MapH = 4000;
730 MapMargin = 210;
731 }
732 if (gNum > 400 && gNum <= 800)
733 {
734 MapW = 6500;
735 MapH = 6500;
736 MapMargin = 190;
737 }
738 if (gNum > 800 )
739 {
740 MapW = 8000;
741 MapH = 8000;
742 MapMargin = 410;
743 }
744
745 # Hsub is title or information produced on heatmap JPEG file
746 # Hsub <- paste("Heatmap: Red = High, Blue = Low, White = Medium","Date of Analysis: ",date(),sep = " ");
747
748 Hsub <- paste("Date of Analysis:",date(),"Legend: Red = High, Blue = Low, White = Medium",sep = " ");
749
750 # HeatmapBreakFUN() used from Basic Function library
751 HeatmapBreakFUN(myDEGenes,"Heatmap_of_Differentially_Expressed_Genes.jpeg",Hsub);
752
753 # End of script
|
| 13 | Saurin D. Jani | MUSC DNA Microarray DB; CEL |
Determine the number of differientially expressed genes based on fold-change, p-value and/or FDR parameters: .CEL files only: Outputs the no. of genes Description: Description of Method No. 13 This script is used to assess how different analysis threshold settings (i.e., fold change, t-test p-value, false discovery rate) affect the number of differentially expressed genes found in a two-condition experiment (control versus experimental, wild-type vs. mutant etc.). This method performs three normazlitation algorithms (i.e., RMA, MAS5, GCRMA) on Affy GeneChips. The output is an HTML table listing threshold settings and the accompanying numbers of differentially expressed genes. A requirement is that there are two or more replicates for each condition. Bioconductor and R packages used in the development of this method:
Input or Requirements:
-Notes for Paramaeter Code AND Parameter File List Entries------------
1. To run this analysis method the user must manually enter sample name
'aliases' into the 2nd and 3rd lines of the "Parameter Code" after the
equal sign (each separated by a comma). The order of sample
filename aliases listed in the Parameter Code section must correspond
with the order of actual filenames listed in the File Parameter List.
2. DO NOT MODIFY "-write" line
3. The user may choose any alias to describe the corresponding filename.
The use of the terms 'control' and 'exp' in the example below
is arbitrary. Simply replace the alias names after the = sign with
alias names of your choice.
4. The user must group the actual filenames in the File Parameter List
so that all Group 'A' files are listed before all Group 'B' files.
5. The user may manually modify the filter settings for P-value, False
discovery rate and/or Fold change. Insert an X for those filters that
you do not want to apply to the analysis.
Parameter File List Entry:
6. To list the actual filenames below press either the "View Data Sets for
this Project" button OR the "View Data Sets for this Analysis Process"
button above. After pressing one of these buttons, all data sets will
be displayed. You may then select a data set by clicking the 'No.'
buttons next to a given dataset, and it will be added to the parameter
box. If you would like to add individual files from your Dataset, copy
and paste individual file names each including the "-file" extension
(i.e., -file 'filename').
-Notes 'end'
------------------------------------------------------------------------
Parameter Code:
-write 'file:script13_samples.txt'
Group A Sample Names (filename aliases) = control1, control2, control3
Group B Sample Names (filename aliases) = exp1, exp2, exp3
-write 'end'
------------------------------------------------------------------------
File Parameter List:
# Enter actual filenames (each on a separate line): SEE Note #6 above. 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85
----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
1 # !R --vanilla this is a key telling 'pgrd' how to launch this script
2
3 # ------------------------------------------------------------------------------
4 # Method Title: Determine the number of differentially expressed genes based on
5 # p-value, fold-change and/or FDR parameters:.CEL files only:
6 # Outputs the no. of genes
7 # Method
8 # Description : This script is used to assess how different analysis threshold
9 # settings (i.e., fold change, t-test p-value, false discovery rate)
10 # affect the number of differentially expressed genes found in a
11 # two-condition experiment (control versus experimental, wild-type
12 # vs. mutant etc.). The output is an HTML table listing threshold
13 # settings and the accompanying numbers of differentially expressed genes.
14 # -------------------------------------------------------------------------------
15 # This method uses the following:
16 # Bioconductor
17 # packages :Biobase, affy, affyPLM, annaffy, AnnBuilder, annotate,
18 # simpleaffy, makecdfenv, marray, genefilter, GOstats,
19 # vsn, globaltest, geneplotter, graph, Rgraphviz, RBGL,
20 # prada, siggenes, stam, multtest
21 # For more information on Bioconductor packages:
22 # http://www.bioconductor.org/packages/bioc/stable/src/contrib/html/
23 # R
24 # contributed
25 # packages : cluster, stats, XML
26 # For more information on R contributed packages:
27 # http://cran.r-project.org/src/contrib/PACKAGES.html
28 #
29 # Biocondutor
30 # metadata
31 # packages : Annotation, CDF and Probe packages
32 # http://www.bioconductor.org/packages/data/annotation/stable/src/contrib/html/
33 # ------------------------------------------------------------------------------
34 # Author : Suarin Jani, M.S.
35 # Contact : Department of Cell Biology and Anatomy
36 # Medical University of South Carolina
37 # Charleston, SC - 29425, USA
38 # Email : jani@musc.edu
39 # Tel. : (843)-792-5483
40 # -------------------------------------------------------------------------------
41
42 # Load Bioconductor Libraries from R library direcotry
43 system("scp $centerHost:/home/powerGene/lib/R_script/BioCLibs1.R .");
44 source("BioCLibs1.R");
45 system("rm BioCLibs1.R")
46
47 # Load Error Library from R library direcotry
48
49 system("scp $centerHost:/home/powerGene/lib/R_script/errorLib.R .");
50 # system("cp /saurin/powerGene/PG_Libs/errorLib.R .");
51 source("errorLib.R");
52 system("rm errorLib.R");
53
54 # Pre-processing of microarray data - Starts.
55 # ReadAffy() function from Bioconductor reads all .CEL files that exist in the
56 # current method's working directory.
57 # Below shown "paramFilenames.txt" file is generated on ArrayQuest which has all
58 # .CEL file names as entered by User in order
59 # to copy those files to the method's working directory.
60 my.filenames <- readLines("paramFilenames.txt");
61
62 # my.filenames <- list.files(,"CEL"); # user can use this commnad for local
63 # analysis where my.filenames has all file names that have ".CEL" extention.
64 f <- as.matrix(my.filenames);
65 n <- nrow(f); # geting number of CEL files in user directroy and storing in to n
66
67 # Check if CEL file does not exist (i.e.,n == 0), generate error message and quit
68 # R session. "errors.txt" will be generated as shown below.
69 if(n == 0)
70 {
71 msg <- "Error File";
72 msg <- "This Method requires Affymetrix GeneChip .CEL files ONLY. Use .CEL files";
73 print(msg);
74 Errmsg <- as.matrix(msg);
75 write(Errmsg,"errors.txt");
76 q("no",1); # script will quit from R
77 }
78
79 # Make empty Phenodata and read .CEL files into "abatch" object
80 myf <- my.filenames;
81 my.pheno <- read.phenoData(sampleNames = my.filenames);
82 abatch <- read.affybatch(filenames = my.filenames,phenoData = my.pheno);
83
84 # Input parameter parsing - Starts.
85 # Based on input parameters modify existing Pheno DATA Object for microarray
86 # experiment and attach to "abatch" .CEL file object
87 # Below shown "script13_samples.txt" file has all parameter information
88 # entered by User
89 #
90 # -write 'file:script13_samples.txt'
91 # Group A Samples Names (filename alias) = control1,control2,control3
92 # Group B Samples Names (filename alias) = exp1,exp2,exp3
93 # P-value = 0.01
94 # False discovery rate = 0.01
95 # Fold change = X
96 # -write 'end'
97
98 s1 <- readLines("script13_samples.txt");
99 smp1 <- as.matrix(unlist(strsplit(s1[1], "=",fixed = TRUE)));
100 smp1[2] <- trimWhiteSpace(smp1[2]);
101 smpX <- as.matrix(unlist(strsplit(smp1[2], ",",fixed = TRUE)));
102 controls <- nrow(smpX);
103 for (i in 1:controls)
104 {
105 smpX[i] <- trimWhiteSpace(smpX[i]);
106 }
107 smpC <- as.character(smpX);
108
109 # parsing experimental sample names
110
111 smp2 <- as.matrix(unlist(strsplit(s1[2], "=",fixed = TRUE)));
112 smp2[2] <- trimWhiteSpace(smp2[2]);
113 smpX <- as.matrix(unlist(strsplit(smp2[2], ",",fixed = TRUE)));
114 experimentals <- nrow(smpX);
115 for (i in 1:experimentals)
116 {
117 smpX[i] <- trimWhiteSpace(smpX[i]);
118 }
119 smpE <- as.character(smpX);
120
121 # Generate errror file, if input files does not match with sample names provided
122
123 totalfiles <- controls + experimentals;
124 if(totalfiles != n)
125 {
126 msg <- paste(getErrMsgStr(1),"\n",getMsgStr(2),n,"\n",getMsgStr(3),controls,"\n",getMsgStr(4),experimentals);
127 print(msg);
128 Errmsg <- as.matrix(msg);
129 write(Errmsg,"errors.txt");
130 q("no",1); # script will quit from R
131 }
132
133
134
135 # mearging all sample names
136 smpall <- c(smpC,smpE);
137
138 sampleNames(abatch)<- as.character(smpall);
139 samples <- sampleNames(abatch);
140
141 # generating Quality control Report of affy batch data, raw data. object is abatch
142
143 QCReport(abatch,file = "Affymetrix_files_Quality_Control_Report.pdf");
144
145 # making mask information
146
147 p1 <- rep(0,controls);
148 p2 <- rep(1,experimentals);
149
150 mask <- c(p1,p2);
151 pheno = factor(as.character((mask)));
152 tmp <- pData(abatch);
153 tmp <- cbind(tmp, pheno1 = pheno);
154 pData(abatch) <- tmp;
155 # parsing number of GeneChips belong to control group and number of GeneChips
156 # belong to experimental group entered by User in parameters file
157 w = 1;
158 x = controls;
159 ty = experimentals;
160 if(x == ty)
161 {
162 y = 1 + ty;
163 }
164 if(x < ty)
165 {
166 y = ty;
167 }
168 if(x > ty)
169 {
170 y = 1 + x;
171 }
172 z = n;
173
174 # parsing p value cutoff,FDR cutoff, and Fold Change cutoff
175
176 # pcutoff <- as.character(as.matrix(unlist(strsplit(s1[3], "=",fixed = TRUE))));
177 # pcutoff<- trimWhiteSpace(pcutoff[2]);
178
179 pcutoff<- "X";
180
181 # FDRcutoff <- as.character(as.matrix(unlist(strsplit(s1[4], "=",fixed = TRUE))));
182 # FDRcutoff<- trimWhiteSpace(FDRcutoff[2]);
183
184 FDRcutoff <- "X";
185
186 # FoldChangecutoff <- as.character(as.matrix(unlist(strsplit(s1[5], "=",fixed = TRUE))));
187 # FoldChangecutoff<- trimWhiteSpace(FoldChangecutoff[2]);
188
189 FoldChangecutoff <- "X";
190
191 if(pcutoff[1] != "X")
192 {
193 pcutoff <- as.numeric(pcutoff);
194 }
195 if(FDRcutoff != "X")
196 {
197 FDRcutoff <- as.numeric(FDRcutoff);
198 }
199 if(FoldChangecutoff != "X")
200 {
201 FoldChangecutoff <- as.numeric(FoldChangecutoff);
202 }
203 # Input parameter parsing - Ends.
204 # After reading .CEL files, annotation information(which Affymetrix GeneChip used) is
205 # extracted from .CEL files and store in "chiptype" variable
206 chiptype <- annotation(abatch);
207 chiptype
208 # Based on Affymetrix GeneChip, loading Affymetrix Environment Libraries which
209 # is used for annotations of Affymetrix Probe IDs
210
211 system("scp $centerHost:/home/powerGene/lib/R_script/AffyChipTyAnnotateEnv.R .");
212 source("AffyChipTyAnnotateEnv.R");
213 system ("rm AffyChipTyAnnotateEnv.R");
214
215 # #############################################################################################################################
216 # #############################################################################################################################
217 # #############################################################################################################################
218 # #############################################################################################################################
219 # #############################################################################################################################
220 # ######################################### RMA Normalization report starts #######################################################
221
222 # Normalization of Affymetrix Raw data using RMA process - Starts.
223 # RMA function performs normalization, background correction and calculating
224 # expression values for affymetrix .CEL "abatch" object
225
226 myRMA<- rma(abatch); # this function is fast and which reads only CEL file exists in current user dir.
227 MatrixRMA <- exprs(myRMA); # converting this expression object to Matrix for future use
228 write.exprs(myRMA,file="RMA_Expression_data_after_normalization_for_all_genes.xls");
229
230 # Pre-processing of microarray data - Ends.
231
232
233 # Calculate t test,Fold Change, False Discovery Rate(FDR - BH method)
234
235 # perform ttest - student's t test
236 # "scoresFUN" function takes normalized expression set and two index values to
237 # perform ttest using equal variance == TRUE
238 #
239 # scoresFUN function takes expression set and two index values to perform ttest where equal variance == TRUE
240
241 scoresFUN <- function(myRMA,a,b)
242 {
243 Index1 <- which(myRMA$pheno1==a)
244 Index2 <- which(myRMA$pheno1==b)
245 scores <- esApply(myRMA,1,function(x){
246 tmp <- t.test(x[Index2],x[Index1],var.equal=TRUE)
247 c(mean(tmp$estimate),-diff(tmp$estimate),tmp$statistic,tmp$p.value)
248 })
249 scores <- t(scores) # #put genes back in rows
250 colnames(scores) <- c("A","M","t.stat","p.value")
251 return(scores);
252 } # scoresFUN ends here
253
254 myGenes <- geneNames(myRMA);
255 # calling scoresFUN() function on normalized expression set
256 allscores <- scoresFUN(myRMA,0,1);
257 tmp <- mt.rawp2adjp(allscores[,"p.value"],proc = "BH");
258 # Benjamini and Hochberg p-value adjustment
259 adj.p.values <- tmp$adjp[order(tmp$index),];
260
261 # Calculate FDR
262 allscores <- cbind(allscores,adj.p.values[,-1]);
263 colnames(allscores)[5] <- "FDR";
264
265 # Calculate Mean expression intensity for Group1 or Control Group
266 Temp_G1Means <- as.matrix(rowMeans(2^MatrixRMA[,w:x]));
267 allscores <- cbind(allscores,Temp_G1Means);
268 colnames(allscores)[6] <- "Group1Means";
269
270 # Calculate Mean expression intensity for Group2 or Expremental Group
271 Temp_G2Means <- as.matrix(rowMeans(2^MatrixRMA[,y:z]));
272 allscores <- cbind(allscores,Temp_G2Means);
273 colnames(allscores)[7] <- "Group2Means";
274
275 # Calculate Fold Change
276 Temp_FC <- as.matrix(as.double(Temp_G2Means/Temp_G1Means));
277 allscores <- cbind(allscores,Temp_FC);
278 colnames(allscores)[8] <- "FoldChange";
279
280 # Calculate Up or Down regulation of gene in terms of intensitiy
281 Temp_updwn <- as.matrix(as.double(Temp_G2Means - Temp_G1Means));
282 allscores <- cbind(allscores,Temp_updwn);
283 colnames(allscores)[9] <- "updwn";
284
285 # Take below code's comment off, if user wants to get rid of low expressed genes
286 # non specific filtering...
287 # f1 <- pOverA(0.25,log2(100));
288 # ff <- filterfun(f1);
289 # selected <- genefilter(myRMA,ff);
290 # sum(selected);
291 # esetSub <- myRMA[selected,];
292 # myRMA <- esetSub;
293 # MatrixRMA <- exprs(myRMA);
294
295 # Number of Differentially expressed Gene filteration - Starts
296
297 myGenes <- geneNames(myRMA);
298
299 # calling scoresFUN() function on normalized expression set
300 allscores <- scoresFUN(myRMA,0,1);
301 tmp <- mt.rawp2adjp(allscores[,"p.value"],proc = "BH");
302 # Benjamini and Hochberg p-value adjustment
303 adj.p.values <- tmp$adjp[order(tmp$index),];
304
305 # Calculate FDR
306 allscores <- cbind(allscores,adj.p.values[,-1]);
307 colnames(allscores)[5] <- "FDR";
308
309 # Calculate Mean expression intensity for Group1 or Control Group
310 Temp_G1Means <- as.matrix(rowMeans(2^MatrixRMA[,w:x]));
311 allscores <- cbind(allscores,Temp_G1Means);
312 colnames(allscores)[6] <- "Group1Means";
313
314 # Calculate Mean expression intensity for Group2 or Expremental Group
315 Temp_G2Means <- as.matrix(rowMeans(2^MatrixRMA[,y:z]));
316 allscores <- cbind(allscores,Temp_G2Means);
317 colnames(allscores)[7] <- "Group2Means";
318
319 # Calculate Fold Change
320 sizeG1 <- nrow(Temp_G1Means);
321 sizeG2 <- nrow(Temp_G2Means);
322 Temp_FC <- numeric(length(Temp_G1Means));
323
324 for(f in 1:sizeG1)
325 {
326 fcg1 = as.double(Temp_G1Means[f]/Temp_G2Means[f]);
327 fcg2 = as.double(Temp_G2Means[f]/Temp_G1Means[f]);
328 if(fcg1 >= 1)
329 {
330 Temp_FC[f] <- fcg1;
331 }
332 else
333 {
334 Temp_FC[f] <- fcg2;
335 }
336
337 }
338 # Temp_FC <- as.matrix(as.double(Temp_G1Means/Temp_G2Means));
339 allscores <- cbind(allscores,Temp_FC);
340 colnames(allscores)[8] <- "FoldChange";
341
342 # Calculate Up or Down regulation of gene in terms of intensitiy
343 Temp_updwn <- as.matrix(as.double(Temp_G2Means - Temp_G1Means));
344 allscores <- cbind(allscores,Temp_updwn);
345 colnames(allscores)[9] <- "updwn";
346
347 # Conditional logic , which extracts genes based on cutoff value - Starts
348
349 # if User have specified : t-test P-value and FDR and Fold Change
350 # Below function GetThs() returns threshold value based on cuttoff value passed
351
352 allscoresExtra <- 0;
353 ASBKP_two <- 0;
354 tx <- 0;
355 Ths <- 0;
356 # GetThs function
357
358 GetThs <- function(pcutoff,FDRcutoff,FoldChangecutoff)
359 {
360
361 # ##--------------- initial settings start---
362 # # store scores matrix in to a backup variable
363 # allscoresExtra <- allscores;
364 # second backup...it will be restored itself later before return of Ths
365 # ASBKP_two <- allscores;
366 # myGenes <- rownames(allscoresExtra);
367 # Ths <- 0;
368 # ##--------------- initial settings end ---
369
370 # Number of DE genes if user specified : t-test P-value and FDR and Fold Change
371
372 if(pcutoff != "X" && FDRcutoff != "X" && FoldChangecutoff != "X" )
373 {
374 # ##--------------- initial settings start---
375 # store scores matrix in to a backup variable
376 # second backup...to ASBKP variable it will be restored itself later before return of Ths
377 allscoresExtra <- allscores;
378 ASBKP_two <- allscores;
379 myGenes <- rownames(allscoresExtra);
380 Ths <- 0;
381 # ##--------------- initial settings end---
382 myGenes <- rownames(allscoresExtra);
383 tx <- 0;
384 tx <- nrow(matrix(myGenes[allscores[,"p.value"]<= pcutoff]));
385 if(tx == 0)
386 {
387 Ths <- 0;
388 return(Ths);
389 }
390 else
391 {
392 myDEGenestemp <- myGenes[allscores[,"p.value"]<= pcutoff];
393 allscores <- allscoresExtra[myDEGenestemp,];
394 }
395
396 myGenes <- rownames(allscores);
397 tx <- 0;
398 tx <- nrow(matrix(myGenes[allscores[,"FDR"]<= FDRcutoff]));
399
400 if(tx == 0)
401 {
402 Ths <- 0;
403 return(Ths);
404 }
405 else
406 {
407 myDEGenestemp <- myGenes[allscores[,"FDR"]<= FDRcutoff];
408 allscores <- allscoresExtra[myDEGenestemp,];
409 }
410
411 myGenes <- rownames(allscores);
412 tx <- 0;
413 tx <- nrow(matrix(myGenes[allscores[,"FoldChange"]>= FoldChangecutoff]));
414
415 if(tx == 0)
416 {
417 Ths <- 0;
418 return(Ths);
419 }
420 else
421 {
422 myDEGenestemp <- myGenes[allscores[,"FoldChange"]>= FoldChangecutoff];
423 allscores <- allscoresExtra[myDEGenestemp,];
424 }
425
426 Ths <- nrow(allscores);
427 if(is.null(Ths)) # if there is NULL value in Threshold
428 {
429 Ths <- 0;
430 }
431 # -------------- restore ------------------
432 allscores <- ASBKP_two;
433 # ----------------------------------------
434 return(Ths);
435 }
436 # Number of DE genes if user specified : t-test P-value and FDR
437 if(pcutoff != "X" && FDRcutoff != "X" && FoldChangecutoff == "X")
438 {
439 # ##--------------- initial settings start---
440 # store scores matrix in to a backup variable
441 allscoresExtra <- allscores;
442 # second backup...it will be restored itself later before return of Ths
443 ASBKP_two <- allscores;
444
445 myGenes <- rownames(allscoresExtra);
446 Ths <- 0;
447 # ##--------------- initial settings end---
448 myGenes <- rownames(allscoresExtra);
449 tx <- 0;
450 tx <- nrow(matrix(myGenes[allscores[,"p.value"]<= pcutoff]));
451 if(tx == 0)
452 {
453 Ths <- 0;
454 return(Ths);
455 }
456 else
457 {
458 myDEGenestemp <- myGenes[allscores[,"p.value"]<= pcutoff];
459 allscores <- allscoresExtra[myDEGenestemp,];
460 }
461
462 myGenes <- rownames(allscores);
463 tx <- 0;
464 tx <- nrow(matrix(myGenes[allscores[,"FDR"]<= FDRcutoff]));
465
466 if(tx == 0)
467 {
468 Ths <- 0;
469 return(Ths);
470 }
471 else
472 {
473 myDEGenestemp <- myGenes[allscores[,"FDR"]<= FDRcutoff];
474 allscores <- allscoresExtra[myDEGenestemp,];
475 }
476 Ths <- nrow(allscores);
477 if(is.null(Ths))
478 {
479 Ths <- 0;
480 }
481 # -------------- restore ------------------
482 allscores <- ASBKP_two;
483 # ----------------------------------------
484
485 return(Ths);
486 }
487 # Number of DE genes if user specified : t-test P-value and Fold Change
488
489 if(pcutoff != "X" && FDRcutoff == "X" && FoldChangecutoff != "X")
490 {
491 # ##--------------- initial settings start---
492 # store scores matrix in to a backup variable
493 allscoresExtra <- allscores;
494 # second backup...it will be restored itself later before return of Ths
495 ASBKP_two <- allscores;
496
497 myGenes <- rownames(allscoresExtra);
498 Ths <- 0;
499 # ##--------------- initial settings end---
500 myGenes <- rownames(allscoresExtra);
501 tx <- 0;
502 tx <- nrow(matrix(myGenes[allscores[,"p.value"]<= pcutoff]));
503 if(tx == 0)
504 {
505 Ths <- 0;
506 return(Ths);
507 }
508 else
509 {
510 myDEGenestemp <- myGenes[allscores[,"p.value"]<= pcutoff];
511 allscores <- allscoresExtra[myDEGenestemp,];
512 }
513
514 myGenes <- rownames(allscores);
515 tx <- 0;
516 tx <- nrow(matrix(myGenes[allscores[,"FoldChange"]<= FoldChangecutoff]));
517
518 if(tx == 0)
519 {
520 Ths <- 0;
521 return(Ths);
522 }
523 else
524 {
525 myDEGenestemp <- myGenes[allscores[,"FoldChange"] >= FoldChangecutoff];
526 allscores <- allscoresExtra[myDEGenestemp,];
527 }
528 Ths <- nrow(allscores);
529 if(is.null(Ths))
530 {
531 Ths <- 0;
532 }
533 # here before return Ths we calculate max fdr.
534
535 MaxFdr <- max(allscores[,"FDR"]);
536
537 ThsX <- matrix(Table1,nrow = 2,ncol = 1);
538 ThsX[1,1] <- Ths;
539 ThsX[2,1] <- MaxFdr;
540
541 # -------------- restore ------------------
542 allscores <- ASBKP_two;
543 # ----------------------------------------
544 return(ThsX);
545 }
546
547 # Number of DE genes if user specified : FDR and Fold Change
548
549 if(pcutoff == "X" && FDRcutoff != "X" && FoldChangecutoff != "X")
550 {
551
552 # ##--------------- initial settings start---
553 # store scores matrix in to a backup variable
554 allscoresExtra <- allscores;
555 # second backup...it will be restored itself later before return of Ths
556 ASBKP_two <- allscores;
557
558 myGenes <- rownames(allscoresExtra);
559 Ths <- 0;
560 # ##--------------- initial settings end---
561
562 myGenes <- rownames(allscoresExtra);
563 tx <- 0;
564 tx <- nrow(matrix(myGenes[allscores[,"FDR"]<= FDRcutoff]));
565
566 if(tx == 0)
567 {
568 Ths <- 0;
569 return(Ths);
570 }
571 else
572 {
573 myDEGenestemp <- myGenes[allscores[,"FDR"]<= FDRcutoff];
574 allscores <- allscoresExtra[myDEGenestemp,];
575 }
576 myGenes <- rownames(allscores);
577 tx <- 0;
578 tx <- nrow(matrix(myGenes[allscores[,"FoldChange"]>= FoldChangecutoff]));
579 if(tx == 0)
580 {
581 Ths <- 0;
582 return(Ths);
583 }
584 else
585 {
586 myDEGenestemp <- myGenes[allscores[,"FoldChange"] >= FoldChangecutoff];
587 allscores <- allscoresExtra[myDEGenestemp,];
588 }
589 Ths <- nrow(allscores);
590 if(is.null(Ths))
591 {
592 Ths <- 0;
593 }
594
595 # -------------- restore ------------------
596 allscores <- ASBKP_two;
597 # ----------------------------------------
598 return(Ths);
599 }
600
601 # Number of DE genes if user specified : t-test P-value
602
603 if(pcutoff != "X" && FDRcutoff == "X" && FoldChangecutoff == "X")
604 {
605
606 # ##--------------- initial settings start---
607 # store scores matrix in to a backup variable
608
609 allscoresExtra <- allscores;
610
611 # second backup...it will be restored itself later before return of Ths
612
613 ASBKP_two <- allscores;
614
615 myGenes <- rownames(allscoresExtra);
616 Ths <- 0;
617
618 # ##--------------- initial settings end---
619
620 myGenes <- rownames(allscoresExtra);
621 myDEGenestemp <- myGenes[allscores[,"p.value"]<= pcutoff];
622
623 allscores <- allscoresExtra[myDEGenestemp,]
624 Ths <- nrow(allscores);
625
626 if(is.null(Ths))
627 {
628 Ths <- 0;
629 }
630 # -------------- restore ------------------
631 allscores <- ASBKP_two;
632 # ----------------------------------------
633 return(Ths);
634
635 }
636
637 # Number of DE genes if user specified : FDR
638
639 if(pcutoff == "X" && FDRcutoff != "X" && FoldChangecutoff == "X")
640 {
641
642 # ##--------------- initial settings start---
643 # store scores matrix in to a backup variable
644 allscoresExtra <- allscores;
645
646 # second backup...it will be restored itself later before return of Ths
647
648 ASBKP_two <- allscores;
649
650 myGenes <- rownames(allscoresExtra);
651
652 Ths <- 0;
653
654 # ##--------------- initial settings end---
655
656 myGenes <- rownames(allscoresExtra);
657 myDEGenestemp <- myGenes[allscores[,"FDR"]<= FDRcutoff];
658 allscores <- allscoresExtra[myDEGenestemp,]
659 Ths <- nrow(allscores);
660 if(is.null(Ths))
661 {
662 Ths <- 0;
663 }
664 # -------------- restore ------------------
665 allscores <- ASBKP_two;
666 # ----------------------------------------
667 return(Ths);
668 }
669
670 # Number of DE genes if user specified : Fold Change
671 if(pcutoff == "X" && FDRcutoff == "X" && FoldChangecutoff != "X")
672 {
673
674 # ##--------------- initial settings start---
675 # store scores matrix in to a backup variable
676 allscoresExtra <- allscores;
677
678 # second backup...it will be restored itself later before return of Ths
679 ASBKP_two <- allscores;
680
681 myGenes <- rownames(allscoresExtra);
682 Ths <- 0;
683 # ##--------------- initial settings end---
684
685 myGenes <- rownames(allscoresExtra);
686 myDEGenestemp <- myGenes[allscores[,"FoldChange"] >= FoldChangecutoff];
687 allscores <- allscoresExtra[myDEGenestemp,]
688 Ths <- nrow(allscores);
689 if(is.null(Ths))
690 {
691 Ths <- 0;
692 }
693 # -------------- restore ------------------
694 allscores <- ASBKP_two;
695 # ----------------------------------------
696 return(Ths);
697 }
698
699 # Number of DE genes if user specified : No parameters
700
701 if(pcutoff == "X" && FDRcutoff == "X" && FoldChangecutoff == "X")
702 {
703 Ths <- 0;
704 # -------------- restore ------------------
705 allscores <- ASBKP_two;
706 # ----------------------------------------
707 return(Ths);
708 }
709
710 # -------------- restore ------------------
711 allscores <- ASBKP_two;
712 # ----------------------------------------
713
714 } # GetThs function ends here
715
716 # ############################## Generating RMA_Genes_Filtered_from_Single_Parameter_Test.html ###############################################
717
718 # Below code will build html table, which will show if only single parameter
719 # is applied to data, howmany number of genes will trun up to be
720 # differntially expressed.
721
722 Table1 <- 0;
723 Table1 <- matrix(Table1,nrow = 3,ncol = 6);
724
725 # #### [1,0] [2,0] [3,0] [4,0] [5,0] [6,0] ###########
726 colnames(Table1) <- c("Fold Change","No. of Genes","t-test (p-value)","No. of Genes","FDR (BH method)","No. of Genes");
727
728 # # fc
729
730 Table1[1,1] <- 1.5;
731 Table1[2,1] <- 2.0;
732 Table1[3,1] <- 3.0;
733
734 # pval
735
736 Table1[1,3] <- 0.05;
737 Table1[2,3] <- 0.01;
738 Table1[3,3] <- c("none");
739
740 # fdr
741
742 Table1[1,5] <- 0.05;
743 Table1[2,5] <- 0.01;
744 Table1[3,5] <- c("none");
745
746
747 # #### now putting values in to table
748
749 # fc
750
751 Ths <- GetThs("X","X",1.5);
752 Table1[1,2] <- Ths;
753
754 Ths <- GetThs("X","X",2);
755 Table1[2,2] <- Ths;
756
757 Ths <- GetThs("X","X",3);
758 Table1[3,2] <- Ths;
759
760 # pval
761
762 Ths <- GetThs(0.05,"X","X");
763 Table1[1,4] <- Ths;
764
765 Ths <- GetThs(0.01,"X","X");
766 Table1[2,4] <- Ths;
767
768 # fdr
769
770 Ths <- GetThs("X",0.05,"X");
771 Table1[1,6] <- Ths;
772
773 Ths <- GetThs("X",0.01,"X");
774 Table1[2,6] <- Ths;
775
776 # #####
777
778 Table1 <- as.table(Table1);
779 # table2html(Table1, filename = "RMA_Genes_Filtered_from_Single_Parameter_Test.html", title="Number of genes filtered from data using a single parameter test: Fold Change or t-test (p-value) or FDR (BH method)", table.head = "Normalization Method: RMA",disp ="file");
780
781 # #
782
783 RMA_Table1 <- Table1;
784
785 # #
786
787
788 # ############################## Generating RMA_Genes_Filtered_from_T_Test_pvalue_and_Fold_Change_Parameters.html ##############################
789
790 # Below code will build html table, which will show if T-test pvalue and
791 # fold change paramter will put together and applied to data, howmany number
792 # of genes will trun up to be differntially expressed.
793
794 Table2 <- 0;
795 Table2 <- matrix(Table2,nrow = 4,ncol = 4);
796
797 # #### [1,0] [2,0] [3,0] [4,0] ###########
798 colnames(Table2) <- c("t-test (p-value)","Fold Change","No. of Genes", "FDR (BH method)");
799
800 # pval
801
802 Table2[1,1] <- 0.05;
803 Table2[2,1] <- 0.05;
804 Table2[3,1] <- 0.01;
805 Table2[4,1] <- 0.01;
806
807 # fc
808
809 Table2[1,2] <- 1.5;
810 Table2[2,2] <- 2;
811 Table2[3,2] <- 1.5;
812 Table2[4,2] <- 2;
813
814
815 # No. of Genes
816
817 Ths <- GetThs(0.05,"X",1.5); # for this option, GetThs() returns two values: Ths and MaxFdr.
818 Table2[1,3] <- Ths[1,1];
819 Table2[1,4] <- Ths[2,1]; # here we put MaxFdr in table before another call of GetThs function
820
821 Ths <- GetThs(0.05,"X",2);
822 Table2[2,3] <- Ths[1,1];
823 Table2[2,4] <- Ths[2,1]; # here we put MaxFdr in table before another call of GetThs function
824
825 Ths <- GetThs(0.01,"X",1.5);
826 Table2[3,3] <- Ths[1,1];
827 Table2[3,4] <- Ths[2,1]; # here we put MaxFdr in table before another call of GetThs function
828
829 Ths <- GetThs(0.01,"X",2);
830 Table2[4,3] <- Ths[1,1];
831 Table2[4,4] <- Ths[2,1]; # here we put MaxFdr in table before another call of GetThs function
832
833 # #####
834
835 Table2 <- as.table(Table2);
836 # table2html(Table2, filename = "RMA_Genes_Filtered_from_T_Test_pvalue_and_Fold_Change_Parameters.html",title = "Number of Genes filtered from analysis using combined tests : t-test (P-value) and Fold Change", table.head = "Normalization Method: RMA",disp ="file");
837
838 # #
839
840 RMA_Table2 <- Table2;
841
842 # #
843
844 # ############################## Generating RMA_Genes_Filtered_from_FDR_BH_and_Fold_Change_Parameters.html ##############################
845
846 # Below code will build html table, which will show if FDR(BH) value and
847 # fold change paramter will put together and applied to data, howmany number
848 # of genes will trun up to be differntially expressed.
849
850 Table3 <- 0;
851 Table3 <- matrix(Table3,nrow = 4,ncol = 3);
852
853 # #### [1,0] [2,0] [3,0] ###########
854 colnames(Table3) <- c("FDR (BH method)","Fold Change","No. of Genes");
855
856 # fdr
857
858 Table3[1,1] <- 0.05;
859 Table3[2,1] <- 0.05;
860 Table3[3,1] <- 0.01;
861 Table3[4,1] <- 0.01;
862
863 # fc
864
865 Table3[1,2] <- 1.5;
866 Table3[2,2] <- 2;
867 Table3[3,2] <- 1.5;
868 Table3[4,2] <- 2;
869
870 # No. of Genes
871
872 Ths <- GetThs("X",0.05,1.5);
873 Table3[1,3] <- Ths;
874
875 Ths <- GetThs("X",0.05,2);
876 Table3[2,3] <- Ths;
877
878 Ths <- GetThs("X",0.01,1.5);
879 Table3[3,3] <- Ths;
880
881 Ths <- GetThs("X",0.01,2);
882 Table3[4,3] <- Ths;
883
884 Table3 <- as.table(Table3);
885 # table2html(Table3, filename = "RMA_Genes_Filtered_from_FDR_BH_and_Fold_Change_Parameters.html",title ="Number of genes filtered from data using combined tests: FDR (BH method) and Fold Change", table.head = "Normalization Method: RMA",disp ="file");
886
887 # #
888
889 RMA_Table3 <- Table3;
890
891 # #
892
893
894 # ############################## Generating RMA_Genes_Filtered_from_USER_Defined_Parameters.html ##############################
895
896 # Below code will build html table, which will show if user have
897 # specified values; thoes paramter values put together and applied to data,
898 # howmany number of genes will trun up to be differntially expressed.
899
900 Table4 <- 0;
901 Table4 <- matrix(Table4,nrow = 1,ncol = 5);
902
903 # #### [1,0] [2,0] [3,0] [4,0] [5,0] ###########
904 colnames(Table4) <- c("t-test (p-value)","FDR (BH method)","Fold Change","No. of Genes","FDR (BH method) for filtered genes");
905
906 Table4[1,1] <- pcutoff;
907 Table4[1,2] <- FDRcutoff;
908 Table4[1,3] <- FoldChangecutoff;
909
910 Ths <- GetThs(pcutoff,FDRcutoff,FoldChangecutoff);
911 Ths
912 Tn <- nrow(Ths);
913 Tn
914 if(is.null(Tn))
915 {
916 Table4[1,4] <- Ths;
917 Table4[1,5] <- FDRcutoff;
918 }
919 if(!is.null(Tn))
920 {
921 Table4[1,4] <- Ths[1,1];
922 Table4[1,5] <- Ths[2,1];
923 }
924
925 Table4 <- as.table(Table4);
926 # table2html(Table4, filename = "RMA_Genes_Filtered_from_USER_Defined_Parameters.html", title = "Number of genes filtered from data according to user-defined parameters: [Fold Change] (And/Or) [t-test (p-value)] (And/Or) [FDR (BH method)]",table.head = "Normalization Method: RMA",disp ="file");
927
928 # #
929
930 RMA_Table4 <- Table4;
931
932 # #
933
934
935 # ######################################### RMA Normalization report ends #######################################################
936 # #######################################################################################################################
937 # #######################################################################################################################
938 # #######################################################################################################################
939 # #######################################################################################################################
940 # #######################################################################################################################
941 # ######################################### MAS5 Normalization report starts #######################################################
942
943 # Normalization of Affymetrix Raw data using MAS5 process - Starts.
944 # MAS5 function performs normalization, background correction and calculating
945 # expression values for affymetrix .CEL "abatch" object
946
947
948 myRMA<- mas5(abatch);
949
950 # # below step is mas5 specific
951
952 exprs(myRMA) <- log2(exprs(myRMA)); # mas5 generates anti-log2based data so, for DE genes, we convert back to log based 2
953
954 # ##
955
956 MatrixRMA <- exprs(myRMA); # converting this expression object to Matrix for future use
957 write.exprs(myRMA,file="MAS5_Expression_data_after_normalization_for_all_genes.xls");
958
959 # Pre-processing of microarray data - Ends.
960
961
962 # Calculate t test,Fold Change, False Discovery Rate(FDR - BH method)
963
964 # perform ttest - student's t test
965 # "scoresFUN" function takes normalized expression set and two index values to
966 # perform ttest using equal variance == TRUE
967 #
968 # scoresFUN function takes expression set and two index values to perform ttest where equal variance == TRUE
969
970 scoresFUN <- function(myRMA,a,b)
971 {
972 Index1 <- which(myRMA$pheno1==a)
973 Index2 <- which(myRMA$pheno1==b)
974 scores <- esApply(myRMA,1,function(x){
975 tmp <- t.test(x[Index2],x[Index1],var.equal=TRUE)
976 c(mean(tmp$estimate),-diff(tmp$estimate),tmp$statistic,tmp$p.value)
977 })
978 scores <- t(scores) # #put genes back in rows
979 colnames(scores) <- c("A","M","t.stat","p.value")
980 return(scores);
981 } # scoresFUN ends here
982
983 myGenes <- geneNames(myRMA);
984 # calling scoresFUN() function on normalized expression set
985 allscores <- scoresFUN(myRMA,0,1);
986 tmp <- mt.rawp2adjp(allscores[,"p.value"],proc = "BH");
987 # Benjamini and Hochberg p-value adjustment
988 adj.p.values <- tmp$adjp[order(tmp$index),];
989
990 # Calculate FDR
991 allscores <- cbind(allscores,adj.p.values[,-1]);
992 colnames(allscores)[5] <- "FDR";
993
994 # Calculate Mean expression intensity for Group1 or Control Group
995 Temp_G1Means <- as.matrix(rowMeans(2^MatrixRMA[,w:x]));
996 allscores <- cbind(allscores,Temp_G1Means);
997 colnames(allscores)[6] <- "Group1Means";
998
999 # Calculate Mean expression intensity for Group2 or Expremental Group
1000 Temp_G2Means <- as.matrix(rowMeans(2^MatrixRMA[,y:z]));
1001 allscores <- cbind(allscores,Temp_G2Means);
1002 colnames(allscores)[7] <- "Group2Means";
1003
1004 # Calculate Fold Change
1005 Temp_FC <- as.matrix(as.double(Temp_G2Means/Temp_G1Means));
1006 allscores <- cbind(allscores,Temp_FC);
1007 colnames(allscores)[8] <- "FoldChange";
1008
1009 # Calculate Up or Down regulation of gene in terms of intensitiy
1010 Temp_updwn <- as.matrix(as.double(Temp_G2Means - Temp_G1Means));
1011 allscores <- cbind(allscores,Temp_updwn);
1012 colnames(allscores)[9] <- "updwn";
1013
1014 # Take below code's comment off, if user wants to get rid of low expressed genes
1015 # non specific filtering...
1016 # f1 <- pOverA(0.25,log2(100));
1017 # ff <- filterfun(f1);
1018 # selected <- genefilter(myRMA,ff);
1019 # sum(selected);
1020 # esetSub <- myRMA[selected,];
1021 # myRMA <- esetSub;
1022 # MatrixRMA <- exprs(myRMA);
1023
1024 # Number of Differentially expressed Gene filteration - Starts
1025
1026 myGenes <- geneNames(myRMA);
1027
1028 # calling scoresFUN() function on normalized expression set
1029 allscores <- scoresFUN(myRMA,0,1);
1030 tmp <- mt.rawp2adjp(allscores[,"p.value"],proc = "BH");
1031 # Benjamini and Hochberg p-value adjustment
1032 adj.p.values <- tmp$adjp[order(tmp$index),];
1033
1034 # Calculate FDR
1035 allscores <- cbind(allscores,adj.p.values[,-1]);
1036 colnames(allscores)[5] <- "FDR";
1037
1038 # Calculate Mean expression intensity for Group1 or Control Group
1039 Temp_G1Means <- as.matrix(rowMeans(2^MatrixRMA[,w:x]));
1040 allscores <- cbind(allscores,Temp_G1Means);
1041 colnames(allscores)[6] <- "Group1Means";
1042
1043 # Calculate Mean expression intensity for Group2 or Expremental Group
1044 Temp_G2Means <- as.matrix(rowMeans(2^MatrixRMA[,y:z]));
1045 allscores <- cbind(allscores,Temp_G2Means);
1046 colnames(allscores)[7] <- "Group2Means";
1047
1048 # Calculate Fold Change
1049 sizeG1 <- nrow(Temp_G1Means);
1050 sizeG2 <- nrow(Temp_G2Means);
1051 Temp_FC <- numeric(length(Temp_G1Means));
1052
1053 for(f in 1:sizeG1)
1054 {
1055 fcg1 = as.double(Temp_G1Means[f]/Temp_G2Means[f]);
1056 fcg2 = as.double(Temp_G2Means[f]/Temp_G1Means[f]);
1057 if(fcg1 >= 1)
1058 {
1059 Temp_FC[f] <- fcg1;
1060 }
1061 else
1062 {
1063 Temp_FC[f] <- fcg2;
1064 }
1065
1066 }
1067 # Temp_FC <- as.matrix(as.double(Temp_G1Means/Temp_G2Means));
1068 allscores <- cbind(allscores,Temp_FC);
1069 colnames(allscores)[8] <- "FoldChange";
1070
1071 # Calculate Up or Down regulation of gene in terms of intensitiy
1072 Temp_updwn <- as.matrix(as.double(Temp_G2Means - Temp_G1Means));
1073 allscores <- cbind(allscores,Temp_updwn);
1074 colnames(allscores)[9] <- "updwn";
1075 # ##################################################################################
1076
1077 # ##################################################################################
1078
1079 # ############################## Generating MAS5_Genes_Filtered_from_Single_Parameter_Test.html ###############################################
1080
1081 # Below code will build html table, which will show if only single parameter
1082 # is applied to data, howmany number of genes will trun up to be
1083 # differntially expressed.
1084
1085 Table1 <- 0;
1086 Table1 <- matrix(Table1,nrow = 3,ncol = 6);
1087
1088 # #### [1,0] [2,0] [3,0] [4,0] [5,0] [6,0] ###########
1089 colnames(Table1) <- c("Fold Change","No. of Genes","t-test (p-value)","No. of Genes","FDR (BH method)","No. of Genes");
1090
1091 # # fc
1092
1093 Table1[1,1] <- 1.5;
1094 Table1[2,1] <- 2.0;
1095 Table1[3,1] <- 3.0;
1096
1097 # pval
1098
1099 Table1[1,3] <- 0.05;
1100 Table1[2,3] <- 0.01;
1101 Table1[3,3] <- c("none");
1102
1103 # fdr
1104
1105 Table1[1,5] <- 0.05;
1106 Table1[2,5] <- 0.01;
1107 Table1[3,5] <- c("none");
1108
1109
1110 # #### now putting values in to table
1111
1112 # fc
1113
1114 Ths <- GetThs("X","X",1.5);
1115 Table1[1,2] <- Ths;
1116
1117 Ths <- GetThs("X","X",2);
1118 Table1[2,2] <- Ths;
1119
1120 Ths <- GetThs("X","X",3);
1121 Table1[3,2] <- Ths;
1122
1123 # pval
1124
1125 Ths <- GetThs(0.05,"X","X");
1126 Table1[1,4] <- Ths;
1127
1128 Ths <- GetThs(0.01,"X","X");
1129 Table1[2,4] <- Ths;
1130
1131 # fdr
1132
1133 Ths <- GetThs("X",0.05,"X");
1134 Table1[1,6] <- Ths;
1135
1136 Ths <- GetThs("X",0.01,"X");
1137 Table1[2,6] <- Ths;
1138
1139 # #####
1140
1141 Table1 <- as.table(Table1);
1142 # table2html(Table1, filename = "MAS5_Genes_Filtered_from_Single_Parameter_Test.html", title="Number of genes filtered from data using a single parameter test: Fold Change or t-test (p-value) or FDR (BH method)", table.head = "Normalization Method: MAS5",disp ="file");
1143
1144 # #
1145
1146 MAS5_Table1 <- Table1;
1147
1148 # #
1149
1150
1151 # ############################## Generating MAS5_Genes_Filtered_from_T_Test_pvalue_and_Fold_Change_Parameters.html ##############################
1152
1153 # Below code will build html table, which will show if T-test pvalue and
1154 # fold change paramter will put together and applied to data, howmany number
1155 # of genes will trun up to be differntially expressed.
1156
1157 Table2 <- 0;
1158 Table2 <- matrix(Table2,nrow = 4,ncol = 4);
1159
1160 # #### [1,0] [2,0] [3,0] [4,0] ###########
1161 colnames(Table2) <- c("t-test (p-value)","Fold Change","No. of Genes", "FDR (BH method)");
1162
1163 # pval
1164
1165 Table2[1,1] <- 0.05;
1166 Table2[2,1] <- 0.05;
1167 Table2[3,1] <- 0.01;
1168 Table2[4,1] <- 0.01;
1169
1170 # fc
1171
1172 Table2[1,2] <- 1.5;
1173 Table2[2,2] <- 2;
1174 Table2[3,2] <- 1.5;
1175 Table2[4,2] <- 2;
1176
1177
1178 # No. of Genes
1179
1180 Ths <- GetThs(0.05,"X",1.5); # for this option, GetThs() returns two values: Ths and MaxFdr.
1181 Table2[1,3] <- Ths[1,1];
1182 Table2[1,4] <- Ths[2,1]; # here we put MaxFdr in table before another call of GetThs function
1183
1184 Ths <- GetThs(0.05,"X",2);
1185 Table2[2,3] <- Ths[1,1];
1186 Table2[2,4] <- Ths[2,1]; # here we put MaxFdr in table before another call of GetThs function
1187
1188 Ths <- GetThs(0.01,"X",1.5);
1189 Table2[3,3] <- Ths[1,1];
1190 Table2[3,4] <- Ths[2,1]; # here we put MaxFdr in table before another call of GetThs function
1191
1192 Ths <- GetThs(0.01,"X",2);
1193 Table2[4,3] <- Ths[1,1];
1194 Table2[4,4] <- Ths[2,1]; # here we put MaxFdr in table before another call of GetThs function
1195
1196 # #####
1197
1198 Table2 <- as.table(Table2);
1199 # table2html(Table2, filename = "MAS5_Genes_Filtered_from_T_Test_pvalue_and_Fold_Change_Parameters.html",title = "Number of genes filtered from data using combined tests: t-test (p-value) and Fold Change", table.head = "Normalization Method: MAS5",disp ="file");
1200
1201 # #
1202
1203 MAS5_Table2 <- Table2;
1204
1205 # #
1206
1207
1208 # ############################## Generating MAS5_Genes_Filtered_from_FDR_BH_and_Fold_Change_Parameters.html ##############################
1209
1210 # Below code will build html table, which will show if FDR(BH) value and
1211 # fold change paramter will put together and applied to data, howmany number
1212 # of genes will trun up to be differntially expressed.
1213
1214 Table3 <- 0;
1215 Table3 <- matrix(Table3,nrow = 4,ncol = 3);
1216
1217 # #### [1,0] [2,0] [3,0] ###########
1218 colnames(Table3) <- c("FDR (BH method)","Fold Change","No. of Genes");
1219
1220 # fdr
1221
1222 Table3[1,1] <- 0.05;
1223 Table3[2,1] <- 0.05;
1224 Table3[3,1] <- 0.01;
1225 Table3[4,1] <- 0.01;
1226
1227 # fc
1228
1229 Table3[1,2] <- 1.5;
1230 Table3[2,2] <- 2;
1231 Table3[3,2] <- 1.5;
1232 Table3[4,2] <- 2;
1233
1234 # No. of Genes
1235
1236 Ths <- GetThs("X",0.05,1.5);
1237 Table3[1,3] <- Ths;
1238
1239 Ths <- GetThs("X",0.05,2);
1240 Table3[2,3] <- Ths;
1241
1242 Ths <- GetThs("X",0.01,1.5);
1243 Table3[3,3] <- Ths;
1244
1245 Ths <- GetThs("X",0.01,2);
1246 Table3[4,3] <- Ths;
1247
1248 Table3 <- as.table(Table3);
1249 # table2html(Table3, filename = "MAS5_Genes_Filtered_from_FDR_BH_and_Fold_Change_Parameters.html",title ="Number of genes filtered from data using combined tests: FDR (BH method) and Fold Change", table.head = "Normalization Method: MAS5",disp ="file");
1250
1251 # #
1252
1253 MAS5_Table3 <- Table3;
1254
1255 # #
1256
1257
1258 # ############################## Generating MAS5_Genes_Filtered_from_USER_Defined_Parameters.html ##############################
1259
1260 # Below code will build html table, which will show if user have
1261 # specified values; thoes paramter values put together and applied to data,
1262 # howmany number of genes will trun up to be differntially expressed.
1263
1264 Table4 <- 0;
1265 Table4 <- matrix(Table4,nrow = 1,ncol = 5);
1266
1267 # #### [1,0] [2,0] [3,0] [4,0] [5,0] ###########
1268 colnames(Table4) <- c("t-test (p-value)","FDR (BH method)","Fold Change","No. of Genes","FDR (BH method) for filtered genes");
1269
1270 Table4[1,1] <- pcutoff;
1271 Table4[1,2] <- FDRcutoff;
1272 Table4[1,3] <- FoldChangecutoff;
1273
1274 Ths <- GetThs(pcutoff,FDRcutoff,FoldChangecutoff);
1275 Ths
1276 Tn <- nrow(Ths);
1277 Tn
1278 if(is.null(Tn))
1279 {
1280 Table4[1,4] <- Ths;
1281 Table4[1,5] <- FDRcutoff;
1282 }
1283 if(!is.null(Tn))
1284 {
1285 Table4[1,4] <- Ths[1,1];
1286 Table4[1,5] <- Ths[2,1];
1287 }
1288
1289 Table4 <- as.table(Table4);
1290 # table2html(Table4, filename = "MAS5_Genes_Filtered_from_USER_Defined_Parameters.html", title = "Number of genes filtered from data according to user-defined parameters: [Fold Change] (And/Or) [t-test (p-value)] (And/Or) [FDR (BH method)]",table.head = "Normalization Method: MAS5",disp ="file");
1291
1292 # #
1293
1294 MAS5_Table4 <- Table4;
1295
1296 # #
1297
1298
1299 # ######################################### MAS5 Normalization report ends #######################################################
1300 # #######################################################################################################################
1301 # #######################################################################################################################
1302 # #######################################################################################################################
1303 # #######################################################################################################################
1304 # #######################################################################################################################
1305 # ######################################### GCRMA Normalization report starts #######################################################
1306
1307 # Normalization of Affymetrix Raw data using GCRMA process - Starts.
1308 # GCRMA function performs normalization, background correction and calculating
1309 # expression values for affymetrix .CEL "abatch" object
1310
1311 myRMA<- gcrma(abatch); # this function is fast and which reads only CEL file exists in current user dir.
1312 MatrixRMA <- exprs(myRMA); # converting this expression object to Matrix for future use
1313 write.exprs(myRMA,file="GCRMA_Expression_data_after_normalization_for_all_genes.xls");
1314
1315 # Pre-processing of microarray data - Ends.
1316
1317
1318 # Calculate t test,Fold Change, False Discovery Rate(FDR - BH method)
1319
1320 # perform ttest - student's t test
1321 # "scoresFUN" function takes normalized expression set and two index values to
1322 # perform ttest using equal variance == TRUE
1323 #
1324 # scoresFUN function takes expression set and two index values to perform ttest where equal variance == TRUE
1325
1326 scoresFUN <- function(myRMA,a,b)
1327 {
1328 Index1 <- which(myRMA$pheno1==a)
1329 Index2 <- which(myRMA$pheno1==b)
1330 scores <- esApply(myRMA,1,function(x){
1331 tmp <- t.test(x[Index2],x[Index1],var.equal=TRUE)
1332 c(mean(tmp$estimate),-diff(tmp$estimate),tmp$statistic,tmp$p.value)
1333 })
1334 scores <- t(scores) # #put genes back in rows
1335 colnames(scores) <- c("A","M","t.stat","p.value")
1336 return(scores);
1337 } # scoresFUN ends here
1338
1339 myGenes <- geneNames(myRMA);
1340 # calling scoresFUN() function on normalized expression set
1341 allscores <- scoresFUN(myRMA,0,1);
1342 tmp <- mt.rawp2adjp(allscores[,"p.value"],proc = "BH");
1343 # Benjamini and Hochberg p-value adjustment
1344 adj.p.values <- tmp$adjp[order(tmp$index),];
1345
1346 # Calculate FDR
1347 allscores <- cbind(allscores,adj.p.values[,-1]);
1348 colnames(allscores)[5] <- "FDR";
1349
1350 # Calculate Mean expression intensity for Group1 or Control Group
1351 Temp_G1Means <- as.matrix(rowMeans(2^MatrixRMA[,w:x]));
1352 allscores <- cbind(allscores,Temp_G1Means);
1353 colnames(allscores)[6] <- "Group1Means";
1354
1355 # Calculate Mean expression intensity for Group2 or Expremental Group
1356 Temp_G2Means <- as.matrix(rowMeans(2^MatrixRMA[,y:z]));
1357 allscores <- cbind(allscores,Temp_G2Means);
1358 colnames(allscores)[7] <- "Group2Means";
1359
1360 # Calculate Fold Change
1361 Temp_FC <- as.matrix(as.double(Temp_G2Means/Temp_G1Means));
1362 allscores <- cbind(allscores,Temp_FC);
1363 colnames(allscores)[8] <- "FoldChange";
1364
1365 # Calculate Up or Down regulation of gene in terms of intensitiy
1366 Temp_updwn <- as.matrix(as.double(Temp_G2Means - Temp_G1Means));
1367 allscores <- cbind(allscores,Temp_updwn);
1368 colnames(allscores)[9] <- "updwn";
1369
1370 # Take below code's comment off, if user wants to get rid of low expressed genes
1371 # non specific filtering...
1372 # f1 <- pOverA(0.25,log2(100));
1373 # ff <- filterfun(f1);
1374 # selected <- genefilter(myRMA,ff);
1375 # sum(selected);
1376 # esetSub <- myRMA[selected,];
1377 # myRMA <- esetSub;
1378 # MatrixRMA <- exprs(myRMA);
1379
1380 # Number of Differentially expressed Gene filteration - Starts
1381
1382 myGenes <- geneNames(myRMA);
1383
1384 # calling scoresFUN() function on normalized expression set
1385 allscores <- scoresFUN(myRMA,0,1);
1386 tmp <- mt.rawp2adjp(allscores[,"p.value"],proc = "BH");
1387 # Benjamini and Hochberg p-value adjustment
1388 adj.p.values <- tmp$adjp[order(tmp$index),];
1389
1390 # Calculate FDR
1391 allscores <- cbind(allscores,adj.p.values[,-1]);
1392 colnames(allscores)[5] <- "FDR";
1393
1394 # Calculate Mean expression intensity for Group1 or Control Group
1395 Temp_G1Means <- as.matrix(rowMeans(2^MatrixRMA[,w:x]));
1396 allscores <- cbind(allscores,Temp_G1Means);
1397 colnames(allscores)[6] <- "Group1Means";
1398
1399 # Calculate Mean expression intensity for Group2 or Expremental Group
1400 Temp_G2Means <- as.matrix(rowMeans(2^MatrixRMA[,y:z]));
1401 allscores <- cbind(allscores,Temp_G2Means);
1402 colnames(allscores)[7] <- "Group2Means";
1403
1404 # Calculate Fold Change
1405 sizeG1 <- nrow(Temp_G1Means);
1406 sizeG2 <- nrow(Temp_G2Means);
1407 Temp_FC <- numeric(length(Temp_G1Means));
1408
1409 for(f in 1:sizeG1)
1410 {
1411 fcg1 = as.double(Temp_G1Means[f]/Temp_G2Means[f]);
1412 fcg2 = as.double(Temp_G2Means[f]/Temp_G1Means[f]);
1413 if(fcg1 >= 1)
1414 {
1415 Temp_FC[f] <- fcg1;
1416 }
1417 else
1418 {
1419 Temp_FC[f] <- fcg2;
1420 }
1421
1422 }
1423 # Temp_FC <- as.matrix(as.double(Temp_G1Means/Temp_G2Means));
1424 allscores <- cbind(allscores,Temp_FC);
1425 colnames(allscores)[8] <- "FoldChange";
1426
1427 # Calculate Up or Down regulation of gene in terms of intensitiy
1428 Temp_updwn <- as.matrix(as.double(Temp_G2Means - Temp_G1Means));
1429 allscores <- cbind(allscores,Temp_updwn);
1430 colnames(allscores)[9] <- "updwn";
1431
1432 # ############################## Generating GCRMA_Genes_Filtered_from_Single_Parameter_Test.html ###############################################
1433
1434 # Below code will build html table, which will show if only single parameter
1435 # is applied to data, howmany number of genes will trun up to be
1436 # differntially expressed.
1437
1438 Table1 <- 0;
1439 Table1 <- matrix(Table1,nrow = 3,ncol = 6);
1440
1441 # #### [1,0] [2,0] [3,0] [4,0] [5,0] [6,0] ###########
1442 colnames(Table1) <- c("Fold Change","No. of Genes","t-test (p-value)","No. of Genes","FDR (BH method)","No. of Genes");
1443
1444 # # fc
1445
1446 Table1[1,1] <- 1.5;
1447 Table1[2,1] <- 2.0;
1448 Table1[3,1] <- 3.0;
1449
1450 # pval
1451
1452 Table1[1,3] <- 0.05;
1453 Table1[2,3] <- 0.01;
1454 Table1[3,3] <- c("none");
1455
1456 # fdr
1457
1458 Table1[1,5] <- 0.05;
1459 Table1[2,5] <- 0.01;
1460 Table1[3,5] <- c("none");
1461
1462
1463 # #### now putting values in to table
1464
1465 # fc
1466
1467 Ths <- GetThs("X","X",1.5);
1468 Table1[1,2] <- Ths;
1469
1470 Ths <- GetThs("X","X",2);
1471 Table1[2,2] <- Ths;
1472
1473 Ths <- GetThs("X","X",3);
1474 Table1[3,2] <- Ths;
1475
1476 # pval
1477
1478 Ths <- GetThs(0.05,"X","X");
1479 Table1[1,4] <- Ths;
1480
1481 Ths <- GetThs(0.01,"X","X");
1482 Table1[2,4] <- Ths;
1483
1484 # fdr
1485
1486 Ths <- GetThs("X",0.05,"X");
1487 Table1[1,6] <- Ths;
1488
1489 Ths <- GetThs("X",0.01,"X");
1490 Table1[2,6] <- Ths;
1491
1492 # #####
1493
1494 Table1 <- as.table(Table1);
1495 # table2html(Table1, filename = "GCRMA_Genes_Filtered_from_Single_Parameter_Test.html", title="Number of genes filtered from data using a single parameter test: Fold Change or t-test (p-value) or FDR (BH method)", table.head = "Normalization Method: GCRMA",disp ="file");
1496
1497 # #
1498
1499 GCRMA_Table1 <- Table1;
1500
1501 # #
1502
1503 # ############################## Generating GCRMA_Genes_Filtered_from_T_Test_pvalue_and_Fold_Change_Parameters.html ##############################
1504
1505 # Below code will build html table, which will show if T-test pvalue and
1506 # fold change paramter will put together and applied to data, howmany number
1507 # of genes will trun up to be differntially expressed.
1508
1509 Table2 <- 0;
1510 Table2 <- matrix(Table2,nrow = 4,ncol = 4);
1511
1512 # #### [1,0] [2,0] [3,0] [4,0] ###########
1513 colnames(Table2) <- c("t-test (p-value)","Fold Change","No. of Genes","FDR (BH method)");
1514
1515 # pval
1516
1517 Table2[1,1] <- 0.05;
1518 Table2[2,1] <- 0.05;
1519 Table2[3,1] <- 0.01;
1520 Table2[4,1] <- 0.01;
1521
1522 # fc
1523
1524 Table2[1,2] <- 1.5;
1525 Table2[2,2] <- 2;
1526 Table2[3,2] <- 1.5;
1527 Table2[4,2] <- 2;
1528
1529 # No. of Genes
1530
1531 Ths <- GetThs(0.05,"X",1.5); # for this option, GetThs() returns two values: Ths and MaxFdr.
1532 Table2[1,3] <- Ths[1,1];
1533 Table2[1,4] <- Ths[2,1]; # here we put MaxFdr in table before another call of GetThs function
1534
1535 Ths <- GetThs(0.05,"X",2);
1536 Table2[2,3] <- Ths[1,1];
1537 Table2[2,4] <- Ths[2,1]; # here we put MaxFdr in table before another call of GetThs function
1538
1539 Ths <- GetThs(0.01,"X",1.5);
1540 Table2[3,3] <- Ths[1,1];
1541 Table2[3,4] <- Ths[2,1]; # here we put MaxFdr in table before another call of GetThs function
1542
1543 Ths <- GetThs(0.01,"X",2);
1544 Table2[4,3] <- Ths[1,1];
1545 Table2[4,4] <- Ths[2,1]; # here we put MaxFdr in table before another call of GetThs function
1546
1547
1548 # #####
1549
1550 Table2 <- as.table(Table2);
1551 # table2html(Table2, filename = "GCRMA_Genes_Filtered_from_T_Test_pvalue_and_Fold_Change_Parameters.html",title = "Number of genes filtered from data using combined tests: t-test (p-value) and Fold Change", table.head = "Normalization Method: GCRMA",disp ="file");
1552
1553 # #
1554
1555 GCRMA_Table2 <- Table2;
1556
1557 # #
1558
1559
1560 # ############################## Generating GCRMA_Genes_Filtered_from_FDR_BH_and_Fold_Change_Parameters.html ##############################
1561
1562 # Below code will build html table, which will show if FDR(BH) value and
1563 # fold change paramter will put together and applied to data, howmany number
1564 # of genes will trun up to be differntially expressed.
1565
1566 Table3 <- 0;
1567 Table3 <- matrix(Table3,nrow = 4,ncol = 3);
1568
1569 # #### [1,0] [2,0] [3,0] ###########
1570 colnames(Table3) <- c("FDR (BH method)","Fold Change","No. of Genes");
1571
1572 # fdr
1573
1574 Table3[1,1] <- 0.05;
1575 Table3[2,1] <- 0.05;
1576 Table3[3,1] <- 0.01;
1577 Table3[4,1] <- 0.01;
1578
1579 # fc
1580
1581 Table3[1,2] <- 1.5;
1582 Table3[2,2] <- 2;
1583 Table3[3,2] <- 1.5;
1584 Table3[4,2] <- 2;
1585
1586 # No. of Genes
1587
1588 Ths <- GetThs("X",0.05,1.5);
1589 Table3[1,3] <- Ths;
1590
1591 Ths <- GetThs("X",0.05,2);
1592 Table3[2,3] <- Ths;
1593
1594 Ths <- GetThs("X",0.01,1.5);
1595 Table3[3,3] <- Ths;
1596
1597 Ths <- GetThs("X",0.01,2);
1598 Table3[4,3] <- Ths;
1599
1600 Table3 <- as.table(Table3);
1601 # table2html(Table3, filename = "GCRMA_Genes_Filtered_from_FDR_BH_and_Fold_Change_Parameters.html",title ="Number of genes filtered from data using combined tests: FDR (BH method) and Fold Change", table.head = "Normalization Method: GCRMA",disp ="file");
1602
1603 # #
1604
1605 GCRMA_Table3 <- Table3;
1606
1607 # #
1608
1609
1610 # ############################## Generating GCRMA_Genes_Filtered_from_USER_Defined_Parameters.html ##############################
1611
1612 # Below code will build html table, which will show if user have
1613 # specified values; thoes paramter values put together and applied to data,
1614 # howmany number of genes will trun up to be differntially expressed.
1615
1616 Table4 <- 0;
1617 Table4 <- matrix(Table4,nrow = 1,ncol = 5);
1618
1619 # #### [1,0] [2,0] [3,0] [4,0] [5,0] ###########
1620 colnames(Table4) <- c("t-test (p-value)","FDR (BH method)","Fold Change","No. of Genes","FDR (BH method) for filtered genes");
1621
1622 Table4[1,1] <- pcutoff;
1623 Table4[1,2] <- FDRcutoff;
1624 Table4[1,3] <- FoldChangecutoff;
1625
1626 Ths <- GetThs(pcutoff,FDRcutoff,FoldChangecutoff);
1627 Ths
1628 Tn <- nrow(Ths);
1629 Tn
1630 if(is.null(Tn))
1631 {
1632 Table4[1,4] <- Ths;
1633 Table4[1,5] <- FDRcutoff;
1634 }
1635 if(!is.null(Tn))
1636 {
1637 Table4[1,4] <- Ths[1,1];
1638 Table4[1,5] <- Ths[2,1];
1639 }
1640
1641 Table4 <- as.table(Table4);
1642 # table2html(Table4, filename = "GCRMA_Genes_Filtered_from_USER_Defined_Parameters.html", title = "Number of genes filtered from data according to user-defined parameters: [Fold Change] (And/Or) [t-test (p-value)] (And/Or) [FDR (BH method)]",table.head = "Normalization Method: GCRMA",disp ="file");
1643
1644 # #
1645
1646 GCRMA_Table4 <- Table4;
1647
1648 # #
1649
1650
1651 # ######################################### GCRMA Normalization report ends #####################
1652 # ###############################################################################################
1653 # ###############################################################################################
1654 # ###############################################################################################
1655 # ###############################################################################################
1656 # ######################################## Generating ArrayQuest_Report starts ##################
1657
1658 H1 <- 0;
1659 H1 <- matrix(H1,nrow = 1,ncol = 1);
1660
1661 H1[1,1] <- c("ArrayQuest Report");
1662 write.table(H1, file = "ArrayQuest_Analysis_Report.xls", append = TRUE, quote = TRUE, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = FALSE, col.names = FALSE, qmethod = c("escape", "double"));
1663
1664 H1[1,1] <- c(" ");
1665 write.table(H1, file = "ArrayQuest_Analysis_Report.xls", append = TRUE, quote = TRUE, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = FALSE, col.names = FALSE, qmethod = c("escape", "double"));
1666
1667
1668 # # Table 1 heading ####################################################
1669 H1[1,1] <- c("Number of genes filtered from data using a single parameter test: Fold Change or t-test (p-value) or FDR (BH method)");
1670 write.table(H1, file = "ArrayQuest_Analysis_Report.xls", append = TRUE, quote = TRUE, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = FALSE, col.names = FALSE, qmethod = c("escape", "double"));
1671 # #############
1672
1673 H1[1,1] <- c("Normalization method: RMA");
1674 write.table(H1, file = "ArrayQuest_Analysis_Report.xls", append = TRUE, quote = TRUE, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = FALSE, col.names = FALSE, qmethod = c("escape", "double"));
1675 # #
1676 write.table(RMA_Table1, file = "ArrayQuest_Analysis_Report.xls", append = TRUE, quote = TRUE, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = FALSE, col.names = TRUE, qmethod = c("escape", "double"));
1677 # #
1678 H1[1,1] <- c("Normalization method: MAS5");
1679 write.table(H1, file = "ArrayQuest_Analysis_Report.xls", append = TRUE, quote = TRUE, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = FALSE, col.names = FALSE, qmethod = c("escape", "double"));
1680 # #
1681 write.table(MAS5_Table1, file = "ArrayQuest_Analysis_Report.xls", append = TRUE, quote = TRUE, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = FALSE, col.names = TRUE, qmethod = c("escape", "double"));
1682 # #
1683 H1[1,1] <- c("Normalization method: GCRMA");
1684 write.table(H1, file = "ArrayQuest_Analysis_Report.xls", append = TRUE, quote = TRUE, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = FALSE, col.names = FALSE, qmethod = c("escape", "double"));
1685 # #
1686 write.table(GCRMA_Table1, file = "ArrayQuest_Analysis_Report.xls", append = TRUE, quote = TRUE, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = FALSE, col.names = TRUE, qmethod = c("escape", "double"));
1687
1688 # ########################################################################
1689
1690 H1[1,1] <- c(" ");
1691 write.table(H1, file = "ArrayQuest_Analysis_Report.xls", append = TRUE, quote = TRUE, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = FALSE, col.names = FALSE, qmethod = c("escape", "double"));
1692
1693 # # Table 3 heading ####################################################
1694 H1[1,1] <- c("Number of genes filtered from data using combined tests: FDR (BH method) and Fold Change");
1695 write.table(H1, file = "ArrayQuest_Analysis_Report.xls", append = TRUE, quote = TRUE, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = FALSE, col.names = FALSE, qmethod = c("escape", "double"));
1696 # #############
1697 H1[1,1] <- c("Normalization method: RMA");
1698 write.table(H1, file = "ArrayQuest_Analysis_Report.xls", append = TRUE, quote = TRUE, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = FALSE, col.names = FALSE, qmethod = c("escape", "double"));
1699 # #
1700 write.table(RMA_Table3, file = "ArrayQuest_Analysis_Report.xls", append = TRUE, quote = TRUE, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = FALSE, col.names = TRUE, qmethod = c("escape", "double"));
1701 # #
1702 H1[1,1] <- c("Normalization method: MAS5");
1703 write.table(H1, file = "ArrayQuest_Analysis_Report.xls", append = TRUE, quote = TRUE, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = FALSE, col.names = FALSE, qmethod = c("escape", "double"));
1704 # #
1705 write.table(MAS5_Table3, file = "ArrayQuest_Analysis_Report.xls", append = TRUE, quote = TRUE, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = FALSE, col.names = TRUE, qmethod = c("escape", "double"));
1706 # #
1707 H1[1,1] <- c("Normalization method: GCRMA");
1708 write.table(H1, file = "ArrayQuest_Analysis_Report.xls", append = TRUE, quote = TRUE, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = FALSE, col.names = FALSE, qmethod = c("escape", "double"));
1709 # #
1710 write.table(GCRMA_Table3, file = "ArrayQuest_Analysis_Report.xls", append = TRUE, quote = TRUE, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = FALSE, col.names = TRUE, qmethod = c("escape", "double"));
1711
1712 # ########################################################################
1713 H1[1,1] <- c(" ");
1714 write.table(H1, file = "ArrayQuest_Analysis_Report.xls", append = TRUE, quote = TRUE, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = FALSE, col.names = FALSE, qmethod = c("escape", "double"));
1715
1716 # # Table 2 heading ####################################################
1717 H1[1,1] <- c("Number of genes filtered from data using combined tests: t-test (p-value) and Fold Change");
1718 write.table(H1, file = "ArrayQuest_Analysis_Report.xls", append = TRUE, quote = TRUE, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = FALSE, col.names = FALSE, qmethod = c("escape", "double"));
1719 # #############
1720
1721 H1[1,1] <- c("Normalization method: RMA");
1722 write.table(H1, file = "ArrayQuest_Analysis_Report.xls", append = TRUE, quote = TRUE, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = FALSE, col.names = FALSE, qmethod = c("escape", "double"));
1723 # #
1724 write.table(RMA_Table2, file = "ArrayQuest_Analysis_Report.xls", append = TRUE, quote = TRUE, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = FALSE, col.names = TRUE, qmethod = c("escape", "double"));
1725 # #
1726 H1[1,1] <- c("Normalization method: MAS5");
1727 write.table(H1, file = "ArrayQuest_Analysis_Report.xls", append = TRUE, quote = TRUE, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = FALSE, col.names = FALSE, qmethod = c("escape", "double"));
1728 # #
1729 write.table(MAS5_Table2, file = "ArrayQuest_Analysis_Report.xls", append = TRUE, quote = TRUE, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = FALSE, col.names = TRUE, qmethod = c("escape", "double"));
1730 # #
1731 H1[1,1] <- c("Normalization method: GCRMA");
1732 write.table(H1, file = "ArrayQuest_Analysis_Report.xls", append = TRUE, quote = TRUE, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = FALSE, col.names = FALSE, qmethod = c("escape", "double"));
1733 # #
1734 write.table(GCRMA_Table2, file = "ArrayQuest_Analysis_Report.xls", append = TRUE, quote = TRUE, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = FALSE, col.names = TRUE, qmethod = c("escape", "double"));
1735
1736 # ########################################################################
1737 H1[1,1] <- c(" ");
1738 write.table(H1, file = "ArrayQuest_Analysis_Report.xls", append = TRUE, quote = TRUE, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = FALSE, col.names = FALSE, qmethod = c("escape", "double"));
1739
1740
1741 # # Table 4 heading ####################################################
1742 # H1[1,1] <- c("Number of genes filtered from data according to user-defined parameters: [Fold Change] (And/Or) [t-test (p-value)] (And/Or) [FDR (BH method)]");
1743 # write.table(H1, file = "ArrayQuest_Analysis_Report.xls", append = TRUE, quote = TRUE, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = FALSE,col.names = FALSE, qmethod = c("escape", "double"));
1744 # #############
1745 # H1[1,1] <- c("Normalization method: RMA");
1746 # write.table(H1, file = "ArrayQuest_Analysis_Report.xls", append = TRUE, quote = TRUE, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = FALSE, col.names = FALSE, qmethod = c("escape", "double"));
1747 # #
1748 # write.table(RMA_Table4, file = "ArrayQuest_Analysis_Report.xls", append = TRUE, quote = TRUE, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = FALSE, col.names = TRUE, qmethod = c("escape", "double"));
1749 # #
1750 # H1[1,1] <- c("Normalization method: MAS5");
1751 # write.table(H1, file = "ArrayQuest_Analysis_Report.xls", append = TRUE, quote = TRUE, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = FALSE, col.names = FALSE, qmethod = c("escape", "double"));
1752 # #
1753 # write.table(MAS5_Table4, file = "ArrayQuest_Analysis_Report.xls", append = TRUE, quote = TRUE, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = FALSE, col.names = TRUE, qmethod = c("escape", "double"));
1754 # #
1755 # H1[1,1] <- c("Normalization method: GCRMA");
1756 # write.table(H1, file = "ArrayQuest_Analysis_Report.xls", append = TRUE, quote = TRUE, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = FALSE, col.names = FALSE, qmethod = c("escape", "double"));
1757 # #
1758 # write.table(GCRMA_Table4, file = "ArrayQuest_Analysis_Report.xls", append = TRUE, quote = TRUE, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = FALSE, col.names = TRUE, qmethod = c("escape", "double"));
1759
1760 # ########################################################################
1761 H1[1,1] <- c(" ");
1762 write.table(H1, file = "ArrayQuest_Analysis_Report.xls", append = TRUE, quote = TRUE, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = FALSE, col.names = FALSE, qmethod = c("escape", "double"));
1763
1764 # ###
1765 Ft1 <- 0;
1766 Ft1 <- matrix(Ft1,nrow = 1,ncol = 1);
1767 Ft1[1,1] <- c("This report was generated using an ArrayQuest method authored by Saurin D. Jani, Jeremy L. Barth and W. Scott Argraves. MUSC");
1768 write.table(Ft1, file = "ArrayQuest_Analysis_Report.xls", append = TRUE, quote = TRUE, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = FALSE, col.names = FALSE, qmethod = c("escape", "double"));
1769
1770 # ######################################## Generating ArrayQuest_Report ends #####################
1771
1772
1773 # End of script
|
| 17 | Saurin D. Jani | MUSC DNA Microarray DB; .CEL; |
Assess gene expression associated with a specified Gene Ontology ID tag(s): .CEL files only: Heatmap output Description: Description of Method No. 17This method analyzes Affymetrix GeneChip data to find gene expression values associated with specified Gene Ontology IDs (GOIDs). The script normalizes GeneChip hybridization data (RMA), extracts hybridization values for genes associated with a user-provided GOID, performs hierarchical clustering and renders a heat map of the expression profiles. This method can be used for analysis involving any number of .CEL files. Bioconductor and R packages used in the development of this method:
Input or Requirements:
Script ContributorAuthor: Saurin D. Jani, M.S. Address: Cell Biology and Anatomy Department Medical University of South Carolina, Charleston, SC 29425, USA Email: jani@musc.edu Tel.: 843-792-1340 Description of Analysis StepsNormalized hybridization intensities are generated using Robust Multichip Average (RMA; Bolstad et al., 2003; Rafael et al., 2003; Irizarry et al., 2000) and R/Biocondcutor package "affy". Box plots and histograms are created for hybridization values before and after normalization. OutputThe method outputs the following results:
Setting Parameters for Method 17Method 17 accepts only .CEL files of Affymetrix GeneChip. Parameters for executing this method are regulated by the following lines of text. This text is present in "Method parameter box" at Step 4 (Choose Data Set for Analysis) of the ArrayQuest.
This text is revealed by selecting the Parameters checkbox adjacent to Method 17. Lines 2 pertain to the samples that will be analyzed. These lines must be edited to correctly specify the number and identity of samples as they are ordered .CEL files. The Line 3 of the method parameters includes GO id(s). Please see, Gene Ontology website for more informatoin on GO Id(s).
This page and all pages on this site are copyrighted (2006) Gary Argraves. This page may not be reproduced in whole or in part without the express permission of the author. -Notes for Paramaeter Code AND Parameter File List Entries-------------
1. To run this analysis method the user must manually enter sample
name aliases after the into the equal sign in the Parameter Code section
(each separated by a comma). The order of sample filename aliases listed in the Parameter Code section
must correspond with the order of actual filenames listed in
the File Parameter List.
2. DO NOT MODIFY "-write" lines
3. The users must manually enter desired GOids into the 3rd line of the
"Parameter Code" after the equal sign. Follow the format of the
GOids listed as examples.
4. The user may choose any alias to describe the corresponding
filename. The use of the terms 'sample' in the example below is
arbitrary. Simply replace the alias names after the = sign with
alias names of your choice.
5. The user must group the actual filenames in the File Parameter
List so that order of all your Groups' match with files' entered
by you in file parameter list.
Parameter File List Entry:
6. To list the actual filenames below press either the "View Data Sets for
this Project" button OR the "View Data Sets for this Analysis Process"
button above. After pressing one of these buttons, all data sets will
be displayed. You may then select a data set by clicking the 'No.'
buttons next to a given dataset, and it will be added to the parameter
box. If you would like to add individual files from your Dataset, copy
and paste individual file names each including the "-file" extension
(i.e., -file 'filename').
-Notes 'end'
------------------------------------------------------------------------
Parameter Code:
-write 'file:script17_samples.txt'
Group Sample Names (use filename aliases) = control1, control2, control3, exp1, exp2, exp3
GOids (Example: enter GO:123456, GO:123456, GO:123456 after equal sign) =
-write 'end'
------------------------------------------------------------------------
File Parameter List:
# List the actual filenames below (each on a separate line) SEE Note #6 above. 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85
----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
1 # !R --vanilla this is a key telling 'pgrd' how to launch this script
2
3 # ------------------------------------------------------------------------------
4 # Method Title: Assess gene expression associated with a specified Gene Ontology
5 # ID tag(s): .CEL files only: Heatmap output
6 # Method
7 # Description : This method analyzes Affymetrix GeneChip data to find gene
8 # expression values associated with specified Gene Ontology IDs
9 # (GOIDs). The script normalizes GeneChip hybridization data(RMA),
10 # extracts hybridization values for genes associated with a
11 # user-provided GOID, performs hierarchical clustering and
12 # renders a heat map of the expression profiles.
13 # -------------------------------------------------------------------------------
14 # This method uses the following:
15 # Bioconductor
16 # packages : Biobase, affy, affyPLM, annaffy, AnnBuilder, annotate,
17 # simpleaffy, makecdfenv, marray, genefilter, GOstats,
18 # vsn, globaltest, geneplotter, graph, Rgraphviz, RBGL,
19 # prada, siggenes, stam, multtest
20 # For more information on Bioconductor packages:
21 # http://www.bioconductor.org/packages/bioc/stable/src/contrib/html/
22 # R
23 # contributed
24 # packages : cluster, stats, XML
25 # For more information on R contributed packages:
26 # http://cran.r-project.org/src/contrib/PACKAGES.html
27 #
28 # Biocondutor
29 # metadata
30 # packages : Annotation, CDF and Probe packages
31 # http://www.bioconductor.org/packages/data/annotation/stable/src/contrib/html/
32 # ------------------------------------------------------------------------------
33 # Author : Suarin Jani, M.S.
34 # Contact : Department of Cell Biology and Anatomy
35 # Medical University of South Carolina
36 # Charleston, SC - 29425, USA
37 # Email : jani@musc.edu
38 # Tel. : (843)-792-5483
39 # -------------------------------------------------------------------------------
40
41 # Load Bioconductor Libraries from R library direcotry
42 system("scp $centerHost:/home/powerGene/lib/R_script/BioCLibs1.R .");
43 source("BioCLibs1.R");
44 system("rm BioCLibs1.R")
45
46 # Loading Basic Functions library from ArrayQuest Library directory
47 system("scp $centerHost:/home/powerGene/lib/R_script/BasicFuncs.R .");
48 source("BasicFuncs.R");
49 system("rm BasicFuncs.R");
50
51 # Load Error Library from R library direcotry
52 system("scp $centerHost:/home/powerGene/lib/R_script/errorLib.R .");
53 # system("cp /saurin/powerGene/PG_Libs/errorLib.R .");
54 source("errorLib.R");
55 system("rm errorLib.R");
56
57 # Pre-processing of microarray data - Starts.
58 # ReadAffy() function from Bioconductor reads all .CEL files that exist in the
59 # current method's working directory.
60 # Below shown "paramFilenames.txt" file is generated on ArrayQuest which has all
61 # .CEL file names as entered by User in order
62 # to copy those files to the method's working directory.
63 my.filenames <- readLines("paramFilenames.txt");
64 parFile <- "paramFilenames.txt";
65
66 # my.filenames <- list.files(,"CEL"); # my.filenames will have all file names, which has ".CEL" extention
67 f <- as.matrix(my.filenames);
68 n <- nrow(f); # geting number of CEL files in user directroy and storing in to n
69
70 # Check if CEL file does not exist (i.e.,n == 0), generate error message and quit
71 # R session. "errors.txt" will be generated as shown below.
72 if(n == 0)
73 {
74 msg <- "Error File";
75 msg <- "This Method requires Affymetrix GeneChip .CEL files ONLY. Use .CEL files";
76 print(msg);
77 Errmsg <- as.matrix(msg);
78 write(Errmsg,"errors.txt");
79 q("no",1); # script will quit from R
80 }
81
82 # Make empty Phenodata and read .CEL files into "abatch" object
83 myf <- my.filenames;
84 # my.pheno <- read.phenoData(sampleNames = my.filenames);
85
86 my.pheno <- read.AnnotatedDataFrame(parFile,sep = "\n", header = FALSE);
87 abatch <- read.affybatch(filenames = my.filenames,phenoData = my.pheno);
88
89 # Input parameter parsing - Starts.
90 # Based on input parameters modify existing Pheno DATA Object for microarray
91 # experiment and attach to "abatch" .CEL file object
92 # Below shown "script17_samples.txt" file has all parameter information
93 # entered by User
94 #
95 # -write 'file:script17_samples.txt'
96 # Sample order = control1,control2,control3,exp1,exp2,exp3
97 # GOids = GO:123456,GO:123456,GO:123456
98 # -write 'end'
99 s1 <- readLines("script17_samples.txt");
100 smp1 <- as.matrix(unlist(strsplit(s1[1], "=",fixed = TRUE)));
101 smp1[2] <- trimWhiteSpace(smp1[2]);
102 smpX <- as.matrix(unlist(strsplit(smp1[2], ",",fixed = TRUE)));
103 smpC <- as.character(smpX);
104
105 # Generate errror file, if input files does not match with sample names provided
106 totalfiles <- nrow(smpX);
107 if(totalfiles != n)
108 {
109 msg <- paste(getErrMsgStr(1),"\n",getMsgStr(2),n,"\n",getMsgStr(5),totalfiles,"\n");
110 print(msg);
111 Errmsg <- as.matrix(msg);
112 write(Errmsg,"errors.txt");
113 q("no",1); # script will quit from R
114 }
115
116 sampleNames(abatch)<- as.character(smpC);
117 samples <- sampleNames(abatch);
118
119
120 # generating Quality control Report of affy batch data, raw data. object is abatch
121
122 # QCReport(abatch,file = "Affymetrix_files_Quality_Control_Report.pdf");
123
124
125 # Parse GO ID(s) from parameter file
126 goids <- as.matrix(unlist(strsplit(s1[2], "=",fixed = TRUE)));
127 goids[2] <- trimWhiteSpace(goids[2]);
128 goidsX <- as.matrix(unlist(strsplit(goids[2], ",",fixed = TRUE)));
129 goidsC <- as.character(goidsX);
130 goidsX <- as.matrix(goidsC);
131 # Input parameter parsing - Ends.
132
133 # After reading .CEL files, annotation information (which Affymetrix GeneChip used) is
134 # extracted from .CEL files and store in "chiptype" variable
135 chiptype <- annotation(abatch); # getting chip environment from raw batch data
136
137 # Based on Affymetrix GeneChip, loading Affymetrix Environment Libraries which
138 # is used for annotations of Affymetrix Probe IDs
139 system("scp $centerHost:/home/powerGene/lib/R_script/AffyChipTyAnnotateEnv.R .");
140 source("AffyChipTyAnnotateEnv.R");
141 system("rm AffyChipTyAnnotateEnv.R");
142
143 # Generating histogram of Affymetrix raw data - Before Normalization
144 # jpeg(filename="Histogram_before_Norm_affydata.jpeg",width=1000,height=1000);
145 # pops <- pData(abatch)[,1] + 1;
146 # hist(abatch,col=pops,type = "l", main = " ");
147 # title(main = " Histogram before normalization - Control vs Experimental " );
148 # dev.off(); # this function saves file names above jpeg function
149
150 # Generating boxplot of Affymetrix raw data - Before Normalization
151 # jpeg(filename="Boxplot_Before_Norm_affydata.jpeg",width=1000,height=1000);
152 # boxplot(abatch,col=c(1:n),names = samples);
153 # title(main = " Box plot before normalization - Control vs Experimental" );
154 # dev.off(); # this function saves file names aboeve jpeg function
155
156 # Normalization of Affymetrix Raw data using RMA process - Starts.
157 # RMA function performs normalization, background correction and calculating
158 # expression values for affymetrix .CEL "abatch" object
159 myRMA<- rma(abatch);
160 # this function is fast and which reads only CEL file exists in current user dir.
161 MatrixRMA <- exprs(myRMA); # converting this expression object to Matrix for future use
162 write.exprs(myRMA,file="Expression_data_after_RMA_normalization_for_all_genes.xls");
163 # expres2excle() function generates Excel file
164 # Normalization of Affymetrix Raw data using RMA process - Ends.
165
166 # Generate MAS5 - P (presence) and A (absence) calls
167 # PAcalls <- mas5calls(abatch);
168 # write.exprs(PAcalls,file="Expression_data_after_MAS5_normalization_for_all_genes.xls");
169 # expres2excle() function generates Excel file and writes MAS5 output
170
171 abatch.norm <- normalize(abatch); # this step is for displaying normalization graph to user
172 # Above step is for displaying normalization graph to user
173
174 # Generating histogram of affymatrix raw data - After Normalization
175 # jpeg(filename="Histogram_After_RMA_Norm_affydata.jpeg",width=1000,height=1000);
176 # pops <- pData(abatch)[,1] + 1;
177 # hist(abatch.norm,col=pops,type = "l", main = " ");
178 # title(main = " Histogram after normalization - Control vs Experimental" );
179 # dev.off(); # this function saves file names aboeve jpeg function
180
181 # Generating boxplot of Affymetrix raw data - After Normalization
182 # jpeg(filename="Boxplot_After_Normalization_affydata.jpeg",width=1000,height=1000);
183 # boxplot(abatch.norm,col=c(1:n),names = samples);
184 # title(main = " Box plot after normalization - Control vs Experimental" );
185 # dev.off(); # this function saves file names aboeve jpeg function
186 # Pre-processing of microarray data - Ends.
187
188
189 # Generating Gene Ontology expression profile using heatmap
190
191 goL <- nrow(goidsX); # read line of GO Ids
192 for(i in 1:goL)
193 {
194 goid <- goidsX[i];
195 # get all probes on GeneChip which has GO ID(s) associated with them
196 ChipsGO_with_Probes <- as.list(envGO2AllPROBES);
197 a <- ChipsGO_with_Probes[goid]; # get probes which as GO ID enterd by User
198 go_probes <- as.character(unlist(a));
199 goCheck <- 0;
200 goCheck <- nrow(as.matrix(go_probes));
201 if(goCheck <=2)
202 {
203 msg <- c("Error: ","The GO Term(s) that you entered in parameter box are either incorrect or have two or less genes associated with them.","To resolve this problem go to geneontology.org","and find the parent GO term to the one you have used and then enter","the parent term in parameter box in place of the previous term.");
204 print(msg);
205 Errmsg <- as.matrix(msg);
206 write(Errmsg,"errors.txt");
207 q("no",1); # script will quit from R
208 }
209
210 # generate heatmap of those probes, using RMA normalized expression set
211 MapW = 1000;
212 MapH = 1000;
213 MapMargin = 50;
214 gNum <- 0;
215
216 # if(!(gSymb <- unique(getSYMBOL(go_probes,chiptype))))
217 # {
218 # msg <- c("Error: ","GO Term ID:",goid, " can not be found. Replace with parent GO Term ID.");
219 # print(msg);
220 # Errmsg <- as.matrix(msg);
221 # write(Errmsg,"errors.txt");
222 # q("no",1); # script will quit from R
223 # }
224
225 gSymb <- unique(getSYMBOL(go_probes,chiptype));
226 gNum <- nrow(as.matrix(gSymb));
227
228 if(gNum <=2)
229 {
230 msg <- c("Error: ","GO Term needs two or more Genes to generate heatmap. Replace with parent GO Term ID to get more genes.");
231 print(msg);
232 Errmsg <- as.matrix(msg);
233 write(Errmsg,"errors.txt");
234 q("no",1); # script will quit from R
235 }
236
237 if (gNum <= 50)
238 {
239 MapW = 1000;
240 MapH = 1000;
241 MapMargin = 50;
242 }
243 if (gNum > 50 && gNum <= 100)
244 {
245 MapW = 1500;
246 MapH = 1500;
247 MapMargin = 75;
248 }
249 if (gNum > 100 && gNum <= 150)
250 {
251 MapW = 2000;
252 MapH = 2000;
253 MapMargin = 100;
254 }
255 if (gNum > 150 && gNum <= 200)
256 {
257 MapW = 3000;
258 MapH = 3000;
259 MapMargin = 160;
260 }
261 if (gNum > 200 && gNum <= 400)
262 {
263 MapW = 4000;
264 MapH = 4000;
265 MapMargin = 210;
266 }
267 if (gNum > 400 && gNum <= 800)
268 {
269 MapW = 6000;
270 MapH = 6000;
271 MapMargin = 350;
272 }
273 if (gNum > 800 )
274 {
275 MapW = 8000;
276 MapH = 8000;
277 MapMargin = 410;
278 }
279 goterm <- as.character(unlist(getGOTerm(goid)));
280 Hsub <- paste("Heatmap:",goid,":",goterm,"[Red=High, Blue=Low, White=Medium] ","Number of Genes:",gNum,date(),sep = " ");
281 goname <- as.character(trimWhiteSpace(goterm));
282
283 print(goid);
284 print(goname);
285 print("-----------");
286 goname <- sub(":", "_", goname,extended = TRUE, perl = TRUE);
287 goname <- sub("/", "_", goname,extended = TRUE, perl = TRUE);
288 goname <- sub("-", "_", goname,extended = TRUE, perl = TRUE);
289 # goname <- sub(" " , "_", goname,extended = TRUE, perl = TRUE);
290 # goname <- sub(" " , "_", goname,extended = TRUE, perl = TRUE);
291 # goname <- sub(" " , "_", goname,extended = TRUE, perl = TRUE);
292 # goname <- sub("-", "_", goname,extended = TRUE, perl = TRUE);
293
294 heatgoFile <- paste(goname,"GO.jpeg",sep = "_");
295 HeatmapBreakFUN(go_probes,heatgoFile,Hsub);
296 } # for loop ends here
297
298 # End of script
|
| 23 | Saurin D. Jani | MUSC DNA Microarray DB; .CEL;.DAT |
GCRMA Normalization and Calculation of MAS5.0 Detection Calls for Affymetrix GeneChips: .CEL files only: Expression intensity and MAS5 A/P call output Description: Description of Method No. 23 Bioconductor and R packages used in this method:
Input or Requirements:
Script Contributor:
Output or Result FilesThis method will create the following five output files. One Microsoft Excel file of normalized intensities transformed into log base 2 for all genes and four JPEG files of box plots and histograms of expression intensities before and after normalization.
Setting Parameters for Method 23Parameters for executing this method are regulated by the follwoing lines of text. This text is present in "Method parameter box" at Step 4 (Choose Data Set for Analysis). User can modify method paramter box according to their datasets. Line 1: -write 'file:script23_samples.txt'
Copyright Notice This page and all pages on this site are copyrighted (2006) by Gary Argraves. This page may not be reproduced in whole or in part without the express permission of the author. -Notes for Paramaeter Code AND Parameter File List Entries------------------
1. To run this analysis method the user must manually enter sample names
'aliases' into the 2nd line of the "Parameter Code" after the equal
sign (each entry separated by a comma). The order of sample
filename aliases listed in the Parameter Code section must correspond
with the order of actual filenames listed in the File Parameter List.
2. The user may choose any alias to describe the corresponding filename.
The use of the terms 'control' and 'exp' in the example below
is arbitrary. Simply replace the alias names after the = sign with
alias names of your choice.
3. DO NOT MODIFY "-write" lines
Parameter File List Entry:
4. To list the actual filenames below press either the "View Data Sets for
this Project" button OR the "View Data Sets for this Analysis Process"
button above. After pressing one of these buttons, all data sets will
be displayed. You may then select a data set by clicking the 'No.'
buttons next to a given dataset, and it will be added to the parameter
box. If you would like to add individual files from your Dataset, copy
and paste individual file names each including the "-file" extension
(i.e., -file 'filename').
-Notes 'end'
------------------------------------------------------------------------
Parameter Code:
-write 'file:script23_samples.txt'
Group Sample Names (filename alias) = control1, control2, control3, exp1, exp2, exp3
-write 'end'
------------------------------------------------------------------------
File Parameter List:
# List the actual filenames below (each on a separate line) SEE Note #4 above. 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85
----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
1 # !R --vanilla This is the key word to instruct 'pgrd' to launch this script.
2
3 # -------------------------------------------------------------------------------
4 # Method Title : GCRMA Normalization of Affymetrix Data:.CEL files only:
5 # Expression intensity output
6 # Method
7 # Description : This method performs Robust Multiple Chip Analysis (RMA)
8 # and generates normalized expression intensities for any number
9 # of Affymetrix GeneChip data sets. The outputs of this method are
10 # an Excel file containing normalized expression intensity values
11 # as well as Box plots and histograms of the data before and after
12 # normalization.
13 # -------------------------------------------------------------------------------
14 # This method uses the following R/BioC programs:
15 # Bioconductor
16 # packages : Biobase, affy, affyPLM, annaffy, AnnBuilder, annotate,
17 # simpleaffy, makecdfenv, marray, genefilter, GOstats,
18 # vsn, globaltest, geneplotter, graph, Rgraphviz, RBGL,
19 # prada, siggenes, stam, multtest
20 # For more information on Bioconductor packages:
21 # http://www.bioconductor.org/packages/bioc/stable/src/contrib/html/
22 # R
23 # contributed
24 # packages : cluster, stats, XML
25 # For more information on R contributed packages:
26 # http://cran.r-project.org/src/contrib/PACKAGES.html
27 #
28 # Biocondutor
29 # metadata
30 # packages : Annotation, CDF and Probe packages
31 # http://www.bioconductor.org/packages/data/annotation/stable/src/contrib/html/
32
33 # -------------------------------------------------------------------------------
34 # Author : Suarin Jani, M.S.
35 # Contact : Department of Cell Biology and Anatomy
36 # Medical University of South Carolina
37 # Charleston, SC - 29425, USA
38 # Email : jani@musc.edu
39 # Tel. : (843)-792-1340
40 # -------------------------------------------------------------------------------
41
42 # ####################################################################################
43 # Load Bioconductor Libraries from R library direcotry
44 # ####################################################################################
45 system("scp $centerHost:/home/powerGene/lib/R_script/BioCLibs1.R .");
46 source("BioCLibs1.R");
47 system("rm BioCLibs1.R");
48 # ####################################################################################
49 # Load Error Library from R library direcotry
50 # ####################################################################################
51 system("scp $centerHost:/home/powerGene/lib/R_script/errorLib.R .");
52 # system("cp /saurin/powerGene/PG_Libs/errorLib.R .");
53 source("errorLib.R");
54 system("rm errorLib.R");
55
56 # ####################################################################################
57 # Reading CEL files
58 # ####################################################################################
59 # ReadAffy() function from Bioconductor reads all .CEL files that exist in the
60 # current method's working directory.
61 # Below shown "paramFilenames.txt" file is generated on ArrayQuest which has
62 # all .CEL file names as entered by User in order
63 # to copy those files to the method's working directory.
64
65 my.filenames <- readLines("paramFilenames.txt");
66 parFile <- "paramFilenames.txt";
67
68 # my.filenames <- list.files(,"CEL"); # user can use this commnad for
69 # local analysis where my.filenames has all file names that have ".CEL" extention.
70 f <- as.matrix(my.filenames);
71 n <- nrow(f); # get the number of CEL files in the working directory.
72
73 # Check if CEL file does not exist (i.e.,n == 0), generate error message and
74 # quit R session. errors.txt will be generated as shown below.
75 if (n == 0)
76 { msg <- "Error File";
77 msg <- "This Method requires Affymetrix GeneChip .CEL files. Use .CEL files.";
78 print(msg);
79 Errmsg <- as.matrix(msg);
80 write(Errmsg,"errors.txt");
81 q("no",1); # script will quit from R
82 }
83
84 # Make empty Phenodata and read .CEL files into "abatch" object
85 myf <- my.filenames;
86 # my.pheno <- read.phenoData(sampleNames = my.filenames);
87 my.pheno <- read.AnnotatedDataFrame(parFile,sep = "\n", header = FALSE);
88 abatch <- read.affybatch(filenames = my.filenames,phenoData = my.pheno);
89
90 # ####################################################################################
91 # Input parameter parsing - Starts.
92 # ####################################################################################
93 # Based on input parameters modify existing Pheno DATA Object for microarray
94 # experiment and attach to "abatch" .CEL file object
95 # Below shown "script23_samples.txt" file has all parameter information entered by User
96
97 s1 <- readLines("script23_samples.txt");
98 smp1 <- as.matrix(unlist(strsplit(s1[1], "=",fixed = TRUE)));
99 smp1[2] <- trimWhiteSpace(smp1[2]);
100 smpX <- as.matrix(unlist(strsplit(smp1[2], ",",fixed = TRUE)));
101
102 # ---------------------------------------------------------
103 smpX;
104 smpC <- as.character(smpX);
105 smpC;
106 # Removes white spaces from sample names
107 for(c in 1:n){ smpX[c] <- trimWhiteSpace(smpX[c]); }
108 # --------------
109 smpX;
110 smpC <- as.character(smpX);
111 smpC;
112 # ---------------------------------------------------------
113
114 totalfiles <- nrow(smpX);
115
116 # Generate errror file, if input files does not match with sample names provided
117
118 if(totalfiles != n)
119 {
120 msg <- paste(geterror(1),"\n",geterror(2),n,"\n",geterror(5),totalfiles,"\n");
121 print(msg);
122 Errmsg <- as.matrix(msg);
123 write(Errmsg,"errors.txt");
124 q("no"); # script will quit from R
125 }
126
127
128 sampleNames(abatch)<- as.character(smpC);
129 samples <- sampleNames(abatch);
130
131 # ------------------------------------------------------------------
132 # ---- Adding pheno data ------------
133 # ------------------------------------------------------------------
134 cl <- numeric();
135 cl <- c(cl,rep(1,n));
136 pheno = factor(as.character((cl)));
137 pheno;
138 tmp <- pData(abatch);
139 tmp <- cbind(tmp, cl = pheno);
140 pData(abatch) <- tmp;
141 abatch;
142 pData(abatch);
143
144
145 # ####################################################################################
146 # Generating Quality control Report of affy batch data, raw data. object is abatch
147 # ####################################################################################
148
149 QCReport(abatch,file = "ArrayQuest_Affymetrix_GeneChips_QC_Report.pdf");
150
151
152 # ####################################################################################
153 # Input parameter parsing - Ends.
154 # ####################################################################################
155 # After reading .CEL files, annotation information (which Affymetrix GeneChip used) is
156 # extracted from .CEL files and stored in "chiptype" variable
157 chiptype <- annotation(abatch);
158 chiptype;
159 # Based on Affymetrix GeneChip, loading Affymetrix Environment Libraries which is
160 # used for annotations of Affymetrix Probe IDs
161 system("scp $centerHost:/home/powerGene/lib/R_script/AffyChipTyAnnotateEnv.R .");
162 source("AffyChipTyAnnotateEnv.R");
163 system("rm AffyChipTyAnnotateEnv.R");
164 # ####################################################################################
165 # Normalization of Affymetrix Raw data using GCRMA process
166 # ####################################################################################
167 # GCRMA function performs normalization, background correction and calculating
168 # expression values for affymetrix .CEL "abatch" object.
169
170 myRMA<- gcrma(abatch);
171 MatrixRMA <- exprs(myRMA); # converting expression object to Matrix
172 class(MatrixRMA); # displaying class of eset object
173 write.exprs(myRMA,file="Expression_data_after_GCRMA_normalization_for_all_genes.xls");
174
175 # ####################################################################################
176 # Normalization : MAS5 starts
177 # ####################################################################################
178
179 myMAS5<- mas5(abatch); # this function is fast and which reads only CEL file exists in current user dir.
180 MatrixMAS5 <- exprs(myMAS5); # converting this expression object to Matrix for future use
181
182 # ####################################################################################
183 # Generate MAS5 - P (presence) and A (absence) calls
184 # ####################################################################################
185
186 MASPACalls <- mas5calls(abatch);
187 write.exprs(MASPACalls,file="Expression_data_PMA_Calls_After_MAS5_Normalization_for_All_Genes.xls");
188
189 MatrixMASPACalls <- exprs(MASPACalls);
190
191 # ####################################################################################
192 # Generate Combination file of GCRMA and MAS5 PMA calls : All Genes
193 # ####################################################################################
194
195 CombA <- character();
196 CombB <- character();
197
198 for (i in 1:n)
199 {
200 CombA <- cbind(CombA,as.matrix(as.numeric(MatrixRMA[,i])),as.matrix(as.character(MatrixMASPACalls[,i])));
201 # old code CombB <- c(CombB,rep(colnames(MatrixRMA)[i],2))
202 CombB <- c(CombB,paste(colnames(MatrixRMA)[i],"GCRMA"),paste(colnames(MatrixRMA)[i],"MAS5"));
203 }
204
205 rownames(CombA) <- rownames(MatrixRMA);
206 colnames(CombA) <- CombB;
207 CombA <- as.matrix(CombA);
208
209 All_rows <- rownames(MatrixRMA);
210 NewC <- as.matrix(All_rows);
211 colnames(NewC) <- c("Probe IDs");
212 NewC;
213
214 NewCombX <- cbind(NewC,CombA);
215 write.table(NewCombX,file="Expression_data_Combination_of_GCRMA_and_MAS5_PMA_Calls.xls", append = TRUE, quote = TRUE, sep = "\t",eol = "\n", na = "NA", dec = ".", row.names = FALSE,col.names = TRUE, qmethod = c("escape", "double"));
216
217 # End of script
|
| 24 | Saurin D. Jani | MUSC DNA Microarray DB; .CEL; |
Find differentially expressed genes based on fold-change, p-value and/or FDR parameters: GCRMA normalization: .CEL files only: ClassA vs ClassB Description: Description of Method No. 24This method is used to analyze Affymetrix DNA microarray data that can be obtained from MUSC DNA microarray database or uploaded to the ArrayQuest web-server into a private database. This method is limited to analysis of a two condition experiment (e.g., control versus experimental or wild-type versus mutant). The method normalizes hybridization data, finds differentially expressed genes based on fold-change, t-test and FDR thresholds, collects annotations for these genes, performs hierarchical clustering of the gene expression profiles and renders a heat map of the expression profiles. A requirement is that there are two or more replicates for each condition. Bioconductor and R packages used in the development of this method:
Input or Requirements:
Script Contributor:Author: Saurin D. Jani, M.S. Description of Analysis Steps
OutputThe method outputs the following results:
Setting Parameters for Method 24Parameters for executing this method are regulated by the following lines of text. This text is present in Method parameter box" at Step 4 (Choose Dataset for Analysis) of the ArrayQuest. Line 1: -write 'file:script24_samples.txt'
Copyright Notice This page and all pages on this site are copyrighted (2006) by Gary Argraves. This page may not be reproduced in whole or in part without the express permission of the author. -Notes for Paramaeter Code AND Parameter File List Entries--------------------------------------
1. To run this analysis method the user must manually enter sample
name aliases after the into the equal sign in the Parameter Code section
(each separated by a comma). The order of sample filename aliases listed in the Parameter Code section
must correspond with the order of actual filenames listed in
the File Parameter List. Only two groups can be specified.
2. DO NOT MODIFY "-write" lines
3. The user may choose any alias to describe the corresponding
filename. The use of the terms 'sample' in the example below is
arbitrary. Simply replace the alias names after the = sign with
alias names of your choice.
4. The user must group the actual filenames in the File Parameter
List so that order of all your Groups' match with files' entered
by you in file parameter list.
5. The user may manually modify the filter settings for False Discovery Rate(FDR).
Insert an X for those filters that you do not want to apply to the analysis.
Parameter File List Entry:
6. To list the actual filenames below press either the "View Data Sets for
this Project" button OR the "View Data Sets for this Analysis Process"
button above. After pressing one of these buttons, all data sets will
be displayed. You may then select a data set by clicking the 'No.'
buttons next to a given dataset, and it will be added to the parameter
box. If you would like to add individual files from your Dataset, copy
and paste individual file names each including the "-file" extension
(i.e., -file 'filename').
-Notes 'end'
------------------------------------------------------------------------
Parameter Code:
-write 'file:script24_samples.txt'
Group A Sample Names (filename alias) = control1, control2, control3
Group B Sample Names (filename alias) = exp1, exp2, exp3
P-value = 0.01
False discovery rate = 0.01
Fold change = X
-write 'end'
------------------------------------------------------------------------
File Parameter List:
# List the actual filenames below (each on a separate line) SEE Note #6 above. 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85
----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
1 # !R --vanilla this is a key telling 'pgrd' how to launch this script
2
3 # ------------------------------------------------------------------------------
4 # Method Title: Determine the number of differentially expressed genes based on
5 # p-value, fold-change and/or FDR parameters:.CEL files only:
6 # Comprehensive output
7 # Method
8 # Description : This method can be used to analyze data from any two-condition
9 # microarray experiment (e.g., control and experimental or
10 # wild-type and mutant). A requirement is that there are two or mor
11 # replicates for each condition. The algorithm finds differentially
12 # expressed genes, collects annotations for these genes, performs
13 # hierarchical clustering of the gene expression profiles and renders
14 # a heatmap of the expression profiles.
15 # -------------------------------------------------------------------------------
16 # This method uses the following:
17 # Bioconductor
18 # packages : Biobase, affy, affyPLM, annaffy, AnnBuilder, annotate,gcrma
19 # simpleaffy, makecdfenv, marray, genefilter, GOstats,
20 # vsn, globaltest, geneplotter, graph, Rgraphviz, RBGL,
21 # prada, siggenes, stam, multtest
22 # For more information on Bioconductor packages:
23 # http://www.bioconductor.org/packages/bioc/stable/src/contrib/html/
24 # R
25 # contributed
26 # packages : cluster, stats, XML
27 # For more information on R contributed packages:
28 # http://cran.r-project.org/src/contrib/PACKAGES.html
29 #
30 # Biocondutor
31 # metadata
32 # packages : Annotation, CDF and Probe packages
33 # http://www.bioconductor.org/packages/data/annotation/stable/src/contrib/html/
34 # -------------------------------------------------------------------------------
35 # Author : Suarin Jani, M.S.
36 # Contact : Department of Cell Biology and Anatomy
37 # Medical University of South Carolina
38 # Charleston, SC 29425, USA
39 # Email : jani@musc.edu
40 # Tel. : (843)-792-5483
41 # -------------------------------------------------------------------------------
42
43 # Load Bioconductor Libraries from R library direcotry
44 system("scp $centerHost:/home/powerGene/lib/R_script/BioCLibs1.R .");
45 # system("cp /saurin/powerGene/PG_Libs/BioCLibs1.R .");
46 source("BioCLibs1.R");
47 system("rm BioCLibs1.R");
48
49 # Load Error Library from R library direcotry
50
51 system( |


Bottom