FAQ is wrong about "the microarray paper": No publication describes actual microarray data to be analyzed
Earl F. Glynn, 06/Dec/2005

I summarized this and other information about the CAMDA '06 datasets here:

This answer from the existing FAQ is not correct:

Question: Why are there only 177 subjects
that have microarray data while there are
227 subjects with clinical data.

Answer: Details of the microarray data
acquisition can be found in the microarray
paper in the publications zip file.

After much confusion trying to connect the microarray descriptions in the CAMDA '06 publications to the CAMDA '06 microarray data, I exchanged several E-mails with Suzanne Vernon (sdv), Centers for Disease Control and Prevention:

12/6/2005 E-mail:
efg: "If I'm understanding you correctly (and from what I've observed in the papers), there is no "the microarray paper," and none of the CAMDA papers describe details of the microarray data provided on the CAMDA site."

sdv: "Correct, there is currently no microarray paper from this study/dataset. We are excited and hopeful that CAMDA comes up with some interesting papers/results."

efg comment: The CAMDA '06 publications ZIP contains PDFs of a number of papers that provide background information about Chronic Fatigue Syndrome, "but there are none (yet) that have come out with the microarray, genetic, proteomic data" (sdv, 12/4/2005). Some of these papers do describe various clinical health assessment surveys.

sdv, 12/1/2005:

"The paper [Whistler, et al, 'Integration of gene expression, ...'] is only provided as background information on the illness and to demonstrate that gene expression profiling of the peripheral blood has utility in describing the heterogeneity of this illness."

"The microarrays we used were from a company called MWG Biotech, the 40K microarray which consists of 2 glass slides (A and B), each with 20,000 features. We provided only gene expression data for the A microarray only. The spreadsheet of csv file have microarrays/samples in columns and features/genes in rows. The data is raw (so [you] can transform, normalize, ...) so there does not seem to be any real good reason to go back to the actual images."

"There are many challenges to this particular dataset; including determining ways to integrate the various types of data to derive new insight into the biology, and new computational approaches for dealing with signal/noise issues in various data types."

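As a rough illustration of the layout described above (microarrays/samples in columns, features/genes in rows, raw intensities), here is a minimal Python sketch that reads such a CSV and applies a log2 transform. The inline sample data, the header names, the assumption that the first column holds the feature ID, and the helper names read_expression and log2_transform are all hypothetical; the real CAMDA files may be laid out differently.

```python
import csv
import io
import math

# Tiny stand-in for the real CSV; header names and values are made up.
SAMPLE_CSV = """feature,ABTID_001,ABTID_002,ABTID_003
probe_1,120.0,98.5,143.2
probe_2,15.3,22.1,18.7
probe_3,1024.0,870.4,990.9
"""


def read_expression(handle):
    """Read a features-in-rows, samples-in-columns CSV.

    Returns (sample_ids, {feature_id: [intensity, ...]}).
    Assumes the first column holds the feature ID (an assumption).
    """
    reader = csv.reader(handle)
    header = next(reader)
    sample_ids = header[1:]
    data = {}
    for row in reader:
        if not row:  # skip blank lines
            continue
        data[row[0]] = [float(v) for v in row[1:]]
    return sample_ids, data


def log2_transform(data, floor=1.0):
    """Log2-transform raw intensities, clamping values below `floor`."""
    return {
        feature: [math.log2(max(v, floor)) for v in values]
        for feature, values in data.items()
    }


sample_ids, raw = read_expression(io.StringIO(SAMPLE_CSV))
logged = log2_transform(raw)
print(len(raw), "features x", len(sample_ids), "samples")
print(logged["probe_3"][0])  # 10.0, since log2(1024) == 10
```

To try this on a real file, replace io.StringIO(SAMPLE_CSV) with an open file handle. The log2 transform is only one common choice for raw intensities, not something the e-mails prescribe.
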
sdv, 12/4/2005:

"Each data table has the subjects identified with [an] ABTID. There were 227 people/subjects eligible for the study. Of these 227 people, only 177 samples from these people gave satisfactory microarray results."

"This is the first study where [we] used the MWG 40K, which was the most recent and comprehensive array. We only used the A slide from this microarray set as the B slide failed all our qc measures. Each subject's blood sample, as identified by ABTID, is hybridized to a microarray slide. This is similar to the Affymetrix approach, one chip:one subject. The detection was not with fluorescence, as Affy uses, but with a resonance light scattering, resulting in a single intensity readout. The microarray slides are imaged with a CCD camera - versus a laser - to generate the tiff images you have on your website. To read more about this approach, see The image is acquired and intensities quantified using ArrayVision (can also find a link to this software at this site). Once the intensities are captured, we use the csv to import into various analysis tools."

"this data is about as new and fresh to us as it is to you"

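Since every data table identifies subjects by ABTID, one way to see which subjects have clinical data but no usable microarray is a simple set comparison. The sketch below uses made-up IDs and a hypothetical helper name, subjects_without_arrays; how the ABTIDs are actually extracted from the clinical table and the microarray column headers is left open.

```python
def subjects_without_arrays(clinical_ids, array_ids):
    """Compare two collections of ABTIDs.

    Returns (missing, unexpected):
      missing    - IDs with clinical data but no microarray
      unexpected - IDs with a microarray but no clinical record
    """
    clinical = set(clinical_ids)
    arrays = set(array_ids)
    missing = sorted(clinical - arrays)
    unexpected = sorted(arrays - clinical)
    return missing, unexpected


# Made-up IDs for illustration only.
clinical_ids = ["A01", "A02", "A03", "A04", "A05"]
array_ids = ["A01", "A03", "A04"]

missing, unexpected = subjects_without_arrays(clinical_ids, array_ids)
print(missing)     # ['A02', 'A05']
print(unexpected)  # []
```

If each of the 177 arrays maps to a distinct ABTID among the 227 eligible subjects, the first list would hold 50 IDs for the real data. A non-empty second list would point to an ID mismatch worth investigating.
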
efg comment:
Files at MWG Biotech (both the Excel file and the TXT file) contain the expected number of probes, 20,160, as found in the CFS microarray datasets, along with information identifying the probes.

