CAMDA 2003 Conference Contest Datasets

The CAMDA’03 data sets focus on lung cancer. Four microarray data sets (Boston, Michigan, Ontario, and Stanford) were released as the CAMDA’03 data challenge package. Each was acquired independently to address the same questions in lung cancer biology. This year's challenge is to integrate information across data sets, so we encourage you to submit analysis results based on at least two of the four. Integration may not be straightforward. For example, the data sets may not share enough clones to support interpretation of the combined results. We encourage discussion of cross-platform issues and their solutions. The ultimate goal of the analysis is to make an impact on cancer biology and, eventually, patient care, so the biological relevance of your methodology is critical. We would like to see papers that analyze the biology of lung cancer in depth. In particular, we welcome the development of survival analysis methods that use microarrays for cancer prognosis (Bioinformatics 18: S120, 2002).
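As a rough illustration of the clone-overlap check mentioned above, the Python sketch below counts the identifiers shared by each pair of data sets and by all of them. It assumes that each platform's clones or probe sets have already been mapped to a common identifier such as a gene symbol, which is itself part of the cross-platform problem. The function names and the toy identifiers are hypothetical and are not drawn from the CAMDA’03 files.

```python
from __future__ import annotations

from itertools import combinations


def shared_genes(platforms: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """Return the identifiers shared by each pair of data sets.

    `platforms` maps a data set name to the set of gene identifiers
    measured on it. Assumption: clones/probe sets were already mapped
    to a common identifier; that mapping step is not shown here.
    """
    overlaps: dict[tuple[str, str], set[str]] = {}
    # Sort names so each unordered pair appears exactly once, in a stable order.
    for a, b in combinations(sorted(platforms), 2):
        overlaps[(a, b)] = platforms[a] & platforms[b]
    return overlaps


def common_to_all(platforms: dict[str, set[str]]) -> set[str]:
    """Return the identifiers present in every data set (empty if none given)."""
    sets = list(platforms.values())
    return set.intersection(*sets) if sets else set()


if __name__ == "__main__":
    # Toy identifiers for illustration only; NOT taken from the CAMDA'03 files.
    toy = {
        "boston": {"TP53", "KRAS", "EGFR", "MYC"},
        "michigan": {"TP53", "KRAS", "EGFR"},
        "stanford": {"TP53", "EGFR", "CDKN2A"},
    }
    for (a, b), genes in shared_genes(toy).items():
        print(f"{a} & {b}: {len(genes)} shared")
    print(f"shared by all: {len(common_to_all(toy))}")
```

A small pairwise overlap, or a tiny set common to all platforms, is an early signal that a joint analysis may rest on too few genes to interpret.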

***Please note that some of the raw image files are not available in the initial release. We will notify everyone who downloaded this partial data set as soon as we acquire the raw data.

The four data sets are CAMDA’03 Boston (listed below as Harvard), Michigan, Ontario, and Stanford.

Data sets and their publications (PDF format):

Harvard Lung Cancer Dataset
Bhattacharjee, A, et al. Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. PNAS 98 (24): 13790-13795, November 2001.
Michigan Lung Cancer Dataset
Beer, D, et al. Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nature Medicine 8 (8): 816-824, August 2002.
Stanford Lung Cancer Dataset
Garber, ME, et al. Diversity of gene expression in adenocarcinoma of the lung. PNAS 98 (24): 13784-13789, November 2001.
Ontario Lung Cancer Dataset
Wigle, D, et al. Molecular profiling of non-small cell lung cancer and correlation with disease-free survival. Cancer Research 62 (11): 3005-3008, June 2002.

Downloadable data (zip format): ftp.camda.duke.edu/CAMDA03_DATASETS/

Dataset   Publication  Processed data         Images       CEL     DAT      Platform
Harvard   25 MB        33 MB                  X            729 MB  NA       Affymetrix
Michigan  1 MB         4 MB                   X            217 MB  3355 MB  Affymetrix
Stanford  9 MB         186 MB                 coming soon  X       X        cDNA array
Ontario   1 MB         1 MB text, 1 MB Excel  1558 MB      X       X        cDNA array

Note: only 9 concurrent downloads are allowed at any one time. Web browsers tend to open multiple connections when using FTP, so we strongly encourage you to download one file at a time with a command-line FTP client during off-peak hours (between 6 PM and 8 AM, or anytime on weekends).
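The download advice above can be sketched in Python with the standard-library ftplib module: one connection, one file at a time, started only in the off-peak window. This is a minimal sketch, not a tool provided by CAMDA. It assumes anonymous FTP login, assumes the time passed to is_off_peak is in the server's local time zone, and treats 8:00 AM as the end of the off-peak window. The function names and the filenames argument are hypothetical; the archive names inside CAMDA03_DATASETS are not listed on this page.

```python
from __future__ import annotations

import ftplib
import os
from datetime import datetime


def is_off_peak(when: datetime) -> bool:
    """True on weekends, or on weekdays from 6 PM up to (not including) 8 AM.

    Assumes `when` is already expressed in the FTP server's local time zone.
    """
    if when.weekday() >= 5:  # Monday is 0, so 5 and 6 are Saturday and Sunday
        return True
    return when.hour >= 18 or when.hour < 8


def download_sequentially(host: str, directory: str, filenames: list[str], dest: str = ".") -> None:
    """Fetch each file in turn over ONE connection, never in parallel."""
    with ftplib.FTP(host) as ftp:
        ftp.login()  # anonymous login (an assumption; not stated on the original page)
        ftp.cwd(directory)
        for name in filenames:
            # Binary mode, since the archives are zip files.
            with open(os.path.join(dest, name), "wb") as fh:
                ftp.retrbinary(f"RETR {name}", fh.write)


if __name__ == "__main__":
    # datetime.now() uses the client's clock; adjust if it differs from the server's.
    now = datetime.now()
    print("off-peak" if is_off_peak(now) else "peak hours: consider waiting")
```

For example, `download_sequentially("ftp.camda.duke.edu", "CAMDA03_DATASETS", names)` would fetch each archive in `names` in turn over a single connection. The server dates from 2003 and may no longer be reachable.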

CAMDA 2003 Data Set Policies:

Because CAMDA is a competition, we are unable to answer questions about the content or format of the data files. Part of the technical challenge of the competition is decoding the raw data itself, in addition to processing it. To facilitate this, we have set up a public mailing list for discussing the data sets at http://groups.yahoo.com/group/camdadata/. To be fair to all contestants, we will not reply to individuals. However, contestants are encouraged to share ideas and questions on the public mailing list. We hope that the list will become a forum for discussing the technical aspects of the data sets and will help move the analysis forward. We will answer questions on the mailing list at our own discretion.

North Carolina Biotechnology Center, GlaxoSmithKline, Duke Center for Bioinformatics and Computational Biology, The Scientist

Last modified on 01/28/2004 10:08:08

© Duke Comprehensive Cancer Center