In non-small-cell lung cancer (NSCLC), molecular profiling of tumors has identified gene expression patterns that are associated with specific phenotypes and with prognosis. Such correlations could identify patients with early-stage disease who are at increased risk of recurrence and death after complete surgical resection and who might benefit from adjuvant therapy. Profiling may also reveal aberrant molecular pathways that could be exploited by molecularly targeted therapies. The technology for capturing molecular profiles and correlating them with clinical and biologic endpoints has evolved rapidly since microarrays were first developed a decade ago. In this review, we discuss the methods that have been used to derive prognostic gene expression signatures in NSCLC. Despite the diversity of approaches, 3 main steps are generally followed. First, the expression levels of several hundred to tens of thousands of genes are quantified by microarray or quantitative polymerase chain reaction techniques; the data are then preprocessed, normalized, and possibly filtered. Second, expression data are combined and grouped by clustering, risk score generation, or other means to generate a gene signature that correlates with a clinical outcome, usually survival. Finally, the signature is validated in datasets from independent cohorts. This review discusses the concepts and methodologies involved in these analytical steps, primarily to help readers understand reports of large-dataset gene expression studies that focus on prognostic signatures in NSCLC.
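The 3 steps can be illustrated with a minimal sketch in Python. It is not the method of any particular study: the data are synthetic, the function names (`preprocess`, `fit_signature`, `classify`) are hypothetical, and the per-gene weights are assumed values that, in a real analysis, would be estimated from survival data (for example, as Cox regression coefficients). The sketch shows only one of the approaches mentioned above, a linear risk score dichotomized at the training-cohort median, with the cutoff then applied unchanged to an independent cohort.

```python
import numpy as np


def preprocess(raw_intensity):
    """Step 1: log2-transform raw intensities (rows = samples, columns = genes)."""
    return np.log2(raw_intensity + 1.0)


def fit_signature(train_expr, weights):
    """Step 2: derive a risk score and its cutoff from the training cohort.

    `weights` are hypothetical per-gene coefficients (an assumption of this
    sketch); a real study would estimate them from outcome data.
    """
    mean = train_expr.mean(axis=0)
    std = train_expr.std(axis=0, ddof=1)
    # Risk score = weighted sum of per-gene standardized expression.
    scores = ((train_expr - mean) / std) @ weights
    return {
        "mean": mean,
        "std": std,
        "weights": weights,
        "cutoff": float(np.median(scores)),  # median split of the training cohort
    }


def classify(expr, signature):
    """Step 3: apply the locked signature to a cohort; True marks high risk."""
    z = (expr - signature["mean"]) / signature["std"]
    return (z @ signature["weights"]) > signature["cutoff"]


# Synthetic data only: 40 training and 30 validation samples, 5 genes each.
rng = np.random.default_rng(0)
train = preprocess(rng.lognormal(mean=6.0, sigma=1.0, size=(40, 5)))
valid = preprocess(rng.lognormal(mean=6.0, sigma=1.0, size=(30, 5)))
weights = np.array([0.8, -0.5, 0.3, 0.0, -0.2])  # assumed, not from any study

signature = fit_signature(train, weights)
high_train = classify(train, signature)
high_valid = classify(valid, signature)
print(high_train.sum(), "of", len(high_train), "training samples flagged high risk")
print(high_valid.sum(), "of", len(high_valid), "validation samples flagged high risk")
```

The validation cohort is standardized with the training-cohort means and standard deviations and classified with the training-cohort cutoff, so that no parameter of the signature is re-estimated on the data used to validate it. In practice, the resulting high-risk and low-risk groups would then be compared by survival analysis, which this sketch omits.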