Phase 1: Dataset Acquisition and Quality Control — Days 1-3
Download single-cell RNA sequencing dataset GSE159677 (human atherosclerotic plaque samples) from the GEO database. Verify data integrity and completeness, confirming that both count matrices and metadata are present. Load data into the R environment using the Seurat package (v4.0+). Calculate quality metrics per cell, including number of unique molecular identifiers (UMIs), number of detected genes, and mitochondrial and ribosomal gene percentages. Perform initial quality control filtering: retain cells with 200-6000 detected genes, <20% mitochondrial gene expression, and <10% ribosomal gene expression, and retain genes expressed in at least 10 cells. Remove potential doublets using the DoubletFinder algorithm with an expected doublet rate of 7.5% for 10x Genomics data.
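The cell-level thresholds above can be sketched in a few lines. The plan itself runs in R with Seurat; the Python below is only a language-neutral illustration of the logic. The names CellQC, passes_qc, and expected_doublets are hypothetical, the 200-6000 gene range is assumed to be inclusive, and the expected doublet count is approximated as rate times cell count with no homotypic-doublet adjustment.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    """Per-cell QC metrics (hypothetical container, not a Seurat object)."""
    barcode: str
    n_genes: int      # number of detected genes
    pct_mito: float   # percent of counts from mitochondrial genes
    pct_ribo: float   # percent of counts from ribosomal genes


def passes_qc(cell, min_genes=200, max_genes=6000, max_mito=20.0, max_ribo=10.0):
    """Apply the cell-level thresholds stated in Phase 1 (range assumed inclusive)."""
    return (
        min_genes <= cell.n_genes <= max_genes
        and cell.pct_mito < max_mito
        and cell.pct_ribo < max_ribo
    )


def expected_doublets(n_cells, rate=0.075):
    """Expected doublet count at the assumed 7.5% rate (no homotypic adjustment)."""
    return round(n_cells * rate)


# Toy cells, invented for illustration only.
cells = [
    CellQC("cell_a", 1500, 5.0, 4.0),    # within all thresholds
    CellQC("cell_b", 150, 3.0, 2.0),     # too few detected genes
    CellQC("cell_c", 3000, 25.0, 5.0),   # mitochondrial percentage too high
    CellQC("cell_d", 7000, 2.0, 1.0),    # too many detected genes
]
kept = [c.barcode for c in cells if passes_qc(c)]
print(kept)
print(expected_doublets(10000))
```

Keeping the thresholds as named arguments makes it easy to record the exact values used alongside the results.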
Phase 2: Normalization, Dimensionality Reduction and Cell Clustering — Days 4-7
Normalize gene expression data using the SCTransform method to account for technical variation while preserving biological variation. Identify highly variable genes (n=3000) and perform principal component analysis (PCA). Use an elbow plot to confirm the number of principal components to retain; the first 30 PCs are planned for downstream analysis. Construct a shared nearest neighbor (SNN) graph using the FindNeighbors function with k=20 nearest neighbors. Perform Leiden clustering, testing resolution values across the 0.1-2.0 range to identify distinct cell populations. Generate a UMAP embedding for visualization using 30 PCs with min.dist=0.3 and n.neighbors=30.
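An elbow plot is usually read by eye, but the same idea can be approximated numerically. The sketch below is one possible heuristic, not the plan's prescribed method: it converts per-PC standard deviations into percent of variance (relative to the PCs computed, not the total variance) and reports the first PC after which the drop to the next PC falls below a cutoff. The function names, the 0.5 percentage-point cutoff, and the toy standard deviations are all assumptions for illustration.

```python
def percent_variance(stdevs):
    """Percent of variance per PC, relative to the PCs supplied."""
    variances = [s * s for s in stdevs]
    total = sum(variances)
    return [100.0 * v / total for v in variances]


def elbow_pc(stdevs, min_drop=0.5):
    """Return the 1-based index of the first PC whose drop in percent
    variance to the next PC is below min_drop (assumed cutoff).
    Falls back to the last PC if the curve never flattens."""
    pct = percent_variance(stdevs)
    for i in range(len(pct) - 1):
        if pct[i] - pct[i + 1] < min_drop:
            return i + 1
    return len(pct)


# Toy standard deviations, invented for illustration only.
toy_stdevs = [10, 6, 4, 3, 2.9, 2.85]
print(elbow_pc(toy_stdevs))
```

A numeric estimate like this is best treated as a cross-check on the plot before fixing the number of PCs.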
Phase 3: Cell Type Annotation and Marker Identification — Days 8-12
Identify differentially expressed genes for each cluster using the FindAllMarkers function with the Wilcoxon rank-sum test (min.pct=0.25, logfc.threshold=0.5). Annotate cell types based on expression of canonical markers: CD68, CD14 for macrophages; PECAM1 (CD31), VWF for endothelial cells; ACTA2, MYH11 for smooth muscle cells; CD3D, CD8A for T cells; CD79A, MS4A1 for B cells; S100B, GFAP for other cell types. Use the SingleR package with the Human Primary Cell Atlas reference to validate annotations automatically. Manually refine annotations by examining expression of known cell-type-specific genes and consulting the atherosclerosis literature. Generate dot plots and violin plots to visualize marker gene expression across clusters.
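The two prefilter parameters above decide which genes are tested at all. The following sketch shows the idea under simplifying assumptions: a gene is kept if it is detected in at least min_pct of cells in either group and its absolute log2 fold change of mean expression (pseudocount of 1) reaches logfc_threshold. The function names and toy values are hypothetical, and Seurat's exact fold-change computation differs in detail.

```python
import math


def detection_rate(values):
    """Fraction of cells with non-zero expression."""
    return sum(v > 0 for v in values) / len(values)


def log2_fold_change(in_cluster, rest):
    """Simplified log2 fold change of means with a pseudocount of 1."""
    mean_in = sum(in_cluster) / len(in_cluster)
    mean_rest = sum(rest) / len(rest)
    return math.log2((mean_in + 1) / (mean_rest + 1))


def passes_prefilter(in_cluster, rest, min_pct=0.25, logfc_threshold=0.5):
    """Keep a gene if it is detected often enough in either group
    and differs enough in mean expression between the groups."""
    detected = max(detection_rate(in_cluster), detection_rate(rest)) >= min_pct
    changed = abs(log2_fold_change(in_cluster, rest)) >= logfc_threshold
    return detected and changed


# Toy expression values for one gene, invented for illustration only.
print(passes_prefilter([4, 0, 6, 2], [0, 0, 1, 0]))        # detected and changed
print(passes_prefilter([1, 0, 0, 0, 0], [0, 0, 0, 0, 1]))  # detected in too few cells
```

Genes that pass this kind of prefilter would then go on to the Wilcoxon rank-sum test.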
Phase 4: C1Q Complex Gene Expression Analysis — Days 13-16
Focus analysis on the complement C1Q complex components: C1QA, C1QB, and C1QC. Generate feature plots showing expression distribution across the UMAP embedding and calculate the percentage of cells expressing each gene per cluster. Perform differential expression analysis comparing C1Q+ vs C1Q- cells within each cell type using the MAST algorithm, accounting for cellular detection rate. Among the differentially expressed genes, identify C1Q-related genes by correlation analysis (Pearson r > 0.3 with any C1Q subunit) and characterize them by functional enrichment analysis using the clusterProfiler package. Focus on complement pathway genes, inflammatory markers, and atherosclerosis-associated genes. Create heatmaps showing C1Q complex expression across all identified cell types.
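Two of the calculations above, the per-cluster percentage of expressing cells and the Pearson correlation with a C1Q subunit, can be sketched as follows. The plan does not define C1Q+ status, so the rule used here (any of the three subunits detected) is an assumption, as are the function names and the toy cells.

```python
import math
from collections import defaultdict

C1Q_GENES = ("C1QA", "C1QB", "C1QC")


def is_c1q_positive(counts):
    """Assumed rule: a cell is C1Q+ if any C1Q subunit is detected."""
    return any(counts.get(g, 0) > 0 for g in C1Q_GENES)


def pct_expressing(cells, gene):
    """Percent of cells per cluster with non-zero counts for a gene."""
    total = defaultdict(int)
    expressing = defaultdict(int)
    for cluster, counts in cells:
        total[cluster] += 1
        if counts.get(gene, 0) > 0:
            expressing[cluster] += 1
    return {c: 100.0 * expressing[c] / total[c] for c in total}


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


# Toy cells as (cluster, counts) pairs, invented for illustration only.
cells = [
    ("Macrophage", {"C1QA": 5, "C1QB": 3, "C1QC": 4}),
    ("Macrophage", {"C1QA": 2, "C1QB": 0, "C1QC": 1}),
    ("Macrophage", {"C1QA": 0, "C1QB": 0, "C1QC": 0}),
    ("T cell", {"C1QA": 0, "C1QB": 0, "C1QC": 0}),
    ("T cell", {"C1QA": 0, "C1QB": 1, "C1QC": 0}),
]
print(pct_expressing(cells, "C1QA"))
print([is_c1q_positive(counts) for _, counts in cells])
```

A gene would count as C1Q-related under the plan's criterion if pearson returns a value above 0.3 against any of the three subunits.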
Phase 5: Functional Enrichment and Pathway Analysis — Days 17-19
Perform Gene Ontology (GO) enrichment analysis and KEGG pathway analysis on C1Q-related differentially expressed genes using enrichGO and enrichKEGG functions. Focus on biological processes related to complement activation, immune response, and vascular pathology. Conduct Gene Set Enrichment Analysis (GSEA) using fgsea package with MSigDB hallmark gene sets and complement-specific gene sets. Generate pathway interaction networks using Cytoscape to visualize relationships between C1Q complex and other complement components. Calculate module scores for complement activation signatures across all cells using AddModuleScore function.
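Over-representation analysis of the kind run by enrichGO and enrichKEGG rests on a hypergeometric test: given a background of genes, how surprising is the overlap between the query list and a gene set? The sketch below computes that upper-tail probability directly. The function name and the toy numbers are hypothetical, and multiple-testing correction across gene sets is not shown.

```python
from math import comb


def hypergeom_enrichment_p(overlap, query_size, set_size, background_size):
    """P(X >= overlap) when drawing query_size genes without replacement
    from a background containing set_size genes of the set of interest."""
    max_overlap = min(query_size, set_size)
    total = comb(background_size, query_size)
    tail = sum(
        comb(set_size, i) * comb(background_size - set_size, query_size - i)
        for i in range(overlap, max_overlap + 1)
    )
    return tail / total


# Toy numbers, invented for illustration only: 20 background genes,
# a 5-gene set, a 4-gene query list, and 2 genes in common.
print(hypergeom_enrichment_p(2, 4, 5, 20))
```

The choice of background matters: restricting it to genes actually detected in the dataset generally gives a more conservative result than using all annotated genes.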
Phase 6: Statistical Validation and Visualization — Days 20-21
Validate cell type annotations using multiple approaches: examine expression of 3-5 marker genes per cell type, calculate silhouette scores for cluster quality, and perform bootstrap resampling to assess clustering stability. Generate comprehensive visualization package including: UMAP plots colored by cell type and C1Q expression, dot plots of top marker genes, violin plots of C1Q complex genes by cell type, and heatmaps of C1Q-correlated genes. Perform statistical testing using appropriate methods (Wilcoxon rank-sum test for pairwise comparisons, Kruskal-Wallis for multiple groups) with Benjamini-Hochberg FDR correction. Document all analysis parameters and generate reproducible R scripts with session information.
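The Benjamini-Hochberg correction named above can be written out in a few lines, which helps when checking adjusted p-values by hand. This is a minimal sketch with a hypothetical function name and toy p-values; in the R pipeline the equivalent step would normally be p.adjust with method "BH".

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p-values, invented for illustration only.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
print(adj)
```

Recording both raw and adjusted values keeps the reported results traceable to the underlying tests.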