Understanding SAM Clustering Results
Overview
SAM provides comprehensive clustering outputs designed to support both technical analysis and business decision-making. This guide explains how to interpret all 6 quality metrics (Silhouette, Davies-Bouldin, Calinski-Harabasz, Cluster Imbalance, Cluster Separation, Cluster Cohesion) and use them effectively for strategic planning.
Primary Outputs
1. Cluster Assignments (CSV Export)
Professional CSV output with cluster labels, quality metrics, and business indicators for strategic analysis
Standardized Multi-Column Format:
Record_ID | Cluster_Label | Silhouette_Score | Distance_to_Center | Business_Metrics | Feature_Values | Quality_Indicators
Key Features:
- Cluster Labels: Each record assigned to its optimal cluster
- Quality Scores: Individual point silhouette scores for validation
- Business Metrics: Revenue, profit, and operational indicators per record
- Feature Values: Original and transformed feature values
- Distance Metrics: Proximity to cluster centers and boundaries
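As a minimal sketch of working with the export, the snippet below groups records by cluster and flags records whose silhouette score is negative. It assumes a comma-delimited file whose header row uses the column names listed above; the sample rows and the `summarize_clusters` helper are hypothetical, and the actual export may differ.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows; only four of the documented columns are shown.
SAMPLE_CSV = """Record_ID,Cluster_Label,Silhouette_Score,Distance_to_Center
S001,0,0.72,1.10
S002,0,0.64,1.45
S003,1,0.31,2.80
S004,1,-0.05,3.90
S005,1,0.48,2.10
"""

def summarize_clusters(csv_text):
    """Return {cluster_label: {"size", "mean_silhouette", "misfit_ids"}}."""
    rows_by_cluster = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows_by_cluster[row["Cluster_Label"]].append(row)

    summary = {}
    for label, rows in rows_by_cluster.items():
        scores = [float(r["Silhouette_Score"]) for r in rows]
        summary[label] = {
            "size": len(rows),
            "mean_silhouette": round(mean(scores), 3),
            # A negative per-record silhouette suggests the record sits
            # closer to a neighboring cluster than to its own.
            "misfit_ids": [r["Record_ID"] for r in rows
                           if float(r["Silhouette_Score"]) < 0],
        }
    return summary

summary = summarize_clusters(SAMPLE_CSV)
for label, stats in sorted(summary.items()):
    print(label, stats)
```

To read a real export, the same function could be fed the file contents, for example `summarize_clusters(open(path, newline="").read())`, where `path` is wherever the CSV was saved.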
2. Visual Analytics (Interactive Charts)
Chart Components:
- Cluster Separation Plots: 2D/3D visualization of cluster boundaries
- Silhouette Analysis: Individual point quality assessment
- Feature Importance: Key distinguishing factors visualization
- Business Dashboards: Performance metrics per cluster
- Quality Heatmaps: Cluster separation and cohesion visualization
3. Executive Summary (PDF Report)
Complete executive PDF report with cluster performance, visual analytics, business insights, and strategic recommendations
Multi-Page Professional Report:
- Title Page: Project overview and generation date
- Cluster Summary: Model rankings and recommendations
- Visual Analytics: All charts included with captions
- Business Insights: Key findings and strategic implications
- Technical Glossary: Metric definitions and interpretations
4. Advanced Visualization Suite
Task 4.1: Foundational Visualizations
- Geospatial Distribution Map: Geographic clustering patterns
- Performance Quadrant: Revenue vs margin scatter plots
- Persona DNA Radar Chart: Comparative cluster profiles
- Cluster Summary Table: High-level performance metrics
Task 4.2: Deep-Dive Analytics
- Assortment Strategy Heatmap: Product mix analysis by cluster
- Geographic Dominance Matrix: Regional cluster distribution
- Trend vs Density Analysis: Competitive dynamics visualization
- Strategic Role Mapping: Business segment classification
Task 4.3: Final Report Generation
- Professional PDF: Multi-page executive report
- Chart Integration: All visualizations embedded
- Business Narratives: AI-generated insights and recommendations
- Action Plans: Specific strategic recommendations per cluster
Understanding Quality Metrics
Primary Quality Indicators
Silhouette Score
What it measures: How well each point fits in its assigned cluster
- Range: -1 to 1 (higher is better)
- Excellent: > 0.7 (Clear cluster separation)
- Good: 0.5-0.7 (Reasonable separation)
- Fair: 0.2-0.5 (Weak separation)
- Poor: < 0.2 (No clear separation)
Business Interpretation:
Silhouette Score = 0.65 means:
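• On average, points sit much closer to their own cluster than to the nearest neighboring cluster (the score is a mean over all points, not a percentage of points)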
• Clear business segments are identifiable
• Suitable for strategic decision-making
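To make the definition concrete, here is a dependency-free sketch of the standard silhouette calculation with Euclidean distance. For each point, a is the mean distance to the other members of its own cluster and b is the mean distance to the nearest other cluster; the point's score is (b - a) / max(a, b), and the overall score is the mean across points. The function name and toy data are illustrative only and are not taken from SAM's implementation.

```python
import math

def silhouette_values(points, labels):
    """Per-point silhouette s = (b - a) / max(a, b), Euclidean distance."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    values = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            values.append(0.0)  # common convention: singleton clusters score 0
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values

# Toy data: two tight pairs of points, far apart from each other.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
labels = [0, 0, 1, 1]
values = silhouette_values(points, labels)
overall = sum(values) / len(values)
print([round(v, 3) for v in values], round(overall, 3))
```

With scikit-learn installed, `sklearn.metrics.silhouette_samples` and `sklearn.metrics.silhouette_score` provide the per-point and averaged values respectively.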
Davies-Bouldin Index
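What it measures: Average ratio of within-cluster scatter to the distance between cluster centers, taken for each cluster against its most similar neighbor (minimum 0; lower is better)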
- Excellent: < 0.5 (Very compact, well-separated clusters)
- Good: 0.5-1.0 (Reasonable compactness)
- Fair: 1.0-2.0 (Moderate quality)
- Poor: > 2.0 (Poor cluster quality)
Business Interpretation:
Davies-Bouldin = 0.8 means:
• Clusters are reasonably compact and well-separated
• Business segments are distinct and actionable
• Good foundation for strategic planning
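The sketch below follows the usual textbook definition of the Davies-Bouldin index: for each cluster, take the worst-case ratio of combined scatter to centroid distance against any other cluster, then average those worst cases. Scatter is taken here as the mean Euclidean distance of members to their centroid. The helper names and toy data are illustrative assumptions, not SAM's internal code.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))

def davies_bouldin(points, labels):
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("Davies-Bouldin needs at least two clusters")

    centers = {label: centroid(members) for label, members in clusters.items()}
    # Scatter: mean distance from each member to its cluster centroid.
    scatter = {
        label: sum(math.dist(p, centers[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }

    worst_ratios = []
    for i in clusters:
        # Assumes no two centroids coincide (that would divide by zero).
        worst_ratios.append(max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in clusters if j != i
        ))
    return sum(worst_ratios) / len(worst_ratios)

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
labels = [0, 0, 1, 1]
db = davies_bouldin(points, labels)
print(round(db, 3))
```

For this toy data each cluster has scatter 0.5 and the centroids are 5 apart, so the index works out to (0.5 + 0.5) / 5 = 0.2, which falls in the Excellent band above. scikit-learn exposes the same metric as `sklearn.metrics.davies_bouldin_score`.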
Calinski-Harabasz Score
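What it measures: Ratio of between-cluster to within-cluster variance (higher is better). The score has no fixed upper bound and tends to grow with dataset size, so treat the bands below as rough guides and compare scores only across runs on the same data.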
- Excellent: > 2000 (Strong cluster separation)
- Good: 1000-2000 (Reasonable separation)
- Fair: 500-1000 (Moderate separation)
- Poor: < 500 (Weak separation)
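A small sketch of the standard Calinski-Harabasz formula, CH = [B / (k - 1)] / [W / (n - k)], where B is the between-cluster sum of squares, W the within-cluster sum of squares, k the number of clusters, and n the number of points. It also shows that the score depends on dataset size: duplicating every point leaves the geometry unchanged but raises the score. The function name and toy data are illustrative, not SAM's implementation.

```python
import math

def calinski_harabasz(points, labels):
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    n, k = len(points), len(clusters)
    if k < 2 or n <= k:
        raise ValueError("need at least two clusters and more points than clusters")

    def centroid(pts):
        return tuple(sum(axis) / len(pts) for axis in zip(*pts))

    overall = centroid(points)
    between = 0.0  # B: size-weighted squared distance of centroids to the global mean
    within = 0.0   # W: squared distance of points to their own centroid
    for members in clusters.values():
        center = centroid(members)
        between += len(members) * math.dist(center, overall) ** 2
        within += sum(math.dist(p, center) ** 2 for p in members)
    if within == 0:
        raise ValueError("within-cluster variance is zero; score is undefined")
    return (between / (k - 1)) / (within / (n - k))

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
labels = [0, 0, 1, 1]
small = calinski_harabasz(points, labels)
# Same geometry, every point duplicated: the score rises with n.
large = calinski_harabasz(points * 2, labels * 2)
print(round(small, 1), round(large, 1))
```

Here the four-point dataset scores 50 and the duplicated eight-point dataset scores 150, even though the clusters are identical in shape. scikit-learn provides this metric as `sklearn.metrics.calinski_harabasz_score`.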
Simplified Quality Ratings
Cluster Quality Assessment
Our AI automatically grades cluster performance:
- Excellent (Silhouette > 0.7): High confidence for strategic decisions
- Good (Silhouette 0.5-0.7): Reliable for operational planning
- Fair (Silhouette 0.2-0.5): Useful for directional guidance
- Poor (Silhouette < 0.2): Consider additional data or different approach
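The grading bands above can be expressed as a simple lookup. This is a sketch of the published thresholds only; how SAM treats values that fall exactly on a boundary is not stated, so the choice here (0.7 and 0.5 grade as Good, 0.2 as Fair) is an assumption, as is the function name.

```python
def quality_grade(silhouette):
    """Map an overall silhouette score to the four documented grades."""
    if not -1.0 <= silhouette <= 1.0:
        raise ValueError("silhouette scores lie between -1 and 1")
    if silhouette > 0.7:
        return "Excellent"
    if silhouette >= 0.5:   # assumption: boundary values take the lower grade
        return "Good"
    if silhouette >= 0.2:
        return "Fair"
    return "Poor"

for score in (0.73, 0.58, 0.35, 0.10):
    print(score, quality_grade(score))
```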
Confidence Levels
Risk assessment for cluster reliability:
- High: Clear separation, consistent patterns, strong model fit
- Medium: Moderate uncertainty, acceptable for most planning
- Low: High variability, use with caution, consider alternative approaches
Business Intelligence Metrics
Cluster Profiling and Analysis
Cluster Size Distribution
What it measures: Balance and interpretability of cluster sizes
- Balanced: Similar-sized clusters (ideal for business segments)
- Skewed: One dominant cluster (may indicate natural business hierarchy)
- Fragmented: Many small clusters (may need consolidation)
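The three patterns can be detected from the cluster labels alone. The guide does not give numeric cut-offs, so the thresholds in this sketch are purely hypothetical: a distribution is called Skewed when one cluster holds more than half of all records, and Fragmented when more than half of the clusters each hold under 5% of records. Both are exposed as parameters so they can be tuned.

```python
from collections import Counter

def size_profile(labels, dominant_share=0.5, small_share=0.05):
    """Return (pattern, shares) for a list of cluster labels.

    dominant_share and small_share are hypothetical cut-offs,
    not values published for SAM.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no labels supplied")
    shares = {label: count / total for label, count in counts.items()}
    small = [label for label, share in shares.items() if share < small_share]

    if max(shares.values()) > dominant_share:
        pattern = "Skewed"
    elif len(small) > len(shares) / 2:
        pattern = "Fragmented"
    else:
        pattern = "Balanced"
    return pattern, shares

def expand(sizes):
    """Build a label list from cluster sizes, e.g. [2, 1] -> [0, 0, 1]."""
    return [label for label, size in enumerate(sizes) for _ in range(size)]

print(size_profile(expand([150, 200, 250]))[0])
print(size_profile(expand([700, 100, 100, 100]))[0])
print(size_profile(expand([30, 30] + [4] * 10))[0])
```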
Business Performance Metrics
Compare key business indicators across clusters:
Cluster 1: High Performers
• Size: 150 stores (25%)
• Avg Revenue: $2.1M
• Avg Margin: 18.5%
• Growth Rate: +12%
Cluster 2: Growth Opportunities
• Size: 200 stores (33%)
• Avg Revenue: $1.4M
• Avg Margin: 12.3%
• Growth Rate: +8%
Feature Importance Analysis
Identify which variables most distinguish clusters:
- Revenue Drivers: Key factors driving high performance
- Risk Indicators: Variables associated with underperformance
- Growth Factors: Characteristics of high-growth clusters
- Operational Metrics: Efficiency and productivity indicators
Strategic Segmentation Analysis
Business Segment Classification
Our AI automatically classifies clusters into business segments:
High Performers (Revenue > $2M, Margin > 15%):
- Strategy: Expansion & Replication
- Priority: HIGH - Study and replicate success factors
- Actions: Scale successful practices, invest in growth
Growth Opportunities (Revenue < $1.5M, Margin < 12%):
- Strategy: Support & Optimization
- Priority: HIGH - Requires immediate attention
- Actions: Performance improvement, targeted interventions
New Ventures (Age < 1 year):
- Strategy: Growth Support
- Priority: MEDIUM - Monitor maturation progress
- Actions: Development support, patience for growth
Geographic Clusters (Regional concentration):
- Strategy: Regional Strategy
- Priority: MEDIUM - Regional optimization
- Actions: Local market strategies, regional resources
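The published thresholds can be read as a rule-based classifier, sketched below. Several details are assumptions rather than documented behavior: the rules are checked in the order listed above, "regional concentration" is approximated by a hypothetical `top_region_share` input with an arbitrary 0.8 cut-off, and clusters matching no rule are returned as "Unclassified". The function and parameter names are illustrative.

```python
def classify_segment(avg_revenue, avg_margin_pct,
                     avg_age_years=None, top_region_share=None,
                     region_threshold=0.8):
    """Apply the documented segment thresholds in listed order.

    top_region_share and region_threshold are hypothetical stand-ins
    for "regional concentration", which the guide does not quantify.
    """
    if avg_revenue > 2_000_000 and avg_margin_pct > 15:
        return "High Performers"
    if avg_revenue < 1_500_000 and avg_margin_pct < 12:
        return "Growth Opportunities"
    if avg_age_years is not None and avg_age_years < 1:
        return "New Ventures"
    if top_region_share is not None and top_region_share >= region_threshold:
        return "Geographic Clusters"
    return "Unclassified"

print(classify_segment(2_100_000, 18.5))
print(classify_segment(1_200_000, 10.0))
print(classify_segment(1_800_000, 13.0, avg_age_years=0.5))
print(classify_segment(1_800_000, 13.0, avg_age_years=4, top_region_share=0.9))
```

Under a strict reading of these thresholds, a cluster with $1.4M average revenue and a 12.3% margin matches none of the rules, so the production classifier presumably applies logic beyond the thresholds published here.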
Advanced Quality Metrics
Reliability and Confidence
Model Reliability Score (0-100)
Calculation: Quality-adjusted confidence measure
- 90-100: Extremely reliable, suitable for critical decisions
- 70-89: Good reliability, appropriate for most planning
- 50-69: Moderate reliability, use with additional validation
- < 50: Low reliability, consider alternative approaches
Cluster Stability Score
What it measures: Consistency of cluster assignments across multiple runs
- High Stability: Consistent cluster assignments
- Low Stability: Variable assignments, higher uncertainty
- Business Impact: Planning confidence and risk assessment
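One common way to quantify agreement between two clustering runs is the Rand index: the fraction of record pairs that both runs treat the same way (together in both, or apart in both). The guide does not say which measure SAM uses for its stability score, so this is a sketch of one reasonable option rather than the product's method. The index ignores label names, so a run that finds the same groups under different numbers still scores 1.0.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of record pairs on which two clusterings agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both runs must label the same records")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two records")
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

run_1 = [0, 0, 1, 1, 2, 2]
run_2 = [1, 1, 0, 0, 2, 2]   # same grouping, different label numbers
run_3 = [0, 0, 1, 2, 2, 2]   # one record moved to another cluster

print(rand_index(run_1, run_2))
print(rand_index(run_1, run_3))
```

For larger datasets the chance-corrected variant, available in scikit-learn as `sklearn.metrics.adjusted_rand_score`, is usually preferred because the plain Rand index tends toward high values even for unrelated clusterings.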
Separation Coefficient
Technical Measure: Average distance between cluster centers / average cluster radius
Business Interpretation:
- > 2.0: Very clear separation between business segments
- 1.5-2.0: Good separation, actionable segments
- < 1.5: Overlapping segments, consider consolidation
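The ratio can be sketched directly from its definition. The guide does not define "cluster radius", so this example assumes it means the mean Euclidean distance from a cluster's members to its centroid; other definitions (such as the maximum distance) would give different values. Names and toy data are illustrative.

```python
import math
from itertools import combinations

def separation_coefficient(points, labels):
    """Average centroid-to-centroid distance divided by average cluster radius."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    centers = {
        label: tuple(sum(axis) / len(members) for axis in zip(*members))
        for label, members in clusters.items()
    }
    # Assumed radius: mean distance from members to their centroid.
    radii = [
        sum(math.dist(p, centers[label]) for p in members) / len(members)
        for label, members in clusters.items()
    ]
    center_distances = [
        math.dist(centers[a], centers[b]) for a, b in combinations(centers, 2)
    ]
    avg_radius = sum(radii) / len(radii)
    if avg_radius == 0:
        raise ValueError("all clusters have zero radius; ratio is undefined")
    return (sum(center_distances) / len(center_distances)) / avg_radius

# Two clusters on a line: centroids at x=1 and x=4, each with radius 1.
points = [(0.0, 0.0), (2.0, 0.0), (3.0, 0.0), (5.0, 0.0)]
labels = [0, 0, 1, 1]
coefficient = separation_coefficient(points, labels)
print(coefficient)
```

In this toy case the centroids are 3 apart and each radius is 1, giving a coefficient of 3.0, even though the nearest members of the two clusters are only 1 apart. The coefficient summarizes averages, so it is worth checking alongside per-point measures such as the silhouette.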
Data Quality Indicators
Cluster Cohesion
Scale: 0-1, where higher values indicate tighter clusters
- > 0.8: Very cohesive business segments
- 0.6-0.8: Good cohesion, clear segment identity
- < 0.6: Loose segments, may need refinement
Cluster Separation
Scale: 0-1, where higher values indicate better separation
- > 0.7: Clear business segment boundaries
- 0.5-0.7: Good separation, actionable segments
- < 0.5: Overlapping segments, consider alternative approaches
Model Performance Comparison
Model Rankings Table
Our executive summary includes a comprehensive comparison:
| Model | Quality Grade | Silhouette | Reliability Score | Best Use Case |
|---|---|---|---|---|
| HDBSCAN | Excellent | 0.73 | 94 | Strategic Segmentation |
| K-Means | Good | 0.58 | 87 | Operational Clustering |
| GMM | Excellent | 0.71 | 96 | Risk Assessment |
Recommendation Engine
Best Model Selection: Our AI recommends the optimal model based on:
- Quality Performance: Silhouette score and separation metrics
- Business Context: Interpretability and actionability requirements
- Data Characteristics: Shape, size, and complexity factors
- Computational Efficiency: Processing time and resource requirements
Risk Assessment Framework
High Confidence Scenarios (Use clusters directly)
- Quality Grade: Excellent
- Silhouette Score > 0.7
- Reliability Score > 90
- Clear business interpretation
Medium Confidence Scenarios (Use with validation)
- Quality Grade: Good
- Silhouette Score 0.5-0.7
- Consider business validation
- Develop contingency plans
Low Confidence Scenarios (Directional guidance only)
- Quality Grade: Fair/Poor
- Silhouette Score < 0.5
- Focus on general patterns
- Frequent re-clustering recommended
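The three scenarios can be combined into a single check, sketched below. The framework does not say what happens when the silhouette is above 0.7 but the reliability score is 90 or lower (or unavailable); this sketch assumes such cases drop to Medium. The function name and the returned guidance strings are illustrative.

```python
GUIDANCE = {
    "High": "Use clusters directly",
    "Medium": "Use with validation",
    "Low": "Directional guidance only",
}

def confidence_scenario(silhouette, reliability=None):
    """Return (level, guidance) from silhouette and optional reliability score."""
    if silhouette > 0.7 and reliability is not None and reliability > 90:
        level = "High"
    elif silhouette >= 0.5:
        # Assumption: an excellent silhouette without a reliability
        # score above 90 is treated as Medium.
        level = "Medium"
    else:
        level = "Low"
    return level, GUIDANCE[level]

# Silhouette and reliability values from the Model Rankings Table.
for model, sil, rel in [("HDBSCAN", 0.73, 94), ("K-Means", 0.58, 87), ("GMM", 0.71, 96)]:
    print(model, confidence_scenario(sil, rel))
```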
AI-Generated Insights
Executive Summaries
What you get: Business-focused analysis for each cluster including:
- Performance assessment in business terms
- Key characteristics and distinguishing factors
- Comparison to other clusters
- Strategic implications
Example:
"Cluster 1 represents high-performing stores (18% of total) with average revenue of $2.1M and 18.5% margins. These stores are primarily located in urban markets with high customer density. Key success factors include strong inventory management and experienced staff. Strategic recommendation: Replicate these practices in Cluster 2 stores to drive overall performance improvement."
Actionable Recommendations
Categories:
- Performance Optimization: Improve underperforming clusters
- Growth Strategy: Scale successful cluster practices
- Resource Allocation: Distribute resources based on cluster potential
- Risk Management: Address cluster-specific challenges
Interpreting Cluster Visualizations
Visual Elements
- Cluster Colors: Each cluster has a distinct color for easy identification
- Point Sizes: May indicate business importance (revenue, profit, etc.)
- Boundaries: Show cluster separation and overlap areas
- Centers: Highlight cluster centroids and characteristics
Pattern Recognition
- Cluster Density: Tight vs loose clusters indicate segment cohesion
- Separation: Clear boundaries vs overlap indicate business segment clarity
- Outliers: Points far from cluster centers may need special attention
- Hierarchies: Nested clusters may indicate business sub-segments
Business Insights
- Segment Identification: Clear business segments for targeted strategies
- Performance Patterns: Visual correlation between location and performance
- Growth Opportunities: Underperforming areas with growth potential
- Risk Assessment: Clusters with high variability or outlier concentration
Common Pitfalls to Avoid
1. Over-Interpreting Low Quality Clusters
- Problem: Making major decisions on clusters with silhouette < 0.3
- Solution: Use for directional guidance only
2. Ignoring Business Context
- Problem: Accepting clusters that don't make business sense
- Solution: Validate AI insights against business knowledge
3. Misinterpreting Cluster Sizes
- Problem: Assuming equal cluster sizes are always better
- Solution: Consider natural business hierarchies and market realities
4. Not Validating Against Business Metrics
- Problem: Accepting clusters misaligned with business performance
- Solution: Validate cluster assignments against known business outcomes