Interactive Data Flow Architecture
Explore the complete iD Mobile data ecosystem with detailed component interactions and flow visualization
Files Generated: 40-50 daily
Processing Schedule: Morning batch
Transfer Method: IBM File Transfer
Destination: SFTP Server Lin51
Essential customer data pipeline supporting downstream analytics, reporting, and business intelligence processes.
File Size Growth: Files grow by one additional day of data each day
Delivery Time: Progressively later arrivals affecting downstream processes
System Dependencies: Upgrade-related delays cascade through entire pipeline
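To make the hand-off concrete, here is a minimal Python sketch of how a downstream job might collect the daily files from the SFTP landing area; the hostname, paths, and service account are placeholders, not the real Lin51 configuration.

```python
# Minimal sketch of a downstream job polling the SFTP landing directory for
# the daily CMP files. Host, credentials, and paths are placeholders, not
# the real Lin51 configuration.
import paramiko
from pathlib import Path

SFTP_HOST = "lin51.example.internal"   # hypothetical hostname
REMOTE_DIR = "/landing/cmp/daily"      # hypothetical landing path
LOCAL_DIR = Path("/tmp/cmp_staging")

def pull_daily_files() -> list[str]:
    """Download every file currently sitting in the remote landing directory."""
    LOCAL_DIR.mkdir(parents=True, exist_ok=True)
    transport = paramiko.Transport((SFTP_HOST, 22))
    transport.connect(username="svc_cmp", password="***")  # prefer key auth in practice
    sftp = paramiko.SFTPClient.from_transport(transport)
    try:
        pulled = []
        for name in sftp.listdir(REMOTE_DIR):
            sftp.get(f"{REMOTE_DIR}/{name}", str(LOCAL_DIR / name))
            pulled.append(name)
        return pulled
    finally:
        sftp.close()
        transport.close()
```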
Primary Path: API via Databricks
Secondary Path: API via ADF
Data Types: Analytics, Surveys, Reviews
Update Frequency: Multiple intervals
Multi-platform digital insights feeding comprehensive customer experience analytics and business intelligence processes.
Customer Insights: 360-degree view across digital touchpoints
Experience Optimisation: Data-driven improvements to customer journey
Business Intelligence: Real-time performance monitoring and analytics
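As a rough sketch of the primary path, the snippet below shows a Databricks job pulling review records from a vendor REST API into a raw table; the endpoint, auth, query parameters, and table name are assumptions for illustration.

```python
# Rough sketch of the primary path: a Databricks job pulling survey/review
# records from a vendor REST API into a raw table. Endpoint, auth, query
# parameters, and table name are assumptions for illustration.
import requests
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
API_URL = "https://api.vendor.example/v1/reviews"  # hypothetical endpoint

def ingest_reviews(token: str, since: str) -> None:
    resp = requests.get(
        API_URL,
        headers={"Authorization": f"Bearer {token}"},
        params={"updated_since": since},
    )
    resp.raise_for_status()
    records = resp.json().get("results", [])
    if records:
        # Append the batch to the raw layer for downstream conformance.
        spark.createDataFrame(records).write.mode("append").saveAsTable("raw.reviews")
```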
Source: Google Analytics (BigQuery)
Frequency: 4 timed intervals daily
Method: Automated synchronisation
Output: Structured data warehouse
Unified data engineering platform enabling reliable batch and streaming data pipeline management across the organisation.
Simplified Management: Unified solution for complex data engineering workflows
Automated Orchestration: Reduced manual intervention and improved reliability
Scalable Architecture: Supports both batch and streaming data processing requirements
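A minimal sketch of one timed synchronisation run from the Google Analytics BigQuery export into the structured warehouse, assuming the Spark BigQuery connector with cluster-level authentication; the project, dataset, and target table names are placeholders.

```python
# Minimal sketch of one timed synchronisation run from the Google Analytics
# BigQuery export into the structured warehouse, assuming the Spark BigQuery
# connector with cluster-level authentication. Project, dataset, and target
# table names are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

def sync_ga_export(run_date: str) -> None:
    """run_date in the GA export's YYYYMMDD table-suffix format."""
    events = (
        spark.read.format("bigquery")
        .option("table", f"my-project.analytics_123456.events_{run_date}")  # hypothetical
        .load()
    )
    (events
     .withColumn("ingest_date", F.lit(run_date))
     .write.mode("append")
     .partitionBy("ingest_date")
     .saveAsTable("raw.ga_events"))  # hypothetical target table
```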
Source: Carphone Warehouse Retail Stores
Data Types: Sales & Return Transactions
Processing: Netezza Data Warehouse
Output: Raw Layer Data Files
Legacy transaction system providing essential retail data through established on-premise infrastructure before cloud migration.
Infrastructure Risk: On-premise Netezza solution requires immediate cloud migration planning
Data Processing: Double-ingestion process for iD adds operational complexity
Modernisation Blocked: Legacy dependencies prevent scalability and advanced analytics integration
File Type: Bespoke Lookup Files
Access Method: Multi-device SharePoint
Integration: Databricks API
Processing: Automated Ingestion
Microsoft collaborative platform enabling business users to provide custom data enrichment files through familiar interface.
API Standardisation: Establish proven patterns for SharePoint-Databricks integration
Quality Framework: Implement automated validation and quality checks for user uploads
Monitoring Enhancement: Advanced error handling and process monitoring for custom integrations
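A hedged sketch of the retrieval step: downloading a bespoke lookup file from SharePoint through the Microsoft Graph files endpoint ahead of Databricks ingestion. The site identifier, file path, and token acquisition are placeholders.

```python
# Hedged sketch: download a bespoke lookup file from SharePoint through the
# Microsoft Graph files endpoint ahead of Databricks ingestion. Site ID,
# file path, and token acquisition are placeholders.
import requests

GRAPH = "https://graph.microsoft.com/v1.0"
SITE_ID = "contoso.sharepoint.com,site-guid,web-guid"  # hypothetical site ID

def fetch_lookup_file(token: str, file_path: str) -> bytes:
    """Download a file from the site's default document library."""
    url = f"{GRAPH}/sites/{SITE_ID}/drive/root:/{file_path}:/content"
    resp = requests.get(url, headers={"Authorization": f"Bearer {token}"})
    resp.raise_for_status()
    return resp.content

# The returned bytes would then be written to the landing path watched by the
# automated Databricks ingestion job.
```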
Input: CMP via IBM File Transfer
Processing: Decryption & Staging
Integration: ADF Automation
Output: Azure Blob Storage Raw
Secure intermediary server enabling encrypted file processing and automated cloud storage integration in the data pipeline.
Decommissioning Timeline: Server scheduled for retirement with no confirmed replacement solution
Pipeline Dependency: Critical staging point with no alternative path currently available
Immediate Action Required: Urgent need for alternative infrastructure solution to maintain data flow
Cloud-Native Replacement: Develop Azure-based secure file staging solution
Enhanced Security: Implement modern encryption and access control mechanisms
Scalable Architecture: Design for future growth and improved operational resilience
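One possible shape for the cloud-native replacement's staging step, sketched with python-gnupg and azure-storage-blob: decrypt an incoming PGP file and land it in raw blob storage. The keyring location, container, and connection string are placeholders.

```python
# One possible shape for the cloud-native replacement's staging step: decrypt
# an incoming PGP file and land it in raw blob storage. Keyring location,
# container, and connection string are placeholders.
import gnupg
from azure.storage.blob import BlobServiceClient

gpg = gnupg.GPG(gnupghome="/home/svc/.gnupg")  # hypothetical keyring location

def decrypt_and_stage(encrypted_path: str, blob_name: str, conn_str: str) -> None:
    decrypted_path = encrypted_path.removesuffix(".pgp")
    with open(encrypted_path, "rb") as f:
        result = gpg.decrypt_file(f, output=decrypted_path)
    if not result.ok:
        raise RuntimeError(f"Decryption failed: {result.status}")
    blob = (BlobServiceClient.from_connection_string(conn_str)
            .get_blob_client(container="raw", blob=blob_name))
    with open(decrypted_path, "rb") as f:
        blob.upload_blob(f, overwrite=True)
```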
Experience the power of conversational analytics with our intelligent data chat interface.
Pipeline Duration: 4-6 hours for redaction processes
Annual Cost Impact: Additional £200k Azure expense
Process Understanding: Limited visibility into data locking mechanisms
Platform: MLflow on Databricks
Feature Store: Databricks Feature Store Client
Inference: Automated endpoint creation
Data Source: Curated feature tables
Models utilise feature tables from the conformed layer ensuring consistent, high-quality input data for reliable predictions.
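A minimal training sketch against curated feature tables using the Databricks Feature Store client with MLflow logging; the table, feature, and label names are illustrative, not the production feature set.

```python
# Minimal training sketch against curated feature tables using the Databricks
# Feature Store client with MLflow logging. Table, feature, and label names
# are illustrative, not the production feature set.
import mlflow
from databricks.feature_store import FeatureStoreClient, FeatureLookup
from pyspark.sql import SparkSession
from sklearn.linear_model import LogisticRegression

spark = SparkSession.builder.getOrCreate()
fs = FeatureStoreClient()

lookups = [
    FeatureLookup(
        table_name="conformed.subscriber_features",      # hypothetical table
        feature_names=["last_3_bill_avg", "usage_30d"],  # hypothetical features
        lookup_key="subscription_id",
    )
]

label_df = spark.table("conformed.churn_labels")  # hypothetical labels table
training_set = fs.create_training_set(label_df, feature_lookups=lookups, label="churned")
df = training_set.load_df().toPandas()

with mlflow.start_run():
    model = LogisticRegression().fit(
        df.drop(columns=["churned", "subscription_id"]), df["churned"]
    )
    # Logging with the training set captures feature lineage, which supports
    # automated lookup when the inference endpoint is created.
    fs.log_model(model, "model", flavor=mlflow.sklearn, training_set=training_set)
```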
Production Reports: 30
Users per Month: ~40
Views per Month: ~500
Explore the full suite of production reports and analytics dashboards.
Centralising business logic in the Databricks layer through this migration would reduce complexity, improve consistency, and potentially address many of the current performance and maintenance challenges.
Design Pattern: Star Schema
Data Source: Conformed Tables
Platform: Power BI Service
Deployment: Multi-Report Support
Models incorporate custom business metrics and serve as the semantic layer between raw data and business intelligence reporting.
Platform: Databricks Native
Data Sources: Tables, Queries, Metric Views
Connectivity: Native Integration
Sharing: Cross-Platform Compatible
Dashboards leverage the Databricks ecosystem for simplified, high-performance analytics with seamless integration across notebooks, Genie AI, and SQL interfaces.
Speed & Simplicity: Native connectivity eliminates external bottlenecks
Ecosystem Integration: Unified experience across all Databricks tools
Future-Ready: Continuously evolving platform with regular improvements
Source: Metric Views (version 0.1)
Generation: Automated topic creation
Outputs: Dashboards & Datasets
Platform: Omni (Databricks)
Semantic data models that bridge Metric Views with business analytics, enabling AI-powered dashboards and self-service datasets.
Simplified Analytics: Business-friendly semantic layer reduces technical barriers to data exploration
AI-Enhanced Insights: Native AI capabilities provide intelligent recommendations and natural language interactions
Unified Experience: Single semantic model supports multiple analytics outputs for consistent business logic
Platform: Omni (Databricks)
Data Source: Omni Topics
Integration: Native Omni Platform
AI Capabilities: Enhanced Native Features
Dashboards leverage Omni Topics semantic models for AI-enhanced analytics with superior native integration and performance.
AI-Enhanced Analytics: More advanced AI capabilities than standard dashboards for intelligent insights
Native Performance: Optimised Omni platform integration for superior speed and responsiveness
Unified Semantic Layer: Consistent business logic through Omni Topics foundation
Input: Natural Language Questions
Processing: AI Interpretation & Query Generation
Output: Data Insights + Explanations
Source: Omni Topics
AI-powered self-service datasets, similar to Genie, enabling natural language data exploration through Omni Topics semantic models.
Universal Access: Every user becomes a data analyst without technical training requirements
Accelerated Insights: Instant data exploration without complex query building or BI tool expertise
Governed Self-Service: AI-powered analytics with consistent business logic through Omni Topics foundation
Similar AI Capabilities: Both powered by advanced AI for natural language data exploration
Omni Topics Foundation: Datasets built on semantic models from Omni Topics for enhanced consistency
Self-Service Focus: Designed specifically for independent data exploration with AI assistance
Input: Natural Language Questions
Processing: AI Interpretation & SQL Generation
Output: Data Insights + SQL Code
Platform: Databricks Native
Revolutionary natural language interface transforming complex data analytics into conversational interactions for universal accessibility.
Universal Access: Every user becomes a data analyst without technical training requirements
Accelerated Insights: Instant data exploration without complex query building or BI tool expertise
Transparent Analytics: Full visibility into analytical processes through code generation and explanation
Documentation Excellence: Comprehensive table and field documentation by BI team
Metric View Foundation: Robust models using advanced metric view functionality
Feedback Loop Excellence: Rapid response to user feedback for continuous improvement
User Engagement: Active participation in feedback and system training for optimal results
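For orientation, here is a hedged sketch of querying Genie programmatically. The Genie Conversation API path and payload shape reflect the public preview as we understand it and should be verified against current Databricks documentation; the workspace URL and space ID are placeholders.

```python
# Hedged sketch of querying Genie programmatically. The Conversation API path
# and payload reflect the public preview as we understand it and should be
# verified against current Databricks documentation; workspace URL and space
# ID are placeholders.
import requests

HOST = "https://adb-1234567890.azuredatabricks.net"  # hypothetical workspace
SPACE_ID = "01ef0123456789abcdef"                    # hypothetical Genie space

def ask_genie(token: str, question: str) -> dict:
    resp = requests.post(
        f"{HOST}/api/2.0/genie/spaces/{SPACE_ID}/start-conversation",
        headers={"Authorization": f"Bearer {token}"},
        json={"content": question},
    )
    resp.raise_for_status()
    # The response identifies a conversation and message; their status can be
    # polled until the generated SQL and result attachment are ready.
    return resp.json()
```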
Source: Databricks Direct
Method: Saved Functions
Execution: Designated User Base
Distribution: Internal + ADF Third-Party
Coded extract solutions providing targeted data outputs for internal analytics and external third-party business intelligence sharing.
Function Repository: Saved coded functions stored within Databricks environment
User Access Control: Designated user segments with appropriate execution permissions
External Integration: ADF-enabled third-party data sharing capabilities
Processing Power: Leverages full Databricks computational capabilities for extract generation
Targeted Solutions: Specific data outputs designed for precise business requirements
Partnership Enablement: Facilitates external data collaboration and business intelligence sharing
Operational Efficiency: Automated extraction processes reducing manual workload and processing time
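A minimal sketch of what one saved extract function might look like: parameterised, executed by designated users inside Databricks, and writing a single tailored output file. Table, columns, and paths are illustrative.

```python
# Minimal sketch of one saved extract function: parameterised, executed by
# designated users inside Databricks, writing a single tailored output file.
# Table, columns, and paths are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def run_partner_extract(snapshot_date: str, out_path: str) -> None:
    """Produce the third-party extract for a given snapshot date."""
    extract = spark.sql(f"""
        SELECT subscription_id, tariff_plan_name, device_brand
        FROM conformed.subscriber_daily            -- hypothetical table
        WHERE valid_from = DATE'{snapshot_date}'
    """)
    # A single headered CSV keeps the hand-off to ADF and third parties simple.
    extract.coalesce(1).write.mode("overwrite").option("header", True).csv(out_path)
```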
Core Structure: valid_from date, subscription, device brand
Snapshot Frequency: Daily subscriber movements
Join Strategy: Foreign keys to type 2 dimensions
Access Method: ASOF joins for temporal accuracy
Centralised logic for key subscriber movements with structured attributes including Tariff Plan Name and comprehensive usage metrics.
Feature Serving: Retrieval via REST server endpoints with large-cluster in-memory processing
Temporal Windows: Last 3 bill average, 30-day usage patterns, and historical trend analysis
Experimentation Support: Feature selection capabilities with conformed layer extension
Platform Integration: Native Databricks Feature Store with ML Runtime connectivity
Engineering Efficiency: Dramatic reduction in data engineering time through centralised feature store
Model Applications: Powers churn prediction, engagement analysis, roaming intelligence, and personalisation models
Consistency & Reuse: Standardised features across teams eliminating duplicate engineering and ensuring model reproducibility
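One way to express the ASOF-style temporal join described above is a validity-window range join against the type-2 dimension, sketched below with illustrative table and column names.

```python
# One way to express the ASOF-style temporal join: each daily snapshot row
# picks the type-2 dimension version whose validity window covers the
# snapshot date. Table and column names are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

snapshot_with_tariff = spark.sql("""
    SELECT s.valid_from       AS snapshot_date,
           s.subscription_id,
           s.device_brand,
           d.tariff_plan_name
    FROM conformed.subscriber_daily AS s        -- hypothetical fact snapshot
    JOIN conformed.dim_tariff_scd2  AS d        -- hypothetical type-2 dimension
      ON  d.subscription_id = s.subscription_id
      AND s.valid_from >= d.effective_from
      AND s.valid_from <  d.effective_to        -- open rows carry a far-future date
""")
```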
Data Source: Azure Data Lake - Clean
Processing: Independent from Reporting
Output: Tailored Business Files
Integration: CMP Upload & Attribution
Strategic layer enabling business-specific data transformations independent of reporting infrastructure, ensuring operational flexibility and reduced dependencies.
Data Ingestion: Cleaned tables from Azure Data Lake providing high-quality source data
Business Logic Application: Complex code implementing specific business rules and transformations
Output Generation: Tailored files created for specific business process requirements
System Integration: Automated upload and integration with CMP for operational use
Operational Excellence: Enables sophisticated business logic implementation separate from reporting constraints
Customer Experience: Powers personalised segmentation and targeted communication strategies
Architectural Flexibility: Independent design reduces dependencies while maintaining integration capabilities
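A minimal sketch of the pattern, assuming invented rule, table, and path names: read a cleaned table, apply a business rule, and emit a tailored file for CMP upload.

```python
# Minimal sketch of the pattern: read a cleaned table, apply a business rule,
# and emit a tailored file for CMP upload. The rule, table, and path names
# are invented for illustration.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

def build_cmp_segment_file(out_path: str) -> None:
    customers = spark.table("clean.customers")  # hypothetical cleaned table
    segment = (
        customers
        # Example rule: high-usage subscribers approaching contract end.
        .where((F.col("usage_30d_gb") > 50) & (F.col("days_to_contract_end") < 60))
        .withColumn("segment_code", F.lit("HIGH_USAGE_RENEWAL"))
        .select("customer_id", "segment_code", "contact_channel")
    )
    segment.coalesce(1).write.mode("overwrite").option("header", True).csv(out_path)
```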
Data normalisation is the process of organising data to reduce redundancy and improve data integrity. Think of it as tidying up a messy spreadsheet: splitting it into smaller, more focused tables and defining clear relationships between them.
1NF (First Normal Form): No repeating groups; each field contains only atomic values
2NF (Second Normal Form): Every non-key column is fully dependent on the entire primary key
3NF (Third Normal Form): Removes transitive dependencies between non-key columns
Target Achievement: 3NF balance between structure and performance
Our conformed layer implements normalisation principles to ensure data efficiency, consistency, and maintainability across the entire analytics platform.
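As a toy worked example, the sketch below splits a flat orders extract into focused tables along the dependency lines described above; all table and column names are invented for illustration.

```python
# Toy worked example of the normalisation steps above: a flat orders extract
# is split into focused tables with clear keys. All names are invented.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
flat = spark.table("raw.orders_flat")  # hypothetical denormalised extract

# Customer attributes depend only on customer_id, so they move to their own
# table instead of repeating on every order row (towards 2NF/3NF).
customers = (flat
             .select("customer_id", "customer_name", "customer_city")
             .dropDuplicates(["customer_id"]))

# Device brand depends on device_id, not on the order key: another table.
devices = flat.select("device_id", "device_brand").dropDuplicates(["device_id"])

# The orders table keeps only atomic facts plus foreign keys (the 3NF target).
orders = flat.select("order_id", "order_date", "customer_id", "device_id", "amount")
```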
Input Sources: Redacted SQL Server + Azure Data Lake
Processing Framework: iD Framework Transformation
Data Tools: ADF, MuleSoft Integration
Output Applications: BI, ML, Analytics, Self-Service
Central data hub enabling standardised field naming, business logic application, and consistent customer data management across all analytical applications.
Power BI Semantic Models: Standardised data foundation for enterprise reporting and dashboards
Feature Tables: ML-ready data structures supporting churn prediction, engagement analysis, and advanced analytics
Data Extracts: Targeted business outputs for operational systems and third-party integrations
Genie AI/BI: Self-service analytics platform enabling natural language data exploration
Single Source of Truth: Eliminates data inconsistencies and provides unified customer view across all business applications
Analytical Excellence: Enables sophisticated customer behaviour analysis, predictive modelling, and data-driven decision making
Operational Efficiency: Reduces data engineering overhead through standardised, reusable data structures across teams
Technical Architecture: Designed and implemented by the Analytics Engineers
Privacy & Compliance: Advanced data redaction maintaining analytical capability while ensuring regulatory compliance
Framework Innovation: Continuous enhancement of iD Framework supporting evolving business and technical requirements
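To illustrate the standardised field naming step, here is a minimal sketch that renames source columns to a conformed vocabulary; the mapping and table names are invented and are not the actual iD Framework implementation.

```python
# Minimal sketch of standardised field naming in the conformed layer: source
# columns are renamed to the shared vocabulary before business logic is
# applied. The mapping and tables are invented, not the actual iD Framework.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

STANDARD_NAMES = {              # hypothetical source-to-conformed mapping
    "cust_no": "customer_id",
    "sub_id": "subscription_id",
    "dev_brand": "device_brand",
}

src = spark.table("clean.subscriptions")  # hypothetical cleaned source
conformed = src
for old, new in STANDARD_NAMES.items():
    conformed = conformed.withColumnRenamed(old, new)

conformed.write.mode("overwrite").saveAsTable("conformed.subscriptions")
```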