The modernisation of the data warehouse
Lead Data Scientist
Associate Data Engineer
In the past decade, data warehouses have been the mainstay of the analysis of business data: administrative, marketing and sales departments have based their budgets, planning and reporting on these centralised data platforms fed by all the information systems.
With the advent of cloud-native data analysis solutions offered by Google, Microsoft and AWS, more and more companies are embarking on the path of modernising their data warehouses. A path towards what all the analysts are calling Augmented Business Intelligence.
We discuss this phenomenon with three experts from our xTech centre of excellence on cloud technologies and data management.
Marco Pesarini, Partner at xTech, explains what it means to modernise your data warehouse and why our customers are considering these solutions with interest.
"Modernising your data warehouse means bringing your data analysis platform to cloud-native functions, fully managed and offered in pay-per-use mode A household analogy would be what we did with email servers, which were replaced by cloud services like Office 365 or Gmail. No more hardware or licences, which often weigh down the IT budget; no more operating costs or disruptions arising from your data centre; data, reports and analyses are always available through the network and with a cost model that is based purely on consumption.
Google BigQuery, Azure Synapse and AWS RedShift are the turnkey cloud solutions for building a data platform, without worrying about the need for a complicated IT architecture to support it.
Many of our customers are proceeding to modernise their data warehouses for cost reasons: our benchmarks report savings of up to 50% in the TCO compared to traditional solutions. The lion's share of this reduction is attributed to savings in terms of software licences that are replaced by leaner pay-per-use models that can improve the migration project's payback time to below 18 months.
But the benefits of modernisation go far beyond savings: customers who start exploring the topic realise how a modernised data warehouse solution allows analysis that isn't possible with traditional data warehouses.
Cloud solutions enable much greater real-time management of reporting, thereby surpassing the "delayed" data warehouse model, where data is often 24 hours old and many reports are updated on a weekly or monthly basis. Cloud-native solutions adopt big data uploading tools which offer a near real-time supply from operational systems to the data warehouse, providing – to give just one example – fresh accounting data for the day.
Cloud-native data warehouses are also more open to managing unstructured data originating from outside the company. For example, think of data from IoT sources and social media, for a company that wants to enrich its marketing analyses with sentiments from the network or with data collected from connected objects in the field. This data often originates in the cloud and comes in highly significant volumes and variety, so managing it directly on the cloud helps drastically reduce transfer costs and enables flexible resource management: I create space for data management only when I need it, without having to make pointless investments overtime.
Last but not least, cloud data warehouse solutions natively integrate the most advanced machine learning and artificial intelligence features on the market from Google, Microsoft and Amazon. These features can be inserted directly into the company's reporting to transform the old data warehouse reports into advanced analysis, deduction and forecasting tools. These are the most suitable tools for coping with the uncertain period we're facing; a period when our experience and many of our practices may no longer be adequate: a period be tackled with Augmented Business Intelligence.
All this is offered by the cloud, accessible with just a few clicks from our PCs, tablets or smartphones, from wherever we are. This is why we're talking about an epochal transformation."
Eros Frigerio, Manager and Solution Architect at xTech, explains the main challenges our customers face in modernising their data warehouses and the best approach to follow.
"As my colleague Marco explained, choosing to modernise your data warehouse in the cloud brings about numerous benefits, but at the same time it entails a series of challenges to tackle when designing and implementing the solution. The main challenges are:
Meeting security and compliance requirements: in the world of the public cloud, you have to pay close attention to the security and compliance requirements (e.g. the GDPR). We can't resort to exaggerated, blanket approaches –anonymise everything – but we need to work on the various use cases, to find solutions that provide the best compromise between operational agility and compliance.
The need to adopt a cloud-native approach: data models and data ingestion and transformation pipelines must be optimised for the cloud. Specialised skills are needed to adopt the most suitable components and obtain the maximum benefit from the cloud's native services and functions (serverless, autoscaling, etc.). Approaching the cloud with overly traditional solutions is not only unhelpful, but could also be counterproductive. For example, let's take the theme of data modelling, where a classic approach guided by standardisation to reduce storage space must give way to new approaches more oriented towards denormalisation – less SQL and more Excel, if you'll allow me the metaphor – which can better enable the management of unstructured data.
The need to adopt governance solutions: the configuration and implementation of cloud services within the corporate data design requires an accompanying data governance framework that lets you control the data along its articulated transformation path, to ensure the expected technical and business quality levels. Data governance has become a fundamental component of modernisation.
To prevent any problems, it's therefore essential for the project team to have the correct mix of specialist skills in cloud-native functions, data governance and cloud security, and to know how to understand, evaluate and direct a structured and mature approach to the modernisation programme.
To ensure and support the evolution of the cornerstones of the new data warehouse, our approach consists of three key phases which examine four areas of intervention: the choice of cloud access strategy, the design of the reference architecture, data modelling, and roll-out of analytics solutions.
The first phase is the assessment, which analyses the current situation to evaluate technical and economic feasibility, organisational maturity and data availability to identify the best strategic direction. A fundamental step of the assessment, which often helps customers understand the benefits and challenges associated with the modernised data warehouse, is the proof of concept. This experimental exercise aims to migrate a specific range of data, one specific report, to the cloud to see firsthand what modernisation means. A proof of concept can be achieved in a few weeks and is often a great help in raising awareness among all the project stakeholders.
Next comes the conceptualisation phase of foundation, aimed at creating the main founding pillars of the new data warehouse, especially in terms of architecture, data model and implementation guidelines for data flows and analytics.
The third and final phase is implementation: execute, measure and improve. This involves effectively activating and configuring the cloud architecture components, developing the new data model, the ingestion and transformation pipelines, and implementing the analytical and reporting use cases. This all takes place within an agile progress that foresees phases of continuous improvement."
Luca Natali, Associate Data Engineer at xTech, explains which main technologies enable the development of a modernised data warehouse platform.
"The main technologies for developing so-called modernised data warehouses are the cloud-native services developed by providers to produce a platform, offering complete management from raw data to visualisation dashboards. These services can be grouped into five basic categories:
Raw Data Layer: object-storage services, capable of accommodating structured, semi-structured and unstructured data. They are usually used as landing areas for files from other cloud environments or from systems at the customer's data centre. Examples include: Amazon S3, Google Cloud Storage, Azure Blob Storage.
Data Integration: managed and scalable processing tools which can perform distributed data processing by dynamically adapting the resources necessary for computation. On top of these tools, it is often useful to include additional services for scheduling activities, event-based processing (such as serverless event-based functions), data anonymisation, data cataloguing, etc. Examples of these tools include: Amazon Glue, Azure Data Factory, Google Dataflow.
Data Warehousing: this is the main component of the chain, where the data is saved in the data model in table format, after the transformation and quality assessment process. Examples include: Amazon Redshift, Google BigQuery, Azure Synapse Analytics
Visualisation/Reporting: these services are for developing reports and creating data visualisation dashboards. These tools are very often integrated with other sources to combine data from multiple different sources, thereby enabling new types of analysis.
Examples include: Amazon QuickSight, Microsoft Power BI, Google Data Studio.
One aspect that makes these cloud-native tools very attractive for creating new data warehouses is their ease of integration with the company's data sources. With just a few configuration steps, we have the possibility not only to integrate traditional business systems (like historical ERP or CRM), but also to connect more modern and cloud-based business information systems, such as Salesforce or ServiceNow.
Finally, it's worth stressing how even for customers who have already brought their data warehouse to the cloud, but without changing the technology (i.e. by adopting a lift & shift IaaS approach), modernisation can offer further benefits on top of those already obtained with the first migration of the data warehouse. These include greater flexibility and agility, greater platform reliability and a reduction in daily operational tasks, thanks to the fully managed service (management of backups, software upgrades, etc.). These benefits concern not only the data warehouse per se, but also all the tools related to data processing, ETL pipeline execution and scheduling. For this reason, as my colleagues have said, in order to derive maximum benefit from the modernised solution, the entire data management chain – from the ingestion phase to reporting – should be designed and implemented with extensive use of cloud-native tools."