Context & questions

Our client wanted to build a data platform to support multiple use cases, such as predictive maintenance, customization and price optimization. They faced many architecture challenges and had many questions:

  • How can we make the best use of our different data sources, including the cloud?
  • Should we centralize data in a structured data warehouse?
  • Should we add an additional layer on top?
  • What level of cleaning and transformation should be performed?
  • What level of involvement and autonomy should the business have in these tasks?

Our adviser

For this consultation, we picked Jean Michel C., Data Expert for Singapore Airlines.

Our answer

During the consultation, Jean Michel made the following recommendations:

1. Make a clear distinction between time-sensitive data (Kafka, Event Hub, serverless data hub) and non-time-sensitive data (batch, data dumps on cloud object storage)

2. Do not store data in a centralized data warehouse, for the following reasons:

  • It requires nearly permanent compute and a single upfront design before it can be used
  • Complex data pipelines must be built to feed a rigid schema
  • It is difficult to implement across business domains

3. Adopt a lake house approach (technical approach):

  • A data lake with query capabilities
  • A limited data warehouse that supports only “warm” data for reporting purposes
  • All “cold” data stays in the data lake and can be queried directly, with no data warehouse required
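As a rough illustration of recommendations 1 and 3, the sketch below picks an ingestion path and a query target for a dataset. The `Dataset` fields, the 30-day "warm" window and the returned labels are illustrative assumptions, not part of Jean Michel's recommendations.

```python
from dataclasses import dataclass

# Assumption: data used for reporting within the last 30 days counts as "warm".
WARM_WINDOW_DAYS = 30


@dataclass
class Dataset:
    name: str
    time_sensitive: bool          # does the use case need low-latency delivery?
    days_since_last_report: int   # how recently reporting used this data


def ingestion_path(ds: Dataset) -> str:
    """Time-sensitive data goes through streaming; everything else is batch."""
    return "streaming" if ds.time_sensitive else "batch"


def query_target(ds: Dataset) -> str:
    """Warm data is served from the warehouse; cold data is queried in the lake."""
    if ds.days_since_last_report <= WARM_WINDOW_DAYS:
        return "warehouse"
    return "lake"


telemetry = Dataset("machine_telemetry", time_sensitive=True, days_since_last_report=2)
archive = Dataset("orders_archive", time_sensitive=False, days_since_last_report=400)

for ds in (telemetry, archive):
    print(ds.name, ingestion_path(ds), query_target(ds))
```

In practice the threshold and the classification criteria would be set per use case rather than hard-coded.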

Jean Michel also explained how to avoid the following pitfalls:

1. Collecting all the available data … just in case

  • Expensive
  • Not useful, as it requires downstream management with no oversight from any data owner, since the use cases haven’t been defined

2. Letting IT decide data quality

  • Missing fields can be acceptable from a business perspective
  • Data quality is the first thing to check before making data available

3. Letting IT drive a data project without a data owner (from the business)

  • Each data project / stream needs to have a business role
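The pitfalls about data quality and data ownership could be guarded against with a publication check like the one sketched below. The `DataStream` class, its fields and the `can_publish` helper are hypothetical names used for illustration only. The assumption is that the business owner, not IT, chooses which fields are mandatory.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataStream:
    name: str
    business_owner: Optional[str] = None                     # business role for the stream
    required_fields: set[str] = field(default_factory=set)   # chosen by the owner, not IT


def can_publish(stream: DataStream, record: dict) -> bool:
    """Quality gate applied before a record is made available."""
    if not stream.business_owner:
        # No business owner: the stream should not be published at all.
        return False
    # Only the fields the business marked as required can block publication;
    # other missing fields are acceptable.
    missing = {f for f in stream.required_fields if record.get(f) is None}
    return not missing


stream = DataStream("pump_telemetry", "maintenance_lead", {"machine_id"})
print(can_publish(stream, {"machine_id": "A1", "temperature": None}))
print(can_publish(stream, {"temperature": 20}))
print(can_publish(DataStream("orphan_stream"), {"machine_id": "A1"}))
```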

Outcome

Based on our recommendations, Poclain was able to write a Request for Proposal, saving weeks of work. They also understood which providers to work with as a priority. We helped them save time and money in building their data platform.
