Poclain case study: Construction of a Data Platform
- Sector: Data
- Target: CIO
- About: Data Platform
Context & questions
Our client wanted to build a data platform to deliver on multiple use cases such as predictive maintenance, customization, and price optimization. They faced many architecture challenges and had several questions:
- How can we make the best use of the different data sources we have, including the cloud?
- Should we centralize data in a structured data warehouse?
- Should we add an additional layer on top?
- What level of cleaning and transformation should be performed?
- What level of involvement and autonomy should the business have in these tasks?
Our adviser
For this consultation, we picked Jean Michel C., Data Expert for Singapore Airlines.
Our answer
During the consultation, Jean Michel made the following recommendations:
1. Make a clear distinction between time-sensitive data (Kafka, Event Hub, Serverless Data Hub) and non-time-sensitive data (batch, data dumps on cloud object storage)
2. Do not store data in a centralized data warehouse, for the following reasons:
- It requires nearly permanent compute and a single, fixed design before the data can be used
- Complex data pipelines have to be built to feed a rigid schema
- It is difficult to implement across business domains (métiers)
3. Adopt a Lake House approach (technical approach):
- Data Lake with query capabilities
- A limited data warehouse that holds only “warm” data for reporting purposes
- All “cold” data stays in the data lake and can be queried directly, with no data warehouse required
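Recommendations 1 and 3 can be read as two independent decisions per data source: how the data arrives (streaming or batch) and where it is served from (lake only, or lake plus a small warehouse). The sketch below illustrates that reading in Python. The class names, the two boolean flags, and the example sources are hypothetical and were not part of the consultation.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Ingestion(Enum):
    STREAMING = "streaming"  # e.g. Kafka or Event Hub
    BATCH = "batch"          # e.g. periodic dump to cloud object storage


class Storage(Enum):
    LAKE_ONLY = "lake_only"                    # "cold" data, queried in place
    LAKE_AND_WAREHOUSE = "lake_and_warehouse"  # "warm" data, also served for reporting


@dataclass(frozen=True)
class Source:
    name: str
    time_sensitive: bool      # assumption: a use case needs the data shortly after it is produced
    used_for_reporting: bool  # assumption: "warm" means read regularly by reports


def plan(source: Source) -> tuple[Ingestion, Storage]:
    """Pick an ingestion path and a storage tier for one source."""
    ingestion = Ingestion.STREAMING if source.time_sensitive else Ingestion.BATCH
    storage = (
        Storage.LAKE_AND_WAREHOUSE if source.used_for_reporting else Storage.LAKE_ONLY
    )
    return ingestion, storage


# Hypothetical sources, for illustration only.
sources = [
    Source("machine_telemetry", time_sensitive=True, used_for_reporting=False),
    Source("monthly_price_list", time_sensitive=False, used_for_reporting=True),
]
for s in sources:
    ingestion, storage = plan(s)
    print(f"{s.name}: {ingestion.value}, {storage.value}")
```

Keeping the two decisions separate means a source can be streamed without ever entering the warehouse, which matches the advice to keep the warehouse limited to reporting data.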
Jean Michel also explained how to avoid the following pitfalls:
1. Collect all the available data … just in case
• Expensive
• Not useful: the data requires downstream management, yet no data owner oversees it because the use cases have not been defined
2. Let IT decide data quality
• Missing fields can be acceptable from a business perspective
• Data quality is the first thing to check before making data available
3. Let IT drive a data project without a data owner (from the business)
• Each data project or stream needs to have a business role
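One way to act on the last two pitfalls is a publication gate: a stream is made available only if it has a named business owner, and quality is judged against the fields that owner declares mandatory, not against every field IT can see. The following Python sketch assumes this design. `DataStream`, `can_publish`, and the example field names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataStream:
    name: str
    business_owner: Optional[str]  # the named business role; None means IT-only
    # Fields the business owner (not IT) has declared mandatory.
    required_fields: set[str] = field(default_factory=set)


def can_publish(stream: DataStream, records: list[dict]) -> tuple[bool, str]:
    """Return (decision, reason) for making the records available."""
    if not stream.business_owner:
        return False, "no business owner assigned"
    for i, record in enumerate(records):
        # Only owner-required fields count; other missing fields are tolerated.
        missing = {f for f in stream.required_fields if record.get(f) is None}
        if missing:
            return False, f"record {i} is missing required fields: {sorted(missing)}"
    return True, "ok"


# Hypothetical example: "operator" is empty, but the owner does not require it.
records = [{"machine_id": "M1", "pressure": 210.0, "operator": None}]
owned = DataStream("telemetry", "maintenance lead", {"machine_id", "pressure"})
orphan = DataStream("telemetry_copy", None, {"machine_id"})

print(can_publish(owned, records))
print(can_publish(orphan, records))
```

The first call accepts the record despite the empty `operator` field, because the business owner did not list it as required. The second call is rejected before any quality check, because the stream has no owner.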