Data Profiling Is OK, Really It Is

Session Number: 8251
Track: Big Data & Data Warehousing
Sub-Categorization: Data Architecture
Session Type: Best Practices
Primary Presenter: Peter Scott [Director - Sandwich Analytics]
Time: Jun 24, 2019 (03:45 PM - 04:45 PM)
Room: 310, Level 3

Speaker Bio: With over two decades in business intelligence and ETL systems, Peter has a wealth of practical experience with data and its analysis. Over his career with global system integrators, boutique BI software houses, and tier-one management consultancies, Peter has worked on analytics projects for government, retail, banking, education, publishing, energy, and manufacturing. To share that knowledge more widely, he set up Sandwich Analytics to provide expert assistance to companies implementing and enhancing BI systems throughout the world.

Peter is an Oracle ACE specializing in BI, and through this program he contributes to Oracle conferences in North America, Europe, and New Zealand. He is also active on his own BI-centric blog and pops up on Twitter every now and then with topics as diverse as Data Modeling, ETL, Analytic SQL, and cats.

Technologies or Products Used: None

Session Summary for Attendees: In a world of targeted political advertising, the words "data profiling" have been hijacked to mean something less than socially desirable. However, true data profiling has a major role in the delivery of analytics and reporting projects, even Agile ones!

This talk goes back to a more functional approach to data profiling and how we can use it to build better BI systems: systems that perform better, need less rework during development, and hold fewer “surprises.” Starting from the importance of building models and discovering data relationships, the presentation expands on the need to actually look at the data to consider things such as:

• Event transition states—the way that factual statuses progress through a sequence of states over time.
• Hierarchies, parentage, ways to handle “adoption,” and the time when data is reclassified.
• Multiple data sources to one data store and how we can be sure we are talking about the same thing.
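
As an illustration of the first point, profiling event transition states can be as simple as counting which status changes actually occur in the data. The following is a minimal sketch, not taken from the talk: the function name profile_transitions, the (entity_id, status, timestamp) record layout, and the sample order data are all hypothetical.

```python
from collections import Counter, defaultdict


def profile_transitions(events):
    """Count status-to-status transitions per entity, in timestamp order.

    events: iterable of (entity_id, status, timestamp) tuples (assumed layout).
    """
    by_entity = defaultdict(list)
    for entity_id, status, ts in events:
        by_entity[entity_id].append((ts, status))

    counts = Counter()
    for history in by_entity.values():
        history.sort()  # order each entity's events by timestamp
        # Pair each event with the one that follows it for the same entity
        for (_, prev), (_, curr) in zip(history, history[1:]):
            counts[(prev, curr)] += 1
    return counts


# Hypothetical sample data, for illustration only
sample_events = [
    ("order-1", "NEW", 1),
    ("order-1", "PAID", 2),
    ("order-1", "SHIPPED", 3),
    ("order-2", "NEW", 1),
    ("order-2", "SHIPPED", 2),  # skips PAID: worth a closer look
    ("order-3", "PAID", 5),
    ("order-3", "NEW", 4),      # rows arrive out of order in the feed
]

transition_counts = profile_transitions(sample_events)
for (prev, curr), n in sorted(transition_counts.items()):
    print(f"{prev} -> {curr}: {n}")
```

Rare or unexpected pairs in the output (here, NEW going straight to SHIPPED) are the kind of "surprise" that is cheaper to find by looking at the data up front than during development.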

Throughout the talk, there will be real-world examples from recent projects using both locally hosted and cloud-hosted data stores. As we’ll discover, data profiling is not a bad thing, and used well it delivers enormous benefits to a BI project.