"No matter how great the data and system, we still need to provide the most accurate and useful insights we can for the people who use them as part of their business strategy and decisions." — Manoj Oleti, Lead Data Scientist/Technical Architect at ADP
Once you have enough high-quality data in a usable form, you can process it in different ways to detect patterns and uncover connections and relationships.
To understand more about how ADP analyzes its data to turn it into new and useful insights, we talked to Manoj Oleti, Lead Data Scientist/Technical Architect at ADP.
Q: One of my favorite cartoons is by S. Harris. It has two scientists working through a problem. There are formulas across a chalkboard. In the center it says: "Then a Miracle Occurs." The caption is: "I think you should be more explicit in Step 2."
For many of us, that's what machine learning seems like. You put in "big data," then a miracle occurs, and out comes amazing analytics and insights.
What really happens in Step 2, after the data is collected, cleaned and ready to process?
Manoj: That's great. We would like it to feel like that. But it's not a miracle. It's algorithms, data models, processing power and design.
But first, it's important to understand what insights organizations need and who within those organizations needs them. We focus on information that could be used to create business value. Then we look at what makes an insight. An insight has five traits: it is useful, easy to comprehend, timely, personalized and actionable.
Then we categorize the types of insights we want based on whether they are about a specific organization (turnover rate), the industry (average turnover by type of organization) or the comparison between an individual organization and its industry (benchmarks).
Within each of these categories, we look at different ways we can process and organize the data to learn new things. One useful approach is to rank findings within a particular category. For example, we could rank turnover rates within your industry and region and find that your turnover rate is lower than that of 75 percent of the other organizations in your industry. Another approach is to look for changes or outliers in the data. For example, we could see that your overtime has increased 18 percent over the last 12 months.
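To make the ranking and change examples concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not ADP's implementation: the peer turnover rates and overtime figures are invented, and `percent_of_peers_above` and `percent_change` are hypothetical helper names.

```python
from typing import Sequence


def percent_of_peers_above(value: float, peers: Sequence[float]) -> float:
    """Return the percentage of peer values strictly greater than `value`."""
    if not peers:
        raise ValueError("peers must not be empty")
    higher = sum(1 for p in peers if p > value)
    return 100.0 * higher / len(peers)


def percent_change(old: float, new: float) -> float:
    """Return the relative change from `old` to `new`, in percent."""
    if old == 0:
        raise ValueError("old value must be non-zero")
    return 100.0 * (new - old) / old


# Hypothetical annual turnover rates (percent) for peers in the same industry and region.
peer_turnover = [12.0, 15.5, 9.8, 18.2, 14.1, 10.2, 20.3, 16.0]
my_turnover = 10.5

# Ranking: what share of peers have higher turnover than this organization?
share = percent_of_peers_above(my_turnover, peer_turnover)
print(f"Your turnover is lower than that of {share:.0f}% of your peers")  # 75%

# Change detection: hypothetical overtime hours, 12 months apart.
change = percent_change(1250.0, 1475.0)
print(f"Overtime changed by {change:+.0f}% over 12 months")  # +18%
```

Ranking against peers is what turns a single number (10.5 percent turnover) into a comparison a reader can act on; the change calculation works the same way against the organization's own history.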
So a lot of thought and consideration about the types of insights we want goes into the overall design process before we actually build the system and process any data.
Q: Can you please explain what an algorithm and a data model are?
Manoj: Yes. An algorithm is simply a set of directions or instructions to solve a problem. It's the knowledge that you give a computer on what to do and what order to do it in.
Data models, on the other hand, are ways of organizing relationships between various types of data.
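A small Python sketch can show the two ideas side by side. The dataclasses below act as a toy data model (an employee belongs to an organization, and an organization belongs to an industry), and the functions are algorithms: ordered steps that use those relationships to produce the three kinds of insight described earlier (organization, industry and benchmark). The class names, fields and data are hypothetical, and the turnover formula is deliberately simplified to separations divided by headcount.

```python
from dataclasses import dataclass
from typing import Sequence


# --- Data model: how the pieces of data relate to each other ---
@dataclass(frozen=True)
class Organization:
    name: str
    industry: str  # each organization belongs to one industry


@dataclass(frozen=True)
class Employee:
    employee_id: int
    employer: Organization  # each employee belongs to one organization
    separated: bool         # True if the employee left during the period


# --- Algorithms: ordered instructions that use those relationships ---
def turnover_rate(org: Organization, employees: Sequence[Employee]) -> float:
    """Simplified turnover rate: separations / headcount, in percent."""
    # Step 1: follow the employee -> organization relationship.
    staff = [e for e in employees if e.employer == org]
    if not staff:
        raise ValueError(f"no employees recorded for {org.name}")
    # Step 2: count the employees who left.
    leavers = sum(1 for e in staff if e.separated)
    # Step 3: divide by headcount and express as a percentage.
    return 100.0 * leavers / len(staff)


def industry_average(industry: str, orgs: Sequence[Organization],
                     employees: Sequence[Employee]) -> float:
    """Unweighted mean of the turnover rates of organizations in one industry."""
    rates = [turnover_rate(o, employees) for o in orgs if o.industry == industry]
    if not rates:
        raise ValueError(f"no organizations recorded for {industry}")
    return sum(rates) / len(rates)


# Hypothetical example data.
acme = Organization("Acme", "Retail")
beta = Organization("Beta", "Retail")
employees = (
    [Employee(i, acme, separated=(i == 0)) for i in range(4)]        # 1 of 4 left
    + [Employee(10 + i, beta, separated=(i < 2)) for i in range(5)]  # 2 of 5 left
)

acme_rate = turnover_rate(acme, employees)                        # organization
retail_avg = industry_average("Retail", [acme, beta], employees)  # industry
benchmark = acme_rate - retail_avg                                # comparison
print(acme_rate, retail_avg, benchmark)  # 25.0 32.5 -7.5
```

Neither half is useful alone, which is the point of the house analogy that follows: the data model supplies the materials and the algorithm supplies the instructions.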
Algorithms and data models complement each other, and both are critical for building products. Say you want to build a house. Knowing how to build it is of no use without the raw materials, and having the raw materials is of no use if you don't know how to build it.
Traditionally, designing algorithms was all about coding up the rules and logic manually. With the advent of machine learning, however, we now have algorithms that can learn from data and solve various interesting problems without the programmer having to code up all the logic.
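The contrast between hand-coded and learned logic can be shown with a toy example. In the sketch below, `rule_based_flag` hard-codes a 15 percent threshold for flagging an overtime increase, while `learn_threshold` fits a one-feature decision stump (one of the simplest learning algorithms) that picks the threshold from labeled history. The threshold, function names and labels are all hypothetical, and this is not how ADP's models work.

```python
from typing import Sequence, Tuple


def rule_based_flag(overtime_change_pct: float) -> bool:
    # Traditional approach: a programmer hard-codes both the rule and its threshold.
    return overtime_change_pct > 15.0


def learn_threshold(examples: Sequence[Tuple[float, bool]]) -> float:
    """Fit a one-feature decision stump: choose the threshold t that misclassifies
    the fewest labeled examples under the rule "flag if value > t"."""
    candidates = [float("-inf")] + sorted({x for x, _ in examples})
    best_t, best_errors = candidates[0], len(examples) + 1
    for t in candidates:
        errors = sum(1 for x, label in examples if (x > t) != label)
        if errors < best_errors:  # keep the first (lowest) threshold on ties
            best_t, best_errors = t, errors
    return best_t


# Hypothetical history: (overtime change in percent, was it worth flagging?)
history = [(2.0, False), (5.5, False), (8.0, False), (9.5, False),
           (12.0, True), (18.0, True), (25.0, True)]

learned_t = learn_threshold(history)
print(learned_t)              # 9.5, learned from the labeled examples
print(rule_based_flag(12.0))  # False: the hand-coded rule misses this case
print(12.0 > learned_t)       # True: the learned rule flags it
```

The programmer still writes `learn_threshold`, but the specific cut-off comes from the data, so collecting different labeled examples changes the behavior without anyone rewriting the rule.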
Q: So how does machine learning fit into this process of turning data into insights?
Manoj: We first leverage big data to mine the universe of data points and generate a pool of candidate insights. That's only half the problem solved, though. We still have to deliver the right insights to the right people at the right time. This is the gap that machine learning fills.
Machine learning is a paradigm in which we program computers to learn from data or evidence. In traditional programming, the developer writes the set of instructions to solve a problem as code. In machine learning, by contrast, we start with an existing set of data, train an algorithm to learn from it, and store that knowledge as a "trained model." The advantage is that the model can continue to evolve and adapt as new data becomes available. As the system incorporates what it "learns" over time, we can begin to see new patterns, such as cycles and changes.
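The idea of a trained model that keeps adapting as new data arrives can be sketched with one of the simplest models there is: a least-squares trend line stored as running sums. Each call to `update` folds in one new observation, so the model changes without being rebuilt from the full history. The class name and the monthly overtime figures are hypothetical; this is a deliberately small stand-in, not a description of ADP's system.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class OnlineLinearTrend:
    """Least-squares line y = slope * x + intercept, kept as running sums so the
    trained model can absorb new observations without reprocessing old ones."""
    n: int = 0
    sum_x: float = 0.0
    sum_y: float = 0.0
    sum_xx: float = 0.0
    sum_xy: float = 0.0

    def update(self, x: float, y: float) -> None:
        # Learning step: fold one new observation into the stored state.
        self.n += 1
        self.sum_x += x
        self.sum_y += y
        self.sum_xx += x * x
        self.sum_xy += x * y

    def fit(self) -> Tuple[float, float]:
        # Closed-form ordinary least squares from the running sums.
        denom = self.n * self.sum_xx - self.sum_x ** 2
        if self.n < 2 or denom == 0:
            raise ValueError("need at least two distinct x values")
        slope = (self.n * self.sum_xy - self.sum_x * self.sum_y) / denom
        intercept = (self.sum_y - slope * self.sum_x) / self.n
        return slope, intercept

    def predict(self, x: float) -> float:
        slope, intercept = self.fit()
        return slope * x + intercept


# Hypothetical monthly overtime hours: (month index, hours).
model = OnlineLinearTrend()
for month, hours in [(1, 100.0), (2, 110.0), (3, 120.0), (4, 130.0)]:
    model.update(month, hours)
before = model.fit()  # (10.0, 90.0): about +10 hours per month

# New months arrive; the same model adapts instead of being rebuilt.
for month, hours in [(5, 160.0), (6, 180.0)]:
    model.update(month, hours)
after = model.fit()   # slope rises to 16.0 hours per month
print(before, after, model.predict(7))
```

The jump in slope after two new months is the kind of change the passage describes: the model's picture of the trend shifts as it sees more data.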
Q: Then how do you choose the insights you deliver?
Manoj: This is where we combine what's possible with our business knowledge and research on what our clients want and what they will find useful. We come up with a list of things we can do, then we evaluate and rank the possible insights to choose which ones would provide the most value to the users.
We look at broad, generic factors, such as how many people the insight covers, how much it changes over time, and how many different dimensions go into it (the fewer there are, the easier it is to track accuracy). We also look at user-centric factors, such as who is going to use the insight, what role they have, how they use metrics and analytics in that role, and what types of decisions they are responsible for.
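One way to picture this evaluation is a weighted score over a pool of candidate insights, filtered by the reader's role. The sketch below is hypothetical throughout: the interview does not give the actual factors, weights or formula. It assumes that wider coverage and larger change both raise an insight's value, that fewer dimensions count in its favor, and that role relevance acts as a filter.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence


@dataclass(frozen=True)
class CandidateInsight:
    name: str
    people_covered: int      # how many people the insight covers
    change_over_time: float  # how much the underlying metric moved, in percent
    dimensions: int          # how many dimensions combine in the insight (>= 1)
    roles: FrozenSet[str]    # roles expected to use it (hypothetical labels)


# Hypothetical weights; treating change as a positive signal is an assumption.
WEIGHTS = {"coverage": 0.4, "change": 0.4, "simplicity": 0.2}


def score(c: CandidateInsight, max_people: int, max_change: float) -> float:
    coverage = c.people_covered / max_people
    change = abs(c.change_over_time) / max_change
    simplicity = 1.0 / c.dimensions  # fewer dimensions: accuracy is easier to track
    return (WEIGHTS["coverage"] * coverage
            + WEIGHTS["change"] * change
            + WEIGHTS["simplicity"] * simplicity)


def rank_for_role(candidates: Sequence[CandidateInsight], role: str,
                  top_k: int = 2) -> List[str]:
    # User-centric filter first: keep only insights relevant to this role.
    relevant = [c for c in candidates if role in c.roles]
    if not relevant:
        return []
    # Normalize generic factors against the largest value among relevant candidates.
    max_people = max(c.people_covered for c in relevant) or 1
    max_change = max(abs(c.change_over_time) for c in relevant) or 1.0
    ranked = sorted(relevant, key=lambda c: score(c, max_people, max_change),
                    reverse=True)
    return [c.name for c in ranked[:top_k]]


pool = [
    CandidateInsight("Overtime up 18% in 12 months", 400, 18.0, 1,
                     frozenset({"HR director", "payroll manager"})),
    CandidateInsight("Turnover lower than 75% of peers", 1000, 3.0, 2,
                     frozenset({"HR director"})),
    CandidateInsight("Sales turnover by region and tenure", 150, 9.0, 3,
                     frozenset({"HR director"})),
]
print(rank_for_role(pool, "HR director"))
print(rank_for_role(pool, "payroll manager"))
```

With these made-up numbers, the HR director sees the overtime and turnover insights, while the payroll manager sees only the overtime insight. In practice the weights would come from the business knowledge and client research the answer above describes, not from fixed constants.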
No matter how great the data and system, we still need to provide the most accurate and useful insights we can for the people who use them as part of their business strategy and decisions.