■ “I cannot master the data” because “XX is not enough”
As a professional in solving practical problems using data analysis, I conduct business training, practical support, lectures, etc. I also teach at business schools and universities.
Many people have studied data analysis and statistics. Nevertheless, I keep hearing the following concerns:
“Somehow I am not comfortable with the results”
“I have data, but I do not feel that I have mastered it”
“The results are not what I expected”
“I cannot use what I actually learned”
These problems are not necessarily caused by misuse of hard skills such as “analytical methods”, “data science”, or “statistics”. Rather, the soft skills of the people using them are insufficient.
■ Then, what is practical “data analysis”?
Can you clearly and specifically answer the questions “What do you need for data analysis?” and “What do you need to do for data analysis?” What is the output of “data analysis”, and what can you do with that output to achieve your goal?
In fact, many people start playing around with data without making those answers clear. In such cases, you may not be able to draw any useful information out of the data.
We often hear the term “data science”.
Hearing it, you may think: “As long as you process the data appropriately, you can get valuable information out of it automatically.”
Many people have that impression.
Let’s try to sort out the categories and scope of “data analysis” (see figure below).
The figure above shows a simplified representation of the world covered by the word “data analysis” in general. However, please be aware that what I mean by “the person in charge using data (analysis) in practice” is only one part of it (the “Data Analysis” category in the figure).
In general, the technical category handled by a data analysis expert (data scientist) is the top layer in the above figure. To become an expert in this top category (Data Science), you need knowledge and understanding of academic mathematics, statistics, programming, and the latest technology.
However, it is quite unusual for an ordinary company to keep a full-time in-house data analysis specialist. This is because the work in this area can be outsourced when needed or left to a “machine”.
On the other hand, huge amounts of data can be gathered easily through the internet, and in the overwhelming majority of cases a “non-data scientist” wants to use it quickly for an immediate goal. This is depicted in the middle (Data Analysis) and lower (Data Arrangement/Processing) categories.
Note that in reality there is a huge gap between the top layer and the middle and lower layers (categories).
A business person who is not an expert in data analysis will not accomplish anything merely by having sophisticated analytical tools, methods, and statistical theory.
There is also a clear and important reason to separate the middle and lower categories.
From small startups to very large enterprises, many companies have the problem of “We have lots of data, but our analysis does not produce sufficient results”. Those companies stop at the lower category and never reach the middle one, without noticing the fact.
It is difficult to obtain “useful and convincing” analytical results within the lower category alone. To use data analysis results for your business objective, the goal should be the middle category.
What is absolutely necessary in any case is to identify the category in which you will use the data, according to the ultimate goal you want to achieve (Do you want to apply the latest technologies? Resolve a problem with data analysis? Or simply visualize trends in the data?), BEFORE you start collecting or processing any data!
In this article, we will cover the middle and lower categories in the chart. In other words, the latest technology trends and programming for professional data scientists are a totally different story from this article.
The common issue is that many organizations stop their data utilization at the lower category and have not reached the middle category (Data Analysis). If you can expand the scope of your data utilization to the middle category, you may get the useful results your team or organization needs.
What you need in order to get there is neither “statistical theory” nor “advanced analysis methods and tools” nor “the latest programming technology”.
Whether your data collection and processing methods are basic or built on state-of-the-art technology, human skills (soft skills; “Data Analysis” in the following chart) are required to decide the following:
· What kind of data should be used
· How to interpret and utilize the output/result
As mentioned above, people and organizations who are not familiar with data are not merely short of these soft skills; they lack them completely.
■ Some misunderstandings about practical data analysis
Some people might think, “I want to acquire data analysis skills.”
“If I learn even more analysis methods, I can obtain additional useful information from the data I already have.”
But after struggling with the data for some time, you may come to understand that this idea is just an “illusion”.
Why is it an “illusion”?
There are several reasons behind this, but here I will describe the most obvious one, which is also the easiest trap to fall into (see the figure below).
Before taking any action using data, you should ask the fundamental question: “How comprehensively, and in how much detail, does the data in your hands represent the reality of the issue?”
Examples of data available to any company include “sales results” and “customer satisfaction scores”. Some data can be broken down by product, customer attribute, region, time, and so on.
But no matter how far you break the data down, you do not get information like “Why are your sales higher on Friday than on Wednesday?” or “Why is the score in AreaA lower than that in AreaB?”
You need to return to the fact that the data shows only a part of reality. Furthermore, the information that analysts can derive from that data is itself only a part of the overall information the data holds.
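As a rough illustration of this point, here is a minimal Python sketch. The records, the column layout, and the helper name `breakdown` are all hypothetical and invented for this example; they do not come from any real company.

```python
from collections import defaultdict

# Hypothetical sales records: (weekday, region, amount).
# All values are invented for illustration only.
SALES = [
    ("Wed", "AreaA", 100),
    ("Wed", "AreaB", 120),
    ("Fri", "AreaA", 150),
    ("Fri", "AreaB", 180),
]


def breakdown(records, key_index):
    """Sum the amount (third field) grouped by the field at key_index."""
    totals = defaultdict(int)
    for record in records:
        totals[record[key_index]] += record[2]
    return dict(totals)


by_weekday = breakdown(SALES, 0)  # break down by weekday
by_region = breakdown(SALES, 1)   # break down by region

print(by_weekday)
print(by_region)
```

The breakdown tells you that Friday sells more than Wednesday in this made-up data, but no field in the records says why. Slicing the same records in more ways only rearranges the same numbers.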
From time to time, I use the following expression in my lectures:
“There is no answer in the data”
Under the illusion that “the answer I want must be in the data”, I’ve seen many cases in which people struggle with the data endlessly and end up with no practical results. In this way, data analysis does not go well in practice.
So how can you resolve the issue?
Do not search for an answer. Rather, make your own answer and verify it with data!
To do so, you need to begin by defining your issue and goal concretely and developing the necessary logic as a hypothesis.
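One possible way to picture “make your own answer and verify it with data” is the Python sketch below. It assumes a hypothetical hypothesis (“sales are higher on days when a promotion runs”), invented daily records, and a hypothetical threshold `min_lift` that you would fix before looking at the data. It is a simple comparison of averages, not a formal statistical test.

```python
from statistics import mean

# Hypothetical daily records: (weekday, promotion_ran, sales).
# All values are invented for illustration only.
DAILY = [
    ("Mon", False, 100),
    ("Tue", False, 110),
    ("Wed", False, 90),
    ("Thu", True, 150),
    ("Fri", True, 170),
    ("Sat", False, 120),
]


def check_hypothesis(records, min_lift=0.2):
    """Compare average sales on promotion days with non-promotion days.

    Returns (lift, supported), where lift is the relative difference
    between the two averages and supported is True when the lift
    reaches the threshold chosen in advance (min_lift, an assumption).
    """
    promo = [sales for _, ran, sales in records if ran]
    no_promo = [sales for _, ran, sales in records if not ran]
    lift = mean(promo) / mean(no_promo) - 1
    return lift, lift >= min_lift


lift, supported = check_hypothesis(DAILY)
print(f"lift = {lift:.1%}, hypothesis supported: {supported}")
```

The point of the sketch is the order of the steps: the hypothesis and the threshold come first, and the data is used only to check them. A result that agrees with the hypothesis supports it but does not prove the cause.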
(To be continued)