Tag Archives: analytics

Part 1: What does it mean “to utilize data analysis in practice”? What do you need?

■ “I cannot master the data” because “XX is not enough”
As a professional who solves practical problems using data analysis, I provide business training, hands-on support, and lectures. I also teach at business schools and universities.

Many people have already studied data analysis and statistics. Nevertheless, I constantly hear concerns such as the following:
“Somehow I am not comfortable with the results.”
“I have data, but I do not feel that I have mastered it.”
“The results are not what I expected.”
“I cannot use what I actually learned.”

These problems are not necessarily caused by misuse of hard skills such as “analytical methods”, “data science”, or “statistics”. Rather, the soft skills of the people using them are insufficient.
■ Then, what is practical “data analysis”?
Can you answer these questions clearly and specifically: “What do you need for data analysis?” and “What do you need to do for data analysis?” What is the output of “data analysis”, and what can you do with that output to achieve your goal?
In fact, many people start playing around with data without making these answers clear. In such cases, you may not be able to draw any useful information from the data.

We often hear the term “data science”.
You may then think, “As long as the data is processed appropriately, valuable information will come out of it automatically.”
Many people have this impression.

Let’s try to sort out the categories and scope of “data analysis” (see the figure below).

Chart1
The figure above is a simplified representation of the territory generally covered by the term “data analysis”. Please note, however, that what I mean by “a person in charge using data (analysis) in practice” is only one part of it (the “Data Analysis” category in the figure).

In general, the technical domain handled by a data analysis expert (data scientist) is the top layer in the figure above. To become an expert in this upper category (Data Science), you need knowledge and understanding of academic mathematics, statistics, programming, and the latest technology.
However, it is quite unusual for an ordinary company to employ in-house data analysis specialists on a permanent basis. This is because it is an area that can be outsourced as needed or left to a “machine”.

On the other hand, huge amounts of data can easily be gathered through the internet, and in the overwhelming majority of cases it is a “non-data scientist” who wants to use it quickly for an immediate goal. This is depicted in the middle (Data Analysis) and lower (Data Arrangement/Processing) categories.
Note that in reality there is a huge gap between the top layer and the middle and lower layers (categories).
A business person who is not an expert in data analysis cannot accomplish much, even with sophisticated analytical tools, methods, and statistical theory.

There is also a clear and important reason for separating the middle and lower categories.

From small startups to very large enterprises, many companies share the same trouble: “We have lots of data, but we do not get enough results from analyzing it.” Those companies stop at the lower category and never reach the middle one, without even noticing it.
It is difficult to obtain “useful and convincing” analytical results within the lower category alone. To use the results of data analysis for your business objective, the goal should be the middle category.

What is absolutely necessary in every case is to identify the category in which you will use the data, according to the ultimate goal you want to achieve (do you want to apply the latest technologies, resolve a problem with data analysis, or simply visualize a trend in the data?), BEFORE you start collecting or processing any data!

In this article, we will cover the middle and lower categories in the chart. In other words, the latest technology trends and programming for professional data scientists are a totally different story and are outside the scope of this article.
The common issue is that many organizations stop their data utilization at the lower category and never reach the middle category (Data Analysis). If you can expand the scope of your data utilization to the middle category, you may get the useful results your team or organization requires.
What is needed for this is neither “statistical theory”, nor “advanced analysis methods and tools”, nor “the latest programming technology”.

No matter how basic or how state-of-the-art your data collection and processing methods are, human skills (soft skills, “Data Analysis” in the following chart) are required to decide the following:

· What kind of data should be used
· How to interpret and utilize the output/result

As mentioned above, people and organizations that are not familiar with data do not merely have a shortage of these soft skills; they lack them completely.

Chart2
■ Some misunderstandings about practical data analysis
Some people might think, “I want to acquire data analysis skills.”
“If I learn even more analysis methods, I can obtain additional useful information from my usual data.”
But after struggling with the data for some time, you may come to understand that this idea is just an “illusion”.

Why is it an “illusion”?
There are several reasons for this, but here I will describe the most obvious one, which is also the easiest trap to fall into (see the figure below).

Chart3

Before taking any action using data, you should ask a fundamental question: “How comprehensively, and in how much detail, does the data in your hands represent the reality of the issue?”

Examples of data available to any company include “sales results” and “customer satisfaction scores”. Some data can be broken down by product, customer attribute, region, time, and so on.
But no matter how far you break the data down, you will not get information such as “Why are sales higher on Friday than on Wednesday?” or “Why is the score in Area A lower than in Area B?”
You need to come back to the fact that data shows only a part of reality. Furthermore, the information that analysts can derive from the data is itself only a part of all the information the data contains.

From time to time, I use this expression in my lectures:
“There is no answer in the data”

Under the illusion that “the answer I want must be somewhere in the data”, I have seen many cases in which people struggle with the data endlessly and end up with no practical results. This is how data analysis fails in practice.

So how can you resolve the issue?

Do not search for an answer. Instead, make your own answer and verify it with data!
To do so, you need to begin by defining your issue and goal concretely and developing the necessary logic as a hypothesis.

(To be continued)


DATA ANALYSIS DESIGN APPROACH

Here is a part of my presentation from a “data analysis” seminar.

I always emphasize the significance of taking the right approach to a problem when you apply “data analysis” to solve it.

I have found that many people struggle to find effective data-based solutions, especially when they start analyzing the data without properly defining and formulating the problem and making hypotheses. Even if you find something in the data, it might not be effective enough to solve the fundamental problem you have.

I call this necessary part of the problem-solving process “ANALYSIS DESIGN”. My training programs all focus on the skill set needed to design the analysis (i.e., problem definition/formulation and hypothesis making) so that participants can find the “right” solution.

 

Data analysis

This is something you should learn before studying data analysis methodology or difficult statistical theory if you want to acquire analytical skills that you can apply to business problem solving.

It is also an important skill for many business people in the AI (Artificial Intelligence) era, since the analysis itself can already be done by machines.

 

I offer programs that train business people and university students on this subject.

 

http://data-story.net/english/

A very interesting student work from my business statistics class

This is a very interesting presentation submitted for the final exam of my “Business Statistics” class at Yokohama National University in Japan.

My classes are all for international students, and this is the work of a student from Vietnam.

The students learned how to set a practical goal and how to use analytical techniques effectively to support their conclusions.

The student set out to find differences between students from two countries, Vietnam and Japan, in terms of GPA and their objectives for studying at university.

While the sample size is small and the conclusions are not necessarily surprising, the approach and the analysis itself were quite interesting.

My class does not just teach academic analytical techniques; it teaches how to apply those techniques to meet practical goals.

I would be happy to give my lecture anywhere in the world.

 

[Images: Hana 1, Hana 2, Hana 3, Hana 4, Hana 5]

What you need to consider when drilling down into data (vol.1)

People drill down into data along some axis.

For example, sales amount data can be broken down (drilled down) by area, by branch, or by product.

It is technically possible to drill down along any axis, but you may find the results not practical at all when you try to apply them to your business. Why?

From my experience and research, I have found three major points you should consider when selecting an axis for drilling down into data.

One of the three is “impact on the goal”.

You need to make sure that what you find along the axis will ultimately contribute to or affect the goal.

You can break the data down by customer location (area), for instance. But if the business is an internet shop, then location does not matter at all. I know this is an overly simple example, but people tend to skip this check when they simply pick some axis to break down the data.

The question you should ask is: “Is this axis really a key driver of the goal?”
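One rough way to put a number on that question is to check how much of the variation in the goal metric an axis accounts for. The sketch below is only an illustration under my own assumptions: the `orders` records, the field names, and the `explained_share` helper are hypothetical, and the measure used (between-group variation divided by total variation) is just one possible check, not a method taken from the lecture. It also cannot replace the business judgment about whether the axis matters to the goal.

```python
from collections import defaultdict
from statistics import mean


def explained_share(records, axis, metric):
    """Share of total variation in `metric` that grouping by `axis` accounts for.

    0.0 means the groups all have the same average; 1.0 means the axis
    accounts for all of the variation.
    """
    values = [r[metric] for r in records]
    grand_mean = mean(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    if total_ss == 0:
        return 0.0

    groups = defaultdict(list)
    for r in records:
        groups[r[axis]].append(r[metric])

    between_ss = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    return between_ss / total_ss


# Hypothetical internet-shop orders (illustrative numbers only)
orders = [
    {"area": "East", "product": "A", "sales": 100},
    {"area": "West", "product": "A", "sales": 104},
    {"area": "East", "product": "B", "sales": 200},
    {"area": "West", "product": "B", "sales": 196},
]

area_share = explained_share(orders, "area", "sales")
product_share = explained_share(orders, "product", "sales")
print(f"area:    {area_share:.3f}")
print(f"product: {product_share:.3f}")
```

In this made-up data, drilling down by area separates almost nothing, while drilling down by product separates nearly all of the variation in sales.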

From my lecture note at university #11 (Modeling with statistics)

We started inferential statistics today, but we will cover it only this week and next week.

Many people hate statistical testing because it is very confusing and sometimes does not feel practical. Therefore, I focused only on the most practical test: testing the gap between two averages using a t-test.

It was really hard to explain the concepts behind statistical testing (population and samples). To start, I used the example of tossing a coin to test for a 50%:50% chance.
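The coin-tossing idea can be sketched in a few lines. This is a minimal illustration, not the material used in class: the numbers (16 heads in 20 tosses) and the helper name `binomial_two_sided_p` are my own assumptions. It computes an exact two-sided p-value by adding up the probabilities of every outcome that is no more likely than the observed one, assuming the coin is fair.

```python
from math import comb


def binomial_two_sided_p(heads, tosses, p=0.5):
    """Exact two-sided p-value for observing `heads` in `tosses` tosses."""

    def pmf(k):
        # Probability of exactly k heads if the true chance of heads is p
        return comb(tosses, k) * p**k * (1 - p) ** (tosses - k)

    observed = pmf(heads)
    # Add up all outcomes that are at least as unlikely as the observed one
    # (small tolerance so that floating-point ties are included)
    return sum(pmf(k) for k in range(tosses + 1) if pmf(k) <= observed * (1 + 1e-9))


# Hypothetical experiment: 16 heads out of 20 tosses
p_value = binomial_two_sided_p(heads=16, tosses=20)
print(f"p-value = {p_value:.4f}")
```

A small p-value means that such a lopsided result would rarely happen with a fair coin, so the 50%:50% assumption becomes hard to defend.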

Secondly, I reminded the students that the data we had been using was only sample data, not the whole population, which is something you always have to be aware of.

[Image: Untitled]

[Image: Untitled 2]

I always try to explain a practical way to use statistics, rather than just the theory. The step chart below was used as an introduction.

[Image: Untitled 3]

This was the hardest part to explain. I showed two approaches: the critical value approach (t-value) and the probability approach (p-value).

[Image: Untitled 4]
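The two approaches always lead to the same decision, which a small sketch can show. Everything here is an assumption for illustration: the store data is made up, the function names are hypothetical, and I use Welch's version of the two-sample t-test (which does not assume equal variances), which may differ from the variant used in class. To keep the example self-contained, the t distribution is integrated numerically with the standard library instead of calling a statistics package.

```python
from math import gamma, pi, sqrt
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution (fine for small df; gamma overflows for very large df)."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|), using Simpson's rule on the density between 0 and |t|."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # area between -|t| and +|t|
    return max(0.0, 1.0 - central_area)


def critical_t(df, alpha=0.05):
    """Two-sided critical value, found by bisection on two_sided_p (searches 0 to 50)."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if two_sided_p(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def welch_t(a, b):
    """t statistic and approximate degrees of freedom for two independent samples."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df


# Hypothetical daily sales for two stores (illustrative numbers only)
store_a = [52, 48, 55, 60, 47, 53, 58, 50]
store_b = [45, 44, 50, 42, 48, 41, 46, 47]

alpha = 0.05
t, df = welch_t(store_a, store_b)
crit = critical_t(df, alpha)
p = two_sided_p(t, df)

print(f"t = {t:.3f}, df = {df:.1f}")
print(f"Critical value approach: |t| > {crit:.3f}? {abs(t) > crit}")
print(f"Probability approach:    p = {p:.4f} < {alpha}? {p < alpha}")
```

Comparing |t| with the critical value and comparing p with the significance level are two views of the same test, so both lines report the same conclusion.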

Finally, as always and as a matter of policy, I had the students solve some practical problems as follows:

[Image: Untitled 5]

In the next session, we will continue to learn more about statistical testing.

From my lecture note at university #9 (Modeling with statistics)

Today’s topic was (simple linear) regression analysis.

As usual, the main focus of my class is not on learning academic theory but on being able to apply the tools to a practical business issue.

[Image: Untitled]

After reviewing correlation analysis, I started talking about what “regression” is.

[Image: Untitled 2]

Using Microsoft Excel, the students solved a sample exercise and a couple of other questions.

[Image: Untitled 3]

Through those exercises, they learned how to use the equation obtained from the regression analysis. It is useful not only for predicting the future but also for planning, optimization, and so on.

I spent a lot of time helping them understand what the slope of the equation means and how they can apply the concept to problem solving.

[Image: Untitled 4]
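For readers without Excel at hand, here is a minimal sketch of the same idea in Python. The data (advertising spend versus sales) and the helper name `fit_line` are hypothetical and not taken from the class exercises. The line is fitted by ordinary least squares, and the resulting equation is then used in two ways: for prediction and for planning.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


# Hypothetical data: monthly ad spend and sales, both in thousands of dollars
ad_spend = [10, 20, 30, 40, 50]
sales = [120, 150, 170, 210, 230]

slope, intercept = fit_line(ad_spend, sales)
print(f"sales = {intercept:.1f} + {slope:.2f} * ad_spend")

# Prediction: expected sales if ad spend is raised to 60
predicted_sales = intercept + slope * 60
print(f"Predicted sales at ad spend 60: {predicted_sales:.1f}")

# Planning: ad spend needed to reach a sales target of 300
required_spend = (300 - intercept) / slope
print(f"Ad spend needed for sales of 300: {required_spend:.1f}")
```

The slope is the practical part: in this made-up data, each additional unit of ad spend is associated with about `slope` additional units of sales, which is what turns the equation into a planning tool. Extrapolating beyond the observed range, as both uses above do, should be treated with caution.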

From my lecture note at university #5 (Modeling with statistics)

Today’s focus was the CV (Coefficient of Variation).

It is used to measure “relative” variability and is indispensable when you compare the variability of data sets with different averages.

[Image: Untitled]

The impact of a standard deviation of $2,000 is not the same for a large store with average monthly sales of $500,000 as it is for a small shop with average monthly sales of $5,000.

In such a case, you have to cancel out the difference in the averages (the scale of the data) by dividing the standard deviation by the average, which gives the CV.

[Image: Untitled 3]
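The store example can be worked through directly. The sketch below uses the figures from the text; the function names and the extra `cv_from_data` helper (which computes the CV from raw observations using the sample standard deviation) are my own additions for illustration.

```python
from statistics import mean, stdev


def coefficient_of_variation(std_dev, average):
    """CV = standard deviation / average (a unitless ratio)."""
    return std_dev / average


def cv_from_data(values):
    """CV computed directly from raw observations (sample standard deviation)."""
    return stdev(values) / mean(values)


# Same standard deviation, very different averages
large_store_cv = coefficient_of_variation(2_000, 500_000)
small_shop_cv = coefficient_of_variation(2_000, 5_000)

print(f"Large store CV: {large_store_cv:.1%}")  # 0.4%
print(f"Small shop CV:  {small_shop_cv:.1%}")   # 40.0%
```

The same $2,000 of variation is 0.4% of the large store's average but 40% of the small shop's, so the small shop's sales are far more volatile in relative terms.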

One question in the class was which index you would like to invest your money in, and why (Nikkei 225, NY Dow, or JPY/USD FOREX).

I expected the students to look at the recent trends in value and risk for each index.

You can compare the value trend using the monthly average and the risk using the CV.

In conclusion, only FOREX showed an upward trend, and it also had the lowest risk (CV) of the three.

[Image: Untitled 2]

I hope they enjoyed the team discussions in the class.

See you all next week!