If you try to look up information on how to become a better analyst, you’ll likely get overwhelmed by the amount of advice you’ll find. You’ll also get confused by how contradictory the advice can be. And if you’re not necessarily focused on becoming a Data Scientist, you’ll be swarmed with jargon and technical tool names that will just go over your head instantly. Data Science has become a popular topic, perhaps because claiming Artificial Intelligence will take over the world has become mainstream in our late-capitalist dystopia. There’s also a general myth that entry-level pay for a Data Scientist is incredibly high (hint: it isn’t) - and that attracts a lot of eyes.
But what if all you wanted was to figure out how to work with the data you have in front of you right now? What if you have no intention of understanding all those tools, and you really just have to throw a report together for your boss by the end of the week? Looking up data analysis information online, you’ll be misled into believing that being good at working with data is way beyond you. Let me tell you: I have been there. I, too, became an analyst by accident.
I didn’t choose to work with data from the get-go in my career. It was never a conscious decision. I often say that Data Visualisation chose me as a career instead. Apart from Star Trek, Data wasn’t even a word in my vocabulary. Yet data visualisation has always been a presence in everything I did, even when I did not know I was doing it. Even before I could explain it, there was a technique to what I was doing. The same goes for Data Analysis. This will sound cliché, but it is true: to be good at analysing data, you don’t need to learn fancy tools. You need a good dose of curiosity, good judgement and critical thinking.
The career that chose me
My career was far from a straight line. It’s as curly as my hair (very). At some point, I became an analyst. I am still unsure how it happened, but there I was, running reports. My first-ever report was more journalistic, which I believe prepared me for always valuing the impact of a big headline before the analysis. It was a Hurricanes Report. I worked at this oil company which had refineries in Central America. When it was hurricane season, everyone running a refinery plant had to know the risks of being hit by a severe weather event and take precautions for the safety of their staff and structure. So, I, a humble analyst, would collect information from multiple international weather agencies every day, compile the most relevant information about zones of low pressure around the Atlantic, and sometimes draw by hand on top of satellite pictures to show the probable paths, in simple terms. The key was to make highly technical information easily digestible for people during a potentially life-threatening situation. I heard of plant managers printing the report daily and hanging it on the information boards - that always made me feel like my work mattered, which is always nice.
From there, other roles came and went in an ever-increasing complexity curve. Suddenly I’d find myself having to create reports about sales orders and deliveries and to forecast stock. I’ll be honest: I went with the flow because I’m curious and love learning, but it was not easy. In these situations, I also felt overwhelmed and wondered if what I was doing was right. I didn’t have any primer on preparing and cleaning data, making a chart, or presenting things better to my bosses. I believe it was circa 2013 when I looked up one of these concepts online. After scouring dozens of unhelpful pages, I stumbled across an interesting book recommendation that immediately caught my eye: The Accidental Analyst: Show Your Data Who’s Boss by Eileen McDaniel, PhD and Stephen McDaniel.
There was a book title I could identify myself with. The synopsis said all the right words too:
“Although you didn't plan for a career as a data analyst, you're now in a position where you have to analyse data to be successful. Whether you've been working with data for a few years or are just getting started, you can learn how to analyse your data to find answers to real-world questions. Even if you're an expert, you'll find creative ideas on how to work with accidental analysts. Using illustrated examples, we'll walk you through a clear, step-by-step framework that we call The Seven C's of Data Analysis.”
Tools come and go, but good technique is permanent.
The book was written in 2011-2012. The tool landscape was wildly different back then. Analytics, to me, was Microsoft Excel. All the more complex tools were exclusively the realm of IT. The idea of data democratisation wasn’t all that popular, and governance teams would scoff at giving business people control over their data. It was about this time that my roles increasingly became about maintaining “alternative databases”, as I liked to call them: Microsoft Access databases or Excel spreadsheets that were readily available for business users to grab information from when access to data warehouse systems was kept in the tight grip of IT bureaucracy.
I remember reading it back in the day and finding it extremely useful. So, a bit before writing this review, I decided to revisit it, and I’m glad to tell you it still holds up! There are a few things here and there that show its age, like the heads-up that reads:
“If you are working with large datasets, approximately 50,000 rows of data or more, they can be complex to handle and may take a while to download”.
Oh, how far we’ve come. The quantity of data has increased, and the tools have evolved. Still, the theory remains the same: we need a structured, systematic approach to an analysis for it to be efficient and effective, and that’s what this book is about - not tools, but technique. The book's core - the process laid out to help us untangle our data and shape it to support our analysis - remains as relevant as ever.
This book is not for those familiar with or senior in data analysis. It is very much a beginner’s introduction. But it is well organised, with a clear layout and simple case examples to support the reader as they follow along. It contains invaluable basic information, though - the type we don’t usually find easily - on how to take practical steps to analyse data with whichever tools you have available in your toolkit. It is deliberately written to be accessible to a wider audience. I applaud that. Not everyone immediately understands what cleaning data means, why it was dirty to begin with, or what an appropriate analytical question looks like. The book makes a successful effort to lower the entry barriers for those who are lost and confused, wondering what to do next with their analysis.
The 7 C’s of Data Analysis
The Accidental Analyst: Show Your Data Who’s Boss is a book meant to help non-data people who have found themselves in a data role learn how to work with data and which steps to take. It does that by structuring the analytical process, from gathering requirements to communicating your findings, in a 7-step framework. Each step is explained and illustrated and, where necessary, further broken down into more detailed sub-frameworks to clarify things.
Here’s my reference summary for you:
1. Choose your questions
Choosing your questions is a crucial step that will guide your entire analysis.
The authors split this section into two parts: asking questions when you’re doing the analysis for yourself and when you’re doing the analysis for others. Doing an analysis for yourself is usually much easier and more straightforward, but as analysts we’re often required to answer questions from others (our colleagues, stakeholders, bosses, clients, etc.).
This step is split into 3 stages:
Relax: figure out the problem and who will act on the solution
Gather information: not the data yet, but information about the problem at hand and whatever has been tried before
Select the questions: create a big list of all the questions you can think of, then narrow it down to the ones that are relevant to the audience you’re dealing with, the problem you’re analysing and the expected outcome of the potential answers.
2. Collect your data
Before any of the analysing parts starts, you must find and gather all the data you need.
It may involve internal or external sources
It may take quite some time to be able to collect all the data needed
Here the authors also split the step into 3 stages:
Identify your data - that is, make sure you know what type of data you are working with: is it categorical data? Is it numerical data? Is it structured? Unstructured?
Inventory your data - create a repository for all this data to live in - this could be as easy as saving everything in the same folder or on a tool’s workspace.
Integrate your data - here’s where they introduce a few more technical terms, referencing unions and joins: when you bring different data sources together and connect them to one another.
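If unions and joins are new to you, here is a minimal sketch of the idea in plain Python. It is my own illustration, not an example from the book (which is tool-agnostic), and the tables, column names and the `union` and `left_join` helpers are all made up for the purpose. A union stacks tables that share the same columns; a join attaches columns from one table to another through a shared key.

```python
# Hypothetical sample data: two monthly order extracts and a customer lookup.
orders_jan = [
    {"order_id": 1, "customer_id": "C1", "amount": 120.0},
    {"order_id": 2, "customer_id": "C2", "amount": 75.5},
]
orders_feb = [
    {"order_id": 3, "customer_id": "C1", "amount": 60.0},
    {"order_id": 4, "customer_id": "C3", "amount": 200.0},
]
customers = [
    {"customer_id": "C1", "region": "North"},
    {"customer_id": "C2", "region": "South"},
]


def union(*tables):
    """Stack tables that share the same columns, one on top of the other."""
    return [row for table in tables for row in table]


def left_join(left, right, key):
    """Attach the columns of `right` to every row of `left` via `key`.

    Rows in `left` without a match are kept, with the extra columns set to None.
    Assumes `key` is unique in `right`.
    """
    lookup = {row[key]: row for row in right}
    right_columns = [col for col in right[0] if col != key] if right else []
    joined = []
    for row in left:
        match = lookup.get(row[key])
        extra = {col: (match[col] if match else None) for col in right_columns}
        joined.append({**row, **extra})
    return joined


all_orders = union(orders_jan, orders_feb)
orders_with_region = left_join(all_orders, customers, "customer_id")

for row in orders_with_region:
    print(row)
```

Notice that customer C3 has no entry in the lookup, so its region comes back empty. Gaps like that are worth writing down, because they will come up again when you check and clean your data.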
3. Check out your data
The third and fourth steps are somewhat intertwined: after checking your data, you’ll move on to cleaning it. In practice, checking often reveals something to clean, and when you check again after cleaning you may find more things to clean.
These two steps can be very time-consuming, depending on your data's state.
You can employ multiple techniques to check if your data makes sense or if it requires any further work. Some of the techniques mentioned and explained by the book include sorting, filtering, summarising values (doing sums, averages, counts, etc.), ranking (with percentiles, for example), checking for change over time and seasonality, and conducting reality checks to make sure the data reflects the phenomenon, process or object it is trying to record.
Keep notes of your findings - you’ll need them to either explain the data’s shortcomings as part of communicating your analysis or reference it later while cleaning up whatever inadequacies you find.
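To make a few of these checks concrete, here is a small sketch using only Python’s standard library. The figures are invented, and the rule that units sold can never be negative is my assumption for this example, not something taken from the book. The same checks work just as well in a spreadsheet.

```python
import statistics

# Hypothetical daily sales figures; the negative value is a deliberate oddity.
daily_sales = [
    {"day": "2024-01-01", "units": 12},
    {"day": "2024-01-02", "units": 15},
    {"day": "2024-01-03", "units": -3},
    {"day": "2024-01-04", "units": 14},
    {"day": "2024-01-05", "units": 90},
]

units = [row["units"] for row in daily_sales]

# Sorting: the extremes at either end are the first place to look for trouble.
sorted_rows = sorted(daily_sales, key=lambda row: row["units"])
lowest, highest = sorted_rows[0], sorted_rows[-1]

# Summarising: count, total, average and median in one place.
summary = {
    "count": len(units),
    "total": sum(units),
    "mean": statistics.mean(units),
    "median": statistics.median(units),
}

# Ranking: the three cut points that split the values into quartiles.
quartiles = statistics.quantiles(units, n=4)

# Filtering as a reality check (assumption: units sold cannot be negative).
impossible = [row for row in daily_sales if row["units"] < 0]

# Keep notes of what you find so you can explain or fix it later.
notes = [f"{row['day']}: negative units ({row['units']})" for row in impossible]

print(summary)
print("lowest:", lowest, "highest:", highest)
print("quartiles:", quartiles)
print(notes)
```

A mean that sits far above the median, as it does here, is a hint that one or two extreme values are pulling the average and deserve a closer look.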
4. Clean up your data
The mandatory, inescapable step - without clean and appropriate data, there’s no analysis.
Data collected from the real world is messy.
You can either employ DIY methods (quick fixes you apply yourself, on your end) or identify bigger issues that must be addressed at the source, demanding a company-wide effort.
The cleanup process may include multiple steps, and the book goes into some detail about the most common ones: identifying outliers; dealing with missing data; fixing date problems and calculated field errors; working around technical glitches such as software limitations or confusion caused by different field formats; and correcting collection-point issues such as typos and misspellings.
The authors also dedicate a section to one dreaded instance of data cleaning - the one that involves company red-tape issues - when you have data silos and each department cuts or understands the data differently, or when data is out of date, includes inaccurate entries or even raises security issues.
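For the DIY side of cleaning, here is a rough sketch of what a few of the common fixes could look like in plain Python. Everything in it is an assumption of mine for illustration: the records, the `CITY_FIXES` lookup, the day-first reading of slashed dates and the "ten times the median" outlier threshold. The book does not prescribe any of these.

```python
import statistics
from datetime import datetime

# Hypothetical raw records with typical real-world mess.
raw = [
    {"city": " london", "date": "03/01/2024", "amount": "100"},
    {"city": "London", "date": "2024-01-04", "amount": ""},
    {"city": "Lodnon", "date": "2024-01-05", "amount": "110"},
    {"city": "Paris ", "date": "2024-01-06", "amount": "95"},
    {"city": "Paris", "date": "2024-01-07", "amount": "9500"},
]

# Assumption: a hand-maintained list of known misspellings.
CITY_FIXES = {"lodnon": "London"}
# Assumption: slashed dates are day-first (dd/mm/yyyy).
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def clean_city(value):
    """Trim stray spaces, fix known typos and standardise capitalisation."""
    tidy = value.strip()
    return CITY_FIXES.get(tidy.lower(), tidy.title())


def parse_date(value):
    """Try each known format; return None instead of guessing."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    return None


def parse_amount(value):
    """Turn text into a number; keep missing values as None, not zero."""
    return float(value) if value.strip() else None


cleaned = [
    {
        "city": clean_city(row["city"]),
        "date": parse_date(row["date"]),
        "amount": parse_amount(row["amount"]),
    }
    for row in raw
]

# Flag (rather than delete) suspiciously large amounts.
# Assumption: "more than ten times the median" is an arbitrary threshold.
known_amounts = [row["amount"] for row in cleaned if row["amount"] is not None]
median_amount = statistics.median(known_amounts)
for row in cleaned:
    row["outlier"] = row["amount"] is not None and row["amount"] > 10 * median_amount

for row in cleaned:
    print(row)
```

I flag the outlier instead of removing it on purpose: whether 9500 is a typo or a genuinely big sale is a question for whoever owns the data, which is exactly the kind of issue that may need to be fixed at the source.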
5. Chart your analysis (YAY!)
Finally, some fun! Have you noticed how much work it takes until you can even start visualising your data to present it? It isn’t glamorous; it takes a lot of time and effort. There’s a lot that goes on under the hood to make a viz happen!
Here the authors go extensively into multiple different types of charts, when they’re useful and what pitfalls to watch for when using them. This is an excellent entry-level summary of how to choose an adequate way to visualise information, considering your audience, the purpose of the analysis and the potential limitations of your choices.
6. Customise your analysis
This is a nice addition. A lot of the time, just charting information is not enough to get the point across. You have to tailor it to your audience’s needs.
You may be asked to adapt a general analysis into more specific bits for a particular meeting or an ad-hoc analysis.
You can employ a range of techniques to customise your analysis. The book goes into a bit more detail about how to use summary values, sorting techniques, basic filter types and when each is more appropriate, dates and trends and how to use and highlight relationships between values and categories.
7. Communicate your results
The book describes this step as the moment when you put all the other 6 C’s together.
It discusses how best to lay out your presentations (briefly), how to communicate your findings clearly in meetings or group discussions, how to highlight the path you took to get to the conclusions being communicated and how you can phrase it so that people can act on your findings.
The authors also emphasise the role that well-designed charts play in portraying polish and care in your analysis.
They also mention how storytelling elements may play an important role in drawing your audience's attention to the most relevant findings and inspiring them to take action.
An effective presentation of your analysis must:
contain audience-appropriate results,
answer the right questions,
include applicable metrics,
contain attractive visuals,
contain actionable insights, recommendations and decisions,
include attention-grabbing storylines.
Should you read it?
If you’re not a data person, but you are currently feeling the weight of being asked to deliver reports, analysis or slide decks with actionable insights even though you never really expected your career to take you there - then this may be of great help. It is a nice, accessible introduction to the non-glamorous side of diving into the data and fishing for insights. It will surely act as a friendly helping hand to guide you in the first steps of your journey as an analyst.
Bear in mind, though: this is not a book for everyone working in analytics. If you are experienced, confident in your analytical skills and looking for more in-depth techniques to expand into something more complex, feel free to skip it. It is not a Data Science book, and it won’t tell you how to become a Data Analyst. Nor will it discuss any specific tool, although most examples are created using Microsoft Excel.
Always check your local library first to see if any of the books I recommend are available. If they’re not, consider donating a copy!