What are some data challenges you find particularly fascinating?
I think one of the most difficult and interesting data challenges is translating a question into a data question. Structuring questions to BE data questions is an art, and it sets the foundation for a reliable analysis (or an unreliable one). For example, say you ask me: “Does featuring a topic affect the likelihood that someone will search for, find, and engage with that topic?” As a data question, that becomes: “Given that we add a Featured tag, are people who search for a specific topic and see it more likely to engage with it?” Notice how that question already has so many dimensions to break down, so you can identify the behavior you’re impacting versus not impacting.
A bad translation of that would be “Are featured topics getting more views?” That’s a very bad translation of the original question, because when you answer it, you have no idea what to do with the answer. There are a million things that could drive that, and you have no idea what they are. So it doesn’t lead you to an actionable result: “Should you feature something or should you not?”
If you want to take an action based on the question you asked, then the high-level question, “Do posts that are featured get more views?”, doesn’t tell you whether you should tag a post or not. The first question helps you understand the impact of tagging something as featured. So that’s a big thing I would tell people to do: translate your questions into data questions accurately. This fascinates me.
In practice, the breakdown means ignoring the literal wording of what the person says and trying to identify what action they’re trying to take. When you asked me whether featuring affects something, what you’re really asking me is “Should I tag my posts as featured?” So that has to be broken down to “If I tag my posts, am I more likely to get them viewed than if they are not tagged?” It becomes a nuanced data question so I can tell you whether or not to tag your posts. Often, data analysts outsource that work to YOU, so that you, without seeing any information, guess whether it would work or not. And you end up trying things, guessing, and working blindly. And that’s not good! And this is what we’re going to tackle next: at Narrator.ai, if we can make your analyses super-quick, then you can spend all your time talking to your customer to try to understand that question. And eventually, we’ll hopefully tackle this human-translation problem.
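To make the Featured-tag translation above concrete, here is a minimal Python sketch. It is not Narrator.ai's method; the function name, the field names (`searched`, `saw_topic`, `featured`, `engaged`), and the sample rows are all made up for illustration. The idea is to restrict to people who searched for a topic and saw it, then compare engagement for featured versus non-featured topics, rather than counting raw views.

```python
from collections import defaultdict

def engagement_rate_by_featured(impressions):
    """Engagement rate among people who searched AND saw the topic,
    split by whether the topic carried a Featured tag.

    `impressions` is an iterable of dicts with boolean keys
    'searched', 'saw_topic', 'featured', 'engaged' (hypothetical schema).
    """
    counts = defaultdict(lambda: [0, 0])  # featured -> [engaged, total]
    for row in impressions:
        # Condition on the behavior the question is about.
        if not (row["searched"] and row["saw_topic"]):
            continue
        bucket = counts[row["featured"]]
        bucket[0] += int(row["engaged"])
        bucket[1] += 1
    return {flag: engaged / total for flag, (engaged, total) in counts.items()}

# Made-up rows, for illustration only.
sample = [
    {"searched": True,  "saw_topic": True,  "featured": True,  "engaged": True},
    {"searched": True,  "saw_topic": True,  "featured": True,  "engaged": False},
    {"searched": True,  "saw_topic": True,  "featured": False, "engaged": False},
    {"searched": True,  "saw_topic": True,  "featured": False, "engaged": True},
    {"searched": True,  "saw_topic": True,  "featured": False, "engaged": False},
    {"searched": False, "saw_topic": True,  "featured": True,  "engaged": True},   # excluded: did not search
    {"searched": True,  "saw_topic": False, "featured": False, "engaged": True},   # excluded: never saw it
]

rates = engagement_rate_by_featured(sample)
```

A comparison like this is still observational. It narrows the question to the behavior you care about, but deciding whether the tag itself causes the difference would usually need an experiment or a more careful analysis.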
Yes. When I was in college, I worked in a robotics lab that was responsible for an autonomous car, and I remember my professor teaching me the basics of AI. I was designing robots at the time, and there was a specific moment, I remember, where he said “Pick up a cup.” And he asked me to try to explain to him programmatically “How do you pick up a cup?” And that’s a REALLY hard problem! To pick up a cup, a robot or a human being uses stereo vision to measure how far away the cup is. Then we take all the muscles we have in our body and solve a very complicated optimization problem to figure out how to use the least energy to get close to the cup. As we move our hands toward the cup, our eyes recalibrate and give us feedback until we hold it. Once we grab the cup, our fingers detect the force on the cup, until we have enough force to pick it up. And then there’s a whole lot of computer vision as well, to figure out “What is the cup?” and “What shape is the cup?”
And everything here has error! So that moment is when I began realizing that our brains can be modeled in math. And there’s so much beauty in the complexity of that model. And I’ve been hooked ever since. So everything we do, if you break it down into its components, is us humans solving all these math problems in our heads magically, reasoning about all this uncertainty, to act!
Wouldn’t it be awesome if we could understand that better?
So the beauty of understanding that better inspired me to dive into studying robotics, so I studied robotics!
Great question. So we touched upon it in an earlier conversation. I’ll break it down to 3 common problems.
One is assuming the data is clean.
I mentioned that all data is dirty. You have to figure out how to ask that question with the data being dirty.
Two is assuming the question itself is correct.
Every question is trying to get at an action. And often, the question itself has to be rephrased multiple times until you find the right question to guide an action.
And the third problem is time.
Because doing an analysis right is super time-consuming, due to modeling issues, structuring issues, query issues, and math issues, we get excited and are not able to dive as deep as we should. People do PhDs on these analyses, and they spend months trying to isolate and understand. In an environment where you have days or weeks to come up with an action, you end up making mistakes, unless you have some support or tooling to help accelerate that process.
And I’ll add one more: the last thing is the illusion of self-service. An untrained person is not able to look at a dashboard and figure out what they should do. I had to go through robotics training to learn how to understand data so I could make decisions. And often everyone thinks they can make a decision by looking at a dashboard. I don’t believe there’s ever been, in the history of decisions, one that someone found on their own by looking at a dashboard. You have lucky situations, but that dashboard doesn’t necessarily say whether something is good or bad. So let experts do the analyses, and you consume them. Everyone wants to do self-serve analytics, but they may not be qualified to do that.
Haha! So I happen to know a little bit about this. At one point, and I think it’s still the case, adding a show to your list made NO difference. I’ll tell you why: what Netflix figured out was that when people add shows to their list, they’re being overly optimistic about what they’re gonna watch. People add documentaries and things like that, stuff that, if it were used in the recommendations, would make you LESS likely to like the recommendations. So Netflix, similar to Spotify, uses your behavior. At its core, what they do is look at what you watch, in what order, how often you start something and don’t finish it, the whole picture of your behavior. From that, it can group you with other people who are like you, who’ve watched nearly the same movies, and ask what other movies those people have binged, or which ones have made Netflix more successful. There’s a cost component to the recommendations: Netflix makes money off of every view, so they are more likely to recommend a Netflix original than a licensed show. They must make a trade-off: what is the best show you are most likely to finish that will make Netflix the most money, based on your behavior and the behavior of people who have similar movie taste to yours?
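As a rough illustration of the idea described above (group viewers by behavior, then weigh candidates by what they are worth to the business), here is a small Python sketch. It is not Netflix's actual system. The Jaccard similarity, the `margin` weights, the function names, and the viewer data are all assumptions made up for this example.

```python
def jaccard(a, b):
    """Overlap between two sets of finished titles (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(user, finished, margin, top_n=3):
    """Rank titles the user has not finished.

    Each candidate's score is the similarity of the viewers who finished it,
    multiplied by a hypothetical per-title business weight (default 1.0).
    """
    seen = finished[user]
    scores = {}
    for other, titles in finished.items():
        if other == user:
            continue
        sim = jaccard(seen, titles)
        if sim == 0:
            continue  # no shared behavior, so no signal
        for title in titles - seen:
            scores[title] = scores.get(title, 0.0) + sim * margin.get(title, 1.0)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]

# Made-up viewing behavior: sets of titles each viewer finished.
finished = {
    "ana": {"A", "B", "C"},
    "ben": {"A", "B", "D"},
    "cy":  {"A", "C", "E"},
    "dee": {"X"},
}
# Hypothetical weights: "E" stands in for an original with a higher margin.
margin = {"D": 1.0, "E": 2.0}

picks = recommend("ana", finished, margin)
```

With equal weights, "D" and "E" would score the same for "ana", because both come from equally similar viewers. The weight is what pushes "E" ahead, which mirrors the trade-off between what you are likely to finish and what earns the most.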
There’s always a cost-price component when you deliver a marketplace. As you get more and more into the world of algorithms and data, there's a lot of fascination for me in how you make a decision.
If you give people what they SAY they want, they’ll be really mad at you. So you have to give them what they ACTUALLY want, that they didn’t communicate.
It’s the same in data: everyone says “Oh, promos will help! Best prices will get sales!” And that’s true sometimes. It depends on your business and on the customer behavior that you want to change. The idea of a best practice, or an average, means sometimes it works and sometimes it doesn’t. So you should figure it out for your business first, and then act based on what drives that for your business. For Narrator.ai, that’s our bread and butter. There’s a reason why every analysis is run on your data. If you have a call center, and somebody calls you, should you pick up the phone? Or send them to voicemail? It seems obvious, but more often than not, the answer is surprising. It might be the case that calling increases conversion, but answering the phone call doesn’t matter, because the act of calling is what shows interest. And whether you pick up and sign them up, or they go online and sign themselves up, might not affect it at all. There’s a lot of nuance. Your intuition is good, but we make decisions based on your past behavior.
What's the best way we can stay updated with you?
I’m not on social media, but if you submit your email on Narrator.ai, we’ll send you our biweekly newsletter updating you on what the company is doing and where we’re headed. That’s the best way to stay in the loop. Or email me if you’re excited! My email is firstname.lastname@example.org and I have no problem talking to anyone who is excited about these topics. Eventually, as we grow, I want to create a community where analysts can share their work, so that data analysts can be like engineers when it comes to open-sourcing. That’s where I hope to take things in the future.