Forget ‘Big Data’ and focus instead on sharp data.
I was recently asked by a fintech investor what EquipmentConnect was planning to do on the ‘Big Data’ front. Unfortunately she was disappointed with my response: we would not have sufficient data for ‘Big Data’ analysis any time soon and, in any case, I didn’t think it was a particularly worthwhile focus. Let me share why. But first things first: what is ‘Big Data’?
Big Data: as with all subjects of hype, the term is somewhat ambiguous. It is defined as follows: “Extremely large data sets that may be analysed computationally to reveal patterns, trends, and associations, especially relating to human behaviour and interactions”.
Big Data often refers to data sets that were not traditionally considered statistically relevant or computationally efficient for modelling a specific hypothesis. For example, bad weather patterns and road accidents were traditionally modelled together, but with improved analytics and cloud computing power, loosely related data sets, such as the health of local drivers, rates of car theft or long-distance driving patterns, are now just as likely to be studied alongside them. ‘Big Data’ is here to stay.
‘Big Data’ has been hard to avoid, as both startups and corporates are giddy with excitement and keen to latch on for some PR spin. As a general rule, when both hipsters and management consultants are ranting on the same subject, it’s probably time to hunker down: a ‘hype attack’ is approaching.
In certain cases, yes, companies that capture huge amounts of data in their normal business day should, where legally and ethically appropriate, use that data to further their insight. There are startups whose birthright is delivering useful insight from Big Data, and I take no issue with them. Houseprice.ai, co-founded by Giovani Miano, is one example. Their ML model is applied to a complex and valuable problem (pricing real estate) and is backed by the quantity of data necessary to hold weight.
However, for most fintech startups, regardless of whether your focus is lending, digital banking or insurtech, Big Data isn’t worth more than an intern’s spare day in January. It should be sidestepped in favour of obtaining sharp data, which is what really matters for us.
‘Sharp data’ isn’t a term you will be familiar with because I just made it up. Essentially, what I mean is critical data that you already know fits a hypothesis or model, or answers a question that you already have in mind. This data may be difficult to source ready-made, or it may need to be collected manually. The fundamental rule is simply that you find the sharpest data to measure what you know or to validate what you believe is logically correct. You finish your analysis by validating and measuring with the data, unlike ‘Big Data’, where the data is your starting point.
More reasons why 99% of fintech startups should ignore ‘Big Data’ and instead focus on sharp data:
- Most fintech startups just don’t have access to sufficient data sets. This is because their customer base is too small and because in many cases the cost of acquiring large data sets is prohibitive.
- Bad ‘Big Data’ analysis is really bad. Correlation often has nothing to do with causation and is instead driven by a third variable. Sometimes data sets suffer from selection bias or autocorrelation. A sample may be poorly selected or badly clustered. Yes, there are mitigants, but the fundamental problem with Big Data is that you often start with the data, hoping to find a result. Destination unknown, which is just asking for a wild goose chase! So in essence, without good data scientists you are doomed, and most early fintech startups just can’t afford them.
- Even with improvements in cloud computing and transmission costs, Big Data analysis is expensive.
- Proponents of Big Data analysis tend to ignore the best data aggregation we naturally possess – the experience and consequential wisdom of professionals. Let them set the direction.
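The point above about correlation being driven by a third variable can be illustrated with a small sketch. Everything here is an assumption made for illustration: the data is synthetic, and the names `pearson`, `residuals` and `confounding_demo` are hypothetical helpers, not part of any EquipmentConnect model. Two series `x` and `y` never influence each other; both simply follow a hidden driver `z`. They look strongly correlated until `z` is controlled for.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, zs):
    """What is left of ys after removing a least-squares straight-line fit on zs."""
    n = len(ys)
    mz, my = sum(zs) / n, sum(ys) / n
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    return [(y - my) - slope * (z - mz) for z, y in zip(zs, ys)]


def confounding_demo(n=2000, seed=0):
    """Return (raw correlation of x and y, correlation after controlling for z)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    z = [rng.gauss(0, 1) for _ in range(n)]    # hidden third variable
    x = [zi + 0.5 * rng.gauss(0, 1) for zi in z]  # x depends only on z
    y = [zi + 0.5 * rng.gauss(0, 1) for zi in z]  # y depends only on z
    raw = pearson(x, y)
    partial = pearson(residuals(x, z), residuals(y, z))
    return raw, partial


if __name__ == "__main__":
    raw, partial = confounding_demo()
    print(f"raw correlation:         {raw:.2f}")
    print(f"after controlling for z: {partial:.2f}")
```

With this setup the raw correlation should come out strong and the controlled one close to zero. Someone starting from the data alone would see only the first number, which is exactly the wild goose chase described above.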
So ignore data, then? No! Just figure out what specific data makes sense to analyse and work on that.
In my view, human experience, rationalisation and intuition are incredibly difficult to replicate with a computer and are key to pointing us in the right direction. First setting the compass ourselves and then using ‘Sharp’ data to measure and quantify the process is more productive, and end results are delivered more cost-effectively.
Consider for a moment perhaps the most experienced user of sensitive data, the military, and the case for focusing on Sharp Data instead of Big Data is clear. When the British Army developed its MAMBA surveillance station, it purposefully focused on Sharp Data over Big Data. MAMBA is an acronym for Mobile Artillery Monitoring Battlefield Radar and is a radar system designed to filter through data and pinpoint rocket, mortar or artillery shell threats out to a distance of 5 km. It purposefully ignores the white noise.
Some examples of sharp data that we are employing at EquipmentConnect:
- We are capturing depreciation rates, secondary market pricing and recovery cost information for different models of equipment and machinery. This allows us to better grade the strength of equipment as security behind the credit.
- We are digging out default rate information for the various sub-sectors that we serve and are considering it in the context of different stages of the economic cycle. How do firms that focus on road construction perform at the beginning of an economic slowdown? How about in a prolonged economic stagnation? How does that compare with construction companies focusing on house building?
- We are also gathering data to deepen our insight into how creditworthy a borrower is: how fast does the SME pay its suppliers? In certain cases our SME borrowers will give us access to live, real-time information on their bank account. This is combined with data filed at Companies House. We don’t just consider the data statically but look at the trend.
- To better monitor the equipment, we collect data on the usage, location and condition of equipment and machinery, all in real time, so funders have the pulse of the assets financed.
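As a rough illustration of looking at the trend rather than a static snapshot, here is a minimal sketch that starts from a question already in mind (is this borrower taking longer to pay suppliers?) and measures it with a handful of data points. The figures and the function name `trend_slope` are made up for the example; they are not EquipmentConnect data or code.

```python
def trend_slope(values):
    """Least-squares slope of values against their index (change per period)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to measure a trend")
    mean_t = (n - 1) / 2          # mean of the period indices 0..n-1
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    den = sum((t - mean_t) ** 2 for t in range(n))
    return num / den


# Hypothetical average days taken to pay suppliers, one figure per month.
days_to_pay = [30, 32, 35, 39, 44, 50]

slope = trend_slope(days_to_pay)
print(f"payment time is changing by {slope:.1f} days per month")
```

The latest figure on its own says little; the slope shows payment times stretching by about four days every month, which is the kind of signal a human analyst would know to look for in the first place.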
All of this information helps power our models but both the direction of the model and results are always initiated by human experience and insight. There is a whole breed of new tech startups who are unnecessarily obsessed with Big Data. My fifty pence – stop collecting rocks, sharpen your drilling bits and find some diamonds.