Matthew Cowen
  • Issue 18 : A step-by-step example of extracting value from unstructured data

    Using some basic statistical analysis, you can extract value fairly simply

Bienvenue/Welcome to all the new subscribers. Thank you so much for signing up. Don’t forget you can read all the archives here. Let me know if you have any comments.

    On to this Issue.

After the last few issues giving you practical advice on your data’s worth, I thought I’d take a different direction and give you a step-by-step tutorial in basic data manipulation, with the aim of extracting value from it. It’s a fairly simplistic example, but one that shows what you can do with a little patience and the techniques I’m showing here. Bust out your calculators, it’s going to get deep 😉 Enjoy.


    Gaining insights from basic, unstructured data

For both personal and business reasons, I’ve been journaling for a number of years now but have recently slowed down. I first got into it after reading numerous articles from people I respected in the business world, and out of a general interest in well-built applications that achieve their goal brilliantly. The application that got me started and really piqued my interest was an app called DayOne. It’s a lovingly designed application that really helps you get your thoughts written down. Since this newsletter is not a review of the application, I won’t go into how it works and its features, but this review will give you a great overview.

I’d noticed that recently I’d stopped journaling, or at least hadn’t been as regular and rigorous as usual, and I wanted to know why. In the spirit of extracting value from raw unstructured data (call it Big Data if you will), I set out to analyse my journaling. As with any analysis, I had a goal in mind. First there were several questions to answer, then an analysis to see if I could “nudge” (see Side Bar - Nudge Theory) myself into better journaling, or at least better regularity.

A quick note: whilst I’m fully aware that this is not specifically “business data”, it serves as an illustration of how some simple data can reveal more information than you might at first have thought.

Some of the questions I wanted answered were designed to help me get back on the wagon, so to speak. Let’s have a look in detail at each question.

    Essentially there are two big questions that required answering:

• When and why did I stop journaling?

• What could I adjust to incentivise me into more regular journaling?

In this first part, I’m concentrating on the first question, as it is the basis on which to answer the second. To make things legible, I created a MindMap of the full question and sub-question list related to that first question.

    20190529-DayOne Stats Questions.png

    The mind-mapped query tree

Phew, that’s a lot of questions for something that on the face of it sounds very simple. In this Issue I will delve into questions 1 through 4, and dedicate another issue to questions 5 and 6, which in analytical terms require more effort, and I’d like this newsletter not to serve as a soporific! Before we dig into the details, I developed a six-step plan to get me towards answering the question tree and possibly resolving the base issue: how could I tweak things to incite me to journal better and more often?

    DraggedImage.png

     My Six-step plan

    Getting Data

Getting the data is generally a simple process and you should be able to find useful data without too much trouble. My case was no different. I had a journaling app with entries, 385 to be precise, so all I had to do was export them. Luckily for me, DayOne features an export function and offers several export formats. As I was about to manipulate the data, I chose .txt (Plain Text). It would have been easier in .csv (Comma Separated Values) format, but that didn’t stop the process. The data was in fact rather oddly structured, partly to be human-readable I guess, and it required some work to get it into a usable state. More on that later.

    Journal.png

From a business perspective, many applications will offer an export function into various formats, but if the application you’re working with doesn’t, a quick exchange with support will likely get you the data you’re looking for.

    Choosing Analysis tools

Now that I had some raw data, albeit in an unusable format for the time being, I set about seeing which data analysis tools would be best suited. Without getting into a big philosophical discussion, there were two obvious candidates: Microsoft Excel and Microsoft PowerBI. I’ve used both previously and appreciate both systems for different reasons, but in this case I thought that the easy-to-use data manipulation tools built into PowerBI would suit my needs better. In some cases, further statistical analysis might require the use of R to dig deeper into the data.

Alternatives to PowerBI exist; here are a few:

    Tableau (a free to use edition called Public is available too)

    Qlik

    Zoho Analytics

    DataDog

    Data Munging

The next step was to do what is called “Data Munging”. The Wikipedia definition is:

    ...the process of transforming and mapping data from one "raw" data form into another format with the intent of making it more appropriate and valuable for a variety of downstream purposes such as analytics.

    This is also referred to as Data Wrangling and the two terms are interchangeable. Whilst I was not exactly transforming the raw data into another format, I was certainly cleaning the data up and rationalising what to keep. Below is a sample export from DayOne from a journal that collects daily quotes.

    DraggedImage_1.png

    The raw data exported from DayOne

Looking at the data, we can see several problems that need to be fixed in order to exploit it. For example, there are no specific delimiters (Tab, Space, Comma, etc.). The date and time are mixed on the same line, with “ at “ in the middle and the time zone appended to the end. Two lines down we see the contents of the journal entry, in this case on one line, though a larger journal entry will span multiple lines. The munging involved cleaning this up.

Using text find/replace tools, it was a simple matter of getting rid of “ Date: “ by replacing it with nothing, doing much the same to remove the “ at “, and finally removing the “GMT”. I did, however, take the opportunity to insert a comma and a space after the date, to create a quasi-.csv file to import into Microsoft Excel for the next phase of the munging.
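If you’d rather script this step than do it in an editor, a few lines of Python achieve the same clean-up. This is a minimal sketch: the exact header format is an assumption based on the export shown above, and the file names are placeholders.

    import re

    # Turn the DayOne plain-text export into a quasi-CSV of "date, time" rows,
    # mirroring the find/replace steps described above.
    rows = []
    with open("Journal.txt", encoding="utf-8") as f:
        for line in f:
            # Header lines look like: "Date: 4 June 2019 at 06:12:03 GMT"
            m = re.match(r"\s*Date:\s*(.+?) at (\d{1,2}:\d{2}:\d{2}) GMT", line)
            if m:
                rows.append(f"{m.group(1)}, {m.group(2)}")

    with open("journal.csv", "w", encoding="utf-8") as f:
        f.write("Date, Time\n" + "\n".join(rows) + "\n")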

    DraggedImage_2.png

    The start of the clean-up process

    Importing and Rationalising Data

Importing this into Excel gives us a formatted file with the date and time separated into columns, with the text data and extra lines occupying the areas in between those dates and times. For the purposes of this exercise I chose to remove all the text data and leave the two columns. The end result, formatted as a table and sorted by date, as presented in Excel:

    DraggedImage_3.png

    The cleaned-up and structured data, ready for use in PowerBI

Now, with the data cleaned up, it was time to start extracting value from it in the chosen analysis tool, Microsoft PowerBI in this case. PowerBI is a simple and very powerful tool that helps you import and work with not just one data source but several simultaneously. Once the data is imported into PowerBI you can create relationships using simple drag-and-drop tools. For example, if you were trying to understand the relationship between ice cream sales and the weather, the table with sales data (including the dates and times) could be ‘joined’ by a relationship to a weather table — data imported from your local weather bureau, for example. Mapping sales against weather becomes a trivial matter from thereon. Let’s have a quick look at my simple one-table data and see if I can answer some of the above questions.
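For readers who prefer code to drag-and-drop, the equivalent join in Python/pandas is a one-line merge on the shared date column. The tables and numbers here are invented purely for illustration.

    import pandas as pd

    # Invented example: daily ice cream sales joined to weather readings by date
    sales = pd.DataFrame({
        "date": pd.to_datetime(["2019-06-01", "2019-06-02", "2019-06-03"]),
        "units_sold": [120, 95, 180],
    })
    weather = pd.DataFrame({
        "date": pd.to_datetime(["2019-06-01", "2019-06-02", "2019-06-03"]),
        "max_temp_c": [27, 22, 31],
    })

    merged = sales.merge(weather, on="date")  # the 'relationship' between the tables
    print(merged["units_sold"].corr(merged["max_temp_c"]))  # do sales track temperature?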

    Applying Visualisations and Analysis

The first question I wanted to answer was about the distribution of entries and whether it was consistent. The following visualisation is pretty self-explanatory. Clearly, I started in earnest in 2017 and continued in 2018, with a big fall-off in 2019. Yes, there is consistency for two years, but a recent drop-off, even when adjusting for the fact that 2019 is not yet half done. This one histogram answers questions 1 and 3 from the above MindMap (Is there consistency over the years? and Is there a noticeable pattern?).

    Entries by year.png

    Count of journal entries by year

    In answering question 2 (did I journal more in some months as opposed to others?), the following histograms were created:

    Distribution by Month.png

    Count of journal entries by month (all years)

    Distribution by DOW.png

    Count of journal entries by day (all years)
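If you want to reproduce these counts outside PowerBI, the same aggregations take a few lines of pandas. A minimal sketch, assuming the quasi-CSV produced during the munging step, with its ‘Date’ and ‘Time’ columns:

    import pandas as pd

    # Load the cleaned-up two-column file produced during the munging step
    df = pd.read_csv("journal.csv", skipinitialspace=True)
    df["when"] = pd.to_datetime(df["Date"] + " " + df["Time"], dayfirst=True)

    print(df["when"].dt.year.value_counts().sort_index())  # entries per year (Q1/Q3)
    print(df["when"].dt.month_name().value_counts())       # entries per month (Q2)
    print(df["when"].dt.day_name().value_counts())         # entries per weekday (Q4)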

Again, pretty self-evident: I’ve been more consistent in some months than in others (October, September and December). There are other inferences I can draw too. For example, I’m not subject to a rush of New Year’s resolutions, deciding to do something in the New Year and then quickly dropping out (gym membership offers play on this). Tuesday seems to be the day I journal most often, with Wednesday and Thursday close behind in frequency, while Monday, Friday, Saturday and Sunday are the days I miss more often. I peak in the middle of the week, if you will, thus half-answering question 4 (Was I more likely to journal on certain days of the week?).

    The other half of the answer came from the following histogram:

    Distribution by Day and Time.png

    Split of journal entries by day of the week and the hours at which the entries were made

It’s a little difficult to read, as the data is somewhat spread out, but essentially we can conclude that I write most entries in the morning, between the hours of 5am and 8am. To better understand it, look at the key in the top left of the image, which shows the days split into blocks of one hour. Yes, I have written in the journal at around 4am! So, I’m a morning person. I already felt this was the case, but to have it all but confirmed by data is interesting. Depending on the type of task at hand, this kind of data can help you schedule when to attempt those tasks. For me, the morning is better for the more reflective tasks, thinking and writing.
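The same split can be reproduced as a pivot table, continuing the pandas sketch above:

    # Day-of-week × hour-of-day matrix of entry counts
    pivot = (
        df.assign(day=df["when"].dt.day_name(), hour=df["when"].dt.hour)
          .pivot_table(index="day", columns="hour", aggfunc="size", fill_value=0)
    )
    print(pivot)  # rows: days, columns: hours, values: number of entries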

To cut the data another way, into a monthly analysis, the following histogram was created:

    Distribution by Month and Time.png

    Split of journal entries by month and hours of the day of the entries

By month, the one stand-out thing that needs investigating is why I have more entries in the afternoons during the months in which I made more entries overall. This admittedly simple and silly example does in fact show how simple data can help us make better analyses by revealing things that wouldn’t necessarily be seen using standard tools like tables in Excel. But… beware of causation and correlation (see Side Bar - Causation versus Correlation).


    Conclusions

The big takeaway I wanted to impart is that it is possible to reveal interesting and useful facts from even the simplest of data sets using modern tools. Digital Transformation calls for us to gather more data, and this is one of the reasons why. This long issue only dips into the possibilities; questions 5 and 6 on the list would be the next logical steps to take. To answer them, I would need to extend the table with columns for the number of entries and the word count of each entry. Measuring, for example, negative or positive word counts could be used to gauge the overall mood of the entries, with the obvious caveats. But you can see how a simple multi-column sheet can provide rich insights. As I mentioned before, joining other datasets to this model could provide even better analysis (weather, GPS, etc.)… possibilities!
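As a taste of that next step, counting words per entry and tallying a small list of positive and negative words takes only a few lines. A minimal sketch; the word lists are invented, and a real analysis would use a proper sentiment lexicon.

    # Hypothetical mood gauge: word count plus a crude positive/negative score
    POSITIVE = {"happy", "great", "good", "progress"}
    NEGATIVE = {"tired", "stressed", "bad", "stuck"}

    def mood_score(entry: str) -> int:
        words = entry.lower().split()
        return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

    for entry in ["Great progress on the project today", "Tired and a bit stressed"]:
        print(len(entry.split()), mood_score(entry))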


    Side Bar - Nudge Theory

Nudge Theory is, according to this Wikipedia article:

    Nudge is a concept in behavioral science, political theory and behavioral economics which proposes positive reinforcement and indirect suggestions as ways to influence the behavior and decision making of groups or individuals. Nudging contrasts with other ways to achieve compliance, such as education, legislation or enforcement.

    Nudge Theory has its roots in cybernetics and clinical psychotherapy before being more formally and scientifically described sometime after 1995.


    Side Bar - Causation versus Correlation

When looking at data, it is imperative that you understand the difference between causation and correlation. It cannot be assumed that one thing “caused” another just because the two are correlated, even when a histogram suggests it. Further analysis should be done to try to disprove the cause; when you can’t, you have a little more reason to believe in it, but never 100%. Conflating the two is a classic error in data analysis and one to be avoided at all costs.

To better understand the difference, have a read of this article, which picks apart, as one example, a very serious article in the Economist suggesting that ice cream consumption is related to IQ, i.e., the more frequently it is eaten, the higher the IQ… I’ll let you discover it for yourselves.


    Reading List

    Metadata is the biggest little problem plaguing the music industry

    A great article from The Verge, on the other side of data usage and its potential for harm to copyright holders.


    The Future is Digital Newsletter is intended for a single recipient, but I encourage you to forward it to people you feel may be interested in the subject matter. Thanks for being a supporter, have a great day.

    → 10:00 AM, Jun 7
  • Issue 17 : The Security and Privacy Wars in the Digital Age

    It’s all about software

    After a couple of practical issues, I thought I’d get back to analysis and strategy — kind of the initial reason for this newsletter 😄 Hope you enjoy it.

    Onwards with this issue.


    Security and Privacy Wars

A long time ago, in Internet terms, we had a security war between Unix and Windows. Windows, the shiny new up-and-coming OS, was open by default, and that openness enabled all sorts of advances; but like anything, it could be used for good and bad, and that openness allowed a lot of bad things to happen, some of which we are still dealing with decades later. Unix, on the other hand, was secure by default, because everything was turned off by default. You were required to open things up as you needed them and were discouraged from opening anything unless it was absolutely necessary and could be justified. The evident reality today is that the Unix model won out, and Windows is currently vastly more secure as a result of adopting a more cautious approach. I’m simplifying somewhat, but you get the picture.

Given that history often repeats itself, this battle seems to be taking place again in the mobile era, the two protagonists being Apple and Google of course. Google’s approach is akin to Microsoft’s in its burgeoning years, with Apple’s being more like the Unix model. That’s no coincidence: macOS is in fact built upon a Unix foundation, the Berkeley Software Distribution (BSD). Many tech nerds will cry foul at my assertion, rightly pointing out that a kernel isn’t the whole thing. That’s fine; it’s to illustrate the point.

Apple’s implementation of its developer APIs (Application Programming Interfaces) shows us that it is much more concerned with securing and protecting as much as possible from the get-go, something Google doesn’t seem to be either interested in or capable of. Actually, that’s unfair: Google is perfectly capable of it. Perhaps not with the current implementation of Android, but the engineers at Google could easily do so if required. That leaves us with the only sane explanation: Google has no interest in securing Android the way Apple secures iOS. And that’s fine, it’s their choice. However, I thought it would be an interesting exercise to compare and contrast some areas of Android and iOS to give context to the impending privacy wars that are just gearing up, as you’ll see later.

    Security: 2-factor authentication

During a WWDC State of the Union address — the keynote for developers, aimed at developers, rather than the morning’s keynote that is mostly for users and the press — Apple spoke about a number of enhancements for the future of security and privacy. The statistic that around two-thirds of all Apple ID users have enabled and use two-factor authentication, versus around 10% on other platforms, was somewhat surprising. Why don’t users on other platforms set it up? It exists, after all, and has done for a while.

Well, it’s a relatively easy question to answer, in fact. It’s simply too difficult for most people to understand and set up. Apple makes it a little easier, but even here many users still don’t use it. Google’s OS is much more widely used in absolute numbers, and it is used in places where security is either seen as less necessary or less well understood. With many times more users, Google has a hard time getting people on board; and because of its open-doors policy in the beginning, getting the updates required to help secure its users is additionally extremely difficult.

    Security: OS updates

Google’s OS is fractured to a staggering degree. Just over 10% of Android users are using the latest version, Pie. And even if you wanted to update, the sheer number of Android phones in circulation that are below the minimum threshold for updating to more recent, and hence more secure, versions is a testament to the disregard that Google, the manufacturers and the telecommunications companies have for you as a customer.

Apple’s adoption statistics are unsurprising because Apple does a much better job of updating users’ OSes and supporting legacy devices compared to Google and its OEM hardware partners. In fact, Apple claims that 83% of devices sold in the last four years are running the latest version of iOS, iOS 12. That’s a staggering achievement, and it doesn’t end there: 80% of ALL devices are running the same version. The remaining 20% are presumably a mix of those not yet updated and those that cannot be; in absolute numbers, that’s still around 180 million phones around the world.

Android smartphones make up around 85% market share compared to Apple’s 15%, so if we take Apple’s figure of around 900 million iPhones in circulation, that implies roughly 5 billion Android phones worldwide, of all types. If only 10% of those are on the latest release (around 500 million), that leaves billions of phones with potentially dangerous security flaws.
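The back-of-the-envelope arithmetic, for anyone who wants to check it (the market-share and installed-base figures are the rough approximations quoted above):

    # Rough installed-base arithmetic using the figures quoted above
    iphones = 900e6                  # Apple's ~900 million iPhones in circulation
    apple_share, android_share = 0.15, 0.85

    total = iphones / apple_share    # ~6.0 billion smartphones in total
    android = total * android_share  # ~5.1 billion Android phones
    on_latest = android * 0.10       # ~510 million on the latest release
    print(f"{android/1e9:.1f}B Android phones, "
          f"{(android - on_latest)/1e9:.1f}B not on Pie")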

    Privacy: Data protection (on device encryption)

Android has supported full-disk encryption since Android Gingerbread (version 2.3), released in 2010; however, the implementation leaves much to be desired.

Firstly, it’s opt-in, apart from on some of the latest, most high-end devices on the market running Lollipop (5.x) onwards. Once again, average users are unlikely to take advantage of this essential security feature. Apple’s iPhone OS 3 introduced full-disk encryption in 2009, yes, the era of the iPhone 3GS! That it was applied automatically, with literally no user interaction, meant users were more secure by default.

    Security: Password Management

iOS includes a rudimentary built-in password management tool called Keychain that stores passwords, recalls them when needed, and syncs them so that passwords saved to it surface on all of your Apple devices: save a password once and it’s available everywhere. More recent versions suggest strong passwords when presented with password creation or change dialogs.

Apple hasn’t stopped there; it’s adding many features to macOS too, its less walled-garden OS. Of note is a new notarisation service for apps that are distributed outside of the App Store. It allows Apple to lock down malicious code before it’s even distributed. In basic terms, an app that uses the service connects to Apple’s servers on startup to check it still has a valid pass. If that pass has expired — which could be for a number of reasons — the app is prevented from starting, thus, in theory, protecting your computer and all it holds.

As we get further into the digital age, it’s privacy that matters

In May of this year, both Facebook and Google held their developer conferences, Facebook’s F8 and Google’s I/O. Both companies made announcements that hinted at a new direction for some of their products, stating that they’re turning to privacy by default.

    Google’s announcement as told in Wired:

    Google placed a big emphasis on user privacy in this keynote. It’ll now be easier for users to access their Google security settings from their smartphone and from there quickly delete their web history and app activity. The firm will also process more user data on the device without uploading it to its own servers.

    Google Maps is also getting its own minor privacy overhaul, with a new incognito mode that won’t remember search results or dropped pins. This brings Maps in line with Chrome – which has had incognito mode for a decade – and YouTube.

    On the security front, Google is making it easier for Android phone owners to verify their identity through two factor authentication. For certain Android smartphones, the phone itself will act as a security key, allowing users to verify their identity with a single press, doing away with the need to receive and input a code.

    Facebook did the same, as recounted by BuzzFeed:

Facebook CEO Mark Zuckerberg kicked off his keynote with a privacy-focused speech. "Privacy gives us the freedom to be ourselves ... so it's no surprise that the fastest way we're communicating online is small groups," Zuckerberg said. "That's why I believe the future is private." Following the comments, the CEO said, jokingly, "I get that a lot of people aren't sure that we're serious about this. I know that we don't have the strongest reputation on privacy right now."

Let’s be clear on one thing: they’re still collecting data in every way they can and matching that data up to potential advertisers (including themselves) in order to make money. Their fundamental business model has not changed; they’ve just found new ways of executing on it. What they have done, however, is try to change the conversation towards privacy, in a thinly veiled attempt to divert attention from, or subvert, impending lawsuits against them from the EU and the US DOJ.

Listening to episode 244 of the Critical Path podcast by Horace Dediu, I had a similar realisation to the one Horace put forward: personal data should be treated like a controlled substance.

In Digital Transformation you’re going to be handling data all the time, and as a result you’re going to need to treat it like that controlled substance Horace describes. Chemicals and arms can be legitimately owned and used, but they are controlled (in most civilised countries anyway) to the point that the harm done with them is limited. The GDPR, in this its first guise, tries to develop a framework that controls personal data in much the same way: you need to explicitly ask for permission before collecting it, you need to clearly state the purpose for which you’re asking, and lastly you need to clearly and explicitly detail how you are going to use it.

Hardware platforms and their apparent stances on openness versus closedness, security and privacy are starting to matter less and less; it is in the software layer that the opportunity and the risk lie. Software is eating the world. The coming European Copyright Directive, recently approved but not yet in law, will expose this to an even greater degree.


     Reading List

    05onfire1_xp-articleLarge-v2.jpg

    GDPR After One Year: Costs and Unintended Consequences - Truth on the Market

    Here’s a different angle on the usefulness of GDPR, worth the read.

    GDPR can be thought of as a privacy “bill of rights.” Many of these new rights have come with unintended consequences. If your account gets hacked, the hacker can use the right of access to get all of your data. The right to be forgotten is in conflict with the public’s right to know a bad actor’s history (and many of them are using the right to memory hole their misdeeds). The right to data portability creates another attack vector for hackers to exploit. And the right to opt-out of data collection creates a free-rider problem where users who opt-in subsidize the privacy of those who opt-out.

    cq5dam.web.1440.660.jpeg

    Source: Deloitte

    Bringing digital to the boardroom

    DIGITAL transformation is not just about adopting new technologies. Its significance, especially in the business world, extends to how technology can be used to create—and sustain—a competitive advantage.

    As such, digital transformation, along with the potential for disruption, is high on the agenda for executives at many financial institutions, as well as their boards of directors.

    Not just financial institutions I’d say, and I’d go even further and suggest that most Caribbean businesses would do well to understand this and heed the advice given in this article.


    The Future is Digital Newsletter is intended for a single recipient, but I encourage you to forward it to people you feel may be interested in the subject matter.

    Thanks for being a supporter, I wish you an excellent day.

    → 10:00 AM, May 31
  • Issue 16 : Part 5 - Practical steps towards your Digital Transformation journey

    Data collection and turning that data into an asset

Good day, all. This issue is a continuation of Issue 15; I encourage you to read that one first, as it’ll help your understanding of this one, which continues the theme of data and its importance in Digital Transformation.

I recently recorded another podcast with Kadia Francis, the Digital Jamaican. You should check out her blog and podcast series, really good stuff. I’ll let you know when it’s out, but we talked about some of the topics from the last few issues of this newsletter. I had a great time recording and it’s something I’d like to do more of in the future. I’m open to proposals. Stay tuned.

    On to the issue.


    Source : dailyworth.com

Turning data into a benefit

In the last issue, I noted that data is the new oil of the digital economy, and showed a few simple strategies for finding and analysing data, where data can emanate from, and finally a couple of simple tools for linking data sets between business applications to break down the silos of data in your applications. What I didn’t get into is the finer detail of those data and what is important when you are collecting and analysing them.

Big Data: you’ve all heard the term, and there are many definitions of what exactly Big Data is. I don’t actually know if there is a “definitive” definition or not, but it certainly doesn’t just mean large quantities; this is a myth, and one that persists. Sure, Big Data can be huge in size, and there are data sets used in everyday activities, like weather pattern prediction, that are absolutely gigantic, but its more pertinent definition hinges on the fact that it is unstructured.

    Remember the definition of unstructured data from the last issue:

    Unstructured data represents any data that does not have a recognizable structure. It is unorganized and raw and can be non-textual or textual. Unstructured data also may be identified as loosely structured data, wherein the data sources include a structure, but not all data in a data set follow the same structure.

Additionally, Big Data usage and cost have become perfectly aligned with cloud infrastructure, particularly in the database space. Previously, companies would have had to invest massively in database and data processing infrastructure, mostly stored locally with all the overheads that requires: cooling, electricity, backup infrastructure, fire safety and large personnel costs. These were often huge investments, available only to the (rich) few. In today’s paradigm, cloud database infrastructure is not only cheap but offers any organisation the possibility of using all but the most powerful compute systems in existence — the most powerful being reserved mostly for education, research and military purposes. And remember, cloud infrastructure has an elasticity that on-premise infrastructure doesn’t: need 10 processors now but 100 at peak volume? No problem in the cloud world.

Not only that, but analytics add-ons for those cloud databases are available directly from the database supplier, and increasingly from other suppliers whose USPs go above and beyond those of the database vendor. We’ll take a look at some of these in future issues. The thing to remember nowadays is that big data doesn’t necessarily have to come with big costs. In some cases you don’t even have to pay a penny to store it: I highlighted this in the last issue without explicitly mentioning it in this context, but places like data.gov (US), Facebook, Twitter and many others do all the back-office work for you.

Other data that is unstructured and being generated at a dizzying rate is the data that records or tags things with a longitude and latitude: location data. All modern smartphones record location data, and many cameras do similarly when taking photos. Clearly the use cases for location data are endless, and it is sometimes used in surprising ways, and not necessarily nefarious ones. If we recorded our own location on an ongoing basis, it would probably provide insights into our behaviours that could help us with issues as diverse as health — imagine discovering that you have traversed a particularly polluted zone daily for years and correlating that with your asthma incidents — to things as mundane as optimising journeys for the best fuel usage.

I briefly mentioned in the last issue that the types of sensors in use since the early 2000s are providing “…a near constant avalanche of data…”, in what has now become known as the Internet of Things, or IoT. IoT is now being deployed right throughout the entire value chain in highly digital businesses, and this influx of data is providing insights that were previously inconceivable. I remember, at Microsoft’s large Inspire conference a couple of years ago, a discussion and demonstration of the power of data collection for ThyssenKrupp:

    ThyssenKrupp Elevator wanted to gain a competitive edge by focusing on what matters most to their customers: reliability. Drawing on the potential of the Internet of Things (IoT) and Microsoft technologies by connecting their elevators to the cloud, gathering data from their sensors and systems, and transforming that data into valuable business intelligence, ThyssenKrupp is vastly improving operations—and offering something their competitors do not: predictive and even preemptive maintenance.

Read that statement again: …something their competitors do not (currently) have. Predictive and preemptive maintenance. In other words, data allows ThyssenKrupp to schedule preventive maintenance and even predict lift failures before they happen. If you can get past the obvious advert for Microsoft, this quick video gives you a great idea of what I’m describing about data as an asset. Again, the possibilities for data collection and analysis are endless. Microsoft has gone even further, coining the terms Intelligent Cloud and Intelligent Edge, defined as:

    The intelligent cloud is ubiquitous computing, enabled by the public cloud and artificial intelligence (AI) technology, for every type of intelligent application and system you can envision.

    The intelligent edge is a continually expanding set of connected systems and devices that gather and analyze data—close to your users, the data, or both. Users get real-time insights and experiences, delivered by highly responsive and contextually aware apps.

    Screenshot 2019-05-22 at 20.44.41.png

    Source: microsoft.com

But all this data collection and storage isn’t, and couldn’t be, anything useful if the data isn’t shaped and presented to provide insights, or what has become known over the last 20 years as Business Intelligence (BI). I alluded to this earlier in the issue, but advanced BI tools and a few simple Data Science skills have become a hot topic for most businesses on their Digital Transformation journey. Take a look, for example, at how many Data Scientist roles there are and how well paid they currently are. There is a real scarcity of good data analysts in business roles.

Rather fortuitously for me, this last Wednesday the NY Times published a long but fascinating article called How Data (and Some Breathtaking Soccer) Brought Liverpool to the Cusp of Glory, about the use and subsequent valorisation of data to help Liverpool FC out of the blues of the last 10 years or so, transforming them into one of the top teams to beat in Europe, with some even saying that this might be the start of a new era for Liverpool FC, like its previous run of form from 1975 to 1990.

    For four years, from 2008 to 2012, Graham advised Tottenham. The club was run by a series of managers who had little interest in his suggestions, which would have been true of nearly all the soccer managers at that time. Then Fenway bought Liverpool and began implementing its culture. That included hiring Graham to build a version of its baseball team’s research department. The reaction, almost uniformly, was scorn. “ ‘Laptop guys,’ ‘Don’t know the game’ — you’d hear that until just a few months ago,” says Barry Hunter, who runs Liverpool’s scouting department. “The ‘Moneyball’ thing was thrown at us a lot.”

    Graham hardly noticed. He was immersed in his search for inefficiencies — finding players, some hidden in plain sight, who were undervalued. One afternoon last winter, he pulled up some charts on his laptop and projected them on a screen. The charts contained statistics such as total goals, goals scored per minute and chances created, along with expected goals. I was surprised to see Graham working with such statistics, which he had described to me as simplistic. But he was making a point. “Sometimes you don’t have to look much further than that,” he said.

    And that’s the point. Some simple statistics may help you better understand a taxing issue and hence aid in your resolution of the problem. Thoroughly recommended.


    The risks of data collection

I think it would be remiss of me if I didn’t at least talk briefly about the dangers of data collection. I’m sure you don’t need me to tell you that holding a lot of data, and more specifically personally identifiable data, is a risk that you need to assess in your business. In fact, this is the real reason for the General Data Protection Regulation’s (GDPR) being.

If we look at the intention of the GDPR, it’s that data should be treated like a controlled substance, a controlled substance being something like weapons, drugs and so forth. Clearly the general public shouldn’t be purchasing, storing, using and selling on these types of substances and products. The GDPR makes us look at personally identifiable data in the same way: we shouldn’t need to collect, store, process and/or sell on these data. Unless…

The similarity continues in that controlled substances, arms and drugs can actually be bought and sold legally, but under strict controls (in most civilised countries at least). GDPR, like the controlled substance trades, ensures that we justify why we need the data, state in explicit terms the purpose for which it will be used, and promise that it won’t be used for other purposes. Additionally, GDPR prevents the resale of that data, again, unless there is specific consent from the affected users.

When you develop your data collection strategies, you need to think quite carefully about the data you need and whether it is classified as personally identifiable data, and plan accordingly: from consent forms to operational structures that ensure it isn’t used outside the initial scope, that it is secured appropriately, and that you can provide proof of its ultimate destruction. Think in terms of the entire lifecycle of the data: creation, collection, processing, transformation, encryption, depersonalisation, restitution, backup and destruction.

Remember, according to GDPR you are responsible for proving all this. I’d like to go further into this subject in the future, especially since we’ve had numerous examples of data care failures from the likes of Facebook and Cambridge Analytica, and I’m currently researching the subject. Soon.


    Reading List

    Source: StockSnap (Pixabay)/ict-pulse.com

    ICTP 056: Building Caribbean-relevant software applications, with the team from Rovika

    A talk with Dennison Daley and Manish Valechha from Rovika, a software house based in Montserrat. They have developed apps for Montserrat and BVI governments and have potential for wider-ranging projects throughout the Caribbean.

    Source: E-Estonia

    20th William G. Demas Memorial Lecture to focus on digitisation

If you weren’t aware, Estonia has been on a national Digital Transformation journey for a number of years now. Estonia even offers full digital citizenship, or e-Residency, and was the first country in the world to do so. Starting in 1997, Estonia implemented its e-Governance strategy and has continually innovated and developed new digital services for its residents, both physical and e-Citizens. It has fully embraced technologies like blockchain — in 2008 in fact! — and has ambitious plans for future services like intelligent transportation.

    In this, the 20th William G. Demas Memorial Lecture, Calum Cameron of Estonia is to speak about “Transforming to a Digital Society: Principles and Challenges”. Unable to be there in person, I’m hoping I can get a feed or at least a video of the speech. If you have any information to help me, let me know please. Thank you.

    Source: epthinktank.eu

    Europe, a unique partner for Caribbean

    I alluded to some of this in the podcast with Kadia, but this essay gives a good overview of how Europe can help the Caribbean and why it’s important for the Caribbean to work together facing increasing competition from around the world. Definitely worth reading even if it’s a little on the propaganda side from a High Representative of the Union for Foreign and Security Policy.


    The Future is Digital Newsletter is intended for a single recipient, but I encourage you to forward it to people you feel may be interested in the subject matter.

    Thanks for being a supporter, have a great day.

    → 10:00 AM, May 24
  • Issue 15 : Part 4 - Practical steps towards your Digital Transformation journey

    Data, data, data

    Good day everyone. As promised, back to a practical issue, this time less generalist and more centred on purely digital initiatives. Hope you enjoy it, let me know, don’t be shy.

    On to this issue.


    Data is the new oil

    If nothing else, Digital Transformation is inseparable from data, not just any data, but good data and data that can be exploited in the short term as well as the long term. 

    Data is the new oil in the digital economy

Although there is some dispute about the reality of the phrase, with some reasoning that data is not the new oil, it holds true for many businesses looking towards digital transformation. Businesses produce data all the time, but it is mostly lost: stored but not accessed, or downright under-exploited. We are data-rich but analysis-poor, and it’s to our detriment. At the other extreme, data is not God, so some restraint and sensible treatment is necessary. But I’m getting ahead of myself, so let us look at data and how we can, firstly, identify data to capture and then develop data capture strategies.

    Where is the data coming from?

Before digital systems were the norm, data capture was organised and designed in a systematic manner, with businesses thinking specifically about what it was they wanted to retain. The tools necessary to measure data were onerous investments and notoriously unreliable.

The first Building Management Systems (BMS), developed by the likes of Sauter and Johnson Controls amongst others, were simple systems based on Programmable Logic Controllers, or PLCs. A far-off thermostat would send (often infrequently) a signal to the central unit, which would apply some simple logic and then actuate the command on another basic module to apply the remedial action. A high temperature in the main meeting room due to human activity would trip the thermostat to signal the central unit, which would in turn consult the PLC code, start the fans and send cool air to the room (I’ve highly simplified it here, but you get the picture):

IF RoomTemp > HighSetpoint THEN
    ColdWaterValveOpen := TRUE; CoolingFansOn := TRUE; END_IF;

In more operational roles, data would be extracted from systems such as payroll, and in some cases specific market surveys were commissioned to generate data, but nothing like the amount we produce in the modern world. In fact, data is being produced not only in vastly larger quantities, but automatically, whether we want it or not.

With the applications, devices and sensors that have been deployed since the early 2000s, there is a near-constant avalanche of data being produced, on anything from the length of time you slept to the detailed nuances of the movement of people in a particular corridor of a shopping mall. Social media additionally generates tons and tons of information about us and our surroundings. The whole supply chain, from development to the final death of a product, is sourcing data all the time.

The indications are that this rate of increase is not slowing but accelerating. What was benign data may be turned into extremely valuable data in a short period of time. The Facebooks and Googles of the world know this and are making it easier to capture and analyse data internally, while making many of their tools available to the general public.

    The difference between then and now

The data projects of the past were designed and implemented to strict rules and guidelines, and the data produced was subsequently structured and used to fulfil basic objectives. To give you an idea of exactly what structured data is, let’s first look at its definition according to techopedia.com:

    Data conform to fixed fields. That means those utilizing data can anticipate having fixed-length pieces of information and consistent models in order to process that information.

We can see that structured data has specific attributes that let us, or our computers, know beforehand how that data will appear, allowing us to apply processing easily. Examples of structured data are elements such as the information in your passport: name, age, place of birth, expiry date and other basic, simple, fixed data types.

    Modern data, on the other hand, is unstructured and is defined by techopedia.com as:

    Unstructured data represents any data that does not have a recognizable structure. It is unorganized and raw and can be non-textual or textual. Unstructured data also may be identified as loosely structured data, wherein the data sources include a structure, but not all data in a data set follow the same structure.

Examples of unstructured data include the data generated by corporate Customer Relationship Management (CRM) systems or the previously mentioned social media applications like LinkedIn, Twitter and Facebook. Although the data generated is classified (name, amount spent, next call-back date, etc.), it is loosely tagged rather than forced into specific lines and columns in a spreadsheet. In fact, many modern systems deliberately leave data unstructured to push the boundaries of what can be extrapolated from these datasets, sometimes producing surprising results.
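To make the contrast concrete, here is a small sketch in Python. The passport record mirrors the structured example above; the CRM-style records are invented to show how loosely tagged data can vary within a single dataset.

    from dataclasses import dataclass

    # Structured: fixed fields with known types, like the passport example above
    @dataclass
    class PassportRecord:
        name: str
        age: int
        place_of_birth: str
        expiry_date: str

    # Loosely structured: records in the same dataset carry different tags
    crm_records = [
        {"name": "A. Client", "amount_spent": 1200, "next_callback": "2019-06-11"},
        {"name": "B. Prospect", "source": "LinkedIn"},  # different keys entirely
    ]

    print(PassportRecord("Jane Doe", 42, "Bridgetown", "2027-01-31"))
    print(crm_records)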


    Data capture strategies

    Let’s first look at some of the places you can find data relevant to your business. There are many sources and clearly, I can’t list them all and quite frankly I don’t know of all of them either. But there are some basic examples to help you get started on your own data research.

    Google Trends

Google Trends is probably one of the most well-known. It’s a free service open to all comers, and you don’t even need a Google account to benefit from its information. Simple to use, the interface is about as Google as you can get: type a word or two and Google Trends will tell you how those terms have been trending in Internet search over the last several years. Google Trends ranks its data on a scale from 0 to 100, with 100 being peak interest in the subject and 0 meaning either no interest or no data available. In the following example, I searched Digital Transformation and set the time span to “2004 - present”. This was the result:

    Screenshot 2019-05-15 at 14.30.52.png

    Source : Google Trends

Digital Transformation is at peak interest currently, which is logical, but it didn’t start becoming of interest until around 2014. That in and of itself is an interesting data point: Digital Transformation is a fairly new phenomenon. Additionally, you can pit search term against search term to gain insights into their comparative interest, useful for judging interest in one type of product versus another. Since this is not a lesson in how to use Google Trends, I won’t go into too much detail, but you should know that the data can be displayed by region, and related queries are suggested automatically, again giving you an idea of the things people are searching for.
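If you’d rather pull the same data programmatically, the unofficial pytrends Python library wraps Google Trends. A minimal sketch, assuming pytrends is installed (pip install pytrends); Google offers no official Trends API, so treat this as illustrative:

    from pytrends.request import TrendReq

    pytrends = TrendReq(hl="en-US")
    # The same query as above: interest from 2004 to the present
    pytrends.build_payload(["digital transformation"], timeframe="all")
    trends = pytrends.interest_over_time()
    print(trends.tail())  # values on the 0-100 scale described above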

    LinkedIn

Being the professional network, with over 500 million users, LinkedIn holds an enormous amount of data on companies and employees. Whilst there is no direct, simple-to-use interface aside from LinkedIn itself, tools are available to extract data directly from LinkedIn for manipulation in other systems. One such example is the LinkedIn Connection Analytics Dashboard from Vishal Pawar, Chief Architect at Aptude. It uses a PowerBI template to analyse the data exported from LinkedIn. There’s plenty more information at the link.

    DraggedImage.png

    Source : https://www.linkedin.com/pulse/brand-new-perform-free-linkedin-connection-analytics-vishal-pawar/

Don’t forget good old data exports from legacy applications. These data can be useful when integrated into an analytics application like PowerBI or Google Analytics; even some basic statistical analysis in a tool like Microsoft Excel can provide useful information.


    Cloud Applications or Software as a Service (SaaS)

The value of cloud applications goes beyond the initial promises of yesterday. Most SaaS applications, like Office 365 and Google Apps, were sold on the promise that they entailed no upfront costs and offloaded day-to-day operations management, freeing the client to concentrate on the real value-added aspects of the business. Whilst this is somewhat true, it is incredibly short-sighted. The real value of cloud-based apps is the data they generate, which can be collected, integrated, joined and exploited by all sorts of systems, providing value that is greater than the sum of its parts.

Take, for example, a small business with fairly standard needs for operations software: accounting, CRM and project planning. In the past the business would purchase a dedicated accounting application like Sage, a dedicated CRM such as Salesforce, and probably Microsoft Project. These applications created silos that largely prevented the use of information between the systems. In fact, a very well remunerated job in the past was that of the data integrator, who had the skills to “join” systems in a very basic manner to try to extract value from multiple systems rather than individual ones.

Today, my advice for a small business would be to go completely SaaS, but to choose systems that offer data integration APIs or connections. Modern SaaS applications often allow linkage to platforms such as Office 365 and CRM software like Hubspot, through mailing list management software (did anyone say MailChimp?), right through to accounting and billing. A client created in the CRM as a prospect should automatically appear in the project management system and be created as a client in the accounting software, regardless of whether it has purchased anything or not. The value generated by understanding, across the different systems, the when, why, how and how much has enormous potential. A sketch of this kind of linkage follows below.
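Here is a minimal sketch of that linkage in Python using Flask. Everything specific in it is hypothetical: the CRM webhook payload, the accounting system’s URL and the API key are invented placeholders, not any real vendor’s API.

    import requests
    from flask import Flask, request

    app = Flask(__name__)

    ACCOUNTING_API = "https://accounting.example.com/api/clients"  # placeholder URL
    API_KEY = "replace-me"                                         # placeholder credential

    @app.route("/crm-webhook", methods=["POST"])
    def new_prospect():
        # Called by the CRM whenever a prospect is created
        prospect = request.get_json()
        # Propagate the prospect to the accounting system as a client
        requests.post(
            ACCOUNTING_API,
            json={"name": prospect["name"], "email": prospect["email"],
                  "status": "prospect"},
            headers={"Authorization": f"Bearer {API_KEY}"},
            timeout=10,
        )
        return {"ok": True}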

Linking these data sources usually happens through one of two paths: either directly through the application’s interface, where data connections are exposed directly (from one application you log in to the other and grant access to the data), or via a second path that is a little more long-winded but often achieves the same thing and can provide even more capabilities. There are three well-known systems that fit this second description. Let’s have a quick look at each one.

    Microsoft Flow

For a business with a significant investment in Microsoft, and particularly Office 365 and its components, the obvious choice is Microsoft’s own workflow management tool, Flow. It allows you to turn multi-step repetitive processes into true workflows rich in data. Not limited to Microsoft applications only, Flow allows links to other SaaS platforms like MailChimp, Facebook, Google and Slack. Microsoft Flow is free to use, with options for Premium connections and workflows.

    IFTTT

If This Then That is considered the granddaddy of the initial wave of workflow applications and is still one of the most popular. Its real value is in the consumer SaaS space, where it allows, like Flow, the linking together of hundreds of different platforms: from controlling lights to sending emails when there’s a tweet with a particular keyword. IFTTT is free to use.

    Zapier

    Zapier is probably the most sophisticated and consequently the most complicated to use of the three, but don’t let that discourage you. It’s the most accomplished and reliable workflow application I’ve used. I personally use it to link data between my CRM, Time tracking system, accounting software and various social networks.

As you can see, data is all around us and reasonably easily exploitable with a little help from tools like those I’ve highlighted here. If you want some help with your own needs, get in touch; I’d be happy to help.

    In the next part I’ll look at more data collection and introduce notions of simple data usage with powerful analytical tools. I look forward to publishing the next issue.


    Recommended Reading

    MeasureWhatMatters.jpg

Measure What Matters, whilst not strictly about data collection and analysis, is a good book to get you thinking about the right types of data collection strategy with achievable outcomes (Key Results). It has a foreword from Larry Page, one of the cofounders of Google, who starts off by saying:

    I wish I had this book nineteen years ago, when we founded Google.


    Reading List

    Will Artificial Intelligence Enhance or Hack Humanity? - Wired

    Screenshot 2019-05-15 at 16.22.46.png

A really interesting interview with Yuval Noah Harari and Fei-Fei Li, by Nicholas Thompson of Wired.


    The Future is Digital Newsletter is intended for a single recipient, but I encourage you to forward it to people you feel may be interested. The more the merrier.

    Thanks for being a supporter, have a great day.

    → 10:00 AM, May 17
  • Issue 14 : Follow-up, Management in the Digital Age

    And Digicel's missed opportunity for Digital Transformation

    Good day everyone. Today’s Issue is a follow-up from Issue 12, Management in the Digital Age. Let’s get straight to it.

    Don’t forget to listen to the podcast I was a guest on ;) 

Kadia Francis (@digitaljamaican) asked me a question on Twitter that I thought was very interesting:

    @TFIDNewsletter Absolutely loved this. In terms of obsoletion isn't the same true for companies reluctant to incorporate technology into their business operations? We are in complete agreement that the future is digital, I say it all the time so stubbornness in that regard would be fatal.


    Operations and their role in the digital future 

The question was in response to Issue 12, where I talked about management obesity, in the sense that not only has the definition of management changed, but the number of managers necessary to operate a business these days has been constantly on the rise. This paradigm shift has produced a fundamental change in the structure of businesses:

    This change in manufacturing layer has brought about a change in company structure. Low-skilled low-pay jobs are being replaced by highly skilled jobs paying higher salaries. But it’s not just salaries, responsibilities have also increased. As a line manager for a spindle maker, the responsibility was limited to the line and the workers occupying it. In the new plants, managers are responsible for much more and decisions have more implications for both productivity and efficiency, with knock-on effects that may cause the loss of hundreds of thousands of dollars. And, being that there are no low-skilled workers anymore, everyone has essentially become a manager. This is what I termed Management Obesity, a factory with only managers and no workers.

    We’ve gone from a pyramid structure of management and workers to one with a portlier girth.

    IMG_C01202E2383D-1.jpeg

    The old management structure versus today’s

It’s not limited to manufacturing either. In much of the service industry, offices have become more and more staffed by managers of things and processes, and not necessarily of people. Many of today’s roles in modern offices didn’t even exist five years ago: Marketing, Social Media Manager, Community Manager, Technical Officers, Environmental Departments; I could go on.

That issue was largely targeted at the production side of a business but, as Kadia rightly points out, operations are also ripe for digital transformation. Let’s look at the various parts that make up an organisation, and I’ll attempt to show how digital transformation is or will be affecting them.

    The best approximation of the various elements of a business can be found in the value chain. I introduced and talked about it in Issue 11:

    Investopedia defines the Value Chain as:

    A value chain is a business model that describes the full range of activities needed to create a product or service. For companies that produce goods, a value chain comprises the steps that involve bringing a product from conception to distribution, and everything in between—such as procuring raw materials, manufacturing functions, and marketing activities.

    A company conducts a value-chain analysis by evaluating the detailed procedures involved in each step of its business. The purpose of value-chain analyses is to increase production efficiency so that a company may deliver maximum value for the least possible cost.

Essentially, Michael Porter’s Value Chain model allows you to break down your business into sections and attribute to each its relative value and contribution to the end result: the value-add of your products and services. As an example, someone who sells fruits and vegetables may have a fairly simple value chain in which elements like logistics and marketing are the principal value creators of the business. A product like Apple’s iPhone, I think we can all see, is a very different proposition, with an extremely complex value chain.

    The value chain is represented as a graphic where all the distinct parts contribute to the right-hand side, which is the profit. This graphic shows the most common representation of the value chain:

    16f540ae-19c6-41a7-a19e-44f706f7006c_1000x528.png

    Denis Fadeev CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0)

There are two distinct sections, Support Activities and Primary Activities, both in play in any business, generating value that helps generate revenue, and hence margin.

Looking at the support activities like firm infrastructure, which includes HR, Accounting, Finance, Legal, PR, etc., it is easy to see that in areas such as HR, digital transformation has not only started but is gaining real traction. An example is the use of the LinkedIn platform. Integrating it into your business, not only as a corporate rolodex but as an essential recruitment tool, provides benefits over and above those of traditional recruiters. LinkedIn offers targeted advertising, peer recommendations, full (up-to-date) CVs — as long as you’re willing to accept that they are not formatted in the old-school way — and, most importantly, the networks in which potential staff move. All this is done in minutes, compared to the hours or days (if it was possible at all) of the past.

    With Accounting (and, by the way, I’m very fearful for the long-term future of basic accountants), today’s software ranges from free for basic needs to a percentage cut of invoices for advanced services, allowing even the most reticent and least capable to post to the correct account and consolidate accounts in a matter of minutes. And it is here we find the opportunity for accountants in the future, but I’ll leave that hanging for a while…

    Most cloud accounting software lets you upload receipts and, with AI image processing applied, can automatically post them to the correct account: expenses, parking, fuel, purchases and so forth. Gone are the manual entry and the (almost certainly) badly organised piles of paper.
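    To illustrate the kind of decision being automated, here is a minimal sketch in Python. This is not any vendor’s actual API; the accounts, keywords and helper name are invented, and real products use trained image and text models rather than a keyword table.

    ```python
    # A toy sketch of automated posting: map text extracted (e.g. by OCR)
    # from a receipt to an expense account. Accounts and keywords are invented.
    ACCOUNT_RULES = {
        "parking": "Travel - Parking",
        "fuel": "Travel - Fuel",
        "petrol": "Travel - Fuel",
        "restaurant": "Meals & Entertainment",
    }

    def suggest_account(receipt_text: str) -> str:
        """Return the first account whose keyword appears in the receipt text."""
        text = receipt_text.lower()
        for keyword, account in ACCOUNT_RULES.items():
            if keyword in text:
                return account
        return "Suspense"  # nothing matched: park it for a human to review

    print(suggest_account("AIRPORT PARKING  EUR 12.00"))  # -> Travel - Parking
    ```

    The value to the business is not the lookup itself but the disappearance of the manual step; the accountant’s time shifts to reviewing the few receipts that land in suspense.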

    For Legal, it will be the same. Basic legal questions will be answered by machines capable of determining the right advice for simple situations. This frees up the business, which would ordinarily have to wait for Legal to get back to it. Again, the opportunity for lawyers to add value is enormous.

    In the primary activities of a business (procurement, manufacturing, logistics, marketing and sales, and service), we see much transformation already taking place.

    Procurement is undergoing fundamental changes to its processes, with the automation of ordering, for example.

    I talked about manufacturing and operations in Issue 12: Management in the Digital Age; take a minute to read it if you haven’t already. Marketing and Sales have been revolutionised by digital technologies and continue to advance almost faster than any other part of an organisation. In fact, when I talk about digital transformation with most businesses and people, they think in terms of marketing strategies using Facebook and Instagram, but this is only one side of a multi-sided problem.

    So, you see, the same shift and reduction of the obese middle is taking place not only on the production side of a business but at the very core of its operations throughout the enterprise. It’s a point I wanted to get across but didn’t articulate very well at the time.


    Digicel, cost-cutting and a missed opportunity for Digital Transformation

    Digicel.png
    Fastforwardai.png

    The OECS Business Focus site reports that Digicel has announced a partnership with fastforward.ai to “… accelerate its vision to become the digital lifestyle partner for its customers …”.

    Right off the bat, I’ll note my scepticism about how this will play out for Digicel. To me, it looks more like a cost-cutting exercise dressed up as a new and revolutionary offer for their clients. To illustrate why I think this, and what I think will play out, let’s look at the current position of Digicel and the offer that is fastforward.ai.

    Digicel has been struggling financially for a number of years. Revenues are not growing, and costs are increasing. Looking at the since-withdrawn F-1 filed in June 2015, prior to its attempted listing on the NYSE, we can see a company having difficulties with debt (estimated at around USD 6 billion), risk and cashflow.

    In 2015, 60% of its revenue came from voice communications; that has since fallen off a cliff, as far as anyone can tell, and it is unclear whether data services will replace the lost revenue stream. In fact, Digicel’s revenue in 2013, 2014 and 2015 was stable at around USD 2.7 billion, but it made a net loss in 2013 and 2015, the only bright spot being 2014, when it made a net profit, some of which was attributed to finance income. I’m not a financial analyst, so I don’t want to go into too much detail, but suffice it to say, things are not easy for Digicel.

    Digicel has a number of strengths that are non-negligible, and it is on these that it should build its innovations going forward. It currently boasts 14 million subscribers in the regions where it operates (Latin America and the South Pacific). It has built strong direct-to-customer relationships, and its products supply a wealth of data about usage and habits.

    It has, like most businesses, a few weaknesses too. The prepaid/post-paid ratio is lower than in European or American markets (to be fair, a symptom of the zones in which it operates rather than a failing on its part); it faces rising costs (5G, anyone?) and tighter margins due to a nose-dive in lucrative voice services. And, as it admitted in the F-1, it faces “significant competition” in its markets.

    Fastforward.ai turns out not to be what I’d hoped. When I first saw the announcement, I thought that Digicel was expanding into digital payments, something that is badly lacking in the Caribbean and, as a result, a massive opportunity for an established player with experience in transaction businesses and a direct-to-customer relationship. Much online commerce relies on digital payments, and development is only just starting in the Caribbean. Fastforward.ai is, instead, a platform for marketing on social media platforms like Facebook and Twitter, used to automate customer support and promote offers directly to users. Hence why I feel this is more cost-cutting than innovation.

    Perhaps I’ve misunderstood the opportunity for Digicel in this partnership, but I can’t quite see how this initiative will replace the falling revenues from its voice-services cash cow. Noted for review in the future.


    Reading List

    Digital Transformation Strategy to Launch Shortly in St. Kitts & Nevis - OECS Business Focus

    Source : OECS

    I’m looking forward to learning the details of this strategy in the coming months. One to watch.

    In South-East Asia, Grab and Gojek bring banking to the masses - The Economist

    Source : The Economist

    This is a follow-up too. We talked about ride-hailing in the Caribbean, and one of the difficulties highlighted was the lack of bank accounts held on the islands and the virtually non-existent use of digital payments. This article highlights the opportunities identified in South-East Asia, and I think it is a model worth adapting to our context. What do you think?


    The Future is Digital Newsletter is intended for a single recipient, but I encourage you to forward it to people you feel may be interested in the subject matter.

    Thanks for being a supporter. Have a great day.

    → 10:00 AM, May 10