    An effective AI strategy demands a sound data strategy

    Promoted | AI and ML depend on up-to-date, use-case-appropriate data to function at all, let alone achieve high-level business goals, writes FNB's Mark Nasila.
    By Mark Nasila, 17 May 2023
    The author, FNB’s Mark Nasila

    “Data is the lifeblood of any AI system. Without it, nothing happens.” — David Benigson, Signal

    A data strategy outlines how an organisation will manage and leverage its data assets to achieve its business objectives. It involves defining the data architecture, governance, management and analytics practices used to ensure that data is accurate, accessible and secure.

    A good data strategy should align with the overall business strategy and provide a framework for making decisions about data acquisition, storage, processing, analysis and usage. It should also address issues related to data quality, privacy and regulatory compliance. Ultimately, a data strategy aims to enable an organisation to derive insights and value from its data to support better decision-making and improve business outcomes.

    Without a solid data strategy, the chance of realising business objectives with artificial intelligence (AI) and machine learning (ML) is greatly reduced while at the same time the risks are magnified. Ultimately, AI and ML depend on up-to-date, use-case-appropriate data to function at all, let alone achieve high-level business goals.

    To work effectively, ML requires large quantities of quality data. To obtain this data, a process for identifying, procuring and accessing it must be established. This requires governance guidelines and a data ecosystem that supports both exploratory and production environments. But, as always, access and flexibility must be balanced with security, privacy and quality control.

    “I can’t stress this enough: data or the lack of the right data strategy is the number one bottleneck to scaling or doing anything with AI,” said Nitish Mittal, a partner in the digital transformation practice at Everest Group. “When clients come to us with what they think is an AI problem, it is almost always a data problem. AI depends on viable data to prosper. That’s why it’s important to think about the data first.”

    Data-centric AI

    When creating a data strategy for AI, it’s essential to focus on the relevant data to fuel the appropriate use cases. It’s important to engineer the data to the use case, not merely to collate and centralise it.

    Andrew Ng is a pioneer in the field of deep learning and the founder and CEO of Landing AI, a company building no-code AI solutions. In an interview with Fortune in June 2022, Ng explained how he has become a vocal advocate for what he calls “data-centric AI”.

    Ng says the availability of state-of-the-art AI algorithms is increasing thanks to open-source repositories and the publishing of cutting-edge AI research. This means businesses can access the same software code as large organisations like Nasa or Google. However, the key to success with AI is not the algorithms themselves, but rather the data used to train them. This involves gathering and processing data in a governed manner.

    Data-centric AI is what Ng calls “smartsizing” data: using the least amount of data to build successful AI systems. He believes this shift is essential if businesses are going to take advantage of AI, especially those that may not be able to afford data scientists of their own or whole teams to focus on their data strategies.

    Ng says companies may need less data than they think if it is prepared the right way. With the right data, even a few dozen or a few hundred examples can be sufficient for an AI system to work not just effectively, but comparably to those built by consumer internet giants that have billions of examples at their fingertips.

    Preparing the data, according to Ng, means ensuring it’s “Y consistent”. That is, there should be a clear boundary for classification labels. For instance, in the case of an AI system designed to find defects in pills, labelling any scratch shorter than a certain length as “not defective” and any scratch longer than that as “defective” can help the system perform better with less training data, compared to inconsistent labelling that may introduce ambiguity or false positives or negatives.
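
The labelling rule described above can be sketched in Python. This is only an illustration: the 0.3mm threshold and the names `SCRATCH_THRESHOLD_MM` and `label_pill` are assumptions, since the article gives no specific length, and treating a scratch exactly at the threshold as "not defective" is an arbitrary choice that simply needs to be made once and applied everywhere.

```python
# Hypothetical cut-off in millimetres; the source does not specify a length.
SCRATCH_THRESHOLD_MM = 0.3


def label_pill(scratch_length_mm: float) -> str:
    """Apply one fixed rule so every labeller produces the same label."""
    if scratch_length_mm < 0:
        raise ValueError("scratch length cannot be negative")
    # Scratches longer than the threshold are defective; the boundary
    # itself is (by assumption) labelled "not defective".
    if scratch_length_mm > SCRATCH_THRESHOLD_MM:
        return "defective"
    return "not defective"


measurements = [0.0, 0.1, 0.3, 0.31, 1.2]
labels = [label_pill(m) for m in measurements]
print(labels)
```

Because the boundary lives in one constant, two people labelling the same pill cannot disagree about a borderline scratch, which is the kind of consistency Ng describes.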

    An effective data strategy should comprise the following components: acquisition and processing, quality, context, storage, provisioning, and management and security. The strategy should involve obtaining and processing the necessary data for developing prototypes and algorithms. The data set should be of good quality, with minimal bias and high-accuracy labelling of training data, to address business challenges.

    Understanding the source and flow of data is also essential to sharing it effectively within the organisation. Data should be stored appropriately, in a structure that supports the objectives concerning access, speed, resilience and compliance. Optimising the accessibility of data to the teams that need it and implementing safeguards are important too. Finally, data management and security should be in place to ensure appropriate use of datasets, including data security, access and permissioning.

    Understand data context by capturing the human elements

    To make informed decisions about data usage, it is important to document the human knowledge regarding how the data was collected. This will help you make sound decisions based on the downstream analysis of the data, and will help drive explainability and accountability. A data point might be useful, but not if you don’t know where it comes from.

    To ensure effective use of data, it is important to understand its provenance, including where it came from, how it was collected, and any limitations in the collection process. Consider whether the data relates to a specific group or a diverse population, and determine if any digital editing has been applied to images or audio. Each of these factors can affect its usability.
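
One way to capture this human knowledge is to store it alongside the dataset as a small structured record. The sketch below is an assumption about how that might look, not a standard: the `ProvenanceRecord` class, its field names and the example values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical record of how a dataset was collected."""
    source: str                    # where the data came from
    collection_method: str         # how it was collected
    population: str                # specific group or diverse population
    known_limitations: tuple = ()  # limitations in the collection process
    digitally_edited: bool = False # any editing of images or audio

    def warnings(self) -> list:
        """List the caveats an analyst should read before using the data."""
        notes = [f"limitation: {item}" for item in self.known_limitations]
        if self.digitally_edited:
            notes.append("media was digitally edited")
        return notes


record = ProvenanceRecord(
    source="branch survey",
    collection_method="paper form, manually captured",
    population="customers of one region",
    known_limitations=("small sample",),
)
print(record.warnings())
```

Keeping the record immutable (`frozen=True`) is a design choice here: provenance describes what happened at collection time, so it should not change as the data moves downstream.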

    Accuracy and precision of your data matter, so it’s important to define your variables and understand the systems and mappings through which your data points have passed. Defined variables help to differentiate between raw data, merged data, labels and inferences. When processing data through multiple systems and mappings, problems can arise, causing the quality of the data to degrade over time. To avoid this, ensure that your mappings retain detail to preserve the accuracy and precision of the data throughout the process.
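
To illustrate how a mapping can lose or retain detail, the sketch below contrasts two hypothetical mapping functions. The field names (`age`, `age_band`, `age_band_origin`) and the cut-off of 40 are assumptions chosen only for the example.

```python
def _band(age: float) -> str:
    """Derive a coarse band from a raw age (illustrative cut-off)."""
    return "under 40" if age < 40 else "40 and over"


def lossy_mapping(record: dict) -> dict:
    """Replace the raw value with the band: the original detail is lost."""
    return {"id": record["id"], "age_band": _band(record["age"])}


def detail_preserving_mapping(record: dict) -> dict:
    """Add the derived band but keep the raw value and mark its origin."""
    mapped = dict(record)  # copy, so the raw record is left untouched
    mapped["age_band"] = _band(record["age"])
    mapped["age_band_origin"] = "derived from age"
    return mapped


raw = {"id": 1, "age": 39.5}
print(lossy_mapping(raw))
print(detail_preserving_mapping(raw))
```

After the lossy mapping, no downstream system can tell a 39.5-year-old from an 18-year-old. The second mapping keeps the raw value and labels the band as an inference, so raw data and derived data remain distinguishable.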

    To simplify the process of labelling data, it can be helpful to use established AI and data methods. For visual classification, a model pre-trained on a dataset like ImageNet can be used to suggest relevant image categories and object locations. By highlighting a specific area in the image, labellers can then provide more detailed classifications, such as identifying the model of a car.

    To make the data labelling process easier for natural language processing (NLP), you can use existing textual content and classifiers like sentiment analysers to categorise data into general groups that can be confirmed by a person and then used for further applications.
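
The pre-label-then-confirm workflow can be sketched as follows. The keyword lists and the `suggest_label` function are a deliberately naive stand-in for a real sentiment analyser, and all names here are hypothetical; the point is only the shape of the queue that a person then reviews.

```python
import re

# Toy word lists standing in for a real sentiment classifier (assumption).
POSITIVE = {"great", "helpful", "fast"}
NEGATIVE = {"slow", "broken", "rude"}


def suggest_label(text: str) -> str:
    """Suggest a coarse sentiment group for a piece of text."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "unsure"


def build_review_queue(texts):
    """Pair each text with a suggestion; 'confirmed' is filled in by a person."""
    return [
        {"text": t, "suggested": suggest_label(t), "confirmed": None}
        for t in texts
    ]


queue = build_review_queue([
    "The app is fast and helpful.",
    "Support was rude!",
    "I opened an account.",
])
for item in queue:
    print(item["suggested"], "<-", item["text"])
```

The suggestion is never written into the `confirmed` field automatically, which keeps the human check explicit before the labels are used for further applications.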

    Clustering techniques can be used to group similar data together, making it easier to label in larger volumes. Additionally, generating artificial data can help fill gaps in real-world datasets and eliminate the need for potentially sensitive private data. Gartner predicts that by 2024, synthetic data will make up 60% of all data used for AI and analytics, making it a growing area of interest.
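
As a rough illustration of clustering for bulk labelling, the sketch below runs a minimal one-dimensional k-means in plain Python. The function name `k_means_1d`, the sample values and the starting centroids are assumptions for the example; a production system would more likely use a library implementation on multi-dimensional features.

```python
def k_means_1d(values, centroids, iterations=10):
    """Group numbers around the nearest centroid, then re-centre and repeat."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            # Assign each value to the closest current centroid.
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return clusters, centroids


values = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
clusters, centroids = k_means_1d(values, centroids=[0.0, 5.0])
print(clusters)
```

Once similar items sit in the same cluster, a labeller can review a few examples from each group and apply one label to the whole group, rather than labelling every item individually.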

    Handling imbalanced data sets

    An AI-powered solution is only as good as the source data it’s fed. Faulty data leads to faulty outputs. One of the leading sources of inadequate results is imbalanced data sets. For instance, if a particular group is over-represented in a dataset, it can lead to minorities being overlooked or their needs being inaccurately predicted. There are various sorts of imbalance, including intrinsic and extrinsic ones, and various methods that can be explored to overcome them, such as over-sampling, under-sampling, the synthetic minority oversampling technique (SMOTE) and generative adversarial networks.
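
Of the methods listed, random over-sampling is the simplest to show. The sketch below duplicates rows from smaller classes until every class matches the largest one; the function name `random_oversample`, the `label` key and the 8-to-2 example split are assumptions for illustration. It is not SMOTE, which synthesises new minority examples rather than copying existing ones.

```python
import random
from collections import Counter


def random_oversample(rows, label_key, seed=0):
    """Duplicate rows of smaller classes until all classes are the same size."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Sample with replacement to make up the shortfall.
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


rows = [{"label": "majority"}] * 8 + [{"label": "minority"}] * 2
balanced = random_oversample(rows, "label")
print(Counter(row["label"] for row in balanced))
```

Over-sampling is usually applied only to the training split, so that duplicated rows do not leak into the data used to evaluate the model.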

    To create a successful AI strategy, it’s imperative to have an equally robust data strategy: one that removes complexity, aligns data with business objectives, is constantly checked and adjusted to mitigate bias and other failings, and has the buy-in and support of those responsible for data collection and management in the business. Without data, there is no AI, but with it, the possibilities are nearly limitless.

    • The author, Mark Nasila, is chief data and analytics officer in FNB’s chief risk office
    • Read more articles by Mark Nasila on TechCentral
    • This promoted content was paid for by the party concerned