APAC CIOOutlook


    Is the PDF the cul-de-sac of data?

    Martin Pickrodt, Chief Information Officer, Mesitis


    The information age has given a new meaning to Francis Bacon’s “Knowledge is Power”. Information and data are becoming ever cheaper to create, store and transmit; data is everywhere and forms the foundation for business decisions, loan approvals and marketing strategies. At Canopy, we need to analyse financial data and statements for aggregation purposes, which is what we will focus on here.

    While most companies have embraced this digital age, old habits die hard and we can still find a lot of paper trails: orders, invoices, bank statements, investment analyses and so on. For a user of the data this can be very frustrating: we need to process our data for analysis, validation, accounting or even regulatory requirements.

    Therefore, our dream is open standards and counterparties willing to provide us with ‘our data’ in a proper format. Ravi Menon, Managing Director of the Monetary Authority of Singapore (MAS), emphasized in a recent speech the importance of open data standards: common standards help against fragmentation, inefficiency and inconvenience; seamless data sharing will enable higher-quality data that is free of error and commonly understood. It will also allow data to be aggregated and made intelligible, meaning machine readable and machine useable.
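
To illustrate why a common format makes aggregation almost trivial, here is a minimal sketch. It assumes two counterparties deliver positions as JSON with the same field names; the feeds, field names and values are hypothetical and do not represent any real bank's format or any published standard.

```python
import json
from collections import defaultdict
from decimal import Decimal

# Two hypothetical feeds that happen to share one schema (assumption for illustration).
feed_bank_a = '[{"asset": "Bond X", "currency": "SGD", "market_value": "1000.50"}]'
feed_bank_b = (
    '[{"asset": "Fund Y", "currency": "SGD", "market_value": "250.25"},'
    ' {"asset": "Stock Z", "currency": "USD", "market_value": "99.00"}]'
)


def aggregate_by_currency(*feeds):
    """Sum market values per currency across any number of feeds."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at zero
    for feed in feeds:
        for position in json.loads(feed):
            # Values are parsed from strings so no float rounding creeps in.
            totals[position["currency"]] += Decimal(position["market_value"])
    return dict(totals)


totals = aggregate_by_currency(feed_bank_a, feed_bank_b)
print(totals)
```

Because both feeds use the same structure, no per-counterparty parsing logic is needed; adding a third source would only mean passing one more argument.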

    Yet companies still work with paper or its modern-day derivative: the PDF file.

    PDF documents are great: every computer can open them and the reader can see exactly what the sender of information intended to show. This is a major advantage over text document formats, which may or may not take the liberty of changing the formatting upon opening and wreak havoc on any nicely crafted design. PDFs are therefore stable display platforms that avoid any issues relating to PEBKAC, a popular term for ‘Problem Exists Between Keyboard And Chair’.

    And yet, PDF documents are terrible: they do not fulfil the requirement of being machine readable. While you can open the document for visual inspection or printing, you cannot process that information any further. The PDF is the millstone around the neck of your data: you may have it, but using it in any substantial way is a chore. That is because PDF is a document format and not a data exchange standard. The recipient is therefore stuck with data that is not machine useable.

    Why are PDF documents so popular then? Apart from the stability of the viewing experience, senders of information see the non-machine-readability as an advantage. Because the data cannot easily be used any further, a perception of safety is created. Comparisons are harder to draw, insights are harder to glean and you will find it difficult to bring your dataset to a competitor. The PDF is the placeholder for analog technology in a digital world, a neat replacement for paper.

    What can be done with the quantities of data that we would like to use? We need to bring it back into a real electronic format, that much is clear. Many companies choose the hard way: manually re-entering the relevant items. Users of large amounts of data, especially when faced with repeat processes like monthly statements, resort to outsourcing: a back office in a remote part of the world that will do the grunt work. This solution fails the test of true scalability as well as accuracy, not to mention the security concerns.

    At Canopy, we are heavy users of PDF data and we have an overarching requirement for accuracy as well as privacy. The inevitable solution is then to electronically read out and re-interpret the PDF statement itself. We call it cracking the statement. Extracting the data is the easy part: PDF conversion solutions are plentiful and can do a good job in the initial extraction. From here the hard part starts: this pile of data needs to be structured and transformed. Table headlines and columns need to be recognized, headers and disclaimers need to be ignored and the content of fields needs to be filled. This is not very straightforward, but with a few bright programming minds the formats can be cracked. Once that is done, there is no more need for the labour-intensive outsourcing solution halfway around the world. Data is machine readable again, manual re-entry errors disappear and we can establish straight-through processing. Additionally, the speed of transformation improves significantly and the number of data points can be increased without worrying whether the back office needs expansion. Data becomes information again.
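
The structuring step described above can be sketched roughly as follows. This is a minimal illustration and not Canopy's actual implementation: it assumes a PDF conversion tool has already produced plain text lines, and the column layout, regular expression, function name `crack_statement` and sample statement are all hypothetical.

```python
import re
from decimal import Decimal

# Hypothetical table headline for a made-up statement; real layouts differ per bank.
HEADER = ("Date", "Description", "Amount")
# A data row: ISO date, free-text description, amount with two decimals.
ROW = re.compile(r"^(\d{4}-\d{2}-\d{2})\s+(.+?)\s+(-?[\d,]+\.\d{2})$")


def crack_statement(lines):
    """Turn raw text lines from a PDF extractor into structured records."""
    records = []
    in_table = False
    for raw in lines:
        line = raw.strip()
        if not in_table:
            # Skip page headers until the table headline is recognised.
            in_table = tuple(line.split()) == HEADER
            continue
        match = ROW.match(line)
        if match is None:
            # Disclaimers, footers and blank lines do not fit the row pattern.
            continue
        date, description, amount = match.groups()
        records.append({
            "date": date,
            "description": description,
            "amount": Decimal(amount.replace(",", "")),
        })
    return records


sample = [
    "Example Bank Ltd    Page 1 of 1",
    "Date Description Amount",
    "2024-01-03  Coupon payment   1,250.00",
    "2024-01-05  Custody fee      -35.50",
    "This statement is for information only.",
]
records = crack_statement(sample)
for record in records:
    print(record)
```

In practice each statement format would need its own headline and row patterns, which is where most of the effort of cracking a new format goes.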

    PDF extraction is only a temporary solution on the path to making data feeds ubiquitous. We hope that companies, and especially financial institutions, will listen to the gentle nudge from regulators and the call from clients in this matter. Until then, automated extraction is a powerful application for bank customers and beyond. Not only is it highly accurate and fast, it also saves costs.


    Copyright © 2025 APAC CIOOutlook. All rights reserved. Registration on or use of this site constitutes acceptance of our Terms of Use and Privacy and Anti Spam Policy 

    This content is copyright protected

    However, if you would like to share the information in this article, you may use the link below:

    https://software-testing.apacciooutlook.com/ciospeaks/is-the-pdf-the-culdesac-of-data--nwid-586.html