By Liming Zhu, Research Director, CSIRO's Data61, and Qinghua Lu, Senior Research Scientist, CSIRO's Data61
It’s not a full-blown crisis yet, simply because many enterprises have not fully embraced such systems in their production processes. We see emerging research and technologies around reproducible ML, continuous delivery for machine learning (CD4ML), and debugging of ML-driven systems (e.g. crowdsourced testing, incentive mechanisms). If your organization has solid software engineering and testing practices, expanding them beyond code to model governance/versioning, data versioning, and configuration/environment versioning (especially through lightweight containers) can help you be more confident in exploiting the value of data and AI/ML as competitive advantages.

Second, AI/ML-driven systems learn from vast amounts of often personal and sensitive data to make critical decisions for us, or sometimes about us. We need to be concerned about the ethical and legal aspects of the use of this data and of the decisions derived from it. Testing an AI/ML/data-driven system is no longer just a matter of checking against a functional specification and traditional non-functional attributes such as performance, security, reliability, and interoperability. Increasingly, it’s about testing the ethical conformance and legal compliance surrounding an enterprise’s acquisition and access of data (e.g. leveraging blockchain technology for ML accountability), the specific purpose for which data is used, potential bias in data, models, and decisions, as well as the explainability of these models and decisions. We are starting to see interesting ethical-by-design technologies, test cases for ethical issues such as fairness being integrated into the development life cycle, and automated compliance testing. We see the need for continuous delivery to evolve into “continuous validation” of your AI/ML/data-driven systems through interaction with customers, users, stakeholders, and regulators.

Finally, many effective AI and ML systems rely on highly complex models, especially the latest deep learning models.
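To make the idea of fairness test cases in the development life cycle concrete, here is a minimal sketch of one that could run alongside ordinary unit tests in a CI pipeline. The metric (demographic parity difference), the sample data, and the 0.1 threshold are all illustrative assumptions, not a prescribed standard; a real pipeline would evaluate a trained model on a held-out data set.

```python
# A fairness check written as an ordinary test function, so it can be
# picked up by a test runner such as pytest in a CI pipeline.
# The data and the 0.1 threshold below are purely illustrative.

def demographic_parity_difference(predictions, groups):
    """Gap in positive-prediction rates between the best- and
    worst-treated groups (0.0 means perfectly equal rates)."""
    rates = {}
    for g in set(groups):
        members = [p for p, grp in zip(predictions, groups) if grp == g]
        rates[g] = sum(members) / len(members)
    values = sorted(rates.values())
    return values[-1] - values[0]

def test_fairness_demographic_parity():
    # Hypothetical binary decisions and a sensitive attribute per record.
    predictions = [1, 0, 1, 0, 0, 1, 1, 0]
    groups      = ["a", "a", "a", "a", "b", "b", "b", "b"]
    gap = demographic_parity_difference(predictions, groups)
    assert gap <= 0.1, f"demographic parity gap {gap:.2f} exceeds threshold"
```

Like any other test, this one fails the build when the property is violated, which is what it means for an ethical requirement to be integrated into the development life cycle rather than audited after the fact.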
For example, Google’s Bidirectional Encoder Representations from Transformers (BERT) model for natural language processing has approximately 340 million parameters in its largest version (in the model itself, not its training data set). These types of models usually work and have extraordinary predictive power. But when a test case does not pass, explaining why the model behaved as it did can be a challenge in itself.

But is it truly a testability challenge, or something else? When AlphaGo played its famous “move 37” on the fifth line during its milestone match against Lee Sedol, the best human players thought the AI had made a beginner’s blunder and failed a “test” to which we thought we knew the answer. Only later did we realize that we, as humans, had failed the test of understanding it. So can an AI that learns from vast amounts of data and interacts with its unique environment be truly testable in the traditional sense?

I am optimistic that we can test AI, but the meaning and approach of testing will need to be fundamentally reconsidered. Rather than testing mechanical computing systems formed by 0s and 1s against our well-defined specifications, we will be testing something that can learn insights that surprise us, or that we may not even fully understand. Testing could become a learning experience for both human and AI in a world of AI-human co-evolution. That’s the future we are in right now.