Guardrails For AI: What Is Possible Today!


Introduction

“Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.” The Center for AI Safety (CAIS) published this single statement as an open letter on May 30, 2023. The statement was coordinated by CAIS and signed by more than 300 people. The media has taken up this alarmist viewpoint and amplified it, and the news has drowned out the case for controlling existing AI.

Hardcore doomers say that AI is going to become superintelligent and kill us all, and that there is no antidote, including global mitigation. Furthermore, the actions of a superintelligence will be godlike: unpredictable, completely opaque and uncontrollable. AI boosters see productivity enhancements, a boost to GDP and a boon to humanity. The first five signatories on the CAIS statement are

  • Geoffrey Hinton, Emeritus Professor of Computer Science, University of Toronto
  • Yoshua Bengio, Professor of Computer Science, U. Montreal / Mila
  • Demis Hassabis, CEO, Google DeepMind
  • Sam Altman, CEO, OpenAI
  • Dario Amodei, CEO, Anthropic

Of these, Hinton was a booster before he became a doomer. Hinton’s model is Robert Oppenheimer, who led the Manhattan Project; except that in Oppenheimer’s case he actually helped create a doomsday machine, in the form of the nuclear bomb, before he realized what he had done. Bengio and Hinton are winners of the Association for Computing Machinery’s 2018 Turing Award, purported to be computing’s Nobel. Yann LeCun, the other winner from the same year, is notably absent from this list. Hassabis, Altman and Amodei, as CEOs of prominent AI companies, present as boosters. By signing this statement, they present as doomers.

That dual presentation, seeming both doomer and booster, could be based on deflection from their activities or on the desire to create moats for first movers, a form of bad faith. This would not be the first time that regulation has been used as a shield by first movers. At least one signatory has second thoughts: Bruce Schneier, the well-known crypto-pundit (crypto as in cryptography).

Enterprises are deploying AI today because of actual gains in productivity, time savings, shortages of trained personnel and many other putative benefits of AI. There are various estimates of the value AI will add to global GDP; at the higher end, the figure is estimated in the tens of trillions of dollars by 2030. Regulation, which always lags innovation, is aimed at currently deployed AI. Current and new laws target the provable harms such systems are capable of causing and aim to improve outcomes. Specific localities and states have adopted slightly different laws.

The architecture of an emerging startup, Modguard, shown in the accompanying diagram, gives us a path toward compliance with these laws. Conversations with the CEO of Modguard, Raymond Joseph, sparked this article. In this architecture all data and model updates are controlled through an integrated controller. Proofs of model changes, audits, data changes and so on are deposited in a blockchain for later use in any context: compliance reporting, evidence in any future legal actions, or post-incident analysis.

Publicly available papers and other references on AI safety are linked throughout this article, alongside my own understanding of the evolving landscape.

Before diving into the compliance landscape, a survey of GPTs, their promise and their risks is presented. GPT-4 and its ilk, such as Bard and Claude, are the latest in generative technology; AI itself has been in development for several decades. Simultaneously, solutions for two aspects of protecting AI have been proceeding. The first is algorithmic techniques for the removal of bias and the protection of user privacy. The second is techniques for protecting against adversarial attacks on the model. In this article the focus is on mitigating present harms rather than a future doomsday scenario in which AI turns us all into paperclips.

Hoisted By Our Own Petard!

“Risk of extinction” usually refers to other species that have the misfortune to share the earth with us humans. In the open letter, the species under threat of extinction is Homo sapiens, and the threat arises from artificial intelligence. In AI circles, the threat of humanity’s extinction is called existential risk, or x-risk. Alignment is one way of mitigating this risk; it refers to the synchronization of AI goals with Homo sapiens’ goals.

This simplistic view glosses over the problem. The goals of Homo sapiens cannot be ascertained by questioning a few humans, or a few groups of humans; different groups have different goals. X-risk is not a fully fleshed-out concept. It is very probable that the benefits of AI will fall to a select group, and every other group will be x-ed out unless UBI (Universal Basic Income) or some such scheme is created. So far, AI in its current state is not an x-risk for Homo sapiens.

There is a thought experiment that takes this to an extreme: if an AI is created with the goal of maximizing the production of paper clips, then, being superintelligent, it will do all it can to achieve that goal, including harvesting people for the iron in their blood, eventually killing us all and drowning the surviving world in paper clips.

X-risk is also notable among the warnings from people such as Stephen Hawking, who cautioned against searching for or contacting aliens. In that scenario, humanity could be annihilated by contact with a more advanced society. AI can be thought of as an alien in our midst that is nurtured, fostered and let loose. X-risk pushes the doomsday button in our brains, which is why it is the subject of many religious cults, sci-fi movies and other fantasies.

These end-of-the-world scenarios distract from the task at hand. Threats come from more prosaic and familiar sources. These threats have been all around us and are growing with the deployment of AI technology. Attention needs to be paid not to future harms based on thought experiments, but to real and ongoing harms happening every day around us. The question is what can be done today, instead of wringing our hands about a future catastrophe that may never come.

Boosters and Doomers

Boosters say that AI will save the world. Doomers say that AI will destroy us. Hardcore classical doomers say that AI will destroy us no matter what we do. A superintelligent AI will be opaque, acting with the indifference of gods, or their glee, or, as Shakespeare says in King Lear, “As flies to wanton boys are we to th’ gods, They kill us for their sport.”

AI smiling or laughing or sporting is a thought that does not sit easily; AI is always deadly serious. Imbuing AI with a persona is how drama is shaped, witness countless creations such as HAL 9000, R2-D2 and C-3PO. That is how we humans have been conditioned. The Turing test has led us astray.

There is a spectrum of opinion between the hardcore doomer and the hardcore booster. The AI harms and advantages commonly stated between these extremes are:

  1. AI will kill us all and maybe destroy the whole world to boot. There is no defense.
  2. AI will pose an existential risk to humans. Only a concerted international effort by global agencies to control and regulate AI will mitigate this risk.
  3. AI-capable warfare, with an army of killer drones and other AI-based killing machines with intelligent controllers, is an unknown risk. In the hands of an entity indifferent to life, such weapon systems can be extremely dangerous.
  4. AI will take all our jobs, forcing widespread immiseration, despair and inequality; as a corollary, AI will make some individuals immeasurably wealthy.
  5. AI will harden the bias seen in our current data used in its training to deny schooling, financial freedom and jobs to minorities, women and non-traditional strivers.
  6. AI controlled killing machines on both sides of any conflict will lead to an uneasy but lasting peace based on mutually assured destruction. Or better yet, restraint leading to peace.
  7. AI will increase productivity, leveraging it as technological innovation has in the past, but at a rate never seen before. This will lead to nirvana and the solution of hard problems: pandemics, climate change and human poverty.
  8. AI induced GDP increase will create an economy where humans can explore their creativity and indulge in art, poetry and music to be the best that they can be. Work will become play.
  9. AI will do all the hard lifting for us. AI will nurture us, uplift us when we are sorrowful, provide company when we are lonely, grant us unlimited sexual pleasure and emotional satisfaction, play with us, support us in our physical deterioration and even grant us eternal life with a healthy cyborg body.

Another possibility is that AI will save the world by destroying humans. I have conjoined the doomer and booster positions into one statement. It is in jest, with a sardonic kernel of truth.

Generative Pre-Trained Transformers (GPTs)

This is a short excursion into the current crop of AI that has generated a lot of hope, hype and fear: GPTs. Generative means that the bot generates text or images; text includes computer code. Pre-training starts with unsupervised learning, in which a vast corpus of existing data is fed to the model. A transformer architecture powers the neural net. A GPT is a Large Language Model (LLM): large because of the number of parameters that drive the model, and a deep learning model because it is composed of a deep stack of layers of neurons. A transformer can produce more coherent and human-seeming text thanks to self-attention, which uses statistics and probability to weigh how much each part of the input should influence each output token.
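
To make self-attention concrete, here is a minimal sketch of scaled dot-product attention, the core operation inside a transformer layer, written in plain numpy. The dimensions and random weights are illustrative; a real model learns the projection matrices and stacks many such layers with causal masking, multiple heads and feed-forward blocks.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Project each token embedding into a query, key and value.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Every token scores every other token; scaling keeps the values well behaved.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores)          # each row is a probability over positions
    return weights @ V                 # each output is a weighted mix of values

rng = np.random.default_rng(0)
d_model = 8
X = rng.normal(size=(5, d_model))      # 5 toy "token" embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)   # (5, 8)
```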

Our collective anxiety around the capabilities of ChatGPT drives the doomsday scenario. ChatGPT was first launched as a chat layer on top of GPT-3.5. That user interface garnered interest and engagement from hundreds of millions of users within a few months. GPT-4, the latest version in the GPT series from OpenAI, a San Francisco-based AI company, is in limited release and powers ChatGPT Plus. Many barriers fell in the first half of this year: GPT-4 aced the bar exam, AP Biology exams, oncology exams and other tests.

Such exams are high points of human achievement and intelligence, taking years to prepare for and pass with high marks; ergo, GPT-4 must be superintelligent. The quality of the text it generates is astonishing to many. ChatGPT and its Plus version can imitate Sappho, Hilary Mantel, Toni Morrison, Shakespeare, Dashiell Hammett, Philip Larkin or Edgar Allan Poe. It does not take much to astonish the multitudes: witness the multifarious, breathless samples of interactions with ChatGPT posted by real humans on almost every social media app.

The generative aspect of GPT leads to another expansion of the acronym: General Purpose Technology. Such technology has been trained on a huge variety and quantity of data, which makes it appropriate for a wide range of downstream applications.

For the interested amateur, Stephen Wolfram is a great source on the workings of GPT. I use the word amateur loosely: to decipher his explanation, a basic background in calculus and statistics is necessary, and patience and persistence to read lengthy arguments in Wolfram’s characteristic style also help. If you are interested in integrating with the ChatGPT API (Application Programming Interface), Wolfram’s article on the integration of ChatGPT with Wolfram Alpha and the Wolfram Language is worth reading.
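
As a taste of what that integration involves, here is a minimal call sketch assuming the pre-1.0 openai Python SDK; newer SDK versions expose the same capability through a different client interface, and the model name and key handling are placeholders, so treat this as a starting point rather than a definitive recipe.

```python
import openai

openai.api_key = "YOUR_API_KEY"   # placeholder; load from an environment variable in practice

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Summarize the transformer architecture in two sentences."},
    ],
    temperature=0.2,     # lower temperature -> less creative, more repeatable output
)
print(response.choices[0].message.content)
```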

Wolfram Alpha has a back and forth with ChatGPT that removes hallucinations. Hallucination is a kindly word for the confident untruths that ChatGPT produces under certain conditions.

Ordinary businesses may not have a well-developed internal language and model, as Wolfram Alpha does, backed by the Wolfram Language. A high-powered corporate lawyer in New York discovered hallucinations to his chagrin when a brief prepared with the help of ChatGPT cited several pieces of hallucinated case law to bolster his case. The long-faced lawyer was forced to explain the situation to a po-faced judge, and his firm has been fined $5,000, probably what he charges for two hours of work.

A series of business rules that codify certain regulations could be fed into the model to generate more compliant output. Successful integration with such general-purpose transformers requires skills, and AI talent is in short supply. AI as a service, along with its twin, AI risk management, is a growth industry.
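
One lightweight way to feed such rules to a generative model, short of fine-tuning, is to prepend them as a system message. The rule texts below are hypothetical placeholders that only illustrate the pattern; real rules would be drafted from the actual regulation and reviewed by compliance staff.

```python
RULES = [
    "Never include an applicant's protected-class attributes in a recommendation.",
    "Cite the data source for every quantitative claim.",
    "Flag any decision that lacks a documented human reviewer.",
]

def build_messages(task: str) -> list[dict]:
    # Bundle the rules into a system message so every response is steered by them.
    system = "Follow these business rules in every response:\n" + \
             "\n".join(f"- {rule}" for rule in RULES)
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": task},
    ]

# Pass the result to the chat-completion call shown earlier.
messages = build_messages("Draft the quarterly model-risk summary for the credit model.")
```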

Stochastic Parrots

The term stochastic parrot came into common usage with the seminal paper “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?”.

The authors argue that large language models are built by feeding large amounts of heterogeneous data into the model, starting the process with unsupervised learning. The sources include Twitter, Wikipedia, Reddit and the internet itself. When you throw in 4chan and Truth Social along with Meta, LinkedIn and the rest, the input consists largely (60-65%) of data produced by white males between the ages of 18 and 35. We should therefore not be surprised if LLMs sound like a young white male, albeit one with a broad and deep education.

Data is the feedstock of the AI training process, and it is also the output. Data has to be documented; meaning and context have to be injected. Curation of input is crucial. A pre-mortem is necessary: before deployment, the solution has to be run through challenges, including a worst-case scenario. Back-testing before deployment is already a requirement for many products. This is the main message of the paper.

The stochastic parrots paper is highly influential, having been cited in more than 488 other papers since it came out in 2021. The end-note clarifies that the paper, though written by seven authors, lists just four; three delisted authors had been asked by their employers to remove their names. Censored at the source.

No soul, no goals, no children

This view holds that even the most sophisticated generative AI does not have agency or self-awareness. Most models achieve results through progressive refinement of their parameters using a combination of unsupervised, self-supervised and semi-supervised deep learning. These terms refer to the level of human intervention during the training of the model. Supervisors are humans who label the control results. The difference between the supervisors’ labels and the AI’s own results is the loss function, which training seeks to minimize. Supervisor is also a grander term than the reality. Labelers are recruited from low-cost English-speaking countries such as Kenya, Nigeria and India, working in hot back offices or their bedrooms. They are employed by multi-billion-dollar corporations that feed the labeled data to AI developers, and they are paid a pittance for a job that is ambiguously defined and stressful.
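
A minimal sketch of that comparison, assuming a simple classification setting: the loss is the cross-entropy between the model's predicted probabilities and the labels the human supervisors assigned, and training nudges the parameters to shrink it.

```python
import numpy as np

def cross_entropy(predicted_probs, human_labels):
    # Negative log-probability the model gave to the label the human chose;
    # the smaller the value, the closer the model is to its supervisors.
    picked = predicted_probs[np.arange(len(human_labels)), human_labels]
    return -np.mean(np.log(picked))

# Toy batch: probabilities over 3 classes for 2 examples, plus the human labels.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])
labels = np.array([0, 2])
print(cross_entropy(probs, labels))   # ~0.43; training pushes this downward
```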

AI is not autonomous: it does not secretly work on creating “children” with more sophistication, nor does it have independent physical interfaces to intervene in the world. Humans use one generation of AI to create the next; GPT n+1 is informed by GPT n. Humans empowered by AI are a real and present threat. AI can also be taken down some twisted paths by its interaction with users, as the section on data poisoning reveals. Many experts have done a good job of explaining the anti-doomer viewpoint, among them Sarah Constantin and Scott Aaronson. Aaronson, well known as a professor specializing in quantum computing, is now working on AI safety for OpenAI. One of the proposals he helps shepherd is the watermarking of ChatGPT output, which would presumably prevent people from passing off ChatGPT output as their own. Students, journalists, authors, lawyers, marketers and many others in the content-generation field will not be able to resist using ChatGPT to write.
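
OpenAI has not published its watermarking scheme, so the sketch below is only a simplified illustration of the general statistical idea: bias token choices toward a pseudorandom "green list" derived from the preceding token, then detect the watermark later by measuring how often a piece of text lands on that list.

```python
import hashlib
import random

VOCAB = [f"tok{i}" for i in range(1000)]   # stand-in vocabulary

def green_list(prev_token, fraction=0.5):
    # The generator and the detector share this secret, pseudorandom rule.
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16)
    return set(random.Random(seed).sample(VOCAB, int(len(VOCAB) * fraction)))

def generate(n_tokens, bias=0.9, prev="<s>"):
    out = []
    for _ in range(n_tokens):
        pool = list(green_list(prev)) if random.random() < bias else VOCAB
        prev = random.choice(pool)     # a real model samples from its own distribution
        out.append(prev)
    return out

def green_fraction(tokens, prev="<s>"):
    hits = 0
    for tok in tokens:
        hits += tok in green_list(prev)
        prev = tok
    return hits / len(tokens)

print(green_fraction(generate(200)))                  # ~0.95: watermark detected
print(green_fraction(random.choices(VOCAB, k=200)))   # ~0.5: ordinary, unwatermarked text
```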

The Alternate Viewpoint

Not surprisingly, the debate rages on. The main question is whether the current and newly developed AI models driven by LLMs “understand” the text that they provide to random users. All of this hinges on the word “understand”. Hinton, Andrew Ng and others posit that the most advanced LLMs understand their output. Once we have crossed the Rubicon of an AI that understands its output, AI doom follows close behind.

The authors of the stochastic parrots paper and influential researchers such as Yann LeCun say that AI does not understand its output. They contend that it is all statistics and probability, driven by a process of refinement.


AI Model Lifecycle & Safety

These disagreements about “understanding” stem from the lack of tests that empirically prove AI is capable of independent reasoning about the answers it gives. On the other hand, there are tests for actual harms from deployed AI models. Algorithms and metrics have been developed for measuring and mitigating bias. Harm reduction in these widely used models makes regulation possible. Proponents of this view say that models in production are capable of actual harm, and that harm reduction should use well-known techniques that are computationally tractable. Reduce harm right now, instead of worrying about a theoretical future in which AI kills us all. Modguard implements several of these algorithms.
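
As an example of how simple some of these metrics are, here is a sketch of demographic parity difference, the gap in favorable-decision rates between groups, computed on toy data; production toolkits add many more metrics and mitigation steps on top of this.

```python
import numpy as np

def demographic_parity_difference(decisions, group):
    # Gap between the most- and least-favored groups' approval rates;
    # 0 means every group receives favorable decisions at the same rate.
    rates = [decisions[group == g].mean() for g in np.unique(group)]
    return max(rates) - min(rates)

# Toy loan decisions (1 = approved) with a sensitive attribute.
decisions = np.array([1, 0, 1, 1, 0, 0, 1, 0])
group     = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
print(demographic_parity_difference(decisions, group))   # 0.75 - 0.25 = 0.5
```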

Architecture of Modguard

As explained by Raymond Joseph, the CEO of Modguard, the entire AI model lifecycle has to be protected. The architecture diagram shows how this can be achieved for a running model: the AI model is completely enveloped by Modguard’s AI shield while it runs. What is not shown is how the model has to pass through a strict review process, including a pre-mortem, during development and again before being deployed into this configuration. This way the cocoon starts being woven along with the model as it develops.

All proofs, whether related to data, the model or interventions, are saved to a blockchain, which is used as a trust engine. When compliance reports are produced, this trusted data is added to the output to strengthen its claims.
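
A minimal sketch of what such a proof could look like: hash a canonicalized record of the event and anchor only the digest on-chain, keeping the full record in ordinary storage. The field names and event types here are illustrative, not Modguard's actual schema.

```python
import hashlib
import json
import time

def make_proof(event_type, payload):
    # Canonicalize the record, then fingerprint it; only the digest needs to be
    # anchored in a blockchain transaction.
    record = {"type": event_type, "payload": payload, "timestamp": int(time.time())}
    digest = hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()
    return record, digest

record, digest = make_proof(
    "model_update",
    {"model": "credit-scoring-v7", "training_data_hash": "abc123", "approved_by": "chief-risk-officer"},
)
print(digest)   # re-hashing the stored record later and comparing digests proves
                # the audit trail was not altered after the fact
```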

Once in place, the model is monitored as it is deployed to production. Strict control is exercised over updates to the model and its training data. Poisoning attempts are intercepted by monitoring customer requests and their responses, and additions to the training data flow through carefully monitored pathways.

Of the attacks on AI, poisoning ranks among the top concerns.

Data Poisoning

Data poisoning is a way to inject untrue, misleading or biased information into the training sets of AI in order to induce biased outputs from AI or social media systems. Poisoning happens through the use of tainted sources or by allowing interactions with users to sway the output; since AI is often trained continuously, this can take the form of user queries and ratings of AI responses. Most such attacks are black-box attacks: the attacker does not need details of the training data or the model itself to influence its behavior. Adversarial users profit by inducing bias or stealing user data.
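
The crudest form of poisoning, label flipping, is easy to demonstrate on a toy classifier; real black-box attacks are far subtler, but the effect, degraded or skewed behavior from tainted training data, is the same. A sketch using scikit-learn:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clean = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Flip 30% of the training labels, a crude stand-in for tainted feedback
# entering a continuously trained model.
rng = np.random.default_rng(0)
flip = rng.choice(len(y_tr), size=int(0.3 * len(y_tr)), replace=False)
y_poisoned = y_tr.copy()
y_poisoned[flip] = 1 - y_poisoned[flip]

poisoned = LogisticRegression(max_iter=1000).fit(X_tr, y_poisoned)

print("clean test accuracy:   ", clean.score(X_te, y_te))
print("poisoned test accuracy:", poisoned.score(X_te, y_te))
```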

Data poisoning in existing social media and other systems results in fake news and election interference. The viral spread of inflammatory, violent, untrue and unscientific content is enhanced; such posts are rated highly because of likes and re-posts, often fueled by bot armies. This poison causes societal harm and threatens democracy. Existing data joins the stream of data used to train AI systems, which have been fed unfiltered and biased data; the data is already poisoned at the source. There is no remedy but curation to cure this data of bias. Whether AI can be enlisted for this curation is still debatable, though AI as an adversarial or monitoring system has been in use for a while.

In addition, malicious and persistent users inject poison to bias the output of the AI system during their interactions with it. These attacks can be concealed, so as to be undetectable, by packaging data in seemingly innocuous queries or by trickling the poison over time through many interactions. A survey of poisoning attacks on AI reveals many avenues of attack, effective against a variety of model types. This is an attack on the security of the system.

Studies have also shown that any attempt at curating or filtering incoming data diminishes the accuracy of the system; that is, accuracy and security are at odds. According to the view in the paper “On the Impossible Safety of Large AI Models”, an AI trained on a curated, de-fanged dataset will not be accurate, while a large, multi-billion-parameter AI trained on an uncurated heterogeneous dataset is not secure, producing certain pathological and biased outputs that also leak user data from the original training set. Further study of the assumptions in this paper is required.

In the real world, Modguard has filed a patent for a method to prevent data-poisoning attacks in a federated setting. The antidote to poison is a part of their integrated toolkit.

Regulations

Businesses use AI for labor productivity, time savings, and the superior quality of products and services that they can provide at a lower cost. These advantages have to be balanced against the harms of AI.

Most of the preceding article has been about the AI advances that caught fire in the last few months. However, AI has been in real-world use for a decade or more, in several crucial sectors. Among these, insurance, finance, healthcare and recruitment are targets of regulation, perhaps because these sectors are familiar to state and local regulators.

AI use in the criminal justice system, such as recidivism tracking and parole decisions, has been proscribed in certain jurisdictions due to bias.

As AI solutions have been rolled out in each of these sectors, the same issues that have dogged the latest versions are present. The most prominent is bias in decisions, reflecting the biased data used to train the AI. Add to this a vulnerability to attacks, chief among them data poisoning, due to a lack of security thinking among AI proponents.

The current regulatory landscape is more mature in many states and local jurisdictions, such as New York City. Work at the federal level is led by NIST (the National Institute of Standards and Technology). Tellingly, NIST sits under the US Department of Commerce.

The European Union has been circulating a draft law regulating AI for approval since 2018. A recent amendment, on the 16th of May, added language on generative AI. The overall thrust is risk-based: the regulation targets different types of applications with different levels of requirements, with risk tiers based on the perceived extent of harm of the application. This is a familiar approach, also used by NIST to address security concerns, digital identity and so on. At the local level, such an approach leads to a patchwork of regulation with differing requirements. Modguard’s AI-based approach to automating the discovery of these jurisdictions and their requirements is an excellent start. To get a sense of what the regulations could be in the US, two such jurisdictions are examined in detail below.

Colorado Law (SB21-169)

This law recognizes the efficiencies of deploying AI systems by insurers, efficiencies that benefit both consumers and insurers. The bill prohibits unfair discrimination based on the protected classes, namely race, color, national or ethnic origin, religion, sex, sexual orientation, disability, gender identity, or gender expression, mainly through the use of clean external data sources as input to the AI systems. The commissioner of insurance is tasked with compliance. To comply, each insurer is required to disclose its data sources and the manner in which the data is used, to use a risk management framework (RMF), to provide the assessment results, and to supply an attestation by the Chief Risk Officer that the RMF has been deployed and is in continuous use.

Relief is provided by granting a period to remedy the effects of any discriminatory impact, as well as by allowing the use of data sources authorized by the division of insurance.

Compliance is demonstrated by providing proof that the correct data is used and that a proper risk management framework is in place; these reports are to be submitted periodically. Since insurers usually do not have expertise in these matters, it falls to AI-based risk and compliance enterprises to provide the reporting.

NYC Law (Local Law 144)

This law controls the use of Automated Employment Decision Tools (AEDTs), that is, tools that automatically scan resumes and score candidates for an interview or a promotion. It does not control human biases in the latter half of the hiring or promotion process. The requirement is to conduct an independent bias audit within one year of the deployment of the tool and to publicly post the results.
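
The audit centers on selection rates by demographic category and the impact ratios between them. The sketch below shows one reading of that calculation on hypothetical data; the published rule defines the exact categories, intersections and scoring-rate variants.

```python
import pandas as pd

# Hypothetical audit extract: one row per candidate screened by the tool.
results = pd.DataFrame({
    "category": ["male", "male", "male", "male", "female", "female", "female", "female"],
    "selected": [1,      1,      1,      0,      0,        1,        0,        0],
})

selection_rates = results.groupby("category")["selected"].mean()
impact_ratios = selection_rates / selection_rates.max()   # vs. the most-selected category
print(selection_rates)   # male 0.75, female 0.25
print(impact_ratios)     # male 1.00, female 0.33 -- a gap the audit must disclose
```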

This seems like a move in the right direction, although the gyrations around the implementation of the law indicate the difficulty of controlling this sort of activity. The law was proposed in 2021, with many comment periods, and the final rule is slated for enforcement on July 5, 2023. The lead time for regulation thus lags the actual use of AEDTs by several years. The fine for infractions is a paltry $1,500.

Compliance

Companies like Modguard are targeting the compliance market. Compliance reporting is required by regulation, and the compliance text itself is generated through an API to GPT-n or some such generative underlay. However, this approach is not cheap: GPT API charges are by usage, each token that GPT generates costs a certain amount, and compliance reports are required to be quite prolix.
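
A rough back-of-the-envelope makes the point; the per-token prices below are assumptions for illustration (check the provider's current price list), but even modest report volumes add up.

```python
# Illustrative prices only; substitute the provider's current per-token rates.
PRICE_PER_1K_INPUT_TOKENS  = 0.03   # USD, assumed GPT-4-class list price
PRICE_PER_1K_OUTPUT_TOKENS = 0.06   # USD, assumed

def report_cost(input_tokens, output_tokens, reports_per_month):
    per_report = (input_tokens / 1000) * PRICE_PER_1K_INPUT_TOKENS \
               + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT_TOKENS
    return per_report, per_report * reports_per_month

# A prolix report: ~3K tokens of prompt and context, ~8K tokens generated.
print(report_cost(3_000, 8_000, reports_per_month=500))
# (0.57, 285.0): about $0.57 per report, roughly $285 per month at this volume
```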

A solution that uses an open-source generative model such as LLaMA would be preferable, since it is cheaper to use, although such an open-source model may take a lot of effort to deploy.

Conclusion

Building for the future also means growing our defensive capacities against the rapidity of the change that is already upon us. It is this practice and this capacity that will protect us in the future, since it is by doing that we will learn how to protect ourselves. AI safety and monitoring tools will help shore up our defenses against that uncertain future; our allies will be the same sentinels that we pour our energy into developing now.

Small companies like Modguard will be at the forefront of this revolution in AI risk management, since they can preserve their relative independence from the large enterprises deploying expensive, large-scale models. They can also function as independent auditors, providing a low-cost compliance solution for any enterprise wishing to use AI in a regulated environment.
