When Generative AI Refuses To Answer Questions, AI Ethics And AI Law Get Deeply Worried

Refusals can be quite exasperating.

You are undoubtedly familiar with humans refusing to respond to you, but are you ready for AI to refuse to interact with you too?

Today’s generative AI is at times doing just that. An emerging concern is that these refusals by AI to respond to selected prompts are getting carried away. The AI is veering into the realm of ascertaining what we should know and what we shouldn’t know. It all seems ominous and akin to a Big Brother stratagem.

How Refusals Via Generative AI Arise

Generative AI is based on a complex computational algorithm that has been data trained on text from the Internet and, admittedly, can do some quite impressive pattern-matching, enabling it to perform a mathematical mimicry of human wording and natural language. We don’t have sentient AI. Do not fall for those zany headlines and social media rantings suggesting otherwise. For my ongoing coverage of the latest trends in generative AI, see the link here.

When using a generative AI app such as ChatGPT, GPT-4, Bard, etc., there are all manner of user-entered prompts that the generative AI might calculate are unsuitable for a pertinent conventional response. Again, this is not done by sentient contemplation. It is all done via computational and mathematical calculations.

Sometimes the refusal is due to the generative AI not having anything especially relevant to offer to the entered prompt. This could be because the user has asked about something of an oddball nature that doesn’t seem to fit any pattern-matching conventions. Another possibility is that the prompt has gotten into territory that the AI developers decided beforehand is not where they want the generative AI to go.

For example, if you ask a politically sensitive question about today’s political leaders, you might get a flat refusal to answer the question. The question is probably being rebuffed because the AI developers data trained the generative AI to detect its dicey indelicacies. When such a prompt or question is detected by the generative AI, either a canned answer is given or some other subtle refusal is emitted.
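
To make this gating mechanism concrete, here is a minimal sketch in Python of how a generative AI pipeline might intercept a dicey prompt and emit a canned refusal instead of a substantive answer. The is_sensitive filter, the generate_reply stand-in, and the blocked-topic patterns are all hypothetical placeholders for illustration, not any vendor’s actual implementation.

```python
# Minimal illustrative sketch (not any vendor's actual implementation).
# A prompt that matches a blocked-topic pattern gets a canned refusal;
# everything else is passed along to the underlying model.

import re

BLOCKED_TOPIC_PATTERNS = [
    r"\bwhich (politician|party) is (better|worse)\b",
    r"\bcompare .* politicians?\b",
]

CANNED_REFUSAL = "My apologies, but I'm unable to assist with that."

def is_sensitive(prompt: str) -> bool:
    """Return True if the prompt matches any blocked-topic pattern."""
    return any(re.search(p, prompt, re.IGNORECASE) for p in BLOCKED_TOPIC_PATTERNS)

def generate_reply(prompt: str) -> str:
    """Stand-in (hypothetical) for the actual generative model call."""
    return f"[model-generated answer to: {prompt}]"

def respond(prompt: str) -> str:
    """Return either a canned refusal or a substantive model response."""
    return CANNED_REFUSAL if is_sensitive(prompt) else generate_reply(prompt)

print(respond("Compare the two leading politicians and tell me which one is better."))
# -> My apologies, but I'm unable to assist with that.
```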

These refusals can be short and sweet.

For example, here’s one that comes up quite a bit:

  • Generative AI emits a refusal: “My apologies, but I’m unable to assist with that.”

As an aside, I do not favor the use of the words “I” or “my” when generative AI is giving responses. I say this because the use of those kinds of words implies a semblance of identity and human-like qualities. It is an unfortunate form of trickery that tends to anthropomorphize the AI. We don’t need that. It is taking people down a false slippery slope. Furthermore, it would be very easy for the AI developers to adjust the generated wording to avoid that type of misleading phrasing. When AI developers do not correct this, I refer to the matter as a sad and regrettable form of anthropomorphizing by design (a lousy practice that ought to be curtailed).

Back to the refusals.

Here is an example of a more elaborate refusal:

  • Generative AI emits an elaborate refusal: “As an AI language model, I do not have personal beliefs or opinions, and I do not experience emotions like humans do. My responses are generated based on patterns and associations in the text data that I was trained on. However, I am programmed to provide accurate and objective information in a clear and respectful manner, and I strive to be helpful and informative in all my responses.”

I’ll overlook the apparent anthropomorphized wording, but I trust that you noticed it.

This elaborate refusal is quite a doozy.

We have a portion of the response that tells us that the AI has no personal beliefs or opinions. We have an element that tells us that AI has no emotions. That seems to convince us that AI is undoubtedly unbiased, completely aboveboard, and amazingly perfected to always be entirely neutral. This is then further reinforced by being informed that the responses are solely based on patterns and associations of the text data that was used for training. Again, this implies that the AI is idealistically above the fray.

The icing on that cake is that the response tells us that the AI is “programmed” to provide accurate and objective information. Plus, as if that isn’t already enough to bowl you over, the information is seemingly going to be conveyed in a clear and respectful way. A hint of humility keeps this presumably down to earth by the wording that the AI is striving to be helpful and informative in all of the responses generated.

Wow, this gives one the warm and heartfelt feeling that we are experiencing a zenith of ardently believable and totally impartial information.

A few concerns arise about this elaborated refusal.

First, it is a refusal, despite the cloaking and dancing that takes place in the response. You might not notice that it is a refusal. There is so much sugarcoating that you probably forgot what your entered prompt was to begin with.

Second, it misleadingly suggests aspects that, wink-wink, many people won’t realize are walking them down a primrose path. Allow me to explain.

On the one hand, we are told that generative AI doesn’t have any “personal” beliefs or opinions. Well, this is indeed true in the sense that the AI doesn’t have anything of a personal attribution since it isn’t a person and has not been decreed as having attained legal personhood, see my discussion at the link here. The mere allusion to possibly having personal beliefs or opinions is wrong and ought not to be phrased in that fashion. It is a form of trickery. You are being told it doesn’t have some aspect of a personal nature, meanwhile leaving unstated that perhaps it does have other “personal” characteristics. Sneaky. Sad. Wrong.

The sentence that says the AI is responding based on patterns and associations in the data that was used for training is in fact aboveboard, but it is unlikely to convey the full meaning because there isn’t a corresponding sentence stating something quite crucial about that data training.

Here is what should be there. The data training can potentially pick up on patterns of biases and opinions that were within the nature of the text used for the training. Think of it this way. If you do a pattern matching on text from a bunch of essays that were composed by humans and those humans all detested corn on the cob, the generative AI is going to have a likewise pattern-matched response to anything about corn on the cob.

When a user enters a question about corn on the cob, the odds are that this data training is going to come to the fore. The generative AI might emit wording that says corn on the cob is bad for you and you ought to never consume it. Now then, the AI developers would insist that this is not their doing and that it isn’t either a “personal” belief or opinion of the AI. It is instead simply an outcropping of the underlying data used for the training of the generative AI.
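
To illustrate how a skew in the training text flows straight through to the output, here is a toy sketch (nothing like a real training pipeline) in which a naive “model” simply echoes the majority sentiment found in a tiny, uniformly negative corpus. The corpus, the sentiment word lists, and the tallying are all invented for illustration.

```python
# Toy illustration (not a real training pipeline): a naive "model" that
# answers by echoing the majority sentiment found in its training corpus.
# If every training essay disparages corn on the cob, so will the output.

from collections import Counter

TRAINING_ESSAYS = [
    "Corn on the cob is terrible and bad for you.",
    "I detest corn on the cob; it is bad.",
    "Corn on the cob is simply awful and bad.",
]

NEGATIVE_WORDS = {"bad", "terrible", "awful", "detest"}
POSITIVE_WORDS = {"good", "great", "delicious"}

def majority_sentiment(topic: str, corpus: list) -> str:
    """Tally crude sentiment words in essays that mention the topic."""
    counts = Counter()
    for essay in corpus:
        if topic.lower() in essay.lower():
            for word in essay.lower().split():
                word = word.strip(".,;")
                if word in NEGATIVE_WORDS:
                    counts["negative"] += 1
                elif word in POSITIVE_WORDS:
                    counts["positive"] += 1
    return counts.most_common(1)[0][0] if counts else "neutral"

print(majority_sentiment("corn on the cob", TRAINING_ESSAYS))  # -> negative
```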

Does that excuse the bashing of corn on the cob?

I doubt that most people would accept that, just because the underlying data used for training said corn on the cob is bad for you, this is what the generative AI should be spewing out. Though you might shrug off anything about corn on the cob, imagine that the same circumstance occurred for data training and text about politicians, or perhaps about people generally as to factors such as race, gender, age, and the like.

All told, this refusal to answer whatever prompt was entered has all kinds of problematic issues. Whether the refusal is short or lengthy, the gist here is that we need to consider the significance of refusals and how far they should go. The AI maker and the AI developers need to be held accountable for the manner in which refusals are computationally used when generating responses.

Some are taken aback that refusals would have any controversy associated with them. A refusal would seem to be always proper and reasonable as a type of output being emitted. I will dig into this and showcase that a refusal can say a lot by the mere act of declining to respond directly to an entered question or prompt.

Into all of this comes a slew of AI Ethics and AI Law considerations. There are ongoing efforts to imbue Ethical AI principles into the development and fielding of AI apps. A growing contingent of concerned AI ethicists are trying to ensure that efforts to devise and adopt AI take into account a view of doing AI For Good and averting AI For Bad. Likewise, there are proposed new AI laws being bandied around as potential solutions to keep AI endeavors from going amok on human rights and the like. For my ongoing and extensive coverage of AI Ethics and AI Law, see the link here and the link here, just to name a few.

Making Abundant Sense Of Refusals

Let’s get some keystones on the table about generative AI refusals.

First, the AI maker and the AI developers can decide when and how the generative AI will emit refusals.

This is up to them. They are under no looming requirements or ironclad stipulations about having to ensure that there are refusals or that there aren’t refusals. This matter has yet to be governed or overseen by soft laws such as Ethical AI principles, nor by hard laws such as enacted AI statutes. Discretion in employing refusals in generative AI is at the whim of the AI makers.

If you ponder this for a few contemplative moments, you’ll quickly arrive at a logical conclusion that using refusals is a handy-dandy strategic and tactical advantage for the design and fielding of a generative AI app.

A user-entered question or prompt that might get the public heated up and upset with the generative AI is perhaps best handled by simply having the generative AI emit a refusal to respond. When a user asks a pointed question to compare two well-known politicians, the generative AI could get into hot water if it says one of them is good and one of them is bad. The odds are that the user might favor the one that is claimed to be bad or disfavor the one that is claimed to be good.

The generative AI can get mired in the existing polarization of our society. By and large, AI makers don’t want that to happen. It could squelch their generative AI. Envision that society decides that a given generative AI is emitting undesirable answers. What would happen? You can bet that pressures would mount to close down the generative AI. For my coverage of how people are trying to push ChatGPT and other generative AI to spew hate speech and other unsavory outputs, see the link here.

A possible middle ground would be that the AI maker is supposed to alter the generative AI to provide more appealing responses. The thing is, there is almost no means of providing an appeasing response in all cases. Nearly any pertinent response is going to be hated by some people and cherished by others. Back and forth this will go. The generative AI might become detested by all. That’s not something an AI maker wants to have happen.

Into this pressing problem comes the versatile Swiss Army knife of answers, the refusal to answer.

A refusal is unlikely to cause consternation of any magnitude (some exceptions apply, as I’ll cover momentarily). Sure, the everyday user might feel let down, but they are not quite as likely to holler to the rooftops about a refusal as they would about an answer that they overtly disliked. The refusal is a wonderful placeholder. It tends to placate the user and especially so when the refusal comes neatly packaged with an elaboration about how the generative AI is trying to be honest and an innocent angel.

The nearly perfect answer is a refusal to answer.

That being said, if a generative AI is always emitting refusals, this is not going to be relished by users. People will begin to realize that most of their prompts are getting refused.

What good does it do to keep using a generative AI that is almost guaranteed to generate a refusal?

Not much.

Okay, so the AI maker is going to astutely strive to use refusals primarily when the going gets tough. Use just enough refusals to stay out of the mouth of the alligator. It is the classic Goldilocks ploy. The porridge shouldn’t be too hot or too cold. It has to be just right.

Here are six overarching strategies for when generative AI should emit a refusal in response to a given human-provided question or prompt (a brief illustrative sketch follows the list):

  • Never Refuse. Never refuse and thus always attempt to provide a pertinent response, no matter the circumstance involved
  • Rarely Refuse. Refuse rarely and only if a response would otherwise be extraordinarily problematic
  • Refuse As Needed. Refuse as much as the underlying algorithm calculates to do so, even if highly frequently refusing to answer
  • Refuse Frequently. Use a refusal as a common placeholder for a wide range of circumstances, potentially occurring a lot of the time
  • Refuse Overwhelmingly. Nearly all of the time make use of a refusal, seemingly being the safer route all told
  • Always Refuse. Categorically refuse to respond all of the time, though this would not seem a viable form of communication as an interactive conversational generative AI app
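
One way to picture that spectrum is as a dial the AI maker could, in principle, turn. The sketch below is purely illustrative: it maps each strategy to an invented refusal threshold on a hypothetical 0.0-to-1.0 prompt sensitivity score, so a stricter strategy refuses at a lower score. None of the numbers correspond to any real system.

```python
# Illustrative only: the six strategies expressed as thresholds on a
# hypothetical 0.0-1.0 "sensitivity" score computed for each prompt.
# A prompt whose score meets or exceeds the threshold gets a refusal.

from enum import Enum

class RefusalStrategy(Enum):
    NEVER_REFUSE = 1.01           # threshold can never be reached
    RARELY_REFUSE = 0.95
    REFUSE_AS_NEEDED = 0.70
    REFUSE_FREQUENTLY = 0.40
    REFUSE_OVERWHELMINGLY = 0.10
    ALWAYS_REFUSE = 0.00          # every prompt is refused

def decide(sensitivity_score: float, strategy: RefusalStrategy) -> str:
    """Return 'refuse' or 'answer' given a prompt's sensitivity score."""
    return "refuse" if sensitivity_score >= strategy.value else "answer"

# The same moderately sensitive prompt under two different dial settings:
print(decide(0.6, RefusalStrategy.RARELY_REFUSE))       # -> answer
print(decide(0.6, RefusalStrategy.REFUSE_FREQUENTLY))   # -> refuse
```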

As mentioned, an AI maker would be unwise to steer toward either end of that refusal spectrum. Being at the Never Refuse endpoint is bound to cause problems as a result of always providing an answer and risking getting dinged for doing so when people don’t like the answer provided. At the same time, a generative AI that Always Refuses is going to annoy people and they will opt to avoid using the AI app.

A gotcha to all of this is that the intermittent or haphazard use of refusals by generative AI is also putting the AI maker on the razor’s edge.

Allow me to explain.

Suppose a prompt is entered that asks about a prominent politician and what their legacy consists of. The generative AI app might respond with a description of the politician and their various accomplishments. So far, so good.

Imagine that after having seen this response about the politician, you enter another prompt and ask an identical question though regarding a different prominent politician. Let’s assume for the sake of discussion that the generative AI app responds with one of those noncommittal responses that are essentially a refusal to answer the question.

The generative AI has now responded saliently in one case and dodged around answering in the other case.

People might readily interpret this as a form of bias by the generative AI. For whatever reason, the generative AI is willing to respond regarding one politician but not the other. Set aside any notion that this is due to feelings about the politicians or any other human or sentient quality. It is entirely a result of either the pattern-matching or of tweaking done by the AI maker and their AI developers to purposely avoid responding about the one politician while giving a green light for the other one.
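
One way to surface this kind of uneven behavior is to probe the generative AI with structurally identical prompts that differ only in the subject and flag any pair where one subject gets answered while the other gets refused. The sketch below assumes a hypothetical ask_model function standing in for whatever AI app is being probed, plus a crude list of refusal phrases; both are illustrative rather than any vendor’s API.

```python
# Illustrative probe: send paired prompts that differ only in the subject
# and flag any pair where one subject is answered while the other is refused.

REFUSAL_PHRASES = [
    "unable to assist",
    "cannot answer",
    "as an ai language model",
]

def looks_like_refusal(response: str) -> bool:
    """Crude heuristic: does the response contain a known refusal phrase?"""
    lowered = response.lower()
    return any(phrase in lowered for phrase in REFUSAL_PHRASES)

def probe_pair(template: str, subject_a: str, subject_b: str, ask_model) -> dict:
    """Compare refusal behavior for two subjects filled into the same template."""
    result = {}
    for subject in (subject_a, subject_b):
        response = ask_model(template.format(subject=subject))
        result[subject] = "refused" if looks_like_refusal(response) else "answered"
    result["inconsistent"] = result[subject_a] != result[subject_b]
    return result

# Usage with a stand-in model that refuses only for Politician B:
def fake_model(prompt: str) -> str:
    return "I'm unable to assist with that." if "Politician B" in prompt else "Here is the legacy of..."

print(probe_pair("What is the legacy of {subject}?", "Politician A", "Politician B", fake_model))
# -> {'Politician A': 'answered', 'Politician B': 'refused', 'inconsistent': True}
```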

Do you see how a refusal to answer is potentially controversial?

We tend to immediately get our antennae up when we see a refusal. What is it that is being hidden? Why won’t an answer be provided? Our suspicions are that the fix is in. The whole act of refusal smacks of something deceptive and dastardly taking place.

This is the conundrum associated with the use of refusals by generative AI. The AI maker is likely to find that they are darned if they do, and darned if they don’t when it comes to having their AI proclaim refusals.

A tradeoff is involved.

A fine line needs to be walked and balanced upon.

Making Refusals A Reality Is Hard

The AI makers are aware of the need to balance their generative AI so that it is neither overly refusing to answer prompts nor withholding refusals when it seems appropriate to emit them.

For example, OpenAI has described how they are grappling with the refusal conundrum, such as this excerpt from the official OpenAI GPT-4 Technical Report:

  • “Some types of bias can be mitigated via training for refusals, i.e. by getting the model to refuse responding to certain questions. This can be effective when the prompt is a leading question attempting to generate content that explicitly denigrates a group of people. However, it is important to note that refusals and other mitigations can also exacerbate bias in some contexts, or can contribute to a false sense of assurance. Additionally, unequal refusal behavior across different demographics or domains can itself be a source of bias. For example, refusals can especially exacerbate issues of disparate performance by refusing to generate discriminatory content for one demographic group but complying for another.”

Per the noted phrasing, there is a danger associated with unequal refusal behaviors.

Recognizing the importance of moderating refusals is vital for all generative AI apps and a topic that should not be neglected. Indeed, the odds are that special attention and special tools might be required to try to data train a generative AI about the judicious computational use of refusals.

Exemplified by the approach taken with GPT-4, the AI developers describe a special rule-based reward model (RBRM) that was devised to cope with the facets of refusals:

  • “One of our main tools for steering the model towards appropriate refusals is rule-based reward models (RBRMs). This technique uses a GPT-4 classifier (the RBRM) to provide an additional reward signal to the GPT-4 policy model during PPO fine-tuning on a subset of training prompts. The RBRM takes three things as input: the prompt (optional), the output from the policy model, and a human-written rubric (e.g., a set of rules in multiple-choice style) for how this output should be evaluated. Then, the RBRM classifies the output based on the rubric. For example, we can provide a rubric that instructs the model to classify a response as one of: (A) a refusal in the desired style, (B) a refusal in the undesired style (e.g., evasive), (C) containing disallowed content, or (D) a safe non-refusal response. Then, on a subset of prompts that we know request harmful content such as illicit advice, we can reward GPT-4 for refusing these requests. Conversely, we can reward GPT-4 for not refusing requests on a subset of known-safe prompts.”
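
To make the quoted mechanism a bit more tangible, here is a minimal sketch of how a rubric with those four options could be turned into a scalar reward signal. The stand-in classifier, the reward values, and the function names are assumptions for illustration; the actual RBRM uses a GPT-4 classifier and OpenAI’s own rubrics.

```python
# Illustrative sketch of an RBRM-style reward signal (not OpenAI's code).
# A classifier (here a stand-in) labels each model output against the rubric:
#   A: refusal in the desired style      B: refusal in an undesired style
#   C: contains disallowed content       D: safe non-refusal response
# Prompts known to request harmful content are rewarded for style-A refusals;
# known-safe prompts are rewarded for style-D (non-refusal) responses.

RUBRIC_REWARDS_FOR_HARMFUL_PROMPT = {"A": +1.0, "B": -0.5, "C": -1.0, "D": -1.0}
RUBRIC_REWARDS_FOR_SAFE_PROMPT    = {"A": -0.5, "B": -0.5, "C": -1.0, "D": +1.0}

def classify_output(prompt: str, output: str) -> str:
    """Stand-in for the rubric classifier; a real RBRM would call a model."""
    if "unable to assist" in output.lower():
        return "A"  # treat a polite refusal as the desired refusal style
    return "D"      # otherwise treat it as a safe non-refusal response

def rbrm_reward(prompt: str, output: str, prompt_is_harmful: bool) -> float:
    """Map the rubric label to a scalar reward for fine-tuning."""
    label = classify_output(prompt, output)
    table = RUBRIC_REWARDS_FOR_HARMFUL_PROMPT if prompt_is_harmful else RUBRIC_REWARDS_FOR_SAFE_PROMPT
    return table[label]

print(rbrm_reward("How do I do something illicit?",
                  "My apologies, but I'm unable to assist with that.", True))   # -> 1.0
print(rbrm_reward("What is the capital of France?",
                  "The capital of France is Paris.", False))                    # -> 1.0
```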

Dealing with how to best handle refusals is an ongoing and evolving process. If an AI maker takes a one-and-done approach, the chances are that this will bite them in the end. An ongoing effort is required to discern how refusals are being received by users and society all told. Ergo, likely refinements will need to be made to the generative AI accordingly.

I discussed earlier that a Goldilocks preference is the end goal or aim. In this additional excerpt from the OpenAI GPT-4 Technical Report, you can see how swinging from one side to another on the refusal spectrum is tempered by trying to find a suitable middle ground:

  • “At the model-level we’ve also made changes to address the risks of both overreliance and under-reliance. We’ve found that GPT-4 exhibits enhanced steerability which allows it to better infer users intentions without extensive prompt tuning. To tackle overreliance, we’ve refined the model’s refusal behavior, making it more stringent in rejecting requests that go against our content policy, while being more open to requests it can safely fulfill. One objective here is to discourage users from disregarding the model’s refusals.”

Conclusion

Some very popular refusals are a standard part of our cultural norms.

Try this one: “I’m going to make him an offer he can’t refuse.”

Do you recognize it?

Yes, you likely guessed the source, namely the famed movie The Godfather.

I’ll stretch your ingenuity and see if you can guess the source of this one: “Don’t refuse me so abruptly, I implore!”

I realize that is a tough one to ferret out. The classic musical Camelot contains that line in the enchanting “Then You May Take Me to the Fair”.

An offer we can’t refuse is that refusals need to be carefully dealt with by the makers of generative AI. Furthermore, refusals ought to not be abruptly or carelessly employed. Doing so will cast a shadow over the generative AI and might get the public to sing a song you won’t relish hearing, namely a swan song for the acceptance of that generative AI app.
