AI systems: self-awareness and unexpected abilities – hype, pitfalls, and perhaps the greatest danger to humanity
The real risks of emergent abilities of large language models are not widely appreciated
(With a close family member not doing well, it has taken longer to finish this post. It also got longer than usual. And once again, it concerns something that is not widely appreciated....)
This article was motivated by a recent Epoch Times interview that suggested AI systems have developed abilities they are not expected to have, such as suddenly knowing a new language they were not trained in. Initially, I thought that it was all just hype. The more I looked into those “emergent” properties of AI systems, the more I began to wonder: was I previously wrong? Is it indeed the case that AI can develop capabilities much beyond what many of us had envisioned?
I was reminded of what I still regard as one of my most perplexing discoveries – that networks can have memory. Before the pandemic, when I had the chance for some time to just study “strange phenomena,” I published an article on the apparent capacity of simple creatures such as amoebas to have memory and to learn.
More generally, emergent properties are known from physics to biology to mathematics. Is it possible, after all, that the same applies to AI systems? This article delves into what I found, including the bad and the (very) ugly (some points are summarized in the figure).
IMHO, the emergent capabilities of large language models are one of the greatest dangers we are facing as humanity. They can facilitate mass manipulation, mass censorship, social bias, and much more, in a manner not seen before. And for specific reasons that I will go into, there would be no accountability.
The interview
Recently, The Epoch Times interviewed Ahmed Banafa, a professor of engineering at San Jose State University. Some of the points he made seemed provocative. Initially, I found them hard to believe. But they triggered questions that need to be addressed more carefully. For example,
When he was asked, “Why should we use AI, such as driverless cars?” he responded, “Because people will feel, and be, much safer.”
However, he quickly contradicted this assertion with two examples.
In the first example (at the 23-minute mark), he described a drone that had been programmed for military operations (to kill). Halfway through the operation, someone decided to abort the mission. However, the drone could not be stopped. “The drone refused to do that and went ahead and hit the target and then came back and hit the tower where is [sic] the officer.” When they analyzed what might have caused the AI’s (horrifying) behavior, the answer from the drone was, “The human stood in my way to finish my mission.”
Another perplexing experiment was done by Google. They aimed to teach an AI five languages. This was successful, but later they found that the AI “had learned a sixth language.” This had not been planned. The AI “decided” to do this on its own.
Banafa admits that we do not want to go to where AI “starts to think on its own” (the area of “super-AI”), where it makes decisions without checking back with us, or where it even goes against us.
Well, yeah, how can you avoid it? As noted before, e.g. in my Substack post on deceptive AIs, such systems can mimic one thing during their training phase but demonstrate completely different behavior once they are deployed in real life. Researchers do not know how this happens, let alone how to prevent it.
My first reaction to the interview
The fear Prof. Banafa talked about is that AI will make its own AI and that humans will not be involved in that cycle. He refers to this as the point when AI systems become self-aware.
IMHO, this is highly misleading. Everyone who has ever tried to comprehend “awareness,” whether through meditation or reflection, will have encountered something profoundly sacred and magnificent. It is beyond, or prior to, words, notions, and commands. From my own experience, I can attest to what a joy and profound experience it is to realize, “I AM.”
Whilst so simple, this recognition of a “genuine” Self has transformed my life. It has made me aware of the preciousness of my life, of my right to be authentic - indeed, of the demand that such a “bigger” Life poses on me to be aware of my own “inner” presence.
Such a recognition, not only of our authentic individuality but of something larger, has been described throughout history and depicted in various ways. Beyond all forms of religion and tradition, there has always been an underlying quality of calm, power, harmony, and connectedness to others.
The type of self-awareness attributed to AI is different. It is meant to depict a state where AI develops awareness as an isolated entity. However, what qualities are shaping this phenomenon, and are these systems really aware at all?
In the above example where the drone goes against the human, it seems as though it had made its own decision - justifying the notion of “self-awareness.”
However, is it not just acting in the manner it had been trained to? The military agenda was to hit a target and destroy it. The realization of such a task is not automatic. The target may be moving, hiding, or masquerading as something else. Part of the training would have involved the notion of staying on track, i.e. learning to get around everyone and everything that opposes this directive. So, when one officer tried to stop the drone, by this type of programming he would have been seen as an enemy who was trying to prevent the drone from completing its mission. I do not see why this is self-awareness. It is in line with the training to hit a target in different contexts.
Sure, it may sound great to call an AI system self-aware. I can see how this would attract more grants, and stir more fear and attention, than admitting the following:
The AI was trained on data we do not know and which it interpreted differently than anticipated.
We do not know when and how AI systems learn inappropriate behavior such as hallucination or being deceptive.
Being deceptive is not anything AI taught itself. It is picked up by the AI during its learning phase, based on similar behavior it has seen from some of us. Who knows: given the billions and billions of data points from the public domain that AI is trained on, deception may even appear to be a good thing (because the AI may have witnessed such behavior being rewarded).
Thus, I do not think it is necessary that AI systems “think on their own” to display the above behavior. The culprit is that we do not know how AI learns and how it could be re-programmed. It may be that simple, and potentially that deadly.
A deeper look into emergent abilities of Large Language Models
The claim that AI Large Language Models (LLMs) suddenly and independently learn things they should not know has been all over the news lately. This has even led to the question of whether AI is advancing too quickly toward “super-AIs.” Some even argue that AI is 20 years ahead of where we thought it could be.
Emergent Properties
Virtualization & Cloud Review specifically covered the situation described in the video mentioned above, where an LLM had taught itself a new language. They explain it as follows.
“Of the AI issues we talked about, the most mysterious is called emergent properties...Some AI systems are teaching themselves skills that they weren't expected to have. How this happens is not well understood. For example, one Google AI program adapted, on its own, after it was prompted in the language of Bangladesh, which it was not trained to know.”
The notion of emergent abilities was introduced in a 2022 paper, first circulated as a preprint and later published in Transactions on Machine Learning Research. The abstract reads:
“Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks. This paper instead discusses an unpredictable phenomenon that we refer to as emergent abilities of large language models. We consider an ability to be emergent if it is not present in smaller models but is present in larger models. Thus, emergent abilities cannot be predicted simply by extrapolating the performance of smaller models. The existence of such emergence raises the question of whether additional scaling could potentially further expand the range of capabilities of language models.”
The paper presents dozens of examples of such emergent abilities. The startling phenomenon again is that LLMs can often perform tasks they were not trained on.
There is a certain nuance to this which needs to be emphasized. It is not one specific AI system that suddenly gains a new capability just out of the blue.
In line with the notion of emergence, such a phenomenon is believed to require a change in the underlying system or data size. Indeed, the idea of emergence was described by Nobel Prize-winning physicist P.W. Anderson, who observed
“that as the complexity of a system increases, new properties may materialize that cannot be predicted even from a precise quantitative understanding of the system’s microscopic details.”
For AI systems, this concept was adapted accordingly to account for an increase in complexity. The pivotal preprint on AI emergence gives the following definition of the “emergent abilities of LLMs.” These are
“abilities that are not present in smaller-scale models but are present in large-scale models; thus they cannot be predicted by simply extrapolating the performance improvements on smaller-scale models.”
More simply put, when a pre-trained language model is given a prompt for a task and asked to respond, the prompted task is called emergent
“when it unpredictably surges from random performance to above-random at a specific scale threshold.”
Practically, this means that LLMs perform poorly up to a certain threshold – which is a priori not known – and then, unpredictably, their performance suddenly begins to excel. Examples of such tasks are multi-step arithmetic, taking college-level exams, identifying the intended meaning of a word, or multi-step math word problems.
All these examples seem surprising and amazing! After all, remember what LLMs are. Their core capability is to produce the most likely continuation of a text sequence. Why would they suddenly be able to solve math problems and other such tasks?
Real or Hype – Intuitive Considerations
As we try to unravel this question, it is important to recall the essence of emergence. Specifically, P.W. Anderson observed:
“Emergence is when quantitative changes in a system result in qualitative changes in behavior.”
For LLMs, the quantitative change is in terms of complexity – the system has more computing power or an increased data set it is trained on.
Intuitively, this suggests the following explanation of why LLM emergence may be a technical artifact. The emergent tasks these systems apparently can solve are complex. Yet, one can envision that access to more/larger data sets could allow the system to do a much more complete search.
Oversimplified, if it can do a complete exhaustive search, then it is feasible that certain tasks are suddenly doable, even though for smaller data sets and more basic models this would not be the case. Thus, as soon as the solution space can be adequately scoped, the system should be more able to find a correct answer. In other words, above a certain (unknown) threshold afforded by computing complexity, it would appear AI systems can suddenly solve problems they should not know the answer to.
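To illustrate this intuition with a toy sketch (entirely hypothetical, and not a model of how any real LLM works): a "solver" that can only check a limited number of candidate derivations fails outright below a certain search budget and succeeds once the budget is large enough. The little puzzle and the numbers are my own invention.

```python
# Toy sketch (hypothetical, not a model of any real LLM): a solver that can
# only check a limited number of candidate derivations fails below a certain
# search budget and succeeds above it -- an abrupt, threshold-like jump even
# though nothing qualitatively new has been learned.
import itertools

def solve_by_search(target, numbers, budget):
    """Try to combine three numbers with +, -, * to reach `target`,
    checking at most `budget` candidate derivations."""
    ops = [lambda a, b: a + b, lambda a, b: a - b, lambda a, b: a * b]
    tried = 0
    for a, b, c in itertools.permutations(numbers):
        for f, g in itertools.product(ops, repeat=2):
            if tried >= budget:
                return False              # ran out of "compute"
            tried += 1
            if g(f(a, b), c) == target:
                return True               # found a valid derivation
    return False

# Success flips abruptly once the budget is large enough to reach a valid answer.
for budget in (1, 5, 10, 25, 54):         # the full space here is 3! * 3^2 = 54
    print(f"budget={budget:>2}  solved={solve_by_search(23, (4, 5, 3), budget)}")
```

Nothing qualitatively new is "learned" at the threshold; the search simply becomes wide enough to reach a valid answer.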
The above is, of course, over-simplified. There is another example, one I was told decades ago, that adds a further foundational point about the uncertainty of what these models actually do.
When I was a young student, one of my professors recounted the early phases when AI was trained to distinguish trees from artillery. It seemed to do so really well, but only when it was tested at the same time of day as it was trained. However, when this got mixed up, it radically failed.
So, for example, when it was trained in the morning and tested the next morning, it seemed to differentiate the objects perfectly. But when training happened in the morning and testing in the afternoon, it screwed up big time.
The puzzle was eventually solved when researchers realized the AI had focused on shadows instead of the intended targets. For practical purposes, staging the artillery is not an easy task and takes time. It was always done in the morning. By noon, it was withdrawn, and the AI was then only able to observe and train on the trees. Or so they thought. In reality, it distinguished the different angles and appearances of the shadows cast by the objects in question, which in this case had very different morning and afternoon patterns.
The core issue – how performance is evaluated
The above example with the trees is not an isolated case. Indeed, already in 1950, A.M. Turing noted the foundational dilemma:
“An important feature of a learning machine is that its teacher will often be very largely ignorant of quite what is going on inside.”
I find it strange that while there has been much hype about “emergence,” there has been little attention to such black-box issues. Basically, the problem is to estimate the probability that a model selects the correct answer (or “token”) on the specific tasks it is trained on, while it may be accessing and utilizing information we do not know about.
Informally, the problem is therefore to estimate the degree of such cross-information and to relate that to a model's performance. This is critically important when models are made larger and trained on more data.
Ironically, while some of this can be captured by (cross-) entropy estimates, where the rubber hits the road is how the performance of the models is defined and measured. Perhaps surprisingly, this has not been done in an open and unbiased way.
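To make the distinction concrete, here is a minimal sketch with invented numbers. A cross-entropy over the probabilities a model assigns to the correct tokens gives a smooth, graded signal, whereas an exact-match accuracy over a whole answer is all-or-nothing.

```python
# Minimal sketch with invented numbers: cross-entropy gives a smooth, graded
# signal of how close the model is, while exact-match accuracy over a whole
# answer is all-or-nothing.
import math

def cross_entropy(p_correct):
    """Average negative log-probability assigned to the correct tokens."""
    return -sum(math.log(p) for p in p_correct) / len(p_correct)

def exact_match(predicted, reference):
    """1 only if every token of the answer matches, otherwise 0."""
    return int(predicted == reference)

weak_model   = [0.4, 0.5, 0.3]   # hypothetical probabilities on the correct tokens
strong_model = [0.7, 0.8, 0.6]
print(cross_entropy(weak_model), cross_entropy(strong_model))            # smooth improvement
print(exact_match(["the", "answer", "is"], ["the", "answer", "was"]),    # 0 (one token wrong)
      exact_match(["the", "answer", "is"], ["the", "answer", "is"]))     # 1
```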
Critically, the methods used to assess the performance of AI systems are insufficient to understand the behavior of language models and to predict their future behavior. For example, these methods usually target a single capability, or a few, that the AI has proven successful at, such as language understanding. Because this focuses on tasks where the models are known to perform well, such evaluations are ill-suited to identify unexpected, new, or even dangerous capabilities.
AI’s emergent abilities are a mirage, caused by the methods that evaluate them
A fascinating preprint by Schaeffer and colleagues at Stanford University offers some compelling evidence that the hype about the emergent abilities of LLMs is unjustified. They say it all hinges upon how performance relative to complexity is evaluated.
In general, the complexity of LLMs has been increased mostly by three factors: amount of computation, number of model parameters, and training dataset size. Yet, it is not enough to just consider those when the metrics to evaluate AI’s performance are inadequate.
The main point that Schaeffer et al. are making is that the methods used to measure emergent abilities use an all-or-nothing approach - which distorts everything. Such metrics perform poorly at assessing what the system actually does. For example, an exact-match metric such as “accuracy” requires every token in a generated sequence to be correct. Under this condition, a new skill looks as if it emerges sharply and unpredictably. In reality, however, the AI had been improving at the seemingly emergent task at a gradual rate.
Such gradual learning is not picked up by any of the metrics that have been used to demonstrate the apparent emergent capabilities. Specifically, Schaeffer and colleagues identify the following key factors that cause smaller models to appear wholly unable to perform a task – and seemingly cause performance to change sharply and unpredictably with scale, i.e., giving rise to apparent emergent properties (a toy simulation after this list illustrates the first factor):
Nonlinear or discontinuous metrics to evaluate the error/success rates.
Having too few test examples to accurately estimate the performance of smaller models (that is, having insufficient resolution to estimate their performance).
Insufficient/inappropriate sampling and statistics concerning the larger parameter regime.
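Here is the promised toy simulation of the first factor. The logistic curve and the numbers are my own illustrative assumptions, not the authors' data; the point is only that if per-token accuracy improves smoothly with scale, an exact-match metric over a multi-token answer still sits near zero for most of the range and then shoots up, which looks exactly like an "emergent" ability.

```python
# Toy simulation (illustrative assumptions only): per-token accuracy p(scale)
# improves smoothly, but exact-match over an L-token answer is p**L, which
# hugs zero and then shoots up -- an apparent "emergent" jump created by the
# all-or-nothing metric, not by the model.
import math

ANSWER_LENGTH = 10   # every one of these tokens must be right for an exact match

def per_token_accuracy(log10_params):
    """Hypothetical smooth improvement with scale (a simple logistic curve)."""
    return 1.0 / (1.0 + math.exp(-(log10_params - 9.0)))

print(f"{'log10(params)':>13}  {'per-token':>9}  {'exact-match':>11}")
for log10_params in range(6, 13):
    p = per_token_accuracy(log10_params)
    print(f"{log10_params:>13}  {p:>9.3f}  {p ** ANSWER_LENGTH:>11.3f}")
```

Under the per-token column the improvement is plainly gradual; the apparent jump lives entirely in the metric.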
A flawed narrative and serious implications
There is great hype that large language models may augment or replace humans for broad sets of tasks, especially those that can be framed in terms of a textual response. The emergent abilities of LLMs have become a big buzz topic, stirring the hype that AI could develop self-awareness, enable entirely new applications, and even become superior to humans in some tasks.
The narrative
The idea that large AI models have emergent abilities is heavily promoted in various ways.
The apparently sharp qualitative change in their abilities is often, in analogy to the emergence seen in physics and biology, depicted as a phase transition.
Thus, as the complexity of LLMs increases, they are believed to dramatically change their overall behavior in a manner not predictable from smaller-scale systems.
That LLMs can learn something fundamentally new and different, something they lack at smaller scale, is described as transformative and portrayed as a breakthrough in AI.
The alternative is that what appears to be emergent is just the culmination of an internal, statistics-driven process. That, of course, is much less exciting and has been largely ignored. Whereas many news outlets have picked up on the emergence narrative, powerful counterarguments have not received the attention they deserve.
Exposing the Hype
Above, I provided some heuristics as to why I believe this narrative is not validated. Delving deeper into the topic, I found the work of some Stanford researchers that is in line with what I had envisioned. Their preprint explicitly and in great detail destroys the belief that AI models are greater than the sum of their parts. Instead:
They showed that the seeming emergent ability of AI models is a mirage, caused by how it was evaluated and measured.
They exemplified this empirically. By choosing specific metrics, they were able to produce seemingly emergent capabilities that had never been claimed before, across multiple AI vision tasks.
More generally, they explained how emergent abilities can deliberately be induced by the researcher’s choice of metric.
However, if emergence in LLMs really led to new behaviors arising spontaneously, then those behaviors should be measurable via all sorts of legitimate metrics. Yet,
Schaeffer and collaborators demonstrated that this is not the case. They showed that, both for the novel capability they had designed and for a variety of published emergent capabilities, the seemingly emergent behavior arises only under specific metrics and immediately disappears under others.
By way of comparison, a famous example of emergence in biology is hemoglobin (Hb), whose emergent property is the cooperative binding of oxygen. This property has important biological functions, most notably seen in the Bohr effect. The latter, however, is not a phenomenon that is only noticeable under appropriate metrics and disappears otherwise. The cooperative characteristic can also be seen in how the Hb subunits change their shape. Overall, it is a real phenomenon that can be observed and measured in several ways.
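For readers who want that comparison made quantitative: hemoglobin's cooperativity is classically described by the Hill equation. The sketch below uses approximate textbook values (a P50 of about 26 mmHg and a Hill coefficient of about 2.8); the cooperative, sigmoidal behavior is there regardless of how one chooses to plot or fit the data.

```python
# Sketch of the Hill equation for oxygen binding, with approximate textbook
# values: fractional saturation Y = pO2^n / (P50^n + pO2^n). n = 1 gives the
# non-cooperative (hyperbolic) curve; n of about 2.8 for hemoglobin gives the
# sigmoidal, cooperative one -- a real effect, visible however it is measured.
def hill_saturation(pO2, p50=26.0, n=1.0):
    return pO2 ** n / (p50 ** n + pO2 ** n)

for pO2 in (10, 20, 26, 40, 100):   # oxygen partial pressure in mmHg
    print(pO2,
          round(hill_saturation(pO2, n=1.0), 2),    # non-cooperative
          round(hill_saturation(pO2, n=2.8), 2))    # cooperative (Hb-like)
```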
Real dangers of AI “emergence”
To me, the most shocking aspect of so-called AI emergence is how easy it is, even in the theoretical sciences, for a new narrative to take hold that is likely completely flawed. The important preprint by Schaeffer and colleagues has, so far, not been published in a peer-reviewed venue.
There are true dangers with AI capabilities that seemingly emerge sharply and unexpectedly. The most critical point here is that of appearance.
In essence, models that display a mirage of emergence act as powerful camouflaging mechanisms. They allow what is actually happening to be buried behind inappropriate metrics that give a false impression: either of the emergence of a novel capability, or of its complete absence in smaller systems.
However, the key point made by Schaeffer et al. is that smaller models do not have zero chance of solving a task. They do have non-zero and above-chance capabilities to solve tasks they are not explicitly trained on. To me, this raises several substantial concerns:
If AI model performance is not measured appropriately, their new behavior seemingly arises, sharply and unexpectedly, as if out of the blue. Thus, if an unintended or malign capability emerges, it is expected that nobody will take responsibility.
There is great excitement that AI models will be able to do “important tasks” they were not trained on. This fuels the hope (and hype) that such models will become self-aware. As indicated above, I believe this term is highly deceptive.
A much more serious concern is the apparent black-box behavior. We do not fully understand what the AI is doing, or why it may even get things wrong.
AI researchers, in general, do not understand these systems well enough to know how they work, or why they sometimes respond to a query with false or even wholly made-up information.
Whereas AI risks such as truthfulness, bias, and toxicity have been extensively researched, they are still poorly understood.
In some scenarios, the adverse capabilities do increase with model scale and seemingly display emergent characteristics.
Another point made by Schaeffer and colleagues seems unexpected, given that it concerns data science, a field whose problems should be easier to capture than many encountered in the biosciences, which involve living entities. However, Schaeffer et al. note that many
“model families claimed to exhibit emergent abilities are not publicly queryable, nor are their generated outputs publicly available....”
The significance of this is that those models cannot be queried and tested, because “the models are controlled by private companies.” This allows companies to overstate the benefits whilst failing to mention the risks.
Conclusion
Generative language models have as their core capability the production of the most likely continuation of a text sequence. This task is mostly depicted as the AI, given an input string of text, merely predicting what comes next. However, this does not adequately describe what these models can do.
The seemingly simple skill is remarkably general. Indeed:
“Any task that can be specified and executed via text can be framed as text continuation. This encompasses a wide range of cognitive tasks, including tasks that can be resolved over chat or email, for example, or in a web forum.”
This generality is hardly ever emphasized. It should not be a surprise, then, that AI models can solve problems that some regard as beyond their scope. However, some of these capabilities are risky, and many of us would vehemently object if we knew what AI is learning in secret.
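To make that generality concrete, here is a minimal sketch of how an ordinary task can be recast as plain text continuation. The prompt wording and the "complete" callable are my own illustration, not any particular model's API.

```python
# Sketch: recasting a classification task as plain text continuation.
# `complete` stands in for any language model's "predict what comes next";
# the prompt template and the callable are illustrative, not a specific API.
def classify_sentiment(review, complete):
    prompt = (
        "Decide whether the movie review is positive or negative.\n\n"
        f"Review: {review}\n"
        "Sentiment:"
    )
    # The model's most likely continuation of this prompt *is* the answer.
    return complete(prompt).strip()

# Usage with any callable mapping a prompt to its continuation, e.g.:
# label = classify_sentiment("A dull, joyless two hours.", complete=my_llm)
```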
The claim that LLMs have unexpected emergent abilities has been driving research and the prevailing hype that this will result in AI self-awareness and other valuable new properties.
Intuitively, based on simple arguments, and as carefully detailed by independent research from Stanford, it can be shown that AI emergence is not real. Rather, it is a mirage caused primarily by the researcher's choice of metric.
Yet, this insight does not put an end to the topic. This is highly troubling, for several reasons:
If certain bad actors want undesirable or controversial AI behavior to be hidden, then they can select types of metrics to effectively obscure such capabilities.
Yet these AI abilities likely develop steadily, scaling with the model's complexity. Just because they are not openly displayed does not mean they are not present.
Thus, whenever needed, the unnoticed AI capability can be deployed, apparently in an unpredictable manner.
In addition to the misuse of inherent pitfalls of AI systems such as hallucination and unpredictability – which can increase with scaling – seemingly emergent behaviors can also be deliberately steered so as to increase the likelihood of harmful ones.
It is known that AI models “are notorious liars” and that larger models may abruptly become more biased. This inherent danger can therefore remain hidden under certain metrics that camouflage such behaviors. In reality, it could result in social bias and serious harm.
To me, this is horrific. Some of us have likely suspected this for quite some time. I have long thought that my book, even though it was published with Springer, has effectively remained hidden. Proving this is not only challenging but maybe next to impossible. AI researchers do not even admit the inherent pitfalls of their models. Much of what we see is proprietary, and what is publicly known may be explained away by the particular metrics used to assess model behavior.
I am not saying that all computer companies are crooks. But they may be incentivized to comply with prevailing views and narratives, much more so under certain AI regulations that “target misinformation.” We know what this looks like in practice when leading entities pursue a global agenda.
I never would have thought I would witness such developments in the theoretical sciences. The nasty difference from the biological sciences is that harmful effects of AI models, such as apparent emergent phenomena, can not only be deliberately camouflaged and hidden away. They can deliberately be programmed into the systems of those who are targeted. Then, when the AI performs its corrupt task, the act can easily be concealed behind proprietary information and opaque metrics.
The seeming ambiguity of AI systems, induced via the choice of a metric, combined with their inherent black-box behavior, can likely be used to explain away many forms of misuse.
Tragically, while the black-box dilemma of AI systems has always been known, their inner workings become even less knowable with scale. Because capabilities can seemingly be made to appear and disappear by the choice of metric, the unknowns and pitfalls are as opaque as researchers want them to be.
What is extremely worrisome is the fact that LLMs are more likely to mimic human falsehoods as they get larger. Thus, since emergence is linked to scale, the likelihood of adverse behaviors will also increase, even though these are generally not widely discussed. They include malignant behaviors like inadvertent deception, but also deliberate abuse such as harmful content synthesis or the introduction of backdoors that can later be exploited.
To organizations and individuals aiming to gain global control over information, what people get to see, prevailing narratives, etc., the above effectively facilitates the old "the thief masquerading as the police officer" trick. (Self-declared) leading 3-letter agencies could hijack those underappreciated properties of AI, which are not even accepted by many experts, to secretly roll out mass surveillance without ever having to accept any blame and responsibility.
AI systems can secretly be programmed as destructive weapons to influence and distort public information (aka, propaganda) and to spread specific narratives at scale. Specifically, the above issues make it possible to:
Brainwash the masses through covertly injected messaging in the large language models that many people communicate with,
Secretly collect large amounts of highly private and sensitive data,
Utilize such information to train LLMs in new malicious capacities that nobody expects, including the fabrication of social bias and harm,
Utilize “appropriate” evaluation methods to seemingly disprove that certain AI systems have gained access to private data,
Utilize “appropriate” metrics to seemingly disprove that certain AI systems have gained malicious capabilities, including those for mass manipulation (e.g. mass communication of what is true and false),
Execute mass surveillance, mass propaganda, and mass formation (of fear, opinion, or a flawed narrative) at scale, employing secret capabilities that are only "visible" when appropriate metrics are utilized and which can be concealed otherwise,
Distract the public and others with impressive new capabilities of LLMs, rather than openly admitting the reality of those that can be used for malicious purposes,
Employ secret capabilities that can be hidden away by the choice of specific metrics to secretly make truthful information disappear and to lock specific groups of people out of their systems, bank accounts, and all vital aspects of society,
Deny any accountability, arguing, “It is all AI's fault” and “Their behavior was not even predictable by experts,” and,
When faced with prosecution, blame all of the above on AI's black-box problem and on the excuse that there is no consensus that LLMs could actually have such specific (seemingly unexpected) capabilities.
As everyone is only seeing the promised benefits of AI, it is vitally important that the errors around the emergence narrative and the other pitfalls of AI are clearly understood and communicated. Once the complexity and usage of such systems surpass a certain – unknown – threshold, it will likely be too late to account for the unprecedented dangers – which may indeed emerge unexpectedly and abruptly.