The Democratization of Generative AI
“Any sufficiently advanced technology is indistinguishable from magic.”
Arthur C. Clarke
Because of their ease of use, Bard and ChatGPT have democratized the field of artificial intelligence (AI). AI simply tries to predict. The role of generative AI is to predict only the next correct word (in context) when answering a question. But these apparently simple beginnings make available to almost everyone most of the world’s knowledge, and misinformation, that exists on the internet.
To discuss how generative AI works, and then to assess its societal implications, is to begin with its philosophy, its math and programming, its business use, and then its implications for the rest of society. Our readers can be assured that this discussion, and its math, will remain general. We hope that the primary motivation of our readers here is simple curiosity. The investment implications of the following remain positive, but are still undetermined in their specifics.
The Philosophy
There are two theoretical ways of organizing information and societies. These can be top-down, from first principles and with leaders chosen by heredity; or bottom-up, from the information itself and with leaders chosen by true popular election. In cognition, there is a diametrical difference between the Platonic view that there is an ideal and secure world (for instance the world of ideal horses) that is accessible only to properly trained philosophers, and the English empirical view that the mind is a tabula rasa, a blank slate, on which are impressed the details of varied human experiences. In practice, the ideal Platonic societies are simple in structure; and democratic societies, with their voluntary organizations, diverse interests, and mixtures of traditional and modern practices, are complicated – and confusing to those with politically totalitarian tendencies who yearn for a controlled uniformity.
The Math
Those who would rather forget the math they learned in high school can skip this section and the one following.
Computers do not understand letters; they understand numbers. Therefore, any operation a computer performs involves math; however, the math we need for this discussion is very simple, simpler than the many programming structures necessary to make generative AI work:
· A vector is simply an ordered sequence of numbers, with no necessary spatial interpretation.
· A matrix is a collection of vectors, whose columns have consistent representations and whose rows are individual vectors. For instance, the two row vectors [1, 3] and [2, 1] form the matrix:
A = [1, 3
     2, 1]
· The product of two matrices, C = A x B, exists only if the number of columns of A equals the number of rows of B.
· Crucially, the dot product of A with B defines the similarities, alignments or context between matrices A and B. That is, A·B = A x Bt, where Bt is the transpose of B. A transpose is simply the mirror image of a matrix around its unchanged diagonal; for the matrix above:
At = [1, 2
      3, 1]
To further explain “similarities”: two vectors within a single matrix, say V and Z, can be interpreted to have spatial dimensions, say x and y, with an angle theta between them. Then V·Z = magnitude of V x magnitude of Z x cos theta. If you remember your high school trig, cos theta varies between -1 and 1 depending upon the value of the angle theta. Thus the dot product also measures the similarity between two vectors: it is large and positive when they point in approximately the same direction.
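To make the arithmetic concrete, here is a minimal sketch in Python with NumPy (our illustration; the essay itself uses no code). It takes the transpose of the small matrix above, forms the product A x At, and computes the cosine of the angle between the two row vectors [1, 3] and [2, 1].

import numpy as np

# The matrix from the example above: two row vectors, [1, 3] and [2, 1].
A = np.array([[1, 3],
              [2, 1]])

# The transpose mirrors A around its unchanged diagonal.
print(A.T)          # [[1 2]
                    #  [3 1]]

# A times its transpose: entry (i, j) is the dot product of row i with row j,
# so the result holds all the pairwise "similarities" at once.
print(A @ A.T)      # [[10  5]
                    #  [ 5  5]]

# Cosine of the angle between the two rows, from V.Z = |V| |Z| cos(theta).
V, Z = A[0], A[1]
cos_theta = np.dot(V, Z) / (np.linalg.norm(V) * np.linalg.norm(Z))
print(cos_theta)    # about 0.707: the vectors point in broadly the same direction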
That’s all the math we need. Now to the neural network. Ordinary computer programs are organized thusly:
Data -> A Centralized, Hand-Coded Function f(x) -> A Prediction Given the Data
A neural net develops the prediction function all by itself by first dividing the data, as in a simple regression, into a training set and a test set. A neural net first develops a massive training set using data from the internet. Then it calculates a prediction, like the answer word that follows, from the test set - which may be your question to Bard or ChatGPT. This is a simple neural net:
Generative AI produces an answer, word for word, from the proper math vector context. For instance, a foreign policy problem is different from a math problem.
Test Set -> Trained Neural Net -> A Prediction Given the Test Set
Neural nets are only very loosely modeled after the networks of neurons in the human brain. The human brain contains around 100 billion neurons of more than 3,000 different kinds. Generative AI systems, such as Bard, have around 500 billion to 1 trillion trained parameters connecting artificial neurons of a single kind.
This is a very simple computer neural net program with an input, an output, and trained weights among the nodes that minimize output errors for a given set of inputs. If you ever studied OLS regression, this will be familiar. Note the intermediate hidden layer(s), which in larger nets, to make a Bard music analogy, add chord progressions and rhythm to the basic melodies. The nature of the intermediate layers is a topic of causal AI research. More about this later.
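For the curious, here is a minimal sketch, in Python with NumPy, of the kind of network just described: a training set and a test set, one hidden layer, and weights adjusted step by step to reduce the squared output error. It is only an illustration of the idea; the data and network sizes are made up, and real systems such as Bard are vastly larger.

import numpy as np

rng = np.random.default_rng(0)

# A toy data set: predict y from two inputs, then split it into
# a training set and a test set, as described above.
X = rng.normal(size=(200, 2))
y = (np.sin(X[:, 0]) + 0.5 * X[:, 1]).reshape(-1, 1)
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

# One hidden layer of 8 nodes; W1 and W2 are the trainable weights.
W1 = rng.normal(scale=0.5, size=(2, 8))
W2 = rng.normal(scale=0.5, size=(8, 1))

def forward(X):
    hidden = np.tanh(X @ W1)        # the "hidden layer"
    return hidden, hidden @ W2      # the output (a prediction)

# Train by gradient descent: nudge the weights to reduce squared error.
lr = 0.05
for step in range(2000):
    hidden, pred = forward(X_train)
    err = pred - y_train
    grad_W2 = hidden.T @ err / len(X_train)
    grad_hidden = (err @ W2.T) * (1 - hidden ** 2)   # tanh derivative
    grad_W1 = X_train.T @ grad_hidden / len(X_train)
    W2 -= lr * grad_W2
    W1 -= lr * grad_W1

# The trained net now makes predictions on data it has never seen.
_, test_pred = forward(X_test)
print("test mean squared error:", np.mean((test_pred - y_test) ** 2))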
Computer Programs
Before 2012, progress in AI had been slow, with tasks like classifying cat videos handled largely by hand according to the top-down model. The advent of faster computers running new neural net architectures resulted in rapid progress in the bottom-up models that have produced generative AI, notably Bard and ChatGPT. In 2017, Google researchers (Vaswani et al.) published a landmark AI paper catchily titled, “Attention Is All You Need.” The dictionary defines “attention” as “concentration of the mind upon an object.” The concentration of part of a computer program upon a word (actually a numerical token) in the context of other words allows rapid parallel processing of entire sentences (tokenized) and improved translation quality.
The paper proposed a new Transformer architecture for AI. When we first read this paper, it was confusing in the extreme. But we gradually came to understand it through an article, which began as a blog posted by (Al-Mashhadani & Baul) of the University of Central Florida, that contained the mathematics. Our notes are shown in the blue type. That blog provides further evidence that in real democracies, talent is widely distributed; and people are free to develop, as we noted in the previous essay, their “human mental faculties.”
So, to proceed. The Transformer architecture contains three main components: an encoder of the question, a neural network, and a broadly similar decoder that usually operates on another network - one trained on the internet. 1
Say you have a question. The question is first transformed, according to a word lookup table, into numbers. Then the encoder’s self-attention transforms the words of the question into three individual dense matrices (Q, K, V) that respectively represent the word, the context of each word in the entire question, and the content, where:
· Q represents the model’s current word or element focus, including the word’s familiar part of speech (e.g. subject, verb, object).
· K acts as a searchable catalog for all elements of the input sequence.
· V stores the actual content or meaning of the word to be retrieved.
The Attention mechanism then scores the importance of each word relative to the others:
Attention(Q, K, V) = f(Q x Kt) x V
where, in the Google paper, f is the softmax function applied to the scores after scaling them by the square root of the dimension of K.
Q x Kt is, importantly, the dot product between Q and K. As the Google paper says, “…dot product attention is much faster and space-efficient in practice, since it can be implemented using highly optimized matrix multiplication code.” Also, through the dot product between two matrices, we get their qualitative context.
The Transformer then uses multi-headed attention, processed in parallel, to dynamically update an existing representation of the question, one that emphasizes the most relevant words and also their relationships, while still processing the original question word by word.
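Here is a sketch, in Python, of one attention “head” of the kind just described, under the assumption that the question has already been turned into a small matrix X of token vectors; the weight matrices Wq, Wk and Wv stand in for the learned projections that produce Q, K and V. In the real Transformer, many such heads run in parallel and their outputs are combined.

import numpy as np

def softmax(scores):
    # Turn each row of scores into positive weights that sum to 1.
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(X, Wq, Wk, Wv):
    # Project the token vectors into the Q (focus), K (catalog) and V (content) matrices.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    # Q x K-transpose scores every word against every other word; scaling by
    # the square root of d_k and applying softmax gives the attention weights.
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    # Each output row is a weighted blend of the V rows: a context-aware
    # representation of the corresponding word.
    return weights @ V

# A toy "question" of 5 tokens, each an 8-number vector (real models use thousands).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(attention(X, Wq, Wk, Wv).shape)   # (5, 8): one updated vector per token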
The
following diagram illustrates this process. The sentence to be scanned is, “The
animal didn’t cross the street because it was too tired.” Generative AI scans
each word, beginning at the top right, producing Q, K, and V vectors that are
placed in the Q, K, and V matrices.
Then a decoder has direct access to the hidden layers of the neural network. Thus, with its own rather similar attention mechanisms, it can formulate, on a word-by-word basis, the proper answer from another massively trained internet neural network. That’s why generative AI works: it can address the commonality of context. 2 Bard agrees with this, adding the importance of the human-in-the-loop (now still around 20-30% of the time).
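A rough sketch of that word-by-word loop follows, in Python. The function next_word_probabilities is a hypothetical stand-in for the trained encoder/decoder stack (a real system works on numerical tokens, not whole words, and often samples rather than always taking the single most likely word).

# A sketch of the word-by-word (autoregressive) loop described above.
def generate_answer(question, next_word_probabilities, max_words=50):
    answer = []
    while len(answer) < max_words:
        # Ask the (hypothetical) trained model for the probability of each
        # possible next word, given the question and the answer so far.
        probs = next_word_probabilities(question, answer)
        word = max(probs, key=probs.get)      # greedy: take the most likely word
        if word == "<end>":
            break
        answer.append(word)
    return " ".join(answer)

def toy_model(question, answer_so_far):
    # A toy stand-in that always recites the same short answer.
    script = ["generative", "AI", "predicts", "the", "next", "word", "<end>"]
    idx = min(len(answer_so_far), len(script) - 1)
    return {script[idx]: 1.0}

print(generate_answer("How does it work?", toy_model))
# -> "generative AI predicts the next word"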
AI and the Hidden Neural Network Layers
The problems in the regulation and use of generative AI reside mainly in the fact that no one is sure how it works in the hidden neural layers. Unlike normal web browsing, where users make up their own minds and thus remain (more or less) in control, generative AI is authoritative and can be automatic, for instance, if connected to infrastructure. There might not be a “kill” switch for AI.
On
1/7/24 Fareed Zakaria asked whether people should then start “believing” AI
rather than relying on Enlightenment “understanding.” An AI expert he was
interviewing then suggested that we should consider the benefits of AI. (Very
likely, but we also started thinking of Icarus.)
We
think these hidden layers ought to be understood to an increasing extent, for
the second phase of the Enlightenment held that the only cure for ignorance is
more knowledge; thus the current system of knowledge production where
exploration occurs from the known to the unknown, with some constraint.
Further Neural Research
What’s going on in the hidden neural network layers? AI researchers (Li, Hopkins et al., 2023) investigated Othello, a simple two-person game. It starts from a position of four disks (two black, two white). The goal is to use the remaining disks to sandwich, and thus flip, the opponent’s disks. By feeding some of the game’s legal moves into a neural net, the researchers were able to use non-linear probe software to determine the salience (importance) of each piece for the next move, given a certain board state. The hidden layers were thus able to represent both the disks and their positions on the entire board.
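To show what “probing” a hidden layer means, here is a schematic sketch in Python - not the authors’ code, and with made-up activations. A small probe is trained to read a property (say, whether a square is occupied) out of hidden-layer vectors; if the probe succeeds, the property is evidently encoded there. (The Othello work needed non-linear probes; the linear probe below is only to show the idea.)

import numpy as np

rng = np.random.default_rng(0)

# Pretend hidden states: 500 examples of 16-number activation vectors.
hidden = rng.normal(size=(500, 16))

# Pretend property to recover (e.g. "is this square occupied?"); in this toy
# setup it is, by construction, readable from the activations.
labels = (hidden @ rng.normal(size=16) > 0).astype(float)

# A one-layer logistic probe trained by gradient descent.
w = np.zeros(16)
for _ in range(500):
    p = 1 / (1 + np.exp(-(hidden @ w)))              # probe's predictions
    w -= 0.1 * hidden.T @ (p - labels) / len(labels)

accuracy = ((1 / (1 + np.exp(-(hidden @ w))) > 0.5) == labels).mean()
print("probe accuracy:", accuracy)   # high accuracy => the property is encoded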
A complaint about generative AI is that it can predict only the next word in a language, but cannot represent objects and concepts. It is thus held to be only an auto-complete. This research indicates that objects and concepts are emergent properties of the network, representations likely proceeding from language itself; the network can thus explain predictions in human terms – presenting human decision makers with options.
A
founder of DeepMind, now part of Google, Mustafa Suleyman writes, “…we wanted
to build truly general learning agents that could exceed human performance at
most cognitive tasks.” 3 The development of AI therefore poses some
fundamental questions:
We asked a neurologist, “What is consciousness?” He said, essentially, “That’s an open question.” On 7/1/23 the NYT reported on a meeting of 800 neuroscientists, philosophers and the curious (that would have been us) in Greenwich Village. Since there is no single location of consciousness in the human brain, the researchers wanted to test the two leading theories. The first was the Global Workspace Theory, where consciousness is “the global availability of information,” made possible by signals that reach the prefrontal cortex, a region at the front of the brain that broadcasts information across the brain. The opposing theory was the Integrated Information Theory, which predicts “that regions with the most active connections” – those at the back of the brain – would be most active. A study conducted by a third group of experts would then decide which theory was correct.
The results of the experiment would not be surprising to someone who has studied history or the social sciences. Depending on the experiment, both theories were true. Lucia Melloni, a neuroscientist at the Max Planck Institute, said, “My thought is that I come from a family of divorced parents, and you love them both.” As the ancient Greeks, whose cosmology was the analysis of the universe, knew, there is a point where simple logic breaks down, called aporia (Fr. aporie). The future is a matter of human choice; but you should take context into account to make the correct choice.
Skip to another question, “What is life?” In a TV series, physicist Alan Lightman interviewed University of Chicago biologist Jack Szostak: “(We aren’t) just atoms and molecules. It’s the organization; there are layers and layers of emergent phenomena, where you have collections of molecules and sources of energy. You get interesting, new and often surprising phenomena. It’s common in life and other physical systems.”
All three are emergent phenomena. So is complex democracy. According to Wikipedia, “In philosophy, systems theory, science, and art, emergence occurs when a complex entity has properties or behaviors that its parts do not have on their own…” Stated simply, the whole is greater than (or even different from) the sum of its parts.
“The
ability to reduce everything to simple fundamental laws does not imply the
ability to start from those laws and reconstruct the universe. The
constructionist hypothesis breaks down when confronted with the twin
difficulties of scale and complexity. At each level of complexity entirely new
properties appear. Psychology is not applied biology, nor is biology applied
chemistry. We can now see the whole becomes not merely more, but very different
from the sum of its parts.” 4
· physicist P. W. Anderson
The Business Implications
As
we have been discussing, the role of generative artificial intelligence is to predict
the next word from its context. As this posting shows, since 2022, this
rapidly developing technology has become able to handle incredibly nuanced
topics. Wall Street, we think, is correct in projecting a great future, as the
following will illustrate. This technology can, with increasing accuracy, predict
the next move in the complex chess game of life, provided the inputs don’t
exhibit a very large degree of variance (change) - such as in the climate or in
the will of a dictator. The statistical central limit theorem 5 then
still holds.
Does that mean that it is presently a great investment? Like land in Florida, or the internet itself in the year 2000, we would wait on getting in on this opportunity. The problem, for a value investor, is this: the technology has a great future, but where is the positive cash flow? To avoid the obsolescence of their existing business models, four large companies - Microsoft, Google, Amazon and Meta - are now pouring billions of dollars into the effort to develop this very expensive technology, as we have described in a footnote. The following further discusses the future.
Implementing the Technology in Discovery
In 2023, (Szymanski et al.) from the University of California, Berkeley introduced the A-Lab, an autonomous lab for the solid-state synthesis of inorganic powders. After first winnowing 24,000+ publications for existing compounds, the researchers identified 432 candidates as previously unsynthesized. In 17 days, the Lab produced 41 new compounds. This technique, automating the discovery of new compounds, will be useful in energy production and industrial materials. There is, however, a long lag between a discovery in the lab and the commercialization of a truly new product.
A 2023 BBC article announced a “New superbug-killing antibiotic discovered using AI.” To train the AI, McMaster University researchers took thousands of drugs whose chemical structures were known, with varying effectiveness against Acinetobacter baumannii, a WHO “critical threat.” They then used AI to extract the chemical features that were the most effective. They then applied the AI to 6,680 compounds whose effectiveness was unknown. The computer identified those likely to be most effective. Researchers then found abaucin, an incredibly potent antibiotic.
Implementing the Technology in Companies
A
Harvard Business Review Publication (Artificial Intelligence, 2019)
noted, “To take full advantage of…collaboration between humans and AI,
companies must understand how humans can most effectively augment machines, how
machines can enhance what humans do.” A question to note is whether AI ought to
proceed carefully from the data processing department, successful project by
project, or whether the entire company should be readied for a new way of doing
things.
A
Fall, 2023 Stanford Business School magazine notes, “Even for companies that
are very data – and machine learning – driven, they’re very conservative in
using AI to drive pricing experimentation because of the huge liability and
huge reputational risks….it’s (also) one thing to run experiments on supply
chains or inventory. It’s another to run experiments on the people you manage.”
Generative AI seems to us to be highly process-oriented, meaning that it has to fit within the processes of the larger companies, at a large scale. The main application of generative AI to these companies is at the ideation and reshaping phases. The ideation phase involves the generation of a large number of alternatives that must then be winnowed down by humans with a considerable amount of market or domain savvy. Important here is the feasible business idea of adjacencies, which all CEOs should know anyway. A nimble consumer electronics manufacturer, for instance, should not become an auto manufacturer. The reshaping phase involves the change of business functions to accomplish new tasks, with easier access for company personnel to more company data. The Boston Consulting Group suggests a 10-20-70 percent split between choosing the right AI algorithm (there are increasingly many), getting the company data in shape (probably requiring more effort than usually thought), and getting the people and processes aligned (the most important).
Two other trends are ensuring data security and building company-curated databases to produce better decisions in their proximate environments and markets. Getting all of this to occur will take time, and most large businesses act incrementally.
And for theoretical reasons, the statistical Central Limit Theorem (see footnote 5) suggests that the now easier-to-use AI is most applicable to business functions, such as operations, that are more insulated from drastic change. It will be less useful when the change (variance) is great, such as in the financial markets - consider just the change in interest rates since 2022.
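A small simulation, in Python, of the intuition behind that claim (the numbers are invented for illustration): averages of a stable process settle down, as the Central Limit Theorem promises, while a process whose underlying level shifts keeps defeating a forecast built on the old data.

import numpy as np

rng = np.random.default_rng(0)

# Stable process: the same distribution throughout. Averages of repeated
# samples cluster tightly, so predicting "the next move" from past data
# works reasonably well.
stable_means = rng.normal(loc=1.0, scale=1.0, size=(1000, 30)).mean(axis=1)
print("stable process: mean %.2f, spread %.2f" % (stable_means.mean(), stable_means.std()))

# Regime change: the underlying level jumps partway through (think of the
# move in interest rates since 2022). A forecast built on the first regime's
# average misses badly in the second regime.
first_regime = rng.normal(loc=1.0, scale=1.0, size=500)
second_regime = rng.normal(loc=4.0, scale=1.0, size=500)
forecast = first_regime.mean()
print("forecast from the old regime, average miss: %.2f" %
      np.abs(second_regime - forecast).mean())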
The Social Implications of the Above
Both the US and the EU are trying to respond to a rapidly advancing AI. AI may be somewhat like automated driving: its regulation might seem simple at the outset, but it develops more and more edge cases, real-world situations where the rules (like those found in law) become very complicated. Some regulations make sense, for instance the EU guidelines against “unacceptable risks” such as social scoring or real-time biometric verification. To handle other cases, as they develop, it would be a good idea to keep fair and responsible humans, like referees, in the loop.
To maximize the role of humans, additional research to examine how generative AI makes decisions would be very useful - thus the earlier-cited neural net hidden layer research. Some researchers have suggested combining the novel Transformer networks with existing convolutional networks, which can localize visual features such as edges. Others have suggested localized pure Transformer networks. There is progress in this field.
As a practical matter, at present, we would caution against the literal use of generative AI in stock market, law, or other research, because the word-by-word construction of an apparently authoritative answer can also lead to incorrect attributions (the computer papers it cited were worth following up on, but the attributed authors were unfortunately incorrect). Thus the above paragraph.
Through questions to Gemini, we have been exploring how mathematical vector similarities (as defined by the dot product of two matrices, network bias, and activation functions) can translate into real causes. A substantial problem with generative AI is that the causes for the next word depend upon context, which is fine; but the context that generative AI uses is scrambled, complicated and obscure. Likely the default position is the most important: that AI systems be well trained with pertinent data, which makes human judgment always important.
Generative AI is not an oracle.