The Democratization of Generative AI

 

 

                    Any sufficiently advanced technology is indistinguishable from magic.

                                                                          

                                                                                                        Arthur C. Clarke

 

 

Because of their ease of use, Bard and ChatGPT have democratized the field of AI, artificial intelligence. AI simply tries to predict. The role of generative AI is to predict only the next correct word (in context) when answering a question. But these apparently simple beginnings make available to almost everyone most of the world’s knowledge, and disknowledge, that exists on the internet.

 

To discuss how generative AI works, and then to assess its societal implications, is to begin with its philosophy, its math and programming, its business use, and then its implications for the rest of society. Our readers can be assured that this discussion, and its math, will remain general. We hope that the primary motivation of our readers here is a simple curiosity. The investment implications of the following remain positive, but are still undetermined in their specifics.

 

 

The Philosophy

 

There are two theoretical ways of organizing information and societies. These can be top-down, from first principles and leaders chosen by heredity; or bottom-up, from the information and leaders chosen by true popular election. In cognition, there is a diametrical difference between the Platonic view that there is an ideal and secure world (for instance, the world of ideal horses) accessible only to properly trained philosophers, and the English empirical view that the mind is a tabula rasa, a blank slate, on which are impressed the details of varied human experiences. In practice, the ideal Platonic societies are simple in structure; democratic societies, with their voluntary organizations, diverse interests, and mixtures of traditional and modern practices, are complicated – and confusing to those with politically totalitarian tendencies who yearn for a controlled uniformity.

 

 

The Math  

 

Those who would rather forget the math they learned in high school can skip this section and the one following.

 

Computers do not understand letters; they understand numbers. Therefore, any operation a computer performs involves math; however, the math we need for this discussion is very simple, simpler than the many programming structures necessary to make generative AI work:

 

·      A vector is simply an ordered sequence of numbers, with no necessary spatial interpretation.

·      A matrix is a collection of vectors, whose columns have consistent representations and whose rows are individual vectors. For instance, the two vectors [1, 3] and [2, 1] form the matrix:

                                                                               [1, 3
                                                                                2, 1]

·      The product of two matrices, C = A × B, exists only if the number of columns of A equals the number of rows of B.

·      Crucially, the dot product of A with B defines the similarities, alignments, or context between matrices A and B. That is, A·B = A × Bᵀ. A transpose is simply the mirror image of A around an unchanged diagonal; for the matrix above:

                                            [1, 2
                                             3, 1]

To further explain “similarities”: two vectors within a single matrix, say V and Z, can be interpreted to have spatial dimensions, say x and y, with an angle theta between them. V·Z = (magnitude of V) × (magnitude of Z) × cos theta. If you remember your high school trig, cos theta varies between −1 and 1 depending upon the value of the angle theta. Thus the dot product also defines the similarity between two vectors: it is largest when they point in approximately the same direction.
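For readers who like to see such things run, here is a minimal sketch of these definitions in Python, using the NumPy library; the numbers reuse the example matrix above:

import numpy as np

# The example matrix from above: its rows are the vectors [1, 3] and [2, 1].
A = np.array([[1, 3],
              [2, 1]])

# The transpose: A mirrored around its unchanged diagonal.
print(A.T)                    # [[1 2]
                              #  [3 1]]

# The dot product A x A-transpose: entry (i, j) measures the similarity,
# or alignment, between row vector i and row vector j.
print(A @ A.T)                # [[10  5]
                              #  [ 5  5]]

# Cosine similarity between the two row vectors V and Z:
# V.Z = |V| x |Z| x cos(theta), so cos(theta) = V.Z / (|V| x |Z|).
V, Z = A[0], A[1]
cos_theta = (V @ Z) / (np.linalg.norm(V) * np.linalg.norm(Z))
print(cos_theta)              # about 0.71: broadly the same direction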

 

That’s all the math we need. Now to the neural network. Ordinary computer programs are organized thusly:

 

Data  →  A Centralized Hand-Coded Function f(x)  →  A Prediction Given the Data

 

A neural net develops the prediction function all by itself by first dividing the data, as in a simple regression, into a training set and a test set. A generative neural net is first trained on a massive training set built from internet data. It then calculates a prediction, like the answer word that follows, from the test set, which may be your question to Bard or ChatGPT. This is a simple neural net:

(Diagram: a simple neural net, with an input layer, intermediate hidden layer(s), and an output layer of weighted nodes.)

 

 

 

 

Generative AI produces an answer, word by word, from the proper mathematical vector context. For instance, a foreign policy problem is different from a math problem.

 

Test Set  →  Trained Neural Net  →  A Prediction Given the Test Set

 

Neural nets are only very loosely modeled after the networks of neurons in the human brain. The human brain contains around 100 billion neurons of more than 3,000 different kinds. Generative AI systems, such as Bard, have around 500 billion to 1 trillion trained parameters (weights) connecting artificial neurons of a single kind.

 

This is a very simple computer neural net program, with an input, an output, and trained weights among the nodes that minimize output errors for a given set of inputs. If you ever studied OLS regression, this will be familiar. Note the intermediate hidden layer(s), which in larger nets, to make a Bard music analogy, add chord progressions and rhythm to the basic melodies. The nature of the intermediate layers is a topic of causal AI research. More about this later.
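For the curious, here is a minimal sketch of such a net in Python: one hidden layer, with weights adjusted by the usual error-minimizing gradient descent. The task (the XOR function) and all the numbers are our own toy choices; real systems differ mainly in scale.

import numpy as np

rng = np.random.default_rng(0)

# A made-up toy task: learn the XOR function of two binary inputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# The trained weights among the nodes: input -> hidden (2 -> 4) and
# hidden -> output (4 -> 1), started at random.
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(5000):
    # Forward pass: input layer -> hidden layer -> output prediction.
    h = np.tanh(X @ W1 + b1)        # the intermediate, "hidden" layer
    out = sigmoid(h @ W2 + b2)      # the net's prediction

    # Backward pass: nudge every weight to shrink the squared output error,
    # the same "minimize the error" idea as in OLS regression.
    grad_out = (out - y) * out * (1 - out)
    grad_h = (grad_out @ W2.T) * (1 - h ** 2)
    W2 -= 0.5 * (h.T @ grad_out)
    b2 -= 0.5 * grad_out.sum(axis=0)
    W1 -= 0.5 * (X.T @ grad_h)
    b1 -= 0.5 * grad_h.sum(axis=0)

print(out.round(2))  # approaches [0, 1, 1, 0]: the weights have learned the function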

 

 

Computer Programs

 

Before 2012, progress in AI had been slow; cat videos, notably, were classified by hand according to the top-down model. The advent of faster (parallel) computers running new neural net architectures resulted in rapid progress in the bottom-up models that have produced generative AI, notably Bard and ChatGPT. In 2017, Google researchers (Vaswani et al.) published a landmark AI paper catchily titled, “Attention Is All You Need.” The dictionary defines “attention” as, “concentration of the mind upon an object.” The concentration of part of a computer program upon a word (actually a numerical token) in the context of other words allows rapid parallel processing of entire sentences (tokenized) and improved translation quality.

 

The paper proposed a new Transformer architecture for AI. When we first read this paper, we found it confusing in the extreme. But we gradually came to understand it through an article, begun as a blog post by Al-Mashhadani & Baul of the University of Central Florida, that contained the mathematics. Our notes are shown in the blue type. That blog provides further evidence that in real democracies talent is widely distributed, and people are free to develop, as we noted in the previous essay, their “human mental faculties.”

 

So, to proceed. The Transformer architecture contains three main components: an encoder of the question, a neural network, and an almost similar decoder that usually operates on another network, trained on the internet. 1

 

Say you have a question. The question is first transformed, according to a word lookup table, into numbers. Then the encoder’s self-attention transforms the words of the question into three individual dense matrices (Q, K, V) that respectively represent the word in focus, the context of each word within the entire question, and the content (a code sketch follows the list below).

 

 

Where:

·      Q represents the model’s current word or element focus, including the word’s familiar grammatical role in the sentence (e.g., subject, verb, object).

·      K acts as a searchable catalog for all elements of the input sequence.

·      V stores the actual content or meaning of the word to be retrieved.
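A minimal sketch of this encoding step in Python; the word list, the embedding table, and the projection matrices here are random stand-ins for what a real model learns during training:

import numpy as np

rng = np.random.default_rng(1)
d = 8                                   # a tiny embedding width, for illustration

# The "word lookup table": each word of the question becomes a vector of numbers.
question = ["why", "is", "the", "sky", "blue"]
embed = {word: rng.normal(size=d) for word in question}
X = np.array([embed[word] for word in question])    # one row per word

# Three learned projection matrices (random stand-ins here) turn X into Q, K, V:
# the focus, the searchable catalog, and the content.
Wq = rng.normal(size=(d, d))
Wk = rng.normal(size=(d, d))
Wv = rng.normal(size=(d, d))
Q, K, V = X @ Wq, X @ Wk, X @ Wv        # three dense matrices, one row per word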

 

 

The Attention mechanism then scores the importance of each word relative to others.

 

                                Attention(Q, K, V) = f(Q × Kᵀ) × V

 

Q × Kᵀ is, importantly, the dot product between Q and K. As the Google paper says, “…dot-product attention is much faster and more space-efficient in practice, since it can be implemented using highly optimized matrix multiplication code.” Also, through the dot product between two matrices, we get their qualitative context.
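In the Google paper, the scoring function f above is a softmax of this dot product, scaled by the square root of the vector width. A toy version in Python, continuing the sketch above:

def attention(Q, K, V):
    # Q x K-transpose: dot products scoring each word against every other word.
    scores = Q @ K.T / np.sqrt(K.shape[1])   # scaling keeps the softmax well behaved
    # Softmax: turn each row of scores into positive weights that sum to 1.
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w = w / w.sum(axis=1, keepdims=True)
    # Each word's output is a weighted blend of the content vectors in V.
    return w @ V

updated = attention(Q, K, V)    # one updated, context-aware vector per word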

 

The Transformer then uses multi-headed attention, processed in parallel, to dynamically update an existing representation of the question, one that emphasizes the most relevant words and their relationships, while still processing the original question word by word.
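A sketch of the multi-head idea, reusing the attention function above: each head here attends over its own slice of the dimensions, and the results are rejoined. (Real Transformers give each head its own learned projections; the slicing is a simplification.)

# Two heads, each attending over its own half of the dimensions.
heads = np.split(np.arange(d), 2)
multi = np.concatenate(
    [attention(Q[:, h], K[:, h], V[:, h]) for h in heads],
    axis=1)                     # rejoined: again one vector per word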

 

The following diagram illustrates this process. The sentence to be scanned is, “The animal didn’t cross the street because it was too tired.” Generative AI scans each word, beginning at the top right, producing Q, K, and V vectors that are placed in the Q, K, and V matrices.

 

The decoder then has direct access to the hidden layers of the neural network. Thus, with its own rather similar attention mechanisms, it can formulate, on a word-by-word basis, the proper answer from another neural network massively trained on the internet. That is why generative AI works: it can address the commonality of context. 2  Bard agrees with this, adding the importance of the human-in-the-loop (now still around 20-30% of the time).

 

 

AI and the Hidden Neural Network Layers

 

The problems in the regulation and use of generative AI reside mainly in the fact that no one is sure how it works in the hidden neural layers. Unlike normal web browsing, where users make up their own minds and thus remain (more or less) in control, generative AI is authoritative and can be automatic, for instance if connected to infrastructure. There might not be a “kill switch” for AI.

 

On 1/7/24 Fareed Zakaria asked whether people should then start “believing” AI rather than relying on Enlightenment “understanding.” An AI expert he was interviewing then suggested that we should consider the benefits of AI. (Very likely, but we also started thinking of Icarus.)

 

We think these hidden layers ought to be understood to an increasing extent, for the second phase of the Enlightenment held that the only cure for ignorance is more knowledge; thus the current system of knowledge production, where exploration proceeds from the known to the unknown, with some constraint.

 

 

Further Neural Research

 

What’s going on in the hidden neural network layers? AI researchers (Li, Hopkins, et al., 2023) investigated Othello, a simple two-person game. It begins from a starting position of four disks (two black, two white). The goal is to use the remaining disks to sandwich, and thus flip, the opponent’s disks. By feeding some of the game’s legal moves into a neural net, the researchers were able to use non-linear probe software to determine the salience (importance) of each piece for the next move, given a certain board state. The hidden layers were thus able to represent both the disks and their positions on an entire board.
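In outline, such a probe is simply a second, small network trained to read the first network’s hidden activations. A schematic sketch in Python, with scikit-learn; the activation and board-state data here are random stand-ins, and the actual study’s code and shapes differ:

import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(2)

# Stand-in data: the game-playing net's hidden activations after each of
# 5,000 moves, paired with the true state of one board square
# (0 = empty, 1 = black, 2 = white). Real shapes and data differ.
activations = rng.normal(size=(5000, 512))
square_state = rng.integers(0, 3, size=5000)

# The non-linear probe: a small neural net trained to read the square's
# state out of the hidden activations.
probe = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300)
probe.fit(activations[:4000], square_state[:4000])

# Accuracy well above chance on held-out moves would indicate that the
# hidden layers really do represent that square of the board.
print(probe.score(activations[4000:], square_state[4000:]))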

 

A complaint about generative AI is that it can predict only the next word in a language, and cannot represent objects and concepts; it is thus held to be only an auto-complete. This research indicates that objects and concepts are emergent properties of the network, representations likely proceeding from language itself, and that the network can thus explain predictions in human terms – presenting human decision makers with options.

 

A founder of DeepMind, now part of Google, Mustafa Suleyman writes, “…we wanted to build truly general learning agents that could exceed human performance at most cognitive tasks.” 3 The development of AI therefore poses some fundamental questions:

 

We asked a neurologist, “What is consciousness?” He said, essentially, “That’s an open question.” On 7/1/23, the NYT reported on a meeting of 800 neuroscientists, philosophers, and the curious (that would have been us) in Greenwich Village. Since there is no single location of consciousness in the human brain, the researchers wanted to test the two leading theories. The first was the Global Workspace Theory, where consciousness is “the global availability of information,” made possible by signals that reach the prefrontal cortex, a region in front of the brain that broadcasts information across the brain. The opposing theory was the Integrated Information Theory, which predicts “that regions with the most active connections” – those in back of the brain – would be most active. A study conducted by a third group of experts would then decide which theory was correct.

 

The results of the experiment would not be surprising to someone who has studied history or the social sciences. Depending on the experiment, both theories were true. Lucia Melloni, a neuroscientist at the Max Planck Institute, said, “My thought is that I come from a family of divorced parents, and you love them both.” As the ancient Greeks, whose cosmology was the analysis of the universe, knew, there is a point where simple logic breaks down, called aporia (Fr. aporie). The future is a matter of human choice; but you should take context into account to make the correct choice.

 

Skip to another question: “What is life?” In a TV series, physicist Alan Lightman interviewed University of Chicago biologist Jack Szostak: “(We aren’t) just atoms and molecules. It’s the organization; there are layers and layers of emergent phenomena, where you have collections of molecules and sources of energy. You get interesting, new and often surprising phenomena. It’s common in life and other physical systems.”

 

All three are emergent phenomena. So is complex democracy. According to Wikipedia, “In philosophy, systems theory, science, and art, emergence occurs when a complex entity has properties or behaviors that its parts do not have on their own…” Stated simply, the whole is greater than (or even different from) the sum of its parts.

 

“The ability to reduce everything to simple fundamental laws does not imply the ability to start from those laws and reconstruct the universe. The constructionist hypothesis breaks down when confronted with the twin difficulties of scale and complexity. At each level of complexity entirely new properties appear. Psychology is not applied biology, nor is biology applied chemistry. We can now see the whole becomes not merely more, but very different from the sum of its parts.” 4

                                                               – physicist P. W. Anderson

 

 

The Business Implications

 

As we have been discussing, the role of generative artificial intelligence is to predict the next word from its context. As this posting shows, since 2022, this rapidly developing technology has become able to handle incredibly nuanced topics. Wall Street, we think, is correct in projecting a great future, as the following will illustrate. This technology can, with increasing accuracy, predict the next move in the complex chess game of life, provided the inputs don’t exhibit a very large degree of variance (change) - such as in the climate or in the will of a dictator. The statistical central limit theorem 5 then still holds.
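For reference, the theorem (see footnote 5) says that the average of many independent, similar draws settles toward a predictable bell curve. In the usual notation, if X̄n is the mean of n independent draws with common mean μ and standard deviation σ, then as n grows large:

                                √n (X̄n − μ) / σ  →  N(0, 1)

When the inputs stop behaving like independent, similar draws – a shifting climate, a dictator’s whim – that guarantee weakens, which is the point of the caveat above.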

 

Does that mean that it is presently a great investment? As with land in Florida, or the internet itself in the year 2000, we would wait before getting in on this opportunity. The problem, for a value investor, is this: the technology has a great future, but where is the positive cash flow? To avoid the obsolescence of their existing business models, four large companies – Microsoft, Google, Amazon, and Meta – are now pouring billions of dollars into the effort to develop this very expensive technology, as we have described in a footnote. The following further discusses the future.

 

 

Implementing the Technology in Discovery

 

In 2023, Szymanski et al. from the University of California, Berkeley introduced the A-Lab, an autonomous lab for the solid-state synthesis of inorganic powders. After first winnowing 24,000+ publications for existing compounds, the researchers identified 432 candidates as previously unsynthesized. In 17 days, the Lab produced 41 new compounds. This technique, automating the discovery of new compounds, will be useful in energy production and industrial materials. There is, however, a long lag between a discovery in the lab and the commercialization of a truly new product.

 

A 2023 BBC article announced a “New superbug-killing antibiotic discovered using AI.” To train the AI, McMaster University researchers took thousands of drugs whose chemical structures were known and whose effectiveness varied against Acinetobacter baumannii, a WHO “critical threat.” They then used the AI to extract the chemical features that were the most effective. They then applied the AI to 6,680 compounds whose effectiveness was unknown; the computer identified those likely to be most effective. Researchers then found abaucin, an incredibly potent antibiotic.
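In schematic form, the workflow is: train a model on molecules of known activity, then score and rank the unknowns. A hypothetical Python sketch; the fingerprint features and data here are invented, and the study’s own pipeline certainly differs:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)

# Invented stand-ins: 2,000 known drugs as 128-bit structural fingerprints,
# labeled by whether they inhibited the bacterium in the lab.
known_drugs = rng.integers(0, 2, size=(2000, 128))
inhibited = rng.integers(0, 2, size=2000)

# Learn which structural features track with effectiveness.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(known_drugs, inhibited)

# Score the 6,680 compounds of unknown effectiveness and surface the
# highest-ranked candidates for testing in the lab.
library = rng.integers(0, 2, size=(6680, 128))
scores = model.predict_proba(library)[:, 1]
top_candidates = np.argsort(scores)[::-1][:20]
print(top_candidates, scores[top_candidates])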

 

 

Implementing the Technology in Companies

 

A Harvard Business Review Publication (Artificial Intelligence, 2019) noted, “To take full advantage of…collaboration between humans and AI, companies must understand how humans can most effectively augment machines, how machines can enhance what humans do.” A question to note is whether AI ought to proceed carefully from the data processing department, successful project by project, or whether the entire company should be readied for a new way of doing things.

 

A Fall 2023 Stanford Business School magazine notes, “Even for companies that are very data – and machine learning – driven, they’re very conservative in using AI to drive pricing experimentation because of the huge liability and huge reputational risks….it’s (also) one thing to run experiments on supply chains or inventory. It’s another to run experiments on the people you manage.”

 

Generative AI seems to us to be highly process oriented, meaning that it has to fit within the processes of the larger companies, at a large scale. The main applications of generative AI in these companies are at the ideation and reshaping phases. The ideation phase involves the generation of a large number of alternatives that must then be winnowed down by humans with a considerable amount of market or domain savvy. Important here is the feasible business idea of adjacencies, which all CEOs should know anyway; a nimble consumer electronics manufacturer, for instance, should not become an auto manufacturer. The reshaping phase involves the change of business functions to accomplish new tasks, with easier access by company personnel to more company data. The Boston Consulting Group suggests a 10-20-70 percent split of the effort between choosing the right AI algorithm (there are increasingly many), getting the company data in shape (probably requiring more effort than usually thought), and getting the people and processes aligned (the most important).

 

Two other trends are ensuring data security and building company-curated databases to produce better decisions in their proximate environments and markets. Getting all of this to occur will take time, and most large businesses act incrementally.

 

And for theoretical reasons, the statistical Central Limit Theorem (see footnote 5) suggests that the now easier-to-use AI is most applicable to business functions, such as operations, that are more insulated from drastic change. It will be less useful when the change (variance) is great, such as in the financial markets – consider just the change in interest rates since 2022.

 

 

The Social Implications of the Above

 

Both the US and the EU are trying to respond to a rapidly advancing AI. AI may be somewhat like automated driving: its regulation might seem simple at the outset, but it develops more and more edge cases, real-world situations where the rules (like those found in law) become very complicated. Some regulations, for instance the EU guidelines against “unacceptable risks” such as social scoring or real-time biometric identification, make sense. To handle other cases as they develop, it would be a good idea to keep fair and responsible humans, like referees, in the loop.

 

To maximize the role of humans, additional research examining how generative AI makes decisions would be very useful; thus the earlier-cited neural net hidden-layer research. Some researchers have suggested combining the novel Transformer networks with existing convolutional vision networks, which can localize visual features such as edges. Others have suggested localized, pure Transformer networks. There is progress in this field.

 

As a practical matter, at present, we would caution against the literal use of generative AI in stock market, legal, or other research, because the word-by-word construction of an apparently authoritative answer can also lead to incorrect attributions (the papers the computer cited were worth following up on, but their attributed authors were unfortunately incorrect). Thus the above paragraph.

 

Through questions to Gemini, we have been exploring how mathematical vector similarities (as defined by the dot products of matrices, network biases, and activation functions) can translate into real causes. A substantial problem with generative AI is that the causes of the next word depend upon context, which is fine; but the context that generative AI uses is scrambled, complicated, and obscure. Likely the default position is the most important: that AI systems be well trained with pertinent data, which makes human judgment always important.

 

Generative AI is not an oracle.

 

 

 

Footnotes

 

 
