Language models as collective intelligence

Throughout our cultural evolution, refinements in the ways we share information have enabled societal shifts. Generally, new mediums which increase individual agency are correlated with expanding collective knowledge. Agency is, in large part, a function of information — the world is complex, and we are limited in our ability to understand it. Tools that enable us to think together enable us to outsource intellectual and productive exploration to the collective and individually benefit from consuming the result.

Major refinements to sharing information have come via language, writing, the printing press and the internet. In each case, more agency has led to more productive societies. The first 30 years of the web have been about divergence: about empowering individuals to explore their niche by connecting them with others who share it. In doing so, it has exponentially increased the amount of information in the world. Language models have recently arrived as a tool for synthesising that information, and promise to enable us to wield the full capacity of the internet from a universal interface.

“…the ‘message’ of any medium or technology is the change of scale or pace or pattern that it introduces into human affairs.” — Understanding Media (McLuhan, 1964)

In the short term, language models are a new medium for creative expression. In the longer term, I’d like to argue that they are a tool for building a more cooperative internet, and with some luck, a tool for building more cooperative societies.

Language models synthesise collective knowledge

“Every day, we land on the internet without a map. Instead, search is the dominant wayfinding paradigm. It is the information equivalent of exploring the local area at ground level. A search is a hypothesis, an instance of trial and error. With enough searches, we can usually get where we're going. However, we lack the context of how everything fits together...” — Me (2021)

Language models are a synthesis tool. Their key feature is to compress information — to start with an internet of information, and reduce it to a consumable artifact. The first wave of capabilities that this makes possible — search, creative generation etc — are increasingly obvious by the day. What’s less obvious is what is happening behind the scenes, and why this is important.

Consider DALL-E or Stable Diffusion: start with a prompt, and finish with an image. To make this possible, models are trained on enormous amounts of images, each originally produced by a human in some form. Each image has a label that describes aspects of its style, composition and subject. Models extract features that are common in many images with similar labels. When you write a prompt, the model measures the relationship between the words you use and features it has extracted, and combines high-scoring features into a new image. In other words, models compress information by combining concepts from many images into something new.
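The scoring step above can be sketched in miniature. This is a toy illustration, not the actual DALL-E or Stable Diffusion pipeline: the prompt and the learned "features" are represented as made-up vectors in a shared embedding space, and the model-style scoring is plain cosine similarity.

```python
import numpy as np

def cosine_similarity(a, b):
    # How closely two embedding vectors point in the same direction (1.0 = identical).
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical prompt embedding and learned feature embeddings (all values invented).
prompt_embedding = np.array([0.9, 0.1, 0.3])

learned_features = {
    "impressionist brushstrokes": np.array([0.8, 0.2, 0.4]),
    "neon colour palette":        np.array([0.1, 0.9, 0.2]),
    "coastal landscape":          np.array([0.7, 0.0, 0.5]),
}

# Score every learned feature against the prompt.
scores = {name: cosine_similarity(prompt_embedding, vec)
          for name, vec in learned_features.items()}

# Features whose embeddings align with the prompt score highest, and would be
# weighted most heavily when composing the generated image.
best = max(scores, key=scores.get)
```

In a real diffusion model the "features" are not named and the combination happens through iterative denoising, but the essential move is the same: measure how well the prompt matches what the model has extracted, then recombine the strongest matches.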

Inside the black box — from "Visualising and understanding convolutional networks" (Zeiler & Fergus, 2013)

Now consider what happens when an artist produces a piece of work. The techniques used — brushstrokes, colour schemes, shapes, materials (“features”) — are acquired as tacit knowledge through lots of practice over time. The art produced is some emergent combination of these techniques as a conduit for externalising an internal mental state. The artist’s mechanical process is to take accumulated information (techniques and experience), and combine it into art.

Artists compute over their experiences to produce pieces of art.

Language models are able to approximate art because the procedural mechanism is compression — merging features together. The difference is that language models are carrying out the combinatorial process at a higher level of abstraction. Rather than computing over a single perspective, as is the case with an artist, language models integrate over the experiences of many artists. Users can then manifest new images from the latent combinatorial space.

We all have internal mental states that we’d like to represent with art. Arguably, the difference between artists and non-artists is that artists have the tacit knowledge to make their ideas real. Language models close this gap, enabling ‘unskilled’ individuals to realise their ideas without having to spend years acquiring the techniques and experiences. To a limited extent, models subsume and compress artistic tacit knowledge and make it wieldable via prompts.

“…the content of any medium is always another medium.” — Understanding Media (McLuhan, 1964)

Each prompt, then, is a recursive computation over increasingly abstracted human inputs, collapsing thousands of dimensions into a human-readable artifact. For now, there is nothing really ‘artificial’ about it. What is really happening is that users are empowered by collective intelligence — a medium within which to wield the combined skills and experiences of millions of artists around the world. Models are a mechanism to rescue knowledge from individual brains, and make it collectively useful.

Language models are a humanising technology for informational empowerment.

Upwards information and downwards causation

“Reading used to be reserved for the clergy, to hand down unquestionable Revealed Truths to the masses. Today, it's just what everyone does. Think about a society in which science is not reserved for the clergy, to hand down unquestionable Revealed Truths to the masses, but is just what everyone does." — Bret Victor

‘Art’ is just the current thing. The same compression mechanism is true of video, text, and a host of coming applications. The pattern in each case is to break open skills that were previously reserved for a select few, and empower the masses. This observation is certainly interesting in the context of prior transitions in tools for thinking together — language, writing, the printing press and the internet.

In the short term, language models are thus a new medium for creative expression. There is no doubt that jobs based on skills which are no longer scarce will be replaced. More importantly, the newfound agency is sure to create an entirely new and higher level pattern of work.

“…the new patterns of human association tend to eliminate jobs, it is true. That is the negative result. Positively, automation creates roles for people, which is to say depth of involvement in their work and human association that our preceding mechanical technology had destroyed.” — Understanding Media (McLuhan, 1964)

A new medium for creative expression, and the new opportunities it opens up, make for a compelling proposition. My conjecture, however, is that the second-order effects are worthy of more focus.

We currently live in a world in which information asymmetry is profitable. To a certain extent, asymmetry is unavoidable — the world is complex, and we can only know so much. However, much of it is also engineered by competitive dynamics in markets and other institutions. The problem is that without openly sharing information, cooperation is impossible and we are left with the externalities of competition.

Language models suggest a mechanism for progressively incentivising agents to indirectly collaborate at multiple scales. The reasoning is as follows. Training data is a key limiting factor for deep learning implementations, such that more and higher quality data is almost always better. Meanwhile, it will quickly become clear to individuals and organisations that what can be achieved via model augmentation far outstrips what can be achieved without.

My claim, then, is that models incentivise parties to share information — models tend towards better results with more training data, and users are incentivised to provide it in order to benefit from better compression. There is a group-level benefit to many parties contributing data to a model, which in turn can act as a group selection pressure for mechanisms that prevent free-riders and encourage contribution.
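The incentive claim can be made concrete with a deliberately simple toy model. Assume (this is an assumption for illustration, not an empirical law) that model error shrinks roughly with the square root of the amount of training data. Then every contributor to a shared pool gets a markedly better model than any party training alone.

```python
import math

def model_error(n_examples):
    # Assumed scaling behaviour: error falls as 1/sqrt(n). Real scaling laws
    # differ in detail, but share this diminishing-but-compounding shape.
    return 1.0 / math.sqrt(n_examples)

solo_data = 1_000                  # hypothetical: what one party holds alone
parties = 50
pooled_data = solo_data * parties  # everyone contributes to a shared model

solo_error = model_error(solo_data)
pooled_error = model_error(pooled_data)

# Each contributor's model improves by a factor of sqrt(parties) (~7x here)
# relative to going it alone: the positive-sum incentive to share.
improvement = solo_error / pooled_error
```

The exact numbers are invented; the point is the shape of the incentive. No single party can replicate the pooled result, so each party's best move is to contribute.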

Models come to exhibit downwards causation.

Over time, models will come to exhibit a kind of downwards causation. Information arising from many parties gets compressed to become reusable by all other parties, and all parties will come to rely on the compression because it produces better results than can be achieved otherwise. Models therefore act as a focal point — a reason to openly share information without needing to directly communicate with other parties. This mechanism changes zero-sum information asymmetry into positive-sum information sharing, where each party is made stronger by wielding the collective information pool.

“I propose we move from simple feedback to downward causation when components tune behaviour in response to estimates of coarse-grained, aggregate properties.” — Coarse-graining as a downwards causation mechanism (Flack, 2017)

When many competing entities are ‘tuning’ to the same information pool — in this case, the collective information pool — it can enable a higher level of organisation to consolidate from the bottom up. In turn, the mutual information advantage sets the conditions for the evolution of more cooperative relationships between lower level parties.

"The transition to the next stage occurs… by joining [parties] into a single whole with the formation… of a control system headed by a new subsystem, which now becomes the highest controlling device in the new stage of evolution." — The phenomenon of science: a cybernetic approach to human evolution (Turchin, 1977)

Consider a simple example in art. Let’s imagine that a large community of artists agree to continuously contribute their art as training data to an open-source diffusion model or some future descendant. The model compresses all contributions and makes them available for recombination. Each contributing artist now has the combined creative force of every artist in the community. My conjecture is that even for professional artists, the range of what can be achieved through this augmentation outstrips what can be achieved alone. This is probably true today, never mind in a few years’ time. If so, then there is a strong positive-sum mechanic at work — everyone contributing makes everyone’s work better. Community formation arises naturally, because there are strong incentives for cooperative relationships between community members.
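The combinatorial claim in this example can be counted directly. Suppose (hypothetical numbers) each artist commands a handful of distinct techniques. Alone, an artist can only pair techniques from their own repertoire; a shared model can pair techniques across every contributor's repertoire, and the space of pairings grows quadratically.

```python
def pairwise_combinations(n_techniques):
    # Number of distinct two-technique pairings: n choose 2.
    return n_techniques * (n_techniques - 1) // 2

techniques_per_artist = 5   # assumed for illustration
artists = 100               # assumed community size

# A solo artist can combine only their own techniques...
solo_range = pairwise_combinations(techniques_per_artist)

# ...while the shared model can combine techniques across all contributors.
shared_range = pairwise_combinations(techniques_per_artist * artists)
```

With these toy numbers, a solo artist has 10 possible pairings while the shared model exposes over 120,000, which is why even professional artists plausibly gain more from the pooled latent space than they lose by contributing to it.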

At some point, a cooperative network empowered by shared models outcompetes its competitors because of the mutual information advantage. Individual network members have an information advantage relative to their external counterparts. There exists, then, a growing incentive to participate cooperatively rather than compete against the network. Within the network, the emergence of social norms and relevant monitoring mechanisms can ensure a stable base of cooperation, such that everyone contributes data and everyone can access the model.

“Viewed as a cultural evolutionary process, new entrants to the population would be more likely to adopt the preference ordering of those who obtained the higher material payoffs in the immediate past (Boyd and Richerson, 1985). Those who were less successful would tend to learn the values of those who had achieved higher material rewards (Borgers and Sarin, 1997).” — Collective action and the evolution of social norms (Ostrom, 2000)

Flack’s paper suggests that downwards causation via compression is not unique to language models, but is instead a generalised mechanism in adaptive systems throughout the natural world (macaque social hierarchies, for example). If correct, this means the mechanism is extremely robust and scalable, and has been continually selected for at many levels of the evolutionary hierarchy.

One might worry that everyone using the same models leads to a convergence on a local minimum, which would be anti-evolutionary. However, I think the opposite is true. The coming diversity of models and use cases is such that individuals and groups will use a plethora of different models in their own ways across a massive range of applications. The result is surely a dense, interwoven and constantly evolving combinatorial network integrating groups in complex ways, sufficient to overcome local minima and enable continuous evolution.

“…[parties] need not be in agreement about how to best to tune to these variables. The degree of agreement will depend on whether decision-making—in learning or evolutionary time—is influenced by other types of heterogeneity” — Coarse-graining as a downwards causation mechanism (Flack, 2017)

Language models are a substrate for more collaborative societies

“The various ways in which the knowledge on which people base their plans is communicated to them is the crucial problem for any theory explaining the economic process, and the problem of what is the best way of utilizing knowledge initially dispersed among all the people is at least one of the main problems of economic policy—or of designing an efficient economic system.” — The use of knowledge in society (Hayek, 1945)

Today, models are toys for creative expression. Tomorrow, they will be tools for increasingly autonomous technological production. If the central problem of economic society is the problem of productively organising agents with imperfect information, then a new mechanism of aggregating that information is likely to enable new economic structures.

It’s worth stating my philosophical assumption here that deep learning and related models are not, and are unlikely to become, conscious beings. I think we are dealing with tools, not beings, that will quickly get better and eventually outcompete humans in all relevant productive capacities. But productive replicators are all they will ever be. This is not to say that entrusting widespread economic activity to unconscious robots is without risk, but the distinction between tools and beings is a critically important one. Arguably, the reason that unconscious AGIs are risky is precisely because they are unconscious, and thus incapable of the social intelligence required to understand what it means to be ‘human-aligned’.

"…the cognitive capacity to represent the formal properties of mind differs from the cognitive capacity to represent the subjective properties of mind (Seager 2006). Thus a notional zombie Hyper-Autist robot running a symbolic AI program on an ultrapowerful digital computer with a classical von Neumann architecture may be beneficent or maleficent in its behaviour toward sentient beings." — The biointelligence explosion (Pearce, 2012)

My suggestion, then, is that models and their descendants can be viewed as tools with the potential to create more collaborative and more collectively intelligent economic societies.

To recap, it seems inevitable that models will become more effective in productive capacities currently entrusted to humans. Augmentation with models will therefore empower individuals and groups to do things they otherwise cannot do. Models, at least in the short term, are dependent on information arising from people, and more/better data makes more effective models. Effective models thus create a focal point for community formation, and groups at increasing scale will derive a selective advantage from model-mediated positive-sum information sharing. If so, then it further seems inevitable that the incentives of economic society shift towards an open information ecology — in which everyone openly and honestly shares information for collective benefit — and away from closed market information ecologies fixed on information asymmetry.

The most optimistic viewpoint is that this mechanism could enable coordination between groups at scale, for many of the same reasons. For example, many companies could coordinate with each other via a higher level model, or a complex of individuals, companies and even nations. Geoffrey West and co’s work indicates that there is no theoretical upper limit to information-dominated systems. If so, then open information ecologies coordinated by models could operate at community, city, national and planetary scale, with a selective advantage accruing to larger systems. As such, models may present focal points to overcome dangerous multipolar incentive patterns derived from information asymmetry.

A combination of cryptography and artificial intelligence may well be able to implement systems working on this principle. Of course, technological implementations are only part of the problem. Working on a collective principle of mass information sharing is a huge social step in a world dominated by competitive incentives. The idea that artists, companies and nations might openly and proactively contribute all internal information for collective consumption is frankly hard to imagine. Critically, it requires a shift in values away from individuals and towards collective networks at all scales, superseding individual incentives with collective incentives.
