Less than six months have passed since our previous review of this topic, but the rapid developments in Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) call for a fresh update. The outstanding representatives of the former are ChatGPT and GPT-4, which have demonstrated the triumph of artificial conversational intelligence to the public, while the leader among the latter is Midjourney, which may take work away from a huge number of mid-level graphic designers.
Advances and “breakthroughs” in textual and multimodal models based on deep and generative neural networks and transformers, their bearing on the task and feasibility of creating Artificial General Intelligence (AGI), and the social and ethical aspects of their development and application are discussed regularly in the Russian-language AGI developer community. The 2023 seminars, in particular, covered the technical capabilities of ChatGPT and GPT-4, how their level compares with AGI, and the social effects of their use in modern society. In this review, we outline the main achievements and opportunities, as well as the threats and the outlook for further development.
Breakthroughs and opportunities
Finally, the Turing Test has been passed in earnest. Formally, the test was first passed back in 2014, nine years ago, by a chatbot named Eugene Goostman. But the developers of that winning program then disclosed its structure, and it turned out to be essentially a large and rather complicated “decision tree” with hand-programmed dialog logic, whose creation and tuning had taken several dozen person-years. This provoked strong criticism from academia, which questioned the value of the Turing Test as a measure of artificial intelligence if it could be passed by a manually built program rather than by a self-learning intelligent algorithm. Today, the quality of ChatGPT’s dialogs has allowed it to reach an audience of 100 million users in just six months, most of whom find the bot’s answers quite “human.” Inconsistencies and mistakes in its statements can be found, but it takes some effort, and in many cases they also look quite “human” (e.g., outright nonsense about stirring coffee with a cigarette, or tall tales in response to a question about the role of tank units in the army of Alexander the Great, can be taken as humor). Moreover, a recent IQ test of the text-based ChatGPT bot measured a Verbal IQ of 155, not only well above average but higher than that of 99.9% of the 2,450 people in the test group.
From a practical point of view, ChatGPT technology has, for the first time in the half-century history of Natural Language Processing (NLP) research, made it possible to summarize texts with a quality acceptable for practical use. One of the new services built on this technology is ChatPDF, which generates short “extracts” from scientific articles and documents in different languages.
Moreover, the conversational capabilities of the ChatGPT bot allow it not only to give detailed answers to any question (even questions to which no adequate answer is possible) and to summarize articles, but also to offer programming recommendations and solutions to various technical problems, provided that the relevant source data and example solutions could have been in its training dataset.
For its part, the MLLM Midjourney, which last year learned to draw highly realistic portraits of any imaginary character, such as Harry Potter as Ilya Muromets or Sherlock Holmes as Cat Matroskin, has just released a new version in which all humanoid characters are guaranteed to have five fingers (previous versions often drew six-fingered characters due to model flaws).
Today, LLM- and MLLM-based solutions may well improve the productivity of those who work with text, graphics, and uncomplicated program code, in cases where a hundred percent guarantee of adequacy and quality is not needed, or where the user acts as a critic, picking up “hints” or “prompts” and making sure the output is adequate. In fact, the principle of “trust, but verify” is just as useful when checking the work of human subordinates and assistants, so such systems may well come to be regarded as junior secretaries and assistants.
Risks and threats
Not so long ago it was believed that robots would rid humanity of dirty and hard work, leaving humans free to engage in creativity and art. The advances described above suddenly suggest that “robots” are more likely to become efficient at writing texts, drawing pictures, perhaps even composing music and writing (for now, uncomplicated) computer programs themselves, leaving humans with tasks that are inconvenient for LLM/MLLM applications, such as changing tires and cleaning stairwells. At this stage, while simple robots continue to perform routine operations on the assembly line, sophisticated neural network models are more likely to take over creative work, leaving the role of handymen to humans.
Consider the emergence of the Internet: no one could have imagined that a network of scientific laboratory computers would grow not only into a worldwide network for exchanging knowledge and news, but also into a worldwide network of cybercrime, where anyone can become a victim of computer viruses, social engineering, personal and financial data theft, and even identity theft. Now that cybercriminals are getting their hands on tools for the mass generation of fake video, audio, and text, they could create an unprecedented wave of cybercrime, with victims receiving voice messages and video calls from fake relatives or fictitious characters, entering into trusted dialogues with them, and eventually making financial transactions for the benefit of fraudsters.
The use of such technologies at the state level or for political and marketing purposes can lead to a qualitatively new level of manipulation of public consciousness, based on the mass generation and delivery of highly realistic text, audio, and video content with a predetermined slant. Neuromarketing will be automated and put on an assembly line. Superiority in hybrid and information warfare may go to whoever is able and willing to turn these technologies into weapons.
Just as the technological revolution reduced humans’ need for physical activity, the information revolution has deprived people of the need to remember things. A further revolution in artificial intelligence will rob humans of their ability to think. Operators of industrial facilities equipped with automated control systems (ACS) are losing the skills to run the processes themselves and are unable to cope in abnormal or emergency situations. To avoid degrading, people will have to deliberately force themselves to exercise their minds, just as they force themselves to jog in the morning or go to the gym after work. We have dealt with this matter in a separate article.
Technologies like ChatGPT can significantly lower the quality of education. Modern LLMs can paraphrase texts well enough to make anti-plagiarism systems useless, and their ability to generate text on any topic, available to every schoolchild and student, will make it impossible to assign essays and term papers as homework, and in the near future any homework at all, since any assignment can be “copied” from “big brother.”
The speed at which events have unfolded over the past six months, the seriousness of the risks above, and the anticipation of new, still unknown risks have led a number of prominent figures in science and business to call for a moratorium of at least half a year on the development of systems “smarter than GPT-4,” in order to analyze the possible consequences of the technology and prepare the necessary regulatory measures.
On the Russian-language AGI developer community channel, a poll was held on whether to ban or restrict AI research using LLMs/MLLMs at the GPT-4 level and above. The poll showed that cyberthreats and the danger of military use of AI were seen as the main reasons for a possible ban on developing “stronger” AI. At the same time, many more respondents felt that restrictions on AI were unnecessary, and some even argued that bans would no longer help, since the “race for AI” between corporations and nations is unstoppable.
There is a heated debate in academia about how far the breakthroughs described above bring human civilization toward the creation of AGI. According to the poll results, a quarter of participants believe that LLMs/MLLMs in their current form are a dead end. Excessive energy consumption, the lack of symbolic or logical reasoning, the inability to set goals and adapt to new environments beyond the scope of the training data, and the other unresolved problems identified in our previous article are cited as strong arguments for this position. A fundamental argument against treating GPT-4 even as a prototype of AGI is its original design as a “perfect approximator,” capable of nothing more than fitting its model to a given training corpus, trying to predict as accurately as possible the next word or letter in a text or the adjacent pixel in an image.
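To make the “perfect approximator” argument concrete, here is a deliberately tiny sketch of next-word prediction: a count-based bigram model in Python. This is an illustrative assumption, not how GPT-4 works (GPT-4 is a transformer trained by gradient descent on vastly more data), and the names fit_bigram_model and predict_next are hypothetical. What it shares with an LLM is the objective: fit the statistics of a training corpus and predict the most likely continuation.

```python
from collections import Counter, defaultdict
from typing import Optional


def fit_bigram_model(corpus: str) -> dict:
    """Toy 'training': count how often each word is followed by each other word."""
    words = corpus.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return dict(follows)


def predict_next(model: dict, word: str) -> Optional[str]:
    """Return the most frequent continuation seen in training, or None if unseen."""
    counts = model.get(word.lower())
    if not counts:
        return None  # nothing in the corpus to approximate from
    return counts.most_common(1)[0][0]


# Hypothetical miniature training corpus.
corpus = "the cat sat on the mat and the cat ate the fish"
model = fit_bigram_model(corpus)
print(predict_next(model, "the"))  # prints: cat
print(predict_next(model, "dog"))  # prints: None
```

For a word it never saw in training, this toy model has nothing to say, which is a miniature version of the critics’ point: the model’s output is bounded by the corpus it was fitted to.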
On the other hand, more than a third of respondents believe that this technology is an integral part of the future of AGI, though not the only one, and that many ingredients are still missing. As was argued at one of the recent seminars, the key “missing” ingredient is the ability to set goals. Indeed, existing chatbots work in a “reactive” mode, simply responding to user requests, even if each response is generated within the context of previous exchanges and with the user in mind. For such systems to behave in a fully proactive way, pursuing their own goals and setting goals independently, more complex solutions are clearly required. One such solution could be higher-level cognitive architectures that combine neural network models responsible both for answering questions and for creating the context of those answers, so that the answers steer the user in a direction that is necessary and beneficial for the system itself, in line with goals the system has set for itself. Another option is hybrid cognitive architectures that combine LLMs/MLLMs with logical inference through neuro-symbolic integration, as proposed in one of the papers at last year’s BICA-2022 conference. One question remains: given the risks and threats outlined above, how much goal-setting can be delegated to future AI systems, and what boundaries can and should be set for it?
Meanwhile, almost 20% of respondents believe either that GPT-4 has already reached the AGI level or that improvements like those listed above are not critical. Given the textual communication capabilities described above, the task of catching up with and overtaking most humans in cognitive capacity now seems much more realistic than it did just a year ago. At the same time, even “not the strongest AI” in “capable hands,” backed by the computing and communication capacity needed to deploy it, already offers huge competitive advantages, both in peaceful marketing and in information wars.
When such AI is used as an “intelligent assistant,” as mentioned earlier, the important and currently insurmountable problems of reliability, verifiability, and interpretability loom large. When you run a regular text search on Google, an empty result more or less reliably indicates that the desired information is missing and forces you to work out a solution on your own. When you get a response from ChatGPT, you do not know whether it is objective information, a fantasy made up out of thin air (like the tanks of Alexander the Great), or perhaps a politically biased interpretation of events shaped by training data selected either at random or on purpose. Thus, for serious practical use beyond entertainment, this technology requires an even higher level of critical thinking than conventional search does. In other words, for a thoughtful expert, ChatGPT and similar systems are an effective assistant, but for a layperson who takes every answer at face value they are a sure way to get into real trouble. Clearly, work in this direction, on ways to implement “critical thinking” at the level of the system’s cognitive architecture, will gather momentum.
The economics of dialog systems like ChatGPT also leave much to be desired for now compared with traditional Internet search. Conventional keyword search retrieves a set of potentially relevant links and document fragments from an index that covers the entire corpus of documents and pages on the Internet word by word, and leaves the user to make sense of the results. Starting in 2015, Google has also enriched this search with semantic search based on its knowledge graph, that is, a semantic index. An LLM trained on the same volume of documents and pages is, in turn, itself an associative index, one that requires far more computing resources to store and query. Generating a response also takes the context of the user’s session into account, and the multi-head attention mechanism makes responses more specific to the user’s interests. All of this requires substantial computation, so replacing conventional search with a chatbot ends up costing 10 times more.
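The word-by-word indexing behind conventional keyword search is usually implemented as an inverted index, a map from each word to the documents containing it. The following is a minimal Python sketch under simplifying assumptions: the names build_inverted_index and keyword_search and the three-page corpus are hypothetical, and real search engines add ranking, stemming, and much more. It also illustrates the point made above about empty results: when no page matches, the search says so explicitly instead of composing an answer.

```python
from collections import defaultdict


def build_inverted_index(docs: dict) -> dict:
    """Map every lowercase word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return dict(index)


def keyword_search(index: dict, query: str) -> set:
    """Return IDs of documents containing every query word (Boolean AND)."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))  # copy so the index is not mutated
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical mini-corpus standing in for crawled web pages.
docs = {
    "page1": "tank units in modern armies",
    "page2": "the army of Alexander the Great",
    "page3": "Alexander the Great and his cavalry",
}
index = build_inverted_index(docs)
print(sorted(keyword_search(index, "alexander great")))  # prints: ['page2', 'page3']
print(keyword_search(index, "alexander tank"))           # prints: set()
```

Lookups here cost a few set operations per query word, whereas an LLM must run its whole network to produce every token of an answer, which is the source of the cost gap described above.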
This ties into the business model: classic, “cheaper” Internet search is funded by contextual advertising presented as clearly marked and obvious (“look out, advertising!”) links. When a request is answered with a chatbot message, however, there is nowhere to place such links, because there is no list of search results as such. Consequently, a chatbot could be monetized through advertising only by embedding hidden ads in the text of its responses, which creates a risk of manipulation and may become subject to legal regulation. Perhaps question-and-answer chatbots can only be developed in the future under business models such as subscriptions, or the covert resale of information collected from users, which already concerns lawmakers: ChatGPT was recently banned in Italy.
While the latest calls to ban or halt AI development concern LLM/MLLM-based systems that operate exclusively in the virtual digital space and pose no obvious threat outside it, much more serious problems are possible in the real, physical world. In our previous overview we mentioned the global academic community’s de facto inability to stop the development of lethal autonomous weapon systems. The ubiquitous spread of a wide variety of UAVs and loitering munitions on today’s battlefields, used for a wide range of tactical missions, is becoming a key factor of superiority. Unfortunately, the continuation of armed conflicts around the world will undoubtedly accelerate the development of technology in this direction.
The fundamental possibility of building an AI system with cognitive abilities close to human ones in certain respects has already been demonstrated. However, this requires computing resources far larger than autonomous physical devices can carry, whether UAVs or ground-based drones. Further work on miniaturizing AI technologies and reducing their power consumption will clearly make it possible to build intelligent autonomous devices, giving robots operating in the physical world multimodal, MLLM-type behavior without access to server clusters of the future GPT-5 class. This could also enable compact, autonomous, intelligent systems for delivering and using weapons that might raise existing threats to the next level.
Even today, while LLM/MLLM intelligence is limited to the capacity of server clusters and the information networks connected to them, the strategic superiority of geopolitical entities whose jurisdiction covers the institutions and companies owning such clusters is obvious. These countries have a significant competitive edge and can secure their technological sovereignty in the information space and other realms much better than countries that cannot ensure the development of such technologies within their jurisdiction.
From our partner RIAC