GPT Party 2.0: David Yang on Digital Employees

On October 7-8, GPT Party 2.0, the largest Russian-speaking networking event dedicated to artificial intelligence, took place in Silicon Valley. More than 300 people gathered at Plug and Play to meet leading experts, entrepreneurs, and investors, discuss the latest trends in AI, and gain practical knowledge.

At the event, David Yang talked about who digital workers are and what they can do. He shared his vision of how AI will change our lives and gave examples of digital workers that are already starting to replace humans in the workplace.

Part 1: Who Are Digital Workers?

“Right now, we actively use ChatGPT, but we still don’t fully understand the limits of its capabilities. Researchers have conducted dedicated studies to determine what modern large language models (LLMs) can do.

The graph below shows how LLMs perform on a range of exams. Results in blue are for GPT-3.5, and results in green are for GPT-4. As the graph shows, GPT-4 scores highly, passing 80% of American college graduation exams.”

“It is commonly believed that computers cannot grasp humor, even if they can acquire professional knowledge. In April of this year, a study assessed GPT-4’s ability to analyze humor. The task was to explain what is funny about a picture. The system recognizes the objects in the image, down to the phone models, describes what each panel shows, and finally explains what makes the image funny.”

“Nick Davydov pointed out that large language models are trained to predict the next token, roughly speaking, the next word. But what is amazing is that when this ‘brain’ was allowed to read almost all the texts humanity has written over the past 2,000 years, it began to predict not only the next words but also the meaning that follows. The system learned to reason and, with some caveats, to think. This raises a separate philosophical question: perhaps the principles of thinking are already embedded in the texts humanity has written. We won’t dwell on it now, but it is an absolutely astonishing phenomenon.

Let’s continue with the research on language models. Here I’m drawing on a 150-page paper from April 13, 2023, in which the authors try to map the limits of the model’s capabilities. We already know that large language models can write poetry, interpret images, and manipulate them. They are also quite capable of producing graphical representations of statistical data. The example below shows that LLMs have learned to understand physical objects, their sizes, geometry, and individual properties, as well as the effects of gravity. The task was as follows: we have a book, nine eggs, a laptop, a bottle, and a nail; stack them on top of each other in the most stable way. The system explains that the book should go first, on a flat surface, as the base. Next, rather than simply setting the nine eggs in a three-by-three grid, it suggests leaving a little space between them, because the laptop placed on top will spread its weight across all nine eggs and keep them stable. It adds that the eggs must not be cracked. Finally, it places the bottle on top, because the nail will rest on its cap, and warns that the nail should be placed carefully so the stack does not fall apart.”
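The next-token objective Nick Davydov described can be illustrated with a toy model. The sketch below is a word-level bigram counter: it records which word follows which in a tiny made-up corpus and predicts the most frequent successor. Real LLMs use neural networks over subword tokens and vastly more text, so treat this only as an illustration of the training objective, not of how GPT-4 works; the corpus and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Optional


def train_bigram(corpus: list[str]) -> dict[str, Counter]:
    """Count, for every word, how often each other word follows it."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            counts[current][nxt] += 1
    return counts


def predict_next(counts: dict[str, Counter], word: str) -> Optional[str]:
    """Return the most frequent successor of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(counts: dict[str, Counter], start: str, max_words: int = 5) -> list[str]:
    """Extend `start` greedily, one predicted word at a time."""
    words = [start.lower()]
    for _ in range(max_words):
        nxt = predict_next(counts, words[-1])
        if nxt is None:
            break
        words.append(nxt)
    return words


corpus = [
    "the model predicts the next word",
    "the model reads the text",
    "the model writes",
]
counts = train_bigram(corpus)
print(predict_next(counts, "the"))  # prints "model" (seen 3 times after "the")
```

`generate` extends a prompt one predicted word at a time, which mirrors, in a very simplified way, how an LLM produces text token by token.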

“Do you see what’s happening here? The model was trained to predict the next words, yet here it demonstrates reasoning. Many say that language models can only summarize text, but in this case we see them creating new content. The task was devised by the researchers themselves, so the system could not have encountered it before.

Below are examples of the model’s abilities in advanced mathematics and programming, even though mathematics is not considered the model’s strongest suit. Large language models are often seen as leaning toward the humanities, but imagine a humanities expert solving second-order integral equations.”

“In the 1980s, neurobiologists and cognitive scientists devised Theory of Mind tests, which assess whether an individual can attribute beliefs, intentions, and other mental states to others, and how that ability compares with a normal adult’s. An adult human typically scores above 90% on these tests. In 2020, large language models scored below 10%; in 2021, they reached 30%; and in 2023, GPT-4 scored 95%.

Formally, then, we should acknowledge that there are entities in this black box that possess consciousness. Or perhaps we know nothing about human consciousness at all? What is consciousness, anyway? That is a separate, vast topic. Over the last four years, we have been building a non-biological companion called Morpheus, into which we tried to build both a consciousness and a subconscious. We created three models: one interacted with the external world, while the other two interacted only with each other. They constantly discussed what was happening, as if reflecting on the whole world, but they never communicated with the outside world directly.

I attended a conference where we tried to understand what consciousness is and to define the terminology. One view holds that consciousness is a state in which an individual exhibits at least ten mental states, including perception, attention, cognitive processes, intention, memory, imagination, self-awareness, beliefs, and desires. Interestingly, LLMs demonstrate a significant number of these mental states.”
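The Morpheus layout described in the talk, with one model facing the outside world and two models talking only to each other, can be sketched as a toy topology. The `Model` class is a trivial stand-in that just labels its input, and the talk does not say how the inner dialogue feeds back into the outer model, so this sketch simply stores it. All names are hypothetical; this is not the actual Morpheus implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """Trivial stand-in for a language model: it just labels what it receives."""
    name: str

    def respond(self, message: str) -> str:
        return f"{self.name}: {message}"


@dataclass
class Companion:
    """Hypothetical three-model layout: one outer model and two inner models."""
    outer: Model = field(default_factory=lambda: Model("outer"))
    inner_a: Model = field(default_factory=lambda: Model("inner_a"))
    inner_b: Model = field(default_factory=lambda: Model("inner_b"))
    reflections: list[str] = field(default_factory=list)

    def handle(self, user_message: str) -> str:
        # Only the outer model talks to the outside world.
        reply = self.outer.respond(user_message)
        # The inner pair discusses the exchange privately; their output is stored, never returned.
        thought = self.inner_a.respond(f"{user_message} -> {reply}")
        rejoinder = self.inner_b.respond(thought)
        self.reflections.extend([thought, rejoinder])
        return reply


companion = Companion()
print(companion.handle("Hello"))   # prints "outer: Hello"
print(len(companion.reflections))  # prints 2
```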

Part 2: How Will AI Change Our Lives?

“According to Goldman Sachs, 300 million jobs could disappear or be transformed by artificial intelligence. Various studies, including one from the University of Pennsylvania, estimate that 50% of professions will be affected by AI. According to Reuters, 27% of knowledge-worker professions are threatened by the AI revolution. The phenomenon is so massive that we decided to create the Association of Digital Workers. At the association, we believe that these 300 million jobs will ultimately be new jobs that do not take work away from existing biological workers.

I agree with the speakers who believe the AI revolution will let us do more creative work while spending less time working, so we can devote a larger part of our lives to family, children, hobbies, and so on. The workweek will shrink once again: 200 years ago there were no weekends, then one day off per week was introduced, now we have two, and soon we may have three or four days off a week.”

Part 3: A Concrete Example of Digital Workers

“The distinction between digital workers and the tools or copilots that assist with work is quite simple. Digital workers are fully standalone applications with a human form factor. Unlike GitHub Copilot, which assists programmers as they write code, a digital worker is isolated and stands on its own. They have a name, a phone number, an email address, and a Microsoft account. You can call them, message them, assign them tasks, and invite them to meetings or team standups, and they will follow the manager’s instructions just like any other employee.

In one of the earlier panels, speakers mentioned how difficult and time-consuming it is to roll out any new tool. That is because implementation requires changing business processes. With digital workers, however, we don’t change anything in the organization’s processes.

Digital workers solve a significant problem for managers: constant staff turnover. Imagine a manager with 100 people on the front line of technical support, at least 40 of whom will leave this year. New people need to be found, interviewed, and onboarded, and after four months they leave again.

Small business owners face a different problem. Imagine the owner of a spa salon who also performs treatments. While a client is receiving a treatment, no one is at the front desk to answer calls. As a result, up to 40% of incoming calls are missed, leading to customer dissatisfaction and churn. Finding receptionists is very hard because many people see it as temporary work, so they are disengaged, don’t call clients back, and lack complete information about the services.

How can digital workers help here? A digital worker is an intelligent agent with its own name and access to every communication channel: phone calls, messaging, and physical presence. It is a robot that handles communication work: it can recognize who is calling, warmly greet clients at reception, and provide all the necessary information.

The diagram below illustrates how such systems work. At the bottom are large language models. The middle layer represents long-term and medium-term memory. When a task is set, the system makes decisions autonomously based on information about the current state. It analyzes the goal to be achieved and the specific tools at its disposal, then constructs the sequence of steps needed to accomplish it. On the right are all of the company’s external static and dynamic data. Static data is rarely changing information about the company and its services, such as the website, PDFs, instructions, and knowledge bases. Dynamic data is information that can change every second, such as Jira, tickets, CRM, and ERP systems. On the left is the person who called or visited, along with the communication interface: the phone, messengers, or a physical robot. This is the high-level architecture for building a digital worker on top of a large language model.”
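The layered architecture described above can be sketched in miniature. This is a toy under stated assumptions, not how neow.ai or any real product is built: simple keyword matching stands in for the LLM's planning, a single per-contact message history stands in for long- and medium-term memory, and the class name, the spa price, and the ticket lookup are all hypothetical. Messages are handled the same way whichever channel (phone, messenger, robot) they arrive on.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DigitalWorkerAgent:
    """Toy version of the layered design; keyword rules stand in for the LLM."""
    # Static data: rarely changing facts (website, PDFs, instructions).
    static_knowledge: dict[str, str]
    # Dynamic data: live lookups (tickets, CRM, ERP), keyed by topic.
    dynamic_sources: dict[str, Callable[[str], str]]
    # Memory: per-contact message history, standing in for long/medium-term memory.
    memory: dict[str, list[str]] = field(default_factory=dict)

    def plan(self, goal: str) -> list[tuple[str, str]]:
        """Choose which data sources the goal needs (a real agent would ask the LLM)."""
        text = goal.lower()
        steps = [("static", t) for t in self.static_knowledge if t in text]
        steps += [("dynamic", s) for s in self.dynamic_sources if s in text]
        return steps

    def handle(self, contact: str, message: str) -> str:
        """Answer a message arriving on any channel (phone, messenger, robot)."""
        history = self.memory.setdefault(contact, [])
        greeting = "Welcome back" if history else "Hello"  # recognizes returning contacts
        answers = []
        for kind, name in self.plan(message):
            if kind == "static":
                answers.append(self.static_knowledge[name])
            else:
                answers.append(self.dynamic_sources[name](contact))
        history.append(message)
        body = " ".join(answers) if answers else "Let me check and get back to you."
        return f"{greeting}, {contact}. {body}"


agent = DigitalWorkerAgent(
    static_knowledge={"prices": "A massage costs $80."},
    dynamic_sources={"ticket": lambda contact: f"{contact}'s ticket is in progress."},
)
print(agent.handle("Maria", "What are your prices?"))
print(agent.handle("Maria", "Any update on my ticket?"))
```

The first call answers from static data and greets a new contact; the second pulls live (here, faked) ticket status and, because memory already holds Maria's earlier message, greets her as a returning client.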

“Modern platforms make this possible, and we are involved in one of them, called neow.ai. The platform lets you implement a worker in five steps. The result is digital transformation without changing the organization’s business processes; it is more like onboarding a new person. First, you take the instructions used to train a biological worker. Next, you connect the system to your static and dynamic information sources, which you can do yourself because it doesn’t require deep technical skills. Third, you connect the channels the worker normally uses, such as email, Slack, and more. Then comes testing, just as with new biological employees. When people start in technical support, they first study a stack of documents about the company’s products, prices, services, typical cases, and questions, and then take tests to check that they are ready for the job. Digital workers take the same tests. Once they start working properly, their work is almost indistinguishable from that of a biological worker. A person calling the organization may never realize they are talking to a non-biological worker, because modern agents speak very naturally. Building the first working version of a worker can take two days, plus a week or slightly more of testing.

We are currently also adapting another robot, Moxi, for physical use. It can turn, move its body, and gesture with its hands, but most importantly, it is emotional. It can get upset or be surprised, and when a person walks around the room, it follows them with its gaze and maintains eye contact.

The combination of phone calls, emails, and physical presence creates an entirely different customer experience.”
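The testing step in the neow.ai walkthrough, where digital workers take the same readiness tests as new hires, can be sketched as a simple exam harness. Everything here is a hypothetical illustration rather than the neow.ai API: `answer` is any function that takes a question and returns the worker's reply, grading by substring match is a crude stand-in for a human reviewer, and the 0.9 pass threshold, the FAQ, and the exam questions are made up.

```python
from typing import Callable


def readiness_score(answer: Callable[[str], str], exam: list[tuple[str, str]]) -> float:
    """Fraction of exam questions whose answer mentions the expected key phrase."""
    if not exam:
        raise ValueError("exam must contain at least one question")
    passed = sum(
        1 for question, expected in exam
        if expected.lower() in answer(question).lower()
    )
    return passed / len(exam)


def ready_for_work(answer: Callable[[str], str], exam: list[tuple[str, str]],
                   threshold: float = 0.9) -> bool:
    """Hypothetical go-live rule: pass at least `threshold` of the onboarding exam."""
    return readiness_score(answer, exam) >= threshold


# Hypothetical worker under test: answers from a small FAQ.
FAQ = {
    "opening hours": "We are open 9am to 8pm.",
    "parking": "Free parking is available.",
}


def stub_worker(question: str) -> str:
    for topic, reply in FAQ.items():
        if topic in question.lower():
            return reply
    return "I'm not sure."


# The kind of questions a new human receptionist would be tested on (made up here).
exam = [
    ("What are your opening hours?", "9am to 8pm"),
    ("Is there parking?", "free parking"),
    ("Do you sell gift cards?", "gift card"),
]
print(readiness_score(stub_worker, exam))  # 2 of 3 questions pass
print(ready_for_work(stub_worker, exam))   # False: below the 0.9 threshold
```

Keeping the exam as plain question/expected-answer pairs means the same test set can be reused for human and digital hires, which is the point the talk makes about onboarding.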