On November 30, 2022, OpenAI released ChatGPT. In one year, generative AI pushed the technology industry toward rebuilding software and hardware, lifted the value of AI infrastructure providers, and opened new possibilities in fields from medicine to aerospace. It also brought anxiety about safety, jobs, fraud, manipulation, and the future of artificial general intelligence.
This article gathers views from AI practitioners in 2023 on questions including GPT-5, OpenAI’s challengers, multimodal models, AI chips and data, China’s large-model race, open source versus closed source, and the possible paths toward AGI.

OpenAI, the organization behind the generative AI wave, was not widely known to the public before ChatGPT. In just one year, it became one of the world’s best-known technology companies, putting pressure on Google, Meta, Amazon, and other giants. Two questions now concern almost everyone watching AI: when will GPT-5 arrive, and who can truly challenge OpenAI?
Zhang Peng, CEO of Zhipu AI, argues that calling others challengers may overstate OpenAI’s position. OpenAI is ahead, he says, but competitors cannot be ignored. Xiao Yanghua, director of the Shanghai Key Laboratory of Data Science and professor at Fudan University, notes that once a model begins to show the early form of AGI, it could upgrade and iterate at an astonishing speed, making first-mover advantage very strong.
After explosive early growth, OpenAI’s user growth has slowed, which several interviewees described as normal. Wang Xiaohang, vice president of Ant Group and head of its financial large model, says model progress is increasingly data-driven and publicly available image-text data on the internet is beginning to run dry. He argues that large models should not be only a centralized super-AI, but a super-ecosystem.
Interviewees identified several future directions: larger models, AI personas, deeper customization for industries, agents, tool intelligence, digital transformation, and prioritizing the most important use cases rather than chasing every interesting possibility.
Will GPT-5 Be Released?
Chen Ran, founder and CEO of OpenCSG
GPT-5, GPT-6, and GPT-7 will certainly continue to be released. Data volume will keep growing explosively, model parameters will keep increasing, and interaction will become stronger. However, the pretraining process for large models may no longer be the main issue; the bigger question will be how to use good datasets to create qualitative leaps.
Liang Jiaen, chairman and CTO of Unisound
GPT-5 is only a code name. Many problems still need to be solved as AGI capabilities continue to extend. OpenAI is a leader in AGI that deserves respect.
Chen Lei, vice president of FinVolution and head of big data and AI
A release should be inevitable, but the timing is hard to judge and depends on the market and regulation. OpenAI may be more cautious about launching GPT-5 because GPT-4 remains very competitive.
Xiao Yanghua, Fudan University professor
Release is only a matter of time, but the GPT-5 that is released may not be the same version as the GPT-5 currently trained. After development, safety evaluations are usually needed. Safety does not only mean producing useful and harmless answers in dialogue; it also includes assessing the impact of the model’s capabilities on society.
In March 2023, shortly after GPT-4 was released, OpenAI published research analyzing which human jobs might be vulnerable to GPT-series models. That kind of analysis should also be part of pre-release safety evaluation. After evaluation, GPT-5 may need some capability trimming before a relatively safe and socially acceptable version is released.
Who Can Truly Challenge OpenAI?
Zhang Peng, Zhipu AI CEO
OpenAI’s challengers can be divided into two groups. One is technology giants such as Microsoft, Google, Meta, Amazon, and even Nvidia. Their entry points differ, but they challenge OpenAI in markets, technical accumulation, and resources. The other group is startups such as Anthropic, Cohere, and Inflection AI, which will also have an impact.
Using the word challenger elevates OpenAI too much. OpenAI is indeed leading, but other competitors cannot be ignored. Companies that can really compete technically may need deep foundations, technical accumulation, and strong strategic understanding. Google, for example, is integrating all its resources and visibly accelerating, increasing its threat to OpenAI. Anthropic is widely seen as a strong competitor beyond OpenAI, and Inflection AI has its own distinctive path.
At the core, Google, Anthropic, and Zhipu AI have the same goal: AGI. Other companies may define themselves or their development paths differently.
The most important factor is that the goal and understanding must be competitive enough. OpenAI has targeted AGI since 2015 and has not changed its original aim, whether working with Microsoft or developing independently. Second is resources: Microsoft has supported OpenAI almost regardless of cost, and only giants or companies backed by giants may match that level of investment. Third is team accumulation and mastery of core technologies.
Xiao Yanghua
When ChatGPT was first released, Xiao argued that in the AGI race there may be only first place and no second place, assuming a fully free competitive environment and excluding other human or political factors. Once a model begins to show the early form of AGI, it may upgrade, iterate, and evolve at an astonishing speed, so its first-mover advantage is extremely clear. The real worry is whether the gap will widen.
OpenAI may have challengers, but they may not come from the general large-language-model track that OpenAI opened. They may come from new tracks such as embodied large models, where models combine with robotic bodies; multimodal large models; models based on collective intelligence; or professional large models. New mechanisms for the emergence of intelligence may allow these tracks to surpass GPT-style models centered on language.
Xiao believes the paths to intelligence, and to AGI, are diverse. Large language models represented by ChatGPT may not be the best shortcut. OpenAI may recognize these possible challengers, but it is difficult for any participant to deploy deeply across so many tracks at once. Therefore, challengers exist in theory, but probably not on OpenAI’s own track.
Wang Xiaohang, Ant Group vice president
Five years ago, few people expected OpenAI to become an AI leader. A challenger could be a large open-source project backed by Meta’s open ecosystem, a company like Google with a data flywheel and resources, or another OpenAI: a startup deeply focused on algorithm architecture and core research.
Chen Lei
In the short term, it is hard to challenge OpenAI. Benchmark results have not reached GPT-4 capability, and there is no especially strong multimodal challenger in the short term. But this must be viewed dynamically, because models keep iterating and different models perform differently in different scenarios. It is hard for one model to dominate every use case, though GPT-4 may be an exception because it is very comprehensive.
Chen Ran
There will be no domestic challenger from China in the near term. Overseas, Google and Meta have not given up, and Cohere, Anthropic, and X will keep challenging. The technology will not be monopolized. Major players all want the right to compete, so competition will become more intense.
Liang Jiaen
DeepMind is also a powerful promoter of AGI, though it focuses more on industry problems. AlphaFold is very strong. Everyone is searching for a good foundation for general intelligence, then using that capability to solve problems. DeepMind’s idea is to solve intelligence first, then use it to solve difficult industry problems that may already exceed current human capabilities.
How Should We View Slower OpenAI Growth?
Wang Xiaohang
For model capability, there is a consensus that model architectures are becoming unified, so capability evolution is data-driven. A major problem today is that publicly available image-text data on the internet is basically starting to run out.
There are two solutions. First, model architectures must better align multimodal data, including images, text, video, IoT, and other data, to break bottlenecks in data scale and quality. This is one of the main directions for major AI model companies. Second is real-world implementation. OpenAI is looking for industry data partners, meaning that after public and high-quality data are exhausted, high-quality private data in specialized fields may be comparable in scale to shared data. Connecting that industrial data to large models like a water pipe is critical, and there is no shortcut.
From the user side, slowing growth is real. Explosive early growth is not sustainable. The main issue is that AGI, as a centralized product, has not yet become a high-frequency necessity for the public. Large models must truly enter every industry and make industries AI-native before they become a broad necessity. That is the next growth space.
Large models should not be just a centralized super-AI, but a super-ecosystem. This means more general and powerful models, more efficient development, and integration into industries after development. There are not yet especially successful industrial cases, but they may appear in the next year or two.
Xiao Yanghua
The slowdown mainly means slower growth in ChatGPT user numbers, which is normal. After the early stage of any new product, novelty fades and some users leave. This also shows that large models cannot remain just chat tools. They must quickly reach deep industry pain points and solve serious decision-making problems across industries before their value can be realized.
Xiao often compares OpenAI’s GPT models to electricity: they provide a form of intelligence. It took more than a century for electricity to move from birth to large-scale application because many electrical appliances and devices had to be developed. OpenAI’s further growth is similar. Many applications, such as GPTs, must be built to use GPT-style intelligence before it can solve industry problems, create value, and support sustainable development.
Zhang Peng
The essence may not be a slowdown. Rumors about slower or declining ChatGPT user growth have many reasons behind them. API revenue has grown quickly, meaning many users shifted from casually trying ChatGPT to using GPT APIs to build applications and commercialize them. People moved from watching the excitement to doing practical work. Consumer growth also has a ceiling because there are only so many internet users worldwide.
Chen Ran
OpenAI has not really slowed. Its growth trend surpassed TikTok’s. It has only reached a bottleneck caused by the size of its base.
Chen Lei
This only reflects OpenAI’s consumer fundamentals. Slower consumer growth is inevitable and normal. When ChatGPT appeared, it was a phenomenal application and everyone tried it, creating a large base, but the user group later selected itself. Also, one cannot look only at consumer usage. Usage integrated into Microsoft Copilot and Office should be very large. OpenAI has also invested in downstream AI application companies, so how its technology is used in those scenarios must be counted to get a complete picture.
Do Small AI Companies Still Have a Chance After OpenAI’s First Developer Conference?
Chen Ran
They do, and they do not. Without a unique approach, it is difficult. In the internet era there were Alibaba, Tencent, and Baidu, but later Pinduoduo also challenged them. Whether small companies have opportunities depends on whether what they do is innovative or business-revolutionary. Without that, they have no chance. Opportunities are reserved for startups with ideas.
Liang Jiaen
Current breakthroughs are still more often at the technical foundation level, but in industry, the business value of the application layer is much larger than that of the technology layer. If a small company only does simple tuning to form a business model, its space may shrink. It must go deep into an industry and solve deeper problems to have a chance.
How Will Multimodality Evolve Over the Next Year?
Wang Xiaohang
The industry focus is now multimodality, and technologies for understanding and aligning massive multimodal data will advance quickly. This is high-dimensional data, not only images, text, and video. Research connecting IoT and the physical world will also emerge. Data modalities such as sensing and control will be aligned with natural language. In AI, this is called grounding: connecting language with perception and action in the real world.
This could generate new capabilities such as autonomous driving and robotics, truly solving end-to-end problems. In autonomous driving, earlier systems trained multiple stitched-together modules. Tesla’s autonomous driving research is now focused mainly on end-to-end training, treating radar, video, driving data, and sensor data such as speed and brake control as sequence data to learn their correspondence and relationships. After mapping massive data to each other, the system can better understand and predict. This alignment across multimodal data already exceeds knowledge that natural language can describe.
Multimodal data is not just text-to-image conversion. More important is how it connects with the real world, which will open a new space.
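The idea of treating driving signals as sequence data can be illustrated with a minimal sketch. It is not Tesla's method or any production pipeline; the `Frame` fields, the value ranges, the bin count, and the token names are all assumptions chosen for illustration. Each continuous reading is quantized into a discrete bin, and the readings from each timestep are interleaved into one token sequence that a sequence model could be trained on.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    """One timestep of hypothetical driving data (illustrative fields only)."""
    radar_m: float    # distance to the nearest obstacle, in metres
    speed_kmh: float  # vehicle speed, in km/h
    brake: float      # brake pressure, normalized to [0, 1]


def quantize(value: float, low: float, high: float, bins: int) -> int:
    """Map a continuous reading to an integer bin in [0, bins - 1]."""
    clipped = min(max(value, low), high)
    index = int((clipped - low) / (high - low) * bins)
    return min(index, bins - 1)  # keep the top edge inside the last bin


def frames_to_tokens(frames: List[Frame], bins: int = 16) -> List[str]:
    """Interleave the signals of each timestep into one token sequence."""
    tokens: List[str] = []
    for f in frames:
        tokens.append(f"<radar_{quantize(f.radar_m, 0.0, 100.0, bins)}>")
        tokens.append(f"<speed_{quantize(f.speed_kmh, 0.0, 160.0, bins)}>")
        tokens.append(f"<brake_{quantize(f.brake, 0.0, 1.0, bins)}>")
    return tokens


frames = [Frame(80.0, 60.0, 0.0), Frame(20.0, 55.0, 0.5)]
tokens = frames_to_tokens(frames)
print(tokens)
```

Once every modality shares one vocabulary, the relationship between an approaching obstacle and a braking action becomes a pattern inside a single sequence rather than a hand-built link between separate modules.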
Xiao Yanghua
In the future, large models should be competent at any multimodal task supported by data. Social media contains huge numbers of landscape photos, selfies, videos, and speech data, so multimodal large models will have corresponding understanding capabilities. Based on these data, large models may also combine and innovate, such as imagining and accurately drawing a picture of an astronaut riding a horse.
But much multimodal data rarely appears on open platforms, such as professional charts, medical data, design drawings, and professional images. Because large models lack the data foundation, understanding remains difficult. Another reason is that professional multimodal data is supported by principles. A circuit diagram has its structure because of circuit principles. Understanding such images requires not only the image itself but the relevant scientific principles behind it. Large models need extensive background knowledge. This difficulty also opens new tracks and opportunities for differentiated competitive advantage.
Zhang Peng
Multimodality has already made concrete progress in speech, vision, and natural-language interaction. Multimodal models will move to a more important stage and may integrate more modalities. In the future, more than two modalities may be unified in one model. Multimodal pretraining will further improve large models’ intelligence and cognition.
Liang Jiaen
Even for multimodal models, the core remains the large language model part, because language and semantics are the central abstraction. A pure text model may have read the world’s books and formed an internal world it understands, but it does not know what a tree looks like or what a bird sounds like. Now, centered on text, various modalities are being integrated and aligned, eventually reaching a state where multimodality aligns with semantics.
What Is the Next Evolution of Large Language Models?
Liu Qingfeng, chairman of iFlytek
Large models have three next directions. First, larger model parameters are still needed, with computing power supporting training at the trillion-parameter level. GPT-4 is reported to have around 1.8 trillion parameters, though OpenAI has not disclosed the figure. This is the foundation for large models.
Second, AI personas should be built so AI can ask and answer proactively, especially by asking inspiring and guiding questions.
Third, large models should provide deeper customization and services in industry scenarios, integrating multimodal capabilities and backend knowledge learning and expression deeply into each scenario.
Wang Fengyang, Baidu vice president and head of the mobile ecosystem business system
In marketing, Baidu is building an agent business because agents are already one of the most valuable and promising directions from a commercial-ecosystem perspective. ERNIE 4.0 has significantly improved understanding, generation, logic, and memory. The next direction should be how to make the large-model foundation improve agent performance.
Current agent applications that are relatively advanced are often emotional companionship or entertainment. Startups in China and overseas more often build psychological or education agents, and less often agents that complete complex commercial tasks. But since the second half of the year, more startups have moved toward complex business tasks, which is a larger space.
Zhou Bowen, founder of Beijing Xianyuan Technology and professor at Tsinghua University
Humans have two unique capabilities: language and the creation and use of tools. If AI is first pushed infinitely closer to human intelligence, the next question is whether AI can use tools as well as humans. Zhou calls this tool intelligence, and sees it as a more important direction for large models.
A simple phrase for teaching AI intelligence is tokenize everything. To a large language model, everything appears as tokens. After tokenization, just as language can be output one token at a time, tools can be called and used one token at a time. By combining tool use and analyzing the structure of tool use, AI can complete many complex interactive tasks.
Tools can be divided into three categories. The first is physical-interaction tools, such as robots, robotic arms, and autonomous vehicles. The second is graphical user interfaces, or GUIs, such as those on phones, where users complete tasks through visual interfaces. The third is APIs, the software interfaces behind the Silicon Valley saying that software is eating the world. Under these assumptions, tools can be tokenized. With tokenized training, a language or foundation model that has compressed world knowledge can understand and use tools.
The next generation of AI will have both language intelligence and tool intelligence. It will interact with people, understand human intent, understand the world and the tools needed, and call the right tools to complete tasks under human instructions.
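A minimal sketch may make "tokenize everything" more concrete. It is an illustration under stated assumptions, not Zhou's system or any real model's tool-calling format: the token names `<tool:add>`, `<tool:upper>`, and `</tool>`, the `TOOLS` registry, and the `run` function are all hypothetical. The token stream stands in for model output, in which a tool call is just another run of tokens that a small dispatcher executes.

```python
from typing import Callable, Dict, List

# Hypothetical registry: each tool is bound to one special token.
TOOLS: Dict[str, Callable[[List[str]], str]] = {
    "<tool:add>": lambda args: str(sum(int(a) for a in args)),
    "<tool:upper>": lambda args: " ".join(args).upper(),
}
END_CALL = "</tool>"  # hypothetical token that closes a tool call


def run(tokens: List[str]) -> List[str]:
    """Walk a token stream and replace each tool-call span with its result."""
    output: List[str] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in TOOLS:
            try:
                end = tokens.index(END_CALL, i + 1)
            except ValueError:
                raise ValueError(f"unterminated tool call starting at {tok}")
            args = tokens[i + 1:end]       # tokens between open and close
            output.append(TOOLS[tok](args))
            i = end + 1                    # resume after the closing token
        else:
            output.append(tok)             # ordinary language token
            i += 1
    return output


# Stand-in for model output: language tokens and a tool call in one stream.
stream = ["The", "total", "is", "<tool:add>", "2", "3", "</tool>", "."]
result = run(stream)
print(" ".join(result))
```

The design point is that the model needs no separate mechanism for tools: emitting `<tool:add>` is the same kind of act as emitting a word, which is what lets tool use be learned through ordinary next-token training.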
Zhang Peng
Multimodality is definitely important. Its essence is improving the model’s cognition, including understanding, reasoning, and self-planning. This involves cross-modal learning and application, and reasoning that integrates knowledge and common sense. Agents are also popular, but their essence is still reasoning and self-planning. Fairness, safety, and privacy also need major work to ensure that a highly intelligent, human-like technology that can make small mistakes does not cause major harm in real applications.
Challenges include resources, data, and computing power. Cross-modal vision-language learning needs paired image-text data, which is harder to prepare than the data used for language-model pretraining and requires higher quality. High-quality corpora are said to have been used up; the question is where new high-quality data will come from. This is even more true for image cross-modal data. Preparing millions of image-text pairs was already difficult in the early stage. As models become larger, they need more data, raising the question of how to prepare it.
On fairness, safety, and privacy, one recent team used a model to predict social-media user profiles such as gender, location, age, and occupation based on content users posted, with very high accuracy. If used improperly, such research creates security threats.
Chen Ran
Large-model development is leap-like and exponential. In one year, large models covered decades of development. But China’s foundation remains unstable, including computing power, algorithm talent, and data.
Before large models can be built, digital transformation is necessary: information across industries must become data. China has not fully completed digital transformation. Many companies have demand for large models but lack data for training. If digital transformation is not done well, multimodality will also lag and diverge. Overseas markets have already adapted more fully, while China needs more time.
Applications will take root in China. Applications need platforms, ecosystems, and open source. Platform-oriented, ecosystem-oriented, open-source companies must first prove the business model so application companies can grow without worrying about infrastructure. Then applications may develop in leaps and potentially leapfrog competitors, though not quickly. Chen estimates two to three years. Many application companies may appear next year, but they will not become giants easily.
Xiao Yanghua
The next direction for large language models is highly diverse, including multimodality and embodied large models.
For language models themselves, hallucinations remain a problem. Logical reasoning and professional thinking need improvement, especially in mathematics, physics, chemistry, and other professional capabilities. Recent training based on synthetic data partly generates data under the guidance of scientific principles. Training large models with synthetic data can to some extent improve or alleviate logical defects and bottlenecks in professional cognition.
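One way to picture synthetic data guided by scientific principles is a generator that derives every answer from a known law, so the training examples are correct by construction. The sketch below is an assumption-laden illustration, not a method described by Xiao: Ohm's law, the value ranges, and the `make_ohm_examples` function are chosen only for the example.

```python
import random
from typing import Dict, List


def make_ohm_examples(n: int, seed: int = 0) -> List[Dict[str, str]]:
    """Generate question/answer pairs whose answers follow V = I * R."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    examples: List[Dict[str, str]] = []
    for _ in range(n):
        current_a = rng.randint(1, 10)        # current in amperes
        resistance_ohm = rng.randint(1, 100)  # resistance in ohms
        voltage_v = current_a * resistance_ohm  # the principle supplies the label
        examples.append({
            "question": (
                f"A {resistance_ohm} ohm resistor carries {current_a} A. "
                "What is the voltage across it?"
            ),
            "answer": (
                f"V = I * R = {current_a} * {resistance_ohm} = {voltage_v} V"
            ),
        })
    return examples


examples = make_ohm_examples(3)
for ex in examples:
    print(ex["question"], "->", ex["answer"])
```

Because the label comes from the formula rather than from scraped text, the generator can produce as many worked examples as needed, which is the appeal of this approach for reasoning-heavy subjects.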
Another important direction is making large models safer, more controllable, understandable, and explainable. Real-time capability also needs improvement. Search-augmented solutions exist, but it remains worth watching whether real-time limits can be addressed through training mechanisms. Low-cost training and application are also important. Large models remain expensive, and large-scale application costs are often hard to accept.
Chen Lei
The key is to solve difficulties in real applications. First is efficiency: how can large language models return results online in real time? Some teams are accelerating inference or pruning models. Second is how to ensure safety and controllability for consumer-facing models, avoiding ethical problems and producing rigorous, accurate, scientific answers. Third is how to use more high-quality data for continuous iteration. Overall, evolution will definitely move toward applications and implementation, not remain purely theoretical.
Ruba Borno, global channel vice president at Amazon Web Services
This is only the beginning, and there may be many directions. Generative AI is truly transformative, and we do not even know what it will look like a year from now. The real difficulty is prioritization: deciding which use cases are most important to users instead of following every potentially interesting direction. Generative AI can be applied to so many use cases that the next task is to choose specialized areas, whether industries or specific use cases, and continue building deep knowledge in them.
How Will China’s Large-Model Race Evolve Over the Next Year?
Zhang Peng
China has special circumstances and cannot be directly compared with overseas markets. The broad trend is recognition that large language models can reshape industries, so more resources must go into implementation in specific industries. Any technology must create practical value after reaching a certain stage.
The direction is right, but execution is tricky. A common view is that industries do not need general foundation models, only small, medium-sized, suitable industry models. But the fundamental reason large language models made breakthroughs is that they learned and modeled world knowledge, giving them near-human understanding, reasoning, and more advanced cognition. For industry models, the relationship is subtle: do they need the common-sense capability provided by foundation models, rather than only industry data? Zhang’s view is that the ideal state is for industry models to grow on top of foundation and general models, then be further trained and fine-tuned.
Chen Lei
The market will become calmer and more objective. First, future large models will focus more on application implementation. Second, foundation models will certainly converge; the market will not sustain 80-plus foundation models. Mergers among some startups are already happening because this field is resource-intensive. Third, as foundation models converge, more companies may focus on vertical models. Fourth, the ecosystem will become richer, including model companies, hardware vendors, and applications based on large models. Over the next year, there will be visible progress in clearing the path to large-model implementation.
Zhou Bowen
The number of general foundation models will converge, and large AI models must enter industries. This wave of AI progress and industry growth depends on industry large models entering a phase of many models across many sectors. More importantly, applications of industry large models should emerge continuously.
Liang Jiaen
In the future, general large models may be countable on one hand. For vertical applications, the number of models will be larger than people expect, perhaps dozens. But this depends on whether industry models can truly solve industry problems. If they cannot, they are just toys.
Chen Ran
An investor once asked whether China would have several large models, a hundred, or ten thousand. Chen believes China will not only have a hundred-model race; like the United States, it will see thousand-model and ten-thousand-model races. Models are open source and algorithms are consistent, but data is unique. Once data is available, a large model becomes an independent entity. Every industry needs to inject data into large models to create productivity. Domestic competition will become fiercer, with dozens of foundation models and more vertical models. In the future, vertical models with hundreds of millions or billions of parameters may meet demand.
Models can climb leaderboards, but that is meaningless. People will increasingly understand which large models are truly useful. Chen believes a very strong open-source large model will definitely appear.
Can Open-Source Large Models Surpass Closed-Source Models?
Zhang Peng
At present, the gap between the average performance of open-source models and the best closed-source models is still obvious. Catching up may take time.
Liang Jiaen
Technically, there is no essential difference. There are not many big secrets in technical circles, and the global large-language-model paradigm is relatively unified.
For closed source, OpenAI has likely done very detailed and solid work on data in addition to its algorithmic framework. From an application perspective, large models must eventually enter applications. OpenAI’s applications are at the forefront and have already been iterating for a year. Liang estimates that open-source large models will have greater influence in terms of the number of applications, but closed-source models will be better at reaching the highest level.
Chen Lei
Open source and closed source each have advantages. Open-source ecosystems are better and attract more developers to help communities and models improve.
But open sourcing large models differs from other open-source technologies because it requires heavy resources. Using an open-source model in an application requires a team capable of integrating it into a program and doing extensive scenario-specific modification. Many companies do not have that capability.
From a training perspective, open-sourcing large models is suitable only within a limited scope. Closed-source models emphasize commercialization and customization. Companies choose open or closed source differently at different stages. As a business strategy, some large-model startups may pursue both tracks, releasing an open-source version first and then providing a commercial version.
What Question Do AI Leaders Most Want Answered?
Chen Ran
The operators used in large models, meaning their basic computational building blocks, have not undergone a qualitative change so far. What is the next direction for innovation in these operators?
Liang Jiaen
From a technical perspective, what methods can make large models reliable and controllable? At present, they are still essentially statistics-driven. Statistics alone are not enough. On top of semantic abstraction, models need to effectively combine facts, conform to human logical norms, and align with human value choices. We can predict all behaviors of weak AI, though not how well it will perform. But the results of AGI are unpredictable. How to keep its outputs aligned with human expectations is a challenging question.
Zhang Peng
When can we clearly evaluate that the intelligence level of a large model has surpassed the average human level? The GPT-4 report mentioned such a conclusion, though not everyone fully accepts or values it. If a large model’s average capability exceeds the human average, that is a milestone showing large models can truly be put into use and solve concrete problems in many scenarios.
Chen Lei
Is the large-model approach truly a feasible path to AGI? People previously said AGI might arrive around 2035. Now, the only clear possible path to AGI appears to be large models. But whether they can really lead to AGI, and whether they are the only path, remains a question many people are curious about.