The Determining Factor for Success in the AI Industry: Data & AI Infrastructure Construction

2025-11-06

The 2025 China International Digital Economy Expo opened at the Shijiazhuang International Convention and Exhibition Center. Yu Yang, Chairman of the Board of Beijing KeenData Co., Ltd., was invited to attend the 2025 Chief Data Officer Summit Forum and delivered a keynote speech titled "The Success of the Artificial Intelligence Industry Relies on Data&AI Data Infrastructure Construction".

The following content is compiled based on the on-site speech by Yu Yang, Chairman of KeenData.

Yu Yang, Chairman of KeenData

I. Data is the Key to the Development of the Artificial Intelligence Industry, and High-Quality Datasets are the Core of Data

In the context of great power competition, the artificial intelligence (AI) industry is a core arena of competition built on three key elements: computing power, algorithms, and data. In computing power, China has competitive solutions and large-scale deployment approaches; in algorithms, it has achieved breakthroughs such as mixture-of-experts models. Data, the crucial factor that determines success in the AI industry, is the key opportunity to overtake competitors.

The development of AI is driven by the synergy of three core elements: computing power, algorithms, and data. None of the three can be dispensed with; only when they work in concert can AI truly be deployed and applied.

In computing power, China has vigorously promoted the construction of large-scale computing centers through forward-looking planning, building a competitive computing power system that spans supercomputing centers and intelligent computing centers. At the algorithm level, China's research community and industry have likewise achieved numerous breakthroughs, with innovations such as Mixture-of-Experts (MoE) architectures and Multi-Head Latent Attention (MLA) continuing to emerge.

Currently, data application both at home and abroad faces common challenges. First, storing, transmitting, and computing over massive data requires huge investment in hardware and computing resources, directly driving up the cost of deploying the technology. Second, multi-modal data such as text, images, and audio differ greatly in format and lack unified integration standards, making cross-type data fusion extremely difficult. Third, existing data platforms are inefficient: data cleaning is slow and annotation cycles are long, which directly delays model training and undermines the timeliness of deployment. Against this backdrop, some enterprises place excessively high expectations on AI, hoping it will solve problems across every scenario; low-quality data then leads to model output that falls short of expectations, and the projects are ultimately shelved.

The scale and quality of data directly determine the height and depth that AI technology can reach. Data defines a model's cognitive boundaries: what a model can learn depends on the fields its data covers, and the wider the coverage, the stronger its ability to handle different scenarios and problems. Data quality affects the reliability of output: low-quality data introduces bias into training and significantly reduces the accuracy and credibility of results. Data diversity strengthens robustness: data spanning multiple scenarios, dimensions, and sources helps models cope with complex, changing real-world environments. Large-scale, high-quality data supports the growth of model capabilities: sufficient data volume allows models to keep optimizing parameters, improving reasoning, and iterating upward. High-quality data is also the foundation of commercial deployment, ensuring that models deliver value in real business scenarios and drive value realization across the AI industry.

As Liu Liehong, Director of the National Data Administration, has pointed out, improving the quality and efficiency of datasets is a "catalyst" for AI to empower the real economy. As algorithms and computing power gradually converge, high-quality datasets have become the key moat shaping the core competitiveness of AI models.

Therefore, to win the competition in AI industry development, the next step is to focus on building data competitiveness.

II. High-Quality Datasets Require Data Infrastructure for Support

The height and depth of AI development depend directly on the scale and quality of data as a new production factor, and breaking the bottleneck of insufficient supply of high-quality data is the primary prerequisite for AI to land effectively. High-quality datasets, however, do not appear out of thin air, nor are they a one-time achievement; they are the product of a dynamic process of continuous aggregation, processing, and governance. Only by relying on a data infrastructure platform can organizations steadily deliver fresh, usable data capabilities and integrate deeply with foundation models across industrial scenarios.

A high-quality dataset is a collection of data that, after a series of processing steps such as collection and curation, can be used directly for AI model development and training and can effectively improve model performance. Through systematic screening, cleaning, annotation, augmentation and synthesis, quality evaluation, and other steps, it becomes a standardized data product with a unified format, controllable quality, and strong scenario adaptability. Essentially, the difference between high-quality datasets and ordinary data is a generational gap in "usability" and "efficiency".
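To make those construction steps concrete, here is a minimal, purely illustrative Python sketch of such a pipeline. The function names, the `Record` structure, and the placeholder quality rule are assumptions made for explanation only; they are not drawn from any KeenData product.

```python
# Minimal sketch of the dataset-construction stages described above:
# screening -> cleaning -> annotation -> quality evaluation.
# All names and thresholds are hypothetical illustrations.
from dataclasses import dataclass, field

@dataclass
class Record:
    text: str
    label: str | None = None
    quality: float = 0.0
    meta: dict = field(default_factory=dict)

def screen(records):
    # Keep only records with non-empty payloads from approved sources.
    return [r for r in records if r.text.strip() and r.meta.get("source_ok", True)]

def clean(records):
    # Normalize whitespace and drop exact duplicates.
    seen, out = set(), []
    for r in records:
        norm = " ".join(r.text.split())
        if norm not in seen:
            seen.add(norm)
            out.append(Record(norm, r.label, r.quality, r.meta))
    return out

def annotate(records, labeler):
    # Attach labels; `labeler` could be a human workflow or a model-assisted pre-labeler.
    for r in records:
        r.label = labeler(r.text)
    return records

def evaluate(records, min_quality=0.8):
    # Score each record and keep only those above the quality bar.
    for r in records:
        r.quality = 1.0 if r.label is not None else 0.0  # placeholder scoring rule
    return [r for r in records if r.quality >= min_quality]

def build_dataset(raw, labeler):
    return evaluate(annotate(clean(screen(raw)), labeler))
```

In practice each stage would be far richer (model-assisted labeling, synthetic augmentation, multi-dimensional quality metrics), but the shape of the flow is the same: filter early, enrich in the middle, and gate on quality at the end.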

It should be clarified that high-quality data neither appears out of thin air nor can it be obtained once and for all. Building high-quality datasets is a dynamic process that must rest on a continuous, stable, and fresh supply of data. It resembles street-by-street fighting: every enterprise, industry, and sector has its own particular situation, and no single measure can solve every problem at once. Data resources therefore need to be sorted out one by one according to the specific conditions of each enterprise, industry, and sector, and a scientific, reasonable governance system must be built, ultimately yielding high-quality datasets at the enterprise, industry, and sector levels.

At the same time, the construction of high-quality datasets is inseparable from the support of a professional data platform. Such a platform must connect the entire data lifecycle, from collection, aggregation, cleaning, annotation, and governance through to application, not only meeting the needs of high-quality dataset construction precisely but also empowering the whole chain with technology to ensure a stable, highly available data supply.

III. The Data&AI Integrated Platform is the Core Engine of Data Infrastructure

The AI era imposes new requirements on data platforms: downward, they must optimize for new GPU-based computing power; upward, they must support model tuning for diverse end scenarios and large-scale Agent development to solve practical problems. AI engineering and AI infrastructure therefore need to be deeply integrated with data engineering to build Data&AI integrated platform capabilities, which are the core of data infrastructure.

Data infrastructure is a new type of infrastructure that, from the perspective of releasing the value of data as a production factor, provides society with services such as data collection, aggregation, transmission, processing, circulation, utilization, operation, and security. Within it, the Data&AI integrated platform serves as the technical base, the data rights confirmation and value distribution mechanism constructs the rights and interests framework, the data circulation connector enables cross-domain data exchange, and the data marketplace system supports the trading of data products and applications. It is an organic whole integrating hardware, software, model algorithms, standards, and mechanism design. As a foundational software platform, the Data&AI integrated platform is the core engine of data infrastructure.

From the enterprise perspective, the core of the Data&AI integrated platform is to connect the entire chain of data storage, governance, computing, and AI model development, realizing the two-way empowerment of "Data for AI" and "AI for Data". It is an upgraded form of the traditional big data platform, reconstructing the data processing paradigm through an AI-native architecture and becoming the "core production tool" of the AI era.
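To make the two-way empowerment concrete, here is a deliberately simplified sketch. The interfaces (`train`, `model.predict`, `model.confidence`, the `quality_checked` flag) are hypothetical placeholders rather than an actual platform API: one direction feeds governed data into model training, the other uses the resulting model to enrich and flag raw data for governance.

```python
# Illustrative sketch of the "Data for AI" / "AI for Data" loop.
# All interfaces here are hypothetical stand-ins, not a product API.
from typing import Callable, Iterable

def data_for_ai(governed_rows: Iterable[dict], train: Callable[[list[dict]], object]):
    """Data for AI: feed curated, governed records into a model-training routine."""
    training_set = [row for row in governed_rows if row.get("quality_checked")]
    return train(training_set)

def ai_for_data(model, raw_rows: Iterable[dict]) -> list[dict]:
    """AI for Data: use the trained model to pre-label and flag suspect records,
    feeding better data back into governance."""
    enriched = []
    for row in raw_rows:
        row["auto_label"] = model.predict(row)            # hypothetical model interface
        row["needs_review"] = model.confidence(row) < 0.6  # hypothetical confidence check
        enriched.append(row)
    return enriched
```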

In the AI era, data platforms face new requirements: upward, they connect with foundation models, providing strong support for scenario-based model tuning and the deployment of innovative applications; downward, they interface with computing power resources, fully releasing their advantages and achieving optimal scheduling and efficient utilization of compute.

From the construction of national and city-level trusted data spaces to application scenarios such as financial risk control, intelligent manufacturing, healthcare, and retail, putting AI applications fully into practice allows the technology to adapt precisely to each scenario. Data flows seamlessly from the collection, cleaning, and annotation stages (Data Infra) into AI training and model development (AI Infra), integrating AI and data infrastructure capabilities. This carries AI across the "last mile" into large-scale use in business scenarios and supports the intelligent upgrading of industry.

IV. A Systematic Methodology for Data Infrastructure Construction: "Methodology + Technology + Products + Practice"

Data infrastructure construction is not purely a matter of technology or of hardware and software; it requires the systematic support of "methodology + technology + products + practice" to achieve organizational, large-scale collaboration. In terms of methodology, KeenData has distilled years of practice into a hybrid data intelligence delivery system built on the "deep integration of data governance and data engineering" and "centralized management with decentralized empowerment". In terms of technology and products, the Data&AI integrated platform serves as the core carrier. In practice, this approach already covers manufacturing, industry, energy, finance, retail, and other fields. In intelligent manufacturing, for example, it digitizes industrial knowledge, making "data and software" the "brain" of intelligent manufacturing and laying a solid foundation for building a strong manufacturing country.

As a foundational platform providing core technical support, the significance of the Data&AI integrated platform goes far beyond solving isolated technical problems. It also gives large organizations and enterprises the core capability, digital intelligence capability, to keep advancing their digital and intelligent transformation over the next 5 to 10 years. In the AI era, digital intelligence capability stands alongside supply chain, financial, and human resource capabilities as a key enterprise capability, and it is an indispensable core capability for enterprise development.

KeenData has worked in the Data&AI integration field for more than six years and has built an AI-native Data&AI integrated platform, KeenData Lakehouse. The platform embodies the "AI-Native" design concept and features an independently developed AI-in-Lakehouse intelligence-driven architecture. It connects the entire chain from data engineering to model training and inference, the Agent factory, and intelligent applications, and advances the new "Data&AI" infrastructure with platform capabilities that are trustworthy, intelligent, and systematic, supporting large organizations in moving from data-driven to intelligence-driven operation. The platform breaks with the traditional architecture in which data and AI are separate, unifying the lakehouse engine, OLAP data governance, and AI technology into a streamlined, efficient All-in-One technical solution. Its self-developed multi-modal computing engine takes data from cleaning to result analysis in a single pipeline, multiplying GPU inference throughput, and combines KMI inference acceleration, model quantization, and Unity Catalog to realize cross-modal intelligent governance.
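Since KeenData Lakehouse's internal APIs are not publicly documented, the following generic sketch only illustrates the shape of the chain described above (data engineering, then model tuning, then an agent, then an application); every class, function, and string in it is a hypothetical stand-in, not the platform itself.

```python
# Generic single-pipeline sketch of "data engineering -> model tuning ->
# agent -> application". All names are hypothetical illustrations.
class Pipeline:
    def __init__(self):
        self.stages = []

    def stage(self, fn):
        self.stages.append(fn)
        return self

    def run(self, payload):
        for fn in self.stages:
            payload = fn(payload)
        return payload

def engineer_data(raw):
    # Clean and structure raw inputs into one governed table.
    return {"table": [x for x in raw if x]}

def tune_model(bundle):
    # Fine-tune or select a model against the engineered dataset.
    bundle["model"] = f"model_tuned_on_{len(bundle['table'])}_rows"
    return bundle

def build_agent(bundle):
    # Wrap the tuned model with tools for a business scenario.
    bundle["agent"] = {"model": bundle["model"], "tools": ["sql_query", "report"]}
    return bundle

def serve_application(bundle):
    # Expose the agent to an end application, e.g. report generation.
    return f"application backed by {bundle['agent']['model']}"

result = (
    Pipeline()
    .stage(engineer_data)
    .stage(tune_model)
    .stage(build_agent)
    .stage(serve_application)
    .run(["record-1", "record-2", None])
)
print(result)
```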

With strong technology and products, KeenData has won wide recognition in the industry. It has won a first prize of the Provincial and Ministerial-level Scientific and Technological Progress Award, ranked among the Top 5 in China's privately deployed big data platforms for three consecutive years, and ranked No.1 in market share for China's lakehouse platform software. It has also been recognized as a national-level specialized, refined, distinctive, and innovative "Little Giant" enterprise and received the People's Network "Ingenuity Leap Award", the Ministry of Industry and Information Technology's Trustworthy Software Product Excellence Level certification, the financial industry's Golden Tripod Award, and many other honors. In addition, it has been listed as a globally recommended vendor of data foundation platforms by international research institutions such as Gartner and IDC, becoming a benchmark enterprise in the Data&AI integration field.

Relying on the combined support of methodology, technology, products, and practice, KeenData has served nearly 200 large organizations across more than 20 industries, including manufacturing, industry, energy, finance, and retail, tailoring data infrastructure and data foundations to their business needs with remarkable results. At the same time, KeenData actively responds to national policies on Digital China and data elements, participates deeply in the planning and construction of government-side data infrastructure and trusted data spaces, and has undertaken trusted data space and pilot demonstration zone projects in several key Chinese cities. It is putting its core capabilities to work in both government and enterprise scenarios, continuously broadening the path for releasing data value.

Energy industry: Based on the Data&AI integrated platform KeenData Lakehouse, Sinopec has built a data resource pool covering 9 core businesses with a total data volume of 1.2 PB, formulated 3,727 data standards, and delivered 3,093 data services. With AI assistance, the turnaround time for business analysis reports has been cut from 1 week to 4 hours, with improved accuracy. The company has also built the high-quality datasets required for its exploration vertical large model, driving intelligent innovation in the business.

Financial enterprises: China CITIC Bank's financial-grade real-time data platform, built on KeenData Lakehouse, integrates data from ten core business domains to support real-time transactions for customers at the hundred-million scale. The platform has shortened the response time of key credit approval steps by 60%, significantly improved the efficiency of real-time anti-fraud interception, and enabled more than 10 core applications such as risk monitoring and mobile operations, forming full-domain real-time data management capability.

Multinational enterprises: AEON Group's Data&AI integrated platform, built on KeenData Lakehouse, integrates data from ten subject domains with terabyte-scale storage. With AI assistance, core report response speed has increased 10-fold, the business decision cycle has been shortened by 50%, intelligent pricing has lifted sales volume of KVI products by 9%, CDP member operations have raised the repurchase rate by 8.45%, and a real-time inventory early-warning system has reduced the out-of-stock rate by 12%, comprehensively driving intelligent business upgrading.

City governments: With the Data&AI integrated platform as the carrier and following the overall approach of "construction, service, management, and operation", a "1+4+N" framework and trusted data space have been built to improve the efficiency of data supply and data use. The project promotes the trusted circulation and compliant sharing of data elements, with more than 1,000 data entities connected, more than 2,000 data products released, more than 30 typical application scenarios created, more than 5 key industries covered, and public data resources brought online. It empowers industrial and economic development, promotes the release of data element value, and sustains ongoing data operations.

Relying on a leading technical foundation and deep software expertise, validated through project practice across many fields and scenarios, KeenData accurately grasps the core trends of enterprises' digital and intelligent transformation and provides them with a full-process action guide that is both scientific and practical.

KeenData proposes the core construction model of "centralized management with decentralized empowerment": centralized management achieves unified control and quality assurance across the entire data lifecycle, while decentralized empowerment activates the innovation of front-line business, letting data value reach business scenarios precisely. At the same time, it promotes the "deep integration of data governance and data engineering", systematically embedding governance requirements into every engineering stage from data collection and processing to application, breaking with the traditional pattern of after-the-fact correction and helping enterprises build deeply digital, intelligence-driven organizations. KeenData's Data&AI integrated platform, KeenData Lakehouse, provides solid implementation support for this methodology.

Once the KeenData Lakehouse Data&AI integrated platform is built, the core architecture of an enterprise-level big data and AI department is fully in place. The department has a powerful technical engine and low-code data and AI development tools, and achieves high availability and reusability of data and AI capabilities through standardized management and close alignment with data assets. To deliver precise services to a wider range of business units, however, it still has to overcome two core challenges: insufficient accumulation of industry know-how and slow response to personalized business needs. The functional upgrade of the big data and AI department is therefore inevitably moving toward the "business-oriented expression of data + AI": with Data Fabric data weaving and virtualization technology at the core, a wide range of middle-level and front-line personnel can quickly find data, call AI tools, and consume data services, truly integrating digital intelligence capabilities into daily business.

The implementation of Data Fabric depends on the collaborative support of underlying low-code development, intelligent routing and computing, and AI engineering pipelines. Ultimately, it gives the entire organization freedom to consume data and AI applications, closes the loop between data R&D, AI development, and business operations, and drives the whole organization toward a working model driven jointly by data and AI.
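As an illustration of the data-virtualization and intelligent-routing idea, consider this toy sketch: a business user refers to a logical dataset name, and the fabric layer resolves it to a physical engine and table. The source names, routing table, and `ask_ai` helper are all hypothetical, intended only to show how "find data, call AI tools, consume data services" could look at the code level.

```python
# Toy illustration of Data Fabric-style virtualization and routing.
# Source names, routing rules, and helpers are hypothetical.
PHYSICAL_SOURCES = {
    "sales.daily_orders": {"engine": "lakehouse", "table": "dw.orders_daily"},
    "crm.members":        {"engine": "olap",      "table": "crm.member_profile"},
}

def query(logical_name: str, where: str = "1=1") -> str:
    # Intelligent routing: resolve the logical name to a physical engine and table.
    src = PHYSICAL_SOURCES[logical_name]
    return f"SELECT * FROM {src['table']} WHERE {where} -- routed to {src['engine']}"

def ask_ai(question: str, logical_name: str) -> dict:
    # Low-code entry point: pair a natural-language question with governed data.
    return {"question": question, "sql": query(logical_name), "tool": "report_agent"}

print(ask_ai("Which regions drove yesterday's order growth?", "sales.daily_orders"))
```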

It is worth emphasizing that, as the core supporting Data&AI integrated platform, KeenData Lakehouse is not merely a software platform but a core competitive capability that enterprises must continuously iterate on. Its essence is a comprehensive system deeply integrating advanced technology, mature software, and AI engineering. It not only solves problems of technical implementation but also shapes a new way of managing the enterprise through the core model of "centralized management with decentralized empowerment", and it is the best-practice carrier for embedding software deeply into enterprise management. It connects technical engineering, data management, AI operations, and business collaboration, helps enterprises establish a new collaboration mechanism built around data and AI needs, and ultimately drives comprehensive digital and intelligent transformation of the organization, from management model and business processes to value creation, turning transformation from a slogan into sustainable growth.

V. Social Value of Data Infrastructure

Characteristics of data such as its scenario dependence mean that its value must be realized through specific application scenarios, and cross-industry, front-line scenarios can create the broadest value. Yet grass-roots units generally face the dilemma of having neither data nor technology. The social value of data infrastructure lies in allocating data (a production factor) and AI technology (a production tool) across society more effectively. Through "data that is available but not visible" and "centralized management + decentralized empowerment", this inclusive allocation can truly deliver the AI revolution and stimulate individual innovation.

Yu Yang, Chairman of KeenData

A distinctive feature of the digital economy is that data is the core production factor. What sets it apart from traditional production factors is that data is replicable, shareable, and can grow without limit. These attributes free data from the scarcity and consumption constraints of traditional factors such as land and capital: replication carries no additional cost, sharing breaks the limits of time and space, and unlimited growth continuously compounds scale effects. This makes data the production resource with the greatest potential in the digital economy era.

However, it should be clear that replication by itself does not directly generate value. The release of data's core value lies not in the act of "circulation" itself but in "efficient utilization after circulation". The core significance of data circulation is to break down data silos, allowing data scattered across departments, entities, and scenarios to flow and aggregate, laying the foundation for subsequent use. But circulation is only a prerequisite for realizing data value, not the end result. Only by binding the aggregated data flows tightly to specific business scenarios, and embedding data into cross-industry end services and the real needs of front-line operations, can abstract data be converted into concrete results such as decision support, efficiency gains, and innovation breakthroughs, truly releasing its deeper value as a production factor.

The scenarios that can unlock data value are rooted in factory workshops, community service points, farmers' fields, and the daily operations of small and medium-sized enterprises, the front-line scenarios that form the capillaries of the economy and society. Only when data precisely meets the production needs of grass-roots units, the everyday needs of the public, and the operational needs of enterprises can its scale and diversity come into full play, with mobility enabling value linkage across scenarios.

In actual front-line scenarios, the value of data as a production factor has been fully verified.

Take pharmaceutical companies as an example. A pharmaceutical company can aggregate clinical cases from multiple hospitals for in-depth analysis, precisely optimizing drug production processes and clinical application plans. Accumulating cases in a single hospital is slow, whereas centrally integrating medical cases from multiple hospitals can significantly accelerate research at top hospitals and the translation of their results.

However, the implementation of enterprise AI scenarios is persistently held back by a lack of data management capability. On one hand, small and medium-sized enterprises and grass-roots institutions have strong demand for AI but lack data source channels and data acquisition capabilities, leaving them with demand but no data. On the other hand, even when some data is obtained through scattered channels, the absence of professional data teams and AI tooling makes deep processing, effective analysis, and value conversion difficult, leaving them with data but no value.

So how can data, as a production factor, be effectively allocated and inclusively supplied to grass-roots scenarios, front-line workers, and the general public? This requires breaking technical barriers and resource monopolies so that small and medium-sized enterprises can obtain compliant data resources without heavy investment, front-line workers can improve efficiency with lightweight AI tools, and ordinary people can share in the dividends of the AI era.

Vigorously promoting the construction of data infrastructure and trusted data spaces is the key to solving these problems. By building data infrastructure, deeply integrating data as a production factor with Data&AI technology as a production tool, and adopting an innovative service model that pairs an "available but not visible" secure data circulation mechanism with "centralized construction + scenario-based empowerment", high-quality production factors (data) and efficient productivity tools (AI technology) can be supplied precisely to small and medium-sized enterprises and grass-roots scenarios. This achieves optimal allocation of data productivity and production factors and releases the innovation vitality and value potential of the grass roots.

The realization of data infrastructure's value depends on strong support from key technologies: the Data&AI integrated platform is the core engine for activating value creation in scenarios. It breaks down the barriers between data and scenarios, makes data truly "come alive" in concrete applications, and turns it from a static resource into a powerful driver of high-quality development at the grass-roots level.

VI. Undertaking the Mission of the Times: Taking AI Technology Global and Collaborating on Data Capabilities to Build New Global Industrial Advantages

Over the past decade, China has moved steadily toward the center of the world stage and continuously exported advanced productive forces. KeenData has already provided data infrastructure implementation support for local operators, financial institutions, and government technology departments in Japan, Saudi Arabia, Oman, Malaysia, and other countries.

Since 2019, driven strongly by policy, digital transformation across China's industries has accelerated. A market of more than 1 billion 5G mobile data terminal users has given Chinese technology enterprises unparalleled technical experience. With this experience, they are fully capable of providing global customers with innovative products proven in the Chinese market and of building new business ecosystems adapted to local conditions.

Similarly, drawing on the mature Data&AI data infrastructure construction experience and core technologies accumulated in China, KeenData has taken the initiative to enter overseas markets. It brings advanced domestic technologies, products, and methodologies to other countries and regions, helping them build the core capabilities needed in the AI era and promoting the development of local AI industries and digital economies. It has established deep cooperative relationships with customers in Saudi Arabia, Singapore, South Africa, Japan, Malaysia, the Philippines, and elsewhere, working with global partners to build new industrial advantages and contributing Chinese wisdom and strength to the development of the global digital economy.
