KeenData Releases the Milestone Version KeenData Lakehouse 2.0, Building a Unified Data and AI Infrastructure

KeenData Lakehouse 2.0 is an integrated Data and AI platform designed for the AI-Native era. The entire platform follows an AI-Native design philosophy and introduces the pioneering AI-in-Lakehouse intelligence-driven architecture, which connects the full chain from data engineering through model training and inference and the agent factory to intelligent applications. With trusted, intelligent and systematic platform capabilities, it advances the new Data and AI infrastructure and supports large organizations in moving from data-driven to intelligence-driven development.

KeenData Lakehouse 2.0

KeenData Lakehouse 2.0 adopts an AI-Native, intelligence-driven architecture that unifies Data and AI engineering capabilities. Designed to help large organizations put data and AI systems into production systematically, the platform provides foundational infrastructure products covering data integration, batch and streaming development, multimodal computation, data governance, dataset management, AI model building and the full closed loop from training and inference to agent development.

The platform breaks with traditional architectures that keep data and AI separate. Its pioneering AI-in-Lakehouse technology unifies the lakehouse engine, OLAP, data governance and AI technologies into a streamlined, efficient All-in-One technical solution. The self-developed multimodal computation engine completes the entire process from data cleaning to analytical output in a single pipeline, which increases GPU inference throughput several-fold. Together with KMI inference acceleration, model quantization and the Unified Catalog, it achieves cross-modal intelligent governance.

Features of the AI-Native Data and AI Integrated Platform

Data and AI Integration

The platform deeply fuses data and AI: it connects full-lifecycle data processing with the AI development workflow, forming a closed loop that spans data processing, AI development and application deployment. Its core characteristics fall into three dimensions:

  • Multimodal data processing: It supports unified governance for text, images, audio and video, which enables efficient handling of diverse data types.
  • Agent-centric intelligent architecture: It achieves a complete loop of perception, cognition, action and evolution.
  • Data and AI integration: It provides a native All-in-One architecture which eliminates fragmentation between data and AI systems.

AI-Native

Unlike traditional platforms that rely on loosely coupled external AI components, the Data and AI integrated platform from KeenData adopts AI-Native principles at its core. Intelligent capabilities are embedded deep in the system’s foundation, creating an intelligent data infrastructure capable of autonomous evolution. Its technical architecture and core capabilities revolve around a dual-driven mechanism in which AI efficiently processes data and data intelligently supports AI. This mechanism covers three major capabilities: MaaS autonomous inference, agent self-iteration and intelligent enablement across the entire data lifecycle.

Traditional architectures that couple storage with compute suffer from low resource utilization and high expansion costs. To address these pain points, the platform separates storage from compute: data is stored in a unified high-performance storage layer, while compute resources scale elastically on demand. This reduces storage costs by more than 30 percent and lets AI training and inference workloads allocate resources flexibly, which resolves resource contention between large and small tasks and establishes a solid foundation for intelligent closed-loop capabilities.

Key Capabilities of the AI-Native Data and AI Integrated Platform

AI Integrated Implementation Capability

The platform covers the full lifecycle of AI models, which includes model building, deployment, evaluation, governance, release and application, and it provides comprehensive services and support. Through a unified compute-scheduling engine which dynamically optimizes resource allocation, the platform offers strong guarantees for enterprise-level large-model development, deployment and intelligent operations, ensuring stability and elasticity in production environments.

The platform integrates hundreds of pre-trained models and supports zero-code model fine-tuning and transfer learning. Combined with an advanced algorithm matrix, this lets enterprises efficiently build and operationalize customized models that fit their business scenarios. The platform also provides visual agent-application development capabilities: developers orchestrate multi-node workflows through low-code tools and can efficiently develop production-grade generative AI applications, which drives the democratization of AI. At the data-foundation layer, the platform manages the full lifecycle of unstructured data through a lakehouse architecture and unifies human annotation with intelligent AI annotation. This builds high-quality training datasets, strengthens the foundation for model training and preserves core data assets.

Keen AI serves as a critical supporting component, with deep customization of the training and inference framework. It incorporates diverse model-training and optimization strategies, which support model lightweighting, multi-mode parallel computation and inference-acceleration technologies such as sparse activation, operator-fusion optimization and paged attention. This achieves efficient collaboration between training and inference, breaking performance bottlenecks which arise from traditional fragmented development workflows.

Ready-to-Use Agent Intelligence

Rakesh Gohel’s well-known iceberg model highlights a fundamental reality of enterprise AI agent implementation: building a truly usable enterprise-grade agent is 90 percent software engineering and 10 percent AI.

The Data and AI integrated platform from KeenData provides a one-stop agent development factory which natively integrates the 90 percent engineering capability with the 10 percent AI capability, enabling developers to easily build intelligent assistants, writing agents and automated workflows. The platform offers a rich library of built-in agents which developers can reuse directly, along with visual orchestration tools and online debugging and preview features. It supports seamless integration with multiple mainstream large models, which allows developers to rapidly build customized agents and retrieval-augmented generation (RAG) applications, significantly reducing development barriers.

Dynamic task-decomposition algorithms divide complex requirements precisely, and multimodal intent-understanding technologies interpret user needs in depth, so the platform covers a complete loop from requirement understanding to task execution. A cross-platform execution engine coordinates data, tools and services to ensure end-to-end operational capability. The platform also provides full-lifecycle management features, which include online debugging and preview, application release and updates, and API access. It supports flexible integration and control of various models to meet scenario-specific requirements, which helps developers build and operationalize intelligent applications efficiently and helps enterprises build the critical 90 percent that lies beneath the surface of the iceberg.

Multimodal Computation Engine

The Data and AI integrated platform from KeenData provides a computation engine designed for multimodal AI workloads. It supports data cleaning, feature extraction, model inference and result analysis within a single data-processing pipeline, and it integrates deeply with mainstream data and AI frameworks so that they can be scheduled together within the same task. The engine rethinks the data-preprocessing paradigm as a native engine that understands and processes complex multimodal data, and it provides enhanced DataFrame primitives for AI and ML tasks. It features low latency and high throughput, supports zero-copy data sharing and simplifies fault tolerance through an immutable-data design. This reduces network overhead by 70 percent and makes the engine particularly suitable for compute-intensive workloads. On top of this, an enhanced dynamic execution engine offers a unified, highly abstract representation for tasks and actors, so a single interface can express both task-parallel and actor-based parallel computation.
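The idea of one interface for both tasks and actors can be illustrated with a minimal sketch. It is not KeenData's engine or API: the `Runtime` class, its `actor` and `submit` methods, and the `clean` and `CharCounter` examples are all hypothetical names, built only on Python's standard `concurrent.futures` module. Stateless tasks run on a shared pool, while each actor gets a single-thread lane so that its state is updated one call at a time.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class Runtime:
    """Toy runtime (hypothetical): tasks and actor calls share one submit()."""

    def __init__(self, workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=workers)  # stateless tasks
        self._lanes = {}  # id(actor) -> (actor, single-thread executor)

    def actor(self, obj):
        """Register obj as an actor: its method calls run one at a time, in order."""
        self._lanes[id(obj)] = (obj, ThreadPoolExecutor(max_workers=1))
        return obj

    def submit(self, fn, *args) -> Future:
        """Schedule a plain function as a task, or a bound method as an actor call."""
        owner = getattr(fn, "__self__", None)
        entry = self._lanes.get(id(owner))
        executor = entry[1] if entry else self._pool
        return executor.submit(fn, *args)

    def shutdown(self) -> None:
        self._pool.shutdown()
        for _, lane in self._lanes.values():
            lane.shutdown()


def clean(text: str) -> str:
    """Stateless task: safe to run in parallel with other tasks."""
    return text.strip().lower()


class CharCounter:
    """Stateful actor: keeps a running total across calls."""

    def __init__(self) -> None:
        self.total = 0

    def add(self, text: str) -> int:
        self.total += len(text)
        return self.total


rt = Runtime()
counter = rt.actor(CharCounter())
cleaned = [rt.submit(clean, raw) for raw in ["  Cat ", " DOG", "Bird  "]]
totals = [rt.submit(counter.add, f.result()) for f in cleaned]
print([f.result() for f in totals])  # [3, 6, 10]
rt.shutdown()
```

Both kinds of work return a `Future`, so downstream code does not need to know whether a result came from a parallel task or from a serialized actor call. A production engine would add distribution, scheduling and fault tolerance, none of which this sketch attempts.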

AI for Data Governance

The platform integrates development with governance to build an AI-driven intelligent governance system. Intelligent metadata scanning identifies sensitive data so that it can be encrypted and masked dynamically. The self-developed unified-metadata technology covers the entire product matrix and provides enterprise-grade data-governance capabilities, which include intelligent governance, permission management, centralized auditing, automated lineage tracking and data sharing across platforms, tenants and regions, ensuring the security and compliance of data assets.
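How scan results might drive masking at read time can be sketched as follows. This is an illustration under stated assumptions, not the platform's implementation: the detection patterns, the tag names, the exempt `auditor` role and the functions `scan_columns`, `mask_value` and `mask_row` are all hypothetical, and encryption is left out entirely.

```python
import re

# Hypothetical detection rules: tag name -> pattern every sampled value must match.
PATTERNS = {
    "phone": re.compile(r"^\d{11}$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}


def scan_columns(sample_rows):
    """Tag a column as sensitive if all sampled values match one pattern."""
    tags = {}
    columns = sample_rows[0].keys() if sample_rows else []
    for col in columns:
        values = [str(row[col]) for row in sample_rows]
        for tag, pattern in PATTERNS.items():
            if all(pattern.match(v) for v in values):
                tags[col] = tag
                break
    return tags


def mask_value(value, tag):
    """Hide most of the value but keep enough to stay recognisable."""
    text = str(value)
    if tag == "phone":
        return text[:3] + "*" * (len(text) - 7) + text[-4:]
    if tag == "email":
        local, _, domain = text.partition("@")
        return local[:1] + "***@" + domain
    return text


def mask_row(row, tags, role):
    """Mask tagged columns at read time unless the caller's role is exempt."""
    if role == "auditor":  # assumed exempt role, for illustration only
        return dict(row)
    return {c: mask_value(v, tags[c]) if c in tags else v for c, v in row.items()}


rows = [
    {"name": "Li", "phone": "13812345678", "email": "li@example.com"},
    {"name": "Wang", "phone": "13987654321", "email": "wang@example.org"},
]
tags = scan_columns(rows)
print(mask_row(rows[0], tags, role="analyst"))
```

Because masking is applied when a row is read rather than when it is stored, the same table can serve different roles without keeping separate masked copies.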

A standardized data model, built on business-domain classification, enables real-time detection of data anomalies and automated generation of quality-assessment reports. Data governance thus moves from passive control to proactive prevention and becomes intelligent and automated.

Full-Stack Intelligent Capabilities

The platform is designed to be simple and intuitive, and its built-in intelligent capabilities, including a high-precision NL2SQL model, significantly improve the efficiency of data development and application. Its semantic development engine, based on NLP technologies, lets business users query and develop data directly in natural language, and it gives data-warehouse engineers SQL interpretation and optimization capabilities.

The platform offers powerful multimodal retrieval capabilities. Combined with OCR, feature-extraction technologies and deep natural-language understanding, it supports rapid cross-modal retrieval and accurate localization of text and image content. The intelligent data-query system, which deeply understands business semantics, automatically links enterprise-wide data assets and allows users to query structured and unstructured assets through natural language.

Meanwhile, the platform supports the construction of high-efficiency enterprise knowledge bases. It provides intelligent document segmentation and embedding for multi-format documents, which transforms documents into searchable knowledge units. By integrating advanced models such as DeepSeek, it enables deep intelligent question-answering capabilities, which significantly improve efficiency and the user experience across data development, retrieval, management and application scenarios.

Autonomous and Secure Technical Support Capabilities

The platform is built on more than 170 core patents in big data and AI technologies, which establish a secure and controllable technical foundation. KeenData has independently developed the AI-in-Lakehouse intelligence-driven architecture, the multimodal fusion engine, Data Fabric, Active Metadata Management, Data Mesh and Data Virtualization, which enable integrated governance and development as well as distributed data processing under centralized control. The self-developed Unified Catalog provides cross-modal semantic alignment capabilities for large AI models, which ensures consistency and security in data understanding.

The innovative KMI inference-acceleration technology achieves a twofold performance improvement and optimizes heterogeneous-chip resource scheduling. Advanced model-quantization technologies, which use INT8 and INT4 Tensor Cores, achieve near-lossless compression and reduce storage overhead by 70 percent. The open architecture supports multiple computation engines and provides unified monitoring for both data and models. Through deep adaptation to China’s domestic technology stacks (including Huawei Ascend, Hygon, Kylin OS and UOS), the platform achieves full-stack localization from hardware to applications, which delivers autonomous, secure and efficient technical assurance for governments, central state-owned enterprises and industries with high security requirements.
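The numeric side of low-precision quantization can be shown with a small sketch. It uses generic symmetric per-tensor INT8 quantization in plain Python and is not KMI or KeenData's quantization method; the function names are hypothetical, and execution on Tensor Cores is a hardware matter that the sketch does not cover. The text does not state the baseline precision behind the storage figure, so `storage_saving` only computes the raw ratio of weight bits and ignores scale factors and other metadata.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(quantized, scale):
    """Recover approximate floats; the error per weight is at most scale / 2."""
    return [q * scale for q in quantized]


def storage_saving(bits_before, bits_after):
    """Fraction of raw weight storage saved, ignoring scales and metadata."""
    return 1 - bits_after / bits_before


weights = [0.5, -1.27, 0.0, 1.0]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
print(quantized)
print(max(abs(a - b) for a, b in zip(weights, restored)))
print(storage_saving(16, 8), storage_saving(16, 4))  # 0.5 0.75
```

Under these assumptions, going from 16-bit to 4-bit weights removes 75 percent of the raw weight bits, and the per-tensor or per-group scale factors that a real format must store bring the net saving somewhat below that.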

Diverse Application Scenarios

Powered by its end-to-end AI-Native architecture and low-code toolchain, the Data and AI integrated platform from KeenData combines technological universality with strong scenario adaptability. It responds rapidly to common enterprise needs in data retrieval, intelligent-assisted development, intelligent services and knowledge management, while also supporting deep customization for vertical industries, which enables one platform to cover multiple categories of scenarios.

  • Multimodal Data Retrieval: Through intelligent data annotation and natural-language understanding, the platform supports fast and precise retrieval across multimodal data, which includes text-to-text, text-to-image, image-to-image and image-to-text search.
  • Intelligent Data-Asset Question Answering: Users can quickly retrieve structured and unstructured assets on the Data and AI platform using natural language. They do not need to understand the underlying storage structures or write queries: the system automatically analyzes the semantics of a question, converts it into an executable query and returns structured results.
  • Intelligent Assisted Development: Using natural language, developers can access programming-assistance capabilities for multiple languages, which include code generation, SQL generation and logic explanation, as well as performance-optimization suggestions for existing code. This enables rapid comprehension and efficient execution optimization.
  • Agent Development Factory: The AI agent development platform enables rapid creation of scenario-based agent applications such as intelligent customer service and workflow assistants. Through visual orchestration tools which combine dialogue nodes and task-execution nodes, enterprises can deploy intelligent Q&A systems without writing code to handle customer inquiries, ticket processing and other high-frequency needs. Combined with zero-code model fine-tuning in the large-model training platform, enterprises can optimize models based on proprietary conversation data, which improves accuracy and professionalism while reducing customer-service costs.
  • Intelligent Writing: The platform enables effortless creation of applications such as automated proposal writing, contract review and PPT generation, which enhances enterprise capabilities in processing key documents.
  • Knowledge-Base Construction: The enterprise knowledge-base module supports intelligent segmentation and embedding of multi-format documents (PDF, Excel, TXT and others), which transforms fragmented content into searchable knowledge units. A high-performance vector database is used to build a private enterprise knowledge base. Employees can use natural-language queries to obtain knowledge about technical solutions, contract management, corporate policies and historical project experience. The platform supports rapid recall and standard querying across multiple databases and, with DeepSeek integration, performs deep analysis of user questions and retrieved knowledge to provide professional and accurate answers.
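The knowledge-base flow described in the last item (segment documents, embed the segments, retrieve by natural-language question) can be sketched in a few lines. This is a toy under stated assumptions: a word-count vector stands in for a real embedding model, an in-memory list stands in for the vector database, and the function names and sample documents are hypothetical. The answer-generation step with a large model is not shown.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, max_words=40):
    """Pack whole sentences into chunks of up to max_words words.

    A single sentence longer than max_words becomes its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], []
    for sentence in sentences:
        words = sentence.split()
        if current and len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


def embed(text):
    """Toy embedding: lowercase word counts (a real system would call a model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(documents):
    """documents maps a name to its text; returns (name, chunk, vector) entries."""
    return [(name, chunk, embed(chunk))
            for name, text in documents.items()
            for chunk in split_into_chunks(text)]


def search(index, question, top_k=1):
    """Return the top_k (name, chunk) pairs most similar to the question."""
    query = embed(question)
    ranked = sorted(index, key=lambda entry: cosine(query, entry[2]), reverse=True)
    return [(name, chunk) for name, chunk, _ in ranked[:top_k]]


documents = {
    "policy.txt": "Employees may take annual leave after six months. "
                  "Leave requests go to the line manager.",
    "contract.txt": "The contract renews every year. "
                    "Either party may terminate with thirty days notice.",
}
index = build_index(documents)
print(search(index, "How do I terminate the contract?"))
```

In a retrieval-augmented setup, the chunks returned by `search` would be passed to the language model together with the user's question, so that the answer is grounded in the enterprise's own documents.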

Typical Case Applications

Municipal Data Bureau Data Infrastructure Project

In a data-infrastructure project for a municipal data bureau, the integrated Data and AI platform from KeenData enabled non-algorithm teams to prepare data through the corpus-processing layer and to complete model training, fine-tuning and deployment with zero code on the intelligent-support layer. They were then able to invoke APIs or build agents that rapidly turned large models into commercial products. Meanwhile, the platform connected multimodal data with industry-specific intelligent agents across the entire chain, covering the full lifecycle from data to models to applications, which supported rapid development of data products for targeted scenarios. Standardized SDKs and plugin interfaces allowed third-party corpus-processing tools to integrate in a plug-and-play manner. As a result, the project accelerated the deployment of large models in urban-service scenarios and promoted the effective use of AI technologies in city governance and industrial upgrading.

Municipal Digital Government 2.0 Project

In a Digital Government 2.0 project for another municipality, a trusted data space was built on KeenData’s integrated Data and AI platform, which provided a new generation of smart-city big data infrastructure and trusted data environments. This enabled comprehensive digital, intelligent and refined management and services across government, public livelihood and industrial domains, while ensuring that data resources were deeply explored and quickly applied. The project established the first centralized data-infrastructure support platform on the government side and explored effective mechanisms for providing public-sector data to enterprises.

Central State-Owned Enterprise Data-Intelligence Foundation Project

In a data-intelligence infrastructure project for a major central state-owned enterprise, the unified data center and governance framework were built on KeenData’s integrated Data and AI platform, which enabled efficient storage and computation for newly generated big data. By further aligning with business scenarios, the platform provided hundreds of service capabilities for planning, engineering decision-making and integrated engineering platforms. With AI driving the management and sharing of all business and research data, the project accelerated the transformation of data into digital resources and assets, which enhanced operational efficiency and achieved integrated operations across the business chain. The project is an important milestone that marks the group’s transition into a new stage of highly coordinated intelligent operations.

KeenData continues to advance innovation in Data and AI technologies, which forms the foundation of KeenData Lakehouse 2.0. Focusing on the construction and upgrading of data infrastructure for large organizations, the platform provides full-chain support from data engineering to intelligent applications through its AI-Native architecture and integrated Data and AI capabilities. With an autonomous and secure technical foundation and strong scenario-driven implementation capabilities, it enables enterprises to accelerate the transformation of data value into business momentum.