dianping.com
Shanghai
Stars
Fluss is a streaming storage system built for real-time analytics.
Dataframes powered by a multithreaded, vectorized query engine, written in Rust
dask / fastparquet
Forked from jcrobak/parquet-python. Python implementation of the Parquet columnar file format.
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
A blazing-fast query execution engine that speaks the Apache Spark language and has Arrow-DataFusion at its core.
A QoS-based scheduling system that brings optimal layout and status to workloads such as microservices, web services, big data jobs, and AI jobs.
BibiGPT v1 · One-click AI summaries for audio/video and chat with learning content: Bilibili | YouTube | Tweet | TikTok | Dropbox | Google Drive | Local files | Websites | Podcasts | Meetings | Lectures, etc. Audio/video…
🔬 Online Heap Dump, GC Log, Thread Dump & JFR File Analyzer.
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Flowchart for debugging Spark applications
A better notebook for Scala (and more)
A query predictor pipeline and service to predict resource usages of Presto queries
Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.
LakeSoul is an end-to-end, real-time, cloud-native lakehouse framework with fast data ingestion, concurrent updates, and incremental data analytics on cloud storage for both BI and AI applications.
Warp is a modern, Rust-based terminal with AI built in so you and your team can build great software, faster.
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
The easy-to-use open source Business Intelligence and Embedded Analytics tool that lets everyone work with data 📊
The Workload Analyzer collects Presto® and Trino workload statistics, and analyzes them
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
Cloud Native DataOps & AIOps Platform | Cloud-native data intelligence and operations platform