Big Data
Big data processing, analysis, and visualization
262 source code projects in total
gobblin
A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.
GraphEngine
Microsoft Graph Engine
mara-pipelines
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
onyx
Distributed, masterless, high performance, fault tolerant data processing
datafusion-ballista
Apache DataFusion Ballista Distributed Query Engine
elasticsearch-hadoop
🐘 Elasticsearch real-time search and analytics natively integrated with Hadoop
bytewax
Python Stream Processing
tera
An Internet-Scale Database.
actordb
ActorDB distributed SQL database
faust
Python Stream Processing. A Faust fork
secor
Secor is a service implementing Kafka log persistence
Gaffer
A large-scale entity and relation database supporting aggregation of properties
ambry
Distributed object store
oryx
Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning
genie
Distributed Big Data Orchestration Service
ekuiper
Lightweight data stream processing engine for IoT edge
PyHive
Python interface to Hive and Presto. 🐝
LocustDB
Blazingly fast analytics database that will rapidly devour all of your data.
quix-streams
Python Streaming DataFrames for Kafka
streamparse
Run Python in Apache Storm topologies. Pythonic API, CLI tooling, and a topology DSL.
comdb2
Bloomberg's distributed RDBMS
HiBench
HiBench is a big data benchmark suite.
wally
Distributed Stream Processing
