多码网

Big Data

Big data processing, analysis, and visualization

262 source code projects

gobblin

A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.

2,263 · 749 · Java · Apache-2.0

GraphEngine

Microsoft Graph Engine

2,251 · 330 · C# · MIT

mara-pipelines

A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow

2,085 · 100 · Python · MIT

onyx

Distributed, masterless, high performance, fault tolerant data processing

2,051 · 202 · Clojure · EPL-1.0

datafusion-ballista

Apache DataFusion Ballista Distributed Query Engine

2,020 · 269 · Rust · Apache-2.0

elasticsearch-hadoop

:elephant: Elasticsearch real-time search and analytics natively integrated with Hadoop

2,000 · 992 · Java · Apache-2.0

bytewax

Python Stream Processing

1,980 · 107 · Python · Apache-2.0

tera

An Internet-Scale Database.

1,906 · 436 · C++ · BSD-3-Clause

actordb

ActorDB distributed SQL database

1,891 · 70 · Erlang · MPL-2.0

faust

Python Stream Processing. A Faust fork

1,867 · 203 · Python · NOASSERTION

secor

Secor is a service implementing Kafka log persistence

1,860 · 532 · Java · Apache-2.0

Gaffer

A large-scale entity and relation database supporting aggregation of properties

1,793 · 364 · Java · Apache-2.0

ambry

Distributed object store

1,787 · 288 · Java · Apache-2.0

oryx

Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning

1,785 · 401 · Java · Apache-2.0

genie

Distributed Big Data Orchestration Service

1,763 · 372 · Java · Apache-2.0

ekuiper

Lightweight data stream processing engine for IoT edge

1,697 · 452 · Go · Apache-2.0

PyHive

Python interface to Hive and Presto. 🐝

1,696 · 549 · Python · NOASSERTION

LocustDB

Blazingly fast analytics database that will rapidly devour all of your data.

1,646 · 75 · Rust · NOASSERTION

quix-streams

Python Streaming DataFrames for Kafka

1,542 · 105 · Python · Apache-2.0

streamparse

Run Python in Apache Storm topologies. Pythonic API, CLI tooling, and a topology DSL.

1,507 · 221 · Python · Apache-2.0

comdb2

Bloomberg's distributed RDBMS

1,505 · 236 · C · NOASSERTION

HiBench

HiBench is a big data benchmark suite.

1,492 · 767 · Java · NOASSERTION

wally

Distributed Stream Processing

1,484 · 67 · Pony · Apache-2.0

Page 4 of 11, 262 projects in total