多码网
Big Data

Big data processing, analysis, and visualization

262 source code projects

gobblin

A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.

2,265 · 748 · Java · Apache-2.0

GraphEngine

Microsoft Graph Engine

2,251 · 331 · C# · MIT

mara-pipelines

A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow

2,085 · 100 · Python · MIT

onyx

Distributed, masterless, high performance, fault tolerant data processing

2,051 · 201 · Clojure · EPL-1.0

datafusion-ballista

Apache DataFusion Ballista Distributed Query Engine

2,032 · 278 · Rust · Apache-2.0

bytewax

Python Stream Processing

1,994 · 109 · Python · Apache-2.0

elasticsearch-hadoop

:elephant: Elasticsearch real-time search and analytics natively integrated with Hadoop

1,977 · 997 · Java · Apache-2.0

tera

An Internet-Scale Database.

1,906 · 436 · C++ · BSD-3-Clause

actordb

ActorDB distributed SQL database

1,890 · 71 · Erlang · MPL-2.0

faust

Python Stream Processing. A Faust fork

1,870 · 203 · Python · NOASSERTION

secor

Secor is a service implementing Kafka log persistence

1,860 · 533 · Java · Apache-2.0

Gaffer

A large-scale entity and relation database supporting aggregation of properties

1,794 · 364 · Java · Apache-2.0

ambry

Distributed object store

1,788 · 293 · Java · Apache-2.0

oryx

Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning

1,785 · 401 · Java · Apache-2.0

genie

Distributed Big Data Orchestration Service

1,763 · 373 · Java · Apache-2.0

ekuiper

Lightweight data stream processing engine for IoT edge

1,700 · 451 · Go · Apache-2.0

PyHive

Python interface to Hive and Presto. 🐝

1,698 · 548 · Python · NOASSERTION

LocustDB

Blazingly fast analytics database that will rapidly devour all of your data.

1,645 · 75 · Rust · NOASSERTION

quix-streams

Python Streaming DataFrames for Kafka

1,548 · 105 · Python · Apache-2.0

comdb2

Bloomberg's distributed RDBMS

1,511 · 238 · C · NOASSERTION

streamparse

Run Python in Apache Storm topologies. Pythonic API, CLI tooling, and a topology DSL.

1,507 · 220 · Python · Apache-2.0

HiBench

HiBench is a big data benchmark suite.

1,490 · 768 · Java · NOASSERTION

wally

Distributed Stream Processing

1,484 · 67 · Pony · Apache-2.0

Page 4 of 11, 262 projects in total