CAUTION: PROTOTYPE! NO PRODUCTION USE, not even development.

Merdb is a data processing library that

is a relational api to query data (like SQL but in Python)
has Unix like pipes to compose operators using the | syntax
scales to multi core or a cluster(via Modin)
processes data too big to fit into memory(via Modin)
support interactive and optimized processing(optimizations in roadmap)

Install

pip install merdb

Example

import pandas as pd
from merdb.interactive import *

def is_senior(row) -> bool:
    return row['age'] > 35


def double_age(row) -> int:
    return row["age"] * 2


cols = ["name", "age"]
people_df = pd.DataFrame([
    ["Raj", 35],
    ["Sona", 20],
    ["Abby", 70],
    ["Abba", 90],
], columns=cols)

# One can specify functions without any source data like quadruple age
# map is a merdb function
quadruple_age = map(double_age, "age") | map(double_age, "age")

result = (t(people_df) # convert people_df to a merdb table
          | where(is_senior)
          | order_by("name", "asc")
          | quadruple_age # Unix like pipe syntax making it easy to refactor out intermediate processing
          | select("age")
          | rename({"age": "new_age"})
          )

# Convert to Pandas Dataframe and print
print(result.df())

# Output
   new_age
0      360
1      280

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Install

Example

Files

README.md

Latest commit

History

README.md

File metadata and controls

Install

Example