Skip to content

Latest commit

 

History

History
57 lines (43 loc) · 1.32 KB

README.md

File metadata and controls

57 lines (43 loc) · 1.32 KB

CAUTION: PROTOTYPE! NO PRODUCTION USE, not even development.

Merdb is a data processing library that

  • is a relational api to query data (like SQL but in Python)
  • has Unix like pipes to compose operators using the | syntax
  • scales to multi core or a cluster(via Modin)
  • processes data too big to fit into memory(via Modin)
  • support interactive and optimized processing(optimizations in roadmap)

Install

pip install merdb

Example

import pandas as pd
from merdb.interactive import *

def is_senior(row) -> bool:
    return row['age'] > 35


def double_age(row) -> int:
    return row["age"] * 2


cols = ["name", "age"]
people_df = pd.DataFrame([
    ["Raj", 35],
    ["Sona", 20],
    ["Abby", 70],
    ["Abba", 90],
], columns=cols)

# One can specify functions without any source data like quadruple age
# map is a merdb function
quadruple_age = map(double_age, "age") | map(double_age, "age")

result = (t(people_df) # convert people_df to a merdb table
          | where(is_senior)
          | order_by("name", "asc")
          | quadruple_age # Unix like pipe syntax making it easy to refactor out intermediate processing
          | select("age")
          | rename({"age": "new_age"})
          )

# Convert to Pandas Dataframe and print
print(result.df())

# Output
   new_age
0      360
1      280