Distributed Relational Programming in Python

RAbraham/merdb

CAUTION: PROTOTYPE! NOT FOR PRODUCTION USE, or even for development.

Merdb is a data processing library that

  • provides a relational API for querying data (like SQL, but in Python)
  • uses Unix-like pipes to compose operators with the | syntax
  • scales to multiple cores or a cluster (via Modin)
  • processes data too large to fit in memory (via Modin)
  • supports both interactive and optimized processing (optimizations are on the roadmap)
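The | composition described in the list above can be illustrated with a minimal sketch in plain Python. This is not merdb's implementation: the Op and Table classes and the map_col helper (named to avoid shadowing Python's built-in map) are hypothetical, and rows are plain dicts rather than a pandas or Modin DataFrame. The idea shown is that overloading __or__ lets Op | Op build a reusable pipeline before any data exists, while Table | Op runs a stage on data.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class Op:
    """Hypothetical pipeline stage: wraps a function from rows to rows."""

    def __init__(self, fn: Callable[[List[Row]], List[Row]]):
        self.fn = fn

    def __or__(self, other: "Op") -> "Op":
        # Op | Op composes two stages into a new stage; no data is needed yet.
        return Op(lambda rows: other.fn(self.fn(rows)))


class Table:
    """Hypothetical data wrapper: Table | Op applies the stage to the rows."""

    def __init__(self, rows: List[Row]):
        self.rows = rows

    def __or__(self, op: Op) -> "Table":
        return Table(op.fn(self.rows))


def where(pred: Callable[[Row], bool]) -> Op:
    # Keep only rows for which the predicate is true.
    return Op(lambda rows: [r for r in rows if pred(r)])


def order_by(col: str, direction: str = "asc") -> Op:
    # Sort rows by one column, ascending unless direction is "desc".
    return Op(lambda rows: sorted(rows, key=lambda r: r[col],
                                  reverse=(direction == "desc")))


def map_col(fn: Callable[[Row], object], col: str) -> Op:
    # Write fn(row) into column `col`, leaving the other columns unchanged.
    return Op(lambda rows: [{**r, col: fn(r)} for r in rows])


def select(*cols: str) -> Op:
    # Project each row onto the given columns.
    return Op(lambda rows: [{c: r[c] for c in cols} for r in rows])


def rename(mapping: Dict[str, str]) -> Op:
    # Rename columns found in `mapping`; other columns keep their names.
    return Op(lambda rows: [{mapping.get(k, k): v for k, v in r.items()}
                            for r in rows])


double = map_col(lambda r: r["age"] * 2, "age")
quadruple_age = double | double  # composed before any data exists

people = Table([
    {"name": "Raj", "age": 35},
    {"name": "Sona", "age": 20},
    {"name": "Abby", "age": 70},
    {"name": "Abba", "age": 90},
])

result = (people
          | where(lambda r: r["age"] > 35)
          | order_by("name", "asc")
          | quadruple_age
          | select("age")
          | rename({"age": "new_age"}))

print(result.rows)  # [{'new_age': 360}, {'new_age': 280}]
```

Because | is left-associative, each Table | Op step returns a new Table, so stages can be chained in one expression or factored out into named pipelines like quadruple_age.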

Install

pip install merdb

Example

import pandas as pd
from merdb.interactive import *

def is_senior(row) -> bool:
    return row['age'] > 35


def double_age(row) -> int:
    return row["age"] * 2


cols = ["name", "age"]
people_df = pd.DataFrame([
    ["Raj", 35],
    ["Sona", 20],
    ["Abby", 70],
    ["Abba", 90],
], columns=cols)

# Operators can be composed into a reusable pipeline without any source data, like quadruple_age
# map here is merdb's map (imported via *), which shadows Python's built-in map
quadruple_age = map(double_age, "age") | map(double_age, "age")

result = (t(people_df) # convert people_df to a merdb table
          | where(is_senior)
          | order_by("name", "asc")
          | quadruple_age  # Unix-like pipe syntax makes it easy to factor out intermediate steps
          | select("age")
          | rename({"age": "new_age"})
          )

# Convert to a pandas DataFrame and print
print(result.df())

# Output
   new_age
0      360
1      280
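To make the semantics of the pipeline concrete, here is a rough SQL equivalent run through Python's standard sqlite3 module. It does not use merdb; it assumes map(double_age, "age") overwrites the age column, as the output above suggests, so applying it twice multiplies age by four.

```python
import sqlite3

# In-memory database holding the same four rows as people_df above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people (name, age) VALUES (?, ?)",
    [("Raj", 35), ("Sona", 20), ("Abby", 70), ("Abba", 90)],
)

# where(is_senior) -> WHERE, order_by -> ORDER BY,
# quadruple_age -> age * 2 * 2, select + rename -> AS new_age
rows = conn.execute(
    """
    SELECT age * 2 * 2 AS new_age
    FROM people
    WHERE age > 35
    ORDER BY name ASC
    """
).fetchall()

print(rows)  # [(360,), (280,)]
conn.close()
```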
