Implementation view of spec #58

Open · fabiancook opened this issue May 16, 2020 · 0 comments

fabiancook commented May 16, 2020

TL;DR: I found it harder to implement this the way the spec defines it, and found that a split between read and write makes things a lot easier.


As some context: I previously tried to implement this using async iterables in place of "streams", where the dataset also implemented [Symbol.asyncIterator]. However, the implementation was messy and didn't fall cleanly into place; it required awkward inheritance where a factory was needed to create new datasets within DatasetCore.

I hoped that using async iterables would help with the lazy style of reading I wanted to achieve, but it turns out you can achieve the same thing with sync Iterables, which can be seen in the snippets below.
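To illustrate that point, here is a minimal hypothetical sketch (not from the implementation): a sync generator over a backing Set is already lazy, so each fresh iteration observes whatever is in the Set at that moment:

function* matching(backing: Set<string>, prefix: string): IterableIterator<string> {
  // Nothing runs until the result is iterated, and each iteration
  // re-reads the backing Set, so the view is effectively "live"
  for (const value of backing) {
    if (value.startsWith(prefix)) yield value
  }
}

const backing = new Set<string>(["a:1"])
const view = () => [...matching(backing, "a:")]

console.log(view()) // ["a:1"]
backing.add("a:2")
console.log(view()) // ["a:1", "a:2"], no async machinery required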


I have had great success implementing the dataset using two distinct concepts.

First, we have our "read" dataset:

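// Note: Quad, QuadLike and QuadFind below are the quad and term types from
// the accompanying data model (assumed here to be @opennetwork/rdf-data-model)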
export interface FilterIterateeFn<T> {
  (value: T): boolean
}

export interface RunIteratee<T> {
  (value: T): void
}

export interface MapIteratee<T, R> {
  (value: T): R
}

export interface ReadonlyDataset extends Iterable<Quad> {
  size: number
  empty: boolean
  filter(iteratee: FilterIterateeFn<Quad>): ReadonlyDataset
  except(iteratee: FilterIterateeFn<Quad>): ReadonlyDataset
  match(find: Quad | QuadFind): ReadonlyDataset
  without(find: Quad | QuadFind): ReadonlyDataset
  has(find: Quad | QuadFind): boolean
  contains(dataset: Iterable<Quad | QuadLike>): boolean
  difference(dataset: Iterable<Quad | QuadLike>): ReadonlyDataset
  equals(dataset: Iterable<Quad | QuadLike>): boolean
  every(iteratee: FilterIterateeFn<Quad>): boolean
  forEach(iteratee: RunIteratee<Quad>): void
  intersection(dataset: Iterable<Quad | QuadLike>): ReadonlyDataset
  map(iteratee: MapIteratee<Quad, QuadLike>): ReadonlyDataset
  some(iteratee: FilterIterateeFn<Quad>): boolean
  toArray(): Quad[]
  union(dataset: Iterable<Quad | QuadLike>): ReadonlyDataset
}

And our write dataset...

export interface Dataset extends ReadonlyDataset {
  add(value: Quad | QuadLike): Dataset
  addAll(dataset: Iterable<Quad | QuadLike>): Dataset
  import(dataset: AsyncIterable<Quad | QuadLike>): Promise<unknown>
  delete(quad: Quad | QuadLike | QuadFind): Dataset
}

If we want an immutable write dataset...

export interface ImmutableDataset extends Dataset {
  add(value: Quad | QuadLike): ImmutableDataset
  addAll(dataset: Iterable<Quad | QuadLike>): ImmutableDataset
  import(dataset: AsyncIterable<Quad | QuadLike>): Promise<ImmutableDataset>
  delete(quad: Quad | QuadLike | QuadFind): ImmutableDataset
}
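As a rough sketch of what that buys us (hypothetical code, assuming the Set-backed store described below, not the actual implementation): add copies the backing Set rather than mutating it, so existing references keep seeing the old contents:

class ImmutableDatasetSketch {
  constructor(private quads: Set<Quad> = new Set()) {}

  add(value: Quad): ImmutableDatasetSketch {
    // Copy-on-write: the original backing Set is never mutated
    return new ImmutableDatasetSketch(new Set(this.quads).add(value))
  }

  get size(): number {
    return this.quads.size
  }
}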

A couple of specific changes...

- DatasetCore is "moved up" to become the write dataset on top
- Dataset is "moved down" to become the read dataset

- Write functions return writable datasets
- Read functions return readable datasets

In terms of implementation, this felt a lot more natural and took a lot less time than trying to follow the spec one-to-one.

I've used TypeScript types specifically here, but I think they show nicely how the implementation works.

Behind the scenes I was able to utilise a Set as the backing collection for quads.

The read dataset in this implementation accepts a source Iterable, an interface the read dataset itself also implements, meaning we don't need any intermediate steps when creating new datasets. A Set implements Iterable too, which makes everything very seamless.

Chaining with the read dataset is very clean, as can be seen in the implementation of the read dataset itself:

https://github.com/opennetwork/rdf-dataset/blob/02f8d19e78b8065cfc0f78691f1af174e8c47425/src/readonly-dataset.ts#L76-L78
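That pattern looks roughly like this (a simplified sketch with names assumed; see the linked source for the real thing): filter wraps this in a generator and hands it straight to a new read dataset, so nothing is evaluated until iteration:

class ReadonlyDatasetSketch implements Iterable<Quad> {
  constructor(private source: Iterable<Quad> = []) {}

  // The dataset is itself an Iterable: iteration delegates to the source
  [Symbol.iterator](): Iterator<Quad> {
    return this.source[Symbol.iterator]()
  }

  // Chaining: wrap `this` in a lazily evaluated generator and hand it to a
  // new dataset, so each read re-walks the live source
  filter(iteratee: (quad: Quad) => boolean): ReadonlyDatasetSketch {
    const source = this
    return new ReadonlyDatasetSketch({
      *[Symbol.iterator]() {
        for (const quad of source) {
          if (iteratee(quad)) yield quad
        }
      }
    })
  }

  // size counts on demand, which is what lets matchers like the one in the
  // next example report up-to-date numbers
  get size(): number {
    let count = 0
    for (const quad of this) count += 1
    return count
  }
}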

Using iterables also enables this kind of usage, where the returned read dataset is a "live" view of the write dataset:

import { Dataset } from "../esnext/index.js"
import { DefaultDataFactory } from "@opennetwork/rdf-data-model"

const dataset = new Dataset()

const aNameMatch = {
  subject: DefaultDataFactory.blankNode("a"),
  predicate: DefaultDataFactory.namedNode("http://xmlns.com/foaf/0.1/name"),
  graph: DefaultDataFactory.defaultGraph()
}

const aMatcher = dataset.match(aNameMatch)

dataset.add({
  ...aNameMatch,
  object: DefaultDataFactory.literal(`"A"@en`)
})

dataset.add({
  subject: DefaultDataFactory.blankNode("s"),
  predicate: DefaultDataFactory.namedNode("http://xmlns.com/foaf/0.1/name"),
  object: DefaultDataFactory.literal(`"s"@en`),
  graph: DefaultDataFactory.defaultGraph()
})

console.log({ a: aMatcher.size, total: dataset.size })

dataset.add({
  ...aNameMatch,
  object: DefaultDataFactory.literal(`"B"@en`)
})

console.log({ a: aMatcher.size, total: dataset.size })

dataset.add({
  ...aNameMatch,
  object: DefaultDataFactory.literal(`"C"@en`)
})

console.log({ a: aMatcher.size, total: dataset.size })
console.log({ aObjects: aMatcher.toArray().map(({ object }) => object) })

This snippet outputs:

{ a: 1, total: 2 }
{ a: 2, total: 3 }
{ a: 3, total: 4 }
{
  aObjects: [
    LiteralImplementation {
      termType: 'Literal',
      value: 'A',
      language: 'en',
      datatype: [NamedNodeImplementation]
    },
    LiteralImplementation {
      termType: 'Literal',
      value: 'B',
      language: 'en',
      datatype: [NamedNodeImplementation]
    },
    LiteralImplementation {
      termType: 'Literal',
      value: 'C',
      language: 'en',
      datatype: [NamedNodeImplementation]
    }
  ]
}

I have implemented these datasets as sync because I only want to know what's in memory right now, not what's available in some remote dataset.

I did this because, if you want to import information from a remote dataset, you should utilise import when you have an async iterable (a Node.js ReadableStream, MongoDB cursors, etc.), or addAll when you have another in-memory dataset or a sync iterable (Arrays, Sets, etc.).
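As a rough sketch (hypothetical, assuming the Set backing described above), import just drains the async iterable into the synchronous store:

async function importQuads(
  backing: Set<Quad>,
  source: AsyncIterable<Quad>
): Promise<unknown> {
  // Works with any async source: Node.js streams, database cursors, etc.
  for await (const quad of source) {
    backing.add(quad)
  }
  return backing
}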
