Streaming Aggregators to realise data summarisation on data streams stored in a Solid environment #84

Open
pbonte opened this issue Oct 19, 2022 · 15 comments
Assignees
Labels
challenge technical problem applied to a use case ongoing The challenge is actively being tackled. proposal: approved ✅ update-required

Comments

@pbonte

pbonte commented Oct 19, 2022

Pitch

Data streams are becoming omnipresent. However, storing and analysing real-time data streams in a decentralised fashion using Solid is still hard to achieve. This is mainly due to the high frequency of changes in the answers to the queries issued on these streams and the temporal validity of those answers.
A first prototype of streaming aggregators is necessary to prepare the answers of a continuous query over streaming data for a client and to keep the query results up to date. This eliminates the need for the client to process the whole stream, while the aggregator allows the client to retrieve the results instantaneously.
In a patient monitoring system, data streams produced by personal vitality sensors and activity trackers are semantically annotated and stored in data pods. Healthcare providers are interested in summaries of the activity of a single patient or summaries across multiple patients.
Streaming aggregators are required to realise improved data summarisation and instantaneous results, because the data to be analysed in a pull-based fashion is extremely large due to the continuous nature of the data streams.
The DAHCC Dataset will be used as the data stored for each patient in a Solid pod to realise the aggregators.

Desired Solution

A first proof-of-concept streaming aggregator that runs as a service with which a client application can interact. The client application can specify the query that needs to be continuously evaluated on one or multiple data streams stored in Solid pods. The solution is required to:

  • Execute the queries requested by the client application over a specific time-based window (to be specified by the client application).
  • Aggregate data from streams stored on a single pod as well as across multiple pods.
  • Compute time-based tumbling and sliding windows over the data streams.
  • Store the results of the continuous queries in the aggregator, so that the client can execute a GET request to access the aggregated data summarisation.
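
To make the windowing requirement concrete, here is a minimal sketch of time-based tumbling and sliding windows. It is not the aggregator's implementation: the function names `window_starts` and `windowed_average`, the integer timestamps, and the assumption that windows start at multiples of the slide from time 0 are all illustrative choices.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def window_starts(timestamp: int, width: int, slide: int) -> List[int]:
    """Start times of every window [start, start + width) that contains timestamp.

    Assumption: windows start at non-negative multiples of `slide`.
    Setting slide == width yields tumbling windows.
    """
    if width <= 0 or slide <= 0:
        raise ValueError("width and slide must be positive")
    starts = []
    start = (timestamp // slide) * slide  # latest window that has already opened
    while start > timestamp - width and start >= 0:
        starts.append(start)
        start -= slide
    return sorted(starts)


def windowed_average(
    events: Iterable[Tuple[int, float]], width: int, slide: int
) -> Dict[int, float]:
    """Average of the event values per window, keyed by window start."""
    acc = defaultdict(lambda: [0.0, 0])  # window start -> [running sum, count]
    for timestamp, value in events:
        for start in window_starts(timestamp, width, slide):
            acc[start][0] += value
            acc[start][1] += 1
    return {start: total / count for start, (total, count) in sorted(acc.items())}


# Hypothetical sensor readings as (timestamp, value) pairs.
events = [(1, 10.0), (4, 20.0), (7, 30.0), (12, 50.0)]
tumbling = windowed_average(events, width=10, slide=10)
sliding = windowed_average(events, width=10, slide=5)
```

With a slide equal to the width each event falls into exactly one window. With a smaller slide an event contributes to every overlapping window, which is why sliding windows cost more to maintain.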

Use Case

The dataset has sensor values from multiple patients. To monitor a patient's location, we use the sensors that detect the presence of a person in the house. In the DAHCC dataset, the person detection sensor is deployed in the 3 halls, the kitchen and the bedroom. We will aggregate each patient's location in a particular window, as well as the location of all the patients. This allows us to compute a summary of the activity of each patient, which is a useful insight for healthcare providers.
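
As a rough illustration of this use case, the sketch below counts presence detections per room within one window, both per patient and across all patients. The observation tuples, patient identifiers and room labels are hypothetical and do not reflect the DAHCC data model.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Observation = Tuple[str, int, str]  # (patient id, timestamp, room); hypothetical shape


def summarise_locations(
    observations: Iterable[Observation], window_start: int, window_end: int
) -> Tuple[Dict[str, Dict[str, int]], Dict[str, int]]:
    """Count detections per room inside the window [window_start, window_end).

    Returns (per-patient counts, counts across all patients).
    """
    per_patient = defaultdict(Counter)
    overall = Counter()
    for patient, timestamp, room in observations:
        if window_start <= timestamp < window_end:
            per_patient[patient][room] += 1
            overall[room] += 1
    return {p: dict(c) for p, c in per_patient.items()}, dict(overall)


# Hypothetical detections; room labels are made up for the example.
observations = [
    ("patient1", 0, "kitchen"),
    ("patient1", 5, "kitchen"),
    ("patient1", 9, "bedroom"),
    ("patient2", 3, "hall1"),
    ("patient2", 12, "kitchen"),  # falls outside the window below
]
per_patient, overall = summarise_locations(observations, 0, 10)
```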

Acceptance Criteria

A demo resulting from the solution should:

  • Accept continuous queries for streams from a single pod or multiple pods.
  • Allow the client to get the results of the queries through a GET operation.
  • Show that the results are complete.
  • Show the speed-up compared to a client application that does not use the aggregation service.
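
One possible way to check the last two criteria is sketched below, assuming that "complete" means the aggregator returns the same result rows as a baseline client that processes the whole stream itself. The helper names and the row representation are assumptions for illustration, not part of the challenge.

```python
from collections import Counter
from typing import Hashable, Iterable


def results_complete(
    aggregator_rows: Iterable[Hashable], baseline_rows: Iterable[Hashable]
) -> bool:
    """True when both result sets hold the same rows with the same multiplicity.

    Row order is ignored, since the two sides may emit results in different orders.
    """
    return Counter(aggregator_rows) == Counter(baseline_rows)


def speed_up(baseline_seconds: float, aggregator_seconds: float) -> float:
    """Ratio of the baseline response time to the aggregator response time."""
    if aggregator_seconds <= 0:
        raise ValueError("aggregator_seconds must be positive")
    return baseline_seconds / aggregator_seconds


# Hypothetical result rows as (window label, aggregate) tuples.
baseline = [("w0", 3), ("w10", 1)]
from_aggregator = [("w10", 1), ("w0", 3)]
complete = results_complete(from_aggregator, baseline)
missing_row = results_complete([("w0", 3)], baseline)
ratio = speed_up(baseline_seconds=8.0, aggregator_seconds=2.0)
```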

Assumptions

  • Long-term server-side authenticated sessions have been resolved.
  • The registered queries are SPARQL SELECT queries (or RSP-QL queries if we want to define the streaming operator inside the query).
  • This is a first prototype that does not need to be fully optimised.
  • LDES will be used to store the streams on Solid pods, and the LDES client will be used to continuously retrieve the latest changes to the LDES.

Compared to (#24), we focus on the streaming and windowing aspects of data aggregation.

Scenarios

This is part of a larger scenario

@s-minoo

s-minoo commented Oct 22, 2022

A few papers that I came across might be relevant to sliding window aggregation:

  1. Cutty (2016)
  2. Scotty (2018)
  3. General stream slicing

The author, Jonas Traub, developed these stream slicing techniques for aggregation in that order.
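
The core idea behind stream slicing can be sketched as follows: the stream is cut into non-overlapping slices, one partial aggregate is kept per slice, and overlapping windows are answered by combining slices rather than re-reading events. This is a simplified, pane-style illustration and not the Cutty or Scotty algorithms themselves. The slice length (the greatest common divisor of width and slide), the integer timestamps and the function name are assumptions.

```python
from collections import defaultdict
from math import gcd
from typing import Dict, Iterable, Tuple


def sliced_sliding_sum(
    events: Iterable[Tuple[int, float]], width: int, slide: int
) -> Dict[int, float]:
    """Sum per sliding window [start, start + width), built from slice partials.

    Assumptions: integer timestamps >= 0 and windows starting at multiples of slide.
    """
    slice_len = gcd(width, slide)
    partial = defaultdict(float)  # slice start -> partial sum
    for timestamp, value in events:
        partial[(timestamp // slice_len) * slice_len] += value  # each event touched once
    if not partial:
        return {}
    last_slice = max(partial)
    results = {}
    start = 0
    while start <= last_slice:
        slice_starts = range(start, start + width, slice_len)
        if any(s in partial for s in slice_starts):  # skip windows without data
            results[start] = sum(partial.get(s, 0.0) for s in slice_starts)
        start += slide
    return results


# Hypothetical (timestamp, value) readings.
events = [(1, 10.0), (4, 20.0), (7, 30.0), (12, 50.0)]
sums = sliced_sliding_sum(events, width=10, slide=5)
```

Each event updates exactly one slice, so the per-event cost does not grow with the number of overlapping windows. The published techniques go further, for example by supporting other window types.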

@pheyvaer pheyvaer added the ongoing The challenge is actively being tackled. label Oct 25, 2022
@github-actions

Please provide a status update about this challenge. Every ongoing challenge needs at least one status update every 2 weeks. Thanks!

@argahsuknesib

Aggregation of the data streams generated from a single LDES in an LDP Solid pod works. Now working towards aggregation across multiple pods.

@github-actions

Please provide a status update about this challenge. Every ongoing challenge needs at least one status update every 2 weeks. Thanks!

@argahsuknesib

Currently defining an ontology and using websockets to publish the results from the aggregation.

@github-actions

Please provide a status update about this challenge. Every ongoing challenge needs at least one status update every 2 weeks. Thanks!

@argahsuknesib

A repository for the demo of the aggregator is available at https://github.com/argahsuknesib/ssa-demo/

@argahsuknesib argahsuknesib added completion: pending ❓ and removed ongoing The challenge is actively being tackled. labels Apr 25, 2023
@pheyvaer
Contributor

pheyvaer commented May 4, 2023

Not all acceptance criteria are met at the moment.

@pheyvaer pheyvaer added ongoing The challenge is actively being tackled. and removed completion: pending ❓ labels May 4, 2023
@github-actions

Please provide a status update about this challenge. Every ongoing challenge needs at least one status update every 2 weeks. Thanks!

@argahsuknesib

Testing of the aggregator system has not been done yet. The next step is testing and benchmarking, after which the challenge will be closed with the results of those benchmarks.

@github-actions

github-actions bot commented Jun 9, 2023

Please provide a status update about this challenge. Every ongoing challenge needs at least one status update every 2 weeks. Thanks!

@argahsuknesib

Currently preparing a test plan and a test environment for testing the aggregator.

@github-actions

Please provide a status update about this challenge. Every ongoing challenge needs at least one status update every 2 weeks. Thanks!

@argahsuknesib

Preparing the test setup and writing scripts for testing. The evaluation repository (which will be updated later with the finished results) is available at https://github.com/argahsuknesib/solid-stream-aggregator-evaluation

@github-actions

Please provide a status update about this challenge. Every ongoing challenge needs at least one status update every 2 weeks. Thanks!


5 participants