Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stored procedure #3

Open
wants to merge 19 commits into
base: streamline_api_endpoint
Choose a base branch
from
Open

Stored procedure #3

wants to merge 19 commits into from

Conversation

chr4
Copy link
Member

@chr4 chr4 commented Jun 11, 2018

Use a stored procedure to generate asset_stats on the fly from the effects table. I'm not sure whether this is actually good or bad for performance (less writes, but probably a bit more calculations).
My guess is, that CPU is way less of a bottleneck than IO, therefore I'd go with this version.

This pull-request also adds a change (which we probably should backport in case we decide against this) to not include transfers and account stats from the asset issuer account. I'd argue that the issuer should be ignored when counting payments and involved accounts of customers.

@chr4 chr4 requested a review from fabrik42 June 11, 2018 21:59
@chr4 chr4 force-pushed the stored_procedure branch 2 times, most recently from 76ca1e1 to 3da6af9 Compare June 11, 2018 23:06
@fabrik42
Copy link
Member

Hmmm I don't think this is really beneficial to performance. It basically rebuilds a query with psql, including some N+1 sub-queries.

I read the file name of the migration and my question would be: Why not use a real materialized view for this? Next to a real table, this is probably the most performant solution.

@chr4
Copy link
Member Author

chr4 commented Jun 12, 2018

I've considered materialized views initially, but the problem here is that we do not need view, but something that needs to loop over rows.
I'm not sure (and did no benchmark) whether a procedure is faster than a table (but it would at least safe some writes).

@chr4 chr4 closed this Jun 12, 2018
@chr4 chr4 changed the base branch from num_transactions to master June 13, 2018 08:18
@chr4 chr4 reopened this Jun 13, 2018
@chr4 chr4 changed the base branch from master to streamline_api_endpoint June 13, 2018 18:13
@chr4
Copy link
Member Author

chr4 commented Jun 13, 2018

I've rebased this on the streamline_api_endpoint branch. It cancels out some of the problems with WHERE clauses, but I'm not sure if there's a way to use a materialized view instead of the procedure because of the LOOP statement.

@chr4 chr4 added the question Further information is requested label Jun 13, 2018
@fabrik42
Copy link
Member

The problem with this solution is that it will always be a sequential scan on the effects table that will also cause 4 additional queries per row in the table. So this is not a scalable/performant solution.

I think the best solution in terms of performance will be to persist every point of the accumulation. And every time you add another asset_stat you can get the last one and add the values of it.

@chr4
Copy link
Member Author

chr4 commented Jun 16, 2018

The sub-selects are currently run on every INSERT as well (https://github.com/cndy-store/analytics/blob/master/models/asset_stat/asset_stat.go#L36-L44) - so instead of

INSERT INTO asset_stats(paging_token, asset_code, asset_issuer, asset_type, created_at, issued, transferred, accounts_with_trustline, accounts_with_payments, payments)
		                   VALUES ($1, $2, $3, $4, $5,
		                       (SELECT COALESCE(SUM(amount), 0) FROM effects WHERE type='account_debited' AND account=$6),
		                       (SELECT COALESCE(SUM(amount), 0) FROM effects WHERE type='account_debited' AND account!=$6),
		                       (SELECT COUNT(DISTINCT account) FROM effects WHERE type='trustline_created' AND account!=$6),
		                       (SELECT COUNT(DISTINCT account) FROM effects WHERE type='account_debited' AND account!=$6),
		                       (SELECT COUNT(*) FROM effects WHERE type='account_debited' AND account!=$6)
		                   )

It would be better to pass the new Effect into assetStat.New() and then just take the latest row and add the effect values? This wouldn't solve the DISCTINCT clauses though...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants