Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential issue with fetchlog stats #10

Open
caparker opened this issue Oct 24, 2023 · 2 comments
Open

Potential issue with fetchlog stats #10

caparker opened this issue Oct 24, 2023 · 2 comments
Assignees

Comments

@caparker
Copy link
Collaborator

While checking to make sure that the latest fetcher deployments were working I noticed that the japan file seemed to be ingested twice.

                                              key                                             |         init_datetime         |        loaded_datetime        | inserted | records | file_size |            batch_uuid            
---------------------------------------------------------------------------------------------+-------------------------------+-------------------------------+----------+---------+-----------+----------------------------------
 realtime-gzipped/2023-10-24/1698167736_japan_95d5d39e-0931-407e-88a8-3f7f504a340d.ndjson.gz | 2023-10-24 17:17:12.727395+00 | 2023-10-24 17:20:03.043072+00 |     5753 |    5753 |     77709 | f6f3e28adca244ecaf8b8943839e1de3
 realtime-gzipped/2023-10-24/1698167822_japan_f7267778-3caa-440d-b4ea-4a6abf69d098.ndjson.gz | 2023-10-24 17:18:37.579177+00 | 2023-10-24 17:20:03.043072+00 |     5753 |    5753 |     77709 | f6f3e28adca244ecaf8b8943839e1de3
 realtime-gzipped/2023-10-24/1698171336_japan_be93d259-8285-4e7d-a0c3-8fd0b75a29c2.ndjson.gz | 2023-10-24 18:17:15.362648+00 | 2023-10-24 18:20:03.73997+00  |     5730 |   11483 |    115496 | 1d6e4e9c3b9c45d4a5119eb4c58e61dd
 realtime-gzipped/2023-10-24/1698171421_japan_2197373d-5847-4fef-a973-afdd657357ae.ndjson.gz | 2023-10-24 18:18:42.537477+00 | 2023-10-24 18:20:03.73997+00  |     5730 |   11483 |    115496 | 1d6e4e9c3b9c45d4a5119eb4c58e61dd
(4 rows)

Notice that we have 2 batches of hourly files and they all seem to have records that could be inserted. I would have expected that one of the files would show that 0 records were inserted.

So its possible that there is an issue of how we are recording the fetchlog stats after a file is ingested.

@caparker caparker self-assigned this Oct 24, 2023
@caparker
Copy link
Collaborator Author

Here is the issue

WITH inserted AS (
  SELECT m.fetchlogs_id
  , COUNT(m.*) as n_records
  , COUNT(t.*) as n_inserted
  , MIN(m.datetime) as fr_datetime
  , MAX(m.datetime) as lr_datetime
  , MIN(t.datetime) as fi_datetime
  , MAX(t.datetime) as li_datetime
  FROM tempfetchdata m
  LEFT JOIN temp_inserted_measurements t ON (t.sensors_id = m.sensors_id AND t.datetime = m.datetime)
  GROUP BY m.fetchlogs_id)
UPDATE fetchlogs
SET completed_datetime = CURRENT_TIMESTAMP
, inserted = COALESCE(n_inserted, 0)
, records = COALESCE(n_records, 0)
, first_recorded_datetime = fr_datetime
, last_recorded_datetime = lr_datetime
, first_inserted_datetime = fi_datetime
, last_inserted_datetime = li_datetime
FROM inserted
WHERE inserted.fetchlogs_id = fetchlogs.fetchlogs_id;

We are joining off of sensors and timestamp and not accounting for the fetchlogs_id.

I am gonig to add the fetchlogs id to the join.

@caparker
Copy link
Collaborator Author

We cant add fetchlogs_id to the join because we are currently not adding that field to the temp_inserted_measurements. We will have to make that change before we can close this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant