Background: It’s a fresh new year, and citizens of Digitown are in an uproar. Their water and electricity bills have inexplicably doubled, despite no changes in their consumption.
Mission: work your magic with this data and query, to inspect the telemetry data responsible for billing, unravel any hidden errors, and set things right.
For all the marbles: What is the total bills amount due in April?
There’s something fishy with the billing. Let’s inspect the April records to get to the bottom of this.
.execute database script <|
// The script takes ~20seconds to complete ingesting all the data.
.set-or-replace Costs <|
datatable(MeterType:string, Unit:string, Cost:double) [
'Water', 'Liter', 0.001562,
'Electricity', 'kwH', 0.3016]
.create-merge table Consumption (Timestamp:datetime , HouseholdId:string, MeterType:string, Consumed:double)
//clear any previously ingested data if such exists
.clear table Consumption data
.ingest async into table Consumption (@'https://kustodetectiveagency.blob.core.windows.net/kda2c1taxbills/log_00000.csv.gz')
.ingest async into table Consumption (@'https://kustodetectiveagency.blob.core.windows.net/kda2c1taxbills/log_00001.csv.gz')
.ingest into table Consumption (@'https://kustodetectiveagency.blob.core.windows.net/kda2c1taxbills/log_00002.csv.gz')
We put an assumption that every day, each household receives a water and electricity bill. Let’s fish for anomalies from this
Consumption
| summarize count() by HouseholdId
| summarize count() by count_
| render barchart with (xtitle="Number of household", ytitle="Number of records")
Anything other than 60 seems to be an anomaly. Let’s inspect further.
First thing we notice, negative values
Consumption
| summarize arg_min(Consumed,*) by HouseholdId
| order by Consumed asc
These are only 3 houses though.
Let’s look for duplicate records(factoring in the negative values - let’s look for absolute duplicate records).
Consumption
| summarize dcount(abs(Consumed)) by HouseholdId
| join kind=inner(Consumption
| summarize count() by HouseholdId)
on HouseholdId
| where dcount_Consumed != count_
More than 9000 households!
For sanity check, let’s confirm that this is the diff.
That solves it!
Now let’s calculate the total bill.
Consumption
| distinct Timestamp, HouseholdId, abs(Consumed), MeterType
| summarize sum(Consumed) by MeterType
| join kind=inner(Costs) on MeterType
| extend Total= Cost * sum_Consumed
| summarize sum(Total)