Case 01 → Something’s leaking 👃🏾

Background: It’s a fresh new year, and citizens of Digitown are in an uproar. Their water and electricity bills have inexplicably doubled, despite no changes in their consumption.

Mission: work your magic with this data and query, to inspect the telemetry data responsible for billing, unravel any hidden errors, and set things right.

For all the marbles: What is the total bills amount due in April?


There’s something fishy with the billing. Let’s inspect the April records to get to the bottom of this.

.execute database script <|
// The script takes ~20seconds to complete ingesting all the data.
.set-or-replace Costs <| 
    datatable(MeterType:string, Unit:string, Cost:double) [
     'Water', 'Liter', 0.001562, 
     'Electricity', 'kwH', 0.3016]
.create-merge table Consumption (Timestamp:datetime , HouseholdId:string, MeterType:string, Consumed:double)
//clear any previously ingested data if such exists
.clear table Consumption data
.ingest async into table Consumption (@'https://kustodetectiveagency.blob.core.windows.net/kda2c1taxbills/log_00000.csv.gz')
.ingest async into table Consumption (@'https://kustodetectiveagency.blob.core.windows.net/kda2c1taxbills/log_00001.csv.gz')
.ingest into table Consumption (@'https://kustodetectiveagency.blob.core.windows.net/kda2c1taxbills/log_00002.csv.gz')

We put an assumption that every day, each household receives a water and electricity bill. Let’s fish for anomalies from this

Consumption 
| summarize count() by HouseholdId
| summarize count() by count_
| render barchart with (xtitle="Number of household", ytitle="Number of records")

Anything other than 60 seems to be an anomaly. Let’s inspect further.

First thing we notice, negative values

Consumption 
| summarize arg_min(Consumed,*) by HouseholdId
| order by Consumed asc

These are only 3 houses though.

Let’s look for duplicate records(factoring in the negative values - let’s look for absolute duplicate records).

Consumption 
| summarize dcount(abs(Consumed)) by HouseholdId
| join kind=inner(Consumption
    | summarize count() by HouseholdId)
    on HouseholdId
| where dcount_Consumed != count_

More than 9000 households!

For sanity check, let’s confirm that this is the diff.

That solves it!

Now let’s calculate the total bill.

Consumption
| distinct Timestamp, HouseholdId, abs(Consumed), MeterType
| summarize sum(Consumed) by MeterType
| join kind=inner(Costs) on MeterType
| extend Total= Cost * sum_Consumed
| summarize sum(Total)

Case 00 → Comme le Chef! 😎

Case 02 → Gone Phishing 🪱

comments powered by Disqus