Set expectations for your data, then implement data quality checks and data monitoring. Send alerts by email or Slack when anomalies are detected to build out your data observability.

Data monitoring

Write a custom query that performs a check on a dataset, e.g. simply a count of the number of records:

SELECT COUNT(id) AS my_count FROM some_table

Now add an app that fetches the result of this query and sends out an alert if the result is not what you expect:

dbconn = pq.dbconnect('dw_123')
# Fetch the rows to check (use your own warehouse, schema and table names)
data = dbconn.fetch('dw_123', 'schema_name', 'table_name')

# The first row holds the count under the column alias my_count
count = data[0]["my_count"]
if count < 10000:
    slack = pq.connect("Slack") # use your name of the connection
    slack.add("message", channel = "QA", text = "Data quality alert", username = "My bot")
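The threshold check can also be isolated in a small function, so it can be tried out on sample data before it is wired to a real alert. The following is a minimal sketch, not part of the platform: the function name `row_count_alert`, its parameters and the message wording are hypothetical, and it assumes the fetched result is a list of dicts with a `my_count` key.

```python
def row_count_alert(rows, minimum, column="my_count"):
    """Return an alert text if the check fails, or None if the data looks fine."""
    # An empty result is treated as a failure: the check could not be evaluated.
    if not rows:
        return "Data quality alert: the check query returned no rows"
    count = rows[0][column]
    if count < minimum:
        return f"Data quality alert: {count} records found, expected at least {minimum}"
    return None


# Hypothetical query results, shaped like the output of the COUNT query.
print(row_count_alert([{"my_count": 8500}], minimum=10000))
print(row_count_alert([{"my_count": 12000}], minimum=10000))
```

The returned text could then be passed as the `text` of the Slack message, so the alert states which expectation was violated.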

Finally, add a schedule to your monitoring app.

Of course, you can implement more advanced quality checks, for example checking for the presence of recent timestamps, performing joins, applying a regular expression to records, or using a WHERE clause to select outliers.
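As an illustration, two of these checks (recent timestamps and a regular expression) could be sketched in plain Python as shown here. This is only a sketch under assumptions: the sample rows, the column names `email` and `updated_at`, the helper function names and the simplified email pattern are all hypothetical, and the rows are assumed to be a list of dicts with ISO 8601 timestamps.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical rows, shaped like a list of dicts (one dict per record).
rows = [
    {"id": 1, "email": "anna@example.com", "updated_at": "2024-05-01T10:00:00+00:00"},
    {"id": 2, "email": "not-an-email", "updated_at": "2024-05-03T08:30:00+00:00"},
]

# Deliberately simple pattern: something@something.something, no whitespace.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def has_recent_timestamp(rows, column, max_age, now):
    """Return True if the newest timestamp in `column` is at most `max_age` old."""
    newest = max((datetime.fromisoformat(r[column]) for r in rows), default=None)
    return newest is not None and now - newest <= max_age


def rows_failing_pattern(rows, column, pattern):
    """Return the rows whose value in `column` does not match `pattern`."""
    return [r for r in rows if not pattern.match(str(r[column]))]


# A fixed "now" keeps the example reproducible; a real check would use
# datetime.now(timezone.utc) instead.
now = datetime(2024, 5, 3, 12, 0, tzinfo=timezone.utc)

problems = []
if not has_recent_timestamp(rows, "updated_at", timedelta(hours=24), now):
    problems.append("no records updated in the last 24 hours")
bad = rows_failing_pattern(rows, "email", EMAIL_PATTERN)
if bad:
    problems.append(f"{len(bad)} record(s) with an invalid email")

print(problems)
```

Each entry in `problems` describes one failed expectation, so the list can be joined into a single alert message. For large tables it is usually cheaper to express the same conditions in the SQL query itself and fetch only the offending rows.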

See the Slack connector documentation for more info.

Watch a 2-minute demo on how to implement the above script:

https://youtu.be/GbroYIxiRg4