When you have a lot of records to sync from a source to a destination, you want to avoid processing all these records on each run of your scheduled script. Instead you only want to process new & updated records.
You can process records incrementally by filtering on a timestamp “last update”. By storing the highest timestamp that was already processed, you implement a stateful script that will only process new and updated records on each run.
Here’s an example script using a timestamp stored in the state of the script:
**previous_timestamp = pq.get_state()**
if previous_timestamp=="":
previous_timestamp = "1970-01-01 00:00:00"
dbconn = pq.dbconnect('dw_123')
my_query = "SELECT * FROM orders WHERE timestamp_last_update > %s ORDER BY timestamp_last_update"
% previous_timestamp
source_records_to_process = dbconn.fetch('dw_123', query = my_query)
for source_record in source_records_to_process:
odoo.upsert("order", reference = source_record["code"], ...)
highest_timestamp_processed = source_record["timestamplast_update"])
**pq.set_state(highest_timestamp_processed)**
This pattern is more advanced than the pattern using a timestamp because when using a timestamp a record will only be processed once and if writeback fails for that specific record, it will not be picked up again in the future.
An alternative pattern is to compare the data from source & destination using a JOIN query in Peliqan first, and using the result of that JOIN query as input for the sync script to process records.
Example JOIN query to compare data from a source and a destination:
SELECT
source_order.id,
source_order.reference,
source_order.price,
source_order.item_count,
destination_order.id,
destination_order.code,
destination_order.total_price,
destination_order.number_of_items
FROM source_order
LEFT JOIN destination_order
ON source_order.reference = destination_order.code # fields to match on
WHERE
destination_order.id IS NULL OR # include records that do not exist in the destination
(source_order.price <> destination_order.total_price AND
source_order.item_count <> destination_order.number_of_items) # include only source records that are different from the destination
The result of this JOIN will only include records that need to be processed, these can now be picked up by a sync script.
A script is stateful when it keeps track of its state across runs.
# Get state from previous run
previous_state = pq.get_state()
# Save state for the next run
pq.set_state(current_state)
The state itself that you set (save) can be a string, number or a dict: