JasonBarnabe (Contributor) commented Mar 29, 2018

Given n rows processed in m batches, 3n statements are currently sent to the DB for updates: a BEGIN, an UPDATE, and a COMMIT for each row. If each batch's updates were wrapped in a single transaction, only n + 2m statements would be sent.

Timings on a local Postgres table with 40,000 rows, batch size 1000, anonymizing a single email field:

Before changes: 2m 52s
With transactions: 2m 26s (15% faster)
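
To make the statement counts concrete, here is a minimal sketch of the arithmetic for the benchmark above. The function names are hypothetical and only illustrate the 3n and n + 2m formulas; they are not part of the library.

```python
import math


def statements_per_row_transactions(n: int) -> int:
    """Each row is sent as BEGIN, UPDATE, COMMIT: 3 statements per row."""
    return 3 * n


def statements_per_batch_transactions(n: int, batch_size: int) -> int:
    """One BEGIN/COMMIT pair per batch plus one UPDATE per row: n + 2m."""
    m = math.ceil(n / batch_size)  # number of batches
    return n + 2 * m


n, batch_size = 40_000, 1_000
before = statements_per_row_transactions(n)
after = statements_per_batch_transactions(n, batch_size)
print(before, after)  # 120000 40080
```

For this table the statement count falls by roughly two thirds, while the measured run time fell by about 15%.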

I'm not sure if this would have undesired effects for others, so maybe this should be configurable?

coveralls commented Mar 29, 2018

Coverage decreased (-2.3%) to 91.541% when pulling d4f1c30 on kickbooster:transactions into db4f509 on sunitparekh:master.

JasonBarnabe (Contributor, Author) commented

Note that #57 avoids transactions altogether, which brings the number of statements down to n.
