At some point, we may come across the need to create multiple records in a single operation. This can of course be done with one insert command per record, but the more records we need to persist, the longer the whole operation will take.
Let’s set up a basic example to go over this topic. We will be creating a list of contacts in bulk using the `database/sql` package in Go.
First off, let’s create the database and table we will be using for this example; in this case, I will be using Postgres.
We have the structure for the contacts table, real simple. This is the entity we will be creating with a multi-insert statement.
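The original table definition isn’t reproduced here, but a minimal sketch might look like the following; the column names (`name`, `email`, `phone`) and the matching `Contact` struct are assumptions chosen for this exercise.

```go
package main

// Contact mirrors a row of the contacts table. The field and column
// names (name, email, phone) are assumptions for this sketch.
type Contact struct {
	Name  string
	Email string
	Phone string
}

// createContactsTable is the DDL we would run against Postgres to
// create the table used throughout this example.
const createContactsTable = `
CREATE TABLE IF NOT EXISTS contacts (
    id    SERIAL PRIMARY KEY,
    name  TEXT NOT NULL,
    email TEXT NOT NULL,
    phone TEXT NOT NULL
);`
```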
Contacts Dummy generator
Then, we have our dummy generator, which will allow us to simulate the list of contacts we would normally gather from a file.
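A minimal sketch of such a generator could look like this; the function name `generateContacts` and the dummy values are my own assumptions.

```go
package main

import "fmt"

// Contact mirrors a row of the contacts table (repeated here so the
// snippet is self-contained).
type Contact struct {
	Name  string
	Email string
	Phone string
}

// generateContacts fabricates n dummy contacts, simulating the list
// we would normally gather from a file.
func generateContacts(n int) []Contact {
	contacts := make([]Contact, 0, n)
	for i := 0; i < n; i++ {
		contacts = append(contacts, Contact{
			Name:  fmt.Sprintf("Contact %d", i),
			Email: fmt.Sprintf("contact%d@example.com", i),
			Phone: fmt.Sprintf("555-%04d", i),
		})
	}
	return contacts
}
```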
Now, in our main file, we will assemble a connection to the database we just created, like so:
Since we are using Postgres for this exercise, we must import the `github.com/lib/pq` library so the Postgres driver is available when calling `sql.Open`.
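A connection setup along those lines might look like the sketch below; the DSN values (`localhost`, `contacts_db`, and so on) are placeholders to adjust for your environment.

```go
package main

import (
	"database/sql"
	"fmt"

	_ "github.com/lib/pq" // blank import registers the "postgres" driver
)

// openDB opens a *sql.DB using placeholder credentials; adjust the
// DSN values for your environment.
func openDB() (*sql.DB, error) {
	dsn := fmt.Sprintf(
		"host=%s port=%d user=%s password=%s dbname=%s sslmode=disable",
		"localhost", 5432, "postgres", "postgres", "contacts_db",
	)
	db, err := sql.Open("postgres", dsn)
	if err != nil {
		return nil, err
	}
	// sql.Open does not actually connect; Ping verifies the database
	// is reachable before we start inserting.
	if err := db.Ping(); err != nil {
		return nil, err
	}
	return db, nil
}
```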
Now that we have our database connection, let’s move onto the main function.
This very simple program sets up a database connection, calls the dummy generator to get as many contacts as the `recordsToInsert` variable demands, and finally creates the contacts it receives from the generator in bulk. It also includes a few additional statements to give us an idea of how long it takes to persist a specific amount of records.
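A sketch of such a main function, assuming hypothetical helpers `openDB`, `generateContacts`, and `bulkInsertContacts` (these names are my own, not necessarily the post's):

```go
package main

import (
	"log"
	"time"
)

const recordsToInsert = 20000

func main() {
	// openDB is a hypothetical helper that opens the Postgres connection.
	db, err := openDB()
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// generateContacts is the hypothetical dummy generator.
	contacts := generateContacts(recordsToInsert)

	// Time the bulk insert so we can see how long persistence takes.
	start := time.Now()
	if err := bulkInsertContacts(db, contacts); err != nil {
		log.Fatal(err)
	}
	log.Printf("inserted %d records in %s", recordsToInsert, time.Since(start))
}
```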
Creating Contacts in Bulk
Here’s the plan. First, we need to capture the values and placeholders for our query; the number of placeholders will depend on the number of columns declared in the insert statement. Then we start a transaction from our connection and use it to invoke the insert statement, adding as many contacts as we were given. Let’s see how it looks.
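A sketch of that step, assuming the three-column `contacts` table from earlier; `buildContactsQuery` and `bulkInsertContacts` are names I’ve made up for this illustration.

```go
package main

import (
	"database/sql"
	"fmt"
	"strings"
)

// Contact mirrors a row of the contacts table.
type Contact struct {
	Name, Email, Phone string
}

// buildContactsQuery assembles an INSERT with one placeholder group
// per contact: ($1, $2, $3), ($4, $5, $6), ...
func buildContactsQuery(n int) string {
	groups := make([]string, 0, n)
	for i := 0; i < n; i++ {
		base := i * 3 // three columns per record
		groups = append(groups, fmt.Sprintf("($%d, $%d, $%d)", base+1, base+2, base+3))
	}
	return "INSERT INTO contacts (name, email, phone) VALUES " + strings.Join(groups, ", ")
}

// bulkInsertContacts inserts every contact in a single multi-row
// statement, wrapped in a transaction.
func bulkInsertContacts(db *sql.DB, contacts []Contact) error {
	values := make([]interface{}, 0, len(contacts)*3)
	for _, c := range contacts {
		values = append(values, c.Name, c.Email, c.Phone)
	}
	txn, err := db.Begin()
	if err != nil {
		return err
	}
	if _, err := txn.Exec(buildContactsQuery(len(contacts)), values...); err != nil {
		txn.Rollback()
		return err
	}
	return txn.Commit()
}
```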
Something worth pointing out is that calling `txn.Exec()` with arguments causes the library to prepare the statement prior to its execution: the SQL is validated once, then executed, and finally the statement is closed. In general this makes the statement faster, but there is a constraint we need to keep in mind: the number of parameters allowed in a prepared statement.
This works fine, but we will only be able to upload 21,845 records within a single insert statement without surpassing the database constraint of 65,535 parameters per prepared statement, since each record consumes one parameter per column and our insert statement has three columns. So, if we increase the `recordsToInsert` variable anywhere over that number, we will get an error like the following one:

pq: got 65553 parameters but PostgreSQL only supports 65535 parameters
Now, let’s split our contacts into groups, or batches, to circumvent this constraint.
First, we need to define how big our groups can be. For that, we divide the maximum number of parameters the database allows in a prepared statement by the number of columns in our table. In our case, 65,535 is our constraint.
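In code, that division is just the following, assuming the three-column table from earlier:

```go
package main

// maxParams is Postgres's limit on parameters per prepared statement.
const maxParams = 65535

// columnsPerContact is the number of columns in our insert statement.
const columnsPerContact = 3

// maxRecordsPerInsert is how many contacts fit in one statement:
// 65535 / 3 = 21845.
const maxRecordsPerInsert = maxParams / columnsPerContact
```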
Then, we just need to tweak our `contact.go` file a little bit, like so:
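A sketch of the batched version; `chunkContacts`, `buildGroupQuery`, and `bulkInsertInBatches` are hypothetical names I’ve chosen for this illustration.

```go
package main

import (
	"database/sql"
	"fmt"
	"strings"
)

// Contact mirrors a row of the contacts table.
type Contact struct {
	Name, Email, Phone string
}

// chunkContacts splits contacts into groups of at most size elements.
func chunkContacts(contacts []Contact, size int) [][]Contact {
	var groups [][]Contact
	for i := 0; i < len(contacts); i += size {
		end := i + size
		if end > len(contacts) {
			end = len(contacts)
		}
		groups = append(groups, contacts[i:end])
	}
	return groups
}

// buildGroupQuery gathers the placeholders and parameter values for
// one group of contacts.
func buildGroupQuery(group []Contact) (string, []interface{}) {
	placeholders := make([]string, 0, len(group))
	values := make([]interface{}, 0, len(group)*3)
	for i, c := range group {
		base := i * 3
		placeholders = append(placeholders, fmt.Sprintf("($%d, $%d, $%d)", base+1, base+2, base+3))
		values = append(values, c.Name, c.Email, c.Phone)
	}
	return "INSERT INTO contacts (name, email, phone) VALUES " + strings.Join(placeholders, ", "), values
}

// bulkInsertInBatches runs one transaction (and one multi-row insert)
// per group, keeping each statement under the 65,535-parameter limit.
func bulkInsertInBatches(db *sql.DB, contacts []Contact) error {
	for _, group := range chunkContacts(contacts, 65535/3) {
		query, values := buildGroupQuery(group)
		txn, err := db.Begin()
		if err != nil {
			return err
		}
		if _, err := txn.Exec(query, values...); err != nil {
			txn.Rollback()
			return err
		}
		if err := txn.Commit(); err != nil {
			return err
		}
	}
	return nil
}
```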
We basically just added a function to split our contacts into groups, so we can loop over each one, gather placeholders and parameters as we were doing before, and finally prepare a transaction per group that we commit after executing our bulk insert. That should do the trick.
Thank you for reading through this post; I hope you find this little exercise helpful. The code for this exercise is on GitHub, so feel free to swing by and take a look at it there.
If you wish to learn more about prepared statements with this package, I encourage you to read the following post about Using Prepared Statements.