[
  {
    "category_id": 23,
    "subcategory_id": 1,
    "post_id": 445,
    "value": "A test value"
  },
  {...}
]
If a record already exists with a matching category_id, subcategory_id, and post_id, I need to update its value field with the value provided in the request. Otherwise, I need to create a new record with this data. If any of those three fields did not match, then it would be considered a separate record (such as the case where subcategory_id=2). Checking whether a record exists looks like this:

Demo.objects.filter(category_id=23, subcategory_id=1, post_id=445).exists()
The in_bulk method doesn't provide a way to filter by several field names like we would need in this case. It's convenient when we have a single list of values, but falls short in more complex cases. Instead, I had to resort to making linear checks through filtering to see if each record existed:

records = [
    {
        "id": Demo.objects.filter(
            category_id=record.get("category_id"),
            subcategory_id=record.get("subcategory_id"),
            post_id=record.get("post_id"),
        )
        .first()
        .id
        if Demo.objects.filter(
            category_id=record.get("category_id"),
            subcategory_id=record.get("subcategory_id"),
            post_id=record.get("post_id"),
        ).first()
        is not None
        else None,
        **record,
    }
    for record in records
]
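To see what this annotation step produces, here's a plain-Python sketch with the database swapped for an in-memory list of rows (the data and the first_match helper are hypothetical, not from the original post):

```python
# In-memory stand-in for the Demo table (hypothetical data)
existing = [
    {"id": 7, "category_id": 23, "subcategory_id": 1, "post_id": 445},
]

def first_match(record):
    # Mimics Demo.objects.filter(...).first() against the in-memory rows
    for row in existing:
        if all(
            row[field] == record.get(field)
            for field in ("category_id", "subcategory_id", "post_id")
        ):
            return row
    return None

records = [
    {"category_id": 23, "subcategory_id": 1, "post_id": 445, "value": "A test value"},
    {"category_id": 23, "subcategory_id": 2, "post_id": 445, "value": "Another value"},
]

# Same shape as the ORM version: attach the existing pk, or None
records = [
    {"id": match["id"] if (match := first_match(record)) else None, **record}
    for record in records
]
```

The first record picks up the existing primary key (7), while the second, with subcategory_id=2, gets None and will be treated as a new record.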
Each dictionary now has an id key which will hold the primary key of the record that exists given the values, or a value of None if the record can't be found in the database. We're using the filter statement from above to make this comparison, and then unpacking the data from the request alongside the id key.

Now there is an id key for every record. Django doesn't offer a bulk upsert or update_or_create method, so we have to run bulk_create and bulk_update separately. We'll do this by splitting our new records list into two separate lists which hold the records to update and the records to create. If a record's id field is None, then we append that record to the list for creation. Otherwise, that record is appended to the list for updates. Finally, we remove the id field from the records that need to be created, since it only holds a value of None.

records_to_update = []
records_to_create = []
[
    records_to_update.append(record)
    if record["id"] is not None
    else records_to_create.append(record)
    for record in records
]
[record.pop("id") for record in records_to_create]
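Since list comprehensions are meant to build lists rather than run side effects, the same split can also be written as a plain loop. A minimal, self-contained sketch (the sample records are hypothetical):

```python
records = [
    {"id": 7, "value": "updated value"},
    {"id": None, "value": "brand new value"},
]

records_to_update = []
records_to_create = []
for record in records:
    if record["id"] is not None:
        records_to_update.append(record)
    else:
        records_to_create.append(record)

# The 'id' key is always None on new records, so drop it before creation
for record in records_to_create:
    record.pop("id")
```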
Now we can run the bulk_create and bulk_update methods. By default, Django will attempt to write a single statement for all of the records that you provide it. To limit the length of that statement, we want to specify the batch_size parameter to instruct the method to handle these bulk inserts at a rate of 1000 records per statement.

created_records = Demo.objects.bulk_create(
    [Demo(**values) for values in records_to_create], batch_size=1000
)
# Before Django 4.0, bulk_update doesn't return a value
Demo.objects.bulk_update(
    [
        Demo(id=values.get("id"), value=values.get("value"))
        for values in records_to_update
    ],
    ["value"],
    batch_size=1000,
)
For bulk_update, Django expects you to specify the list of fields to update as the second argument. Because we are only updating one field in this example, I explicitly set these fields within the model instances, whereas we unpack the full dictionary within the bulk_create statement.

Putting it all together, the full view looks like this:

from rest_framework import generics, status
from rest_framework.response import Response
from api.models import Demo
from api.serializers import DemoSerializer
class DemoViewset(generics.ListCreateAPIView):
    def post(self, request):
        records = request.data.get("data")

        # Let's define two lists:
        # - one to hold the values that we want to insert,
        # - and one to hold the new values alongside existing primary keys to update
        records_to_create = []
        records_to_update = []

        # This is where we check if the records are pre-existing,
        # and add primary keys to the objects if they do
        records = [
            {
                "id": Demo.objects.filter(
                    category_id=record.get("category_id"),
                    subcategory_id=record.get("subcategory_id"),
                    post_id=record.get("post_id"),
                )
                .first()
                .id
                if Demo.objects.filter(
                    category_id=record.get("category_id"),
                    subcategory_id=record.get("subcategory_id"),
                    post_id=record.get("post_id"),
                ).first()
                is not None
                else None,
                **record,
            }
            for record in records
        ]

        # This is where we delegate our records to our split lists:
        # - if the record already exists in the DB (the 'id' primary key), add it to the update list.
        # - Otherwise, add it to the create list.
        [
            records_to_update.append(record)
            if record["id"] is not None
            else records_to_create.append(record)
            for record in records
        ]

        # Remove the 'id' field, as these will all hold a value of None,
        # since these records do not already exist in the DB
        [record.pop("id") for record in records_to_create]

        created_records = Demo.objects.bulk_create(
            [Demo(**values) for values in records_to_create], batch_size=1000
        )
        # Before Django 4.0, bulk_update doesn't return a value
        Demo.objects.bulk_update(
            [
                Demo(id=values.get("id"), value=values.get("value"))
                for values in records_to_update
            ],
            ["value"],
            batch_size=1000,
        )

        # We may want to return different statuses and content
        # based on the type of operations we ended up doing.
        message = None
        # Fall back to 204 if the payload was empty and nothing was written
        http_status = status.HTTP_204_NO_CONTENT
        if len(records_to_update) > 0 and len(records_to_create) > 0:
            http_status = status.HTTP_200_OK
        elif len(records_to_update) > 0 and len(records_to_create) == 0:
            http_status = status.HTTP_204_NO_CONTENT
        elif len(records_to_update) == 0 and len(records_to_create) > 0:
            http_status = status.HTTP_201_CREATED
            message = DemoSerializer(created_records, many=True).data

        return Response(message, status=http_status)
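The status branching at the end can also be pulled out into a small helper, which makes the edge case of an empty payload explicit (this helper is my addition, not part of the original view):

```python
# Plain integers stand in for the rest_framework.status constants
HTTP_200_OK, HTTP_201_CREATED, HTTP_204_NO_CONTENT = 200, 201, 204

def choose_status(num_updated, num_created):
    # Mixed batch: some records were updated, some created
    if num_updated and num_created:
        return HTTP_200_OK
    # Creates only: the caller should serialize and return the new rows
    if num_created:
        return HTTP_201_CREATED
    # Updates only, or an empty payload: nothing new to return
    return HTTP_204_NO_CONTENT
```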
One thing that bothered me about this approach is that it runs a SELECT statement against my model's table for every record to check if it exists by filtering. It felt naive and inefficient. After a bit of reading, however, I found that there wasn't a good solution for this, or at least one that was widely agreed upon. The alternatives came with their own trade-offs (such as still issuing SELECT statements over a large list of models), and may not be feasible if you only have to perform these operations in one view.

I had also tried something like Demo.objects.bulk_update(Demo(**values)), where I would unpack the dict into the model and expect Django to intelligently handle conflicts. As it turns out, you can't do this, as the statement does not know a primary key. Trying to do this results in an exception:

ValueError: All bulk_update() objects must have a primary key set.
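One way to cut the per-record SELECTs down to a single query (my own sketch, not from the original post) would be to fetch the key tuples once, e.g. with Demo.objects.values_list("category_id", "subcategory_id", "post_id", "id"), and resolve primary keys from a dict. In plain Python, with the query result stubbed out:

```python
# Tuples as values_list(...) would return them (hypothetical data)
existing_rows = [
    (23, 1, 445, 7),
    (23, 2, 445, 8),
]
pk_by_key = {(cat, sub, post): pk for cat, sub, post, pk in existing_rows}

incoming = [
    {"category_id": 23, "subcategory_id": 1, "post_id": 445, "value": "x"},
    {"category_id": 24, "subcategory_id": 1, "post_id": 445, "value": "y"},
]

# One dict lookup per record instead of one SELECT per record
annotated = [
    {
        "id": pk_by_key.get(
            (record["category_id"], record["subcategory_id"], record["post_id"])
        ),
        **record,
    }
    for record in incoming
]
```

The trade-off is holding every existing key tuple in memory, which is why this may not be worthwhile for a single view.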
Using bulk_create and bulk_update is ideal so that we can use a single statement against the database. As I mentioned earlier, Django also allows us to use the batch_size parameter to ensure the statement doesn't become overly long, which could otherwise lead to too much CPU and memory usage when executing statements. Our goal was to minimize the number of API requests being made between the two services and reduce the load on the database, so bulk operations were the perfect candidate.
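Conceptually, batch_size just slices the object list into fixed-size chunks and issues one statement per chunk. A rough illustration of the arithmetic (chunks is a hypothetical helper, not a Django API):

```python
def chunks(items, size):
    # Yield successive fixed-size slices of a list
    for start in range(0, len(items), size):
        yield items[start:start + size]

# With batch_size=1000, 2500 objects would be written in three statements
statement_sizes = [len(chunk) for chunk in chunks(list(range(2500)), 1000)]
```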