28
loading...
This website collects cookies to deliver better user experience
def merge_sorted_lists(l1, l2):
sorted_list = []
while (l1 and l2):
if (l1[0] <= l2[0]): # Compare both heads
item = l1.pop(0) # Pop from the head
sorted_list.append(item)
else:
item = l2.pop(0)
sorted_list.append(item)
# Add the remaining of the lists
sorted_list.extend(l1 if l1 else l2)
return sorted_list
heapq.merge
that does this. It takes advantage of the fact that our lists are already sorted, so we can get a new sorted list linear time rather than the n*log(n)
time it would take for combining and sorting two unsorted lists.Long story short, unless len(l1 + l2)
>= 1,000,000 use sort
list.sort
is the original implementation of a hybrid sorting algorithm called TimSort, named after its author, Tim Peters.[Here is] stable, natural merge sort, modestly called
Timsort (hey, I earned it ). It has supernatural performance on many
kinds of partially ordered arrays (less than lg(N!) comparisons needed, and
as few as N-1), yet as fast as Python's previous highly tuned sample sort
hybrid on random arrays.
The main routine marches over the array once, left to right,
alternately identifying the next run, then merging it into the previous
runs "intelligently". Everything else is complication for speed, and some
hard-won measure of memory efficiency.
(x + y).sort()
can be surprisingly fast: once it finds the sequential runs of numbers, it functions like our merge algorithm: combining the two sorted lists in linear time.heapq.merge
knows where the runs are ahead of time. Timsort overcomes this disadvantage by being written in C rather than Python. Or as ShawdowRanger on Stack Overflow explains it:CPython's list.sort
is implemented in C (avoiding interpreter overhead), while heapq.merge
is mostly implemented in Python, and optimizes for the "many iterables" case in a way that slows the "two iterables" case.
//New List
PyObject* mergedList = PyList_New( n1 + n2 );
for( i = 0;; ) {
elem1 = PyList_GetItem( listObj1, i1 );
elem2 = PyList_GetItem( listObj2, i2 );
result = PyObject_RichCompareBool(v, w, Py_LT);
switch( result ) {
// List1 has smallest, Pop from list 1
case 1:
PyList_SetItem( mergedList, i++, elem1 );
i1++;
break;
case 0:
// List2 has smallest, Pop from list 2
PyList_SetItem( mergedList, i++, elem2 );
i2++;
break;
}
if( i2 >= n2 || i1 >= n1 )) {
//One list is empty, add remainder of other list to result
...
break;
}
}
return mergedList;
import merge
and use my new merge method:import merge
# create some sorted lists
a = list(range(-100, 1700))
b = list(range(1400, 1800))
# merge them
merge.merge(a, b)
import merge
import timeit
a = list(range(-100, 1700))
b = [0.1] + list(range(1400, 1800))
def merge_test():
m1 = merge.merge(a, b)
def sort_test():
m2 = a + b
m2.sort()
sort_time = timeit.timeit("sort_test()", setup="from __main__ import sort_test", number=100000)
merge_time = timeit.timeit("merge_test()", setup="from __main__ import merge_test",number=100000)
print(f'timsort took {sort_time} seconds')
print(f'merge took {merge_time} seconds')
timsort took 3.9523325259999997 seconds
merge took 3.0547665259999994 seconds
sort
is beating us for small lists and even on big lists our performance improvement is thin at best://Default comparison
PyObject* merge( PyObject*, PyObject* );
//Compare assuming ints
PyObject* merge_int( PyObject*, PyObject* );
//Compare assuming floats
PyObject* merge_float( PyObject*, PyObject* );
//Compare assuming latin
PyObject* merge_latin( PyObject*, PyObject* )
merge
beats Timsort for heterogeneous lists, and the specialized versions are there for when you have uniform types in your list, and you need to go fast.Practically, you might not want to use pop, but just track an index of where the head of the stack should be, like the C code shown later. ↩
It was easier because my teammate Alex has experience writing C extensions for Python, so by the time I had found the Python header files, Alex had already put together a prototype solution. ↩