Python Data Structures —List Comprehensions, Generator Expressions, Named Tuples, Arrays, Memory Views, Deque, Ordering with bisect — Part-1

Ibrahim Halil Koyuncu
8 min readAug 6, 2023

--

Hi, in this article which is related with Python — Data Structures, I am going to explain below subjects.

  • List Comprehensions and Readability
  • Generator Expressions vs List Comprehensions
  • Tuples as Records and Named Tuples
  • Arrays
  • Memory Views
  • Deque
  • Managing Ordered Sequences With bisect

List Comprehensions and Readability

A quick way to build a list is using a list comprehension. It provides a compact syntax to generate lists by applying an expression to each item in an iterable (like a list, tuple, or string) and optionally filter the elements based on a condition. Lets go with example in below.

squares_of_evens = [x**2 for x in range(10) if x % 2 == 0]
print(squares_of_evens) # Output: [0, 4, 16, 36, 64]

In this example, used a list comprehension to create a list of squares for even numbers from 0 to 9.

Generator Expressions

Generator expressions are a memory-efficient and lazy way to create iterators in Python. They are similar to list comprehensions but use parentheses “()" instead of square brackets “[]". The main difference is that generator expressions produce values on-the-fly as you iterate over them, rather than creating a complete list in memory all at once(Under the hood it uses yield). Lets deep dive into it.

# Using a generator expression to calculate the squares of even numbers from 0 to 9
squares_of_evens_gen = (x**2 for x in range(10) if x % 2 == 0)

# Iterating over the generator to get the values one at a time
for square in squares_of_evens_gen:
print(square) # Output: 0, 4, 16, 36, 64

When we run this example in debug mode, you should be aware of that type of “squares_of_evens_gen” variable is generator and is not list so actually we did not create a list. Take a look at below screenshot.

Debugging Generator Example

Tuple as Records and Named Tuples

Some definitions introduce tuples as “immutable list” only. Tuples do double duty: they can be used as immutable lists and also as records with no field names.

If you think of a tuple just as an immutable list, the quantity and the order of the items may not be important, but when using a tuple as a collection of fields, the number of items is often fixed and their order is always vital.

#Data about Darmstadt: name, year, population, population change and area 
city, year, pop, chg, area = ('Darmstadt', 2003, 32450, 0.66, 8014)

Named Tuples

Named tuples are a subclass of tuples that have named fields, making them more self-descriptive and convenient to work with. They combine the benefits of tuples (immutable, memory-efficient) with the advantages of named fields, providing a clear and readable structure for representing simple data structures. Lets go with an example.

import collections

# Creating a named tuple type to represent a Point with x and y coordinates
Point = collections.namedtuple('Point', ['x', 'y'])

# Creating an instance of the named tuple
p1 = Point(x=1, y=2)

# Accessing attributes using dot notation
print(f"x: {p1.x}, y: {p1.y}")
# Output: x: 1, y: 2

# Accessing attributes using index-based access
print(f"x: {p1[0]}, y: {p1[1]}")
# Output: x: 1, y: 2

# Get the field names using _fields
field_names = Point._fields
print(field_names)
# Output: ('x', 'y')

# Using _make() to create a new instance
data = [1, 2]
p1 = Point._make(data)

# Accessing attributes using dot notation
print(f"x: {p1.x}, y: {p1.y}")
# Output: x: 1, y: 2

In this example, we define a named tuple type called Point with two fields: x and y. We create an instance of the named tuple with specific values for x and y, and we can access the attributes using both dot notation and index-based access. The named tuple provides a more expressive and readable representation of the data, enhancing code clarity and maintainability. You can also use “_fields” parameter to access field names in Tuple and use “_make” method to instantiate a named tuple from iterable.

Arrays

The “array.array” module provides an alternative to the built-in list type, allowing you to create arrays that are more memory-efficient and optimized for handling numeric data.

The “array.array” class is designed to work with homogeneous data, meaning all elements in the array must be of the same type such as integers, floats and supports all mutable sequence operations(including pop, insert and extend) and additional methods for fast loading and saving data such as “.frombytes” and “.tofile”. Lets take a look to example in below.

from array import array
from random import random
import timeit

# Function to measure the write operation performance
def measure_write_performance():
floats = array('d', (random() for i in range(10 ** 7)))
start_time = timeit.default_timer()

with open('floats.bin', 'wb') as fp:
floats.tofile(fp)

end_time = timeit.default_timer()
write_time = end_time - start_time
print(f"Write Operation Time: {write_time:.6f} seconds")

# Function to measure the read operation performance
def measure_read_performance():
floats_2 = array('d')
start_time = timeit.default_timer()

with open('floats.bin', 'rb') as fp:
floats_2.fromfile(fp, 10 ** 7)

end_time = timeit.default_timer()
read_time = end_time - start_time
print(f"Read Operation Time: {read_time:.6f} seconds")

# Measure the performance of write operation
measure_write_performance()

# Measure the performance of read operation
measure_read_performance()

As you can see, array.tofile and array.fromfile are easy to use. If you try the example, you will notice they are also very fast.

Performance Output

Memory Views

Memory views provide a way to access the memory of an object directly without copying its data. They allow you to interact with objects, such as arrays or buffers, at a lower level, offering potential performance improvements and memory efficiency.

Benefits of using memory views in Python:

  1. Efficient memory access: Memory views allow you to access the underlying memory of an object directly, which can lead to improved performance since it avoids unnecessary data copying.
  2. Avoids memory duplication: When working with large datasets, memory views help prevent duplicating data, which is common with slices or array operations in some cases.
  3. Lower memory consumption: Memory views reduce memory consumption since they do not create additional copies of the data.
  4. Faster data manipulation: Memory views enable fast and efficient manipulation of data at a low level, which can be advantageous for numerical computations and data processing.

Lets continue with an example.

# Create a bytearray object
data = bytearray(b'Hello, world!')

# Create a memory view of the bytearray
view = memoryview(data)

# Modify the data using the memory view
view[0] = ord('J')
view[7:13] = b'Python'

# The original bytearray is modified through the memory view
print(data) # Output: bytearray(b'Jello, Python!')

In this example, we create a bytearray object named data. We then create a memory view named view of the data bytearray. Using the memory view, we can directly modify the underlying memory, which results in changing the original bytearray. Memory views offer a convenient and efficient way to work with data, especially when dealing with large datasets or when performance optimization is required.

Deque

collections.deque is a specialized data structure provided by the collections module. It stands for "double-ended queue" and is optimized for efficient insertions and deletions from both ends. It combines the features of a stack (Last-In-First-Out) and a queue (First-In-First-Out) in a single data structure.

  1. Efficient operations: deque allows fast insertions and deletions from both ends of the queue, providing constant-time complexity O(1) for these operations. This makes it suitable for scenarios that require efficient enqueue and dequeue operations.
  2. Constant-time lookup: Accessing elements from either end of the deque is also very fast, providing O(1) time complexity for indexing.
  3. Memory efficiency: deque is implemented as a doubly-linked list, which allows efficient memory management and usage. It can grow or shrink dynamically based on the number of elements without requiring continuous memory allocation.
  4. Thread-safe: Unlike Python lists, deque supports thread-safe operations, making it a better choice for multithreaded applications without needing external synchronization.
  5. Versatility: deque can be used as a queue, a stack, or a double-ended queue, depending on how you use its methods. This flexibility makes it a versatile data structure for various use cases.
from collections import deque

# Creating a deque and adding elements
my_deque = deque()
my_deque.append(1) # Add element to the right end
my_deque.appendleft(2) # Add element to the left end
my_deque.extend([3, 4]) # Add multiple elements to the right end
my_deque.extendleft([5, 6]) # Add multiple elements to the left end

print(my_deque) # Output: deque([6, 5, 2, 1, 3, 4])

# Removing elements
my_deque.pop() # Remove and return an element from the right end
my_deque.popleft() # Remove and return an element from the left end

print(my_deque) # Output: deque([5, 2, 1, 3])

# Accessing elements
print(my_deque[0]) # Output: 5
print(my_deque[-1]) # Output: 3

In this example, we create a deque and demonstrate its features by adding elements, removing elements, and accessing elements from both ends.

Managing Ordered Sequences with bisect

The bisect module provides a binary search algorithm, which allows for efficient insertion and lookup in sorted sequences. It is especially useful for maintaining sorted data and performing fast searches on that data. The module includes two main functions: bisect_left() and bisect_right and along with other utility functions like insort_left() and insort_right().

  • bisect_left(data, item): This function returns the insertion point for the element item in the sorted list data such that all elements to the left of the insertion point are less than item. If item is already present in the list, it returns the leftmost insertion point.
  • bisect_right(data,item): This function is similar to bisect_left, but it returns the rightmost insertion point.

Benefits of using bisect in Python:

  1. Efficient binary search: bisect module provides an efficient implementation of the binary search algorithm, which allows for faster searching in sorted sequences compared to linear search.
  2. Insertion in a sorted list: You can use bisect to efficiently insert elements into a sorted list while maintaining its sorted order.
  3. Finding insertion points: bisect can be used to determine where an element should be inserted into a sorted list to keep it sorted.
import bisect

data = [1, 3, 5, 7, 9]

# Using bisect to find the insertion point of a value in a sorted list
insertion_point = bisect.bisect_left(data, 6)
print("Insertion point for 6:", insertion_point) # Output: 3

# Using bisect to insert a value into a sorted list while maintaining the order
bisect.insort_left(data, 6)
print("Updated list:", data) # Output: [1, 3, 5, 6, 7, 9]

In this example, we use bisect_left to find the insertion point of the value 6 in the sorted list data. We then use insort_left to insert 6 into the list while keeping the list sorted.

--

--

Ibrahim Halil Koyuncu
Ibrahim Halil Koyuncu

No responses yet