Advanced Generator Pattern: Consuming and Testing Data Streams
Difficulty Level: Advanced
Introduction
Expanding on our previous discussions of the Generator pattern, we’ll explore two advanced applications: consuming large datasets lazily and simulating data streams for testing. These techniques are crucial for efficient data processing and robust application testing.
When to Use
- Processing large datasets that don’t fit in memory
- Simulating data sources for testing
- Implementing ETL (Extract, Transform, Load) processes
- Creating reproducible test scenarios for data processing pipelines
Why to Use
- Memory Efficiency: Process large datasets without loading everything into memory
- Testability: Create controlled environments for testing data processing logic
- Flexibility: Easily switch between real and simulated data sources
- Reproducibility: Generate consistent test cases for data processing scenarios
How it Works
- Create generator functions that yield data items one at a time
- Use channels to stream data from the source to the consumer
- Implement lazy loading for large datasets
- Create mock data generators for testing scenarios
Example 1: Lazy Loading of Large Datasets
```go
package main

import (
	"fmt"
	"time"
)

type DataItem struct {
	ID   int
	Data string
}

// lazyDataLoader simulates loading a large dataset lazily
func lazyDataLoader(filePath string) <-chan DataItem {
	out := make(chan DataItem)
	go func() {
		defer close(out)
		// Simulate opening a large file
		fmt.Printf("Opening file: %s\n", filePath)

		// Simulate reading the file line by line
		for i := 0; i < 1000000; i++ {
			// Simulate processing delay for each item
			time.Sleep(1 * time.Millisecond)
			out <- DataItem{
				ID:   i + 1,
				Data: fmt.Sprintf("Data from line %d", i+1),
			}
			if i%100000 == 0 {
				fmt.Printf("Processed %d items\n", i)
			}
		}
	}()
	return out
}

func processData(data <-chan DataItem) {
	for item := range data {
		// Simulate data processing
		processedData := fmt.Sprintf("Processed: %s (ID: %d)", item.Data, item.ID)
		fmt.Println(processedData)
	}
}

func main() {
	dataStream := lazyDataLoader("large_dataset.txt")
	processData(dataStream)
}
```

This example demonstrates lazy loading of a large dataset, processing items one at a time without loading the entire dataset into memory.
Example 2: Simulating Data Streams for Testing
```go
package main

import (
	"fmt"
	"time"
)

type DataItem struct {
	ID   int
	Data string
}

// mockDataStream simulates a data source (e.g., a file, queue, or network stream)
func mockDataStream(count int) <-chan DataItem {
	out := make(chan DataItem)
	go func() {
		defer close(out)
		for i := 0; i < count; i++ {
			// Simulate reading from a data source
			time.Sleep(100 * time.Millisecond)
			out <- DataItem{
				ID:   i + 1,
				Data: fmt.Sprintf("Data-%d", i+1),
			}
		}
	}()
	return out
}

// dataGenerator consumes the mock stream and yields processed data
func dataGenerator(stream <-chan DataItem) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for item := range stream {
			// Process the data item
			processedData := fmt.Sprintf("Processed: %s (ID: %d)", item.Data, item.ID)
			out <- processedData
		}
	}()
	return out
}

type StreamGenerator struct{}

func (g StreamGenerator) Execute() {
	// Create a mock data stream
	dataStream := mockDataStream(10)

	// Create a generator to process the stream
	processedDataGen := dataGenerator(dataStream)

	// Consume and print the processed data
	for data := range processedDataGen {
		fmt.Println(data)
	}
}

func main() {
	StreamGenerator{}.Execute()
}
```

This example demonstrates a more structured approach to using the Generator pattern for testing data processing pipelines:
- mockDataStream simulates a data source by generating items with controlled timing
- dataGenerator shows how to process a stream of data items and transform them
- The StreamGenerator type provides a clean interface for executing the pipeline and can be replaced with real data sources in production using DI (Dependency Injection)
- Each stage of the pipeline is clearly separated and testable
Best Practices and Pitfalls
Best Practices:
- Use buffered channels for improved performance when processing large streams
- Implement timeout mechanisms for long-running operations
- Use the `context` package for cancellation in long-running generators
- Create configurable mock generators for diverse test scenarios
Pitfalls:
- Not handling errors or edge cases in data generation
- Overlooking resource cleanup in generators (e.g., closing file handles)
- Creating overly complex mock generators that don’t reflect real-world scenarios
- Ignoring performance implications in lazy loading implementations
Summary
The Generator pattern proves invaluable for both consuming large datasets efficiently and creating robust test environments for data processing logic. By leveraging Go’s concurrency features, we can create flexible, memory-efficient, and testable data processing pipelines that can handle real-world scenarios and simulated test cases alike.
Disclaimer
While these examples demonstrate the power of the Generator pattern for data processing and testing, real-world implementations may require additional error handling, resource management, and optimizations. Always consider the specific requirements and constraints of your application when applying these patterns.
For more advanced concurrency patterns and best practices in Go, stay tuned for future articles! 🚀
If you want to experiment with the code examples, you can find them on my GitHub repository.
The code examples are licensed under the MIT License. The banner image was created with DALL·E and is licensed under the same license as the article and other graphics.