I used global variables for “convenience” (and created bugs that I couldn't reproduce)

Author: Dua Asif

Originally published in Towards Artificial Intelligence.

# config.py
DATABASE_URL = "postgresql://localhost/mydb"
API_KEY = "sk_live_abc123"
DEBUG = True

# app.py
import psycopg2
import requests

import config

def connect_database():
    return psycopg2.connect(config.DATABASE_URL)

def call_api(endpoint):
    return requests.get(f"https://api.example.com/{endpoint}",
                        headers={'X-API-Key': config.API_KEY})

Clean. Simple. Every module can access the configuration via import config. No parameters to thread through every call. Comfortable.

Then the bug reports started.

“The app works fine on my computer, but not in production.” “Tests pass locally but fail in CI.” “The feature works when run on its own, but stops working when run after other tests.”

I couldn't reproduce any of them. The code worked perfectly for me. Yet users kept hitting random, sporadic crashes.

Then I found the problem: global variables.

# test_api.py
import config

def test_with_mock_api():
    # Override config for testing
    config.API_KEY = "test_key_123"
    result = call_api('users')
    assert result.status_code == 200

# test_database.py
import config

def test_with_test_database():
    # Override config for testing
    config.DATABASE_URL = "postgresql://localhost/testdb"
    result = query_users()
    assert len(result) > 0

The tests modified global state. When they ran in sequence, they interfered with each other. Test order determined which tests passed.

# Run test_api first, then test_database
pytest test_api.py test_database.py # Both pass
# Run test_database first, then test_api
pytest test_database.py test_api.py # test_api fails!
# Why?
# test_database changed DATABASE_URL globally
# test_api used the changed value
# Results were unpredictable
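One standard-library way to keep such overrides from leaking between tests is unittest.mock.patch.object, which restores the attribute when the block exits. A minimal sketch, using a SimpleNamespace as a stand-in for the real config module:

```python
import types
from unittest import mock

# Stand-in for the config module (values from the examples above)
config = types.SimpleNamespace(API_KEY="sk_live_abc123")

def test_with_mock_api():
    # patch.object swaps the attribute in, then restores the
    # original value on exit -- even if the test raises
    with mock.patch.object(config, "API_KEY", "test_key_123"):
        assert config.API_KEY == "test_key_123"

test_with_mock_api()
print(config.API_KEY)  # sk_live_abc123 -- restored, regardless of test order
```

Because the restore happens in a context manager, no test can poison the config for the tests that run after it.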

That month of hunting race conditions and strange test failures taught me the most painful lessons about global state.

Global variables are not a convenience. They are invisible dependencies that make code impossible to reason about. They create bugs that only appear under specific conditions.

I'll show you all the ways global variables ruined my code.

A race condition that only happened in production

I kept a request counter in a global variable.

# counter.py
request_count = 0

def track_request():
    global request_count
    request_count += 1
    return request_count

# app.py
import logging

from counter import track_request

@app.route('/api/endpoint')
def handle_request():
    count = track_request()
    logging.info(f"Request #{count}")
    return process_request()

It worked perfectly during development. Each request was given a unique sequence number.

In multi-threaded production, the request counts came out wrong.

# Thread 1 and Thread 2 run simultaneously
# Thread 1: Read request_count (0)
# Thread 2: Read request_count (0)
# Thread 1: Add 1, get 1
# Thread 2: Add 1, get 1
# Both threads see count = 1
# Later
# Thread 3: Read request_count (1)
# Thread 4: Read request_count (1)
# Thread 3: Add 1, get 2
# Thread 4: Add 1, get 2
# Counts are duplicated!
# Logs show
# Request #1
# Request #1 ← Duplicate!
# Request #2
# Request #2 ← Duplicate!

Global variables are not thread-safe. Multiple threads accessing the same global create race conditions.
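The lost-update interleaving above is easy to reproduce: hammer an unprotected global from several threads and compare it with a locked one. A small sketch (the unsafe total varies run to run, so only the locked total is predictable):

```python
import threading

unsafe_count = 0
safe_count = 0
lock = threading.Lock()

def unsafe_worker(n):
    global unsafe_count
    for _ in range(n):
        unsafe_count += 1  # read-modify-write: not atomic

def safe_worker(n):
    global safe_count
    for _ in range(n):
        with lock:  # one thread at a time
            safe_count += 1

threads = [threading.Thread(target=w, args=(100_000,))
           for w in (unsafe_worker, safe_worker) * 4]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(safe_count)    # 400000, every run
print(unsafe_count)  # may be less than 400000 when updates are lost
```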

import threading

request_count = 0
count_lock = threading.Lock()

def track_request():
    global request_count
    with count_lock:  # Only one thread at a time
        request_count += 1
        return request_count

But this introduced another problem: lock contention. Every request had to wait its turn for the lock. Under load, throughput dropped.

# Better: thread-local storage
import threading

thread_local = threading.local()

def track_request():
    if not hasattr(thread_local, 'request_count'):
        thread_local.request_count = 0
    thread_local.request_count += 1
    return thread_local.request_count

# Each thread has its own counter
# No race conditions
# No lock contention
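Thread-local storage isolates state per thread; the analogous tool for asyncio tasks is contextvars, since many tasks can share one thread. A sketch, not from the original article:

```python
import asyncio
import contextvars

request_id = contextvars.ContextVar("request_id", default=None)

async def handle(rid):
    request_id.set(rid)     # set only in this task's context
    await asyncio.sleep(0)  # yield so other tasks run in between
    return request_id.get() # still this task's own value

async def main():
    # Each task created by gather runs in its own copied context,
    # so the set() calls never leak across tasks
    return await asyncio.gather(*(handle(i) for i in range(5)))

print(asyncio.run(main()))  # [0, 1, 2, 3, 4]
```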

Or don't use globals at all:

class RequestTracker:
    def __init__(self):
        self.count = 0
        self.lock = threading.Lock()

    def track(self):
        with self.lock:
            self.count += 1
            return self.count

# Create one instance per application
tracker = RequestTracker()

@app.route('/api/endpoint')
def handle_request():
    count = tracker.track()
    logging.info(f"Request #{count}")
    return process_request()
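Another option I have seen for counters specifically is itertools.count: in CPython, next() on it executes as a single C-level call under the GIL, so it hands out unique numbers without an explicit lock. This leans on a CPython implementation detail, so treat it as a sketch rather than a guarantee:

```python
import itertools
import threading

counter = itertools.count(1)  # next(counter) yields 1, 2, 3, ...

seen = []

def worker():
    for _ in range(10_000):
        seen.append(next(counter))  # no lock in our code

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(set(seen)))  # 40000 distinct numbers -- no duplicates
```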

The cache that leaked

I created a global cache for performance.

# cache.py
_cache = {}

def cache_set(key, value):
    _cache[key] = value

def cache_get(key):
    return _cache.get(key)

# user_service.py
from cache import cache_set, cache_get

def get_user(user_id):
    cached = cache_get(f"user:{user_id}")
    if cached:
        return cached

    user = database.query("SELECT * FROM users WHERE id = %s", user_id)
    cache_set(f"user:{user_id}", user)
    return user

It worked great: cached users, fast lookups.

In production, memory usage kept climbing. The process grew to 8 GB, then 16 GB, then 32 GB. Eventually it crashed.

# After 1 hour
_cache = {
'user:1': {...},
'user:2': {...},
'user:3': {...},
# ... 10,000 users cached
}
# After 1 day
_cache = {
'user:1': {...},
'user:2': {...},
# ... 240,000 users cached
# 5GB of memory
}
# After 1 week
# 1,680,000 cached users
# 32GB of memory
# Server crashes

The cache never evicted anything. Every unique user ID was cached forever. Global state accumulated without bound.

from functools import lru_cache

@lru_cache(maxsize=1000)  # Limit cache size
def get_user(user_id):
    return database.query("SELECT * FROM users WHERE id = %s", user_id)

# Or use time-based expiration
from datetime import datetime, timedelta

class ExpiringCache:
    def __init__(self, ttl_seconds=300):
        self._cache = {}
        self._expiry = {}
        self.ttl = timedelta(seconds=ttl_seconds)

    def set(self, key, value):
        self._cache[key] = value
        self._expiry[key] = datetime.now() + self.ttl

    def get(self, key):
        if key not in self._cache:
            return None

        if datetime.now() > self._expiry[key]:
            del self._cache[key]
            del self._expiry[key]
            return None

        return self._cache[key]

# One cache instance
cache = ExpiringCache(ttl_seconds=300)

def get_user(user_id):
    cached = cache.get(f"user:{user_id}")
    if cached:
        return cached

    user = database.query("SELECT * FROM users WHERE id = %s", user_id)
    cache.set(f"user:{user_id}", user)
    return user
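To see the lru_cache bound actually doing its job, you can count the underlying "database" hits with a deliberately tiny cache. A sketch with a stub fetch function:

```python
from functools import lru_cache

calls = []  # records every real "database" hit

@lru_cache(maxsize=2)  # deliberately tiny, to show eviction
def fetch(user_id):
    calls.append(user_id)
    return {"id": user_id}

fetch(1)
fetch(2)
fetch(1)  # cache hit: no new database call
fetch(3)  # cache full: evicts user 2 (least recently used)
fetch(2)  # miss again: refetched from the "database"

print(calls)  # [1, 2, 3, 2] -- four real hits for five calls
```

Memory stays bounded because every insertion past maxsize evicts the least recently used entry.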

The test that changed the state of the world

I wrote a test that modified a global.

# settings.py
DEBUG = False
MAX_RETRIES = 3
TIMEOUT = 30

# test_api.py
import settings

def test_api_with_debug():
    # Enable debug mode for this test
    settings.DEBUG = True

    result = call_api_endpoint()
    assert 'debug_info' in result

# test_retry.py
import settings

def test_retry_logic():
    # This test expects DEBUG = False
    result = call_api_endpoint()
    assert 'debug_info' not in result

# If test_api runs first
# settings.DEBUG is now True
# test_retry fails!

The tests modified global state. The modifications persisted for the rest of the test run. Test order mattered.

# Test isolation broken
pytest test_api.py test_retry.py # test_retry fails
pytest test_retry.py test_api.py # Both pass
pytest test_retry.py # Passes when run alone
pytest test_api.py test_retry.py # Fails when run after test_api

I had to save and restore the state.

import settings
import pytest

@pytest.fixture
def restore_settings():
    # Save original values
    original_debug = settings.DEBUG
    original_retries = settings.MAX_RETRIES
    original_timeout = settings.TIMEOUT

    yield  # Test runs here

    # Restore original values
    settings.DEBUG = original_debug
    settings.MAX_RETRIES = original_retries
    settings.TIMEOUT = original_timeout

def test_api_with_debug(restore_settings):
    settings.DEBUG = True
    result = call_api_endpoint()
    assert 'debug_info' in result

def test_retry_logic(restore_settings):
    result = call_api_endpoint()
    assert 'debug_info' not in result

Or better yet… don't modify globals in tests.

# Instead of modifying globals
def test_api_with_debug():
    settings.DEBUG = True
    result = call_api_endpoint()

# Pass configuration explicitly
def test_api_with_debug():
    config = {'DEBUG': True}
    result = call_api_endpoint(config)

# Or use dependency injection
def call_api_endpoint(debug=False):
    if debug:
        return {'data': '...', 'debug_info': '...'}
    return {'data': '...'}

def test_api_with_debug():
    result = call_api_endpoint(debug=True)
    assert 'debug_info' in result

The singleton that wasn't

I built a singleton pattern with a global.

# database.py
_db_instance = None

def get_database():
    global _db_instance
    if _db_instance is None:
        _db_instance = DatabaseConnection()
    return _db_instance

# Multiple modules use it
from database import get_database

def save_user(user):
    db = get_database()
    db.execute("INSERT INTO users ...")

def get_orders():
    db = get_database()
    return db.query("SELECT * FROM orders")

I thought I had one database connection. In fact, I had one per process.

# With multiprocessing
from multiprocessing import Process

def worker():
    db = get_database()
    # Each process creates its own connection
    # _db_instance is not shared between processes

# In process 1
get_database()  # Creates connection A
# In process 2
get_database()  # Creates connection B (different instance!)

# Two separate connections
# Not a singleton across processes

As for thread safety…

# Thread 1 and Thread 2 run simultaneously
def get_database():
    global _db_instance
    if _db_instance is None:  # Thread 1 checks: None
                              # Thread 2 checks: None
        _db_instance = DatabaseConnection()  # Thread 1 creates a connection
        # Thread 2 creates another connection (overwrites the first)
    return _db_instance

# Two connections created
# One immediately orphaned -- a leaked connection

A thread-safe singleton:

import threading

_db_instance = None
_db_lock = threading.Lock()

def get_database():
    global _db_instance
    if _db_instance is None:
        with _db_lock:
            # Double-check inside lock
            if _db_instance is None:
                _db_instance = DatabaseConnection()
    return _db_instance
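A quick way to convince yourself the double-checked version hands out exactly one instance is to hit it from many threads at once and count distinct objects. A sketch with a stub DatabaseConnection:

```python
import threading

class DatabaseConnection:
    pass  # stub standing in for a real connection

_db_instance = None
_db_lock = threading.Lock()

def get_database():
    global _db_instance
    if _db_instance is None:          # fast path, no lock
        with _db_lock:
            if _db_instance is None:  # double-check inside the lock
                _db_instance = DatabaseConnection()
    return _db_instance

seen = []
threads = [threading.Thread(target=lambda: seen.append(get_database()))
           for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len({id(obj) for obj in seen}))  # 1 -- every thread got the same object
```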

But better still: don't use singletons or globals at all.

class Application:
    def __init__(self):
        self.db = DatabaseConnection()

    def save_user(self, user):
        self.db.execute("INSERT INTO users ...")

    def get_orders(self):
        return self.db.query("SELECT * FROM orders")

# Create one instance (named to avoid clashing with the Flask app)
services = Application()

# Pass it where needed
@app.route('/orders')
def orders_endpoint():
    return services.get_orders()

An import side effect that ruined everything

I had initialization code at the module level.

# config.py
import os

DATABASE_URL = os.environ['DATABASE_URL']  # Read from environment
API_KEY = os.environ['API_KEY']

# Initialize connection at import time
db_connection = connect_to_database(DATABASE_URL)

Every import of config executed this code, even if I just wanted to inspect the module.

# test_config.py
import config  # This crashes!

# Why?
# Environment variables aren't set
# KeyError: 'DATABASE_URL'

# Even if I'm just testing something else
import other_module
# other_module imports config
# config tries to read environment variables
# Crash!

Import side effects made it impossible to test or even import code without fully configuring the environment.

# Can't do this in tests
import config
config.DATABASE_URL = "postgresql://localhost/testdb"
# Because it already tried to connect during import
# The connection was made with os.environ['DATABASE_URL']
# Setting it after import does nothing

I had to configure the entire environment to import the module.

# test_config.py
import os

# Must set environment before importing
os.environ['DATABASE_URL'] = 'postgresql://localhost/testdb'
os.environ['API_KEY'] = 'test_key'

import config  # Now it works

# But this affects all other tests
# Global environment pollution

The fix: lazy initialization.

# config.py
import os

_db_connection = None

def get_database_url():
    return os.environ.get('DATABASE_URL', 'postgresql://localhost/defaultdb')

def get_api_key():
    return os.environ.get('API_KEY', 'default_key')

def get_db_connection():
    global _db_connection
    if _db_connection is None:
        _db_connection = connect_to_database(get_database_url())
    return _db_connection

# Now importing doesn't crash
# The connection is only created when actually needed
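The payoff is that tests can now control the environment per call instead of per import. A sketch of just the URL lookup (the connection helper is left out):

```python
import os

def get_database_url():
    # Consulted lazily on every call, with a safe default
    return os.environ.get('DATABASE_URL', 'postgresql://localhost/defaultdb')

os.environ.pop('DATABASE_URL', None)  # clean slate for the demo
print(get_database_url())  # postgresql://localhost/defaultdb

os.environ['DATABASE_URL'] = 'postgresql://localhost/testdb'
print(get_database_url())  # postgresql://localhost/testdb -- no re-import needed
```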

Or use a class:

import os

class Config:
    def __init__(self, database_url=None):
        self._db_connection = None
        self._database_url = database_url  # optional override

    @property
    def database_url(self):
        if self._database_url is not None:
            return self._database_url
        return os.environ.get('DATABASE_URL', 'postgresql://localhost/defaultdb')

    @property
    def db_connection(self):
        if self._db_connection is None:
            self._db_connection = connect_to_database(self.database_url)
        return self._db_connection

# Create config when needed
config = Config()

# Tests can construct their own with an override
config_test = Config(database_url='postgresql://localhost/testdb')

A hidden dependency that prevented testing

My function used a global value without declaring it.

# config.py
API_ENDPOINT = "https://api.production.com"

# api.py
import requests

def call_api(path):
    # Uses config without declaring it in the signature
    import config
    url = f"{config.API_ENDPOINT}/{path}"
    return requests.get(url)

# Looks fine
result = call_api('users')

But testing was impossible.

# test_api.py
import config
config.API_ENDPOINT = "https://api.test.com"

from api import call_api
# This STILL uses production API!
# Why?
# Because call_api imports config inside the function
# The import happens when the function runs
# After we've already overridden config.API_ENDPOINT
# But the import creates a new reference

Actually, that explanation is misleading. Let me walk through it.

# api.py
def call_api(path):
    import config  # Gets the same config module object
    url = f"{config.API_ENDPOINT}/{path}"
    # This WILL see the modified value

# The real problem is this version
API_ENDPOINT = "https://api.production.com"

def call_api(path):
    # Implicitly uses the module-level global
    url = f"{API_ENDPOINT}/{path}"
    return requests.get(url)

# test_api.py
import api
api.API_ENDPOINT = "https://api.test.com"
result = api.call_api('users')
# Works -- but only if you patch api.API_ENDPOINT itself;
# a copy made with "from api import API_ENDPOINT" would not see it

Hidden dependencies are invisible in the function signature.

# This function looks like it only depends on 'path'
def call_api(path):
    url = f"{API_ENDPOINT}/{path}"
    return requests.get(url)

# But it secretly depends on the API_ENDPOINT global
# Impossible to see without reading the implementation

Make dependencies explicit:

# The dependency is visible in the signature
def call_api(path, endpoint="https://api.production.com"):
    url = f"{endpoint}/{path}"
    return requests.get(url)

# Testing is easy
def test_api():
    result = call_api('users', endpoint='https://api.test.com')
    assert result.status_code == 200

# Production uses the default
result = call_api('users')
# Tests override it
result = call_api('users', endpoint='https://mock.api')

Or use dependency injection:

import requests

class APIClient:
    def __init__(self, endpoint="https://api.production.com"):
        self.endpoint = endpoint

    def call_api(self, path):
        url = f"{self.endpoint}/{path}"
        return requests.get(url)

# Production
client = APIClient()
result = client.call_api('users')

# Testing
test_client = APIClient(endpoint='https://api.test.com')
result = test_client.call_api('users')
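The same idea extends one level deeper: inject the HTTP client itself, and tests never touch the network at all. A sketch with a hypothetical FakeSession standing in for requests.Session:

```python
class FakeSession:
    # Hypothetical stand-in for requests.Session: records every call
    # and returns a canned response, never touching the network
    def __init__(self, response):
        self.response = response
        self.requested = []

    def get(self, url):
        self.requested.append(url)
        return self.response

class APIClient:
    def __init__(self, session, endpoint="https://api.production.com"):
        self.session = session  # the HTTP client is injected too
        self.endpoint = endpoint

    def call_api(self, path):
        return self.session.get(f"{self.endpoint}/{path}")

fake = FakeSession(response={"status": "ok"})
client = APIClient(session=fake, endpoint="https://api.test.com")
result = client.call_api("users")

print(fake.requested)  # ['https://api.test.com/users']
```

In production you would pass a real requests.Session instead; the client code never needs to know the difference.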

A monkey patch that outlived its test

I monkey-patched a global for testing.

# production_module.py
import datetime

def get_current_time():
    return datetime.datetime.now()

# test_module.py
import datetime
import production_module

def test_time_dependent_function():
    # Mock the current time
    # (CPython won't let you assign to datetime.datetime.now directly,
    # so replace the class on the module instead)
    original_datetime = datetime.datetime

    class FrozenDateTime(datetime.datetime):
        @classmethod
        def now(cls, tz=None):
            return cls(2024, 1, 1, 12, 0, 0)

    datetime.datetime = FrozenDateTime

    result = production_module.get_current_time()
    assert result.day == 1

    # Forgot to restore!
    # datetime.datetime = original_datetime
    # All subsequent tests use the mocked time!

def test_something_else():
    now = datetime.datetime.now()
    print(now)  # 2024-01-01 12:00:00
    # Forever stuck in the past!

Monkey patching modifies global state. If it isn't restored, it affects everything that runs afterwards.

import datetime
import pytest

@pytest.fixture
def freeze_time():
    original_datetime = datetime.datetime

    class FrozenDateTime(datetime.datetime):
        @classmethod
        def now(cls, tz=None):
            return cls(2024, 1, 1, 12, 0, 0)

    datetime.datetime = FrozenDateTime

    yield

    datetime.datetime = original_datetime  # Always restore

def test_time_dependent_function(freeze_time):
    result = get_current_time()
    assert result.day == 1

Or use a library designed for this:

from freezegun import freeze_time

@freeze_time("2024-01-01 12:00:00")
def test_time_dependent_function():
    result = get_current_time()
    assert result.day == 1

# Automatically restored after the test
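If adding a dependency isn't an option, the standard library's unittest.mock.patch gives the same automatic restore. A sketch: mock.patch replaces the datetime name on the datetime module for the duration of the block, then puts the original back.

```python
import datetime
from unittest import mock

def get_current_time():
    return datetime.datetime.now()

def test_time_dependent_function():
    frozen = datetime.datetime(2024, 1, 1, 12, 0, 0)
    with mock.patch("datetime.datetime") as fake_datetime:
        fake_datetime.now.return_value = frozen
        assert get_current_time() == frozen
    # The patch is undone here, even if the assertion failed

test_time_dependent_function()
print(type(datetime.datetime.now()))  # <class 'datetime.datetime'> -- restored
```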

What I learned about global state

After a week of debugging unreproducible errors, I learned that global variables are not about convenience. They are about hidden dependencies and invisible coupling.

Problems with globals:

  • Not thread safe (race conditions)
  • Not process safe (each process has its own)
  • Make testing difficult (shared state between tests)
  • Make the code difficult to understand (hidden dependencies)
  • Cause memory leaks (accumulate forever)
  • Create order dependencies (initialization order matters)

When globals are actually okay:

  • True constants (NEVER modified)
  • Module-level configuration (if immutable)
  • Cache with appropriate size limits and eviction
  • Logging configuration
  • Read-only lookup tables (if thread-safe)

The checklist I use now:

Is this actually a global?
☐ Can it be a parameter instead?
☐ Can it be a class attribute instead?
☐ Can it be dependency-injected instead?

If it must be global:
☐ Is it truly constant? (Never modified)
☐ Does it need thread safety?
☐ Does it need process safety?
☐ Does it need size limits?
☐ How will tests handle it?
☐ Are initialization side effects safe?

The golden rule: if you're reaching for the global keyword, you're probably doing something wrong.

What global variable created errors you couldn't reproduce? Share it below. We've all learned the hard way that global state makes everything harder.

