Dynamic DNS Resolution in Nginx
Introduction
When configuring Nginx as a reverse proxy, one of the most challenging issues engineers face is dynamic DNS resolution. This becomes particularly relevant with cloud services like AWS, where the IP addresses behind a load balancer endpoint such as an Application Load Balancer (ALB) can change without warning.
Understanding the Problem: DNS Resolution in Nginx
How Nginx Handles DNS Resolution
By default, Nginx resolves DNS names to IP addresses only once, during startup or configuration reload. It then caches the IP addresses and reuses them for all further requests.
This design decision was made for performance reasons, as constant DNS lookups would add overhead to request processing. However, this creates a significant problem when working with dynamic endpoints like AWS Application Load Balancers (ALBs).
For example, consider this basic Nginx configuration:
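A minimal sketch of such a configuration is shown here. The ALB hostname is the one used throughout this article; the listen port and server_name are hypothetical placeholders.

```nginx
server {
    listen 80;
    server_name example.com;  # hypothetical name, for illustration only

    location / {
        # The hostname is written literally, so Nginx resolves it once
        # at startup or reload and keeps using the resulting IP addresses.
        proxy_pass http://my-alb-1234567890.us-east-1.elb.amazonaws.com;
    }
}
```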
When Nginx starts, it resolves my-alb-1234567890.us-east-1.elb.amazonaws.com to its current IP addresses.
The problem occurs when AWS rotates the IP addresses behind that ALB domain (which happens regularly for maintenance, scaling, or failover).
Because Nginx continues to use the cached (and now potentially invalid) IP addresses, API requests start failing and never reach the backend.
Why AWS ALB DNS Addresses Change
AWS Application Load Balancers are designed for high availability and scaling. To achieve this, AWS:
Uses multiple IP addresses for a single ALB
Regularly rotates these IPs for maintenance and scaling
Manages failover by changing IPs when instances become unhealthy
May scale the ALB horizontally, adding new IP addresses during high traffic periods
The ALB DNS name itself remains constant, but the IP addresses it resolves to can change at any time, often without notice.
The Resulting Problems
This mismatch between Nginx's one-time DNS resolution and AWS's dynamic IP assignment leads to several issues:
Connection failures: Requests fail when Nginx tries to connect to stale IP addresses
Service degradation: Some backends might become unavailable while others still work
Manual intervention: Engineers need to regularly reload Nginx to force DNS re-resolution
Cascading failures: During high-traffic events when AWS is scaling the ALB, Nginx might miss the new IPs
Let's explore various solutions to address this fundamental mismatch.
Solution 1: Resolver Directive with Valid Parameter
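Nginx provides a resolver directive that can be configured to periodically re-resolve DNS names. This is the most straightforward solution, with one version caveat: before open source Nginx 1.27.3, the resolve parameter of the upstream server directive was available only in the commercial NGINX Plus, and it requires the upstream group to be kept in a shared memory zone.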
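A sketch of this setup, assuming Nginx 1.27.3 or later (or NGINX Plus), could look like the following. The upstream name alb_backend and the zone size are hypothetical choices; the resolver addresses and the ALB hostname are the ones discussed in this section.

```nginx
http {
    # Name servers used for runtime lookups; answers are cached for 30 seconds.
    resolver 10.0.0.2 8.8.8.8 valid=30s ipv6=off;

    upstream alb_backend {
        # "resolve" needs the group to live in shared memory.
        zone alb_backend 64k;

        # Re-resolve this hostname in the background and update the peers.
        server my-alb-1234567890.us-east-1.elb.amazonaws.com resolve;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://alb_backend;
        }
    }
}
```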
Let's break down what's happening here:
resolver 10.0.0.2 8.8.8.8 valid=30s: Configures Nginx to resolve names through the AWS VPC DNS server (10.0.0.2) and Google's public DNS (8.8.8.8), and to cache DNS entries for 30 seconds. Nginx queries the listed name servers in round-robin fashion, so the second server is an additional resolver rather than a strict fallback.
ipv6=off: Disables IPv6 resolution (optional but helpful if your environment doesn't support IPv6)
server my-alb-1234567890.us-east-1.elb.amazonaws.com resolve: The resolve parameter tells Nginx to re-resolve this hostname periodically
This solution works well for many scenarios but has limitations with upstream blocks. Let's look at a more robust approach.
Solution 2: Variable for Backend with Resolver
A more flexible approach uses variables to store the backend address:
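As a rough sketch of the variable-based approach: the variable name $backend_server and the ALB hostname come from this article, while the listen port and the resolver values are assumptions carried over from Solution 1.

```nginx
server {
    listen 80;

    # Required for request-time lookups; without a resolver,
    # a proxy_pass that contains a variable cannot resolve hostnames.
    resolver 10.0.0.2 valid=30s ipv6=off;

    location / {
        # Keeping the hostname in a variable makes Nginx resolve it
        # at request time instead of once at startup.
        set $backend_server my-alb-1234567890.us-east-1.elb.amazonaws.com;
        proxy_pass http://$backend_server;
    }
}
```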
The key difference in this approach is storing the ALB address in a variable ($backend_server) and using that variable directly in the proxy_pass directive.
When Nginx sees a variable in proxy_pass, it resolves the DNS name at request time instead of at startup and caches the answer for the duration specified by the valid parameter of the resolver directive (or for the record's TTL when valid is not set).
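This gives you fine-grained control over how often DNS resolution occurs. However, this solution may not work on its own: with a variable in proxy_pass, Nginx does not replace the matched location prefix with the URI from the directive, so the backend may not receive the request URI it expects.
See the Final Solution section.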
Solution 3: External Service Discovery
For production environments, consider using a dedicated service discovery tool:
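One possible shape for the Nginx side, assuming a local Consul agent that exposes its DNS interface on Consul's default port 8600, is sketched here. The service name my-service, the backend port 8080, and the 5 second cache time are hypothetical.

```nginx
server {
    listen 80;

    # Ask the local Consul agent for service addresses (assumed setup).
    resolver 127.0.0.1:8600 valid=5s ipv6=off;

    location / {
        # Consul publishes healthy instances under <name>.service.consul.
        set $backend_service my-service.service.consul;
        proxy_pass http://$backend_service:8080;
    }
}
```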
This approach relies on an external service discovery system like Consul, which would need additional configuration:
The service discovery approach:
Delegates DNS management to a specialized tool
Provides additional health checking capabilities
Can handle complex service discovery logic
Works well in microservices environments
Difference Between proxy_pass with a Variable and proxy_pass with a Hardcoded URL
Nginx handles a hardcoded URL in the proxy_pass directive differently from a URL built with a variable:
URI Handling: With a hardcoded URL, Nginx replaces the part of the request URI that matches the location with the URI given in the directive, and preserves aspects of the original request that are important for authentication. With a variable, this location-based replacement does not happen, which matters when the backend depends on it.
DNS Caching: With a hardcoded URL, Nginx caches the resolved addresses for the lifetime of the worker process and reuses them for further requests. The addresses may change, but the name is not resolved again until Nginx is reloaded or restarted.
Header Preservation: The way Nginx builds the upstream request maintains authentication headers better with hardcoded URLs. With a variable, the Host header may be inaccurate.
Hardcoded URL: Resolved once at startup or config reload, cached for worker lifecycle
Variable-based URL: Resolved at request time using the resolver directive settings
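The URI handling difference is easiest to see side by side. The fragment below is an illustrative sketch meant to sit inside a server block that already has a resolver configured; the location prefixes and the /v1/ path are hypothetical.

```nginx
# Hardcoded URL with a URI part: the matched prefix is replaced.
# A request for /api/users is proxied as /v1/users.
location /api/ {
    proxy_pass http://my-alb-1234567890.us-east-1.elb.amazonaws.com/v1/;
}

# Variable with a URI part: the URI is passed as is.
# A request for /app/users is proxied as /v1/ and the rest of the path is lost.
location /app/ {
    set $backend_server my-alb-1234567890.us-east-1.elb.amazonaws.com;
    proxy_pass http://$backend_server/v1/;
}

# Variable without a URI part: the original request URI is forwarded unchanged.
# A request for /raw/users is proxied as /raw/users.
location /raw/ {
    set $backend_server my-alb-1234567890.us-east-1.elb.amazonaws.com;
    proxy_pass http://$backend_server;
}
```

The third form is usually the simplest way to combine request-time resolution with a predictable request path.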
Final Solution
To solve the problems of DNS resolution, correct request URIs, and header preservation together, we need to combine the solutions above.
Here is the final configuration that worked for me.
First, add a resolver for the AWS DNS server to the http block of the nginx.conf file.
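A sketch of what such a resolver entry could look like, assuming the VPC DNS address 10.0.0.2 used earlier in this article. The cache time, the resolver_timeout value, and the include path are illustrative assumptions and may differ in your setup.

```nginx
http {
    # AWS VPC DNS server; re-check answers every 30 seconds.
    resolver 10.0.0.2 valid=30s ipv6=off;

    # Give up on a lookup after 5 seconds (illustrative value).
    resolver_timeout 5s;

    # Assumed location of app.conf.
    include /etc/nginx/conf.d/*.conf;
}
```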
Then update the server block of app.conf.
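The following is a sketch consistent with the approach described above, not necessarily the exact configuration used in production. It assumes the http-level resolver is already in place; the server_name, the /api/ prefix, and the forwarded headers are hypothetical choices.

```nginx
server {
    listen 80;
    server_name app.example.com;  # hypothetical

    location /api/ {
        # A variable forces request-time DNS resolution through the resolver.
        set $backend_server my-alb-1234567890.us-east-1.elb.amazonaws.com;

        # No URI part after the host, so the original request URI
        # (for example /api/users?id=1) reaches the backend unchanged.
        proxy_pass http://$backend_server;

        # Set the Host header explicitly instead of relying on the default.
        # Using $host assumes the backend expects the client-facing name.
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```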
Best Practices and Considerations
Regardless of which solution you choose, consider these best practices:
1. Configure Proper Timeouts
Always set appropriate timeouts in your Nginx configuration:
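For illustration, a location with explicit timeouts might look like this. The values are placeholders to tune for your workload, and the upstream name alb_backend is hypothetical.

```nginx
location / {
    proxy_pass http://alb_backend;  # hypothetical upstream name

    # Fail fast when a stale IP address no longer accepts connections.
    proxy_connect_timeout 5s;

    # Limits on writing the request and on waiting for the response.
    proxy_send_timeout 30s;
    proxy_read_timeout 30s;

    # Upper bound on a single DNS lookup made through the resolver.
    resolver_timeout 5s;
}
```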
2. Implement Health Checks
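Add health checks to detect backend issues early. Active health checks (the health_check directive) are part of the commercial NGINX Plus; open source Nginx provides passive checks through the max_fails and fail_timeout server parameters.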
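A passive health check sketch for open source Nginx is shown here, assuming version 1.27.3 or later for the resolve parameter and a resolver defined in the http block. The upstream name and the thresholds are hypothetical.

```nginx
upstream alb_backend {
    zone alb_backend 64k;

    # Mark an address as unavailable for 30 seconds after 3 failed attempts.
    # These parameters are ignored if the group ends up with a single server.
    server my-alb-1234567890.us-east-1.elb.amazonaws.com resolve max_fails=3 fail_timeout=30s;
}

server {
    listen 80;

    location / {
        proxy_pass http://alb_backend;

        # Retry the request on another address after a connection error or timeout.
        proxy_next_upstream error timeout;
    }
}
```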
3. Monitor DNS Resolution
Add logging to track DNS resolution:
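Nginx does not log resolver lookups directly in the access log, but recording the upstream address for each request shows which IP addresses are actually in use. A minimal sketch, with a hypothetical format name and log path:

```nginx
http {
    # $upstream_addr records the IP address and port that served each request,
    # so a change in the ALB's addresses becomes visible in the log.
    log_format upstream_log '$remote_addr [$time_local] "$request" $status '
                            'upstream=$upstream_addr '
                            'connect=$upstream_connect_time '
                            'response=$upstream_response_time';

    access_log /var/log/nginx/upstream.log upstream_log;
}
```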
4. Test Under Load
Before deploying to production, test how your solution performs under load, especially during IP transitions.
Conclusion
Dynamic DNS resolution in Nginx when dealing with AWS ALB endpoints requires careful consideration. While the default behaviour of resolving DNS only at startup is a performance optimization, it creates challenges in dynamic cloud environments.