Background
When databases are a key component of internet infrastructure, IT departments run into unexpected problems, particularly when using DBaaS (Database-as-a-Service). One of these challenges is connection management. There are three areas where connection management can be a problem:
- CPU overhead when an application “thrashes” connections rapidly by opening, closing and authenticating connections;
- Memory overhead when applications hold open long-lived connections that are often idle. That memory would be better used as block cache, and holding it may require a larger instance size than CPU requirements dictate;
- Noisy neighbor congestion for a multi-tenant database. Limiting the number of active connections on a per-customer basis ensures fairness.
Solution
The Heimdall Proxy was designed to work with any SQL database, including Azure Database for MySQL and Azure SQL Data Warehouse (SQL DW). It provides connection pooling capabilities such as:
- General connection reuse: New connections leverage already established connections to the database to avoid connection thrashing. This results in lower CPU utilization;
- Connection multiplexing: Disconnects idle connections from the client to the database, freeing those connections for reuse for other clients. This dramatically reduces connection counts to the database, and frees memory to allow the database to operate more effectively;
- Tenant Connection Management: The combination of 1) per-user and per-database connection limiting and 2) multiplexing controls the number of active queries a particular tenant can run at a time. This protects database resources and helps ensure the customer SLA (Service-level Agreement) is met and not disrupted by a busy neighbor using the same database.
Figure 1: Heimdall Proxy Architecture Diagram
The Heimdall Proxy provides better control over database resources, resulting in more efficient and consistent behavior. As a result, users can reduce their database instance size and/or support higher customer density on the same database. In this blog, we explain how these functions work and how they are configured.
Basic Connection Pooling
A basic connection pooler opens a number of connections upfront (the pool), then as an application needs a connection, instead of opening a new connection, it simply borrows a connection from the pool and returns it as soon as it is not needed. For most pools to be effective:
- The application is aware pooling will be used, and does not leave connections idle, but instead opens and closes them as needed;
- All connections leverage the same properties, such as the database user and catalog (database instance);
- State is not changed on the connection.
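The borrow-and-return cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not the Heimdall implementation: `FakeConnection` and `BasicPool` are hypothetical names, and the connection is a stand-in object rather than a real database driver.

```python
import queue


class FakeConnection:
    """Stand-in for a real database connection (hypothetical, for illustration)."""

    opened = 0  # counts how many connections were ever opened

    def __init__(self):
        FakeConnection.opened += 1
        self.id = FakeConnection.opened

    def execute(self, sql):
        return f"conn {self.id} ran: {sql}"


class BasicPool:
    """Opens `size` connections upfront and lends them out on request."""

    def __init__(self, size, factory=FakeConnection):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    def borrow(self, timeout=None):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        return self._idle.get(timeout=timeout)

    def give_back(self, conn):
        self._idle.put(conn)


pool = BasicPool(size=2)
for i in range(5):
    conn = pool.borrow()
    try:
        result = conn.execute(f"SELECT {i}")
    finally:
        pool.give_back(conn)  # return as soon as the work is done

total_opened = FakeConnection.opened
```

Five requests are served here while only the two connections created upfront are ever opened, which is where the CPU saving comes from. The sketch also assumes the three conditions listed above hold: the caller returns connections promptly, all connections share the same properties, and no state is changed on them.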
For a typical application server environment (e.g. J2EE), basic pooling is supported. In other environments, where pooling was not part of the initial design, simply inserting a connection pooler can cause more overhead than expected:
- When multiple users connect, and each user rarely uses more than a few connections (e.g. data analytics): The pooler may open a set of connections per user, or it may close connections retrieved from the pool that do not match the desired properties and open new ones. Either way, the result is a large amount of connection thrashing (this is the behavior of Apache Tomcat pooling and most other poolers).
- When many catalogs are used: In order to avoid changing the connection state, a discrete pool per catalog is created allowing an appropriate connection to be reused. This avoids triggering a USE statement before each new request.
- When attempting to constrain connections both in total to the database and on a per-user basis.
Figure 2: Basic Connection Pooling
For basic connection pooling, an active (green) front-side connection is paired with a back-side connection as shown in Figure 2 above. Additionally, you may have idle (red), unassigned connections in the backend for new connections. As such, you are NOT reducing the total number of connections, but are reducing the thrashing that occurs as the connections are opened and closed. The main benefit of basic pooling is lower CPU load.
To configure connection pooling in the Heimdall Central Console, select the Data Source tab and click the checkbox to turn on Connection Pooling, as shown below:
Connection Multiplexing
Beyond basic pooling, there is connection multiplexing, which does not associate a client connection with a fixed database connection. Instead, active queries or transactions are assigned to an existing idle database connection, or else a new connection will be established. If a client connection is idle, no resources are used on the database, reducing connection load and freeing memory. Shown in Figure 3 below, only active connections (green) are connected to the database via the connection pool while the idle connections (red) are ignored.
Figure 3: Connection Multiplexing
Multiplexing is considerably more complicated than basic pooling, and many factors need to be accounted for. In the following situations, multiplexing halts and the connection remains pinned:
- If a transaction starts, then the connection mapping will remain until the transaction completes with a commit, rollback, or the client connection is terminated;
- If a prepared statement is created on a connection, the connection becomes stateful and will remain pinned to the database connection until the client connection is terminated;
- If a temporary table is created, the connection will remain pinned until the table is deleted.
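The three pinning conditions above can be modeled as a small state tracker. The following Python sketch is an assumption about how such logic might look, not how the Heimdall Proxy implements it: `PinTracker` is a hypothetical name, and the SQL matching is deliberately simplified (it ignores savepoints, autocommit settings and protocol-level prepared statements).

```python
import re

CREATE_TEMP = re.compile(
    r"CREATE\s+TEMPORARY\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?(\w+)", re.IGNORECASE
)
DROP_TABLE = re.compile(
    r"DROP\s+(?:TEMPORARY\s+)?TABLE\s+(?:IF\s+EXISTS\s+)?(\w+)", re.IGNORECASE
)


class PinTracker:
    """Tracks whether a client connection must stay on one database connection."""

    def __init__(self):
        self.in_transaction = False
        self.has_prepared = False
        self.temp_tables = set()

    def observe(self, sql):
        text = sql.strip().rstrip(";")
        upper = text.upper()
        if upper.startswith(("BEGIN", "START TRANSACTION")):
            self.in_transaction = True       # pinned until commit or rollback
        elif upper.startswith(("COMMIT", "ROLLBACK")):
            self.in_transaction = False
        elif upper.startswith("PREPARE "):
            self.has_prepared = True         # pinned until the client disconnects
        else:
            created = CREATE_TEMP.match(text)
            dropped = DROP_TABLE.match(text)
            if created:
                self.temp_tables.add(created.group(1).lower())
            elif dropped:
                self.temp_tables.discard(dropped.group(1).lower())

    @property
    def pinned(self):
        return self.in_transaction or self.has_prepared or bool(self.temp_tables)


tracker = PinTracker()
states = []
for stmt in [
    "SELECT 1",
    "BEGIN",
    "UPDATE t SET x = 1",
    "COMMIT",
    "CREATE TEMPORARY TABLE tmp_orders (id INT)",
    "DROP TABLE tmp_orders",
]:
    tracker.observe(stmt)
    states.append(tracker.pinned)
```

Whenever `pinned` is false, the client connection holds no database connection and its next query can be assigned to any idle one. Note that a prepared statement never clears the flag in this sketch, matching the rule that such a connection stays pinned until it is terminated.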
To configure multiplexing in the Heimdall Central Console, go to the VirtualDB tab and, under Proxy Configuration, select the Multiplex option as shown below:
In the event that special queries break multiplexing logic, and multiplexing needs to be disabled on the connection, go to the Rules tab for more granular control (along with other pool behaviors). For example, you can add the below rules to:
- Disable multiplexing when locking tables
- Enable multiplexing when unlocking tables
Tenant Connection Management
The third use case helps ensure SLAs by enforcing per-tenant limits on connections and, when combined with multiplexing, on total active queries. This prevents one user from saturating the database, ensuring fairness of resources for others. To do this, a second tier of pool management is activated: “user pools”.
In the Data Sources tab, the pool can be managed at two tiers: the user level and the database. You can limit each user to a number of total connections and idle connections. Use the icon to add limits as shown below:
As shown above, the total number of connections allowed to the database across all users is 1000, but each user is allowed only 100, and of those, only 10 can be idle. Excess idle connections are disposed of. Each time a connection is returned to the pool, there is a chance that it will be closed: a value of 1000 means there is a 1/1000 chance. This behavior differs from most connection poolers, which set an absolute connection age; in large deployments, that can result in a stampede of many connections closing and reopening at once.
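To see why a random close chance avoids the stampede, the following Python sketch simulates how many times each connection is returned before it is closed. The function names, the seed and the sample size are illustrative assumptions; only the 1/1000 chance comes from the description above.

```python
import random


def should_close(rng, one_in):
    """Return True with probability 1/one_in each time a connection is returned."""
    return rng.randrange(one_in) == 0


def lifetimes(rng, one_in, connections):
    """Number of returns each simulated connection survives before being closed."""
    result = []
    for _ in range(connections):
        uses = 1
        while not should_close(rng, one_in):
            uses += 1
        result.append(uses)
    return result


rng = random.Random(42)  # fixed seed so the run is repeatable
samples = lifetimes(rng, one_in=1000, connections=2000)
mean_uses = sum(samples) / len(samples)
distinct_lifetimes = len(set(samples))
```

On average a connection is reused about 1000 times before it is closed, but individual lifetimes are spread widely, so connections opened together do not expire together. With a fixed maximum age, all connections opened at the same moment would close at the same moment.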
Figure 4: Multi-tenancy with Pooling, Multiplexing and Per-tenant connection limits
Figure 4 shows two tenants (with unique usernames or catalogs), allowing only active connections (green) to the database when multiplexing is enabled. If Tenant A attempts to perform a third query (blue) while two are active, it will be queued until one of the current active queries completes.
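The queueing behavior in Figure 4 can be sketched as a per-tenant counter with a waiting list. This is a simplified, single-threaded Python illustration under stated assumptions: `TenantLimiter` is a hypothetical name, and the limit of two active queries per tenant is taken from the figure description rather than from any product default.

```python
from collections import defaultdict, deque


class TenantLimiter:
    """Caps active queries per tenant; extra queries wait in a FIFO queue."""

    def __init__(self, max_active_per_tenant):
        self.max_active = max_active_per_tenant
        self.active = defaultdict(int)
        self.waiting = defaultdict(deque)

    def submit(self, tenant, query):
        """Return 'running' if the query may start now, else 'queued'."""
        if self.active[tenant] < self.max_active:
            self.active[tenant] += 1
            return "running"
        self.waiting[tenant].append(query)
        return "queued"

    def finish(self, tenant):
        """Mark one active query done; return the queued query that takes its slot, if any."""
        if self.waiting[tenant]:
            # The freed slot is handed straight to the oldest waiting query.
            return self.waiting[tenant].popleft()
        self.active[tenant] -= 1
        return None


limiter = TenantLimiter(max_active_per_tenant=2)
outcomes = [limiter.submit("A", q) for q in ("q1", "q2", "q3")]
other_tenant = limiter.submit("B", "q1")
promoted = limiter.finish("A")
```

Tenant A's third query waits while Tenant B's first query starts immediately, so one busy tenant cannot consume capacity reserved for another. A production version would need locking or an event loop around these counters.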
The net result of combining 1) pooling, 2) multiplexing and 3) per-tenant limits is that no single tenant can saturate database capacity and cause the SLAs of other customers to fail. Further, beyond a certain point, adding execution threads to the database degrades performance, so this control can improve overall performance in many cases, allowing more capacity during peak load.
Use Cases
Magento
Magento is an e-commerce package written in PHP. Since PHP does not support efficient connection pooling due to its processing model, each page-view opens a connection to the database, requests all the data it needs, then closes the connection. This results in a very high amount of connection thrashing against the database, which can consume a large percentage of the database CPU. With the Heimdall Proxy, basic connection pooling alone can reduce the load on the database by up to 15%.
Slatwall Commerce
Slatwall is an eCommerce platform written in Java and is natively designed to use pooling. Under heavy load, however, it can saturate the connections allowed on MySQL (at most 7000). To support larger user loads, the Heimdall Proxy can reduce the connection load by an order of magnitude, resolving connection limits on the database and allowing CPU load to be the limiting factor in larger deployments. Per the developer of Slatwall, connection offload with multiplexing and pooling resulted in a 10x reduction in connections to the database.
Complementary Features
While Heimdall proxy provides connection management for databases, there are other features provided that further improve SQL scale, performance and security:
- Query Caching: The best way to lower database CPU is to never issue a query against the database in the first place. The Heimdall proxy provides the caching and invalidation logic for Amazon ElastiCache as a look-aside results cache. It is a simple way to improve RDS scale and response times without application changes;
- Automated Read/Write split: When a single database becomes too expensive to upgrade, or there is already a standby reader sitting idle, separating read and write queries can be used to offload read queries to an alternate server, improving resource utilization. Moreover, replication lag detection is supported to ensure ACID compliance;
- Active Directory/LDAP integration: By authenticating against LDAP, the Heimdall proxy manages connections for a large number of users, and synchronizes the authentication credentials into the database. In environments that require database resources to be accessible to many users in the enterprise, while providing data security, this feature is easy to administer, while preventing individual users from over-taxing resources.
Next Steps
Deployment of our proxy requires no application changes, saving months of deployment and maintenance. To get started, you can download a free trial on the Azure Marketplace, or contact Heimdall Data at info@heimdalldata.com.
Resources
- Blog: Using the Heimdall Proxy to Split Reads and Writes for MySQL
- Blog: Automated Query Caching for MySQL
- Heimdall Data technical documentation
- Contact: info@heimdalldata.com
Heimdall Data, a Microsoft technology partner, offers a database proxy that intelligently manages connections for SQL databases. With Heimdall’s connection pooling and multiplexing features, users can get optimal scale and performance from their database without application changes.