The move from on-premise inriver to the cloud-hosted, multi-tenant IPMC solution has brought countless benefits for inriver PIM developers, but one challenge that has come up is monitoring the solution and determining whether your extensions are performing as expected. More importantly, if they are not performing as expected, how do you diagnose the performance issue and measure the effect of your fixes? The challenge is exacerbated by the fact that an extension run locally against an IPMC environment through the remoting connection shows significantly different performance characteristics than the same extension running inside IPMC. The only way to get a true read on an extension is to measure it while it is running in the IPMC environment.
Like most partners that do a lot of work with inriver, Aperture Labs has built a library of extensions that can be extended and deployed for our customers to perform common tasks. One of the most common of these tasks is importing data into the PIM. One specific client had a very wide and relatively deep dataset (multiple thousand fields and around 70,000 'rows'), along with fairly complex automation rules for calculating field values during import, requiring lookups of additional data in the PIM, traversal of links, and so on.
As we went through the development process with this client, we saw the import slow from 2-3 seconds per 'row' to 5-6, then 7-8 seconds and longer for each row processed, so we turned our attention to the performance of the import. We followed our usual process of reviewing all the import code, looking for potential optimizations. Taking care of that low-hanging fruit yielded some improvements, but not everything we had hoped for, so we started adding logging to the code. We instrumented and logged everything, from calculating and setting individual field values to creating links, saving entities, and processing entire rows. Again, this process uncovered some additional areas of optimization, but not enough; in fact, after an optimization that saved, say, 1,000ms per row, each row was only processing about 500ms faster.
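To give a feel for that kind of instrumentation, here is a minimal timing wrapper in Python (for illustration only; the real extensions are C#, and the helper name and log format here are hypothetical):

```python
import time

def timed(label, log, fn, *args):
    """Run fn(*args), log how long it took, and return fn's result."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000
    log(f"{label} took {elapsed_ms:.1f}ms")
    return result

# Example: timing a simulated field-value calculation
entries = []
value = timed("calculate field 'Color'", entries.append, lambda: "red".upper())
```

Wrapping every field calculation, link creation, and entity save in a helper like this is what produced the per-operation timings, and also the volume of log entries, described below.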
This led us to the following hypothesis: the act of logging itself was a significant contributor to the performance issues we were experiencing. While we had always assumed that logging was all but 'free' from a performance perspective, we were seeing evidence that excessive logging (and we had introduced excessive logging) was adding a significant amount of time to our imports.
We developed a dummy test extension (the code can be found here) that simulated a spreadsheet-style import of 1,000 rows with between 50 and 100 fields in each row. Each field required between 0 and 3 'operations' to be performed on it to simulate calculating the field value. We added randomized sleep time to the calculation operations, and to the simulated saving of entities, to ensure we had something resembling a real-world workload.
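The shape of that simulation can be sketched as follows (in Python for illustration; the real test extension is C# and linked above, and the sleep ranges here are invented, not taken from the actual code):

```python
import random
import time

def simulate_import(rows, log):
    """Simulate a spreadsheet-style import: each 'row' has 50-100 fields,
    and each field needs 0-3 'operations' to calculate its value."""
    for row in range(rows):
        row_start = time.perf_counter()
        for field in range(random.randint(50, 100)):
            for op in range(random.randint(0, 3)):
                time.sleep(random.uniform(0.0001, 0.0005))  # simulated calculation work
                log(f"row {row}: field {field}, operation {op} complete")
        time.sleep(random.uniform(0.001, 0.005))  # simulated entity save
        row_ms = (time.perf_counter() - row_start) * 1000
        log(f"row {row} processed in {row_ms:.0f}ms")
```

Passing a different `log` callable (or none at all) per run is what makes it possible to compare the same workload with different amounts of logging.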
Our test process timed and logged every interaction at an appropriate log level, as we would in a production environment:
By running this test four times, excluding a different set of log levels each time, we came up with the following results:
As we can see, the results strongly indicate that the act of logging has an impact on the performance of the PIM. The challenge is that extensive logging is sometimes required to troubleshoot bugs, issues, undesired behavior, and even performance problems in the PIM.
To balance visibility into what the PIM is doing during processing against the overall performance of the system, we came up with a simple extension method on the inriver Context called SelectiveLog. This extension method can be called exactly like the Context.Log method. Using a server setting and/or an extension setting called "LOGGING_LEVEL", we can control which log levels actually enter the log. The extension method reads this setting from the context and simply swallows any entry below the configured minimum log level.
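The filtering idea is language-agnostic and can be sketched in a few lines of Python (the actual implementation is a C# extension method on the inriver Context and is linked below; the class, level names, and ranking here are stand-ins for illustration):

```python
# Hypothetical level ranking, most verbose first
LEVEL_RANK = {"Verbose": 0, "Debug": 1, "Information": 2, "Warning": 3, "Error": 4}

class FakeContext:
    """Stand-in for the inriver Context: settings plus a raw log method."""
    def __init__(self, settings):
        self.settings = settings
        self.entries = []

    def log(self, level, message):
        self.entries.append((level, message))

def selective_log(context, level, message):
    """Forward to context.log only if `level` meets the configured minimum;
    anything below the LOGGING_LEVEL setting is swallowed."""
    minimum = context.settings.get("LOGGING_LEVEL", "Verbose")
    if LEVEL_RANK[level] >= LEVEL_RANK[minimum]:
        context.log(level, message)

ctx = FakeContext({"LOGGING_LEVEL": "Warning"})
selective_log(ctx, "Debug", "field calculated")  # below the threshold: swallowed
selective_log(ctx, "Error", "save failed")       # at or above the threshold: logged
```

Because the threshold comes from a setting rather than from code, the amount of logging can be turned up or down per environment without a redeploy.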
By implementing this extension method, converting all our logs to use SelectiveLog, and ensuring we use appropriate log levels everywhere we log, we were able to get our import performance back down to 1-2 seconds per row, while keeping all the expensive calculations and lookups. We can still lower the log threshold of either the server or an individual extension, without deploying code, whenever we want or need to debug an issue. Additionally, developers do not need to remember to wrap their log entries in conditional logic to take advantage of this; they simply use Context.SelectiveLog instead of Context.Log.
This link contains the code for the dummy import test extension and the SelectiveLog ExtensionMethods.