The move from on-premise inriver to the cloud-hosted, multi-tenant IPMC solution has brought countless benefits for inriver PIM developers, but one challenge that has come up is monitoring the solution and determining whether your extensions are performing as expected. More importantly, if they are not performing as expected, how do you diagnose the performance issue and measure the effect of your fixes? The challenge is exacerbated by the fact that an extension run locally against an IPMC environment through the remoting connection shows significantly different performance characteristics than the same extension running inside IPMC. The only way to get a true read on an extension is to measure it while it is running in the IPMC environment.
Like most partners that do a lot of work with inriver, Aperture Labs has built a library of extensions that can be extended and deployed for our customers to perform common tasks. One of the most common of these tasks is importing data into the PIM. One specific client had a very wide and relatively deep dataset (multiple thousand fields and around 70,000 'rows'), along with fairly complex automation rules for calculating field values during import, requiring lookups of additional data in the PIM, traversal of links, and so on.
As we went through the development process with this client, we saw the import slow from 2-3 seconds per 'row' to 5-6, then 7-8 seconds and longer for each row processed, so we turned our attention to the performance of the import. We followed our usual process of reviewing all the import code, looking for potential optimizations. Taking care of that low-hanging fruit yielded some improvements, but not everything we had hoped for, so we started adding logging to the code. We instrumented and logged everything, from calculating and setting individual field values to creating links, saving entities, and processing entire rows. Again, this process uncovered some additional areas of optimization, but not enough; in fact, after an optimization that saved, say, 1,000ms per row, each row was only processing about 500ms faster.
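To give a feel for that kind of instrumentation, here is a minimal timing wrapper in Python (for illustration only; the real extensions are C#, and the helper name and log format here are hypothetical):

```python
import time

def timed(label, log, fn, *args):
    """Run fn(*args), log how long it took, and return fn's result."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000
    log(f"{label} took {elapsed_ms:.1f}ms")
    return result

# Example: timing a simulated field-value calculation
entries = []
value = timed("calculate field 'Color'", entries.append, lambda: "red".upper())
```

Wrapping every field calculation, link creation, and entity save in a helper like this is what produced the per-operation timings, and also the volume of log entries, described below.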
This led us to the following hypothesis: the act of logging itself was a significant contributor to the performance issues we were experiencing. While we had always assumed that logging was all but 'free' from a performance perspective, we were seeing evidence that excessive logging (and we had introduced excessive logging) was adding a significant amount of time to our imports.
We developed a dummy test extension (the code can be found here) that simulated a spreadsheet-style import of 1,000 rows with between 50 and 100 fields in each row. Each field required between 0 and 3 'operations' to be performed on it to simulate calculating the field value. We added randomized sleep time to the calculation operations, and to the simulated saving of entities, to ensure we had something resembling a real-world workload.
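The shape of that simulation can be sketched as follows (in Python for illustration; the real test extension is C# and linked above, and the sleep ranges here are invented, not taken from the actual code):

```python
import random
import time

def simulate_import(rows, log):
    """Simulate a spreadsheet-style import: each 'row' has 50-100 fields,
    and each field needs 0-3 'operations' to calculate its value."""
    for row in range(rows):
        row_start = time.perf_counter()
        for field in range(random.randint(50, 100)):
            for op in range(random.randint(0, 3)):
                time.sleep(random.uniform(0.0001, 0.0005))  # simulated calculation work
                log(f"row {row}: field {field}, operation {op} complete")
        time.sleep(random.uniform(0.001, 0.005))  # simulated entity save
        row_ms = (time.perf_counter() - row_start) * 1000
        log(f"row {row} processed in {row_ms:.0f}ms")
```

Passing a different `log` callable (or none at all) per run is what makes it possible to compare the same workload with different amounts of logging.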
Our test process timed and logged every interaction at an appropriate log level, as we would in a production environment:
By running this test four times, excluding a different set of log levels each time, we came up with the following results:
As we can see, the results strongly indicate that the act of logging has an impact on the performance of the PIM. The challenge is that extensive logging is sometimes required to troubleshoot bugs, issues, undesired behavior, and even performance problems in the PIM.
To balance visibility into what the PIM is doing during processing against the overall performance of the system, we came up with a simple extension method on the inriver Context called SelectiveLog. This extension method can be called exactly like the Context.Log method. Using a server setting and/or an extension setting called "LOGGING_LEVEL", we can control which log levels actually enter the log. The extension method reads this setting from the context and simply swallows any entry below the configured minimum log level.
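The filtering idea is language-agnostic and can be sketched in a few lines of Python (the actual implementation is a C# extension method on the inriver Context and is linked below; the class, level names, and ranking here are stand-ins for illustration):

```python
# Hypothetical level ranking, most verbose first
LEVEL_RANK = {"Verbose": 0, "Debug": 1, "Information": 2, "Warning": 3, "Error": 4}

class FakeContext:
    """Stand-in for the inriver Context: settings plus a raw log method."""
    def __init__(self, settings):
        self.settings = settings
        self.entries = []

    def log(self, level, message):
        self.entries.append((level, message))

def selective_log(context, level, message):
    """Forward to context.log only if `level` meets the configured minimum;
    anything below the LOGGING_LEVEL setting is swallowed."""
    minimum = context.settings.get("LOGGING_LEVEL", "Verbose")
    if LEVEL_RANK[level] >= LEVEL_RANK[minimum]:
        context.log(level, message)

ctx = FakeContext({"LOGGING_LEVEL": "Warning"})
selective_log(ctx, "Debug", "field calculated")  # below the threshold: swallowed
selective_log(ctx, "Error", "save failed")       # at or above the threshold: logged
```

Because the threshold comes from a setting rather than from code, the amount of logging can be turned up or down per environment without a redeploy.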
By implementing this extension method, converting all our logs to use SelectiveLog, and ensuring we use appropriate log levels everywhere we log, we were able to get our import performance back down to 1-2 seconds per row, while keeping all the expensive calculations and lookups. We can still lower the log threshold of either the server or an individual extension, without deploying code, whenever we want or need to debug an issue. Additionally, developers do not need to remember to wrap their log entries in conditional logic to take advantage of this; they simply use Context.SelectiveLog instead of Context.Log.
This link contains the code for the dummy import test extension and the SelectiveLog ExtensionMethods.