Root Cause Analysis

Large-scale internet application platforms have become increasingly complex, and identifying and resolving problems on them can be lengthy and costly. A typical platform consists of network devices, load balancers, web servers, application servers, identity servers, database servers, and a range of other specialized servers. Applications deployed on these platforms are built from many complex software components. Further, applications frequently must marshal data from multiple other applications on separate platforms over which the development and operations teams have little or no direct control.

Each of these components, platforms, and applications may operate correctly in isolation. When they are combined, however, integration issues cause both latent, undetected problems and explicit, visible ones. If each hardware and software component has hundreds or thousands of settings and methods for completing its tasks, the number of possible causes that must be recognized and sifted through becomes enormous.
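
To give a rough sense of that scale, the short sketch below counts how many pairs of settings could interact across components. The figures used (10 components, 100 settings each) are illustrative assumptions, not measurements from any real platform, and the function name is hypothetical.

```python
from math import comb


def pairwise_interactions(components: int, settings_per_component: int) -> int:
    """Count pairs of settings that belong to two different components.

    There are comb(components, 2) pairs of components, and each pair
    contributes settings_per_component ** 2 pairs of settings.
    """
    return comb(components, 2) * settings_per_component ** 2


# Assumed example sizes: 10 components with 100 settings each.
print(pairwise_interactions(10, 100))  # 450000
```

Even when only two settings are considered at a time, the assumed platform already has 450,000 candidate interactions, and real problems may involve more than two.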

When a single end-user operation may have to cross dozens of system boundaries and perform thousands of actions, diagnosing it becomes too much for an expert's unassisted experience and skill. Visibility into application processing flows and a mechanism for breaking the integration challenge down into manageable elements are essential to solving these problems.

Automated tools are available that assist in identifying the cause of these problems. However, these tools also struggle with this complexity, and they add complexity of their own. The additional complexity and cost can be justified only when each team takes full advantage of the tools' offerings during every stage of an application's life cycle.

Testing processes must be identified that demonstrate the robustness of both the platform and the application. Prior to application deployment, it is not sufficient simply to ensure that the platform will start up. A test mechanism must be used that increases confidence that the attendant services will also function as expected. This is much like unit testing applied to the infrastructure platform. While such testing cannot prevent all problems, even a partial reduction in final integration problems is beneficial.
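
One possible shape for such an infrastructure-level test is sketched below: a small script that checks whether each attendant service accepts a TCP connection before an application is deployed. This is a minimal sketch only. The function names and the service names, hosts, and ports in the usage note are hypothetical, and a successful connection shows only that a service is reachable, not that it behaves correctly.

```python
import socket


def check_tcp(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def run_checks(services: dict[str, tuple[str, int]],
               timeout: float = 2.0) -> dict[str, bool]:
    """Probe every named service and map its name to a pass/fail result."""
    return {
        name: check_tcp(host, port, timeout)
        for name, (host, port) in services.items()
    }


def failed_services(results: dict[str, bool]) -> list[str]:
    """Return the sorted names of services whose check did not pass."""
    return sorted(name for name, ok in results.items() if not ok)
```

A deployment script could call run_checks with a mapping such as {"database": ("db.example.internal", 5432), "identity": ("idp.example.internal", 636)} (hypothetical hosts and ports) and stop the rollout if failed_services returns a non-empty list.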

Sympretek provides services that assist in identifying these integration points and in moving the difficult problem-solving process forward.