Over the years many agile proponents have come out strongly against offshoring some of the development team, and in particular against having a remote testing team. We made use of not one, but two separate outsourcing providers located in two distant locations. While we had many challenges, what we found was that by starting with an overall testing strategy and an understanding of the strengths and constraints, we were able to achieve outstanding results. In particular, we were able to reduce defects found in customer beta testing by 84% and known customer issues at deployment by 97%.
Landmark Graphics is a wholly owned subsidiary of Halliburton and is the premier provider of software and technology services for the upstream oil and gas industry. Its software solutions help geoscientists and engineers make highly complex technical and business decisions. The specific product line involved in this case study is DecisionSpace Nexus, a next generation reservoir simulation software suite which utilizes a finite difference mathematical model to allow companies to accurately model hydrocarbon assets, enabling rapid decisions on high-dollar development scenarios.
Like many other software systems, Nexus is a collection of integrated applications. This system of systems comprises multiple millions of lines of source code and provides a complete user experience from data preparation to numerical simulation to 3-dimensional visualization. And while many modules of the Nexus family are next-generation, several are legacy, and they are complicated further by the need to support both the next-generation simulator and the legacy simulator.
The development and testing process
The Nexus development team had been doing many things right since they started development in 2001, but had not fully embraced agile development. In 2007 the team moved to a more structured Scrum environment. While not a drastic change for the team, the additional structure seemed to work well and helped identify some of the areas where there were bottlenecks.
One of the most important things the entire team realized was the value of having an automated regression test suite that exercised the functionality without going through the graphical user interface. While the team did not have extensive unit tests, they did have a good set of functional tests that provided overall feature coverage. Prior to any commit being finalized, the developer regression suite was run. In addition to the developer regression suite, the team relied on a customer regression suite, a manual smoke test, and additional exploratory testing. The developer regression suite safety net had paid off well for the team and they had general confidence in their check-ins. However, these developer tests were not finding issues that were showing up in the more complicated customer models. To make matters worse, the customer regression suite took almost a week of computation time on a high-end cluster to determine whether the tests passed or failed.
High performance computing software is designed to be able to run on multiple processors, sometimes upwards of 64 or 128 cores. This parallel computing can effectively reduce computation time significantly. The flip side of that is that there is a reason why the software needs to use a lot of processing power – it is doing very complicated scientific calculations that take a lot of computational cycles. While Moore’s Law and multicore processors have increased computational power, the complexity that the engineer builds into their models has grown in lockstep. Engineers tend to design their models so that they can get results back with overnight turnaround. Some models like those in our customer suite take several days to complete. One engineer at a customer site joked that the most complex models are measured not in days, but in haircuts.
In addition to the challenges of long computation times, reservoir simulation brings the additional challenge that the problem is being solved by approximation techniques. In other words, small perturbations in either data or algorithmic code, or even running on a slightly different processing environment can result in different output results. The development team had been long aware of that issue and had built an intelligent differencing tool to help understand whether generated results were within engineering accuracy of the baseline results.
We used a fairly standard approach to test automation: a given test scenario was run through the system and the results were compared against a known baseline. In most software testing the comparison is exact, so any difference counts as a failure. Our situation required a smarter differencing engine that compares results and reports whether the differences are within engineering accuracy. If they are, the new results become the new baseline. If not, there is an issue that needs to be addressed or understood. Sometimes our engineers find that although the results do not appear to be within engineering tolerance, the software is nonetheless doing the right thing. In those cases the differences are artificial and can be considered the “butterfly effects” of a poorly conditioned system.
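The baseline comparison described above can be illustrated with a minimal sketch. It is not the Nexus team's differencing tool, whose internals are not described here. The function name `compare_to_baseline`, the `DiffReport` type, the result keys, and the tolerance values are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class DiffReport:
    within_tolerance: bool      # True when every value is close enough
    worst_key: Optional[str]    # result series with the largest relative difference
    worst_rel_diff: float       # that largest relative difference


def compare_to_baseline(baseline, candidate, rel_tol=1e-3, abs_tol=1e-9):
    """Compare named result series against a baseline within a tolerance.

    rel_tol and abs_tol are illustrative defaults, not real engineering limits.
    """
    ok = True
    worst_key, worst = None, 0.0
    for key, base_values in baseline.items():
        new_values = candidate.get(key)
        # A missing or differently sized series is always a failure.
        if new_values is None or len(new_values) != len(base_values):
            return DiffReport(False, key, math.inf)
        for base, new in zip(base_values, new_values):
            diff = abs(new - base)
            # NaN or infinite output can never be accepted as a new baseline.
            if not math.isfinite(diff):
                return DiffReport(False, key, math.inf)
            if diff <= abs_tol:
                continue  # numerically identical for our purposes
            rel = diff / max(abs(base), abs(new))
            if rel > worst:
                worst_key, worst = key, rel
            if rel > rel_tol:
                ok = False
    return DiffReport(ok, worst_key, worst)
```

A report with `within_tolerance` set to `True` corresponds to the case where the new results could be promoted to the baseline. A `False` report names the series an engineer would inspect first, including the occasional case where the difference turns out to be benign.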
Over the years many agile proponents have come out strongly against offshoring some of the development team, and in particular against having a remote testing team. We had a corporate mandate to utilize outsourcing and decided we were going to make the best of it. In the end we were pleasantly surprised by the overall results. We made use of not one, but two separate outsourcing providers located in two distant locations.
Florin Simion, one of our co-authors, is a professor of petroleum reservoir engineering who specializes in reservoir simulation and had a software development team in Romania. We partnered with him to build a team of 2 software developers and 3 petroleum engineers. While we had typical startup challenges, it was refreshing to talk to engineers who actually understood what we were trying to do. They quickly took over the smoke test, and while it initially took nearly their whole day to run the manual smoke test, they eventually got to the point where they could do it in 4 hours. This allowed time to do additional testing and also to serve as domain experts to assist the development team.
While we had been interested in doing more GUI test automation, our team did not have the bandwidth or the expertise to do it well. Our testers were petroleum engineers—they needed to have engineering skills in order to know whether the test results were meaningful. We evaluated LogiGear as a partner because they provided expertise in GUI test automation that included their own test automation software toolkit. We saw this as an opportunity to augment our domain talent with test automation, thus allowing our engineers to focus on higher value testing. We kicked this off with a test team in Vietnam and a project manager based in California.
Developer tests were catching many regression issues, but the more complex customer regressions were still catching a lot of problems. Besides taking a long time to run to provide results, the problems discovered with the more complex customer data were also difficult to debug.
Our developer test suite was very simple but covered most of the functionality. On the other hand, the customer datasets did not utilize all the potential functionality, but were significantly more complex both in terms of size of the models as well as in the overall interactions with the models. For the overall system, the smoke tests covered a few “happy paths” created from the top six integrated training workflows. These tests did not exercise any particularly complex scenarios, but since they were manual tests they were both time consuming and monotonous for the testers.
As things were, both the developers and the testers were barely keeping up with the defect backlog. And when they did think they had things under control, customers would invariably find issues in beta testing or after deployment.
What we did
We decided that we needed to evaluate how we could optimize our testing efforts. The developer and customer regressions were working well but did require some maintenance to keep up with new functionality. Figure 1 shows the direction that we took with our testing strategy. We looked at what was working and where we had gaps. It was an investment that we hoped would pay out with better coverage and faster feedback. With the help of our outsourcing partners, we set out to make it happen.
The team felt that the greatest need was an additional set of tests that were more complex than the developer tests and exhibited some of the complexities of the customer datasets—but would provide overnight turnaround. We dedicated one of our Houston petroleum engineer testers to developing what we called the “Mid-Tier” regression suite. This suite of test models was built of synthetic data subsets similar to some of the more complex customer models. Effort was put into making sure that the test suite would run overnight.
At about the same time we started working with LogiGear’s Vietnam team to automate our smoke tests. Our objective was to increase coverage through automation, while at the same time freeing up our reservoir engineers so that they could utilize their domain expertise to do more exploratory testing and to design more test cases.
One useful way to look at testing strategy is through the Testing Quadrants originally proposed by Brian Marick and then further expanded by Lisa Crispin and Janet Gregory.
We largely focused the automation effort on quadrants Q2 and Q4. The mid-tier and customer tests were geared first towards functional accuracy (Q2), but since computational performance is one of our key differentiators we made sure to track any performance changes as we ran all our tests (Q4). It is also worth noting that the Nexus team chose to use a lightweight form of functional testing with the developer regression suite to cover what would often be done via unit testing in Q1. The team did have some unit tests and arguably could have had more; however, their approach of using a developer regression suite of lightweight functional tests worked quite well for them. The nature of the reservoir simulation problem is such that the solution of the whole system of equations is necessary to see the full interplay of the complex physics being simulated.
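As one illustration of tracking performance changes alongside functional tests (Q4), the sketch below times each run and flags tests that have slowed down relative to a stored baseline. The article does not say how the team recorded run times, so the names `timed_run` and `performance_regressions` and the 10% threshold are hypothetical.

```python
import time


def timed_run(run_model, *args):
    """Run a test model and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = run_model(*args)
    return result, time.perf_counter() - start


def performance_regressions(baseline_seconds, current_seconds, max_slowdown=0.10):
    """Return {test name: fractional slowdown} for tests slower than allowed.

    max_slowdown=0.10 means a test may run up to 10% slower than its
    baseline before it is flagged (an assumed threshold).
    """
    flagged = {}
    for name, base in baseline_seconds.items():
        current = current_seconds.get(name)
        if current is None or base <= 0:
            continue  # nothing to compare against
        slowdown = (current - base) / base
        if slowdown > max_slowdown:
            flagged[name] = slowdown
    return flagged
```

Reporting the fractional slowdown rather than a bare pass or fail lets a reviewer tell a marginal drift apart from a large regression.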
A key aspect of our overall test automation strategy was that it freed our valuable reservoir engineers to be able to spend more time on Exploratory Testing (Q3). This is where the engineers could really utilize their domain knowledge to challenge the system in a manner likely to be used by one of our customers.
With test automation, a potential pitfall is often the time required to maintain automated tests. Especially in agile development, if major revisions of the automated tests are required for each new software release then the test team will always have problems keeping the tests up to date. A primary goal of our automation was to utilize a method that would allow the team to maintain and grow the test suite without major effort.
The initial smoke test automation pilot project with LogiGear targeted the top six integrated workflows used for Nexus software training. In addition to providing experienced test automation engineers, LogiGear also provided the test automation tool (TestArchitect) which they developed that utilizes the Action Based Testing methodology. Initially, the testing tool required some development to support Linux as well as some proprietary legacy application components. The ability to customize the tool was critical to automating all the test cases.
Test leads for the pilot were established at each testing location. Workflows were prioritized, and accountability for the initial LogiGear automation was split across the assigned testing resources. Houston engineers found it easiest to provide movies to guide the testers through the integrated smoke test workflows. Our Simco test lead had previously spent time with the team in Houston and, being very familiar with running the smoke tests manually, could answer any questions the LogiGear testers had about workflow requirements that were not clear in the movie clips.
The remote test engineer viewed the movie and used the Action Based Testing method to create the test cases as a series of keywords (actions) with arguments. The automation focused not on automating test cases, but on automating the actions. Since there are far fewer actions than test cases, and action implementations tend to be shorter than test case implementations, the automation effort is more manageable. This is especially evident when the application under test changes. Using the action based test suite, only a limited number of actions had to be maintained.
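The idea of writing test cases as rows of keywords with arguments, and automating only the actions, can be sketched in a few lines. This is a generic keyword-driven runner written for illustration. It does not use or reproduce the TestArchitect API, and `FakeApp`, the action names, and the sample workflow are all hypothetical.

```python
class FakeApp:
    """Hypothetical stand-in for the application under test."""

    def __init__(self):
        self.open_project = None
        self.fields = {}
        self.ran = False


ACTIONS = {}  # keyword -> implementation; the only part that needs code


def action(name):
    """Register a function as the implementation of a keyword."""
    def register(func):
        ACTIONS[name] = func
        return func
    return register


@action("open project")
def open_project(app, name):
    app.open_project = name


@action("enter")
def enter(app, field, value):
    app.fields[field] = value


@action("check field")
def check_field(app, field, expected):
    actual = app.fields.get(field)
    if actual != expected:
        raise AssertionError(f"{field}: expected {expected!r}, got {actual!r}")


@action("run simulation")
def run_simulation(app):
    if app.open_project is None:
        raise RuntimeError("no project open")
    app.ran = True


def run_test_case(app, rows):
    """Execute a test case given as rows of (keyword, *arguments)."""
    for keyword, *args in rows:
        ACTIONS[keyword](app, *args)


# A test case is plain data: a series of actions with arguments.
SMOKE_TEST = [
    ("open project", "training_workflow_1"),
    ("enter", "timestep_days", "30"),
    ("check field", "timestep_days", "30"),
    ("run simulation",),
]
```

Calling `run_test_case(FakeApp(), SMOKE_TEST)` executes the workflow. If the application's way of opening a project changes, only the `open project` implementation needs updating, and every test case that uses that keyword stays as it is.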
Managing the globally distributed teams was challenging but worked out quite well overall. Our primary development team was in Houston with some developers in France and Romania. We had domain testers in Houston and Romania and the automation testers were in Vietnam. Having the project manager for LogiGear in their California office was invaluable for the communication required for test tool augmentation as well as any necessary testing automation reprioritization.
As mentioned earlier, the Nexus product line is made up of multiple applications which are developed by sub-teams. The individual products are quite diverse in the technology, team size, amount of legacy code and other parameters. The Context Leadership Model shown in Figure 3 is a model that we have used to look at projects based on the degree of uncertainty and complexity.
Complexity includes project composition such as team size, geographic distribution, and team maturity. Uncertainty includes both market and technical uncertainty. The four quadrants are named with metaphorical animals.
The Data Studio team, while globally distributed, was nonetheless fairly small and also had very well defined tasks necessary for the update from the legacy simulator to cover the new functionality. With low uncertainty, generally low complexity and a senior team leader, we let the team largely manage themselves.
Surfnet was a new product that aimed to solve a problem no other commercial product addressed at the time. This meant that it had high uncertainty. The team was relatively small, although globally distributed. The senior developers were collocated in Houston with two remote developers and a tester in Romania. The remote team was managed independently by one of the senior developers and communicated as necessary via email and phone conversations, with at least a weekly sync-up meeting.
The NexusView team started with two developers and one primary tester. This project had some overlap with the Surfnet project so we simply merged the team into the Surfnet Scrum meetings.
The Nexus simulator team was all collocated in facilities in Houston and was by far the largest team. Nexus is the core engine and must coordinate with the other supporting applications. Overall the uncertainty was moderate and the overall complexity put it into the cow category. As a result we settled on a longer iteration length of three weeks. The team started with daily standups, and while they found value in the standups they felt that the nature of their R&D work fit better with standups every other day. The team adjusted and continued to deliver in a highly effective manner.
The overall system of systems required managing all of the uncertainty and even more complexity. The total team size was such that we did not feel the need for a Scrum of Scrums model. Instead, we had two ScrumMasters who covered all of the projects, and essentially had them pair to cover the overall release. Each ScrumMaster had primary accountability for a couple of teams, and the other participated in key Scrum meetings for the projects for which they did not have direct responsibility. In that way both of them were up to date on the overall program and knew what cross-team issues needed to be resolved. This model worked quite well: not only did the cross-team communications happen efficiently, but when one of the ScrumMasters was out the other could help out without missing a beat.
How it worked out
The results from the project were impressive. Our concerted effort on improving quality demonstrated significant improvement over the prior year. In both cases we had a 2-3 month beta program with a couple of key customers. Table III summarizes the results and compares with the prior year. The improvement in quality was substantial.
Although overall things went very well, there were several challenges that we either had to overcome or live with. We deal with very sensitive customer data. While customers are willing to share that data with us for our limited use in testing the software, our agreements generally do not extend to our offshore partners. This limited some of what we were able to accomplish with our partners and required use of synthetic data for much of the testing done by the offshore teams. While we would have preferred to have more flexibility here, this was something that we found we could work with.
Initially the time shift to both Romania and Vietnam created challenges with communication. In the end the time shift and overlap in times between teams actually turned out to work to our benefit. Most of our team was in Houston, while our petroleum engineering partner was in Bucharest, Romania, and our test automation partner was in Vietnam. The time shift to Romania is a very manageable 8 hours, and particularly manageable as our partner was very flexible with work schedules. We utilized the Romania team to help with communications with the Vietnam team.
Once we had tests in operation, the Vietnam team would initiate the automation tests during their day, overlapping with the Romania team during the Romanian morning. By the afternoon in Romania, the petroleum engineering team would take a deeper dive into any issues raised by the automation tests to make sure that we understood what the issues were. In the end what we got was a daily automation that ran during Houston nights and provided reliable status by the time developers arrived the next morning.
With the success of the Landmark division, over 20 Halliburton divisions are now utilizing 3-15 LogiGear automation engineers per division, and are using customized applications of TestArchitect to allow testing to keep up with rapid development cycles. The positive results achieved through the combination of in-house management, outsourced testing, and TestArchitect test automation have increased confidence in the ability to deliver a high quality product to clients.
Conclusion: what we learned
Test Automation is Necessary to Maintain Velocity. Prior to this initiative the team was diligently working but nonetheless struggling to keep up with quality issues. Our existing automation testing was invaluable, but we still relied too much on manual testing. We also realized that additional automation suites could make a big improvement in our overall productivity. By augmenting our test automation we were able to find issues faster and give our domain experts more time on exploratory testing.
A Testing Strategy Helps to Maximize Efficiency. The team had some good automation and exploratory testing, but knew they could be better. Rather than just randomly add more tests, we looked to see which types of tests would add the most value. For us we found that adding an additional set of functional tests and automating GUI smoke tests could pay off quite well.
Outsourcing Can Work When Used Judiciously. We relied heavily on outsourcing partners to get this work done. While we had some minor challenges in the beginning, we found that it was quite workable. It won’t work well if you don’t have the right talent or the right attitude. Our partners wanted us to succeed and we wanted them to succeed.
We found that even test automation can be outsourced effectively. The key was the combination of domain expertise, provided by our own team and our Romanian partner, Simco; with the test automation expertise of LogiGear; and the in-house project management that made this globally distributed team work in our agile development environment. By focusing on what our partners were good at and recognizing what we were good at in-house, we were able to leverage our overall talent. We found qualified petroleum engineers who were able to be part of our team and make significant contributions. We found talented testers that arguably understood GUI test automation better than us.
Test automation does not replace exploratory testing. While test automation is critical for catching regression defects, we found exploratory testing to be just as critical. Our exploratory testing found more than 80% of the defects. That was made possible by automating more tests so we could free up our domain experts to do more exploratory testing.
Distributed teams can be effective. Our teams were globally distributed and we certainly had some overhead associated with that distribution. We aimed to minimize the overhead of the distribution using a pattern common to software development—loose coupling and tight cohesion. We aimed to have locally collocated teams that had tight cohesion, and recognized that there was coupling and dependencies across distributed teams. We first sought to understand those dependencies and then made sure to monitor and manage the dependencies.
Joe has more than 25 years of executive management experience in computer software and networking and is responsible for customer facing operations. Prior to joining LogiGear in 2007, Joe held senior executive roles at Solovatsoft, a provider of outsourced IT services with resources in Russia, and Mark Systems, a software firm specializing in large-scale data retrieval systems and telephone-computer integration. His professional background includes software development management, customer service operations and technical writing. Joe holds a BS in computer science and math, is a graduate of the Stanford MBA program for Senior Executives, and has completed the E-Commerce Management program at San José State University.
 B.K. Coats, G.C. Fleming, J.W. Watts, M. Rame, G.S. Shiralkar, SPE 87913: “A Generalized Wellbore and Surface Facility Model, Fully Coupled to a Reservoir Simulator,” SPE Reservoir Evaluation & Engineering, Volume 7, Number 2, 2004, pg 132-142.
 B. S. Al-Matar et al., SPE 106069: “Next-Generation Modeling of a Middle Eastern Multireservoir Complex,” SPE Reservoir Simulation Symposium, 26-28 February 2007, Houston, Texas, U.S.A.
 L. Crispin and J. Gregory, Agile Testing: A Practical Guide for Testers and Agile Teams, Addison-Wesley, 2009.
 H. Buwalda, “Action Based Testing,” Better Software, Volume 13, Number 2, March/April 2011.
 T. Little, “Context-Adaptive Agility: Managing Complexity and Uncertainty”, IEEE Software, May/June 2005.
 P. Pixton, N. Nickolaisen, T. Little, K. McDonald, “Stand Back and Deliver: Accelerating Business Agility,” Addison-Wesley, 2009.
 T. Little, “Assessing the Cost of Outsourcing: Efficiency, Effectiveness and Risk,” IEEE EQUITY 2007, March 19-21, 2007, Amsterdam, Netherlands.