In the last issue on testing the SMAC stack we talked about the social and mobile aspects of testing. We will be referring to them in this article. In this issue part 2, we focus on the Analytics and Cloud aspect. The goal of this article is to understand a simple landscape of analytics and cloud.
Understanding the Flow
Let’s look at a diagram to walk through a very basic analytics and cloud workflow. We are looking for testing points in the analytics and cloud parts of SMAC.
You have an app on your mobile phone. You execute some workflows or use cases. In addition to the functionality of the app, all of the activities send data back to the cloud. From the app you made, some examples could be:
- What and how many various social interactions do you do?
- How many Tweets, Instagram posts or Yelp posts? What payment methods do you use for purchases?
- In what location are you using the app?
- What is your connection? Wi-Fi, 3G or 4G?
All this data is captured and sent to the data store in the cloud.
Any associated IoT (Internet of Things) devices ranging from garden humidity sensors to heart monitors are gathering and sometimes streaming large amounts of data.
The data that is captured and stored will be defined by the business side of product development. The business will decide what part of the user’s behaviors is meaningful for analysis.
In the SMAC stack, all this data is stored in the cloud, in reality it does not have to be stored in the cloud or take advantage of cloud services, but for our SMAC discussion, it is.
How the data is stored is enough for a different examination: the pluses and minuses or cost of structured data and data warehouses vs. unstructured data and Hadoops is a business decision. The methods and tools to test those vary greatly and deserve their own article.
The data is analyzed and spun around by various algorithms, written by data scientists to capture a particular aspect of users and use that the business requires.
Let’s look at an example here to bring this workflow to more reality.
Let’s say my mobile phone has an app that controls a watering device for the garden at my house. An IoT device in my garden measures the soil moisture content. The data from this device, depending on the functionality in the app, might be correlated to data from the national weather service that is streamed and focuses on air temperature and relatives humidity. This streaming data might be a giant data set stored and manipulated in the cloud. the resulting analytics are then sent back to the business and the Dev team to pull the data they need to look at the user patterns to optimize workflows and add or remove functionality.
At what point in this diagram do we need to insert some testing?
Data testing is a well understood and traditional area in software testing. Typically, data integrity (accuracy and consistency), access and availability, as well as all the defined functional and error handling tests, will be run and automated.
Also, it is important to verify and test that the analytics algorithms are working as well and as expected. It is important to note, testing the correct function of the algorithm is one set of tests, and validating the data science behind the algorithm is very different. The business and data scientists design what the algorithm is collecting, sorting and calculating. The business and data science correctness of that is not what we are testing. It is very important but it is not the software testing we do. That doesn’t mean you would never test that. If you are the subject matter expert and you are the most knowledgeable in that domain, perhaps you would test the data science, but more commonly not.
The analytics gathered are very useful information for test teams. A very common set of data is click analysis. Click analysis is real usage, not happy path or modeled workflows- it’s what your users are actually doing. They are real-world scenarios and paths that must be tested and are the best to automate since you know they are actual uses. Check these workflows against your test case for gaps. Depending on the data collected, check you are testing with real user data, at real user peak and low use times, testing on the correct devices, with the right connectivity. The analytics must validate or give you data to fix your test coverage and hopefully reveal gaps, error handing (to be tested) and boundary cases.
Cloud/ Data Warehouse/Hadoop
As I mentioned above, how the data is stored- whether in more traditional relational data bases in data warehouses or in more modern Hadoops- is a business decision but will have major implications on how you test.
If you are a consumer of cloud services, in addition to the variety of different tools and methods for testing the competing infrastructures, you will be testing normal data storage servers for security, performance, load, concurrency and race conditions.
The providers of cloud services will do most, if not all the infrastructure testing here and have SLAs(service level agreements) promising certain performance and load benchmarks and security attributes they meet.
Cloud testing, in many cases, is similar to traditional testing. It is important to fully understand the Social, Mobile, Analytics and Cloud stack to test it effectively. The analytics and cloud aspects are mainly comprised of data, data manipulation and how the data is stored in the cloud. The testing focuses on the analytics algorithm functionality testing, normal data integrity test as well as common server tests- most commonly performance and security tests. The infrastructure chosen for the data in the cloud will dictate some significant changes to that testing.