Big Data Requires Advanced Analytics

Complex carrier network performance data on HP Vertica yields performance and customer metrics boon for Empirix

The next edition of the HP Discover Performance Podcast Series explores how network testing, monitoring, and analytics provider Empirix developed unique and powerful data processing capabilities.

Empirix uses an advanced analytics engine to continuously and proactively evaluate carrier network performance and customer experience metrics -- amid massive data flows -- to automatically identify issues as they emerge.

To learn more about how a combination of large-scale, real-time performance and pervasive data access made the HP Vertica analytics platform stand out to support such demands for Empirix, join Navdeep Alam, Director of Engineering, Analytics and Prediction at Empirix, based in Billerica, Mass.

The discussion, which took place at the recent HP Vertica Big Data Conference in Boston, is moderated by Dana Gardner, Principal Analyst at Interarbor Solutions. [Disclosure: HP is a sponsor of BriefingsDirect podcasts.]

Here are some excerpts:

Gardner: Why do you have such demanding requirements for data processing and analysis?

Alam: What we do is actively and passively monitor networks. When you're in a network as a service provider, you have the opportunity to see the packets within that network, both on the control plane and on the user plane. That just means you're looking at signaling data and also user plane data -- what's going on with the behavior; what's going on at the data layer. That’s a vast amount of data, especially with mobile, where most people are doing things on their devices with data.

When you're in that network and you're tapping that data, there is a tremendous amount of data -- and there's a tremendous amount of insights about not only what's going on in the network, but what's going on with the subscribers and users of that network.

Empirix is able to collect this data from our probes in the network, as well as being able to look at other data points that might help augment the analysis. Through our analytics platform we're able to analyze that data, correlate it, mediate it, and drive metrics out of that data.

That’s a service for our customers, increasing value from that data, so that they can turn around a return on investment (ROI) and understand how they can leverage their networks better to increase operations and so forth. They can understand their customers better and begin to analyze, slice and dice, and visualize data of this complex network.

They can use our platform as well to do proactive and predictive analysis, so that we can create even better ROI for our customers by telling them what potentially might go wrong and what the solution might be to get around that and avoid a catastrophe.

New opportunities

Gardner: It’s interesting that not only is this data being used for understanding the performance on the network itself, but it's giving people business development and marketing information about how people are using it and where the new opportunities might be.

Is that something fairly new? Were you able to do that with data before, or is it the scale and ability to get in there and create analysis in near-real-time that’s allowed for such a broad-based multilevel approach to data and analysis?

Alam: This is something we've gotten into. We definitely tried to do it before with success, but we knew that in order to really tackle mobile and the increasing demands of data, we really had to up the ante.

Our investment with HP Vertica, and how we've introduced it in our new analytics platform, Empirix IntelliSight 1.0, which recently came out, is about leveraging that platform -- not only for scalability and our ability to ingest and process data, but to look at data in its more natural format, both as discrete data and as aggregate data. We allow our customers to view that data ad hoc and analyze that data.

It positioned us very well. Now that we have a central point from which all this data is being processed and analyzed, we now run analytics directly at this data, increasing our data locality and decreasing the data latency. This definitely ups our ante to do things much faster, in near real time.

Gardner: Obviously, the sensors, probes, agents, and the ability to pull in the information from the network need to reside in or be in close proximity to the network, but how are you actually deployed? Where does the infrastructure for doing the data analysis reside? Is it in the networks themselves, or is there a remote site? Maybe you could just lay out the architecture of how this is set up.

Alam: We get installed on site. Obviously, the future could change, but right now we're an on-premise solution. We're right where the data is being generated, where it’s flowing, and because of that we're able to gain access to the data in real-time.

One of the things we learned is that this is a tremendous amount of data. It doesn't make sense for us to just hold it and assume that we will do something interesting with it afterward.

The way we've approached our customers is to say, "What kind of value do you see in this data? What kind of metrics or key performance indicators (KPIs), or what do you think is valuable in this data?" We then build a framework that defines the value that they can gain from data -- what are the metrics and what kind of structure they want to apply to this data. We're not just calculating metrics, but we're also applying some sort of model that gives this data some structure.

As they go through what we call the Empirix Intelligent Data Mediation and Correlation (IDMC) system, it's really an analytics calculator. It's putting our data into the Vertica system, so that at that point we have meaningful, actionable data that can be used to trigger alarms, to showcase thresholds, to give customers great insight to what's going on in their network.

Growing the business

From that, they can do various things, such as solve problems proactively, reach out to the customers to deal with those issues, or to make better investments with their technology in order to grow their business.
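As an illustration of the idea described above (turning discrete records into a metric and using a threshold to trigger alarms), here is a minimal Python sketch. It is not Empirix's IDMC system or anything drawn from HP Vertica; the record type `CallRecord`, its fields, the drop-rate KPI, and the 0.5 threshold are all hypothetical and chosen only to make the example concrete.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    """Hypothetical discrete record; the field names are illustrative only."""
    cell_id: str
    dropped: bool


def drop_rate_by_cell(records):
    """Aggregate discrete records into one KPI (drop rate) per cell."""
    totals = defaultdict(int)
    drops = defaultdict(int)
    for rec in records:
        totals[rec.cell_id] += 1
        if rec.dropped:
            drops[rec.cell_id] += 1
    return {cell: drops[cell] / totals[cell] for cell in totals}


def alarms(kpis, threshold):
    """Return the cells whose KPI exceeds the threshold, sorted by name."""
    return sorted(cell for cell, value in kpis.items() if value > threshold)


# Assumed sample data: cell-A drops 2 of 3 calls, cell-B drops 1 of 4.
records = [
    CallRecord("cell-A", False),
    CallRecord("cell-A", True),
    CallRecord("cell-A", True),
    CallRecord("cell-B", False),
    CallRecord("cell-B", False),
    CallRecord("cell-B", False),
    CallRecord("cell-B", True),
]
kpis = drop_rate_by_cell(records)
print(alarms(kpis, threshold=0.5))
```

The sketch keeps both views that the interview mentions: the discrete records stay available, while the aggregate KPI is what the threshold check runs against.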

Gardner: How long have you been using Vertica and how did that come to be the choice that you made? Perhaps you could also tell us a little bit about where you see things going in terms of other capabilities that you might need or a roadmap for you?

Alam: We've been using Vertica for a few years, at least three or four, even before I came on board. And we're using Vertica primarily for its ability to input and read data very quickly. We knew that, given our solutions, we needed to load a lot of data into the system and then read a lot of data out of it fast, and to do both at the same time.

At that time, the database systems we used just couldn't meet the demands for the ever-growing data. So we leveraged Vertica there, and it was used more as an operational data store. When I came on board about a year-and-a-half ago, we wanted to evolve our use of Vertica to be not just for data warehousing, but a hybrid, because we knew that in supporting a lot of different types of data, it was very hard for us to structure all of those types of data.

We wanted to create a framework from which we can define measures and metrics and KPIs and store it in a more flat system from which we can apply various models to make sense of that data.

That really presented us a lot of challenges, not only in scalability, but our ability to work and play with data in various ways. Ultimately, we wanted to allow customers to play with this data at will and to get response in seconds, not hours or minutes.

It required us to look at how we could leverage Vertica as an intelligent data-storage system from which we could process data, store it, and then get answers out of that data very, very quickly. Again, we were looking for responses in a second or so.

Now that we've put all of our data in the data basket, so to speak, with Vertica, we wanted to take it to the next level. We have all this data, both looking at the whole data value chain from discrete data to aggregate data all in one place, with conforming dimensions, where the one truth of that data exists in one system.

We want to take it to the next step. Can we increase our analytical capabilities with the data? Can we find that signal from the noise now that we have all this data? Can we proactively find the patterns in the data, what's contributing to that problem, surface that to our customers, and reduce the noise that they are presented with?

Solving problems

Instead of showing them that 50 things are wrong, can I show them that 50 things are wrong, but that these one or two issues are actually impacting your network or your subscribers the most? Can we proactively tell them what might be the cause or the reason toward that and how to solve it?

The faster we can load this data, the faster we can retrieve the value out of this data and find that needle in the haystack. That’s where the future resides for us.
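The "one or two issues out of 50" idea can be sketched as a simple ranking by impact. This is only an illustration under stated assumptions, not the method Empirix uses: the `Issue` type, the `subscribers_affected` measure, and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    """Hypothetical detected issue with an assumed impact measure."""
    name: str
    subscribers_affected: int


def top_issues(issues, n=2):
    """Return the n issues that affect the most subscribers."""
    ranked = sorted(issues, key=lambda i: i.subscribers_affected, reverse=True)
    return ranked[:n]


def impact_share(issues, top):
    """Fraction of the total affected-subscriber count covered by `top`."""
    total = sum(i.subscribers_affected for i in issues)
    if total == 0:
        return 0.0
    return sum(i.subscribers_affected for i in top) / total


# Assumed sample data: four issues, one of which dominates.
issues = [
    Issue("handover failures", 500),
    Issue("DNS timeouts", 9000),
    Issue("codec mismatch", 300),
    Issue("slow attach", 200),
]
top = top_issues(issues, n=2)
print([i.name for i in top], impact_share(issues, top))
```

Showing the share alongside the ranked list is one way to convey that a small number of issues accounts for most of the impact, while the full list remains available.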

Gardner: Clearly, you're creating value and selling insight to the network to your customers, but I know other organizations have also looked at data as a source of revenue in itself. The analysis could be something that you could market. Is there an opportunity with the insight you have in various networks -- maybe in some aggregate fashion -- to create analysis of behavior, network use, or patterns that would then become a revenue source for you, something that people would subscribe to perhaps?

Alam: That's a possibility. Right now, our business has been all about empowering our customers and giving them the ability to leverage that data for their end use. You can imagine, as a service provider, having great insight into their customers and the over-the-top applications that are being leveraged on their network.

Could they then use our analytics and the metadata that we're generating about their network to empower their business systems and their operations to make smarter decisions? Can they change their marketing strategy or even their APIs about how they service customers on their network to take advantage of the data that we are providing them?

The opportunity to grow other business opportunities from this data is tremendous, and it's going to be exciting to see what our customers end up doing with their data.

Gardner: Are there any metrics of success that are particularly important for you? You've mentioned, of course, scale and volume, but things like concurrency -- the ability to do queries from different places by different people at the same time -- are important. Help me understand what some of the other important elements of a good, strong data-analysis platform would be for you?

Alam: Concurrency is definitely important. For us it's about predictability or linear scalability. We know that when we do reach those types of scenarios to support, let’s say, 10 concurrent users or 100 concurrent users, or to support a greater segmentation of data, because we have gone from 10 terabytes to 30 terabytes, we don't have to change a line of code. We don't have to change how or what we are doing with our data. Linear scalability, especially on commodity hardware, gives us the ability to take our solution and expand it at will, in order to deal with any type of bottlenecks.

Obviously, over time, we'll tune it so that we get better performance out of the hardware or virtual hardware that we use. But we know that when we do hit these bottlenecks, and we will, there is a way around that and it doesn't require us to recompile or rebuild something. We just have to add more nodes, whether it’s virtual or hardware.

More Stories By Dana Gardner

At Interarbor Solutions, we create the analysis and in-depth podcasts on enterprise software and cloud trends that help fuel the social media revolution. As a veteran IT analyst, Dana Gardner moderates discussions and interviews that get to the meat of the hottest technology topics. We define and forecast the business productivity effects of enterprise infrastructure, SOA and cloud advances. Our social media vehicles become conversational platforms, powerfully distributed via the BriefingsDirect Network of online media partners like ZDNet and IT-Director.com. As founder and principal analyst at Interarbor Solutions, Dana Gardner created BriefingsDirect to give online readers and listeners in-depth and direct access to the brightest thought leaders on IT. Our twice-monthly BriefingsDirect Analyst Insights Edition podcasts examine the latest IT news with a panel of analysts and guests. Our sponsored discussions provide a unique, deep-dive focus on specific industry problems and the latest solutions. This podcast equivalent of an analyst briefing session -- made available as a podcast/transcript/blog to any interested viewer and search engine seeker -- breaks the mold on closed knowledge. These informational podcasts jump-start conversational evangelism, drive traffic to lead generation campaigns, and produce strong SEO returns. Interarbor Solutions provides fresh and creative thinking on IT, SOA, cloud and social media strategies based on the power of thoughtful content, made freely and easily available to proactive seekers of insights and information. As a result, marketers and branding professionals can communicate inexpensively with self-qualifying readers/listeners in discrete market segments. BriefingsDirect podcasts hosted by Dana Gardner: Full turnkey planning, moderating, producing, hosting, and distribution via blogs and IT media partners of essential IT knowledge and understanding.
