Welcome!

Related Topics: @BigDataExpo, Java IoT, Microservices Expo, Agile Computing, @CloudExpo, SDN Journal

@BigDataExpo: Blog Feed Post

Real-time Big Data or Small Data?

I’ve been asked what I consider as “Big Data” versus “Small Data” in this domain. Here’s my view.

Have you heard of products like IBM’s InfoSphere Streams, Tibco’s Event Processing product, or Oracle’s CEP product? All good examples of commercially available stream processing technologies which help you process events in real-time.

I’ve been asked what I consider as “Big Data” versus “Small Data” in this domain. Here’s my view.

big_little_bird

Real-Time Analytics Small Data Big Data
Data Volume None None
Data Velocity 100K events / day (<<1K events / second) Billion+ events / day (>>1K events / second)
Data Variety 1-6 unstructured on sources AND 1 single destination (an output file, a SQL database, a BI tool) 6+ structured and 6+ unstructured for sources AND many destinations (a custom application, a BI tool, several SQL databases, NoSQL databases, Hadoop)
Data Models Used for “transport” mainly. Little to no ETL, in-stream analytics, or complex event processing performed. Transport is the foundation. However, distributed ETL, linearly scalable in-memory and in-stream analytics are applied, and complex event processing is the norm.
Business Functions One line of business (e.g. financial trading) Several lines of business – to – 360 view
Business Intelligence No queries are performed against the data in motion. This is simply a mechanism for transporting transaction or event from the source to a database.Transport times are <1 second.

 

 

Example: connect to desktop trading applications and transport trade events to an Oracle database.

ETL, sophisticated algorithms, complex business logic, and even queries can be applied to the stream of events as they are in motion.  Analytics span across all data sources and, thus, all business functions.Transport and analytics occur in < 1 second.

 

Example: connect to desktop trading applications, market data feeds, social media, and provide instantaneous trending reports. Allow traders to subscribe to information pertinent to their trades and have analytics applied in real-time for personalized reporting.

Want to see my view of Batch Analytics? Go Here.

Want to see my view of Ad Hoc Analytics? Go Here.

Here are a few other products in this space:

Read the original blog entry...

More Stories By Jim Kaskade

Jim Kaskade currently leads Janrain, the category creator of Consumer Identity & Access Management (CIAM). We believe that your identity is the most important thing you own, and that your identity should not only be easy to use, but it should be safe to use when accessing your digital world. Janrain is an Identity Cloud servicing Global 3000 enterprises providing a consistent, seamless, and safe experience for end-users when they access their digital applications (web, mobile, or IoT).

Prior to Janrain, Jim was the VP & GM of Digital Applications at CSC. This line of business was over $1B in commercial revenue, including both consulting and delivery organizations and is focused on serving Fortune 1000 companies in the United States, Canada, Mexico, Peru, Chile, Argentina, and Brazil. Prior to this, Jim was the VP & GM of Big Data & Analytics at CSC. In his role, he led the fastest growing business at CSC, overseeing the development and implementation of innovative offerings that help clients convert data into revenue. Jim was also the CEO of Infochimps; Entrepreneur-in-Residence at PARC, a Xerox company; SVP, General Manager and Chief of Cloud at SIOS Technology; CEO at StackIQ; CEO of Eyespot; CEO of Integral Semi; and CEO of INCEP Technologies. Jim started his career at Teradata where he spent ten years in enterprise data warehousing, analytical applications, and business intelligence services designed to maximize the intrinsic value of data, servicing fortune 1000 companies in telecom, retail, and financial markets.

Latest Stories
Kubernetes is a new and revolutionary open-sourced system for managing containers across multiple hosts in a cluster. Ansible is a simple IT automation tool for just about any requirement for reproducible environments. In his session at @DevOpsSummit at 18th Cloud Expo, Patrick Galbraith, a principal engineer at HPE, discussed how to build a fully functional Kubernetes cluster on a number of virtual machines or bare-metal hosts. Also included will be a brief demonstration of running a Galera MyS...
Internet-of-Things discussions can end up either going down the consumer gadget rabbit hole or focused on the sort of data logging that industrial manufacturers have been doing forever. However, in fact, companies today are already using IoT data both to optimize their operational technology and to improve the experience of customer interactions in novel ways. In his session at @ThingsExpo, Gordon Haff, Red Hat Technology Evangelist, will share examples from a wide range of industries – includin...
Unless your company can spend a lot of money on new technology, re-engineering your environment and hiring a comprehensive cybersecurity team, you will most likely move to the cloud or seek external service partnerships. In his session at 18th Cloud Expo, Darren Guccione, CEO of Keeper Security, revealed what you need to know when it comes to encryption in the cloud.
Organizations planning enterprise data center consolidation and modernization projects are faced with a challenging, costly reality. Requirements to deploy modern, cloud-native applications simultaneously with traditional client/server applications are almost impossible to achieve with hardware-centric enterprise infrastructure. Compute and network infrastructure are fast moving down a software-defined path, but storage has been a laggard. Until now.
We're entering the post-smartphone era, where wearable gadgets from watches and fitness bands to glasses and health aids will power the next technological revolution. With mass adoption of wearable devices comes a new data ecosystem that must be protected. Wearables open new pathways that facilitate the tracking, sharing and storing of consumers’ personal health, location and daily activity data. Consumers have some idea of the data these devices capture, but most don’t realize how revealing and...
"We are an all-flash array storage provider but our focus has been on VM-aware storage specifically for virtualized applications," stated Dhiraj Sehgal of Tintri in this SYS-CON.tv interview at 19th Cloud Expo, held November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA.
"We build IoT infrastructure products - when you have to integrate different devices, different systems and cloud you have to build an application to do that but we eliminate the need to build an application. Our products can integrate any device, any system, any cloud regardless of protocol," explained Peter Jung, Chief Product Officer at Pulzze Systems, in this SYS-CON.tv interview at @ThingsExpo, held November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA.
In his general session at 19th Cloud Expo, Manish Dixit, VP of Product and Engineering at Dice, discussed how Dice leverages data insights and tools to help both tech professionals and recruiters better understand how skills relate to each other and which skills are in high demand using interactive visualizations and salary indicator tools to maximize earning potential. Manish Dixit is VP of Product and Engineering at Dice. As the leader of the Product, Engineering and Data Sciences team at D...
It's easy to assume that your app will run on a fast and reliable network. The reality for your app's users, though, is often a slow, unreliable network with spotty coverage. What happens when the network doesn't work, or when the device is in airplane mode? You get unhappy, frustrated users. An offline-first app is an app that works, without error, when there is no network connection. In his session at 18th Cloud Expo, Bradley Holt, a Developer Advocate with IBM Cloud Data Services, discussed...
Data is the fuel that drives the machine learning algorithmic engines and ultimately provides the business value. In his session at 20th Cloud Expo, Ed Featherston, director/senior enterprise architect at Collaborative Consulting, will discuss the key considerations around quality, volume, timeliness, and pedigree that must be dealt with in order to properly fuel that engine.
Between 2005 and 2020, data volumes will grow by a factor of 300 – enough data to stack CDs from the earth to the moon 162 times. This has come to be known as the ‘big data’ phenomenon. Unfortunately, traditional approaches to handling, storing and analyzing data aren’t adequate at this scale: they’re too costly, slow and physically cumbersome to keep up. Fortunately, in response a new breed of technology has emerged that is cheaper, faster and more scalable. Yet, in meeting these new needs they...
In addition to all the benefits, IoT is also bringing new kind of customer experience challenges - cars that unlock themselves, thermostats turning houses into saunas and baby video monitors broadcasting over the internet. This list can only increase because while IoT services should be intuitive and simple to use, the delivery ecosystem is a myriad of potential problems as IoT explodes complexity. So finding a performance issue is like finding the proverbial needle in the haystack.
When it comes to cloud computing, the ability to turn massive amounts of compute cores on and off on demand sounds attractive to IT staff, who need to manage peaks and valleys in user activity. With cloud bursting, the majority of the data can stay on premises while tapping into compute from public cloud providers, reducing risk and minimizing need to move large files. In his session at 18th Cloud Expo, Scott Jeschonek, Director of Product Management at Avere Systems, discussed the IT and busin...
According to Forrester Research, every business will become either a digital predator or digital prey by 2020. To avoid demise, organizations must rapidly create new sources of value in their end-to-end customer experiences. True digital predators also must break down information and process silos and extend digital transformation initiatives to empower employees with the digital resources needed to win, serve, and retain customers.
"We are the public cloud providers. We are currently providing 50% of the resources they need for doing e-commerce business in China and we are hosting about 60% of mobile gaming in China," explained Yi Zheng, CPO and VP of Engineering at CDS Global Cloud, in this SYS-CON.tv interview at 19th Cloud Expo, held November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA.