Welcome!

Related Topics: @BigDataExpo, Java IoT, Microservices Expo, Agile Computing, @CloudExpo, SDN Journal

@BigDataExpo: Blog Feed Post

Real-time Big Data or Small Data?

I’ve been asked what I consider as “Big Data” versus “Small Data” in this domain. Here’s my view.

Have you heard of products like IBM’s InfoSphere Streams, Tibco’s Event Processing product, or Oracle’s CEP product? All good examples of commercially available stream processing technologies which help you process events in real-time.

I’ve been asked what I consider as “Big Data” versus “Small Data” in this domain. Here’s my view.

big_little_bird

Real-Time Analytics Small Data Big Data
Data Volume None None
Data Velocity 100K events / day (<<1K events / second) Billion+ events / day (>>1K events / second)
Data Variety 1-6 unstructured on sources AND 1 single destination (an output file, a SQL database, a BI tool) 6+ structured and 6+ unstructured for sources AND many destinations (a custom application, a BI tool, several SQL databases, NoSQL databases, Hadoop)
Data Models Used for “transport” mainly. Little to no ETL, in-stream analytics, or complex event processing performed. Transport is the foundation. However, distributed ETL, linearly scalable in-memory and in-stream analytics are applied, and complex event processing is the norm.
Business Functions One line of business (e.g. financial trading) Several lines of business – to – 360 view
Business Intelligence No queries are performed against the data in motion. This is simply a mechanism for transporting transaction or event from the source to a database.Transport times are <1 second.

 

 

Example: connect to desktop trading applications and transport trade events to an Oracle database.

ETL, sophisticated algorithms, complex business logic, and even queries can be applied to the stream of events as they are in motion.  Analytics span across all data sources and, thus, all business functions.Transport and analytics occur in < 1 second.

 

Example: connect to desktop trading applications, market data feeds, social media, and provide instantaneous trending reports. Allow traders to subscribe to information pertinent to their trades and have analytics applied in real-time for personalized reporting.

Want to see my view of Batch Analytics? Go Here.

Want to see my view of Ad Hoc Analytics? Go Here.

Here are a few other products in this space:

Read the original blog entry...

More Stories By Jim Kaskade

Jim Kaskade currently leads Janrain, the category creator of Consumer Identity & Access Management (CIAM). We believe that your identity is the most important thing you own, and that your identity should not only be easy to use, but it should be safe to use when accessing your digital world. Janrain is an Identity Cloud servicing Global 3000 enterprises providing a consistent, seamless, and safe experience for end-users when they access their digital applications (web, mobile, or IoT).

Prior to Janrain, Jim was the VP & GM of Digital Applications at CSC. This line of business was over $1B in commercial revenue, including both consulting and delivery organizations and is focused on serving Fortune 1000 companies in the United States, Canada, Mexico, Peru, Chile, Argentina, and Brazil. Prior to this, Jim was the VP & GM of Big Data & Analytics at CSC. In his role, he led the fastest growing business at CSC, overseeing the development and implementation of innovative offerings that help clients convert data into revenue. Jim was also the CEO of Infochimps; Entrepreneur-in-Residence at PARC, a Xerox company; SVP, General Manager and Chief of Cloud at SIOS Technology; CEO at StackIQ; CEO of Eyespot; CEO of Integral Semi; and CEO of INCEP Technologies. Jim started his career at Teradata where he spent ten years in enterprise data warehousing, analytical applications, and business intelligence services designed to maximize the intrinsic value of data, servicing fortune 1000 companies in telecom, retail, and financial markets.

Latest Stories
For organizations that have amassed large sums of software complexity, taking a microservices approach is the first step toward DevOps and continuous improvement / development. Integrating system-level analysis with microservices makes it easier to change and add functionality to applications at any time without the increase of risk. Before you start big transformation projects or a cloud migration, make sure these changes won’t take down your entire organization.
My team embarked on building a data lake for our sales and marketing data to better understand customer journeys. This required building a hybrid data pipeline to connect our cloud CRM with the new Hadoop Data Lake. One challenge is that IT was not in a position to provide support until we proved value and marketing did not have the experience, so we embarked on the journey ourselves within the product marketing team for our line of business within Progress. In his session at @BigDataExpo, Sum...
Apache Hadoop is emerging as a distributed platform for handling large and fast incoming streams of data. Predictive maintenance, supply chain optimization, and Internet-of-Things analysis are examples where Hadoop provides the scalable storage, processing, and analytics platform to gain meaningful insights from granular data that is typically only valuable from a large-scale, aggregate view. One architecture useful for capturing and analyzing streaming data is the Lambda Architecture, represent...
Things are changing so quickly in IoT that it would take a wizard to predict which ecosystem will gain the most traction. In order for IoT to reach its potential, smart devices must be able to work together. Today, there are a slew of interoperability standards being promoted by big names to make this happen: HomeKit, Brillo and Alljoyn. In his session at @ThingsExpo, Adam Justice, vice president and general manager of Grid Connect, will review what happens when smart devices don’t work togethe...
SYS-CON Events announced today that Ocean9will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Ocean9 provides cloud services for Backup, Disaster Recovery (DRaaS) and instant Innovation, and redefines enterprise infrastructure with its cloud native subscription offerings for mission critical SAP workloads.
Adding public cloud resources to an existing application can be a daunting process. The tools that you currently use to manage the software and hardware outside the cloud aren’t always the best tools to efficiently grow into the cloud. All of the major configuration management tools have cloud orchestration plugins that can be leveraged, but there are also cloud-native tools that can dramatically improve the efficiency of managing your application lifecycle.
In his session at @ThingsExpo, Eric Lachapelle, CEO of the Professional Evaluation and Certification Board (PECB), will provide an overview of various initiatives to certifiy the security of connected devices and future trends in ensuring public trust of IoT. Eric Lachapelle is the Chief Executive Officer of the Professional Evaluation and Certification Board (PECB), an international certification body. His role is to help companies and individuals to achieve professional, accredited and worldw...
Providing the needed data for application development and testing is a huge headache for most organizations. The problems are often the same across companies - speed, quality, cost, and control. Provisioning data can take days or weeks, every time a refresh is required. Using dummy data leads to quality problems. Creating physical copies of large data sets and sending them to distributed teams of developers eats up expensive storage and bandwidth resources. And, all of these copies proliferating...
DevOps is often described as a combination of technology and culture. Without both, DevOps isn't complete. However, applying the culture to outdated technology is a recipe for disaster; as response times grow and connections between teams are delayed by technology, the culture will die. A Nutanix Enterprise Cloud has many benefits that provide the needed base for a true DevOps paradigm. In his Day 3 Keynote at 20th Cloud Expo, Chris Brown, a Solutions Marketing Manager at Nutanix, will explore t...
SYS-CON Events announced today that Technologic Systems Inc., an embedded systems solutions company, will exhibit at SYS-CON's @ThingsExpo, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Technologic Systems is an embedded systems company with headquarters in Fountain Hills, Arizona. They have been in business for 32 years, helping more than 8,000 OEM customers and building over a hundred COTS products that have never been discontinued. Technologic Systems’ pr...
SYS-CON Events announced today that CA Technologies has been named “Platinum Sponsor” of SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY, and the 21st International Cloud Expo®, which will take place October 31-November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. CA Technologies helps customers succeed in a future where every business – from apparel to energy – is being rewritten by software. From ...
The taxi industry never saw Uber coming. Startups are a threat to incumbents like never before, and a major enabler for startups is that they are instantly “cloud ready.” If innovation moves at the pace of IT, then your company is in trouble. Why? Because your data center will not keep up with frenetic pace AWS, Microsoft and Google are rolling out new capabilities In his session at 20th Cloud Expo, Don Browning, VP of Cloud Architecture at Turner, will posit that disruption is inevitable for c...
SYS-CON Events announced today that Cloudistics, an on-premises cloud computing company, has been named “Bronze Sponsor” of SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Cloudistics delivers a complete public cloud experience with composable on-premises infrastructures to medium and large enterprises. Its software-defined technology natively converges network, storage, compute, virtualization, and management into a ...
Deep learning has been very successful in social sciences and specially areas where there is a lot of data. Trading is another field that can be viewed as social science with a lot of data. With the advent of Deep Learning and Big Data technologies for efficient computation, we are finally able to use the same methods in investment management as we would in face recognition or in making chat-bots. In his session at 20th Cloud Expo, Gaurav Chakravorty, co-founder and Head of Strategy Development ...
What if you could build a web application that could support true web-scale traffic without having to ever provision or manage a single server? Sounds magical, and it is! In his session at 20th Cloud Expo, Chris Munns, Senior Developer Advocate for Serverless Applications at Amazon Web Services, will show how to build a serverless website that scales automatically using services like AWS Lambda, Amazon API Gateway, and Amazon S3. We will review several frameworks that can help you build serverle...