Welcome!

Blog Feed Post

Open Source Stack: From LAMP to CHIRPS

LAMP is an acronym for an archetypal model of web-based solution stacks, originally consisting of largely interchangeable components: Linux, the Apache HTTP Server, the MySQL relational database management system, and the PHP/Python programming language. As a solution stack, LAMP is suitable for building dynamic websites and web applications.

  • Linux is a Unix-like computer operating system assembled under the model of free and open source software development and distribution.
  • The Apache HTTP Server has been the most popular web server on the public Internet.
  • MySQL is a multithreaded, multi-user, SQL database management system (DBMS).
  • PHP is a server-side scripting language designed for web development but also used as a general-purpose programming language. Python is a widely used general-purpose, high-level programming language. Python supports multiple programming paradigms, including object-oriented, imperative and functional programming or procedural styles. 

LAMP has dominated the open source space for years. In spite of success, LAMP has some limitations that make it become a barrier to the present Big Data era. For example, LAMP was designed primarily for on-premise solutions and can’t store unstructured or semi-structured data directly. The processing unit for business logic does not support data-intensive workload well. It lacks adequate functions for data visualization and flexible features for data presentation.

There is a strong need for a different type of stack for today’s world. Here I define a new model to meet this need: Cloud, Hadoop, Impala, R, Pentao, Spark/Storm/Solr (CHIRPS).


  • Cloud becomes the default OS for Big Data solutions. Private clouds provide secure hosting facilities. Public clouds offer subscription-based elastic runtime environments. Hybrid clouds furnish the combined benefits from both worlds. OpenStack, for example, is an open source cloud platform for private and public clouds that provides access to large pools of compute, storage and networking resources throughout a corporate IT infrastructure.
  • Hadoop provides a distributed file system to store raw data and a parallel algorithm called MapReduce. It also has a data warehouse infrastructure Hive, columnar store HBase, data analytics package Mahout, as well as other components like Sqoop, Flume, Pig, Ambari, etc.
  • Impala is a parallel database query engine that offers high-performance query processing on Hadoop. The response time is claimed to be 90X faster over Hive.
  • R is a programming language and software environment for statistical computing and graphics. The R language is widely used among statisticians and data miners for developing statistical software and data analysis.
  • Pentaho supplies the data ingestion (ELT) function in Kettle, and data visualization/presentation in the Business Analytics suite, such as reporting, analysis, dashboard, and workflow, as well as a machine learning package Weka.
  • Spark delivers real-time Big Data performance (100X faster than MapReduce) via a fast in-memory or on-disk engine for large-scale data processing. Storm gives the fault-tolerant real-time processing for Big Data streaming. Solr is an open source enterprise search platform, featuring powerful full-text search, hit highlighting, faceted search, near real-time indexing, dynamic clustering, database integration, rich document handling, and geo search.

There are a variety of usage patterns and implementation styles for CHIRPS, depending on the scope and type of the solutions being constructed. For more information, please contact Tony Shan ([email protected]). ©Tony Shan. All rights reserved.

Read the original blog entry...

More Stories By Tony Shan

Tony Shan works as a senior consultant, advisor at a global applications and infrastructure solutions firm helping clients realize the greatest value from their IT. Shan is a renowned thought leader and technology visionary with a number of years of field experience and guru-level expertise on cloud computing, Big Data, Hadoop, NoSQL, social, mobile, SOA, BI, technology strategy, IT roadmapping, systems design, architecture engineering, portfolio rationalization, product development, asset management, strategic planning, process standardization, and Web 2.0. He has directed the lifecycle R&D and buildout of large-scale award-winning distributed systems on diverse platforms in Fortune 100 companies and public sector like IBM, Bank of America, Wells Fargo, Cisco, Honeywell, Abbott, etc.

Shan is an inventive expert with a proven track record of influential innovations such as Cloud Engineering. He has authored dozens of top-notch technical papers on next-generation technologies and over ten books that won multiple awards. He is a frequent keynote speaker and Chair/Panel/Advisor/Judge/Organizing Committee in prominent conferences/workshops, an editor/editorial advisory board member of IT research journals/books, and a founder of several user groups, forums, and centers of excellence (CoE).

Latest Stories
Charles Araujo is an industry analyst, internationally recognized authority on the Digital Enterprise and author of The Quantum Age of IT: Why Everything You Know About IT is About to Change. As Principal Analyst with Intellyx, he writes, speaks and advises organizations on how to navigate through this time of disruption. He is also the founder of The Institute for Digital Transformation and a sought after keynote speaker. He has been a regular contributor to both InformationWeek and CIO Insight...
In this presentation, you will learn first hand what works and what doesn't while architecting and deploying OpenStack. Some of the topics will include:- best practices for creating repeatable deployments of OpenStack- multi-site considerations- how to customize OpenStack to integrate with your existing systems and security best practices.
DXWorldEXPO LLC announced today that Kevin Jackson joined the faculty of CloudEXPO's "10-Year Anniversary Event" which will take place on November 11-13, 2018 in New York City. Kevin L. Jackson is a globally recognized cloud computing expert and Founder/Author of the award winning "Cloud Musings" blog. Mr. Jackson has also been recognized as a "Top 100 Cybersecurity Influencer and Brand" by Onalytica (2015), a Huffington Post "Top 100 Cloud Computing Experts on Twitter" (2013) and a "Top 50 C...
The now mainstream platform changes stemming from the first Internet boom brought many changes but didn’t really change the basic relationship between servers and the applications running on them. In fact, that was sort of the point. In his session at 18th Cloud Expo, Gordon Haff, senior cloud strategy marketing and evangelism manager at Red Hat, will discuss how today’s workloads require a new model and a new platform for development and execution. The platform must handle a wide range of rec...
Bill Schmarzo, Tech Chair of "Big Data | Analytics" of upcoming CloudEXPO | DXWorldEXPO New York (November 12-13, 2018, New York City) today announced the outline and schedule of the track. "The track has been designed in experience/degree order," said Schmarzo. "So, that folks who attend the entire track can leave the conference with some of the skills necessary to get their work done when they get back to their offices. It actually ties back to some work that I'm doing at the University of ...
DXWorldEXPO LLC, the producer of the world's most influential technology conferences and trade shows has announced the 22nd International CloudEXPO | DXWorldEXPO "Early Bird Registration" is now open. Register for Full Conference "Gold Pass" ▸ Here (Expo Hall ▸ Here)
When it comes to cloud computing, the ability to turn massive amounts of compute cores on and off on demand sounds attractive to IT staff, who need to manage peaks and valleys in user activity. With cloud bursting, the majority of the data can stay on premises while tapping into compute from public cloud providers, reducing risk and minimizing need to move large files. In his session at 18th Cloud Expo, Scott Jeschonek, Director of Product Management at Avere Systems, discussed the IT and busine...
Enterprises are universally struggling to understand where the new tools and methodologies of DevOps fit into their organizations, and are universally making the same mistakes. These mistakes are not unavoidable, and in fact, avoiding them gifts an organization with sustained competitive advantage, just like it did for Japanese Manufacturing Post WWII.
The revocation of Safe Harbor has radically affected data sovereignty strategy in the cloud. In his session at 17th Cloud Expo, Jeff Miller, Product Management at Cavirin Systems, discussed how to assess these changes across your own cloud strategy, and how you can mitigate risks previously covered under the agreement.
CloudEXPO New York 2018, colocated with DXWorldEXPO New York 2018 will be held November 11-13, 2018, in New York City and will bring together Cloud Computing, FinTech and Blockchain, Digital Transformation, Big Data, Internet of Things, DevOps, AI, Machine Learning and WebRTC to one location.
The best way to leverage your Cloud Expo presence as a sponsor and exhibitor is to plan your news announcements around our events. The press covering Cloud Expo and @ThingsExpo will have access to these releases and will amplify your news announcements. More than two dozen Cloud companies either set deals at our shows or have announced their mergers and acquisitions at Cloud Expo. Product announcements during our show provide your company with the most reach through our targeted audiences.
The Internet of Things will challenge the status quo of how IT and development organizations operate. Or will it? Certainly the fog layer of IoT requires special insights about data ontology, security and transactional integrity. But the developmental challenges are the same: People, Process and Platform and how we integrate our thinking to solve complicated problems. In his session at 19th Cloud Expo, Craig Sproule, CEO of Metavine, demonstrated how to move beyond today's coding paradigm and sh...
Traditional on-premises data centers have long been the domain of modern data platforms like Apache Hadoop, meaning companies who build their business on public cloud were challenged to run Big Data processing and analytics at scale. But recent advancements in Hadoop performance, security, and most importantly cloud-native integrations, are giving organizations the ability to truly gain value from all their data. In his session at 19th Cloud Expo, David Tishgart, Director of Product Marketing ...
More and more companies are looking to microservices as an architectural pattern for breaking apart applications into more manageable pieces so that agile teams can deliver new features quicker and more effectively. What this pattern has done more than anything to date is spark organizational transformations, setting the foundation for future application development. In practice, however, there are a number of considerations to make that go beyond simply “build, ship, and run,” which changes how...
As Enterprise business moves from Monoliths to Microservices, adoption and successful implementations of Microservices become more evident. The goal of Microservices is to improve software delivery speed and increase system safety as scale increases. Documenting hurdles and problems for the use of Microservices will help consultants, architects and specialists to avoid repeating the same mistakes and learn how and when to use (or not use) Microservices at the enterprise level. The circumstance w...