Welcome!

Related Topics: @CloudExpo, Java IoT, Microservices Expo, Microsoft Cloud, Open Source Cloud, @DXWorldExpo

@CloudExpo: Article

Case Study: Integrating Redshift, DynamoDB and JasperSoft

JasperSoft and Redshift are touted as the new way to do data warehousing in the cloud

We recently completed a proof-of-concept (POC) that involved pulling data out of DynamoDB and into Redshift so that business users could analyze the data in an ad hoc manner with JasperSoft. JasperSoft and Redshift are touted as the new way to do data warehousing in the cloud and we were using DynamoDB as a sort of alternative to something Hadoop based.

We were amazed at how quickly and easily you could get a business-user friendly view of the data we had stored in DynamoDB using Redshift and JasperSoft. The actual human effort required to copy some initial data from DynamoDB into Redshift and then view it in JasperSoft was barely a few hours. However, there were a few unforseen technical challenges, but these challenges were not insurmountable. Ultimately we will continue with this technology combination because, as well as being easy to deploy and use, it gives us confidence that we can scale the POC into a "real" solution, and that our growing data needs will be taken care of.

Redshift's inbuilt copy from DynamoDB function makes getting data into Redshift fast but has limitations
Redshift provides an out-of-the-box copy function to copy data from DynamoDBinto Redshift without the need to set up servers or write any code other than a few simple lines of SQL. We were able to copy many of our tables straight out of DynamoDB and into Redshift and start running ad hoc queries without having to fire up servers or create an entire data translation layer of software.

We couldn't copy all of our tables, however, as DynamoDB's String Set field type is not currently supported by Redshift's copy function. After attempts at various SQL hacks and closely reading the Redshift manual we realized that we would not be able to work with these tables in the POC. In the future we will write some simple copy scripts (unless Amazon beats us to it and updates the copy command which, given their continuous product improvement, is likely).

Copying takes time
As it was a POC we were only copying across 3.5GB or 50 million rows of data but this process did take some time to complete - it took us 37 hours. Both Redshift and DynamoDB were running on the lowest performance settings and the DynamoDB instance was also servicing the needs of the live beta application we were trying to extract data from. We suspect this process could easily be made quicker by increasing the DynamoDB instance power and the power of Redshift but we did not test this.

The time it takes to copy 3.5GB of data indicated to us that a considered approach is necessary for getting data from DynamoDB into Redshift, especially considering that the live data will be much larger in volume. For example, when this goes into production we are only going to copy new records on a daily basis instead of clearing the entire Redshift database and reloading it to stay up to date.

Working in SQL with Redshift makes life easy
Once we had pulled our initial copy of data into Redshift we needed to manipulate the data to get it into a form that business users could analyze and create reports with.

The great thing about Redshift is that you are working in an environment you are familiar with. SQL. Redshift, at the time of writing, is based on PostgreSQL 8.0.2 so we were able to apply familiar string manipulation and math functions as well as create and join new tables to make the data much easier to understand for a non-technical business user.

Some SQL functions aren't yet supported by Redshift so we had to read through the documentation every now and then to find a suitable alternative. Sometimes it was just about trying to find the alternate name Redshift was using for a function we were used to using. Other times it meant creating some interesting workaround SQL. For example, Redshift doesn't support a function that can convert a Unix timestamp to a date so we had to manually convert our time stamps to dates using a mathematical formula.

Instant, non-technical user friendly data access with JasperSoft
We easily spun up a JasperSoft OnDemand instance and connected it to Redshift quite quickly. We were then creating ad hoc views and reports in a matter of minutes.

We did have some issues analyzing one of our tables straight out-of-the-box, one table had almost 3.5GB. Attempting to view reports on this table led to JasperSoft crashing. With some tweaking of the way the reports ran we were able to prevent analysis of this table from crashing JasperSoft.

More Stories By Scott Middleton

Scott Middleton is the CEO and Principal Consultant at Terem Technologies, a company that specializes in custom software development for innovative companies and high-tech ventures.

Comments (1)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


Latest Stories
As you move to the cloud, your network should be efficient, secure, and easy to manage. An enterprise adopting a hybrid or public cloud needs systems and tools that provide: Agility: ability to deliver applications and services faster, even in complex hybrid environments Easier manageability: enable reliable connectivity with complete oversight as the data center network evolves Greater efficiency: eliminate wasted effort while reducing errors and optimize asset utilization Security: implemen...
As data explodes in quantity, importance and from new sources, the need for managing and protecting data residing across physical, virtual, and cloud environments grow with it. Managing data includes protecting it, indexing and classifying it for true, long-term management, compliance and E-Discovery. Commvault can ensure this with a single pane of glass solution – whether in a private cloud, a Service Provider delivered public cloud or a hybrid cloud environment – across the heterogeneous enter...
DXWorldEXPO LLC announced today that Kevin Jackson joined the faculty of CloudEXPO's "10-Year Anniversary Event" which will take place on November 11-13, 2018 in New York City. Kevin L. Jackson is a globally recognized cloud computing expert and Founder/Author of the award winning "Cloud Musings" blog. Mr. Jackson has also been recognized as a "Top 100 Cybersecurity Influencer and Brand" by Onalytica (2015), a Huffington Post "Top 100 Cloud Computing Experts on Twitter" (2013) and a "Top 50 C...
Evan Kirstel is an internationally recognized thought leader and social media influencer in IoT (#1 in 2017), Cloud, Data Security (2016), Health Tech (#9 in 2017), Digital Health (#6 in 2016), B2B Marketing (#5 in 2015), AI, Smart Home, Digital (2017), IIoT (#1 in 2017) and Telecom/Wireless/5G. His connections are a "Who's Who" in these technologies, He is in the top 10 most mentioned/re-tweeted by CMOs and CIOs (2016) and have been recently named 5th most influential B2B marketeer in the US. H...
In a world where the internet rules all, where 94% of business buyers conduct online research, and where e-commerce sales are poised to fall between $427 billion and $443 billion by the end of this year, we think it's safe to say that your website is a vital part of your business strategy. Whether you're a B2B company, a local business, or an e-commerce site, digital presence is key to maintain in your drive towards success. Digital Performance will take priority in 2018 for the following reason...
Your homes and cars can be automated and self-serviced. Why can't your storage? From simply asking questions to analyze and troubleshoot your infrastructure, to provisioning storage with snapshots, recovery and replication, your wildest sci-fi dream has come true. In his session at @DevOpsSummit at 20th Cloud Expo, Dan Florea, Director of Product Management at Tintri, provided a ChatOps demo where you can talk to your storage and manage it from anywhere, through Slack and similar services with...
"This week we're really focusing on scalability, asset preservation and how do you back up to the cloud and in the cloud with object storage, which is really a new way of attacking dealing with your file, your blocked data, where you put it and how you access it," stated Jeff Greenwald, Senior Director of Market Development at HGST, in this SYS-CON.tv interview at 18th Cloud Expo, held June 7-9, 2016, at the Javits Center in New York City, NY.
Creating replica copies to tolerate a certain number of failures is easy, but very expensive at cloud-scale. Conventional RAID has lower overhead, but it is limited in the number of failures it can tolerate. And the management is like herding cats (overseeing capacity, rebuilds, migrations, and degraded performance). In his general session at 18th Cloud Expo, Scott Cleland, Senior Director of Product Marketing for the HGST Cloud Infrastructure Business Unit, discussed how a new approach is neces...
What's the role of an IT self-service portal when you get to continuous delivery and Infrastructure as Code? This general session showed how to create the continuous delivery culture and eight accelerators for leading the change. Don Demcsak is a DevOps and Cloud Native Modernization Principal for Dell EMC based out of New Jersey. He is a former, long time, Microsoft Most Valuable Professional, specializing in building and architecting Application Delivery Pipelines for hybrid legacy, and cloud ...
Cloud-enabled transformation has evolved from cost saving measure to business innovation strategy -- one that combines the cloud with cognitive capabilities to drive market disruption. Learn how you can achieve the insight and agility you need to gain a competitive advantage. Industry-acclaimed CTO and cloud expert, Shankar Kalyana presents. Only the most exceptional IBMers are appointed with the rare distinction of IBM Fellow, the highest technical honor in the company. Shankar has also receive...
"With Digital Experience Monitoring what used to be a simple visit to a web page has exploded into app on phones, data from social media feeds, competitive benchmarking - these are all components that are only available because of some type of digital asset," explained Leo Vasiliou, Director of Web Performance Engineering at Catchpoint Systems, in this SYS-CON.tv interview at DevOps Summit at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
"I focus on what we are calling CAST Highlight, which is our SaaS application portfolio analysis tool. It is an extremely lightweight tool that can integrate with pretty much any build process right now," explained Andrew Siegmund, Application Migration Specialist for CAST, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
"We view the cloud not as a specific technology but as a way of doing business and that way of doing business is transforming the way software, infrastructure and services are being delivered to business," explained Matthew Rosen, CEO and Director at Fusion, in this SYS-CON.tv interview at 18th Cloud Expo (http://www.CloudComputingExpo.com), held June 7-9 at the Javits Center in New York City, NY.
"We were founded in 2003 and the way we were founded was about good backup and good disaster recovery for our clients, and for the last 20 years we've been pretty consistent with that," noted Marc Malafronte, Territory Manager at StorageCraft, in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
The Founder of NostaLab and a member of the Google Health Advisory Board, John is a unique combination of strategic thinker, marketer and entrepreneur. His career was built on the "science of advertising" combining strategy, creativity and marketing for industry-leading results. Combined with his ability to communicate complicated scientific concepts in a way that consumers and scientists alike can appreciate, John is a sought-after speaker for conferences on the forefront of healthcare science,...