Improving Efficiency With Redis
Recently, a company hired me to optimize one of their products because it took too long to read data from their database. Apparently, the data was too large for a simple query, so we had to restructure both the query and the data response.
While the project went on, I remember thinking about Redis and how it could have been used initially to bear some of the data needs. Ultimately, they were on track for a relaunch and it was deemed too risky to restructure the entire database so we had to optimize the old SQL way. Although it was a success, I can’t stop thinking about what could have been if we restructured. Perhaps writing this article will help with that.
Here I will discuss Redis both as a database and an addition to your tech stack. I will also demonstrate an implementation of Redis in a small application and share what I observed.
Before we begin, I would like to start with some introductions. If you already know the details, feel free to skip to the next section.
What is Redis?
Redis lets you store different data structures in your server’s main memory. This matters because main memory is where your computer keeps data it needs to reach quickly: reading and writing data on disk is slow compared to memory.
In Redis, the most commonly used data structures are Strings, Lists, Hashes, Sets and Sorted Sets, although there are other structures for special situations. If you would like to familiarize yourself with basic Redis commands and data structures, you can try out this interactive tutorial.
On Redis Persistence
A real application requires you to store data permanently, or at least until the user deletes it. Since Redis stores data in memory (which is volatile), it provides two mechanisms for persisting data to disk. One is Redis Database (RDB), where Redis takes occasional snapshots of your data. The snapshot is usually stored as ‘dump.rdb’ and is useful for backups and for replicating data. The other is the Append-Only File (AOF), where Redis appends every write command to a log file, so the file holds the sequence of commands needed to reconstruct the current dataset. On server restart, Redis takes this file, runs all the commands in the log and rebuilds your data.
One advantage of AOF persistence is that you can sync your data to disk every second (other options are listed on the site), which limits how much you can lose. RDB snapshots are only taken at configured intervals, so a crash can cost you everything written since the last snapshot, which is often several minutes of data. It is usually recommended to use both RDB and AOF for better data safety and easier replication. [1]
(Please visit the Redis website for a detailed explanation of all options.)
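To make the two options concrete, here is a minimal sketch in Python of switching both of them on for a running server. It assumes `r` is a redis-py client (`redis.Redis`), and the snapshot threshold is only an example value, not a recommendation.

```python
def enable_rdb_and_aof(r):
    """Turn on both persistence modes on a running Redis server.

    `r` is assumed to be a redis-py client; the values are illustrative.
    """
    # RDB: take a snapshot if at least 1000 keys changed within 60 seconds.
    r.config_set("save", "60 1000")
    # AOF: log every write command to the append-only file...
    r.config_set("appendonly", "yes")
    # ...and flush that log to disk once per second.
    r.config_set("appendfsync", "everysec")
```

Settings applied this way last only until the server restarts. The same three directives (`save`, `appendonly`, `appendfsync`) can be placed in redis.conf to make them permanent.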
Redis and Technical Debt
Technical debt is what you take on when you opt for an easier or cheaper solution initially, only to find yourself paying more in cash and time for better alternatives later. It is common in the freelance space to see a business facing technical debt because of ill-informed decisions in the initial stage. Unfortunately, some businesses conclude that the risk of change is too great and go ahead and patch up the existing stack. [2]
It is often a good idea to know the facts and prepare for expansion before even starting the project. A project with 50 users can quickly grow to 5000.
Common use cases of Redis
Redis can be used for caching, but so much more can be done with it. Instead of the conventional method of storing all your data in your (non-)relational database or filesystem, you can let Redis bear some of the data weight, especially in cases where speed is a priority. Since the possibilities are endless due to the diversity of projects, allow me to list some of the use cases I have come across:
- Session storage: I store important session data (like authentication) in memory via Redis. This way the data can be read faster and I don’t have to worry about horizontal scaling [3].
- Primary database: If I have a fairly small amount of data to store and consider speed a priority, Redis serves better as a primary database.
- Store unique fields for large databases: Let’s say I want to check if a given username or email exists in my large database. It is easier and way faster to store these unique fields in a Redis Set or Hash structure and check for existence [4].
- Chats/Messaging systems: Customers will always expect chatting systems to read/write data as fast as possible. I prefer to store messages in Redis (and periodically sync with MySQL).
- Collaboration software: For a serious collaboration platform, read/write speed is of the essence. I have found that quickly reading/writing with Redis provides a better experience.
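Two of these use cases, session storage and the uniqueness check, can be sketched in a few lines of Python. This is a minimal illustration rather than production code: `r` is assumed to be a redis-py client, and the key names (`session:<id>`, `usernames`) and the 30-minute expiry are hypothetical choices.

```python
SESSION_TTL_SECONDS = 30 * 60  # assumed session lifetime: 30 minutes


def save_session(r, session_id, user_id):
    """Store session data in a Hash and let Redis expire it automatically."""
    key = f"session:{session_id}"
    r.hset(key, mapping={"user_id": user_id})
    r.expire(key, SESSION_TTL_SECONDS)


def load_session(r, session_id):
    """Return the session fields, or an empty dict if expired or unknown."""
    return r.hgetall(f"session:{session_id}")


def username_taken(r, username):
    """Check a Set of known usernames instead of querying the large database."""
    return bool(r.sismember("usernames", username.lower()))
```

Because every application server talks to the same Redis instance, a session saved by one server can be loaded by any other. With a real client the values come back as strings or bytes, depending on how the client is configured.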
As you might have noticed, there are use cases in almost every project you might think of. From notification systems to real-time complex systems, speed is of the essence and Redis serves you better. The most important thing is to decide when and where an implementation will add value to your business.
Experiences from sample implementation
For a demonstration, I used Redis to store data for a small blog where users could register, log in, vote on posts, and sort posts by score or time. This was a perfect use case in my opinion because it exercised all five of the Redis data structures mentioned earlier. Also, once you have seen Redis used as a primary database, it is easier to take it down a notch and use Redis as a partial addition to your tech stack instead.
In practice, the implementation was similar to conventional databases, except that I didn’t have to create tables or use queries to read/write data.
To register users, for example, I used an index of usernames and emails to ensure uniqueness, then the user’s details were stored using a Hash data structure. A sample code implementation is given below:
Note: Please check the GitHub repository for the full source code.
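For readers who want the shape of the registration flow without opening the repository, here is a minimal sketch in Python. It is not the repository's code: `r` is assumed to be a redis-py client, the indexes are assumed to be Sets, and the key names (`usernames`, `emails`, `next_user_id`, `user:<id>`) are hypothetical.

```python
def register_user(r, username, email, password_hash):
    """Return the new user id, or None if the username or email is taken."""
    username, email = username.lower(), email.lower()

    # SADD returns 1 only when the member is new, so a successful add
    # also claims the name in a single step.
    if r.sadd("usernames", username) == 0:
        return None
    if r.sadd("emails", email) == 0:
        r.srem("usernames", username)  # release the username claimed above
        return None

    # A counter key hands out sequential user ids.
    user_id = r.incr("next_user_id")

    # The user's details live in one Hash per user.
    r.hset(
        f"user:{user_id}",
        mapping={
            "username": username,
            "email": email,
            "password_hash": password_hash,
        },
    )
    return user_id
```

Lowercasing before the check is a design choice that treats `Joshua` and `joshua` as the same user. There are no tables to create: the Sets and the Hash come into existence on first write.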
To vote for specific posts, here’s a sample code given:
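The algorithm is simple: check if the user already voted for the post. If yes, remove the vote from the Redis Set, subtract 432 from the post score, and subtract 1 from the post’s vote count. Otherwise, do the reverse: add the user to the Set, add 432 to the score, and add 1 to the vote count.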
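A minimal sketch of this vote toggle in Python follows. The add-vote branch is taken to be the mirror image of the removal. `r` is assumed to be a redis-py client, and storing the score in a Sorted Set named `posts:by_score` next to a per-post Hash is an assumption made for illustration; the repository's own code may look different.

```python
VOTE_SCORE = 432  # points added or removed per vote


def toggle_vote(r, user_id, post_id):
    """Add the user's vote, or remove it if they already voted.

    Returns True if the post is now voted for by this user, else False.
    """
    voters_key = f"post:{post_id}:voters"  # Set of user ids who voted
    post_key = f"post:{post_id}"           # Hash holding the vote count

    if r.sismember(voters_key, user_id):
        # Already voted: undo the vote.
        r.srem(voters_key, user_id)
        r.hincrby(post_key, "votes", -1)
        r.zincrby("posts:by_score", -VOTE_SCORE, post_id)
        return False

    # Not voted yet: record the vote.
    r.sadd(voters_key, user_id)
    r.hincrby(post_key, "votes", 1)
    r.zincrby("posts:by_score", VOTE_SCORE, post_id)
    return True
```

The check and the updates are separate commands here, so two simultaneous requests from the same user could interleave. A real implementation would likely group them in a transaction or a server-side script.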
Why 432? A post’s initial score is the Unix time at creation (the number of seconds since January 1, 1970 [5]). We want a post to be rated popular only for about 24 hours and then let newer posts take its place. To do that, we divide the number of seconds in a day (86,400) by the number of votes required to be rated highest in a day (200), which gives 432 points per vote. This way a post needs the full 200 votes just to match the starting score of a post uploaded 24 hours later. I got this idea from Josiah Carlson’s book “Redis in Action”.
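The arithmetic can be checked with a few lines of Python. The function name `post_score` is just for illustration; only the constants come from the text.

```python
SECONDS_PER_DAY = 86_400
VOTES_FOR_A_FULL_DAY = 200

# Each vote is worth this many seconds of "freshness".
VOTE_SCORE = SECONDS_PER_DAY // VOTES_FOR_A_FULL_DAY  # 432


def post_score(created_at, votes):
    """Score = creation time (Unix seconds) plus 432 per vote."""
    return created_at + votes * VOTE_SCORE


# A post with 200 votes ties with a brand-new post created a day later.
older = post_score(created_at=1_700_000_000, votes=200)
newer = post_score(created_at=1_700_000_000 + SECONDS_PER_DAY, votes=0)
assert older == newer
```

With fewer than 200 votes the older post falls below the newer one, which is what lets fresh posts rise to the top.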
More interesting logic can be found around the application. The source code can be seen in full at https://github.com/joshuaetim/redis_app
Although this shows Redis being used as a primary database, we can easily limit Redis to the sections where speed and flexibility matter. In a real-life scenario, I would use Redis for the voting, username/email uniqueness and caching features.
Taking responsibility as a developer
As a developer working in a company or freelancing, you have a responsibility to advise your client on avoiding technical debt. It can be a bit unpleasant, especially when the client seems confident despite having limited information, but you have to find a way to get the message across. Raise the possibility of rapid expansion and explain how it can be prepared for without breaking the bank.
Of course you should be considerate of the budget and try to offer solutions as creatively as possible. For example, Redis storage is limited to a portion of the server’s memory, and your application begins to slow down if usage exceeds that limit. For your client, that would mean buying more dedicated servers to increase memory. To avoid that, you might want to consider what is really worth storing with Redis. You could use Redis partially to speed up your database operations without actually storing much information there. You can check out this blog post on such use cases: http://oldblog.antirez.com/post/take-advantage-of-redis-adding-it-to-your-stack.html
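One common way to use Redis partially is to cache the results of slow database reads for a short time. Below is a minimal sketch in Python, assuming `r` is a redis-py client and `load_from_db` is a hypothetical function that runs the slow SQL query and returns a JSON-serializable dict. The key name and the 5-minute expiry are illustrative.

```python
import json


def get_post(r, post_id, load_from_db, ttl_seconds=300):
    """Return a post, reading from Redis first and the database second."""
    key = f"cache:post:{post_id}"

    cached = r.get(key)
    if cached is not None:
        # Cache hit: no database query needed.
        return json.loads(cached)

    # Cache miss: run the slow query, then keep the result briefly.
    post = load_from_db(post_id)
    r.setex(key, ttl_seconds, json.dumps(post))
    return post
```

Because every cached entry expires on its own, memory use stays bounded by how much is read within the expiry window. Redis also has `maxmemory` and `maxmemory-policy` settings for capping memory and evicting keys when the cap is reached.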
Side note: The book “The Pragmatic Programmer” by Andrew Hunt and David Thomas explains in Chapter 1 (A Pragmatic Philosophy) how to handle these sorts of communications. Another helpful resource is “Soft Skills: The Software Developer’s Life Manual” by John Sonmez.
Taking responsibility as a business owner
The developers working for you most likely have an idea of what Redis could mean for the stability and expansion of your business over time. Take some time to discuss your options with them and gauge the financial implications yourself. Sometimes developers get too excited about a piece of technology and forget about budget constraints, so it is your responsibility to find common ground.
Implementing Redis can be mostly inexpensive, but for larger or critical systems you might want to plan carefully. Discuss your data needs with your technical team and use that to determine the limits of your Redis implementation. Not everything needs to be stored in Redis.
Closing thoughts
Had the company in question initially used Redis to handle some of their database operations, there would have been no need for the SQL surgery I had to perform. Keep in mind, though, that technical debt is very easy to fall into, and most business and technical minds fall into it at some point. We cannot accurately predict the future but we can at least start somewhere.
There are other things to watch out for including scalability of the actual application instance, email handling in budget-friendly ways, and many more. I intend to talk about these cases one at a time. Until then, have fun in your current project.
Footnotes
[1] https://redis.io/topics/persistence
[2] https://en.wikipedia.org/wiki/Technical_debt
[3] As your application gets larger you might need to scale horizontally (adding more physical servers). The session data could be available on one server and absent on another. Using a Redis database or cluster ensures the data is in a central location. (Here’s how sessions work: https://stackoverflow.com/questions/3804209/what-are-sessions-how-do-they-work)
[4] https://redis.io/topics/data-types
[5] https://en.wikipedia.org/wiki/Unix_time