19th October 2022
An interview with the full-text search engine creator and NearForm Staff Engineer, Michele Riva, about Lyra
At NearForm, open source is at the core of our values. Our team actively contributes to Node.js and supports open source projects such as Fastify, Lyra, Pino and Clinic.js internally. We also use the projects we develop in our client work.
In July Michele Riva, Staff Engineer here at NearForm, was due to give a talk at WeAreDevelopers World Congress about building a full-text search engine in TypeScript. Lyra is a direct result of the talk Michele gave at that conference.
We sat down with Michele to get some more information about Lyra including what it is, why it is relevant, how it works, what types of projects it can enhance and a host of other topics.
Follow along below to learn more about the innovative, and extremely fast, full-text search engine that we call Lyra.
We’ve been seeing a lot of buzz on GitHub and social media about the full-text search engine, Lyra, that you recently released. Can you give us a high-level overview of what Lyra is?
Sure! Lyra is the first full-text search engine capable of running on edge networks.
Edge networks are a hot topic when it comes to high-performance applications. Deploying a full-text search engine, whose main purpose is speed of search, to a network that is faster by design can only drastically improve its performance.
So Lyra is unique in this context, and that is what makes it very different from any other search engine out there.
Also, deploying on an edge network makes maintaining a full-text search engine significantly easier, as the whole environment is controlled and provisioned by your cloud provider, letting us developers concentrate exclusively on the search experience rather than on maintaining and updating the software.
Use Cases for Lyra
So, Lyra is a full-text search engine that can run on an edge network which has been built with performance and speed in mind. This has to have developers salivating at the potential use cases. What do you see as some of the potential use cases for Lyra?
I would say there are many different use cases. We can start by saying that traditional search engines, if you want to deploy them on your own, require a lot of time and expertise.
Elastic is my favorite open source project, and it works great. However, it can be really complicated to maintain, for example, an Elasticsearch cluster. With Elastic, if you have unpredictable traffic, then you have to know how to scale it properly and how to maintain it.
Since Lyra was built to run on edge networks these are non-issues.
We just use Nebula, a build system we are building at NearForm for Lyra, to get an app up and running in just a few seconds: you define a configuration file and use the CLI tool to deploy your instance directly to an edge network such as Cloudflare Workers, AWS Lambda@Edge, Netlify Edge, or any other edge network that Lyra will eventually support.
But the fact that you don’t need a DevOps person to maintain your infrastructure because Lyra runs on an edge network, is only one thing to consider when choosing a search engine.
Lyra is also built to scale automatically. When it comes to unpredictable traffic, Cloudflare Workers and Lambda@Edge are amazing solutions for running spiky applications.
For example, if a company runs a commercial for its product that leads to higher traffic on its website, Lyra would immediately be able to scale virtually infinitely, depending on the edge network you’re working on.
So, if I’m building a website or a mobile app and I need a search function, is Lyra something that I could use?
Yeah, you can definitely use it on the frontend or on the backend as you prefer.
If you have a React Native application, for example, you can easily use Lyra inside of your React Native application. If you have a web application, you can use Lyra directly inside the browser.
Now you’ve talked a lot about edge networks, but are there instances where having Lyra on a server might be ideal?
I would say there are situations where having Lyra on a different server might be a good idea. For example, when there is a very large collection of documents to search through.
So, if you have, let’s say, a website containing 1 billion pages, you might want to index that content on a remote server and then request only the specific data you need.
But, if you have a small collection of data, let’s say 10,000 elements or even 1 million elements, you can keep everything in memory on your browser and Lyra will be able to just run on the browser directly.
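The in-memory model described above can be sketched with a toy inverted index. This is purely illustrative TypeScript, not Lyra’s actual API or internals, but it shows why a small dataset can be searched entirely client-side:

```typescript
// A toy in-memory inverted index: token -> set of document ids.
// Purely illustrative; not Lyra's actual API or implementation.
type Doc = { id: number; text: string };

class TinyIndex {
  private docs = new Map<number, Doc>();
  private index = new Map<string, Set<number>>();

  insert(doc: Doc): void {
    this.docs.set(doc.id, doc);
    // Split on non-word characters and index each token.
    for (const token of doc.text.toLowerCase().split(/\W+/)) {
      if (!token) continue;
      if (!this.index.has(token)) this.index.set(token, new Set());
      this.index.get(token)!.add(doc.id);
    }
  }

  // A single-term lookup is one hash access, independent of corpus size.
  search(term: string): Doc[] {
    const ids = this.index.get(term.toLowerCase());
    if (!ids) return [];
    return [...ids].map((id) => this.docs.get(id)!);
  }
}

const idx = new TinyIndex();
idx.insert({ id: 1, text: "The quick brown fox" });
idx.insert({ id: 2, text: "A lazy dog" });
console.log(idx.search("quick").map((d) => d.id)); // logs [ 1 ]
```

A real engine adds tokenization rules, stemming, ranking and index compression on top of this core idea, but the memory footprint for tens of thousands of documents stays well within what a browser can hold.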
Lyra in Action
You mentioned earlier that it is easy to deploy Lyra in a project. Do you have any examples?
Absolutely. My colleague here at NearForm, Paolo Insogna, created a little demonstration of deploying Lyra to an edge network in about 5 seconds.
⏱ How long does it take to deploy a full-text search engine to an edge network?
⚡️ What if we say around 5 seconds?
In this preview made by @p_insogna, we’re deploying a local copy of a Pokédex to @Cloudflare Workers using Nebula, the build system for Lyra. #opensource pic.twitter.com/USPEDt1oqf
— NearForm Open Source Releases (@NearFormOSS) August 25, 2022
Yes, now that you mention it I do remember seeing that tweet. That reminds me, I saw another tweet about running Lyra on Cloudflare Workers. Can you explain what that tweet was about?
Yes, I can describe that for you. Basically, Cloudflare Workers allow for 10 milliseconds of CPU time on their free plan, and we typically resolve a search result in half of a millisecond.
So there is a lot of unused CPU time. We also close the connection even before the time limit is up.
— NearForm Open Source Releases (@NearFormOSS) August 22, 2022
That is some serious speed! Can you explain a little more about how you are able to achieve this on an edge network?
Absolutely. One thing to consider is that Lyra running on an edge network has low latency: you’re connecting to a server that is physically closer to you, so requests travel a shorter distance to reach the server than with some of the other search engines available. And that’s because, again, we run on an edge network, so it’s faster by design.
Also, a single query on a small data set is not very impressive on its own, but when we run a very large number of queries on a very large dataset, that is where Lyra wins against everyone. That’s because you are never running the same query on the same worker: by design, an edge network hands each request to a different worker.
So, there’s no concurrent access to server resources, which means performance stays consistent; even with high spikes in traffic, your performance remains untouched.
Advantages of Lyra
That makes sense: faster results with low latency and individual workers for each query. I could see this being a game changer for a company that sells products online. What are some other advantages Lyra provides for a business that is building an application?
I would say the savings on the capacity it takes to maintain and deploy a full-text search engine. That’s one thing you want to keep in mind.
And also, you don’t have to worry about scaling when you have increased and unpredictable traffic. This is a common problem with more traditional search engines that are self-hosted. There are search engines that host the application for you, so you don’t have to worry about it, but that can become really costly, so it’s a trade-off.
With Lyra, it’s free and open source; you can deploy whatever you want with very little expense. It’s really cheap to run Lyra on AWS, Netlify or Cloudflare, and you also save a lot of time that can be dedicated to building your application rather than maintaining your tech stack.
And also, from a business perspective, you only have to learn Lyra once and you can use it everywhere. So again, if you deploy React Native applications, for example, you just use Lyra. You already know how it works. You already know how to interface with the search engine, and you have the same interface on the web.
You have the same interface from any system that wants to interact with Lyra. And Lyra runs everywhere, so you only learn once and use it everywhere. Apart from that, we have a lot of ideas on expanding Lyra and giving it extra features. So, there are many reasons why you may want to stay tuned on this one.
History of Lyra
That is fascinating, and very well articulated. So now that we’ve covered what it is, how it works and when to use it I’ve heard there is an interesting story behind how Lyra came to be. Can you tell us what made you dive into the world of full-text search engines?
Yeah. So, I started working on Lyra in July 2022, and it was for a talk at WeAreDevelopers World Congress, which is the biggest developer conference in the world with 8,000 developers altogether in Berlin. My talk got accepted, and I had to talk about how to build a full-text search engine from scratch in TypeScript.
I wanted to give a talk about algorithms and data structures, and full-text search engines are a beautiful starting point if you want to learn how algorithms and data structures work. So, I built Lyra as a demo for my talk. And eventually, my manager told me, “Okay, you should definitely open source this. Why not? I mean, it’s working, so you can publish it.”
So I decided to publish it on npm. When we hit our first stable release, on the 2nd of August if I remember correctly, we reached 2,000 stars on GitHub in two days and 3,000 stars in one week, and since then we have had thousands of downloads on npm. So, I would say it’s getting a lot of traction.
We’ve already made community-first releases, where the community provides fixes and new features to Lyra, and we’ve already made two or three releases where the core team hasn’t worked on any of the stuff we released, which is crazy for a young project such as Lyra.
Well, it sounds like you got a successful demo for your talk! So, do you think that because you built this as an example for your talk, rather than a tool for production, maybe you didn’t overanalyze it too much? Do you think that affected your approach to building it at all?
When I built it, I built it with speed in mind, and that’s because one of the goals I had for my talk was to demonstrate that there is no slow programming language, just bad algorithms and data structures design.
That’s interesting. I know it’s already lightning fast, and I know you’re a performance guy, so are you already thinking of ways to make it faster?
Right now, I’m not thinking about making it faster, but about making it more optimized. The data structure we are using to perform searches is quite optimized, but it could be even better. So, we are experimenting with different data structures to compress the index size even more, which means faster searches, faster deploys and lower storage costs for your data.
So, I only see benefits, except for the fact that it’s incredibly painful to develop such complicated data structures, but that’s part of the game. And it’s something we definitely want to look into.
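As a hedged illustration of the kind of data-structure experiment described above: a prefix trie deduplicates terms by shared prefixes, and a radix tree compresses single-child chains further to shrink the index. The sketch below is a plain trie for illustration only, not Lyra’s internal structure:

```typescript
// A minimal prefix trie mapping indexed words to document ids.
// Illustrative only; not Lyra's actual index implementation.
class TrieNode {
  children = new Map<string, TrieNode>();
  ids = new Set<number>(); // documents whose word ends at this node
}

class Trie {
  private root = new TrieNode();

  insert(word: string, docId: number): void {
    let node = this.root;
    for (const ch of word) {
      if (!node.children.has(ch)) node.children.set(ch, new TrieNode());
      node = node.children.get(ch)!;
    }
    node.ids.add(docId);
  }

  // Collect all document ids for words starting with `prefix`.
  searchPrefix(prefix: string): number[] {
    let node = this.root;
    for (const ch of prefix) {
      const next = node.children.get(ch);
      if (!next) return [];
      node = next;
    }
    const out: number[] = [];
    const walk = (n: TrieNode): void => {
      for (const id of n.ids) out.push(id);
      for (const child of n.children.values()) walk(child);
    };
    walk(node);
    return out;
  }
}
```

Shared prefixes are stored once, so “search”, “searches” and “searching” cost little more than “search” alone; collapsing single-child chains into radix-tree edges reduces the node count further, which is the kind of compression the interview alludes to.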
Get Involved with Lyra
It’s amazing to hear how it came about and see the traction it has already gained. With all of this attention, I imagine there are developers who are already jumping in to help expand it. How can somebody get involved with Lyra?
So, in Lyra, we have issues that are marked as good first issues. We always try to make sure there is something for people to pick up as a good first issue and work on.
I recently sent out a tweet that basically said, “Okay, I have this issue. I need it addressed before this evening, if there’s anyone interested in working on it. It’s easy, and it could be a good way for you to onboard onto the project. I will keep this open so that the community can be involved in Lyra. If no one picks it up, I will take it. But you have the opportunity.”
Within 10 minutes we got a PR fixing the problem. Apart from that, Lyra has been built using very standard data structures and algorithms.
Right now it’s really easy for people to understand what’s going on by looking at the source code. And we documented the source code very well, so that everyone can start reading the code and contributing freely to Lyra.
So where can we learn more about Lyra?
Okay. I would say you should definitely check out the website and GitHub repo, and follow Paolo (@p_insogna) and me (@MicheleRivaCode) on Twitter, because we are the people, right now, who are leading the efforts toward a better Lyra.
We are growing Lyra from the inside. We are part of the core team. The whole developer experience team at NearForm is actually involved in Lyra but Paolo and I are the main ones working on it.
Also follow NearForm (@NearForm, @NearFormOSS), because NearForm is the source of news when it comes to the products we develop internally. And I am confident that in the near future there will be some very big news about Lyra, because we are investing a lot in it, and it shows.
We have a lot of ideas on how to extend the product even more. So, let’s keep in touch on Twitter.
I understand that it is a young project and you’ve explained very well, here, some different use cases and why it is so fast. What would you recommend for someone who is itching to move off one of the older search engines and would like to give Lyra a shot?
Okay. First of all, these other search engines are older projects, which means they are more mature and evolved. So, you can’t yet do everything with Lyra today that you can do with them.
We are planning to support almost every feature that Elastic supports right now in the future. But as for now, it would be reckless to say that you can start using Lyra as a replacement.
There are specific use cases where you can already start using Lyra. We are building the most essential features that are missing, such as a query engine that would let you query, for example, all the titles of movies released between one date and another. This is something you can’t do today: you can only search by text, not by dates, but it’s something we are working on and will support. So, right now, I wouldn’t recommend making the switch, but once we provide these features, just do it. It’s not a problem.
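Until a query engine lands, one possible workaround (an assumption for illustration, not an official Lyra feature) is to filter text-search hits by a date field in application code. The `Hit` shape below is hypothetical, not Lyra’s real result type:

```typescript
// Hypothetical post-search date filter; the `Hit` shape is assumed
// for illustration and is not Lyra's actual result type.
type Hit = { id: string; title: string; releaseDate: string }; // ISO 8601 dates

function filterByDateRange(hits: Hit[], from: string, to: string): Hit[] {
  // ISO 8601 date strings compare lexicographically in chronological order,
  // so plain string comparison is enough here.
  return hits.filter((h) => h.releaseDate >= from && h.releaseDate <= to);
}
```

This keeps date logic out of the index entirely, at the cost of fetching more hits than needed, which is exactly the gap a native query engine would close.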
You mentioned that you were planning on extending it and adding features to give it the capability to compete with some of the other search engines out there. Can you expand on that a little bit?
Sure! We are building a plugin system similar to how Fastify works, which means that we will keep the core as small as possible, and everyone can add their own features on top of it.
So, if you really need a feature, you can develop it yourself using the plugin system we are building right now. And if you want to, share it; otherwise, keep it private, it’s not a problem for us. We are also working on a roadmap for supporting the essential features that will allow us to compete with other search engines.
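A Fastify-style plugin system generally means a minimal core that plugins extend through a registration hook. The sketch below illustrates that general pattern only; none of these names belong to Lyra’s real plugin API, which was still being designed at the time of this interview:

```typescript
// Hypothetical sketch of a Fastify-style plugin pattern: a tiny core
// plus a register hook. These names are not Lyra's real plugin API.
type Plugin<T> = (core: T) => void;

class SearchCore {
  private termHooks: Array<(term: string) => string> = [];

  // Register a plugin, Fastify-style: the plugin receives the core
  // and attaches whatever behavior it provides.
  use(plugin: Plugin<SearchCore>): this {
    plugin(this);
    return this;
  }

  addTermHook(fn: (term: string) => string): void {
    this.termHooks.push(fn);
  }

  // Run every registered hook over an incoming query term.
  normalize(term: string): string {
    return this.termHooks.reduce((t, fn) => fn(t), term);
  }
}

// Example plugin: lowercase every query term before searching.
const lowercasePlugin: Plugin<SearchCore> = (core) =>
  core.addTermHook((t) => t.toLowerCase());

const core = new SearchCore().use(lowercasePlugin);
```

The design choice mirrors what the interview describes: the core stays small, and features like stemming or tokenizers ship as plugins that users can publish or keep private.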
Michele, this has been extremely interesting to sit and learn more about Lyra and why it is causing such a stir. I just have one more question. Where does the name come from?
Okay, yeah, that’s a cool story. Lyra is a constellation, and the idea of Lyra as a database is to be a distributed database, a distributed system. When you look at it in a diagram, it looks like a constellation where each node is a star. So, the name Lyra carries that distribution property intrinsically. This is why I chose Lyra: it’s a constellation of stars that gets deployed with every single deployment.