Tutorial | Feb. 17, 2021

Rate-Limiting Netlify or Vercel Functions Without Overhead

Netlify and Vercel Functions are a great way to get started with serverless, but how can we prevent someone from spamming them?

Serverless Functions are definitely one of my favorite techs to play with. From contact forms to Slack bots, you can really build a lot of things with them.

If you are new to this concept, a serverless function is like an Express endpoint handler. It lives at a given endpoint (e.g. /api/hello-world), so it can be called. When called, it receives a request. Your job, as a developer, is to handle that request in order to send back a response.

You can check out this playground from Netlify to get a quick feel of what it's like to write one!

So, What's the Deal With Serverless Functions?

With serverless functions we can build APIs: one or more endpoints for our website to call. In reality though, anyone can call them, with cURL or any other programmatic means. That's why with serverless functions, just like with any API, we want to introduce some sort of security layer. This layer helps us address two things:

  1. Usage of our service: defining under what terms our API can be used;
  2. Cost management, especially with things that can scale on their own like serverless functions.

There are multiple locations where we can take action on those:

  • On the front-end itself: If applicable, validating forms there first can prevent our users from triggering our serverless functions with invalid data. We should consider at least making use of built-in form validation;
  • On everything between the Internet and our functions: There's not much we can do here on Netlify or Vercel. However, when using serverless functions directly from their provider (AWS, Azure, GCP, etc.) there are usually services or configurations we can fine-tune for that purpose;
  • On the function itself: Here we can roll a lot of magic, from authentication to rate limiting~

In this post, we'll focus on rate-limiting our serverless functions with no overhead. Rate-limiting is about putting a limit on how many times a user can access a resource (a function) over a given period of time. For example, we can say that a user is only allowed to submit a contact form once every 10 minutes. It's a really handy method to prevent users from spamming our services.

Storing Things in Serverless Functions

Usually, when it comes to rate-limiting, we want to do it before the request reaches our function. This addresses our two goals from above, as the function won't be invoked at all when a user gets rate-limited. Sadly, as we saw earlier, that's not possible on Netlify or Vercel. Instead, we have to do it in the function directly. Doing it there still nicely addresses our service-usage goal, and it also helps with cost management by reducing the function's run time.

To have some rate-limiting going on, we need a place to store a history of previous calls. This history will let us know when a user has hit their limit for the period, therefore allowing us to rate-limit them. As announced earlier, we want to do this without involving other services, so we'll have to store this history on the function.

But we cannot store anything in a function... or can we?

That's a nice question! By design, a function is not meant to store any data. However, functions run within what's called an instance: a runtime environment within which our function is executed. The interesting thing about instances is that they persist across function executions.

To understand that better, let's look at some code, shall we? We'll use Netlify/AWS function format for Node.js, but we'll have snippets for Netlify and Vercel when needed, no worries~

// A dummy package, imported to illustrate the point
const foo = require("foo");

exports.handler = async (event, context) => {
  await foo.bar();

  return { statusCode: 200 };
};

Here we have a dummy function. It imports a package foo, then its handler awaits foo.bar to perform something. If successful, it will return an empty 200 OK response; if foo.bar throws, the platform will answer with a 500 Internal Server Error instead. Lovely!

In fact, we can split this function into two parts. The first one is everything outside our exported handler function. This part will only run once: when our instance gets spawned (we refer to this as a cold start). The second part is, as you guessed, everything inside our exported handler function. Everything in it runs on each function call.

Knowing that, let's do a little experiment:

let count = 0;

exports.handler = async (event, context) => {
  count++;

  return {
    statusCode: 200,
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ count })
  };
};

So... we have a counter that is initialized at 0 outside our handler. Inside the handler function, we increment it, and we respond with the counter value. Let's call our function once. What is the response that we get?

{ "count": 1 }

Great, let's call it another time now:

{ "count": 2 }

Mindblowing! We managed to store something across multiple function calls.

Don't believe it? Try it yourself! I set up a demo on Netlify and on Vercel. If you're not the only one calling these functions, the counter might not start at 0; nonetheless you should see it increment refresh after refresh~

Now that we're aware of this serverless function hack, I need to put on a little disclaimer before going further:

The value of this counter lives at the instance level. Essentially, we are using the instance's memory (RAM) as a "database". This works well, but we have no control over whether the instance is kept alive or terminated; that's entirely up to the cloud provider. To give an idea, most cloud providers terminate function instances after 45-60 minutes of idle time, and it can happen much earlier if the provider wants to free up resources. Once the instance is terminated, our little in-memory database gets wiped. In the example above, our counter would reset to 0.

That's not everything. Cloud providers can also spawn multiple instances of the same function. This means that distinct instances won't share the same in-memory database. Although, here again, cloud providers will try their best to optimize their resources. They will spawn multiple instances only when it's really needed. That's good for us as we can assume that in most cases there will only be one instance running.

Alright, having the above in mind, let's code something with it!

In-Memory Rate-Limiting

We'll have a look at two implementations. A basic one to understand how it is working, and a more advanced one using a little package designed for it. Again, what we will learn here will be valid for both Netlify and Vercel functions.

Basic Implementation

Let's consider a function that is in charge of handling contact form submissions:

exports.handler = async (event, context) => {
  // A hypothetical function we built
  await forwardContactFormToContactMail(event.body);

  return { statusCode: 200 };
};

For this basic implementation, we want to limit users to submitting at most one contact form every 10 minutes. To achieve that, we need a way to recognize a user from one call to another. Since we aren't authenticating them, we're not left with much other than their IP address. Thankfully, this is something the function providers expose in the request headers. Here is how we can access the IP inside our handler:

// Getting client IP on Netlify...
const ip = event.headers["client-ip"]; // e.g. 198.51.100.32
// ...or on Vercel
const ip = req.headers["x-real-ip"]; // e.g. 198.51.100.32

Awesome! We now have a way to know who's calling. The next thing we need is a history of who has called us, so we know when someone is calling too often. Here we will leverage the little in-memory database we discovered earlier to store when someone last called us:

const history = {};

exports.handler = async (event, context) => {
  history[event.headers["client-ip"]] = Date.now();

  // A hypothetical function we built
  await forwardContactFormToContactMail(event.body);

  return { statusCode: 200 };
};

And now that we are maintaining a history of who has been calling us, we can simply check if their last call is old enough before proceeding. If it is not old enough, we will return a standard 429 Too Many Requests response:

const history = {};

exports.handler = async (event, context) => {
  // If last call arrived later than "now - 10 minutes", rate-limit user
  if (history[event.headers["client-ip"]] > Date.now() - 10 * 60 * 1000) {
    return { statusCode: 429 };
  }
  history[event.headers["client-ip"]] = Date.now();

  // A hypothetical function we built
  await forwardContactFormToContactMail(event.body);

  return { statusCode: 200 };
};

And voilà! We have limited our users to submitting at most one contact form every 10 minutes. Let's refactor our code a bit so it gets more readable and reusable:

const history = {};

const rateLimit = (ip, timeout = 60 * 1000) => {
  if (history[ip] > Date.now() - timeout) {
    throw new Error("Rate Limit Exceeded");
  }
  history[ip] = Date.now();
};

exports.handler = async (event, context) => {
  try {
    rateLimit(event.headers["client-ip"], 10 * 60 * 1000);
  } catch (error) {
    return { statusCode: 429 }; // Still returning a basic 429, but we could do anything~
  }

  // A hypothetical function we built
  await forwardContactFormToContactMail(event.body);

  return { statusCode: 200 };
};

And from here it's up to us to move both history and rateLimit into their own file. Doing so allows us to import rateLimit and use it in every handler we have.

Like with the previous segment, I put two examples of this implementation online, one on Netlify, and one on Vercel. When accessing them for the first time, you should see a "Hello World" with a 200 OK. Refreshing will get you a 429 Too Many Requests until a minute has passed since your first call.

A small note on GDPR before we continue, since we're relying on an IP address to recognize our users: to me, it isn't an issue to store IPs in something as ephemeral as memory. If it is an issue for you, you can still hash the IP before storing it there.

Alright, moving on!

Using lambda-rate-limiter

So far we have built a fairly naive rate-limiting strategy. It works great for contact forms, but what if we are building a public service that provides information on the $GME stock? Here it may be more convenient to allow our users to call our service up to 10 times per minute. We could enforce one call every 6 seconds instead, but allowing 10 calls per minute is usually friendlier: it lets our users decide how to spend their call budget within the interval.

To achieve something like that let's install lambda-rate-limiter, a well-maintained package that does just that:

$ npm install lambda-rate-limiter
# yarn counterpart
$ yarn add lambda-rate-limiter

And now we can use it inside our handler:

const rateLimit = require("lambda-rate-limiter")({
  interval: 60 * 1000 // Our rate-limit interval, one minute
}).check;

exports.handler = async (event, context) => {
  try {
    // 10 stands for the maximum amount of requests allowed during the defined interval
    // rateLimit now returns a promise, let's await it! (◕‿◕✿)
    await rateLimit(10, event.headers["client-ip"]);
  } catch (error) {
    return { statusCode: 429 }; // Still returning a basic 429, but we could do anything~
  }

  // An hypothetical function we built
  await getGmeStockInformation();

  return { statusCode: 200 };
};

That's already it! But for the sake of understanding what this new package is doing, let's do something I really enjoy: peeking at its source code on GitHub~

This package's repository might look a bit daunting at first, but all its code lives inside a single file, in a few lines, at src/index.js:

const LRU = require('lru-cache');

module.exports = (options) => {
  const tokenCache = new LRU({
    max: parseInt(options.uniqueTokenPerInterval || 500, 10),
    maxAge: parseInt(options.interval || 60000, 10)
  });

  return {
    check: (limit, token) => new Promise((resolve, reject) => {
      const tokenCount = tokenCache.get(token) || [0];
      if (tokenCount[0] === 0) {
        tokenCache.set(token, tokenCount);
      }
      tokenCount[0] += 1;
      return (tokenCount[0] <= parseInt(limit, 10) ? resolve : reject)(tokenCount[0]);
    })
  };
};

From that code, we can see two interesting things going on here:

  1. The first one is that it uses an LRU cache instead of a plain object like we did. "LRU" stands for "Least Recently Used"; it's a common caching strategy that evicts the least recently used items once the cache hits its size limit. What's interesting about using a proper caching library is that it also gives us, for free, the ability to set a maxAge on cached items. With that, we don't have to manage intervals on our own anymore!
  2. The second is inside the returned check function. It does something similar to what we did earlier to maintain a history, except it now also stores the number of calls received from a token (an IP in our case). A subtlety here is that this number is stored inside an array. That may look odd at first, but it's in fact a smart trick to store a reference instead of a value: it allows the check function to mutate the cached value without overriding the key with a new one, and therefore without resetting its age.
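
That reference trick can be illustrated with a plain Map standing in for the LRU cache (the check function below mirrors the package's logic, minus expiry):

```javascript
// A plain Map stands in for the LRU cache here, so there is no expiry;
// the point is only to show the array-reference trick.
const cache = new Map();

const check = (limit, token) => {
  const tokenCount = cache.get(token) || [0];
  if (tokenCount[0] === 0) {
    cache.set(token, tokenCount); // set() is called once, on creation
  }
  tokenCount[0] += 1; // later calls mutate the same array in place
  return tokenCount[0] <= limit;
};
```

Because `cache.set` only ever runs on the first call for a token, a real LRU cache never sees the entry rewritten, so its age keeps counting down from the first call.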

We learned a few things peeking at this package's source code (at least I did)! On top of that, we're now able to apply some fancier rate-limiting on our function~

One last time, here are two examples with this implementation, one on Netlify, and one on Vercel. When accessing them you should see a "Hello World" with a 200 OK and your current usage rate. Hitting your rate limit of 5 per minute will throw you a 429 Too Many Requests until the interval has passed.

Going Beyond...

This is where the post would go from advanced to too advanced, haha. Congratulations if you made it this far! I hope it was clear and that you learned some interesting things~

As warned earlier, this rate-limiting strategy has its pros and cons. It adds no external service to our stack, so, in my opinion, it's definitely worth implementing. However, at high scale it tends to behave more like a DDoS countermeasure than a proper rate-limiter, due to the in-memory store we're relying on here. Implementing a more "enterprise-ready" rate-limiter for serverless functions is definitely doable on Netlify and Vercel, though! It would involve a specialized caching database such as Redis. Let me know if you'd be interested in a part 2 of this blog post where we dig into that option!

In the meantime, I'm more than happy to keep the discussion going on Twitter, thanks for reading!
