Building a Leetcode Contest Percentile Analyser

Motivation for doing this

I got into practising DSAs daily back in 2020 when I was preparing to job hunt -- shortly after resigning from my previous job with nothing squared up.

I went through Stanford's course on DSAs on coursera and loved every bit of it. Then I walked through about 3 personally picked problems each day for maybe a month or two, keeping problem notes on Github, and not really tracking any metrics but focusing on truly understanding what I was doing.

However the first time I put that practice to the test with a Microsoft code assessment I came in at the bottom 36^th percentile. It was really disheartening looking at those problems I could solve, especially the graph problems, but knowing I wasn't moving fast enough to solve them optimally and correctly.

Anyway, shortly after that I got an offer for a fully remote role which I took so I never really bothered to reflect and improve on my practice process. Until now.

https://twitter.com/dozieokk/status/1479391181683539973?s=20

Since my tweet I've taken part in one contest and have been wondering how best to get back into practice. Do I refresh my theory first? Which platform should I use? What language do I use? e.t.c e.t.c. But most importantly, HOW do I practice -- what do I track?

Well yesterday I finalized on tracking speed of completion with the goal being to get into the top 10 percent (90th percentile) on Leetcode contests. And as with any competitive sport I need to constantly know where I stand against other players so I can appropriately track progress.

Unfortunately leetcode holds contests at most only once a week, so I decided to use past contests problems, time myself, and see where I would have placed based on my completion time if I had taken part.

Why not just do this manually?

Leetcode already ranks competitors, and it'd be super easy to just find out what rank falls within the top 10% within each competition and look at how long that person took to compare against my completion time. But for at least once in my life I wanted to be like every developer ever:

Spend 10 hours automating a task that will take 1 min to do manually.

And very importantly write about it 🤡 . (I recently joined an accountability group where we each have to write 1000 words a day and I need stuff to write about 😭 😭 )

Initial attempt

Research Findings

Leetcode returns contest data in pages of 25 results each. Getting all this data requires paginating through hundreds of pages.

I noticed the get request they use to get this page data doesn't require authentication, but it does have pagination enforced, so I decided I'd start with simply pulling in data from all pages of a contest.

Package Structure

I opted to create a simple yarn package and pull in axios and typescript, while writing my output to a local file.

Build and run process

I initially ran the script using two commands:

> yarn tsc app.ts
> node app.js <file-name>

but I quickly got tired of that and combined these into one yarn start command:

...
  "scripts": {
    "compile": "tsc app.ts",
    "start": "yarn compile && node app.js"
  },
...

So now I just run yarn start.

First draft of data fetching

The idea here was simple: while we haven't found a user who didn't make a submission, go through each contest page starting from the first one and push to an array containing all user's records with the records found on each new page.

const axios = require('axios');
const fs = require('fs');

interface ranker {
  rank: string;
  finish_time: number;
}
const competitors: ranker[] = [];

async function getUsers(url: string): Promise<ranker[]> {
  try {
    const response = (await axios.get(url)) as { data: { total_rank: ranker[] } };
    return response && response.data && response.data.total_rank;
  } catch (error) {
    console.error(error);
  }
}

async function buildCompetitors() {
  let res: ranker[];
  let page = 0;
  do {
    res = await getUsers(
      `https://leetcode.com/contest/api/ranking/weekly-contest-276/?pagination=${page}&region=global`,
    );
    competitors.push(...res);
    console.log(`https://leetcode.com/contest/api/ranking/weekly-contest-276/?pagination=${page}&region=global`);
    console.log(process.argv[2]);
    fs.writeFile(process.argv[2], JSON.stringify(competitors), (err) => {
      if (err) throw err;
      console.log('file saved');
    });
    page++;
  } while (res.find((val) => val.finish_time === 0) === undefined);
}

async function main() {
  await buildCompetitors();
}

main();

I let this run and went to bed, then came back to an error informing me that finish_time never gets to zero.

Okay fine, the error wasn't THAT polite:

(node:52369) UnhandledPromiseRejectionWarning: TypeError: Cannot read property 'find' of undefined
    at /Users/nnadozieokeke/leetcode-analysis/app.js:82:29
    at step (/Users/nnadozieokeke/leetcode-analysis/app.js:32:23)
    at Object.next (/Users/nnadozieokeke/leetcode-analysis/app.js:13:53)
...

but looking at it, it's really straightforward to see that res is only undefined if we've exhausted all possible objects, therefore the while condition is never true.

Figuring out why finish_time is never 0

Initially, when inspecting Leetcode's network requests I noticed the objects in each page had a finish_time property, and I just assumed this will tend to 0 and that partly informed my initial approach. But as I just stated above, it doesn't.

And try as I might, I couldn't figure out how to parse a single finish_time value to the table's corresponding Finish Time. Curious to know if anyone figures this out (Please comment). One thing I noticed is that the finish_time values increment as though counting seconds and this helped me change tact.

Rather than decode the time format Leetcode uses, I decided I'd find the value corresponding to a Finish Time of 00:00:00, then use that to convert other values to HH:MM:SS format by simply finding the difference in values.

Finding the finish_time value corresponding to 00:00:00

This value is typically on the first page with an object with score value === 0. So this reduces the problem to a search problem: find the first page with a score value of 0.

Of course we could search each page one at a time, but anyone into DSAs knows there're faster non O(n) algorithms for this problem, and I chose to use a recursive divide and conquer approach.

async function findLastPage(page = 0, step = 100): Promise<number> {
  let res: ranker[];
  page = page === 0 ? step : page;
  if (step === 0) return page + 1;
  console.log(`Searching from page: ${page}`, `In steps of: ${step}`);

  while ((res && res.find((val) => val.score === 0)) === undefined) {
    res = await getCompetitors(
      `https://leetcode.com/contest/api/ranking/biweekly-contest-69/?pagination=${page}&region=global`,
    );
    page += step;
  }

  return findLastPage(page - step * 2, Math.trunc(step / 2));
}

Using this, a search for biweekly-contest-69 gives a last page of 397 which is correct. Logs:

$ tsc app.ts
> Searching from page: 100 In steps of: 100
> Searching from page: 300 In steps of: 50
> Searching from page: 350 In steps of: 25
> Searching from page: 375 In steps of: 12
> Searching from page: 387 In steps of: 6
> Searching from page: 393 In steps of: 3
> Searching from page: 396 In steps of: 1
> 398

To be honest, I'm not sure if it's always correct because I really just went with the flow using feedback from my logs, but spot checks against 5 contests found the right page so I'm going with it.

Completing the percentile analyser

Alright, at this point with a working search function I felt I had the main tool I needed to get the insight I was looking for. In pseudo code here's the plan I followed:

Find the page with the first score of zero ( I assume all people who didn't hand in a solution with scores of 0 did not participate)
Retrieve the last place user's object to get the rank number and base finish_time. Here's what that looks like:

User in last place: { contest_id: 567,
  username: '_yash1_',
  username_color: null,
  user_badge: null,
  user_slug: '_yash1_',
  country_code: '',
  country_name: null,
  rank: 8633,
  score: 0,
  finish_time: 1632018600,
  global_ranking: 0,
  data_region: 'US' }

I use the finish time here as the reference point for the competition, and find the difference of all other times from this point to determine how long other users took.

I also use their rank to calculate on demand what ranks will fall within the top 10, 20, 30th... percentiles.

When I know what rank to look for depending on the percentile I'm interested in, I do another search for the object containing that rank.
Once that object is found, based on the time the user took, I'm able to tell what finish_time would have been required to fall within that rank's percentile.

The final code includes a bit more functionality, and it's available on github for anyone to play around with. I thought about adding a UI but tbh, I really just want to focus on increasing my percentile ranking right now so I've shelved that idea for now.

Thoughts and Optimizations

First thought is PLEASE DON'T judge my spaghetti code as a reflection of how I code for prod 😭 😭.

In terms of optimizations, it'll be much faster to scrape contest data and cache it for O(1) look ups. If I do create a UI this is the approach I'll take. Of course that'll require jobs that periodically scrape the leetcode site each time a contest is out. Unfortunately this added complexity is the reason I've shelved putting up a UI.

Other thoughts: I wish leetcode made this data available as an api for free -- I really only see it benefitting their community if developers can create on top of what they've built.

Repo Link

Leetcode Percentile Analyzer