
I've been looking at ranking algorithms recently, specifically those used by Reddit and Hacker News. The algorithms themselves are simple enough, but I don't quite understand how they are used.

One thing that I could do is implement the algorithm straight in SQL, so that every time a user goes to a page displaying ranked posts, something like this would run: SELECT thing1, thing2 FROM table

There are several similar questions on SO, but the only answer given is to put the ranking algorithm inside the SQL query. Putting the algorithm in the SQL query is fine on a smaller scale, but what if the website has a large number of users and a very large number of posts? That means that every time any user opens a page that displays ranked posts, that query will be run. Now, Reddit and Hacker News don't run their ranking algorithms as SQL queries, but in Python and Arc respectively.

One possible solution is to take all the relevant info from every post and store it in some data structure on the web server. Then rank and sort this data structure. Every time someone opens a page that shows the ranked posts, you just go to the data structure, retrieve the correct range of posts, and display them. Then every half hour or so, you retrieve the most up-to-date information from the database, rank it, sort it, and update the data structure.

The disadvantage is that you need to have a duplicate of a large chunk of your database. The advantage is that your database is being hit (for the expensive ranking query) only once every half hour. Other less expensive queries, such as retrieving and displaying all the info from a specific post, or displaying the newest posts (as opposed to the best scored), could be done in SQL every time the relevant page is opened.
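
To make the in-memory idea concrete, here is a minimal Python sketch of what I have in mind: a sorted copy of the posts kept on the web server, re-ranked at a fixed interval, with pages served as slices of it. Everything named here is an assumption for illustration: `Post`, `RankedPostCache`, and `fetch_posts` (a stand-in for whatever function runs the SQL query) are hypothetical. The `score` function is only a placeholder time-decay formula in the style commonly attributed to Hacker News, not the actual code of either site.

```python
import time
from dataclasses import dataclass


@dataclass
class Post:
    post_id: int
    votes: int
    created_at: float  # Unix timestamp in seconds


def score(post, now, gravity=1.8):
    """Placeholder time-decay score (an assumption); swap in any ranking formula."""
    age_hours = max(now - post.created_at, 0.0) / 3600.0
    return (post.votes - 1) / (age_hours + 2) ** gravity


class RankedPostCache:
    """Keeps a pre-sorted copy of the posts and re-ranks it on a fixed interval."""

    def __init__(self, fetch_posts, refresh_seconds=1800, clock=time.time):
        self._fetch_posts = fetch_posts  # hypothetical callable that hits the database
        self._refresh_seconds = refresh_seconds
        self._clock = clock
        self._ranked = []
        self._last_refresh = None

    def refresh(self):
        """Pull fresh data, score every post, and store the sorted result."""
        now = self._clock()
        posts = self._fetch_posts()
        self._ranked = sorted(posts, key=lambda p: score(p, now), reverse=True)
        self._last_refresh = now

    def page(self, page_number, page_size=20):
        """Return one page of ranked posts, refreshing only if the copy is stale."""
        now = self._clock()
        stale = (
            self._last_refresh is None
            or now - self._last_refresh >= self._refresh_seconds
        )
        if stale:
            self.refresh()
        start = page_number * page_size
        return self._ranked[start:start + page_size]
```

With this shape, a page view only slices a list, and the database is queried for ranking data at most once per `refresh_seconds`. In a real multi-threaded or multi-process web server the refresh would probably need a lock or a separate background job, which this sketch leaves out.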
