Aoimirai - How the K-Pop bots work

To keep the site updated and all data flowing, Aoimirai employs three bots.

The first (main) bot runs almost every minute and fetches views and likes for YouTube videos. It cannot run faster because YouTube detects fast bots and throttles them, which leaves the bot basically useless until YouTube lets it back in. The current rate is 30 to 60 hits per hour.

The bot has a layered priority list. Recent videos are updated every 12 hours. Next, several categories each get an allotted time slot (to guarantee they are always kept updated): the top videos for male groups, female groups, male solos, female solos and coed groups, plus videos with high daily views. Because each slot is limited, not every video in a category is necessarily updated within each 24-hour cycle. Once every priority slot has run, the bot updates the least recently updated videos, which are usually old videos with low daily views. The daily view counter is calculated from the difference in views between two updates, so every video has a daily view value, whether it is updated every 12 hours or every 3 days. With the current priority system it takes about 4 days to update all videos. Without priority lists it would take only 3 days, but then even high-demand or high-daily-view videos would take too long to be updated.
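The daily view calculation described above can be sketched in Python as follows. This is a minimal illustration, not the site's actual code: the `Snapshot` type, the `daily_views` function and the sample numbers are all made up for the example. It also assumes the view difference is scaled by the time elapsed between the two updates, which would explain why the update interval does not matter.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Snapshot:
    """One fetch of a video's counters (hypothetical structure)."""
    fetched_at: datetime
    views: int


def daily_views(previous: Snapshot, current: Snapshot) -> float:
    """Estimate views per day from two snapshots of the same video.

    Assumption: the view difference is divided by the elapsed time in days,
    so a 12-hour gap and a 3-day gap both yield a per-day figure.
    """
    elapsed = current.fetched_at - previous.fetched_at
    if elapsed <= timedelta(0):
        raise ValueError("current snapshot must be newer than the previous one")
    elapsed_days = elapsed / timedelta(days=1)
    return (current.views - previous.views) / elapsed_days


if __name__ == "__main__":
    start = datetime(2019, 1, 1)
    # A video updated again after 12 hours, gaining 5,000 views.
    fast = daily_views(Snapshot(start, 100_000),
                       Snapshot(start + timedelta(hours=12), 105_000))
    # A video updated again after 3 days, gaining 300 views.
    slow = daily_views(Snapshot(start, 2_000),
                       Snapshot(start + timedelta(days=3), 2_300))
    print(fast, slow)  # 10000.0 100.0
```

Both videos end up with a comparable per-day number even though one was refreshed six times more often than the other.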

This bot also reports problems fetching data, which sometimes means a video was deleted.
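To make the layered priority of the main bot more concrete, here is a rough Python sketch of how the next video to fetch could be chosen. Everything in it is an assumption made for illustration: the `Video` fields, the 30-day definition of "recent", the `active_category` argument standing in for whichever category slot is currently running, and the choice to rank "top" videos by daily views. Only the 12-hour refresh for recent videos and the final fallback to the least recently updated video come from the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

RECENT_AGE = timedelta(days=30)       # assumption: what counts as a "recent" video
RECENT_REFRESH = timedelta(hours=12)  # from the text: recent videos every 12 hours
CYCLE = timedelta(hours=24)           # from the text: priorities repeat every 24 hours


@dataclass
class Video:
    video_id: str
    published_at: datetime
    last_updated: datetime
    category: str        # hypothetical label, e.g. "female groups"
    daily_views: float


def pick_next(videos: List[Video], now: datetime,
              active_category: Optional[str]) -> Video:
    """Choose the next video to fetch, checking three layers in order."""
    # Layer 1: recent videos not updated in the last 12 hours.
    due_recent = [v for v in videos
                  if now - v.published_at <= RECENT_AGE
                  and now - v.last_updated >= RECENT_REFRESH]
    if due_recent:
        return min(due_recent, key=lambda v: v.last_updated)

    # Layer 2: while a category slot is running, take that category's
    # highest daily-view video that was not yet updated in this cycle.
    if active_category is not None:
        in_slot = [v for v in videos
                   if v.category == active_category
                   and now - v.last_updated > CYCLE]
        if in_slot:
            return max(in_slot, key=lambda v: v.daily_views)

    # Layer 3: fall back to the least recently updated video overall.
    return min(videos, key=lambda v: v.last_updated)
```

Calling `pick_next` once per run, roughly once a minute, would stay within the 30 to 60 hits per hour mentioned earlier.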

The second bot is the followers bot. It fetches the number of followers each artist has on vLive, YouTube, Facebook, Twitter and Instagram. Since it requires more traffic per run, this bot runs every 30 minutes (but fetches all platforms in each run) and can also report problems with any attempt. At this rate it takes a little over a week to update all followers, but that is not a problem because some platforms do not report followers in fine detail (Facebook, for instance, reports in round thousands or millions).

The third, and most complex, bot composes the sales for each artist based on the GAON monthly and GAON yearly reports. You can see a full explanation in the following video:


Unlike the other bots, this one is run manually once a month, when the GAON monthly sales report is released, or whenever the bot itself is updated. It deletes all sales data and restarts from January 2011, adding sales for each artist and each album. You can see the result by listing by artist on the K-pop YouTube list, opening an artist tab and clicking "check sales composition".

For sales prior to 2011, I gathered data from multiple sources, ranging from archived MIAK reports to Wikipedia and fansites. This data is pre-defined per artist and doesn't change, so it's only added at the end of the full run as a constant.
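The full rebuild described above could look roughly like the following Python sketch. It is an illustration only: the row shape, the `compose_sales` name, the "pre-2011" key and the placeholder artists are all assumptions, and corrections from the yearly report are left out to keep it short.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical row shape: (year, month, artist, album, copies sold that month)
MonthlyRow = Tuple[int, int, str, str, int]


def compose_sales(rows: Iterable[MonthlyRow],
                  pre_2011: Dict[str, int]) -> Dict[str, Dict[str, int]]:
    """Rebuild every artist's sales from scratch.

    Returns {artist: {album: copies, ..., "pre-2011": constant}}.
    """
    # Starting from an empty dict mirrors "deletes all sales data".
    totals: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))

    # Walk the monthly reports in chronological order from January 2011.
    for year, month, artist, album, copies in sorted(rows):
        if (year, month) < (2011, 1):
            continue  # older sales come from the fixed per-artist constant
        totals[artist][album] += copies

    # The pre-2011 figure is added last, as a constant per artist.
    for artist, constant in pre_2011.items():
        totals[artist]["pre-2011"] += constant

    return {artist: dict(albums) for artist, albums in totals.items()}


if __name__ == "__main__":
    rows = [
        (2011, 1, "Artist A", "Album 1", 1000),
        (2011, 2, "Artist A", "Album 1", 500),
        (2012, 5, "Artist B", "Album X", 300),
    ]
    print(compose_sales(rows, {"Artist A": 20000}))
```

Rebuilding from zero on every run is slower than appending the newest month, but it means a fix to the bot is automatically applied to every past report.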

As a rule of thumb, actual sales are higher than displayed, for two reasons. First, GAON monthly shows only the top 100, which means anything that sold less than about 700 copies is not accounted for unless we can get a "career total" from GAON (and even when they do release these, they don't break them down per album, so we can't really know where the difference comes from). Second, we don't have sales since the last full month, nor the corrected figures since the last yearly report. Since GAON monthly does not track or report changes in sales (returns, updates, corrections), we need to wait for the yearly report to check those, and only for the top 100 albums.

I am sorry that these numbers cannot be more accurate, but for that I would need direct access to GAON's internal numbers, and seeing how inaccurate even the published reports are, I can't be sure that would fix much. GAON's reports do not come from a centralized database (like Aoimirai), which shows how bad their organization is: artist names and album names can change from report to report, sometimes inside the same report, and they also can't make up their mind on how to show western names alongside hangul names. Sometimes they report an artist's sales under the stylized name, other times under the hangul name, the western name, or the stylized name with the western name in parentheses. It's a mess. The bot currently knows all the different ways GAON has used to address each artist UP TO DATE, but I still have to check every report, every month, to see whether they changed something. Add to this the fact that they can change album names between reports, and you will find artists that have the same album listed under different names.
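The name matching described above can be illustrated with a small alias table in Python. This is only a sketch under assumptions: the table contents, the `normalize` and `resolve_artist` helpers and the "example-group" key are invented for the example and are not the bot's real data or code.

```python
import unicodedata
from typing import Dict, Optional


def normalize(name: str) -> str:
    """Fold case, Unicode variants and repeated spaces before lookup."""
    name = unicodedata.normalize("NFKC", name)
    return " ".join(name.split()).casefold()


# Hypothetical spellings of one placeholder artist as they might appear
# across reports: stylized, hangul, and stylized plus hangul in parentheses.
_RAW_ALIASES: Dict[str, str] = {
    "EXAMPLE GROUP": "example-group",
    "예시그룹": "example-group",
    "Example Group (예시그룹)": "example-group",
}
ALIASES: Dict[str, str] = {normalize(k): v for k, v in _RAW_ALIASES.items()}


def resolve_artist(reported_name: str) -> Optional[str]:
    """Return the canonical artist key, or None for an unknown spelling."""
    return ALIASES.get(normalize(reported_name))


if __name__ == "__main__":
    for name in ["  example   group ", "예시그룹", "Brand New Spelling"]:
        print(repr(name), "->", resolve_artist(name))
```

Returning `None` for an unknown spelling, instead of guessing, matches the manual check of every monthly report: anything unresolved can be listed and reviewed by hand before it is added to the table.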

Finally, so far Aoimirai does not track download sales. I plan on adding them eventually (it is almost just a small tweak to the third bot), but given the hate I received over discrepancies in the physical sales data, caused by problems that were never mine to begin with, I wonder if it's worth the trouble just to get my head bashed. Regardless, no matter how accurately I process download sales, they will be LESS accurate than physical sales: since GAON, again, only reports the top 100, more songs will be left out than albums.

As a side note, to help me detect how GAON changes artist names, I have a full list of every artist to ever feature on a GAON monthly report, and I have to say, it's huge. Since Aoimirai/kpop is a YOUTUBE-focused list, I don't plan on adding them anytime soon, but with time I will check them one by one to see if they have enough MVs to justify adding them. Also, new or small groups that have few views or sales are left out on purpose to reduce the load on the bots.

Bottom line: all this work was done by one person, giving people access to this huge database of videos, artists, followers and sales for free, with the option to download the full database or use the API to put a custom list on your site. All I get is hate and enraged fans pissed that I left out this group or that a sales number is slightly wrong (through no fault of my own). So please don't come at me with weapons. All I am doing is providing people with a fun source of data, and I would rather hear from people who are willing to help make this an even more interesting resource than get another death threat (yes, this is figurative).

This year's donations: $0 (not updated instantly), 2018 donations: $36, yearly server cost: $180