Social Analytics Platform’s Leak Reveals Data Scraping

SafetyDetectives Cybersecurity Team
Published on: October 21, 2021

Intro

The SafetyDetectives cybersecurity team, led by head researcher Anurag Sen, discovered an unsecured ElasticSearch server belonging to the social media analytics site IGBlade.com. The server contained scraped data on millions of social media profiles taken from Instagram and TikTok.

IGBlade collects data on social media users to provide its customers with “in-depth insights on any Instagram or TikTok account.” IGBlade’s server leaked over 2.6 million records of social user accounts, equating to 3.6+GB of data.

Among these records were screenshots and links to social profile pictures along with other forms of scraped personal data – a baffling discovery considering that data scraping is banned on most social media sites.

We do not know why IGBlade scraped this personal data, although we must stress that all data in the database was publicly available.

The server’s content also points to a broader debate about the controversial uses of data scraping methods.

What is IGBlade?

IGBlade’s Instagram and TikTok analytics tool collects data from millions of social media accounts across 30+ data metrics. IGBlade then consolidates this information into a navigable social account search engine that shows information like follower growth, engagement rate, and account history.

Users must create an IGBlade account to receive detailed data insights, such as data visualizations, demographics stats, and account reports.

The scraped user data on the server is the same data that features on each user’s corresponding IGBlade.com page, and the database often provides links back to IGBlade.

This is how we know the database belongs to IGBlade.com. You can see evidence of links to IGBlade in the screenshot below.

Kim Kardashian’s Instagram information plus a link containing ‘IGBlade’

What was leaked?

IGBlade’s ElasticSearch server was left publicly exposed without any password protection or encryption in place. As a result, IGBlade’s database leaked more than 2.6 million records, equating to 3.6+GB of data, and these files provide evidence of public data scraping on Instagram and TikTok.

Specifically, IGBlade’s server contained different types of personal data for social account users:

  • Full names
  • Usernames; such as Instagram/TikTok handles
  • Profile pictures; stored as screenshots or “photo links” on IGBlade
  • “About” information; i.e. each user’s “bio”
  • Email addresses
  • Phone numbers (in some cases); only when numbers feature on profiles
  • Location data; such as country of residence and locality (if set public)

Various other forms of user data could be seen on the server too, including:

  • Media counts; i.e. the number of photos/videos posted on accounts
  • Follower counts & following counts
  • Engagement rate metrics; for posts on user accounts

IGBlade’s server was live and being updated at the time of discovery. The size of IGBlade’s breach suggests more than 2 million social media users could be immediately affected by the leaked content of the server.

We found several examples of high-profile accounts on the server too. Prominent accounts, such as those of food bloggers, celebrities, and social media influencers, all featured.

Public forms of data for huge verified celebrity accounts, such as Alicia Keys, Ariana Grande, Kim Kardashian, Kylie Jenner, and Loren Gray, had all been scraped and stored on IGBlade’s open ElasticSearch server.

You can see evidence of cached profile picture screenshots, screenshot links (which lead to profile images), and other personal data sets for various famous Instagram and TikTok accounts in the images below. Phone numbers feature at times too, specifically in cases where the number is mentioned on the scraped user’s profile.

Screenshots of profile pictures featured on the database.

Loren Gray’s business number & photo link scraped from Instagram.

A link to Ariana Grande’s TikTok profile picture.

The server’s massive logs contain data for millions of social media accounts. You can see evidence of the server’s size and document count in the following screenshot.

2.6+ million records/3.6+GB of data feature on the server.

IGBlade’s ElasticSearch did not have authentication security features in place, leaving information available to anyone who found the server.
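For administrators who want to confirm that their own cluster is not exposed in the same way, the check can be sketched roughly as follows. This is a minimal illustration rather than the method our team used: the function names and the example URL are hypothetical, and it assumes a cluster with authentication enabled answers an anonymous request with HTTP 401 or 403, while an open one answers with 200. Only run it against servers you administer.

```python
import urllib.error
import urllib.request


def classify_status(status: int) -> str:
    """Map the HTTP status of an anonymous request to a rough verdict."""
    if status == 200:
        return "open"        # answered without any credentials
    if status in (401, 403):
        return "protected"   # credentials required or access denied
    return "unknown"         # anything else needs a manual look


def check_own_cluster(base_url: str, timeout: float = 5.0) -> str:
    """Send one unauthenticated GET to a cluster you administer.

    base_url is a hypothetical placeholder, e.g. "http://localhost:9200".
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses such as 401
        return classify_status(err.code)
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar
        return "unreachable"
```

A result of "open" from outside your own network would suggest the server answers anyone who finds it, which is the situation described above.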

You can find a full breakdown of the size, scale, and location of IGBlade’s data breach in the table below.

Number of records leaked: 2.6+ million
Number of affected users: 2.6+ million
Size of breach: 3.6+GB of data
Server location: Canada
Company location: Romania

The SafetyDetectives cybersecurity team found IGBlade’s open ElasticSearch server on June 20th, 2021, though the server’s content had apparently been exposed on the internet since May 31st, 2021.

We reached out to IGBlade on July 5th, 2021. IGBlade responded quickly following the disclosure process, and IGBlade’s database was secured on the same day.

Why Do People Use Social Scraping Tools?

Primarily, marketers and businesses use social analytics tools like IGBlade for advertising purposes.

Data scraping, more generally, allows companies and individuals to scale their success, as users can collect enough data insights to plan an effective marketing strategy.

Influencer marketers and social media managers benefit most from social media analytics tools like IGBlade, given each profession’s reliance on social media trends.

Companies also collect follower demographics data, growth data, and engagement data to monitor (and improve) the social media performance of their own corporate accounts/sites.

Hackers misuse data scraping methods to conduct cyberattacks on a mass scale.

While all of the information on IGBlade is publicly available, placing scraped personal data onto a single interface is dangerous. Hackers can instantly access user photos, contact details, and location data, opening the door to mass-scale social engineering attacks, fraudulent schemes, and fake accounts.

Data scraping directly violates Instagram and TikTok’s on-site policy and could needlessly place social media users in danger of cyberattacks.

Data Scraping Impact

The content of IGBlade’s ElasticSearch server could significantly impact both the company and the social media users it tracks.

Impact on IGBlade

Data scraping publicly available information online is not illegal and data scrapers do not face legal sanctions or punishments for their practices.

However, data scraping is not allowed on TikTok or Instagram.

Instagram’s terms of service state: “You must not crawl, scrape, or otherwise cache any content from Instagram including but not limited to user profiles and photos.”

TikTok’s terms of service also ban the process of “screen scraping.”

TikTok states: “[users may not] use any automated system or software, whether operated by a third party or otherwise, to extract any data from the Service for commercial purposes (“screen scraping”).”

Ultimately, these violations could land IGBlade in big trouble with Instagram and TikTok. Both sites could move to ban IGBlade from their services.

IGBlade’s business model relies on access to these social media sites. A ban could therefore disrupt IGBlade’s business operations, with profits stalling and users leaving the service should IGBlade fail to deliver value to its customers.

Impact on End Users

Those featured on the exposed database, along with other social media users, could face damaging impacts from the IGBlade server’s leak.

IGBlade placed numerous forms of publicly available personal data in one server, exposed to the potential threat of hackers and cybercriminals.

IGBlade’s server contains contact information, location data, profile images, and other forms of publicly available personal information that could aid hackers in several mass-send cybercrimes.

Instantly accessible contact details could allow hackers to launch social engineering attacks, such as bulk phishing email campaigns.

Hackers could quickly gather thousands of email addresses on IGBlade’s server. These cybercriminals could send phishing emails to every leaked account with contact details, attempting to coerce users into clicking a link or revealing sensitive personal information.

Phishers may even refer to other forms of personal data to build trust with the recipient.

Malicious files could infect the device of any user who clicks a phishing link, aiding cybercriminals in further crimes.

Mass robocalling scams are also possible due to the vast collection of contact details stored in the exposed database.

Robocalls may attempt to pose in an official capacity (e.g., the user’s bank) to commit fraud or to coerce other forms of personal data from the victim. A robocall may attempt to convince users their bank account is disabled, for example, or that their identity has been stolen.

Speaking of which, the server’s content also facilitates the creation of fake accounts.

Hackers could use the collection of account photos and information to set up thousands of fake/bot accounts quickly, imitating social media users’ profiles.

These accounts could lure in followers, spread misinformation, and coerce users into other scams or phishing attacks.

Spam marketing campaigns are also a possibility should hackers have accessed the server’s content, and hackers could even use leaked profile links and pictures to train AI facial recognition algorithms.

Is Data Scraping Okay?

The debate around data scraping and, in particular, social media data scraping has been a topic of conversation for some time now. On one side, critics feel the practice is dangerous for users while, on the other side, data brokers argue that public scraping is fine and perfectly legal.

The issue many people have with the practice revolves around the misuse of data scraping methods.

Cybercriminals can, unfortunately, enjoy all of the same benefits from scraping data as marketers or businesses. Data scraping amalgamates data sets from more than one of each user’s pages/social media accounts into a single server or platform. This means cybercriminals can enjoy fast navigation between user data in a single view without having to trawl through several internet pages.

Data scraping can also make information for thousands of users instantly accessible, as it’s all stored in the same place. Navigating logs in a database is a far quicker solution than navigating between each user on a social media site.

In this case, cybercriminals can use data scraping as a “cybercrime accelerant” rather than an “enabler.” Data scraping can accelerate the speed and scope of hackers’ criminal activities.

Criminal misuse is likely a reason many social media sites have banned public data scraping on their platforms. There also remains the fact that social media users cannot code their pages to block data scraping bots.

People will continue to debate this topic as long as companies persistently scrape public data.

For many, two questions remain: Should social media sites be doing more to stop data scraping? And, in certain contexts, should public data scraping be legal in the first place?

Preventing Data Exposure

Social media data scraping is not a typical data exposure. Worse still, data scraping is fairly unavoidable in most cases.

However, there are a few things you can do to limit data scraping, and your exposure to data scraping servers and aggregated databases:

  • Check your privacy settings on social media. That means setting your profiles to “private” so only friends and trusted people can view your information and content.
  • Delete/block unknown users. Unknown friends and followers should be removed from your account and blocked. Users should also block any accounts sending suspicious messages – anonymous users could be scraping your account.
  • Screen new follower/friend requests. Most social platforms will send requests when someone wants to connect with your account. Deny the connection request if you don’t know the person connecting or if there’s anything suspicious about the account.
  • Limit the information you add/post to your account. As a final precaution, users should limit the details they provide on social accounts. That means keeping “about” info brief and not posting other personal data, such as your address. Users should also keep banking and health information away from social media.

About us

SafetyDetectives.com is the world’s largest antivirus review website.

The SafetyDetectives research lab is a pro bono service that aims to help the online community defend itself against cyber threats while educating organizations on how to protect their users’ data. The overarching purpose of our web mapping project is to help make the internet a safer place for all users.

Our previous reports have brought multiple high-profile vulnerabilities and data leaks to light, including some 200+ million users exposed by Chinese social media management company Socialarks, as well as a database that leaked millions of records detailing an Amazon fake reviews scam.

For a full review of SafetyDetectives cybersecurity reporting over the past 3 years, follow SafetyDetectives Cybersecurity Team.
