Live Proxies

How to Scrape Instagram in 2025: A Complete Guide

Learn how to scrape Instagram profiles, posts & emails in 2025 using Python. Includes legal tips, API methods & real code examples for safe extraction.


Live Proxies Editorial Team

Content Manager


14 May 2025

Imagine gathering real-time data from Instagram, extracting everything from user profiles and posts to follower counts and public emails with just a few lines of Python code. In 2025, Instagram scraping has become essential for marketers, brands, and researchers seeking to harness data for audience insights and competitive analysis. With over 2 billion monthly active users (Statista, 2023), Instagram offers a wealth of information, making it a goldmine for data-driven strategies.

In this tutorial, you'll learn how to scrape Instagram data efficiently and ethically using Python. We’ll cover the legal aspects, setting up your environment, the tools and technologies available, and practical code examples that demonstrate real-world use cases for scraping Instagram followers, posts, and even emails from Instagram.

Is It Legal to Scrape Instagram Data?

Yes, you can scrape Instagram data, but only if you adhere to legal and ethical guidelines. Instagram’s terms of service strictly prohibit the extraction of private data or the bypassing of security measures. Always ensure you’re scraping only public information and comply with relevant laws like GDPR and CCPA.

For example, if you're extracting publicly available follower counts or profile bios for market analysis, you must avoid capturing sensitive or private information. The focus should be on ethical data collection to support legitimate business intelligence efforts.

Setting Up Your Instagram Scraping Environment

A robust development environment is the first step toward successful Instagram scraping. This section will guide you through installing Python, the required libraries, and configuring your IDE for optimal productivity.

Installing Python and Required Libraries

To get started, install Python (version 3.8 or above is recommended) along with essential packages such as Requests and lxml. These libraries will help you send HTTP requests and parse HTML content from Instagram.

# Install the required Python libraries
pip install requests lxml

This command installs the core tools needed to send requests and parse responses from Instagram.

Configuring Your Development Environment

For a smooth coding experience, set up an IDE like VS Code or PyCharm. These environments provide helpful features such as syntax highlighting, debugging, and version control integration. Alternatively, Jupyter Notebook offers an interactive way to test your scraping scripts one cell at a time, which is especially useful during the development and debugging phases.

Tools and Technologies for Instagram Scraping

There are various tools and frameworks available for Instagram scraping, each suited to different needs:

Node.js with Puppeteer or Playwright

For scraping JavaScript-heavy sites like Instagram, Node.js libraries such as Puppeteer and Playwright are ideal. They allow automated browsing with headless browsers and can simulate user interactions effectively. Compared to Selenium, these tools offer better performance and flexibility, especially for dynamic content.

PHP with cURL or Goutte

For simpler tasks, PHP-based solutions using cURL or the Goutte library can handle basic scraping needs. However, they are less effective when dealing with JavaScript-rendered content.

Scrapy (Python Framework)

Scrapy is a powerful framework for large-scale web scraping that includes robust request handling and data storage features. It is particularly useful when you need to scrape large volumes of structured data from Instagram.

Selenium (Python, Java, JavaScript)

Selenium automates browser interactions, making it useful for scraping dynamic content. Although it is slower compared to Puppeteer and Playwright, Selenium is a reliable option for beginners or for projects requiring extensive browser automation.

Go (Colly Library)

The Colly library in Go offers high-performance scraping with efficient concurrent requests, making it a good option for large-scale data extraction.

Ruby (Nokogiri)

Nokogiri is a lightweight Ruby library for extracting data from static web pages. It’s ideal for simpler scraping tasks that do not require handling dynamic content.

R (rvest Package)

For data analysis projects, R’s rvest package can be used for web scraping, offering an easy-to-use interface for extracting and cleaning data.

Cloud-Based Scraping APIs

Services like ScraperAPI, Zyte, Apify, and Phantombuster offer ready-made scraping solutions that can bypass Instagram’s anti-scraping mechanisms. These platforms are especially useful for scaling operations without having to manage anti-bot tactics. Most of them come with built-in CAPTCHA handling and headless browser support for dynamic content.

Proxies for Instagram Scraping

Separately, using a dedicated proxy service is a powerful option whether you’re managing your own scrapers or integrating with cloud APIs. Tools like Live Proxies provide rotating residential or mobile IPs, helping prevent IP bans, bypass rate limits, and maintain session consistency. They’re especially effective when scraping directly.
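As a minimal sketch of wiring a proxy into the Requests workflow used throughout this guide (the gateway URL and credentials below are placeholders, not a real provider endpoint):

```python
import requests

# Placeholder gateway -- substitute your provider's actual host and credentials
PROXY_URL = "http://username:password@rotating-gateway.example.com:8080"

# Requests routes both plain and TLS traffic through the proxy defined here
proxies = {"http": PROXY_URL, "https": PROXY_URL}

def fetch_through_proxy(url, timeout=10):
    """Send a GET request through the rotating proxy with a sane timeout."""
    return requests.get(url, proxies=proxies, timeout=timeout)
```

With a rotating residential gateway, each call can exit from a different IP, which spreads request volume and reduces the chance of a single address being rate-limited.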

How to Scrape Instagram Profiles

Scraping Instagram profiles in 2025 is more efficient, especially if you leverage Instagram’s backend API endpoints instead of parsing raw HTML. By inspecting network activity using browser DevTools, we can discover the exact request Instagram’s web app makes when loading user profile data. This gives us a structured JSON response perfect for clean data extraction.


Discovering Instagram’s Web API Using DevTools

Open Chrome and go to an Instagram profile such as https://www.instagram.com/justinbieber/. Open DevTools (F12 or right-click → Inspect) and switch to the Network tab. Refresh the page and filter requests by "XHR", then:

Find the network request that contains the required data in its response.

Determine which headers are necessary for the request to work correctly.

Determine the exact payload needed to successfully send the request.

You'll find a request made to:

https://www.instagram.com/api/v1/users/web_profile_info/?username=justinbieber

This endpoint returns a structured JSON object containing profile details like full name, biography, follower counts, and bio links.

Fetching and Parsing JSON Content

Below is a fully working example that scrapes Instagram profile data in Python using the Requests library and the API endpoint discovered through DevTools. It fetches the JSON response and extracts details such as the full name, biography, and follower counts.

import requests

# Headers to simulate a real browser request
headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
    'accept-language': 'en-IN,en;q=0.9',
    'cache-control': 'no-cache',
    'dnt': '1',
    'pragma': 'no-cache',
    'priority': 'u=0, i',
    'sec-ch-ua': '"Not(A:Brand";v="99", "Google Chrome";v="133", "Chromium";v="133"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Linux"',
    'sec-fetch-dest': 'document',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-site': 'none',
    'sec-fetch-user': '?1',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36',
    'x-ig-app-id': '936619743392459',
    'x-requested-with': 'XMLHttpRequest',
}

# Set the username to scrape
params = {'username': 'justinbieber'}

# Send a GET request to Instagram's API endpoint
response = requests.get(
    'https://www.instagram.com/api/v1/users/web_profile_info/',
    params=params,
    headers=headers
)
response.raise_for_status()

# Parse the JSON response
profile = response.json()['data']['user']

# Extract and print profile details
print("Full Name:", profile['full_name'])
print("Biography:", profile['biography'])
print("Bio Link:", profile['bio_links'][0] if profile['bio_links'] else "None")
print("Followers:", profile['edge_followed_by']['count'])
print("Following:", profile['edge_follow']['count'])

This script retrieves structured profile data without Selenium or HTML parsing, which makes it both faster and more stable.


Extracting Public Email Addresses from Bio

While scraping Instagram profiles, one valuable data point, when publicly available, is the email address listed in the user's bio. Many creators, influencers, and small businesses include contact details for collaborations or inquiries. For business accounts, the API response exposes this directly in the business_email field:

# Business accounts may expose a public contact address directly
business_email = profile.get('business_email')
print("Business Email:", business_email or "None")
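When the address only appears as free text in the bio, a simple regex over the biography field works instead. This sketch assumes a typical email shape; the pattern is pragmatic rather than RFC-complete:

```python
import re

# Pragmatic (not RFC-complete) email pattern
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def extract_bio_email(biography):
    """Return the first email found in the bio text, or None."""
    match = EMAIL_RE.search(biography or "")
    return match.group(0) if match else None

print(extract_bio_email("Bookings: hello@example.com"))  # hello@example.com
```

Passing profile['biography'] from the earlier API response into extract_bio_email covers the common case of creators listing a contact address in plain text.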

How to Scrape Instagram Posts

Instagram posts are a goldmine for marketers, researchers, and social media analysts. Thanks to DevTools and Instagram’s internal GraphQL API, you can extract post metadata directly from the backend, rather than scraping the visible page.


Discovering Post API Using DevTools

Navigate to an Instagram post (e.g., https://www.instagram.com/p/DH2aXcbMzcJ/), open DevTools > Network, and look for graphql/query. The request will contain a doc_id and a JSON-like variables field in the POST body, which includes the shortcode (unique post identifier).

Locate the network request that contains the necessary data in its response.

Identify which headers are necessary for the request to work correctly.

Determine the precise payload needed to successfully send the request. Try to exclude tokens and unnecessary fields, focusing only on the essential parameters; this can be worked out through trial-and-error testing.

Extracting Post Data

Here’s how to structure the request to fetch post data directly:

import requests 

# Define headers to mimic Instagram web traffic
headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
    'accept-language': 'en-IN,en;q=0.9',
    'cache-control': 'no-cache',
    'dnt': '1',
    'pragma': 'no-cache',
    'priority': 'u=0, i',
    'sec-ch-ua': '"Not(A:Brand";v="99", "Google Chrome";v="133", "Chromium";v="133"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Linux"',
    'sec-fetch-dest': 'document',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-site': 'none',
    'sec-fetch-user': '?1',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36',
    'x-ig-app-id': '936619743392459',
    'x-requested-with': 'XMLHttpRequest',
}

# POST data structure required by Instagram GraphQL API
data = {
    'fb_api_caller_class': 'RelayModern',
    'fb_api_req_friendly_name': 'PolarisPostActionLoadPostQueryQuery',
    'variables': '{"shortcode":"DH2aXcbMzcJ","fetch_tagged_user_count":null,"hoisted_comment_id":null,"hoisted_reply_id":null}',  # shortcode taken from the post URL
    'server_timestamps': 'true',
    'doc_id': '8845758582119845',  # May change over time
}

# Send the POST request to Instagram's GraphQL API
response = requests.post('https://www.instagram.com/graphql/query', headers=headers, data=data)
response.raise_for_status()

# Parse and print part of the JSON response
post_data = response.json()
media = post_data["data"]["xdt_shortcode_media"]

# Basic info
shortcode = media.get("shortcode")
display_url = media.get("display_url")
is_video = media.get("is_video")
accessibility_caption = media.get("accessibility_caption")

# Owner info
owner = media.get("owner", {})
owner_username = owner.get("username")
owner_full_name = owner.get("full_name")

# Top comment (if exists)
comments = media.get("edge_media_to_parent_comment", {}).get("edges", [])
top_comment = comments[0]["node"]["text"] if comments else None

# Print or store result
print("Shortcode:", shortcode)
print("Image URL:", display_url)
print("Is Video:", is_video)
print("Alt Text:", accessibility_caption)
print("Owner:", owner_username, "-", owner_full_name)
print("Top Comment:", top_comment)


What You Get in the Response

This API returns a rich JSON payload including:

  1. Post caption and description.
  2. Image or video URL.
  3. Number of likes and comments.
  4. Commenter usernames and timestamps.

You can adapt this to extract trends, engagement stats, or content for analytics dashboards.

Handling Pagination

For Instagram posts, handling pagination or infinite scroll is essential. Strategies include paging through the GraphQL API using its cursor fields, or simulating scroll events with Selenium or headless browsers to retrieve the dynamically loaded content.
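When paging the GraphQL API directly, the response itself carries the paging state. As a hedged sketch (the field names below reflect the commonly observed shape of Instagram's web responses and may change without notice):

```python
# The web API pages media lists via a cursor inside "page_info".
def next_page_cursor(media_connection):
    """Return the end_cursor for the next page, or None when exhausted."""
    page_info = media_connection.get("page_info", {})
    if page_info.get("has_next_page"):
        return page_info.get("end_cursor")
    return None

# Mock fragment illustrating the loop condition
mock = {"page_info": {"has_next_page": True, "end_cursor": "QVFCabc123"}}
print(next_page_cursor(mock))  # QVFCabc123
```

A scraping loop would pass the returned cursor back as the `after` variable of the next request and stop once the function returns None.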

Business Insight: This data can be used to analyze trending hashtags and content performance, which is vital for strategic marketing.

How to Scrape Instagram Followers (Informational)

This section is for informational purposes only. Scraping follower lists or accessing follower-specific data from Instagram typically requires login-based access or interactions with private APIs. These actions are against Instagram’s Terms of Service and should not be implemented unless you are using Instagram's official APIs with appropriate permissions.

Why Is Follower Data Valuable?

Marketers and analysts often use follower counts and trends to assess influence and audience reach. However, this tutorial strictly focuses on public data scraping, which does not include scraping the list of followers or their details.

Best Practice Alternative:

If you're looking to analyze audience metrics, use publicly visible follower counts (available in profile metadata), or consider Instagram’s Graph API under approved access through Facebook for Business.
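For the approved route, a Graph API call for follower metrics can be sketched as below. The user ID and token are placeholders you would obtain through an approved Meta for Business app, and the field names are an assumption to verify against Meta's current documentation:

```python
import urllib.parse

GRAPH_BASE = "https://graph.facebook.com/v19.0"

def follower_count_url(ig_user_id, access_token):
    """Build a Graph API URL requesting follower metrics for an IG user."""
    params = urllib.parse.urlencode(
        {"fields": "followers_count,username", "access_token": access_token}
    )
    return f"{GRAPH_BASE}/{ig_user_id}?{params}"

# Placeholders only -- substitute real credentials from an approved app
print(follower_count_url("IG_USER_ID", "ACCESS_TOKEN"))
```

Issuing a GET request against the resulting URL (with valid credentials) returns a small JSON object with the requested fields, with no Terms of Service risk.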

How to Scrape Emails from Instagram (Informational)

This section discusses publicly listed email extraction only. Attempting to extract or infer hidden, private, or secured email addresses, especially those behind forms or buttons, is a direct violation of Instagram’s Terms of Service and may breach data privacy regulations like GDPR and CCPA.

Public Email Detection:

In some cases, creators or businesses list their emails directly in their bio. This is public information and can be extracted. We showed an example of this in the Profile Scraping section using the biography field.

What Not to Do:

  • Do not attempt to extract emails from login-only pages.
  • Do not automate clicking “Email” buttons within the Instagram app.
  • Do not scrape emails via GraphQL endpoints requiring authentication.

Conclusion

In this guide, you learned how to effectively and ethically perform Instagram scraping in 2025 using browser DevTools, backend API requests, and Python code. From extracting profile metadata to gathering public post content, each method outlined focuses strictly on public data that is freely available to any visitor of the platform.

Important: All scraping techniques discussed here:

  1. Do not require login or session authentication.
  2. Do not extract private or user-restricted information.
  3. Adhere to Instagram’s Terms of Service.
  4. Respect user privacy and global data protection regulations (like GDPR and CCPA).

By focusing on public-facing data, businesses and researchers can gain meaningful insights while maintaining ethical standards and platform compliance.

As scraping techniques and platform protections evolve, always ensure you stay updated on current legal and ethical boundaries. Responsible scraping is not just good practice, it's a smart strategy.

References

To ensure you're building compliant and efficient scraping tools, it's important to rely on trusted, official sources. Below are key references used or recommended throughout this tutorial:

Instagram Terms of Use

https://help.instagram.com/581066165581870
Outlines the acceptable use policies and restrictions regarding automation, scraping, and data usage.

Instagram Platform Policy

https://developers.facebook.com/policy/
Instagram is part of Meta's platform, and this page details the policies developers must follow when accessing data via APIs.

Meta Graph API Documentation (for approved business access)

https://developers.facebook.com/docs/instagram-api
Official API docs for businesses and apps with approved access, including endpoints for posts, insights, and basic profile data.

Frequently Asked Questions About Instagram Scraping

Can I scrape Instagram data without getting detected?

Techniques such as rotating proxies (e.g., Live Proxies), user-agent switching, and request throttling can help minimize detection risks when you scrape Instagram.
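Two of those techniques, throttling and user-agent rotation, can be combined in a small helper. This is a sketch assuming you maintain your own pool of real browser user-agent strings:

```python
import random
import time

# Small pool of real desktop user-agent strings to rotate through
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36",
]

def throttled_headers(min_delay=2.0, max_delay=5.0):
    """Pause a random interval, then return headers with a rotated user-agent."""
    time.sleep(random.uniform(min_delay, max_delay))
    return {"user-agent": random.choice(USER_AGENTS)}
```

Calling throttled_headers() before each request spaces out traffic unpredictably and varies the browser fingerprint, two of the simplest ways to avoid looking like a bot.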

What are the best tools for scraping Instagram?

Popular tools include Python libraries like Requests and BeautifulSoup for static content, Selenium for dynamic content, and cloud-based scraping APIs like ScraperAPI and Apify.

How do I scrape Instagram followers effectively?

If you need the follower count, it’s publicly visible and can be extracted using tools like Requests (as shown in the profile scraping section), Playwright, or Selenium, provided you respect rate limits. However, retrieving the actual list of followers requires login access to private endpoints, which violates Instagram’s Terms of Service. The proper way to access follower lists is through the official Instagram Graph API with authorized permissions.

How do I scrape emails from Instagram?

Extracting emails from Instagram profiles can be done with regular expressions on publicly visible data. Always ensure ethical usage and compliance with privacy laws.

What is the best way to store scraped Instagram data?

Storing data in formats like JSON and CSV is effective for analysis, while databases like MongoDB or PostgreSQL are ideal for handling large-scale datasets.
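Both formats are one step away from the dictionaries the earlier scripts produce. The records below are illustrative, with made-up field values:

```python
import csv
import json

# Illustrative records only -- values are made up for the example
profiles = [
    {"username": "example_user", "full_name": "Example User", "followers": 1200},
]

# JSON preserves nesting, handy for storing raw API payloads
with open("profiles.json", "w", encoding="utf-8") as f:
    json.dump(profiles, f, ensure_ascii=False, indent=2)

# CSV gives flat rows, convenient for spreadsheets and quick analysis
with open("profiles.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=profiles[0].keys())
    writer.writeheader()
    writer.writerows(profiles)
```

For larger or continuously growing datasets, the same dictionaries map directly onto MongoDB documents or PostgreSQL rows.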