
Scrape TikTok Like a Pro: Step-by-Step Methods, Tools, and Tips

TikTok has become a goldmine of user-generated content and social media insights. With over 1 billion active users creating millions of videos daily, the platform offers unprecedented opportunities for data analysis, trend monitoring, and business intelligence. This comprehensive guide shows you how to scrape TikTok data effectively using Python.

Dominykas Niaura

Aug 01, 2025

10 min read

Why scrape TikTok?

TikTok scraping unlocks valuable insights that can transform your business strategy and research capabilities. The platform's vast ecosystem of content, creators, and user interactions provides rich data for multiple use cases:

  • Trend analysis and market research. Monitor viral content patterns, emerging hashtags, and cultural movements in real time. Spot trends before they go mainstream and gain an edge in product development or content marketing.
  • Influencer research and marketing. Evaluate creator performance, engagement metrics, and audience demographics to find the right brand partners. You can also monitor influencer campaigns and measure ROI more effectively.
  • Sentiment analysis and brand monitoring. Use comments, video captions, and hashtags to gauge public sentiment toward your brand or competitors. Spot early signs of PR crises and respond proactively.
  • Lead generation and sales intelligence. Uncover potential customers by analyzing content themes and user interests. B2B companies, in particular, can identify prospects discussing industry-specific pain points.
  • Content strategy optimization. Analyze what formats, topics, and posting times drive engagement. Reverse-engineer successful accounts to sharpen your own content strategy and improve organic reach.

Understanding TikTok's structure and anti-scraping measures

TikTok presents unique challenges for web scraping due to its sophisticated architecture and robust anti-bot protections. Understanding these systems is crucial for building effective scrapers.

  • Dynamic content loading. TikTok uses JavaScript rendering and infinite scroll mechanisms to load content dynamically. Unlike traditional websites with static HTML, TikTok builds most of its pages with client-side JavaScript, making simple HTTP requests insufficient on their own.
  • Anti-bot detection systems. The platform employs multiple layers of bot detection, including browser fingerprinting, behavioral analysis, and JavaScript challenges. These systems monitor request patterns, mouse movements, scroll behavior, and device characteristics to identify automated traffic.
  • Rate limiting and IP blocking. TikTok implements aggressive rate limiting that can trigger temporary or permanent IP bans after relatively few requests. The platform also uses geographic restrictions and datacenter IP detection to block suspicious traffic patterns.
  • CAPTCHAs and verification challenges. When suspicious activity is detected, TikTok displays various CAPTCHA types, including image recognition, puzzle solving, and phone verification requirements that can completely halt automated scraping attempts.
  • Frequent structure changes. TikTok regularly updates its DOM structure, CSS selectors, and API endpoints to break existing scrapers. This means scrapers require constant maintenance and adaptation to remain functional.

Methods and tools for scraping TikTok

Several approaches exist for extracting TikTok data, each with distinct advantages and limitations. Choosing the right method depends on your technical requirements, scale needs, and budget constraints.

  • Browser automation with Playwright/Selenium. The most reliable approach for handling TikTok's JavaScript-heavy interface. Tools like Playwright can fully render pages, handle infinite scroll, and mimic human behavior patterns. This method provides the highest success rate but requires more computational resources.
  • Hidden API extraction. TikTok loads data through internal APIs that return JSON responses. By intercepting these API calls, you can extract structured data more efficiently than parsing HTML. However, these APIs change frequently and require reverse engineering.
  • Unofficial TikTok APIs. Several open-source libraries like TikTok-API provide simplified interfaces for data extraction. While easier to implement, these tools often break when TikTok updates its systems and may not support all data types.
  • Headless browser services. Cloud-based solutions offer managed browser infrastructure with built-in anti-detection features. These services handle proxy rotation, CAPTCHA solving, and infrastructure maintenance, but come with ongoing costs.
  • Hybrid approaches. Combining multiple methods often yields the best results. For example, using Playwright for initial page rendering and then extracting data from hidden JSON objects provides both reliability and efficiency.

Why proxies are necessary for stable TikTok scraping

TikTok actively defends against scraping by tracking IP activity and limiting access based on region and behavior. Without proxies, your scraper will likely get blocked fast, especially at scale.

Proxies help by spreading requests across multiple IP addresses, making your activity appear more natural and avoiding rate limits. They also let you bypass regional restrictions and view content as if you're in a different location.

For best results, use residential proxies, which route traffic through real user devices and are harder for TikTok to detect compared to datacenter IPs. Pair this with smart rotation strategies (adjusting IPs based on request volume, error rates, and timing) to avoid triggering anti-bot systems.

At Decodo, we offer high-performance residential proxies with a 99.86% success rate, <0.6s response time, and geo-targeting across 195+ locations. Here’s how to integrate them into your TikTok scraper:

  1. Create your account. Sign up at the Decodo dashboard.
  2. Select a proxy plan. Choose a subscription that suits your needs or opt for a 3-day free trial.
  3. Configure proxy settings. Set up your proxies with rotating sessions for maximum effectiveness.
  4. Select locations. Target specific regions based on your data requirements or keep it at Random.
  5. Integrate into your scraper. Use the provided proxy endpoints in your scraping setup.
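For reference, here's a minimal sketch of how that endpoint plugs into the two HTTP clients used later in this guide – httpx takes it as a single proxy URL, while Playwright takes it as a proxy object at browser launch. The host and port below come from the example scripts further down; swap in your own credentials and any session settings from your dashboard.

from httpx import AsyncClient

# Replace with your Decodo credentials
PROXY_USER, PROXY_PASS = "YOUR_USERNAME", "YOUR_PASSWORD"
PROXY_HOST, PROXY_PORT = "gate.decodo.com", 7000

# httpx: pass the endpoint as a single proxy URL
client = AsyncClient(proxy=f"http://{PROXY_USER}:{PROXY_PASS}@{PROXY_HOST}:{PROXY_PORT}")

# Playwright: pass the same endpoint as a proxy object, e.g.
# p.firefox.launch(headless=True, proxy=PLAYWRIGHT_PROXY)
PLAYWRIGHT_PROXY = {
    "server": f"http://{PROXY_HOST}:{PROXY_PORT}",
    "username": PROXY_USER,
    "password": PROXY_PASS,
}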

Get residential proxies for TikTok

Claim your 3-day free trial of residential proxies and explore full features with unrestricted access.

Step-by-step guide: how to scrape TikTok data

In this section, we’ll walk you through how to use Python to scrape various types of TikTok data: profiles, videos/posts, comments, and search results. You'll learn how to extract structured information while handling TikTok’s anti-bot protections, all with practical, ready-to-run scripts.

Before running any of the scripts in this guide, make sure you have Python 3.8+ installed. Then, install the necessary libraries using the following commands:

pip install playwright "httpx[http2]" parsel jmespath

After installing the packages, you’ll also need to install the Playwright browser binaries:

playwright install

These libraries cover everything used in the scripts:

  • Playwright – for browser automation when scraping profiles
  • httpx – for making fast, async HTTP requests (the http2 extra covers the scripts that enable HTTP/2)
  • parsel – for parsing HTML content
  • jmespath – for querying structured JSON data

Once these are set up, you’re ready to run any of the scraping scripts in this tutorial.

Scraping TikTok profiles

Profile scraping is a great entry point for influencer research, audience analysis, or competitive benchmarking. This Python script uses Playwright to automate a headless browser session and collect public profile data from TikTok.

The script works by launching a stealthy headless browser (Firefox in this case) and routing traffic through a proxy for stability and geo-targeting. It visits the user’s profile page and tries to extract structured JSON data embedded in the page (__UNIVERSAL_DATA_FOR_REHYDRATION__). If that fails, it falls back to scraping visible HTML elements.

Here’s what it collects:

  • Username and display name
  • Bio/description
  • Follower, following, and like counts
  • Total number of videos
  • Verification status

To reduce detection, the script uses a custom user agent, disables navigator.webdriver to mask automated behavior, and routes all traffic through residential proxies.

import asyncio
import json
from playwright.async_api import async_playwright


async def scrape_tiktok_profile(username: str, proxy_host: str, proxy_port: int, proxy_username: str, proxy_password: str):
    """Scrape TikTok profile data using proxy"""
    async with async_playwright() as p:
        browser = await p.firefox.launch(
            headless=True,
            proxy={'server': f'http://{proxy_host}:{proxy_port}', 'username': proxy_username, 'password': proxy_password}
        )
        context = await browser.new_context(
            user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
        )
        await context.add_init_script("Object.defineProperty(navigator, 'webdriver', { get: () => undefined });")
        page = await context.new_page()
        try:
            await page.goto(f"https://www.tiktok.com/@{username}", timeout=20000)
            await page.wait_for_timeout(3000)

            # Extract from JSON
            user_info, video_count = {}, 0
            json_script = await page.query_selector('script#__UNIVERSAL_DATA_FOR_REHYDRATION__')
            if json_script:
                try:
                    data = json.loads(await json_script.text_content())
                    # Try multiple JSON paths
                    for path in [['__DEFAULT_SCOPE__', 'webapp.user-detail', 'userInfo'], ['default', 'UserModule', 'users']]:
                        current = data
                        for key in path:
                            current = current.get(key, {})
                        if current and 'user' in current:
                            user_info = current['user']
                            video_count = current.get('stats', {}).get('videoCount', 0)
                            break
                        elif current and 'uniqueId' in current:
                            user_info = current
                            break
                except:
                    pass

            # HTML fallback
            if not user_info:
                elem = await page.query_selector('[data-e2e="user-title"]')
                if elem:
                    text = await elem.text_content()
                    user_info = {'uniqueId': text.strip(), 'nickname': text.strip(), 'signature': '', 'verified': False}
            if not user_info:
                return {'error': 'Could not extract profile data'}

            # Extract stats from HTML
            stats = []
            for selector in ['strong[data-e2e]', '[data-e2e="number-strong"]', 'strong']:
                elements = await page.query_selector_all(selector)
                if len(elements) >= 3:
                    stats = [await elem.text_content() for elem in elements[:3]]
                    break

            # Clean bio
            bio = user_info.get('signature', '').replace('\n', ' | ').replace('\r', ' | ')
            return {
                'username': user_info.get('uniqueId', ''),
                'display_name': user_info.get('nickname', ''),
                'bio': bio,
                'followers': stats[1] if len(stats) > 1 else "0",
                'following': stats[0] if len(stats) > 0 else "0",
                'likes': stats[2] if len(stats) > 2 else "0",
                'videos': video_count,
                'verified': user_info.get('verified', False)
            }
        except Exception as e:
            return {'error': str(e)}
        finally:
            await browser.close()


async def main():
    # Decodo proxy configuration
    result = await scrape_tiktok_profile(
        username='beccastravel',  # Example TikTok username – replace
        proxy_host='gate.decodo.com',
        proxy_port=7000,
        proxy_username='YOUR_USERNAME',  # Replace
        proxy_password='YOUR_PASSWORD'  # Replace
    )
    print(json.dumps(result, indent=2))


if __name__ == "__main__":
    asyncio.run(main())

Once scraped, the data is returned in a structured JSON format that’s easy to store, analyze, or integrate into dashboards. Here's an example response for a real profile:

{
  "username": "beccastravel",
  "display_name": "Becca -A Brit in Aus\ud83c\udde6\ud83c\uddfa\ud83c\udf0a\ud83e\udebc",
  "bio": "Brit living in Australia \ud83c\udde6\ud83c\uddfa",
  "followers": "6580",
  "following": "1404",
  "likes": "2.5M",
  "videos": 1034,
  "verified": false
}

Scraping TikTok videos/posts

Scraping TikTok video pages allows you to extract rich metadata like post descriptions, hashtags, view counts, likes, shares, bookmarks, and more. This kind of data is especially useful for trend tracking, content analysis, and performance benchmarking.

All you need to do is plug your proxy credentials and a few TikTok post URLs into the script. The scraper sends asynchronous requests to TikTok video pages using httpx and parses the response HTML with parsel. It first attempts to extract structured data from TikTok’s embedded JSON (__UNIVERSAL_DATA_FOR_REHYDRATION__). If that fails, it falls back to scraping basic meta tags in the HTML.

The script is designed to:

  • Return a clean summary of each video’s author, description, stats (likes, plays, comments, shares, bookmarks), and hashtags
  • Handle HTTP errors gracefully and skip failed requests
  • Add a small delay between requests to reduce the risk of triggering TikTok’s anti-bot defenses
  • Use proxy routing to prevent IP bans and maintain reliable access

One key advantage is that it surfaces data not directly visible on the public TikTok post page, such as total play counts and bookmark numbers.

import jmespath
import asyncio
import json
from typing import List, Dict
from httpx import AsyncClient, Response
from parsel import Selector


async def create_client() -> AsyncClient:
    """Create HTTP client with proxy"""
    headers = {
        "Accept-Language": "en-US,en;q=0.9",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Encoding": "gzip, deflate, br",
    }
    # Proxy configuration: replace with your credentials in this format: "protocol://username:password@host:port"
    proxy = "http://YOUR_USERNAME:[email protected]:7000"
    return AsyncClient(
        http2=True,
        headers=headers,
        proxy=proxy,
        timeout=30.0,
        follow_redirects=True
    )


def parse_video(response: Response) -> Dict:
    """Parse video data from response"""
    selector = Selector(response.text)
    # Try JSON extraction first
    try:
        json_data = selector.xpath("//script[@id='__UNIVERSAL_DATA_FOR_REHYDRATION__']/text()").get()
        if json_data:
            full_data = json.loads(json_data)
            video_data = full_data["__DEFAULT_SCOPE__"]["webapp.video-detail"]["itemInfo"]["itemStruct"]
            result = jmespath.search("""
            {
                id: id,
                description: desc,
                hashtags: textExtra[?hashtagName].hashtagName,
                author: {
                    username: author.uniqueId,
                    nickname: author.nickname,
                    verified: author.verified
                },
                stats: {
                    plays: stats.playCount,
                    likes: stats.diggCount,
                    comments: stats.commentCount,
                    shares: stats.shareCount,
                    bookmarks: stats.collectCount
                }
            }""", video_data)
            result['url'] = str(response.url)
            # Clean description
            if result.get('description'):
                result['description'] = result['description'].replace('\n', ' | ')
            return result
    except:
        pass
    # HTML fallback
    description = selector.xpath("//meta[@property='og:description']/@content").get() or ""
    return {
        'description': description.replace('\n', ' | '),
        'hashtags': [],
        'url': str(response.url),
        'stats': {'plays': 0, 'likes': 0, 'comments': 0, 'shares': 0, 'bookmarks': 0}
    }


async def scrape_tiktok_videos() -> List[Dict]:
    """Scrape videos"""
    # Example URLs – replace
    video_urls = [
        "https://www.tiktok.com/@kululagu/video/7447897941609630992",
        "https://www.tiktok.com/@reachyusuf/video/7507345573087890695",
        "https://www.tiktok.com/@yuji_beleza/video/7503491626569043223"
    ]
    client = await create_client()
    try:
        results = []
        for i, url in enumerate(video_urls):
            try:
                response = await client.get(url)
                if response.status_code != 200:
                    results.append({'error': f'HTTP {response.status_code}', 'url': url})
                    continue
                video_data = parse_video(response)
                if 'error' not in video_data and video_data.get('author'):
                    results.append(video_data)
                else:
                    results.append({'error': 'Failed to extract data', 'url': url})
                # Small delay between requests
                await asyncio.sleep(1)
            except Exception as e:
                results.append({'error': str(e), 'url': url})
        return results
    finally:
        await client.aclose()


async def main():
    """Main function"""
    videos = await scrape_tiktok_videos()
    for i, video in enumerate(videos, 1):
        if 'error' not in video and video.get('author'):
            author = video.get('author', {}).get('username', 'Unknown')
            desc = video.get('description', 'No description')
            stats = video.get('stats', {})

            # Ensure all stats are integers
            def safe_int(value):
                if value is None:
                    return 0
                try:
                    return int(value)
                except (ValueError, TypeError):
                    return 0

            likes = safe_int(stats.get('likes'))
            plays = safe_int(stats.get('plays'))
            comments = safe_int(stats.get('comments'))
            shares = safe_int(stats.get('shares'))
            bookmarks = safe_int(stats.get('bookmarks'))
            hashtags = video.get('hashtags', [])
            hashtag_text = f" #{' #'.join(hashtags)}" if hashtags else ""
            print(f"{i}. @{author}:")
            print(f" {desc}{hashtag_text}")
            print(f" {likes:,} likes, {plays:,} plays, {comments:,} comments")
            print(f" {shares:,} shares, {bookmarks:,} bookmarks")
        else:
            print(f"{i}. Error: {video.get('error', 'Failed to extract data')}")
        # Add separator line between videos (except after the last one)
        if i < len(videos):
            print("-" * 50)


if __name__ == "__main__":
    asyncio.run(main())

Below is the response you get with this script. This kind of output is especially helpful for researchers, marketers, or analysts who need quick, structured insights from multiple TikTok posts without manually inspecting each one:

1. @kululagu:
I dont like the third movie but ı wonder ıf ı will like it when ı grow older #beforesunrise #beforetrilogy #beforesunset #beforemidnight #ethanhawke #juliedelpy #fyp #beforesunrise #beforetrilogy #beforesunset #beforemidnight #ethanhawke #juliedelpy #fyp
218,900 likes, 923,000 plays, 503 comments
8,939 shares, 37,206 bookmarks
--------------------------------------------------
2. @reachyusuf:
what's yo favorite type of bread? Flour is a miracle right now. Currently a bag of flour costs $600 usd. My charity has been able to acquire 3000 bags thank God. $200 covers 1 bag of flour (25 kg) through my charity. If you can sponsor a family for a bag of flour please d0nate in the l1nk in b1o! Families are using macaroni to make bread because they cannot afford flour. Please spread the word and d0nate if you can!
538,100 likes, 2,800,000 plays, 16,700 comments
55,400 shares, 46,365 bookmarks
--------------------------------------------------
3. @yuji_beleza:
African basketball players in Japan 🇯🇵🏀
9,100,000 likes, 109,100,000 plays, 40,300 comments
146,400 shares, 405,175 bookmarks

Scraping TikTok comments

Comment scraping is essential if you're looking to understand audience sentiment, gather user feedback, or enrich content analysis with direct viewer reactions. This Python script uses httpx to fetch a TikTok video’s HTML, extract necessary tokens, and call TikTok’s internal comment API – all while routing requests through a proxy for stability and anonymity.

To use the scraper, simply replace the video URL, plug in your proxy credentials, and control how many comments to retrieve by changing the max_comments parameter. The script then:

  • Parses the video ID from the URL
  • Sends a request to the page to extract dynamic tokens like aid, msToken, and region required to query TikTok's comment API
  • Builds the comment API URL and retrieves a list of comments
  • Returns cleaned comment data including the author's handle, nickname, number of likes, replies, and the comment text itself
import asyncio
import re
from httpx import AsyncClient
from urllib.parse import urlencode


async def create_client():
    headers = {
        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8",
        "accept-language": "en-US,en;q=0.9",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    }
    # Proxy configuration: replace with your credentials in this format: "protocol://username:password@host:port"
    proxy = "http://YOUR_USERNAME:[email protected]:7000"
    return AsyncClient(headers=headers, proxy=proxy, timeout=30.0)


async def scrape_tiktok_comments(video_url, max_comments=20):
    video_id = re.search(r'/video/(\d+)', video_url).group(1)
    client = await create_client()
    try:
        response = await client.get(video_url)
        text = response.text
        params = {"aweme_id": video_id, "cursor": 0, "count": max_comments, "current_region": "US", "aid": "1988"}
        for pattern in [r'"aid":(\d+)', r'"msToken":"([^"]+)"', r'"region":"([^"]+)"']:
            match = re.search(pattern, text)
            if match and 'aid' in pattern:
                params['aid'] = match.group(1)
            elif match and 'msToken' in pattern:
                params['msToken'] = match.group(1)
            elif match and 'region' in pattern:
                params['region'] = match.group(1)
        api_url = "https://www.tiktok.com/api/comment/list/?" + urlencode(params)
        api_response = await client.get(api_url, headers={"accept": "application/json", "referer": video_url})
        if api_response.status_code == 200:
            data = api_response.json()
            comments = []
            for comment in data.get('comments', []):
                if comment.get('text'):
                    comments.append({
                        'text': comment.get('text', ''),
                        'author': comment.get('user', {}).get('unique_id', ''),
                        'author_nickname': comment.get('user', {}).get('nickname', ''),
                        'likes': comment.get('digg_count', 0),
                        'replies': comment.get('reply_comment_total', 0)
                    })
            return comments
    except:
        pass
    finally:
        await client.aclose()
    return []


async def main():
    video_url = "https://www.tiktok.com/@kululagu/video/7447897941609630992"
    comments = await scrape_tiktok_comments(video_url)
    if not comments:
        print("No comments found")
        return
    for i, comment in enumerate(comments, 1):
        author = comment['author']
        nickname = comment['author_nickname']
        text = comment['text']
        likes = comment['likes']
        replies = comment['replies']
        display_author = f"@{author}"
        if nickname and nickname != author:
            display_author += f" ({nickname})"
        print(f"{i}. {display_author}:")
        print(f" {text}")
        stats = []
        if likes > 0:
            stats.append(f"{likes:,} likes")
        if replies > 0:
            stats.append(f"{replies:,} replies")
        if stats:
            print(f" {', '.join(stats)}")
        if i < len(comments):
            print("-" * 50)


if __name__ == "__main__":
    asyncio.run(main())

The output this script delivers is especially useful for identifying common reactions, capturing standout quotes, or spotting recurring audience themes. With a bit of additional logic, you could also perform basic sentiment analysis or track changes in engagement over time (a sketch follows the sample output below). Here’s a sample response from scraping the first 5 comments on a video:

1. @aadyagurjar (aadya):
she is me i am her
7,796 likes, 6 replies
--------------------------------------------------
2. @uqqhi (sophie):
"I don't feel things for people anymore"
2,128 likes
--------------------------------------------------
3. @thefoolinthewall (kp𒉭):
my favorite movies of all time. before sunset is perfect.
5,582 likes, 2 replies
--------------------------------------------------
4. @lunbottomboot7l (ttennyyapplees):
"I never felt like it was the right man" she gets it
5,540 likes, 1 replies
--------------------------------------------------
5. @pris.pls (💘⭐️🧚🏻‍♀️ pris pls 💌🌻🌙):
shes so real for that tho
927 likes
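As mentioned above, a bit of extra logic turns these comments into a rough sentiment read. Here's a minimal sketch that reuses the scrape_tiktok_comments() function from the script above and assumes the vaderSentiment package is installed (pip install vaderSentiment); treat it as a starting point rather than a production pipeline.

import asyncio
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

async def comment_sentiment(video_url: str):
    # Reuses scrape_tiktok_comments() defined in the script above
    comments = await scrape_tiktok_comments(video_url, max_comments=50)
    analyzer = SentimentIntensityAnalyzer()
    scored = [
        {**c, 'sentiment': analyzer.polarity_scores(c['text'])['compound']}
        for c in comments
    ]
    # The compound score ranges from -1 (negative) to +1 (positive)
    average = sum(c['sentiment'] for c in scored) / len(scored) if scored else 0.0
    return average, scored

# Example usage:
# avg, scored = asyncio.run(comment_sentiment("https://www.tiktok.com/@kululagu/video/7447897941609630992"))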

Scraping TikTok search results

If you're tracking trends, monitoring a topic’s reach, or analyzing how people react to a breaking event, scraping TikTok search results can give you real-time insights into what’s being posted and how it’s performing.

The Python script below scrapes TikTok’s search results for a given keyword by simulating a browser visit, extracting internal API tokens, and querying TikTok’s backend search endpoint, all routed through a proxy for stability.

Here’s how it works:

  • It creates a proxy-enabled HTTP client using httpx, with proper headers to mimic a real browser.
  • It visits the public TikTok search page for the given query and extracts dynamic values like aid, msToken, region, and device_id – all of which are required to build a valid API call.
  • It generates a random search_id to make the request appear unique and time-specific.
  • It builds a search API URL with all the required query parameters, fetches the data, and parses relevant information from each video result.

The script collects and displays:

  • Author username and nickname
  • Video description (truncated if too long)
  • Engagement stats: likes, plays, comments, shares
import asyncio
import re
import secrets
import datetime
from httpx import AsyncClient
from urllib.parse import urlencode, quote


async def create_client():
    headers = {
        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8",
        "accept-language": "en-US,en;q=0.9",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    }
    # Proxy configuration: replace with your credentials in this format: "protocol://username:password@host:port"
    proxy = "http://YOUR_USERNAME:[email protected]:7000"
    return AsyncClient(http2=True, headers=headers, proxy=proxy, timeout=30.0, follow_redirects=True)


async def extract_api_params(client, search_url):
    response = await client.get(search_url)
    if response.status_code != 200:
        return {}
    text = response.text
    params = {}
    patterns = {
        'aid': [r'"aid":(\d+)', r'window\._APP_ID\s*=\s*(\d+)', r'"app_id":(\d+)'],
        'msToken': [r'"msToken":"([^"]+)"', r'msToken=([^;]+)', r'"ms_token":"([^"]+)"'],
        'device_id': [r'"device_id":(\d+)', r'"device_id":"([^"]+)"'],
        'region': [r'"region":"([^"]+)"']
    }
    for key, pattern_list in patterns.items():
        for pattern in pattern_list:
            match = re.search(pattern, text)
            if match:
                params[key] = match.group(1)
                break
    return params


def generate_search_id():
    timestamp = datetime.datetime.now().strftime('%Y%m%d%H%M%S')
    random_hex_length = (32 - len(timestamp)) // 2
    random_hex = secrets.token_hex(random_hex_length).upper()
    return timestamp + random_hex


async def call_search_api(client, query, api_params, count=20):
    base_params = {
        "keyword": query,
        "offset": 0,
        "count": count,
        "search_id": generate_search_id(),
        "search_type": 0,
        "is_filter_search": 0
    }
    base_params.update(api_params)
    if 'aid' not in base_params:
        base_params['aid'] = '1988'
    api_url = "https://www.tiktok.com/api/search/general/full/?" + urlencode(base_params)
    api_headers = {
        "accept": "application/json, text/plain, */*",
        "referer": f"https://www.tiktok.com/search?q={quote(query)}",
        "sec-fetch-dest": "empty",
        "sec-fetch-mode": "cors",
        "sec-fetch-site": "same-origin"
    }
    try:
        response = await client.get(api_url, headers=api_headers)
        return response.json() if response.status_code == 200 else {}
    except:
        return {}


def parse_search_results(api_response):
    if not api_response or 'data' not in api_response:
        return []
    videos = []
    for item in api_response['data']:
        try:
            if item.get('type') == 1:
                video_info = item.get('item', {})
                if video_info:
                    desc = video_info.get('desc', '')
                    author_data = video_info.get('author', {})
                    stats_data = video_info.get('stats', {})
                    author = author_data.get('unique_id', '') or author_data.get('uniqueId', '')
                    nickname = author_data.get('nickname', '')
                    likes = (stats_data.get('digg_count') or stats_data.get('diggCount') or
                             stats_data.get('like_count') or stats_data.get('likeCount') or 0)
                    plays = (stats_data.get('play_count') or stats_data.get('playCount') or
                             stats_data.get('view_count') or stats_data.get('viewCount') or 0)
                    comments = (stats_data.get('comment_count') or stats_data.get('commentCount') or 0)
                    shares = (stats_data.get('share_count') or stats_data.get('shareCount') or 0)
                    if desc:
                        video = {
                            'id': video_info.get('id', ''),
                            'description': desc,
                            'author': author,
                            'author_nickname': nickname,
                            'likes': likes,
                            'plays': plays,
                            'comments': comments,
                            'shares': shares
                        }
                        videos.append(video)
        except:
            continue
    return videos


async def scrape_search_results(query, max_results=20):
    client = await create_client()
    try:
        search_url = f"https://www.tiktok.com/search?q={quote(query)}"
        api_params = await extract_api_params(client, search_url)
        await asyncio.sleep(1)
        api_response = await call_search_api(client, query, api_params, max_results)
        return parse_search_results(api_response)
    finally:
        await client.aclose()


async def main():
    search_query = "ozzy osbourne"  # Replace
    results = await scrape_search_results(search_query, max_results=15)
    if not results:
        print("No results found")
        return
    for i, video in enumerate(results, 1):
        author = video['author']
        nickname = video['author_nickname']
        description = video['description'][:100] + "..." if len(video['description']) > 100 else video['description']
        likes = int(video['likes']) if video['likes'] else 0
        plays = int(video['plays']) if video['plays'] else 0
        comments = int(video['comments']) if video['comments'] else 0
        display_author = f"@{author}"
        if nickname and nickname != author:
            display_author += f" ({nickname})"
        print(f"{i}. {display_author}")
        print(f" {description}")
        print(f" {likes:,} likes, {plays:,} plays, {comments:,} comments")
        if i < len(results):
            print("-" * 50)


if __name__ == "__main__":
    asyncio.run(main())

This scraper is especially helpful for quickly sampling content around a topic, measuring post engagement, and discovering emerging narratives or viral trends. You can easily modify it to paginate through more results or filter by engagement thresholds (a filtering sketch follows the sample output below). Here’s a snippet of what the output looks like:

1. @audio.producer1 (Audio Producer)
John Michael "Ozzy" Osbourne (3 December 1948 - 22 July 2025) was an English singer, songwriter and ...
44,200 likes, 270,400 plays, 1,739 comments
--------------------------------------------------
2. @nozza_123
R.I.P LEGEND💔🥺#ozzyosbourne #artist #top5 #bestsongs #ranking #trendin #fyp
156,900 likes, 2,900,000 plays, 1,524 comments
--------------------------------------------------
3. @pastfusionai (Past Fusion AI)
R.I.P. Ozzy Osbourne: The Prince of Darkness #ozzy #ozzyosbourne #life #story #history
41,900 likes, 968,500 plays, 225 comments
--------------------------------------------------
4. @el.mclovinn (Mclovin)
Q.E.P.D Ozzy Osbourne 1948-2025 🕊️ #viral #fyp #ozzyosbourne #ozzyosbourneforever #ozzyosbourneedit
470,100 likes, 3,400,000 plays, 1,017 comments
--------------------------------------------------
5. @scrappy.and.chill (Scrappy 💙)
#duet with @Past Vision We lost a music legend. This is the best use of AI I've seen. #ozzyosbourne ...
2,652 likes, 142,500 plays, 72 comments
--------------------------------------------------
6. @_nkl2 (🧸)
#ozzyosbourne #ukraine #riplegends
58,300 likes, 318,800 plays, 577 comments
--------------------------------------------------
7. @aivaultt (AI Vault)
Evolution of Ozzy Osbourne! #ozzyosbourne #evolution #aigenerated
10,200 likes, 262,700 plays, 43 comments
--------------------------------------------------
8. @djxquizit (DJ Xquizit | Xavian)
Descansa en paz Ozzy Osborne, rest in piece #ozzyosbourne #restinpeace
229 likes, 4,370 plays, 11 comments
--------------------------------------------------
9. @vitosfrankai (Vito's Frank AI)
"Due giganti camminano nel cielo. Addio Ozzy, addio Hulk." #hulkhogan #ozzyosbourne #ai #vitosfranka...
27,000 likes, 400,600 plays, 576 comments
--------------------------------------------------
10. @boardroom (Boardroom)
Ozzy Osbourne's farewell concert on July 5 became the highest-grossing charity concert ever, raising...
60,200 likes, 801,700 plays, 791 comments
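As an example of filtering by engagement, here's a small wrapper around the scrape_search_results() function from the script above. The threshold values are arbitrary placeholders – tune them to your use case.

import asyncio

async def search_with_threshold(query: str, min_likes: int = 10_000, min_plays: int = 100_000):
    """Keep only search results above the given engagement thresholds."""
    results = await scrape_search_results(query, max_results=30)
    return [
        video for video in results
        if int(video['likes'] or 0) >= min_likes and int(video['plays'] or 0) >= min_plays
    ]

# Example usage:
# top_posts = asyncio.run(search_with_threshold("ozzy osbourne"))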

Handling and bypassing TikTok's scraping protections

TikTok employs sophisticated anti-scraping measures that require careful handling. Understanding these challenges and implementing proper countermeasures is essential for successful data extraction.

Common obstacles when scraping TikTok

  • Rate limiting and IP bans. TikTok monitors request frequency and can impose temporary or permanent IP bans after detecting unusual traffic patterns. Signs include HTTP 429 errors, empty responses, or redirect loops to verification pages.
  • CAPTCHAs and anti-bot mechanisms. The platform frequently displays image recognition puzzles, sliding puzzles, and behavioral verification challenges. These appear when TikTok's systems detect automated traffic patterns.
  • Fingerprinting and behavioral detection. TikTok analyzes browser characteristics, mouse movements, scroll patterns, and timing to identify bot traffic. Consistent patterns or missing human-like behaviors trigger blocking mechanisms.
  • Dynamic content loading issues. Since TikTok relies heavily on JavaScript, traditional HTTP scraping often returns empty or incomplete data. The content requires full browser rendering and proper timing.

Recognizing when you've been blocked

Watch for these warning signs that indicate your scraper has been detected:

  • Empty or incomplete page content. Pages load but show minimal data or placeholder content.
  • CAPTCHA challenges. Pop-up verification dialogs requiring human interaction.
  • Redirect loops. Automatic redirects to verification or login pages.
  • HTTP error codes. 429 (Too Many Requests), 403 (Forbidden), or 503 (Service Unavailable).
  • Consistent timeouts. Pages that previously loaded quickly now time out frequently.
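A lightweight way to catch these signals programmatically is to inspect every response before parsing it. Here's a rough sketch for httpx responses – the status codes and marker strings are illustrative, not an exhaustive list:

from httpx import Response

BLOCK_STATUS_CODES = {403, 429, 503}
BLOCK_MARKERS = ("captcha", "verify", "security-check")  # illustrative markers

def looks_blocked(response: Response) -> bool:
    """Heuristic check for common TikTok blocking signals."""
    if response.status_code in BLOCK_STATUS_CODES:
        return True
    # Redirect loops often end on verification or login pages
    if "verify" in str(response.url) or "login" in str(response.url):
        return True
    body = response.text.lower()
    # Empty or near-empty pages and CAPTCHA markers are strong hints
    return len(body) < 1000 or any(marker in body for marker in BLOCK_MARKERS)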

Solutions and workarounds

  • Using rotating proxies and residential IPs. Implement a robust proxy rotation system to distribute requests across multiple IP addresses. Residential proxies are particularly effective as they appear more legitimate than datacenter IPs.
  • Adjusting request headers and browser fingerprints. Rotate user agents and browser settings to avoid consistent fingerprinting.
  • Implementing delays and randomization. Add human-like delays and random actions to avoid detection patterns (see the sketch after this list).
  • Leveraging JavaScript rendering and headless browsers. Use full browser automation with proper stealth settings.
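Here's a minimal sketch of the header-rotation and delay tactics above. The user-agent strings are just examples – maintain your own pool and keep it current:

import asyncio
import random

# Example user-agent strings – extend or replace with your own pool
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15",
]

def random_headers() -> dict:
    """Pick a different user agent for each request."""
    return {"user-agent": random.choice(USER_AGENTS), "accept-language": "en-US,en;q=0.9"}

async def human_like_delay(base: float = 2.0, jitter: float = 3.0):
    """Sleep for a randomized, human-like interval between requests."""
    await asyncio.sleep(base + random.uniform(0, jitter))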

Storing and analyzing scraped data

Proper data storage and analysis are crucial for extracting actionable insights from your TikTok scraping efforts. How you structure and analyze your data will shape the quality of your results. Here’s how to get started:

Output formats

  • JSON storage. JSON is a great format for storing scraped TikTok data, especially when dealing with nested structures like user profiles, post metadata, and engagement stats. It’s lightweight, human-readable, and supported across most programming languages, making it ideal for smaller-scale projects, debugging, or exporting data to other tools.
  • Database storage. For larger-scale scraping or long-term data collection, databases offer more structure and efficiency. PostgreSQL is a strong choice when you need relational integrity and powerful querying capabilities. On the other hand, MongoDB works well with semi-structured data and plays nicely with JSON-style documents, making it a natural fit for handling complex TikTok records like posts or profile snapshots.
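As a minimal example of the JSON route, the sketch below appends scraped records to a local file. The filename is arbitrary, and the same records map naturally onto a PostgreSQL table or MongoDB collection later:

import json
from pathlib import Path

def save_records(records: list, path: str = "tiktok_data.json"):
    """Append scraped records to a local JSON file (created on first run)."""
    file = Path(path)
    existing = json.loads(file.read_text(encoding="utf-8")) if file.exists() else []
    existing.extend(records)
    file.write_text(json.dumps(existing, indent=2, ensure_ascii=False), encoding="utf-8")

# Example usage with the profile scraper's output:
# save_records([result], "profiles.json")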

Basic data cleaning tips

  • Normalize number formats, e.g., convert "1.2M" to 1200000 (see the helper sketch after this list)
  • Remove HTML artifacts from text fields like bios or captions
  • Strip whitespace and emojis if you’re doing text-based analysis
  • Handle missing or null values to prevent errors in your pipeline
  • Convert timestamps to a consistent format (e.g. ISO 8601)
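For the first tip, a small helper like this converts TikTok's abbreviated counts into plain integers – a sketch you can extend with more suffixes as needed:

def normalize_count(value) -> int:
    """Convert TikTok-style counts like '2.5M' or '6580' into integers."""
    if isinstance(value, (int, float)):
        return int(value)
    text = str(value).strip().upper().replace(",", "")
    multipliers = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
    if text and text[-1] in multipliers:
        return int(float(text[:-1]) * multipliers[text[-1]])
    try:
        return int(float(text))
    except ValueError:
        return 0

# normalize_count("2.5M") -> 2500000, normalize_count("6580") -> 6580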

Example analyses

  • Trend detection. Track hashtag frequency, sound usage, or caption keywords over time to detect emerging trends. Plot data weekly or daily to visualize spikes and cycles (a starter sketch follows this list).
  • Sentiment analysis. Run basic sentiment models on video captions or comment sections to understand public opinion. This can help with brand monitoring, campaign feedback, or competitor analysis.
  • Influencer identification. Filter profiles by follower count, engagement rate, or niche-related hashtags to identify potential influencer partners. You can also build scoring systems to rank creators by relevance and reach.
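As a starting point for trend detection, here's a sketch that counts hashtag frequency across a batch of scraped posts. It assumes records shaped like the video scraper's output earlier in this guide:

from collections import Counter

def top_hashtags(videos: list, n: int = 10) -> list:
    """Count hashtag frequency across scraped video records."""
    counter = Counter()
    for video in videos:
        counter.update(tag.lower() for tag in video.get("hashtags", []))
    return counter.most_common(n)

# Example usage with the output of scrape_tiktok_videos():
# print(top_hashtags(videos))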

Best practices and common pitfalls

Successful TikTok scraping requires careful attention to ethical considerations, technical implementation, and ongoing maintenance. Following these best practices will help you build sustainable and responsible scraping operations.

Respecting TikTok's terms of service and robots.txt

  • Terms of service compliance. Review TikTok's Terms of Service regularly, as they may change. Generally, terms of service prohibit automated data collection, but enforcement varies. Understanding these restrictions helps you assess legal risks.
  • Robots.txt guidelines. Check TikTok's robots.txt file at https://www.tiktok.com/robots.txt for crawler directives. While not legally binding, robots.txt provides guidance on the platform's preferences for automated access.
  • Rate limiting. Implement conservative rate limiting to avoid triggering anti-abuse systems. Start with 1-2 requests per minute and adjust based on your success rates and blocking frequency.

Avoiding overloading TikTok servers

  • Distributed request patterns. Spread your scraping activities across different time periods and IP addresses. Avoid burst patterns that could strain server resources or trigger security measures.
  • Efficient data collection. Prioritize the most valuable data points and avoid collecting unnecessary information. This reduces server load and improves your scraping efficiency.
  • Monitoring and alerting. Implement monitoring systems to track your scraper's performance and error rates. Set up alerts for unusual blocking patterns or system failures.

Keeping your scraper updated with TikTok changes

  • Regular testing. Test your scrapers frequently to catch breaking changes early, as TikTok updates its interface and anti-bot measures regularly.
  • Modular design. Structure your code with separate modules for different scraping tasks. This makes it easier to update specific functionality when changes occur.
  • Selector management. Use flexible selectors that can adapt to minor DOM changes. Implement fallback extraction methods for critical data points.
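In practice, flexible selectors can be as simple as trying an ordered list of candidates and returning the first hit. Here's a minimal sketch using parsel – the selectors are illustrative and not guaranteed to match TikTok's current DOM:

from parsel import Selector

def first_match(html: str, css_candidates: list) -> str:
    """Return the first non-empty text match from an ordered list of CSS selectors."""
    selector = Selector(html)
    for css in css_candidates:
        value = selector.css(f"{css}::text").get()
        if value and value.strip():
            return value.strip()
    return ""

# Illustrative usage:
# title = first_match(page_html, ['[data-e2e="user-title"]', "h1", "h2"])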

Alternatives to scraping TikTok

If building and maintaining a custom scraper isn't the right fit for your project, there are a few alternative ways to access TikTok data. Some are official APIs provided by TikTok itself, while others come from third-party platforms. These options are more limited in scope but may work for specific, lower-volume use cases.

TikTok’s official APIs include:

  • Research API. Designed for academic and non-commercial research, this API provides access to public, anonymized TikTok data such as video metadata, hashtag usage, and user profile information. However, access is currently limited to approved researchers affiliated with US-based institutions, and the program is managed through TikTok’s partnership with the University of Michigan's ICPSR platform. An institutional review board (IRB) approval is typically required.
  • Marketing API. Intended for advertisers and TikTok partners, this API grants access to ad campaign performance data, audience demographics, and conversion metrics. It’s not suitable for general content scraping, influencer research, or competitor analysis.
  • Login Kit and Developer Tools. TikTok offers SDKs that enable user authentication or content sharing through TikTok. These tools are useful for app integration but do not provide access to broader data or user activity beyond what’s explicitly authorized by the user.

Overall, official TikTok APIs come with strict access requirements, limited scope, usage quotas, and restrictions on commercial or large-scale data collection. They also don’t provide any access to public competitor content or trend-level data.

For more flexibility, third-party services are often the better choice. They provide access to TikTok data via scraping APIs or ready-made datasets, often with filters for hashtags, accounts, or engagement metrics. Decodo’s Web Scraping API, for example, includes a no-code template for scraping TikTok posts, letting you collect likes, comments, hashtags, and more without writing a single line of code.

Final thoughts

TikTok scraping presents unique challenges but offers tremendous value for businesses, researchers, and developers seeking to understand social media trends and user behavior. This comprehensive guide has covered the essential techniques, tools, and best practices needed to successfully extract TikTok data.

Residential IPs for seamless TikTok scraping

Get your 3-day free trial of residential proxies and enjoy full, unrestricted access.

About the author

Dominykas Niaura

Technical Copywriter

Dominykas brings a unique blend of philosophical insight and technical expertise to his writing. Starting his career as a film critic and music industry copywriter, he's now an expert in making complex proxy and web scraping concepts accessible to everyone.


Connect with Dominykas via LinkedIn

All information on Decodo Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Decodo Blog or any third-party websites that may be linked therein.

Frequently asked questions

Is it legal to scrape TikTok?

TikTok’s terms of service discourage the use of automated tools for data extraction. Although the information on the site is publicly visible, scraping without explicit permission may breach their policies and result in actions like IP bans or other restrictions. It’s a good idea to review and follow TikTok’s terms before starting any scraping efforts.

What data can/cannot be scraped?

In general, public data such as usernames, video captions, hashtags, follower counts, and engagement metrics can be scraped. Private data (such as messages, hidden accounts, or non-public analytics) cannot be accessed legally or technically without user consent.

How to extract TikTok data?

Data can be extracted using methods like web scraping with tools such as Playwright or Puppeteer. Keep in mind, these methods often require rotating proxies and careful handling to avoid detection. Alternatively, users often opt for third-party APIs that access public data.

How to scrape TikTok for free?

Free scraping options include building your own scraper with open-source tools like Python + Playwright or using limited free tiers of unofficial TikTok APIs. Just be aware that you’ll likely run into rate limits, CAPTCHAs, and blocks, so expect some technical hurdles.

How to avoid getting blocked?

To reduce the risk of being blocked, use rotating proxies, add delays between requests, and mimic human browsing behavior. Avoid scraping at high speeds or from a single IP. Headless browsers with stealth plugins can also help bypass basic bot detection.

Can I scrape TikTok without coding experience?

It’s challenging but not impossible. Some no-code scraping tools and paid third-party services offer TikTok data access with simple interfaces. If you're looking for an easier option, Decodo’s Web Scraping API includes a ready-made template for scraping TikTok posts – no coding knowledge required. It’s a quick way to get started without building your own scraper.

Does TikTok have an API?

Yes, TikTok offers an official API, but it’s limited to approved partners and use cases like content posting, advertising, and analytics. Public access is restricted, and it’s not designed for large-scale data extraction or research scraping.
