Playwright Advanced — Web Scraping, Testing & Automation
Why Playwright Over Puppeteer/Selenium?
| Playwright | Puppeteer | Selenium | |
|---|---|---|---|
| Browsers | Chromium, Firefox, WebKit | Chromium only | All (via drivers) |
| Auto-waits | ✅ Smart waits built-in | Manual | Manual |
| Network interception | ✅ Full | Limited | Limited |
| Parallel execution | ✅ Native | Manual | With Grid |
| Multiple tabs/windows | ✅ | ✅ | ✅ |
| Mobile emulation | ✅ | ✅ | Limited |
| Tracing/debugging | ✅ Built-in | Limited | External tools |
| API testing | ✅ (APIRequestContext) | ❌ | ❌ |
Core Architecture
Playwright Test Runner
├── Browser (Chromium / Firefox / WebKit)
│ └── BrowserContext (isolated — like incognito per test)
│ ├── Page (tab)
│ └── Page (tab)
└── Worker (parallel test execution)Each BrowserContext is fully isolated: separate cookies, storage, network state. Creates a "new user" per test without relaunching the browser.
Setup & Configuration
bashnpm init playwright@latest
npx playwright install # downloads browsersts// playwright.config.ts
import { defineConfig, devices } from '@playwright/test';
export default defineConfig({
testDir: './tests',
fullyParallel: true, // run all tests in parallel
forbidOnly: !!process.env.CI,
retries: process.env.CI ? 2 : 0,
workers: process.env.CI ? 4 : undefined,
reporter: [['html', { open: 'never' }], ['list']],
timeout: 30_000,
use: {
baseURL: 'http://localhost:3000',
trace: 'on-first-retry', // capture trace on first retry
screenshot: 'only-on-failure',
video: 'retain-on-failure',
headless: true,
},
projects: [
{ name: 'chromium', use: { ...devices['Desktop Chrome'] } },
{ name: 'firefox', use: { ...devices['Desktop Firefox'] } },
{ name: 'webkit', use: { ...devices['Desktop Safari'] } },
{ name: 'mobile', use: { ...devices['iPhone 14'] } },
],
webServer: {
command: 'npm run dev',
url: 'http://localhost:3000',
reuseExistingServer: !process.env.CI,
},
});Smart Locators (Anti-Fragile Selectors)
ts// BEST — semantic, role-based (accessible + stable)
page.getByRole('button', { name: 'Submit' })
page.getByRole('textbox', { name: 'Email' })
page.getByRole('heading', { level: 1 })
page.getByLabel('Password')
page.getByPlaceholder('Enter email...')
page.getByText('Welcome back')
page.getByAltText('Company logo')
page.getByTitle('Close dialog')
page.getByTestId('submit-btn') // data-testid="submit-btn"
// OK — CSS/XPath when semantic isn't possible
page.locator('.card:nth-child(2)')
page.locator('xpath=//button[@type="submit"]')
page.locator('text=Submit').first()
// BAD — fragile, implementation-coupled
page.locator('#app > div > form > button:nth-child(3)')
page.locator('[class="btn-primary btn-large"]')Chaining & filtering
ts// Find "Edit" button inside the row with "John Doe"
const row = page.getByRole('row', { name: 'John Doe' });
await row.getByRole('button', { name: 'Edit' }).click();
// Filter by has text
page.locator('.card').filter({ hasText: 'Premium Plan' }).click();
// nth match
page.locator('.item').nth(2).click();
page.locator('.item').first().click();
page.locator('.item').last().click();Actions & Auto-Waiting
Playwright automatically waits for elements to be:
- Visible
- Stable (not animating)
- Enabled
- Attached to DOM
ts// Interactions — all auto-wait
await page.goto('https://example.com');
await page.getByRole('textbox', { name: 'Email' }).fill('user@example.com');
await page.getByRole('textbox', { name: 'Password' }).fill('secret');
await page.getByRole('button', { name: 'Login' }).click();
// Select dropdown
await page.getByRole('combobox', { name: 'Country' }).selectOption('US');
// Checkbox
await page.getByRole('checkbox', { name: 'Remember me' }).check();
await page.getByRole('checkbox').uncheck();
// Upload file
await page.getByLabel('Upload').setInputFiles('path/to/file.pdf');
await page.getByLabel('Upload').setInputFiles(['file1.png', 'file2.png']);
// Keyboard
await page.getByRole('textbox').press('Enter');
await page.keyboard.press('Control+A');
await page.keyboard.type('Hello World', { delay: 50 });
// Hover
await page.getByRole('button', { name: 'Menu' }).hover();
// Drag and drop
await page.locator('#source').dragTo(page.locator('#target'));
// Scroll
await page.mouse.wheel(0, 500);
await page.locator('.infinite-list').scroll({ direction: 'down', speed: 'fast' });Assertions
tsimport { expect } from '@playwright/test';
// Element assertions (auto-retry until timeout)
await expect(page.getByRole('heading')).toBeVisible();
await expect(page.getByRole('button', { name: 'Submit' })).toBeEnabled();
await expect(page.getByRole('checkbox')).toBeChecked();
await expect(page.locator('.spinner')).toBeHidden();
await expect(page.getByRole('textbox')).toBeEmpty();
await expect(page.getByRole('textbox')).toHaveValue('expected@email.com');
await expect(page.locator('.count')).toHaveText('42');
await expect(page.locator('.count')).toContainText('42');
await expect(page.locator('ul li')).toHaveCount(5);
await expect(page.locator('.card')).toHaveClass(/active/);
await expect(page.locator('.card')).toHaveCSS('color', 'rgb(255, 0, 0)');
await expect(page.locator('img')).toHaveAttribute('alt', 'Logo');
// Page assertions
await expect(page).toHaveURL('https://example.com/dashboard');
await expect(page).toHaveURL(/dashboard/);
await expect(page).toHaveTitle('My App - Dashboard');
// Soft assertions — don't stop on failure, collect all
await expect.soft(page.getByText('Name')).toBeVisible();
await expect.soft(page.getByText('Email')).toBeVisible();
// test continues even if above fail
expect(page.errors).toHaveLength(0);Network Interception & Mocking
tstest('mock API response', async ({ page }) => {
// Intercept and mock
await page.route('**/api/users', route =>
route.fulfill({
status: 200,
contentType: 'application/json',
body: JSON.stringify([{ id: 1, name: 'Tarun' }]),
})
);
await page.goto('/users');
await expect(page.getByText('Tarun')).toBeVisible();
});
test('mock with delay (simulate slow network)', async ({ page }) => {
await page.route('**/api/data', async route => {
await new Promise(r => setTimeout(r, 2000));
await route.fulfill({ status: 200, body: '{"data":[]}' });
});
});
test('intercept and modify request headers', async ({ page }) => {
await page.route('**/api/**', route =>
route.continue({
headers: { ...route.request().headers(), 'X-Test': 'true' }
})
);
});
test('block tracking / ads', async ({ page }) => {
await page.route('**/*.{png,jpg,jpeg,gif,webp}', route => route.abort());
await page.route('**/analytics/**', route => route.abort());
});
// Capture requests
const requests: string[] = [];
page.on('request', req => requests.push(req.url()));
page.on('response', res => console.log(res.url(), res.status()));Advanced Patterns
Page Object Model (POM)
ts// pages/LoginPage.ts
import { type Page, expect } from '@playwright/test';
export class LoginPage {
constructor(private page: Page) {}
async goto() {
await this.page.goto('/login');
}
async login(email: string, password: string) {
await this.page.getByLabel('Email').fill(email);
await this.page.getByLabel('Password').fill(password);
await this.page.getByRole('button', { name: 'Login' }).click();
}
async expectLoginError(message: string) {
await expect(this.page.getByRole('alert')).toContainText(message);
}
}
// tests/login.spec.ts
import { test } from '@playwright/test';
import { LoginPage } from '../pages/LoginPage';
test('invalid credentials shows error', async ({ page }) => {
const loginPage = new LoginPage(page);
await loginPage.goto();
await loginPage.login('bad@email.com', 'wrongpass');
await loginPage.expectLoginError('Invalid credentials');
});Custom fixtures
ts// fixtures.ts
import { test as base } from '@playwright/test';
import { LoginPage } from './pages/LoginPage';
import { DashboardPage } from './pages/DashboardPage';
type Fixtures = {
loginPage: LoginPage;
dashboardPage: DashboardPage;
authenticatedPage: Page; // pre-logged-in page
};
export const test = base.extend<Fixtures>({
loginPage: async ({ page }, use) => {
await use(new LoginPage(page));
},
dashboardPage: async ({ page }, use) => {
await use(new DashboardPage(page));
},
// Fixture that logs in before every test
authenticatedPage: async ({ page }, use) => {
await page.goto('/login');
await page.getByLabel('Email').fill('admin@example.com');
await page.getByLabel('Password').fill('password');
await page.getByRole('button', { name: 'Login' }).click();
await page.waitForURL('/dashboard');
await use(page);
},
});Reuse authentication state (massive speed boost)
ts// auth.setup.ts — runs once, saves storage state
import { test as setup } from '@playwright/test';
setup('authenticate', async ({ page }) => {
await page.goto('/login');
await page.getByLabel('Email').fill('user@test.com');
await page.getByLabel('Password').fill('password');
await page.getByRole('button', { name: 'Login' }).click();
await page.waitForURL('/dashboard');
// Save storage state (cookies + localStorage)
await page.context().storageState({ path: 'auth.json' });
});
// playwright.config.ts — use saved state
projects: [
{ name: 'setup', testMatch: /auth\.setup\.ts/ },
{
name: 'chromium',
use: { storageState: 'auth.json' }, // pre-authenticated!
dependencies: ['setup'],
},
]Scraping Patterns
ts// Extract structured data
const products = await page.locator('.product-card').evaluateAll(cards =>
cards.map(card => ({
name: card.querySelector('.name')?.textContent?.trim(),
price: card.querySelector('.price')?.textContent?.trim(),
href: card.querySelector('a')?.href,
}))
);
// Paginate through results
const results: Product[] = [];
let pageNum = 1;
while (true) {
await page.goto(`/products?page=${pageNum}`);
const items = await page.locator('.product').all();
if (items.length === 0) break;
for (const item of items) {
results.push({
name: await item.locator('.name').textContent(),
price: await item.locator('.price').textContent(),
});
}
const nextBtn = page.getByRole('link', { name: 'Next' });
if (!(await nextBtn.isVisible())) break;
pageNum++;
}
// Handle infinite scroll
while (true) {
const prevCount = await page.locator('.item').count();
await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
await page.waitForTimeout(1000);
const newCount = await page.locator('.item').count();
if (newCount === prevCount) break; // no more items loaded
}AI-Driven Automation (Playwright + LLM)
Playwright is increasingly used with LLMs for intelligent scraping:
tsimport Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic();
// Take screenshot, ask Claude what to click
async function intelligentClick(page: Page, goal: string) {
const screenshot = await page.screenshot({ encoding: 'base64' });
const response = await client.messages.create({
model: 'claude-opus-4-6',
max_tokens: 200,
messages: [{
role: 'user',
content: [
{ type: 'image', source: { type: 'base64', media_type: 'image/png', data: screenshot } },
{ type: 'text', text: `Goal: ${goal}\nReturn the CSS selector or text of the element to click. JSON: {"selector": "...", "text": "..."}` }
]
}]
});
const { text } = JSON.parse(response.content[0].text);
await page.getByText(text).click();
}Debugging
bash# Run in headed mode with slow-mo
npx playwright test --headed --slowmo=500
# Run one test with debug inspector
npx playwright test login.spec.ts --debug
# Codegen — record actions as test code
npx playwright codegen https://example.com
# View HTML report
npx playwright show-report
# View trace (recorded on failure)
npx playwright show-trace trace.zipts// In test — pause execution and open browser inspector
await page.pause();
// Verbose logging
process.env.DEBUG = 'pw:api';Interview Q&A
Q: How does Playwright's auto-waiting work?
Before each action (click, fill, etc.) Playwright runs actionability checks: the element must be visible, stable (not mid-animation), attached to DOM, and enabled. It retries these checks up to the configured timeout (default 30s). This eliminates 90% of flaky test issues that come from manual sleep()/waitForTimeout() calls in other frameworks.
Q: How do you handle authentication in Playwright so you're not logging in for every test?
Use storage state. Run a setup project once that logs in and saves page.context().storageState() to a JSON file. All subsequent tests in the project use storageState: 'auth.json' to start pre-authenticated. No repeated login, no performance penalty, no network calls for auth in your tests.
Q: How do you test a multi-step form reliably across slow networks?
Use network interception to simulate conditions: page.route('**/submit', route => route.fulfill({ delay: 3000 })). Assert loading states appear and disappear. Assert error handling works when route.fulfill({ status: 500 }). Never use waitForTimeout — use waitForResponse or wait for specific DOM changes.
Q: What's the difference between page.locator() and page.$?
page.locator() returns a Locator object that is lazy — it doesn't query the DOM until you interact with it. It retries automatically and supports chaining/filtering. page.$() executes immediately and returns an ElementHandle or null — no retries, no auto-waiting. Use Locator for all tests; ElementHandle is essentially deprecated for test code.