Android Performance Testing: Macrobenchmark, Baseline Profiles, and Perfetto

App performance isn't tested so much as measured. Android's Macrobenchmark library gives you a framework for measuring startup time, frame rendering, and custom interactions on real devices. Baseline Profiles, which the same library can generate, tell the runtime which critical code paths to pre-compile, and Perfetto lets you trace exactly what's happening at the system level.

The Performance Testing Stack

Three tools work together:

  • Macrobenchmark — measures app performance (startup, scrolling, interactions)
  • Baseline Profiles — pre-compile hot code paths to reduce JIT compilation overhead
  • Perfetto — system-level trace viewer for diagnosing what's actually slow

Macrobenchmark Setup

Macrobenchmark runs as a separate instrumented test module. Create a new module (typically macrobenchmark/) with this Gradle config:

// macrobenchmark/build.gradle.kts
plugins {
    id("com.android.test")
    id("org.jetbrains.kotlin.android")
}

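android {
    // The app module under test
    targetProjectPath = ":app"
    // Lets this test module instrument itself, which Macrobenchmark relies on
    experimentalProperties["android.experimental.self-instrumenting"] = true

    defaultConfig {
        testInstrumentationRunner = "androidx.test.runner.AndroidJUnitRunner"
        // Lets benchmarks run on an emulator instead of failing; drop this for device-only setups
        testInstrumentationRunnerArguments["androidx.benchmark.suppressErrors"] = "EMULATOR"
    }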

    buildTypes {
        create("benchmark") {
            isDebuggable = false
            signingConfig = signingConfigs.getByName("debug")
            matchingFallbacks += "release"
        }
    }
}

dependencies {
    implementation("androidx.benchmark:benchmark-macro-junit4:1.2.4")
    implementation("androidx.test.ext:junit:1.1.5")
    implementation("androidx.test.uiautomator:uiautomator:2.3.0")
}

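Add a matching benchmark build type to the app module as well: non-debuggable, based on release, and signed with the debug key. On API 29 and later, the app manifest should also declare <profileable android:shell="true" /> inside <application> so traces can be captured from a non-debuggable build.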
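As a rough sketch, the app-side build type could look like the following. It assumes the app already has a release build type to inherit from; adjust signing and fallbacks to your project.

```kotlin
// app/build.gradle.kts (sketch; assumes an existing "release" build type)
android {
    buildTypes {
        create("benchmark") {
            // Start from release so minification and optimizations match what ships
            initWith(getByName("release"))
            // Sign with the debug key so the build installs without release credentials
            signingConfig = signingConfigs.getByName("debug")
            // Let library modules without a "benchmark" type fall back to release
            matchingFallbacks += listOf("release")
            isDebuggable = false
        }
    }
}
```

Keeping this build type as close to release as possible is the point: numbers measured on a debuggable build say little about what users will see.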

Measuring App Startup

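import androidx.benchmark.macro.StartupMode
import androidx.benchmark.macro.StartupTimingMetric
import androidx.benchmark.macro.junit4.MacrobenchmarkRule
import androidx.test.ext.junit.runners.AndroidJUnit4
import androidx.test.filters.LargeTest
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith

@LargeTest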
@RunWith(AndroidJUnit4::class)
class StartupBenchmark {

    @get:Rule
    val benchmarkRule = MacrobenchmarkRule()

    @Test
    fun startupColdStart() = benchmarkRule.measureRepeated(
        packageName = "com.example.myapp",
        metrics = listOf(StartupTimingMetric()),
        iterations = 5,
        startupMode = StartupMode.COLD
    ) {
        pressHome()
        startActivityAndWait()
    }

    @Test
    fun startupWarmStart() = benchmarkRule.measureRepeated(
        packageName = "com.example.myapp",
        metrics = listOf(StartupTimingMetric()),
        iterations = 5,
        startupMode = StartupMode.WARM
    ) {
        startActivityAndWait()
    }
}

StartupMode.COLD kills the process before each iteration (worst case). WARM keeps the process alive but recreates the activity. The typical background-to-foreground case, where the activity still exists and is simply brought to the front, is StartupMode.HOT.

Results appear in the test output and are saved as JSON files. timeToFullDisplayMs is only reported when the app calls reportFullyDrawn():

StartupBenchmark_startupColdStart
  timeToInitialDisplayMs   min=312.4,   median=318.7,   max=329.1
  timeToFullDisplayMs      min=445.2,   median=452.0,   max=481.3

Measuring Scrolling Performance

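import androidx.benchmark.macro.FrameTimingMetric
import androidx.benchmark.macro.StartupMode
import androidx.benchmark.macro.junit4.MacrobenchmarkRule
import androidx.test.ext.junit.runners.AndroidJUnit4
import androidx.test.filters.LargeTest
import androidx.test.uiautomator.By
import androidx.test.uiautomator.Direction
import androidx.test.uiautomator.Until
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith

@LargeTest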
@RunWith(AndroidJUnit4::class)
class ScrollBenchmark {

    @get:Rule
    val benchmarkRule = MacrobenchmarkRule()

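    @Test
    fun scrollProductList() = benchmarkRule.measureRepeated(
        packageName = "com.example.myapp",
        metrics = listOf(FrameTimingMetric()),
        iterations = 5,
        startupMode = StartupMode.WARM,
        // Launch the app outside the measured block so only the scroll is timed
        setupBlock = {
            pressHome()
            startActivityAndWait()
        }
    ) {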
        val listSelector = By.res("com.example.myapp:id/product_list")
        device.wait(Until.hasObject(listSelector), 3000)

        val list = device.findObject(listSelector)
        list.setGestureMargin(device.displayWidth / 5)
        list.fling(Direction.DOWN)

        device.waitForIdle()
    }
}

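FrameTimingMetric reports frame duration on the CPU (frameDurationCpuMs) at P50, P90, P95, and P99. On API 31 and later it also reports frameOverrunMs, how far each frame ran past its deadline; a negative value means the frame finished early. It does not count slow or frozen frames directly. On a 60Hz display, any frame that takes longer than about 16ms misses its deadline, so the high percentiles are where jank shows up.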
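To make the percentile figures concrete, here is a minimal sketch of a nearest-rank percentile over a list of frame durations. The function name and the nearest-rank method are assumptions chosen for illustration; the library may compute its percentiles differently.

```kotlin
// Nearest-rank percentile: the smallest value such that at least p% of the
// samples are less than or equal to it. Illustrative only.
fun percentile(values: List<Double>, p: Int): Double {
    require(values.isNotEmpty()) { "need at least one value" }
    require(p in 1..100) { "p must be in 1..100" }
    val sorted = values.sorted()
    // Integer form of ceil(p * n / 100), which avoids floating-point rounding
    val rank = (p * sorted.size + 99) / 100
    return sorted[rank - 1]
}
```

With five hypothetical frame durations of 4, 6, 8, 12, and 30 ms, the P50 is 8 ms while the P90 is 30 ms, which is why a single slow frame is invisible in the median but obvious in the tail.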

Custom Metrics

Measure specific interactions with TraceSectionMetric:

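@OptIn(ExperimentalMetricApi::class) // TraceSectionMetric is experimental in benchmark 1.2.x
@Test
fun checkoutFlowPerformance() = benchmarkRule.measureRepeated(
    packageName = "com.example.myapp",
    metrics = listOf(
        TraceSectionMetric("CheckoutViewModel#processOrder"),
        FrameTimingMetric()
    ),
    iterations = 5,
    startupMode = StartupMode.WARM,
    // Launch the app before each measured iteration
    setupBlock = {
        pressHome()
        startActivityAndWait()
    }
) {
    // Navigate to checkout; By.res needs the package as well as the resource ID
    val checkoutButton = device.findObject(By.res("com.example.myapp", "checkout_button"))
    checkoutButton.click()
    device.waitForIdle()

    val confirmButton = device.findObject(By.res("com.example.myapp", "confirm_order"))
    confirmButton.click()
    device.waitForIdle()
}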

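In your app code, wrap the section you want to measure with trace() from androidx.tracing:tracing-ktx, using exactly the same section name as in the metric: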

fun processOrder(order: Order) {
    trace("CheckoutViewModel#processOrder") {
        // expensive order processing
        validateOrder(order)
        applyDiscounts(order)
        submitToBackend(order)
    }
}

Baseline Profiles

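Baseline Profiles tell the Android Runtime (ART) which classes and methods to compile ahead of time (AOT) when the app is installed. This cuts interpretation and JIT compilation at runtime and can improve startup time by roughly 20-40% from the first launch, depending on the app and device.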

Generate a Baseline Profile using Macrobenchmark:

// BaselineProfileRule is stable in benchmark 1.2.x, so no opt-in annotation is needed
@RunWith(AndroidJUnit4::class)
class BaselineProfileGenerator {

    @get:Rule
    val rule = BaselineProfileRule()

    @Test
    fun generateBaselineProfile() = rule.collect(
        packageName = "com.example.myapp"
    ) {
        pressHome()
        startActivityAndWait()

        // Cover critical user journeys (app-specific helper functions, not shown here)
        navigateToProductList()
        scrollProductList()
        openProductDetail()
        addToCart()
        openCheckout()
    }
}

Run the generator:

./gradlew :macrobenchmark:connectedBenchmarkAndroidTest \
  -Pandroid.testInstrumentationRunnerArguments.class=com.example.BaselineProfileGenerator

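This writes the profile under macrobenchmark/build/outputs/connected_android_test_additional_output/, in a file whose name ends in baseline-prof.txt. Copy it to app/src/main/baseline-prof.txt. Collecting a profile requires a rooted device or emulator on API 28 or later, or any device on API 33 or later.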

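Build with the Baseline Profile included. Current Android Gradle Plugin versions package app/src/main/baseline-prof.txt automatically, so no build-type setting is needed. The app module does need the profileinstaller library, which installs the profile on the device and lets Macrobenchmark apply it:

// app/build.gradle.kts
dependencies {
    implementation("androidx.profileinstaller:profileinstaller:1.3.1")
}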

Verifying Baseline Profile Impact

Compare startup with and without the profile:

@Test
fun startupWithBaselineProfile() = benchmarkRule.measureRepeated(
    packageName = "com.example.myapp",
    metrics = listOf(StartupTimingMetric()),
    compilationMode = CompilationMode.Partial(),  // AOT-compiles only what the Baseline Profile lists
    iterations = 5,
    startupMode = StartupMode.COLD
) {
    pressHome()
    startActivityAndWait()
}

@Test
fun startupWithoutProfile() = benchmarkRule.measureRepeated(
    packageName = "com.example.myapp",
    metrics = listOf(StartupTimingMetric()),
    compilationMode = CompilationMode.None(),  // JIT only
    iterations = 5,
    startupMode = StartupMode.COLD
) {
    pressHome()
    startActivityAndWait()
}

Gains vary by app and device, but an improvement on the order of 200ms on cold start is not unusual after adding a Baseline Profile to a standard app.
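When comparing the two runs, a relative figure is often easier to track over time than an absolute one. A minimal sketch, with a hypothetical helper name and made-up medians:

```kotlin
// Relative improvement of `after` over `before`, in percent.
// Positive means the second run was faster.
fun improvementPercent(beforeMs: Double, afterMs: Double): Double {
    require(beforeMs > 0.0) { "baseline must be positive" }
    return (beforeMs - afterMs) / beforeMs * 100.0
}
```

For example, hypothetical cold-start medians of 400 ms without the profile and 300 ms with it give improvementPercent(400.0, 300.0) == 25.0.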

Perfetto Traces

Macrobenchmark automatically captures Perfetto traces. They're saved alongside test results and can be opened in ui.perfetto.dev.

Key tracks to examine:

  • Main thread — look for long frames, choreographer, View#draw
  • RenderThread — GPU sync and draw calls
  • Binder calls — IPC overhead
  • App process — coroutine scheduling, custom trace sections

For manual trace capture during development:

# Start tracing
adb shell perfetto \
  -c - --txt \
  -o /data/misc/perfetto-traces/trace \
  <<EOF
buffers: { size_kb: 63488 fill_policy: RING_BUFFER }
data_sources: { config { name: "linux.ftrace" ftrace_config { ftrace_events: "sched/sched_switch" ftrace_events: "power/suspend_resume" } } }
data_sources: { config { name: "android.gpu.memory" } }
data_sources: { config { name: "track_event" } }
duration_ms: 10000
EOF

# Pull the trace
adb pull /data/misc/perfetto-traces/trace ./my-trace.perfetto

Open in Perfetto UI: ui.perfetto.dev → Open trace file.

Running in CI

Macrobenchmark requires a physical device or non-emulated environment for meaningful results (emulators produce noisy measurements). In CI:

# .github/workflows/benchmark.yml
jobs:
  benchmark:
    runs-on: [self-hosted, android-device]
    steps:
      - uses: actions/checkout@v4
      - name: Run benchmarks
        run: |
          ./gradlew :macrobenchmark:connectedBenchmarkAndroidTest \
            -Pandroid.testInstrumentationRunnerArguments.class=com.example.StartupBenchmark
      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: benchmark-results
          path: macrobenchmark/build/outputs/connected_android_test_additional_output/

For emulator-based CI (noisy but better than nothing), pass the instrumentation argument androidx.benchmark.suppressErrors=EMULATOR so the run doesn't fail with the emulator error.

Performance Budgets

Define what "acceptable" performance means and enforce it. measureRepeated returns Unit, so the budget can't be asserted on its return value; run the benchmark as usual and have a later CI step compare the median in the JSON output against the limit:

@Test
fun coldStartMustBeFasterThan500ms() = benchmarkRule.measureRepeated(
    packageName = "com.example.myapp",
    metrics = listOf(StartupTimingMetric()),
    iterations = 5,
    startupMode = StartupMode.COLD
) {
    pressHome()
    startActivityAndWait()
}
// The 500 ms limit on timeToInitialDisplayMs is checked against the JSON results after the run.

A failed budget check then shows up in CI as a failed step, which makes performance regressions visible.
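A budget can also be checked directly against the JSON results file, which works regardless of how the test itself is written. The sketch below is deliberately minimal and rests on assumptions: the function names are hypothetical, and it assumes each metric is stored as a flat JSON object with a "median" field. A real project would more likely use a proper JSON parser than a regular expression.

```kotlin
// Extracts the "median" value of one metric from benchmark JSON text.
// Assumes the metric is a flat object such as:
//   "timeToInitialDisplayMs": { "minimum": ..., "median": ..., "runs": [...] }
fun medianOf(json: String, metric: String): Double? {
    val number = "(-?\\d+(?:\\.\\d+)?(?:[eE][-+]?\\d+)?)"
    val pattern = Regex(
        "\"" + Regex.escape(metric) + "\"\\s*:\\s*\\{[^{}]*?\"median\"\\s*:\\s*" + number
    )
    return pattern.find(json)?.groupValues?.get(1)?.toDouble()
}

// True when the metric's median is above the budget; fails loudly if the
// metric is missing so a renamed metric cannot silently pass.
fun exceedsBudget(json: String, metric: String, budgetMs: Double): Boolean {
    val median = medianOf(json, metric) ?: error("metric $metric not found")
    return median > budgetMs
}
```

A CI step could read the results file, call exceedsBudget(text, "timeToInitialDisplayMs", 500.0), and exit with a non-zero status when it returns true.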

Summary

Performance testing is not optional for Android apps: users notice 100ms differences in startup, and they notice scroll jank immediately. Macrobenchmark gives you reproducible measurements. Baseline Profiles pre-compile the critical paths those benchmarks exercise. Perfetto shows you exactly what's slow at the system level. Set up the benchmark module early, establish baseline numbers before the first release, and track them in CI. Catching a 200ms regression before shipping is easy; finding it after users start complaining is much harder.
