Android Performance Testing: Macrobenchmark, Baseline Profiles, and Perfetto
App performance isn't tested; it's measured. Android's Macrobenchmark library gives you a framework for measuring startup time, frame rendering, and custom interactions on real devices. Baseline Profiles are generated with the same harness and pre-compile critical code paths, and Perfetto lets you trace exactly what's happening at the system level.
The Performance Testing Stack
Three tools work together:
- Macrobenchmark — measures app performance (startup, scrolling, interactions)
- Baseline Profiles — pre-compile hot code paths to reduce JIT compilation overhead
- Perfetto — system-level trace viewer for diagnosing what's actually slow
Macrobenchmark Setup
Macrobenchmark runs as a separate instrumented test module. Create a new module (typically macrobenchmark/) with this Gradle config:
// macrobenchmark/build.gradle.kts
plugins {
    id("com.android.test")
    id("org.jetbrains.kotlin.android")
}

android {
    targetProjectPath = ":app"

    defaultConfig {
        testInstrumentationRunner = "androidx.test.runner.AndroidJUnitRunner"
        testInstrumentationRunnerArguments["androidx.benchmark.suppressErrors"] = "EMULATOR"
    }

    buildTypes {
        create("benchmark") {
            isDebuggable = false
            signingConfig = signingConfigs.getByName("debug")
            matchingFallbacks += "release"
        }
    }
}

dependencies {
    implementation("androidx.benchmark:benchmark-macro-junit4:1.2.4")
    implementation("androidx.test.ext:junit:1.1.5")
    implementation("androidx.test.uiautomator:uiautomator:2.3.0")
}

Add the benchmark build type to the app module as well.
Measuring App Startup
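import androidx.benchmark.macro.StartupMode
import androidx.benchmark.macro.StartupTimingMetric
import androidx.benchmark.macro.junit4.MacrobenchmarkRule
import androidx.test.ext.junit.runners.AndroidJUnit4
import androidx.test.filters.LargeTest
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith

@LargeTest
@RunWith(AndroidJUnit4::class)
class StartupBenchmark {
    @get:Rule
    val benchmarkRule = MacrobenchmarkRule()

    @Test
    fun startupColdStart() = benchmarkRule.measureRepeated(
        packageName = "com.example.myapp",
        metrics = listOf(StartupTimingMetric()),
        iterations = 5,
        startupMode = StartupMode.COLD
    ) {
        pressHome()
        startActivityAndWait()
    }

    @Test
    fun startupWarmStart() = benchmarkRule.measureRepeated(
        packageName = "com.example.myapp",
        metrics = listOf(StartupTimingMetric()),
        iterations = 5,
        startupMode = StartupMode.WARM
    ) {
        startActivityAndWait()
    }
}

StartupMode.COLD kills the process before each iteration (worst case). StartupMode.WARM keeps the process alive but recreates the activity from scratch, as happens when the user backs out of the app and relaunches it. Bringing a still-running activity back to the foreground is the third case, StartupMode.HOT.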
Results appear in the test output and are saved as JSON files:
StartupBenchmark_startupColdStart
    timeToInitialDisplayMs   min=312.4, median=318.7, max=329.1
    timeToFullDisplayMs      min=445.2, median=452.0, max=481.3

Measuring Scrolling Performance
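@LargeTest
@RunWith(AndroidJUnit4::class)
class ScrollBenchmark {
    @get:Rule
    val benchmarkRule = MacrobenchmarkRule()

    @Test
    fun scrollProductList() = benchmarkRule.measureRepeated(
        packageName = "com.example.myapp",
        metrics = listOf(FrameTimingMetric()),
        iterations = 5,
        startupMode = StartupMode.WARM,
        setupBlock = {
            // Launch the app outside the measured block so only scrolling is timed
            pressHome()
            startActivityAndWait()
        }
    ) {
        val listSelector = By.res("com.example.myapp:id/product_list")
        device.wait(Until.hasObject(listSelector), 3000)
        val list = device.findObject(listSelector)
        // Keep the gesture away from the screen edges to avoid system navigation
        list.setGestureMargin(device.displayWidth / 5)
        list.fling(Direction.DOWN)
        device.waitForIdle()
    }
}

FrameTimingMetric reports frame durations as percentiles (frameDurationCpuMs at P50, P90, P95, and P99) and, on API 31 and higher, frameOverrunMs, which is how far a frame ran past its deadline. For reference, Android vitals treats frames that take longer than 16ms as slow and frames that take longer than 700ms as frozen.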
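To make the percentile figures concrete, here is a minimal sketch of how a P50/P90/P99 value can be derived from a list of per-frame durations. It assumes the nearest-rank definition; the library may compute its percentiles differently (for example with interpolation), and both function names are hypothetical helpers, not part of Macrobenchmark.

```kotlin
import kotlin.math.ceil

/**
 * Nearest-rank percentile: the smallest sample such that at least
 * `percentile` percent of all samples are less than or equal to it.
 * Illustrative definition only; not taken from the Macrobenchmark source.
 */
fun percentile(samplesMs: List<Double>, percentile: Double): Double {
    require(samplesMs.isNotEmpty()) { "need at least one sample" }
    require(percentile > 0.0 && percentile <= 100.0) { "percentile must be in (0, 100]" }
    val sorted = samplesMs.sorted()
    // 1-based rank of the sample that covers the requested percentile
    val rank = ceil(percentile * sorted.size / 100.0).toInt()
    return sorted[rank - 1]
}

/** Counts frames whose duration exceeded the given per-frame budget. */
fun countOverBudget(samplesMs: List<Double>, budgetMs: Double): Int =
    samplesMs.count { it > budgetMs }
```

With ten made-up frame durations where two exceed 16ms, P50 stays low while P90 and P99 land on the slow frames, which is why the high percentiles are the ones to watch for jank.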
Custom Metrics
Measure specific interactions with TraceSectionMetric:
// TraceSectionMetric is still marked experimental in benchmark 1.2.x
@OptIn(ExperimentalMetricApi::class)
@Test
fun checkoutFlowPerformance() = benchmarkRule.measureRepeated(
    packageName = "com.example.myapp",
    metrics = listOf(
        TraceSectionMetric("CheckoutViewModel#processOrder"),
        FrameTimingMetric()
    ),
    iterations = 5,
    startupMode = StartupMode.WARM,
    setupBlock = {
        // Launch the app before the measured interaction begins
        pressHome()
        startActivityAndWait()
    }
) {
    // Navigate to checkout
    val checkoutButton = device.findObject(By.res("checkout_button"))
    checkoutButton.click()
    device.waitForIdle()

    val confirmButton = device.findObject(By.res("confirm_order"))
    confirmButton.click()
    device.waitForIdle()
}

In your app code, wrap the section you want to measure:
// trace() comes from androidx.tracing:tracing-ktx (import androidx.tracing.trace)
fun processOrder(order: Order) {
    trace("CheckoutViewModel#processOrder") {
        // expensive order processing
        validateOrder(order)
        applyDiscounts(order)
        submitToBackend(order)
    }
}

Baseline Profiles
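Baseline Profiles tell the Android runtime (ART) which classes and methods to compile ahead of time (AOT) when the app is installed. Those code paths then skip interpretation and JIT compilation at runtime, which can improve startup time noticeably. Figures of 20-40% on first launch are often cited, though the gain depends on the app and the device.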
Generate a Baseline Profile using Macrobenchmark:
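// The opt-in was required on benchmark 1.1.x, where BaselineProfileRule was
// experimental. On 1.2.x the rule is stable, so drop the opt-in if the
// annotation no longer resolves.
@OptIn(ExperimentalBaselineProfilesApi::class)
@RunWith(AndroidJUnit4::class)
class BaselineProfileGenerator {
    @get:Rule
    val rule = BaselineProfileRule()

    @Test
    fun generateBaselineProfile() = rule.collect(
        packageName = "com.example.myapp"
    ) {
        pressHome()
        startActivityAndWait()
        // Cover critical user journeys. These helpers are app-specific
        // MacrobenchmarkScope extensions that you write yourself.
        navigateToProductList()
        scrollProductList()
        openProductDetail()
        addToCart()
        openCheckout()
    }
}

Run the generator: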
./gradlew :macrobenchmark:connectedBenchmarkAndroidTest \
  -Pandroid.testInstrumentationRunnerArguments.class=com.example.BaselineProfileGenerator

This outputs a baseline profile text file under the module's build outputs; its name typically ends in baseline-prof.txt. Copy it to app/src/main/baseline-prof.txt.
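It helps to know what the generated file contains before committing it. The sketch below parses one line of the human-readable profile format, assuming the commonly documented rule shape: optional flags H (hot), S (startup), and P (post-startup), then a class descriptor, then `->` and a method signature, with class-only rules written as a bare descriptor. The types and the function are hypothetical, and skipping lines that start with `#` is an assumption made for this example.

```kotlin
/** One parsed rule from a human-readable baseline profile (sketch types). */
sealed interface ProfileRule

data class ClassRule(val classDescriptor: String) : ProfileRule

data class MethodRule(
    val flags: Set<Char>,
    val classDescriptor: String,
    val methodSignature: String
) : ProfileRule

/** Returns the parsed rule, or null for blank, comment, or unrecognized lines. */
fun parseProfileLine(line: String): ProfileRule? {
    val trimmed = line.trim()
    if (trimmed.isEmpty() || trimmed.startsWith("#")) return null
    val arrow = trimmed.indexOf("->")
    if (arrow < 0) {
        // No method part: treat a bare descriptor such as Lcom/example/Foo; as a class rule
        return if (trimmed.startsWith("L")) ClassRule(trimmed) else null
    }
    // Leading H/S/P characters are flags; the descriptor itself starts with 'L'
    val flags = trimmed.takeWhile { it in "HSP" }
    val classDescriptor = trimmed.substring(flags.length, arrow)
    return MethodRule(flags.toSet(), classDescriptor, trimmed.substring(arrow + 2))
}
```

Counting how many method rules carry the S flag is a quick sanity check that the generator actually exercised startup code.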
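Build with Baseline Profiles enabled. Android Gradle Plugin 7.0 and later picks up src/main/baseline-prof.txt automatically, so the release build type should need no extra wiring. What the app module does need is the ProfileInstaller library, which installs the profile on the device when the app store has not already done so:

// app/build.gradle.kts
dependencies {
    implementation("androidx.profileinstaller:profileinstaller:1.3.1")
}

Verifying Baseline Profile Impact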
Compare startup with and without the profile:
@Test
fun startupWithBaselineProfile() = benchmarkRule.measureRepeated(
    packageName = "com.example.myapp",
    metrics = listOf(StartupTimingMetric()),
    // Partial() AOT-compiles only what the Baseline Profile lists.
    // Full() would compile the whole app and hide the profile's effect.
    compilationMode = CompilationMode.Partial(),
    iterations = 5,
    startupMode = StartupMode.COLD
) {
    pressHome()
    startActivityAndWait()
}

@Test
fun startupWithoutProfile() = benchmarkRule.measureRepeated(
    packageName = "com.example.myapp",
    metrics = listOf(StartupTimingMetric()),
    compilationMode = CompilationMode.None(), // no AOT compilation, JIT only
    iterations = 5,
    startupMode = StartupMode.COLD
) {
    pressHome()
    startActivityAndWait()
}

A cold-start improvement on the order of 200ms is plausible for a typical app after adding Baseline Profiles, but the gain varies with app size and hardware, so compare the two medians on your own devices.
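Comparing the two runs comes down to comparing medians. The following is a minimal sketch of that arithmetic; the function names are hypothetical and any numbers you feed it should come from your own benchmark output, not from this article.

```kotlin
/** Median of a non-empty list of timings; averages the two middle values for even sizes. */
fun median(samplesMs: List<Double>): Double {
    require(samplesMs.isNotEmpty()) { "need at least one sample" }
    val sorted = samplesMs.sorted()
    val mid = sorted.size / 2
    return if (sorted.size % 2 == 1) sorted[mid] else (sorted[mid - 1] + sorted[mid]) / 2.0
}

/** Relative improvement, in percent, of the run with a profile over the run without one. */
fun improvementPercent(withoutProfileMs: Double, withProfileMs: Double): Double {
    require(withoutProfileMs > 0.0) { "baseline timing must be positive" }
    return (withoutProfileMs - withProfileMs) / withoutProfileMs * 100.0
}
```

The median is used rather than the mean because a single slow iteration (for example one interrupted by a background job) would otherwise skew the comparison.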
Perfetto Traces
Macrobenchmark automatically captures Perfetto traces. They're saved alongside test results and can be opened in ui.perfetto.dev.
Key tracks to examine:
- Main thread: look for long frames, choreographer, View#draw
- RenderThread: GPU sync and draw calls
- Binder calls: IPC overhead
- App process: coroutine scheduling, custom trace sections
For manual trace capture during development:
# Start tracing
adb shell perfetto \
  -c - --txt \
  -o /data/misc/perfetto-traces/trace \
<<EOF
buffers: { size_kb: 63488 fill_policy: RING_BUFFER }
data_sources: { config { name: "linux.ftrace" ftrace_config { ftrace_events: "sched/sched_switch" ftrace_events: "power/suspend_resume" } } }
data_sources: { config { name: "android.gpu.memory" } }
data_sources: { config { name: "track_event" } }
duration_ms: 10000
EOF

# Pull the trace
adb pull /data/misc/perfetto-traces/trace ./my-trace.perfetto

Open in Perfetto UI: ui.perfetto.dev → Open trace file.
Running in CI
Macrobenchmark requires a physical device or non-emulated environment for meaningful results (emulators produce noisy measurements). In CI:
# .github/workflows/benchmark.yml
jobs:
  benchmark:
    runs-on: [self-hosted, android-device]
    steps:
      - uses: actions/checkout@v4
      - name: Run benchmarks
        run: |
          ./gradlew :macrobenchmark:connectedBenchmarkAndroidTest \
            -Pandroid.testInstrumentationRunnerArguments.androidx.benchmark.suppressErrors=EMULATOR \
            -Pandroid.testInstrumentationRunnerArguments.class=com.example.StartupBenchmark
      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: benchmark-results
          path: macrobenchmark/build/outputs/connected_android_test_additional_output/

For emulator-based CI (noisy but better than nothing), pass androidx.benchmark.suppressErrors=EMULATOR as an instrumentation argument, as the command above does. Without it the library refuses to run on an emulator. On a physical device the argument has no effect and can be dropped.
Performance Budgets
Define what "acceptable" performance means and enforce it:
@Test
fun coldStartMustBeFasterThan500ms() = benchmarkRule.measureRepeated(
    packageName = "com.example.myapp",
    metrics = listOf(StartupTimingMetric()),
    iterations = 5,
    startupMode = StartupMode.COLD
) {
    pressHome()
    startActivityAndWait()
}
// measureRepeated returns Unit, so the measured values cannot be asserted on
// inside the test. Enforce the budget in a follow-up CI step that reads the
// timeToInitialDisplayMs median from the benchmark JSON output and fails the
// build when it is not below 500.0.

This makes performance regressions visible in CI as build failures.
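A budget check can also live outside the instrumented test, operating on medians read from the benchmark JSON output. The sketch below shows only the comparison step and assumes the medians have already been extracted into a map; `BudgetViolation` and `findBudgetViolations` are hypothetical names, and the 400ms budget in the usage note is made up for illustration.

```kotlin
/** A budget violation for one metric (hypothetical type for this sketch). */
data class BudgetViolation(val metric: String, val medianMs: Double, val budgetMs: Double)

/**
 * Compares measured medians against budgets, in the order the budgets are listed.
 * A metric with a budget but no measurement is reported with a NaN median,
 * so a renamed or missing metric cannot pass silently.
 */
fun findBudgetViolations(
    mediansMs: Map<String, Double>,
    budgetsMs: Map<String, Double>
): List<BudgetViolation> = budgetsMs.mapNotNull { (metric, budget) ->
    val median = mediansMs[metric]
        ?: return@mapNotNull BudgetViolation(metric, Double.NaN, budget)
    // A median equal to the budget counts as a violation: it must be strictly faster
    if (median >= budget) BudgetViolation(metric, median, budget) else null
}
```

Fed the sample medians shown earlier (318.7 for timeToInitialDisplayMs, 452.0 for timeToFullDisplayMs) with budgets of 500.0 and 400.0, it would flag only timeToFullDisplayMs. A CI step can then exit non-zero whenever the returned list is not empty.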
Summary
Performance testing is not optional for Android apps: users notice slow startup and scroll jank immediately. Macrobenchmark gives you reproducible measurements. Baseline Profiles pre-compile the critical paths those benchmarks exercise. Perfetto shows you exactly what's slow at the system level. Set up the benchmark module early, establish baseline numbers before the first release, and track them in CI. Catching a 200ms regression before shipping is far easier than finding it after users start complaining.