Best practices for querying logs
This guide will help you write efficient LogQL queries in our Loki setup. Following these practices will make your queries faster and use fewer resources.
Understanding the Basics
What Makes Queries Fast?
Our Loki setup uses bloom filters for query acceleration. Think of bloom filters as a quick index that helps Loki know which data chunks might contain what you're looking for, without having to read everything.
Structured Metadata vs Stream Labels
Before diving into best practices, understand these two concepts:
- Stream labels: The labels in curly braces, for example {app="nginx", namespace="prod"}
- Structured metadata: Additional data attached to logs that you filter with pipe operators, for example | key="value"
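A single query often combines both. As a minimal sketch (assuming nginx logs in a prod namespace, and that your logs carry the detected_level structured metadata field used in the examples later in this guide):
{app="nginx", namespace="prod"} | detected_level="error"
The curly braces select streams through the label index; the pipe filter then narrows on structured metadata without parsing the log line itself.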
Essential Best Practices
1. Start Small, Then Expand
Always test your queries on small time ranges first:
- Start with 5 minutes or 1 hour
- Once it works, expand to larger ranges
- This prevents accidental resource overuse
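In Grafana Explore the time range comes from the time picker. If you test from the command line with logcli instead, you can cap both the range and the number of returned lines explicitly. A minimal sketch (the app name is a placeholder, and it assumes logcli is already configured against our Loki instance):
logcli query '{app="myapp"} |= "error"' --since=15m --limit=100
Once that returns what you expect, widen --since step by step rather than jumping straight to days.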
2. Use Labels to Filter First
Loki indexes labels, not log content. Always start with label filters:
✅ Good - Fast:
{app="myapp", namespace="prod"} |= "error"
❌ Bad - Very slow:
{} |= "error"
The second query scans ALL logs in the system!
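You can also narrow further with additional label matchers before any line filter runs. LogQL label matchers support =, !=, =~ and !~, so a sketch like the following (the namespace values are placeholders) still stays on the indexed path:
{app="myapp", namespace=~"prod|staging"} |= "error"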
3. Order Matters for Performance
To use query acceleration with bloom filters, place filters in the right order:
✅ Fast - Filters before parsing:
{app="nginx"}
| detected_level="error" # Filter first
| json # Parse second
❌ Slow - Parsing before filtering:
{app="nginx"}
| json # Parse first (slow!)
| detected_level="error" # Filter second
4. Use Simple String Matching When Possible
- |= (exact match) is faster than |~ (regex)
- Only use regex when you really need pattern matching
✅ Fast:
{app="nginx"} |= "404"
❌ Slower:
{app="nginx"} |~ "40[0-9]"
5. Parse Only What You Need
JSON and logfmt parsing are expensive. Only parse when necessary:
✅ Good - Filter first, then parse:
{app="api"}
|= "orderId" # Filter to relevant logs
| json orderId # Parse only the field you need
| orderId="12345"
❌ Bad - Parse everything:
{app="api"}
| json # Parses all fields in every log!
| orderId="12345"
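If the field you need is nested, the json parser also accepts label="expression" arguments, so you can still extract just that one field. A sketch, assuming a hypothetical payload that nests the ID under order.id:
{app="api"}
|= "12345" # Cheap line filter on the ID value
| json orderId="order.id" # Extract only the nested field
| orderId="12345"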
Working with Metrics and Aggregations
6. Aggregate After Filtering
When using functions like count_over_time or rate, filter your data first:
✅ Efficient - filter first, then count:
count_over_time({app="nginx"} |= "error" [5m])
❌ Inefficient:
count_over_time({app="nginx"}[5m]) # Counts everything, even non-errors
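Labels you extract after filtering can also drive grouping. A sketch that counts the filtered errors per status code (assuming the logs are JSON with a status field):
sum by (status) (count_over_time({app="nginx"} |= "error" | json status [5m]))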
7. Use Small Time Windows
For aggregation functions, use the smallest window that gives you the resolution you need:
✅ Good:
sum(rate({app="payment"} |= "error" [5m]))
❌ Resource-intensive:
sum(rate({app="payment"} |= "error" [1h]))
Advanced Tips
8. Avoid High-Cardinality Labels
Don't use labels with millions of unique values (like trace_id, user_id, or session_id) as stream labels. Keep them in structured metadata instead.
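Kept as structured metadata, those IDs are still easy to look up, because you can filter on them after the stream selector. A sketch, assuming trace IDs are shipped as the structured metadata field trace_id (the app name and ID value are placeholders):
{app="checkout", namespace="prod"} | trace_id="4bf92f3577b34da6"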
9. Structure Complex Queries in Stages
For complex queries, think in three stages:
- Narrow down: Use stream labels and simple filters
- Parse: Extract only needed fields from the filtered logs
- Aggregate: Summarize the results
Example:
{app="payment", namespace="prod"} # 1. Narrow with labels
|= "timeout" # 1. Further narrow with string filter
| json status # 2. Parse only status field
| status="500" # 2. Filter parsed data
| count_over_time[5m] # 3. Aggregate
10. Watch Your Stream Count
Each unique combination of labels creates a "stream". If your query matches too many streams (50,000+), it will be slow regardless of other optimizations.
Check how many streams a selector matches with:
count(count_over_time({app="myapp"}[1m]))
Add a by (...) clause with a specific label if you want the count broken down per label value.
Quick Reference: Do's and Don'ts
| Do ✅ | Don't ❌ |
|---|---|
| Start with label selectors | Use {} without any labels |
| Filter before parsing | Parse before filtering |
| Use \|= for exact matches | Use \|~ regex unnecessarily |
| Test on small time ranges | Start with 30-day queries |
| Aggregate after filtering | Aggregate raw data |
| Keep high-cardinality data in structured metadata | Use high-cardinality stream labels |
Additional Performance Considerations
11. Avoid Querying During Peak Hours
If you're doing deep scans (e.g., across days or multiple services), try off-peak hours to avoid throttling and reduce impact on other users.
12. Query Cost Awareness
Make sure developers understand that log queries can be expensive. Encourage:
- Dashboards: Use metrics for high-frequency monitoring, not logs
- Logs: Reserve for debugging or audit trails
- Explore vs Dashboards: Use Explore for ad hoc queries, Dashboards for focused queries
Warning: Dashboard queries auto-refresh frequently! Avoid embedding expensive queries into dashboards.
Need More Help?
- For getting started with Loki/Grafana, see the Getting Started Guide
- Contact the Platon team (#platon) if you need assistance optimizing specific queries