This guide covers how to get usage events into Metronome, what data to send, and best practices for getting the most out of the platform. After reading this, you'll be well on your way to integrating with Metronome, but don't forget that we're here to help. Once you have an idea of what you want to do, talk it over with our team so we can give advice and technical assistance.
If you haven't already, we recommend first reading Understanding Metronome so you're familiar with the Metronome core concepts.
Sending usage events to Metronome
There are two ways to send usage events to Metronome. You can send events via the Metronome API or connect Metronome to Segment. In either case, a usage event is a JSON object with the following fields:
{
  "transaction_id": string, // (required) unique identifier for this event
  "customer_id": string,    // (required) which customer the event applies to
  "timestamp": string,      // (required) when the event happened
  "event_type": string,     // (required) the kind of event, e.g. page_view or sent_email
  "properties": object      // (optional) key/value pairs with details of the event
}
transaction_id is used by Metronome to ignore duplicate events. Once a usage event has been accepted with a given transaction ID, subsequent events within the next 34 days with the same ID will be treated as duplicates and ignored.
The timestamp needs to be an RFC 3339 string with a 4-digit year, such as 2021-01-23T01:23:45Z. When querying usage data or producing an invoice, this field will be used to select only events that happened in a certain time range. Timestamps more than 24 hours in the future will be rejected by the API.
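One way to produce a timestamp in this format, and to catch far-future timestamps before they reach the API, is sketched below in Python. The helper names format_timestamp and is_acceptable are illustrative and not part of any Metronome library; the 24-hour limit is taken from the paragraph above.

```python
from datetime import datetime, timedelta, timezone

# From the text above: timestamps more than 24 hours ahead are rejected.
MAX_FUTURE_SKEW = timedelta(hours=24)


def format_timestamp(moment: datetime) -> str:
    """Render an aware datetime as an RFC 3339 UTC string, e.g. 2021-01-23T01:23:45Z."""
    if moment.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def is_acceptable(moment: datetime, now: datetime) -> bool:
    """Local pre-check mirroring the 24-hour future limit (hypothetical helper)."""
    return moment - now <= MAX_FUTURE_SKEW


example = format_timestamp(datetime(2021, 1, 23, 1, 23, 45, tzinfo=timezone.utc))
```

Requiring timezone-aware datetimes avoids silently treating local time as UTC, which would shift events across billing period boundaries.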
customer_id specifies which of your customers is responsible for any billing associated with the event. There are two ways to identify a Metronome customer in usage events. You can use the Metronome customer ID, or you can use an ingest alias. Each customer in Metronome can have any number of ingest aliases, and usage events with a customer_id matching any of those aliases will count towards that customer's usage. Ingest aliases can be useful in the case where you want to send events using an identifier from your system, e.g. an email address or account number.
event_type works along with the properties map to describe the details of the event. For example, a content delivery network (CDN) might generate events of the type http_request with properties like bytes_sent to support billing based on data transfer. They might also generate a different type of event, cache_invalidation, with its own properties.
All keys and values in the properties map should be represented as strings, even though the values will often be numeric. This is to prevent the loss of precision that often occurs in systems that use floating point numbers. Internally, Metronome uses arbitrary precision decimals to provide exact results of computation.
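As a rough sketch of how the fields fit together, the Python snippet below assembles an event and converts every property to a string. The build_event helper, the customer identifier, and the sample value are hypothetical. A random UUID is used as the transaction ID on the assumption that the event is stored before sending, so that any retry reuses the same ID.

```python
import uuid
from decimal import Decimal


def build_event(customer_id, event_type, timestamp, properties):
    """Assemble a usage event dict; every property key and value becomes a string."""
    return {
        "transaction_id": str(uuid.uuid4()),  # generate once, reuse on retries
        "customer_id": customer_id,
        "timestamp": timestamp,
        "event_type": event_type,
        "properties": {str(k): str(v) for k, v in properties.items()},
    }


event = build_event(
    "customer-123",  # hypothetical Metronome customer ID or ingest alias
    "http_request",
    "2021-01-23T01:23:45Z",
    {"bytes_sent": Decimal("1048576.25")},  # hypothetical measurement
)
```

Keeping measurements as Decimal or int until the final str() call avoids introducing floating point rounding on your side before the value reaches Metronome.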
Designing good usage events
Metronome is only as good as the data you provide, so it's worth spending some time to design your usage events well. Fortunately, there are three strategies to help get things right: work backwards from what you need, work forwards from what you have, and maximize your flexibility.
The following sections will work through a hypothetical scenario where you're a developer at a content delivery network (CDN) company. You've been tasked with integrating your system with Metronome to support usage-based billing.
Work backwards from what you need
It's important to start with your existing requirements. In the case of your hypothetical CDN, your team has decided to charge customers based on their monthly data usage, though the exact pricing details are still unknown. This is not a problem because pricing can be applied and adjusted later as long as you have the metrics you need in place.
Your customer support team also wants to take advantage of Metronome as a real-time data platform to be able to notify customers when there's an unusual spike in traffic for their sites.
For both invoicing and notification, you need to measure data transfer, so the bare minimum usage event might look like this:
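As a sketch, here is such an event written as a Python dictionary that serializes to the JSON shape described earlier. Only the transfer event type and the bytes property come from this guide; the identifiers and values are hypothetical.

```python
import json

minimal_event = {
    "transaction_id": "example-transfer-0001",  # hypothetical unique ID
    "customer_id": "customer-123",              # hypothetical customer ID or alias
    "timestamp": "2021-01-23T01:23:45Z",
    "event_type": "transfer",
    "properties": {"bytes": "1048576"},         # numeric value sent as a string
}

payload = json.dumps(minimal_event)
```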
This supports a billable metric like "sum of bytes for all events of type transfer (for a given customer, for a given billing period)." This is a good start, but it leaves some open questions:
- When should these events be sent? Your system could track total data transfer internally and send a daily per-customer summary to Metronome, or you could send Metronome an event every time a web page is served. Both would give you the same invoicing ability at the end of each month, but are there advantages to choosing one or the other?
- Should other data be included? There's a lot more information that could potentially go into these usage events. For example, you could include which data center served the page, what domain was being hosted, the type of file, or even what URL was accessed. None of this is immediately necessary for invoicing, but it could be useful in other ways.
To answer these questions, you need to consider what data you have available and how that data might help in the future.
Work forwards from what you have
The timing and content of usage events are often heavily influenced by what is available in your existing system. At your hypothetical CDN company, imagine that you perform global log aggregation with something like Apache Flume. In this case, there's a central data store with detailed access logs available, so it's probably easiest to send those log messages to Metronome as they arrive, in the form of individual usage events.
But now imagine that you don't have such global aggregation. Instead, each data center keeps its own independent log and sends hourly summaries back to the central data store, broken down by domain. You'd like to have each data center send usage events directly to Metronome, but unfortunately the code there doesn't have access to the customer database, so it can't determine what customer_id to fill in for each event. In this case, the hourly summaries are probably the best option. From the central location, it's easy to look up the owner of each domain and provide the appropriate customer_id.
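The central lookup could be sketched as follows. The summary row layout, the domain_owners mapping, and the function name are assumptions made for illustration. The transaction ID is built from the data center, domain, and hour so that resending the same summary produces the same ID and is deduplicated.

```python
def summaries_to_events(summaries, domain_owners):
    """Turn per-domain hourly summaries into usage events.

    summaries: iterable of dicts with data_center, domain, hour_start, bytes
    domain_owners: dict mapping a domain to the owning customer_id
    (both shapes are hypothetical).
    """
    events = []
    for row in summaries:
        events.append({
            # Deterministic ID: one event per data center, domain, and hour.
            "transaction_id": f'{row["data_center"]}_{row["domain"]}_{row["hour_start"]}',
            "customer_id": domain_owners[row["domain"]],  # KeyError if owner unknown
            "timestamp": row["hour_start"],
            "event_type": "transfer",
            "properties": {"bytes": str(row["bytes"])},
        })
    return events
```

Letting an unknown domain raise an error, rather than skipping it, keeps unattributed usage from being dropped silently.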
Before deciding to send the hourly summaries, you check back with the customer support team about those traffic spike notifications they wanted. They assure you that the hourly cadence is fast enough for the notifications they want to send.
Now that you've settled on how often to send usage events, you still need to decide exactly what information to include.
Maximize your flexibility
Requirements and circumstances change. Instead of trying to predict all your future business needs, aim to maximize flexibility so changes are easy.
In Metronome, flexibility is maximized when you send as much data as possible. Metronome's stream pipeline can handle high event throughput, and irrelevant data is discarded during processing. This means there's no downside to sending information that isn't going to be used right away.
There is, however, a big upside to sending extra information. Suppose that your hypothetical CDN starts getting feedback from customers that they don't understand their bills. Many customers are responsible for more than one domain and would like to be able to see how much each domain is contributing to their total usage. Your executives ask you to fix this.
If your usage event didn't include the domain, you'd need to go back into your code and add it. But you chose to send as much data as possible, so it's already there:
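A sketch of what such an event might contain, written as a Python dictionary. The domain and data_center property names follow the surrounding text; all values are hypothetical.

```python
detailed_event = {
    "transaction_id": "fra-2_example.com_2021-01-23T01:00:00Z",  # hypothetical
    "customer_id": "customer-123",                               # hypothetical
    "timestamp": "2021-01-23T01:00:00Z",
    "event_type": "transfer",
    "properties": {
        "bytes": "1048576",
        "domain": "example.com",   # not needed for invoicing today
        "data_center": "fra-2",    # not needed for invoicing today
    },
}
```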
All you need to do is query the Metronome API for usage data grouped by domain! Your customers are happy with the new breakdown on their invoices.
As time passes and your customer base grows, the finance team discovers a worrisome problem. Your company has been billing customers based on their total data transfer, but your bandwidth costs are different in different parts of the world. In some cases, you're actually losing money by undercharging for data transfer in certain regions.
As before, if your usage events didn't include information about where the data transfer occurred, you'd have to go back into your code and add it. But because you already decided to send as much data as possible, there's a data_center field that will work. Billable metrics in Metronome can filter usage events in a variety of ways, so you are able to use a mapping of data center names to regions to define a new billable metric for each region. Going forward, you can bill based on those new metrics, where you can set individual prices for each region.
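To make the idea concrete, the snippet below computes locally what a set of per-region metrics would each sum. It is only an illustration of the mapping, not how billable metrics are defined in Metronome; the data center names and regions are invented.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical mapping of data center names to billing regions.
REGION_BY_DATA_CENTER = {
    "iad-1": "north_america",
    "fra-2": "europe",
    "ams-1": "europe",
}


def bytes_by_region(events):
    """Sum the bytes property of transfer events, grouped by region."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for event in events:
        if event["event_type"] != "transfer":
            continue
        props = event["properties"]
        region = REGION_BY_DATA_CENTER[props["data_center"]]
        totals[region] += Decimal(props["bytes"])
    return dict(totals)
```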
Recall that Metronome operates on streams, so changes you make will only affect future data collection and aggregation. This means that a new billable metric cannot be applied to historic data.
Tips and best practices
Queue and retry
If usage events are lost on their way to Metronome, this directly translates into lost revenue. If you're sending events via the API, you need to be resilient to failures such as network issues or process crashes. A good way to gain this resilience is to put your usage events on a reliable queue such as Amazon SQS or RabbitMQ and have a process pull from that queue and push events to Metronome.
If your call to the Metronome /ingest endpoint fails with a network error or a 5xx HTTP status code, some of your events may have been ingested, but others may not. You should always retry a failed call to /ingest until you receive a 200 status code. The unique transaction_id in each event prevents duplicate processing, so retries are always safe.
If a call to /ingest fails with a 4xx HTTP status code, this indicates an issue with the payload. Do not automatically retry such a call. Instead, put it aside in a dead letter queue and trigger an alarm so you can investigate the failure and resolve the issue.
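The retry rules above can be sketched as a small delivery function. The send callable is a hypothetical stand-in for your HTTP client: it is assumed to return the HTTP status code and raise ConnectionError on network failure. The backoff parameters and the list used as a dead letter queue are also illustrative.

```python
import time


def deliver(batch, send, dead_letters, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Push one batch via `send`, retrying transient failures.

    Returns True once a 200 is received and False if the batch was
    dead-lettered. Raises RuntimeError when attempts run out, so the
    caller can leave the batch on its queue for redelivery.
    """
    for attempt in range(max_attempts):
        try:
            status = send(batch)
        except ConnectionError:
            status = None  # network error: handle like a 5xx
        if status == 200:
            return True
        if status is not None and 400 <= status < 500:
            dead_letters.append(batch)  # payload problem: do not retry
            return False
        sleep(base_delay * 2 ** attempt)  # exponential backoff before retrying
    raise RuntimeError("ingest still failing; leave the batch on the queue")
```

Raising after the last attempt, instead of dropping the batch, means the message is not acknowledged and the queue can redeliver it later. Because each event carries its own transaction_id, resending the whole batch is safe.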
Message queue logging
When first integrating with Metronome, it's helpful to enable logging in your message queue. This lets you audit exactly what usage events are being sent to Metronome. We recommend also enabling logging any time you're making a change to your usage events.
Trial ingestion resilience
To test your system's response to elevated error rates from Metronome's API, Metronome can set up an automatic failure rate of your choice (we suggest 20%). Reach out to your Metronome representative and specify the failure rate as a percentage, whether it should affect your sandbox or production instance, and when you'd like to enable and disable the test.
Aggregate over a single property
A billable metric can only aggregate over a single property. For example, if you're an email sending service, you might have a usage event that looks like this:
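For illustration, such an event might be written as the Python dictionary below. The event type and the property names size_bytes and recipient_count are hypothetical, as are the values.

```python
email_event = {
    "transaction_id": "email-8f3a2c",  # hypothetical unique ID
    "customer_id": "customer-123",     # hypothetical customer ID or alias
    "timestamp": "2021-01-23T01:23:45Z",
    "event_type": "sent_email",
    "properties": {
        "size_bytes": "2048",          # size of one copy of the email
        "recipient_count": "3",        # number of recipients
    },
}
```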
This event would support charging customers based on how many emails they sent or the maximum size of an email, but it would not support charging based on the total amount of data transfer, defined as the size of the email multiplied by the number of recipients. To do this, you would need to introduce a new property with the result of that multiplication:
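A sketch of computing that extra property before sending, in Python. The property names size_bytes, recipient_count, and transfer_bytes are hypothetical; the multiplication uses Decimal so that the result stays exact when converted back to a string.

```python
from decimal import Decimal


def add_transfer_bytes(event):
    """Return a copy of the event with a precomputed transfer_bytes property."""
    props = dict(event["properties"])
    total = Decimal(props["size_bytes"]) * Decimal(props["recipient_count"])
    props["transfer_bytes"] = str(total)  # the single property a metric can sum
    return {**event, "properties": props}


enriched = add_transfer_bytes({
    "transaction_id": "email-8f3a2c",  # hypothetical values throughout
    "customer_id": "customer-123",
    "timestamp": "2021-01-23T01:23:45Z",
    "event_type": "sent_email",
    "properties": {"size_bytes": "2048", "recipient_count": "3"},
})
```

A billable metric can then sum transfer_bytes directly, since the multiplication has already happened on your side.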
Heartbeat event idempotence
Usage events typically fall into one of two categories: an event that occurs when a user takes some action or a periodic "heartbeat" that measures the current state. The latter is particularly common in infrastructure services. For example, a service selling computation might send a per-node heartbeat to Metronome each minute describing the CPU and disk utilization on that node. These events could be aggregated into the metrics "CPU minutes" and "gigabyte minutes."
It's important for heartbeat events to ensure that usage is only counted once. This can be accomplished by choosing a deterministic transaction_id such that duplicate events will have the same ID. Metronome guarantees that only one event with a given transaction_id will be processed.
In the example of a per-node per-minute heartbeat, you might structure a transaction ID as follows:
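One possible construction, sketched in Python. The separator, the node ID format, and the function name heartbeat_transaction_id are assumptions; unix_now() is defined here to match the description that follows.

```python
import time
from typing import Optional


def unix_now() -> int:
    """Number of whole seconds since the Unix epoch."""
    return int(time.time())


def heartbeat_transaction_id(node_id: str, now: Optional[int] = None) -> str:
    """Build an ID that is identical for all heartbeats from one node in one minute."""
    if now is None:
        now = unix_now()
    minute = now // 60  # truncate to minute granularity
    return f"{node_id}_{minute}"
```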
unix_now() is a function that returns the number of seconds since the Unix epoch. By including both the node ID and a minute-granularity timestamp in the transaction ID, it's guaranteed that duplicate events from the same node in the same minute will be ignored.
Using this type of transaction_id means you also don't have to worry about sending events too often. In fact, we recommend sending two or more heartbeats per measurement period. Duplicates will be safely ignored, and this way you decrease the risk of missing a measurement period due to timer imprecision or a temporary delay.
Changes to usage events are new integrations
Usage events are designed to target very specific billable metrics, so if the data shape changes, that could prevent downstream metrics from being properly recorded. It's best to work with the Metronome team any time you are adjusting the shape of your usage events. We can help validate and test the change with you to avoid any disruption.
Ensure Metronome does not block critical paths
Metronome has been expressly designed so that you can use it safely even in the most critical parts of your application. In accordance with availability best practices, we suggest verifying that Metronome is not a blocker in your customer creation path. Because Metronome can use ingest aliases to match events sent at any time before or after customer creation, we recommend creating the customer in your system first, then creating the matching customer record in Metronome asynchronously.
At this point, you probably have some ideas about what data you want to bring into Metronome. This is a good time to meet with the experts on the Metronome team so we can provide advice based on our experience with other clients.