Kinesis Shard Calculator

View project on GitHub

The Kinesis Shard Calculator recommends the optimal number of shards for a Kinesis data stream, and shows the corresponding cost estimate. It also provides recommendations for improving the efficiency and lowering the cost of the data stream.

Explanations for the various input attributes and results are provided within the Kinesis Shard Calculator itself. They should be pretty straightforward. If not, please do provide feedback on our GitHub project!

The background reasoning for the calculation of shards was the topic of a breakout session at AWS re:Invent 2018. You can see the video of the session as well as the presentation slides.

The diagram below depicts the bandwidth of an actual Kinesis stream over 4 days. It illustrates some of the main concepts: Bandwidth

Stream Definition

 
 
 
Number of Shards Needed
Producer
Average message size
bytes
Invalid message size: It must be between {{ Producer.messageSizeMin }} and {{ Producer.messageSizeMax }}
The average size of the messages, measured in bytes. This does not include record aggregation; it relates to what Kinesis calls "user records".
It must be at least 1 byte and at most 1MB (a Kinesis limitation).
 
Using Kinesis Record Aggregation
Indicates whether or not the producer is using Kinesis record aggregation. This feature, available in the KPL (Kinesis Producer Library), groups user records into fewer, larger aggregated records.
Note that using Kinesis record aggregation increases throughput and reduces cost, at the expense of latency.
 
Average throughput
records/second
Invalid average throughput: It must be at least 1
The number of records per second written to the stream by the producer on average throughout the day.
Average bandwidth
{{ prettifyBytes(Producer.averageInBandwidth()) }} / second
The average bandwidth produced by the producer throughout the day. This is directly correlated with the average throughput and message size.
Peak throughput
records/second
Invalid peak throughput: It must be at least 1 and greater than the average throughput
The maximum number of records per second written to the stream by the producer. It is part of the regular traffic, as it varies throughout the day (e.g. daily TV viewing pattern). Surge traffic is not included here and needs to be tackled independently.
It must be higher than the average throughput.
This impacts the shard count if not using Kinesis Record Aggregation, because of the Kinesis limit of 1,000 incoming records per second per shard.
{{ Producer.shardsFromPeakInThroughput() }}
Peak bandwidth
{{ prettifyBytes(Producer.peakInBandwidth()) }} / second
The peak bandwidth produced by the producer. This is directly correlated with the message size and the peak throughput (not counting surge).
This impacts the shard count because of the Kinesis incoming bandwidth limit of 1MB/s per shard.
{{ Producer.shardsFromPeakInBandwidth() }}
Stream has surges
Indicates whether or not the stream has traffic surges, that is, significant throughput increases (above the peak throughput) over a short period of time. This may be the case, for instance, if an event (e.g. an alert or notification) reaches a large number of clients, each of which then produces records to the stream.
Surge throughput
records/second
Invalid surge throughput: It must be at least 1 and greater than the peak throughput
This is the throughput reached during a traffic surge. It needs to be larger than the peak throughput.
{{ Producer.shardsFromSurgeInThroughput() }}
Surge bandwidth
{{ prettifyBytes(Producer.surgeInBandwidth()) }} / second
The bandwidth produced by the producer during surges. This is directly correlated with the message size and the surge throughput.
This impacts the shard count because of the Kinesis incoming bandwidth limit of 1MB/s per shard.
{{ Producer.shardsFromSurgeInBandwidth() }}
Surge duration
seconds
Invalid surge duration: It must be at least 1
This is the longest expected duration of a traffic surge.
Consumers
Number of consumers
{{ nbConsumers() }}
This is the total number of consumers, including enhanced fan-out consumers. You can add additional consumers by clicking the button above.
This impacts the shard count because the outgoing bandwidth limit of 2MB/s per shard is shared amongst the "standard" consumers (i.e. non-fan-out consumers).
{{ shardsFromNbConsumers() }}
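The consumer-driven constraint can be sketched like this (again an illustrative model with made-up function names, not the calculator's code). Each standard consumer reads the full stream, so the combined read rate must fit within the 2 MB/s/shard outgoing limit:

```python
import math

def shards_from_consumers(n_standard_consumers: int,
                          avg_in_bytes_per_s: float) -> int:
    # Every standard (non-fan-out) consumer reads the whole stream, so the
    # combined outgoing rate is the incoming rate times the consumer count.
    total_out_bytes_per_s = n_standard_consumers * avg_in_bytes_per_s
    # Each shard serves at most 2 MB of outgoing data per second.
    return math.ceil(total_out_bytes_per_s / (2 * 1024 * 1024))

# Example: 3 standard consumers, each reading a 1.5 MB/s stream.
print(shards_from_consumers(3, 1.5 * 1024 * 1024))  # 3 shards
```

Enhanced fan-out consumers are excluded from this count because they each get their own dedicated 2 MB/s/shard pipe.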
Consumer {{ id }}
 
 
 
Enhanced Fan-Out Consumer
Indicates whether or not the consumer is using the Kinesis Enhanced Fan-out feature. It essentially isolates this consumer from the other consumers, so that each fan-out consumer has a dedicated 2MB/s/shard of bandwidth.
 
Maximum consumption speed
records/second
Invalid consumption speed: It must be at least 1
This is the maximum number of records that a single consumer instance (i.e. process) can handle. This assumes that a single shard is consumed by exactly one consumer process.
It impacts the shard count because it can limit the actual throughput on a shard.
{{ c.shardsFromMaxOutThroughput() }}
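This constraint can be sketched as follows (an illustrative model, not the calculator's code): with one consumer process per shard, spreading the peak throughput over more shards keeps each process within its processing capacity.

```python
import math

def shards_from_consumption_speed(peak_records_per_s: float,
                                  max_consume_per_instance: float) -> int:
    # With exactly one consumer process per shard, each shard can be drained
    # no faster than a single instance can process records.
    return math.ceil(peak_records_per_s / max_consume_per_instance)

# Example: a 5,000 records/s peak, with instances handling 800 records/s each.
print(shards_from_consumption_speed(5000, 800))  # 7 shards
```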
Maximum acceptable latency
seconds
Invalid maximum acceptable latency: It must be at least 1
This is the maximum duration that is acceptable for a consumer to recover from a surge.
It impacts the shard count if either the consumption speed of the consumer is particularly low, or the average message size and the overall number of consumers (with which this consumer needs to share the bandwidth) are high.
{{ c.shardsFromMaxAcceptableLatency() }}
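One simplified way to reason about this constraint (a hypothetical model for intuition only; it ignores traffic that keeps arriving during recovery and is not the calculator's exact formula): a surge builds up a backlog, and each shard's consumer must be able to drain its share of that backlog within the acceptable latency window.

```python
import math

def shards_from_latency(surge_records_per_s: float, surge_duration_s: float,
                        max_consume_per_instance: float,
                        max_latency_s: float) -> int:
    # Backlog built up during the surge (simplification: all surge records
    # are treated as backlog, and no new traffic arrives during recovery).
    backlog_records = surge_records_per_s * surge_duration_s
    # Records one shard's consumer can drain within the latency window.
    drain_per_shard = max_consume_per_instance * max_latency_s
    return math.ceil(backlog_records / drain_per_shard)

# Example: a 30 s surge at 20,000 records/s, consumers handling 800 records/s,
# with at most 60 s of acceptable recovery latency.
print(shards_from_latency(20000, 30, 800, 60))  # 13 shards
```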

Stream Analysis

Number of Shards Needed
 
This is the total number of shards needed for the Kinesis data stream, based on the information provided.

Warning: This exceeds the maximum number of shards per stream in some AWS regions. You may need to request this soft limit to be increased.

Warning: This exceeds the maximum number of shards per stream in all AWS regions. You will need to request this soft limit to be increased.

{{ totalShards() }}
Bottleneck Factors
 
The factors that drive the number of shards are:
  • The producer peak bandwidth
  • The producer peak throughput
  • The producer surge bandwidth
  • The producer surge throughput
  • The number of consumers
  • Consumer {{ id }} consumption speed
  • Consumer {{ id }} acceptable latency
 
Average incoming bandwidth utilization
 
This provides the percentage of the available incoming bandwidth actually used under "normal" conditions.
{{ prettify(100 * Producer.averageInBandwidth() / (totalShards() * Producer.awsMaxIncomingBw)) }}%
Average outgoing bandwidth utilization
 
This provides the percentage of the available outgoing bandwidth actually used under "normal" conditions.
{{ prettify(100 * nbStandardConsumers() * Producer.averageInBandwidth() / (totalShards() * Producer.awsMaxOutgoingBw)) }}%
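The two utilization figures follow directly from the per-shard limits (1 MB/s in, 2 MB/s out). A minimal sketch, mirroring the template expressions above with made-up function names:

```python
def incoming_utilization_pct(avg_in_bytes_per_s: float, total_shards: int) -> float:
    # Each shard accepts up to 1 MB/s of incoming data.
    return 100 * avg_in_bytes_per_s / (total_shards * 1024 * 1024)

def outgoing_utilization_pct(n_standard_consumers: int,
                             avg_in_bytes_per_s: float,
                             total_shards: int) -> float:
    # Each shard serves up to 2 MB/s of outgoing data, shared by the standard
    # consumers (each of which reads the full stream).
    total_out = n_standard_consumers * avg_in_bytes_per_s
    return 100 * total_out / (total_shards * 2 * 1024 * 1024)

# Example: 4 shards, a 2 MB/s average incoming rate, 3 standard consumers.
print(incoming_utilization_pct(2 * 1024 * 1024, 4))     # 50.0 %
print(outgoing_utilization_pct(3, 2 * 1024 * 1024, 4))  # 75.0 %
```

Low utilization percentages suggest the stream is over-provisioned relative to its normal load, with shards reserved for peaks and surges.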
Record Aggregation
 
Using Kinesis record aggregation would lower the required number of shards from {{ totalShardsWithoutAggregation() }} down to {{ totalShardsWithAggregation() }}.
This would significantly reduce the cost of your Kinesis data stream.
Using Kinesis record aggregation would not lower the required number of shards, but it would reduce the cost of your Kinesis data stream.
Highly Recommended
Recommended

Cost Analysis

Important*: The costs provided in this section are estimates, and may differ from the actual amount charged by AWS. To compute these approximate prices, we make a number of assumptions that may or may not be valid in your specific context.

Retention Period
hours
Invalid retention period: It must be between {{ retentionPeriodMin }} and {{ retentionPeriodMax }}
The cost of a Kinesis stream depends on the Kinesis stream data retention. By default, it is set to 24 hours, but you can increase it up to 7 days.
 
Shard Hour Price
$/hour/shard
Invalid Shard Hour Price: It must be between {{ AWSPricing.shardHourMin }} and {{ AWSPricing.shardHourMax }}
See the current AWS list price for Kinesis. It depends on the AWS region in which the Kinesis stream is defined.
 
PUT Payload Price
$/1,000,000 PUT Units
Invalid PUT Payload Price: It must be between {{ AWSPricing.putUnitsMin }} and {{ AWSPricing.putUnitsMax }}
See the current AWS list price for Kinesis. It depends on the AWS region in which the Kinesis stream is defined.
 
Extended Data Retention Price
$/hour/shard
Invalid Extended Data Retention Price: It must be between {{ AWSPricing.shardExHourMin }} and {{ AWSPricing.shardExHourMax }}
See the current AWS list price for Kinesis. It depends on the AWS region in which the Kinesis stream is defined.
 
Enhanced Fan-Out Shard Hours Price
$/hour/shard
Invalid Enhanced Fan-Out Shard Hours Price: It must be between {{ AWSPricing.fanoutShardMin }} and {{ AWSPricing.fanoutShardMax }}
See the current AWS list price for Kinesis. It depends on the AWS region in which the Kinesis stream is defined.
 
Enhanced Fan-Out Data Retrievals Price
$/GB
Invalid Enhanced Fan-Out Data Retrievals Price: It must be between {{ AWSPricing.fanoutDataMin }} and {{ AWSPricing.fanoutDataMax }}
See the current AWS list price for Kinesis. It depends on the AWS region in which the Kinesis stream is defined.
 
AWS Discount
%
Invalid AWS discount: It must be between {{ AWSPricing.discountMin }} and {{ AWSPricing.discountMax }}
The discount you might have negotiated with AWS.
 
Kinesis data stream cost
 
This is approximately* the total cost of the Kinesis data stream, which breaks down into:
  • Shard hour cost: ${{ prettify( shardPrice() ) }} per day
  • PUT payload cost (approximately*): ${{ prettify( putPrice() ) }} per day
${{ prettify( shardPrice() + putPrice() )}} per day
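The two cost components above can be sketched as follows. This is a hedged approximation with illustrative list prices (check the current AWS pricing page for your region); the key detail is that PUT payload units are billed in 25 KB increments per record:

```python
import math

def daily_stream_cost(shards: int, avg_records_per_s: float,
                      avg_record_bytes: float,
                      shard_hour_price: float,
                      put_price_per_million_units: float) -> float:
    # Shard-hour cost: every shard is billed for each of the day's 24 hours.
    shard_cost = shards * 24 * shard_hour_price
    # PUT payload cost: each record is billed in 25 KB units.
    units_per_record = math.ceil(avg_record_bytes / (25 * 1024))
    units_per_day = avg_records_per_s * 86400 * units_per_record
    put_cost = units_per_day / 1_000_000 * put_price_per_million_units
    return shard_cost + put_cost

# Example: 5 shards, 1,000 records/s of 500-byte records, with illustrative
# list prices of $0.015/shard-hour and $0.014 per million PUT units.
print(round(daily_stream_cost(5, 1000, 500, 0.015, 0.014), 2))
```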
Record Aggregation Saving
 
Using Kinesis record aggregation {{Producer.recordAggregation ? "provides" : "would provide"}} an approximate saving of ${{ prettify(aggregationSavings()) }} per day ({{prettify(100*aggregationSavings()/(shardPriceWithoutAggregation() + putPriceWithoutAggregation()))}}%).
You could save ${{ prettify(aggregationSavings()) }} per day
${{ prettify(aggregationSavings()) }} per day
Kinesis enhanced fan-out consumer cost
 
Each consumer that uses the Kinesis Enhanced Fan-out has an additional cost, which breaks down into:
  • Enhanced fan-out shard hours cost: ${{ prettify( fanoutShardPrice() ) }} per day
  • Enhanced fan-out data retrievals cost (approximately*): ${{ prettify( fanoutDataPrice() ) }} per day
${{ prettify( fanoutPrice() ) }} per day (each)
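The per-consumer fan-out cost can be sketched the same way (illustrative prices again; the data-retrievals component assumes the consumer reads the whole stream exactly once, per the assumptions below):

```python
def daily_fanout_cost_per_consumer(shards: int, avg_in_bytes_per_s: float,
                                   fanout_shard_hour_price: float,
                                   fanout_price_per_gb: float) -> float:
    # Consumer-shard-hours: a fan-out consumer is billed per shard, per hour.
    shard_hours_cost = shards * 24 * fanout_shard_hour_price
    # Data retrievals: the consumer reads the full stream once, billed per GB.
    gb_per_day = avg_in_bytes_per_s * 86400 / (1024 ** 3)
    return shard_hours_cost + gb_per_day * fanout_price_per_gb

# Example: 5 shards, a 1 MB/s stream, with illustrative list prices of
# $0.015 per consumer-shard-hour and $0.013 per GB retrieved.
print(round(daily_fanout_cost_per_consumer(5, 1024 * 1024, 0.015, 0.013), 2))
```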

* The prices provided here are approximations; the exact amount depends on your production/consumption pattern. To provide an estimated cost, we make the following assumptions:

  • A day is made of 24 hours, each of which is made of 3600 seconds.
  • Daily prices assume a full day of usage. In other words, they are not representative of days when you use the data stream only part of the time.
  • The data production throughput can vary, but it is continuous (i.e. no interruptions).
  • The data consumers each consume the whole data stream once and only once. In other words, they don't "replay" past data or skip portions of the stream.