The Kinesis Shard Calculator recommends the optimal number of shards for a Kinesis data stream and shows the corresponding cost estimate. It also provides recommendations for improving the efficiency and lowering the cost of the data stream.
Explanations for the various input attributes and results are provided within the Kinesis Shard Calculator itself. They should be pretty straightforward. If not, please do provide feedback on our GitHub project!
The diagram below depicts the bandwidth over 4 days of an actual Kinesis stream. It illustrates some of the main concepts:
Stream Definition
 
 
 
Number of Shards Needed
Producer
Average message size
bytes Invalid message size: It must be between {{ Producer.messageSizeMin }} and {{ Producer.messageSizeMax }}
The average size of the messages, measured in bytes. This does not include record aggregation,
but instead relates to what Kinesis calls "user records".
It must be at least 1 byte and at most 1MB (a Kinesis limitation).
 
Using Kinesis Record Aggregation
Indicates whether or not the producer is using Kinesis record aggregation. This feature, available in the Kinesis Producer Library (KPL),
groups user records into fewer, larger aggregated records.
Note that using Kinesis Record Aggregation increases throughput and reduces cost, at the expense of latency.
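As a rough illustration of why aggregation helps (hypothetical workload numbers, not the calculator's actual code):

```python
# Hypothetical workload: 500-byte user records at 4,500 records/second.
# KPL aggregation packs many user records into one larger Kinesis record
# (here we assume ~50 per aggregate, i.e. ~25 KB each), so Kinesis sees
# far fewer records per second and the 1,000 records/s/shard limit
# stops being the bottleneck; only the bandwidth limit then matters.
user_records_per_sec = 4500
records_per_aggregate = 50

aggregated_records_per_sec = user_records_per_sec / records_per_aggregate
print(aggregated_records_per_sec)  # → 90.0
```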
 
Average throughput
records/second Invalid average throughput: It must be at least 1
The number of records per second written to the stream by the producer on average throughout the day.
Average bandwidth
{{ prettifyBytes(Producer.averageInBandwidth()) }}
/ second
The average bandwidth produced by the producer throughout the day. This is directly correlated with the
average throughput and message size.
Peak throughput
records/second Invalid peak throughput: It must be at least 1 and greater than the average throughput
The maximum number of records per second written to the stream by the producer. It is part of the regular
traffic, as it varies throughout the day (e.g. daily TV viewing pattern). Surge traffic is not included here and needs to be tackled independently.
It must be higher than the average throughput.
This impacts the shard count if Kinesis Record Aggregation is not used, because of the Kinesis limit of 1,000 incoming records per second per shard.
{{ Producer.shardsFromPeakInThroughput() }}
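A minimal sketch of this constraint (function and constant names are hypothetical, not the calculator's actual code):

```python
import math

# Kinesis limit: 1,000 incoming records per second per shard.
RECORDS_PER_SEC_PER_SHARD = 1000

def shards_from_peak_throughput(peak_records_per_sec: float) -> int:
    """Shards required so the peak write rate stays within the per-shard record limit."""
    return math.ceil(peak_records_per_sec / RECORDS_PER_SEC_PER_SHARD)

print(shards_from_peak_throughput(4500))  # → 5
```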
Peak bandwidth
{{ prettifyBytes(Producer.peakInBandwidth()) }}
/ second
The peak bandwidth produced by the producer. This is directly correlated with the message size and the
peak throughput (not counting surge).
This impacts the shard count because of the Kinesis incoming bandwidth limit of 1 MB/s per shard.
{{ Producer.shardsFromPeakInBandwidth() }}
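The bandwidth constraint can be sketched the same way (hypothetical names and values; 1 MB is taken as 10^6 bytes for simplicity):

```python
import math

# Kinesis limit: 1 MB/s of incoming data per shard (approximated as 10^6 bytes).
IN_BYTES_PER_SEC_PER_SHARD = 1_000_000

def shards_from_peak_bandwidth(avg_msg_bytes: int, peak_records_per_sec: float) -> int:
    """Shards required so the peak incoming bandwidth stays within the per-shard limit."""
    peak_bytes_per_sec = avg_msg_bytes * peak_records_per_sec
    return math.ceil(peak_bytes_per_sec / IN_BYTES_PER_SEC_PER_SHARD)

print(shards_from_peak_bandwidth(500, 4500))  # 2.25 MB/s → 3 shards
```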
Stream has surges
Indicates whether or not the stream has traffic surges, i.e. significant throughput increases (higher
than the peak throughput) for a short period of time. This may be the case, for instance, if an event (e.g. an alert or notification) notifies
a large number of clients, each of which then produces records to the stream.
Surge throughput
records/second Invalid surge throughput: It must be at least 1 and greater than the peak throughput
This is the throughput reached during a traffic surge. It must be greater than the peak throughput.
{{ Producer.shardsFromSurgeInThroughput() }}
Surge bandwidth
{{ prettifyBytes(Producer.surgeInBandwidth()) }}
/ second
The bandwidth produced by the producer during surges. This is directly correlated with the message size and the
surge throughput.
This impacts the shard count because of the Kinesis incoming bandwidth limit of 1 MB/s per shard.
{{ Producer.shardsFromSurgeInBandwidth() }}
Surge duration
seconds Invalid surge duration: It must be at least 1
This is the longest expected duration of a traffic surge.
consumers
Number of consumers
{{ nbConsumers() }}
This is the total number of consumers, including enhanced fan-out consumers. You can add additional consumers by clicking the button above.
This impacts the shard count because the outgoing bandwidth limit of 2 MB/s per shard is shared amongst the "standard" consumers (i.e. non-fan-out consumers).
{{ shardsFromNbConsumers() }}
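This constraint can be sketched as follows (hypothetical names and values; it assumes every standard consumer reads the whole stream, as stated in the assumptions at the bottom of the page):

```python
import math

# Kinesis limit: 2 MB/s of outgoing data per shard, shared by standard consumers.
OUT_BYTES_PER_SEC_PER_SHARD = 2_000_000

def shards_from_consumers(n_standard_consumers: int, peak_in_bytes_per_sec: float) -> int:
    """Each standard (non-fan-out) consumer reads the full stream, so the
    required read bandwidth scales with the number of such consumers."""
    needed_out_bytes_per_sec = n_standard_consumers * peak_in_bytes_per_sec
    return math.ceil(needed_out_bytes_per_sec / OUT_BYTES_PER_SEC_PER_SHARD)

print(shards_from_consumers(3, 2_250_000))  # 6.75 MB/s → 4 shards
```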
Consumer {{ id }}
 
 
 
Enhanced Fan-Out Consumer
Indicates whether or not the consumer uses the Kinesis Enhanced Fan-Out
feature. It essentially isolates this consumer from the other consumers, so that each fan-out consumer gets a dedicated 2 MB/s per shard of bandwidth.
 
Maximum consumption speed
records/second Invalid consumption speed: It must be at least 1
This is the maximum number of records that a single consumer instance (i.e. process) can handle. This
assumes that a single shard is consumed by exactly one consumer process.
It impacts the shard count because it can limit the actual throughput on a shard.
{{ c.shardsFromMaxOutThroughput() }}
Maximum acceptable latency
seconds Invalid maximum acceptable latency: It must be at least 1
This is the maximum duration that is acceptable for a consumer to recover from a surge.
It impacts the shard count if either the consumption speed of the consumer is particularly low, or if the average message size and
the overall number of consumers (with which this consumer needs to share the bandwidth) are high.
{{ c.shardsFromMaxAcceptableLatency() }}
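One way to picture this constraint is as a backlog-drain problem. The toy model below is an assumption of ours (one consumer process per shard, constant rates), not the calculator's actual formula:

```python
def shards_from_max_latency(surge_rps: float, avg_rps: float, surge_secs: float,
                            consumer_rps: float, max_latency_secs: float):
    """Smallest shard count such that the backlog built up during a surge
    drains within max_latency_secs (toy model: one consumer process per shard)."""
    for shards in range(1, 10_000):
        capacity = shards * consumer_rps              # records/s the consumers can absorb
        backlog = max(0, surge_rps - capacity) * surge_secs
        if backlog == 0:
            return shards                             # consumers keep up even during the surge
        spare = capacity - avg_rps                    # drain rate once the surge ends
        if spare > 0 and backlog / spare <= max_latency_secs:
            return shards
    return None

# 9,000 rec/s surge for 60 s, 4,500 rec/s normally, 1,500 rec/s per consumer
# process, and a 120 s recovery budget:
print(shards_from_max_latency(9000, 4500, 60, 1500, 120))  # → 4
```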
Stream Analysis
Number of Shards Needed
 
This is the total number of shards needed for the Kinesis data stream, based on the information provided.
Warning: This exceeds the maximum number of shards per stream in some AWS regions. You may need to request an increase to this soft limit.
Warning: This exceeds the maximum number of shards per stream in all AWS regions. You will need to request an increase to this soft limit.
{{ totalShards() }}
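Conceptually, the total is just the maximum over the per-factor shard counts (hypothetical values below standing in for the calculator's per-factor results):

```python
# Each bottleneck factor yields its own minimum shard count; the stream
# needs enough shards to satisfy all of them at once, i.e. the maximum.
bottlenecks = {
    "producer peak throughput": 5,
    "producer peak bandwidth": 3,
    "number of consumers": 4,
}
total_shards = max(bottlenecks.values())
print(total_shards)  # → 5
```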
Bottleneck Factors
 
The factors that drive the number of shards are:
The producer peak bandwidth
The producer peak throughput
The producer surge bandwidth
The producer surge throughput
The number of consumers
Consumer {{ id }} consumption speed
Consumer {{ id }} acceptable latency
 
Average incoming bandwidth utilization
 
This provides the percentage of the available incoming bandwidth actually used under "normal" conditions.
Using Kinesis record aggregation would lower the required number of shards from {{ totalShardsWithoutAggregation() }} down to {{ totalShardsWithAggregation() }}.
This would significantly reduce the cost of your Kinesis data stream.
Using Kinesis record aggregation would not lower the required number of shards, but it would reduce the cost of your Kinesis data stream.
Highly Recommended
Recommended
Cost Analysis
Important*: The costs provided in this section are estimates and may differ from the actual amount charged by AWS.
To compute these approximate prices, we make a number of assumptions that may or may not be valid in your specific context.
Retention Period
hours Invalid retention period: It must be between {{ retentionPeriodMin }} and {{ retentionPeriodMax }}
The cost of a Kinesis stream depends on the Kinesis stream data retention. By default, it is set to 24 hours, but you can increase it up to 7 days.
 
Shard Hour Price
$/hour/shard Invalid Shard Hour Price: It must be between {{ AWSPricing.shardHourMin }} and {{ AWSPricing.shardHourMax }}
See the current AWS list price for Kinesis. It depends on the AWS region in which the Kinesis stream is defined.
 
PUT Payload Price
$/1,000,000 PUT Units Invalid PUT Payload Price: It must be between {{ AWSPricing.putUnitsMin }} and {{ AWSPricing.putUnitsMax }}
See the current AWS list price for Kinesis. It depends on the AWS region in which the Kinesis stream is defined.
 
Extended Data Retention Price
$/hour/shard Invalid Extended Data Retention Price: It must be between {{ AWSPricing.shardExHourMin }} and {{ AWSPricing.shardExHourMax }}
See the current AWS list price for Kinesis. It depends on the AWS region in which the Kinesis stream is defined.
 
Enhanced Fan-Out Shard Hours Price
$/hour/shard Invalid Enhanced Fan-Out Shard Hours Price: It must be between {{ AWSPricing.fanoutShardMin }} and {{ AWSPricing.fanoutShardMax }}
See the current AWS list price for Kinesis. It depends on the AWS region in which the Kinesis stream is defined.
 
Enhanced Fan-Out Data Retrievals Price
$/GB Invalid Enhanced Fan-Out Data Retrievals Price: It must be between {{ AWSPricing.fanoutDataMin }} and {{ AWSPricing.fanoutDataMax }}
See the current AWS list price for Kinesis. It depends on the AWS region in which the Kinesis stream is defined.
 
AWS Discount
% Invalid AWS discount: It must be between {{ AWSPricing.discountMin }} and {{ AWSPricing.discountMax }}
The discount you might have negotiated with AWS.
 
Kinesis data stream cost
 
This is approximately* the total cost of the Kinesis data stream, which breaks down into:
Shard hour cost: ${{ prettify( shardPrice() ) }} per day
PUT payload cost (approximately*): ${{ prettify( putPrice() ) }} per day
${{ prettify( shardPrice() + putPrice() )}} per day
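A rough sketch of how these two components could be computed (hypothetical names and prices; it assumes a PUT payload unit is a 25 KB chunk of each record, with 25 KB taken as 25,000 bytes):

```python
import math

SECONDS_PER_DAY = 24 * 3600

def daily_cost(shards: int, avg_rps: float, msg_bytes: int,
               shard_hour_price: float, put_price_per_million_units: float) -> float:
    """Daily cost sketch: shard hours plus PUT payload units
    (one unit per started 25 KB chunk of each record)."""
    shard_cost = shards * 24 * shard_hour_price
    units_per_record = math.ceil(msg_bytes / 25_000)
    put_units_per_day = avg_rps * SECONDS_PER_DAY * units_per_record
    put_cost = put_units_per_day / 1_000_000 * put_price_per_million_units
    return shard_cost + put_cost

# 5 shards, 4,500 rec/s of 500-byte records, hypothetical list prices:
print(round(daily_cost(5, 4500, 500, 0.015, 0.014), 2))  # → 7.24
```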
Record Aggregation Saving
 
Using Kinesis record aggregation {{Producer.recordAggregation ? "provides" : "would provide"}} an approximate saving of ${{ prettify(aggregationSavings()) }} per day
({{prettify(100*aggregationSavings()/(shardPriceWithoutAggregation() + putPriceWithoutAggregation()))}}%).
You could save ${{ prettify(aggregationSavings()) }} per day
${{ prettify(aggregationSavings()) }} per day
Kinesis enhanced fan-out consumer cost
 
Each consumer that uses the Kinesis Enhanced Fan-out has an additional cost, which breaks down into:
Enhanced fan-out shard hours cost: ${{ prettify( fanoutShardPrice() ) }} per day
Enhanced fan-out data retrievals cost (approximately*): ${{ prettify( fanoutDataPrice() ) }} per day
${{ prettify( fanoutPrice() ) }} per day (each)
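The per-consumer fan-out cost can be sketched similarly (hypothetical names and prices; it assumes the consumer reads the whole stream once, and 1 GB = 10^9 bytes):

```python
SECONDS_PER_DAY = 24 * 3600

def fanout_daily_cost(shards: int, avg_bytes_per_sec: float,
                      fanout_shard_hour_price: float, retrieval_price_per_gb: float) -> float:
    """Per fan-out consumer: consumer-shard hours plus data retrieved (GB)."""
    shard_hours_cost = shards * 24 * fanout_shard_hour_price
    gb_per_day = avg_bytes_per_sec * SECONDS_PER_DAY / 1e9
    return shard_hours_cost + gb_per_day * retrieval_price_per_gb

# 5 shards, 2.25 MB/s average bandwidth, hypothetical list prices:
print(round(fanout_daily_cost(5, 2_250_000, 0.015, 0.013), 2))  # → 4.33
```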
* The prices provided here are approximations; they depend on the exact production/consumption pattern.
To provide an estimated cost, we make the following assumptions:
A day is made of 24 hours, each of which is made of 3600 seconds.
Daily prices assume a full day of usage. In other words, the figures are not accurate for days when the data stream is only partially used.
The data production throughput can vary but it is continuous (i.e. no interruption).
The data consumers each consume the whole data stream exactly once. In other words, they don't "replay" past data or skip portions of the stream.