Sunday, February 25, 2018

Redshift Spectrum


Redshift Spectrum lets you run SQL queries against data in an Amazon S3 data lake as easily as you analyze data stored in Amazon Redshift. It does this without loading the data into Redshift and without resizing the Amazon Redshift cluster as data volumes grow.

Redshift Spectrum separates compute from storage to meet workload demands for data size, concurrency, and performance. It scales processing across thousands of nodes, so results come back quickly even with massive datasets and complex queries. You can query the open file formats you already use (such as Apache Avro, CSV, Grok, ORC, Apache Parquet, RCFile, RegexSerDe, SequenceFile, TextFile, and TSV) directly in Amazon S3, without any data movement.
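
As a rough illustration of querying S3 data in place, the sketch below builds the kind of SQL you would send to Redshift: an external schema backed by the data catalog, an external table pointing at Parquet files, and an ordinary SELECT against it. The schema, table, and column names, the bucket path, and the IAM role ARN are all placeholders made up for this example, not values from a real setup.

```python
from typing import List, Tuple


def external_schema_ddl(schema: str, catalog_db: str, iam_role_arn: str) -> str:
    """Build a CREATE EXTERNAL SCHEMA statement backed by the data catalog."""
    return (
        f"CREATE EXTERNAL SCHEMA {schema}\n"
        "FROM DATA CATALOG\n"
        f"DATABASE '{catalog_db}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )


def external_table_ddl(
    schema: str, table: str, columns: List[Tuple[str, str]], s3_location: str
) -> str:
    """Build a CREATE EXTERNAL TABLE statement over Parquet files in S3."""
    cols = ",\n".join(f"    {name} {sqltype}" for name, sqltype in columns)
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n{cols}\n)\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{s3_location}';"
    )


# Placeholder values (assumptions for illustration only).
schema_sql = external_schema_ddl(
    schema="spectrum",
    catalog_db="spectrumdb",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleSpectrumRole",
)
table_sql = external_table_ddl(
    schema="spectrum",
    table="sales",
    columns=[("sale_id", "INTEGER"), ("region", "VARCHAR(16)"), ("amount", "DECIMAL(8,2)")],
    s3_location="s3://example-bucket/sales/",
)
# The data stays in S3; the query looks like any other Redshift query.
query_sql = "SELECT region, SUM(amount) FROM spectrum.sales GROUP BY region;"

for statement in (schema_sql, table_sql, query_sql):
    print(statement, end="\n\n")
```

The statements can then be run with whatever SQL client you normally use against the cluster; no COPY or load step is involved.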

The top three performance features are:
  1. Short Query Acceleration - speeds up execution of short-running queries such as reports, dashboards, and interactive analysis
  2. Results Caching - delivers sub-second response times for repeated queries, such as those from dashboards, visualizations, and BI tools
  3. Late Materialization - reduces the amount of data scanned for queries with predicate filters by batching the filtering work and applying the predicates before fetching data blocks in the next column
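
To make the late materialization idea more concrete, here is a toy sketch in plain Python. It is not how Redshift implements the feature; it only shows the principle of evaluating the predicate on one column first and fetching a block of the next column only when at least one row in that block passed the filter. The block size, column names, and data are invented for the example.

```python
from typing import Callable, List, Sequence, Tuple

BLOCK_SIZE = 4  # hypothetical number of rows per column block


def split_blocks(column: Sequence, block_size: int = BLOCK_SIZE) -> List[list]:
    """Split a column into fixed-size blocks, as a columnar store would."""
    return [list(column[i:i + block_size]) for i in range(0, len(column), block_size)]


def late_materialize(
    filter_blocks: List[list],
    value_blocks: List[list],
    predicate: Callable[[object], bool],
) -> Tuple[list, int]:
    """Return the matching values and the number of value blocks fetched."""
    results = []
    fetched = 0
    for block_no, filter_block in enumerate(filter_blocks):
        # Evaluate the predicate on the filter column first.
        hits = [i for i, value in enumerate(filter_block) if predicate(value)]
        if not hits:
            continue  # no row qualifies, so the next column's block is never read
        value_block = value_blocks[block_no]  # fetch only when needed
        fetched += 1
        results.extend(value_block[i] for i in hits)
    return results, fetched


# Toy data: 12 rows, 3 blocks per column; only the middle block has "US" rows.
regions = ["EU", "EU", "EU", "EU", "US", "US", "EU", "EU", "EU", "EU", "EU", "EU"]
amounts = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]

rows, fetched = late_materialize(
    split_blocks(regions), split_blocks(amounts), lambda region: region == "US"
)
print(rows, fetched)
```

In this toy run only one of the three `amounts` blocks is read, which is the kind of saving the feature aims for when a filter is selective.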

AWS Summit video: https://www.youtube.com/watch?v=gchd2sDhSuY
