The use of data lakes and data warehouses has exploded, accelerating business innovation with new ways to access and share data. This blog highlights how data can leave warehouses, illustrates the associated risks, and explains how to address those risks to keep data secure.
Here are some of the common ways data can leave data warehouses:
- Cloning: This allows a user to create a complete copy of a database, schema, or table. A user can clone a specific database or schema and then extract the required data from that copy. Zero-copy cloning makes this even easier because it does not require deep copies of the original data tables.
- Data Marketplace: This is a platform feature that allows companies to “publish” and customers to discover and access third-party data sets that have been curated and optimized for use within the data warehouse or with data marketplace applications. This data can be published natively in the data marketplace.
- "COPY INTO" function or similar: This allows you to export data from a cloud warehouse to a file in a cloud storage location, such as Amazon S3, Azure Blob Storage, or Google Cloud Storage.
- "UNLOAD" function or similar: This allows you to export data from cloud data warehouses to an external stage, such as Amazon S3, Azure Blob Storage, or Google Cloud Storage.
- JDBC/ODBC drivers: You can use these drivers to connect to the cloud warehouse and extract data using SQL queries. You can then write the results to a local file or any other destination.
- REST API: You can use a cloud data warehouse REST API to extract data in JSON format with simple API calls or scripts.
- Third-party tools: Many third-party tools support cloud warehouses and can be used to extract data. For example, you can use tools like Apache NiFi, Apache Airflow, or Talend to extract data.
- Partner ecosystem: Cloud warehouses come with a partner ecosystem of data integration and ETL tools that can be used to extract data. Examples of such tools include Informatica, Matillion, Fivetran, and many others.
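To make the SQL-based paths in the list above more concrete, here is a minimal Python sketch that labels statements from a query log as possible egress operations. It is an illustration only: the pattern list, the labels, and the `classify_statement` helper are hypothetical, and the regular expressions are loose approximations of common warehouse SQL rather than the exact grammar of any product.

```python
import re
from typing import Optional

# Hypothetical, simplified patterns. Real dialects differ, so treat these
# as assumptions to adapt, not as a complete or exact rule set.
EGRESS_PATTERNS = [
    # CREATE [OR REPLACE] [TRANSIENT] DATABASE|SCHEMA|TABLE ... CLONE ...
    ("clone", re.compile(
        r"\bCREATE\s+(?:OR\s+REPLACE\s+)?(?:TRANSIENT\s+)?"
        r"(?:DATABASE|SCHEMA|TABLE)\b.*\bCLONE\b",
        re.I | re.S)),
    # COPY INTO a stage (@name) or a cloud storage URL is an export;
    # COPY INTO <table> is a load and is deliberately not matched.
    ("copy_into_location", re.compile(
        r"\bCOPY\s+INTO\s+(?:@|'?(?:s3|gcs|azure)://)", re.I)),
    # UNLOAD (...) TO '<location>'
    ("unload", re.compile(r"^\s*UNLOAD\b", re.I)),
]


def classify_statement(sql: str) -> Optional[str]:
    """Return the first matching egress label, or None if nothing matches."""
    for label, pattern in EGRESS_PATTERNS:
        if pattern.search(sql):
            return label
    return None


if __name__ == "__main__":
    for stmt in [
        "CREATE TABLE sales_copy CLONE sales",
        "COPY INTO @my_stage/out FROM orders",
        "COPY INTO orders FROM @my_stage",
        "UNLOAD ('select * from t') TO 's3://bucket/prefix'",
    ]:
        print(classify_statement(stmt), "<-", stmt)
```

A check like this only covers paths that show up as SQL text. Driver-based reads, REST API calls, and third-party tools typically appear as ordinary `SELECT` statements, so they need context about who is querying and how much data is returned.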
With so many ways to take data out of your cloud data warehouse, it's important to understand the risks associated with these methods. Users of even the most critical data stores can be phished, which allows attackers to use any of the above methods to exfiltrate data. SQL injection attacks can also exploit applications built on cloud data stores. It’s also not uncommon for misconfigured user rights, role assignments, or privileges to increase risk exposure in these vast data warehouses.
A few key steps that security teams should take to manage data access from cloud data warehouses include:
- Know what data exists and which data sets are the crown jewels.
- Understand the value of the data and the related economic risks.
- Understand who accesses these data tables and the context of that access.
- Monitor user behavior, which is a good indicator of risk.
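As a rough illustration of the last step, the Python sketch below compares a user's export volume for today against that user's own recent baseline. Everything here is an assumption made for the example: the `is_unusual_export` helper, the use of daily exported row counts as the signal, and the threshold of three standard deviations above the mean. A production system would weigh far more context than a single number.

```python
from statistics import mean, pstdev
from typing import Sequence


def is_unusual_export(history: Sequence[float], today: float,
                      k: float = 3.0, min_days: int = 5) -> bool:
    """Flag today's exported row count if it far exceeds the user's baseline.

    history  -- exported row counts for previous days (hypothetical input)
    today    -- exported row count for the current day
    k        -- how many standard deviations above the mean counts as unusual
    min_days -- minimum history length needed before flagging anything
    """
    if len(history) < min_days:
        # Not enough data to form a baseline; this sketch does not flag.
        return False
    mu = mean(history)
    # Floor the spread at 1.0 so a perfectly flat history does not make
    # every tiny increase look anomalous.
    sigma = max(pstdev(history), 1.0)
    return today > mu + k * sigma


if __name__ == "__main__":
    baseline = [100, 120, 90, 110, 105]
    print(is_unusual_export(baseline, 115))     # within normal range
    print(is_unusual_export(baseline, 50_000))  # far above baseline
```

Returning `False` when there is too little history is a design choice that keeps new users from triggering noise. A stricter policy could instead route those cases to manual review.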
The big question is: how can security teams stay on top of these steps to address the risks?
The easiest way to gain contextual knowledge and visibility is to deploy Theom onto your cloud data warehouse instance. Theom deploys in minutes, requires no agents or proxies, and has no impact on data warehouse performance. Theom supports Snowflake, Databricks, AWS, and Azure data stores.
5-day Proof of Value (POV) - Try us out and see immediate value.