
Cloudera Hadoop

This article describes how to connect Tableau to a Cloudera Hadoop database and how to set up the data source.

Note: Use the Impala connector instead for new connections to Impala databases. (You can still use this connector for existing connections.)

Requirements

First, collect this connection information:

  • Name of the server hosting the database you want to connect to and the port number

  • Database type: Hive Server 2 or Impala

  • Authentication method:

    • No authentication

    • Kerberos

      Note: Because of Kerberos Key Distribution Center (KDC) restrictions, connecting with MIT Kerberos is not supported.

    • User name

    • User name and password

    • Microsoft Azure HDInsight Service (starting in version 10.2.1)

  • Transport options, which depend on the authentication method selected and can include the following:

    • Binary

    • SASL

    • HTTP

  • Sign-in credentials, which depend on the authentication method selected and can include the following:

    • User name

    • Password

    • Realm

    • Host FQDN

    • Service name

    • HTTP path

  • Whether you are connecting to an SSL server

  • (Optional) Initial SQL statement to run every time Tableau connects
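The checklist above can be captured in a small structure and sanity-checked before you open the connection dialog. The following is a minimal sketch in Python and is not part of Tableau: the class ConnectionInfo, its field names, and the problems method are hypothetical, and the allowed values simply mirror the options listed in this article (including port 21050, which this article specifies for Impala).

```python
from dataclasses import dataclass
from typing import List, Optional

# Allowed values mirror the checklist in this article. The names below are
# illustrative only and do not correspond to any Tableau API.
DATABASE_TYPES = {"Hive Server 2", "Impala"}
AUTH_METHODS = {
    "No authentication",
    "Kerberos",
    "User name",
    "User name and password",
    "Microsoft Azure HDInsight Service",
}
IMPALA_PORT = 21050  # port this article specifies for Impala connections


@dataclass
class ConnectionInfo:
    server: str
    port: int
    database_type: str
    authentication: str
    require_ssl: bool = False
    initial_sql: Optional[str] = None  # optional statement run on connect

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means none found."""
        found = []
        if not self.server.strip():
            found.append("server name is empty")
        if not 1 <= self.port <= 65535:
            found.append(f"port {self.port} is outside 1-65535")
        if self.database_type not in DATABASE_TYPES:
            found.append(f"unknown database type: {self.database_type}")
        if self.authentication not in AUTH_METHODS:
            found.append(f"unknown authentication method: {self.authentication}")
        if self.database_type == "Impala" and self.port != IMPALA_PORT:
            found.append(f"Impala connections should use port {IMPALA_PORT}")
        return found
```

For example, ConnectionInfo("mydb.test.ourdomain.lan", 21050, "Impala", "Kerberos").problems() returns an empty list, while an Impala entry with a different port is flagged.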

Driver required

This connector requires a driver to communicate with the database. The required driver might already be installed on your computer. If it is not, Tableau displays a message in the connection dialog box with a link to the Driver Download page, where you can find driver links and installation instructions.

Note: Make sure you use the latest available drivers. To get the latest drivers, see Cloudera Hadoop on the Tableau Driver Download page.

Establishing the connection and setting up the data source

  1. Start Tableau and, under Connect, select Cloudera Hadoop. For a complete list of data connections, select More under To a Server. Then do the following:

    1. Enter the name of the database host server and the port number to use. If you are connecting using Cloudera Impala, you must use port 21050, which is the default port for the version 2.5.x driver (recommended).

    2. From the Type drop-down list, select the type of database to connect to. Depending on the version of Hadoop and the drivers installed, you can connect to one of the following:

      • Hive Server 2

      • Impala

    3. From the Authentication drop-down list, select the authentication method to use.

    4. Enter the requested information. The information you are asked for depends on the selected authentication method.

    5. (Optional) Select Initial SQL to specify a SQL command to run at the beginning of every connection, such as when you open the workbook, refresh an extract, sign in to Tableau Server, or publish to Tableau Server. For more information, see Run Initial SQL.

    6. Select Sign In.

      When connecting to an SSL server, select Require SSL.

      If Tableau cannot make the connection, verify that your credentials are correct. If you still cannot connect, your computer is having trouble locating the server. Contact your network administrator or database administrator.

  2. On the data source page, do the following:

    1. (Optional) Select the default data source name at the top of the page, and then enter a unique data source name for use in Tableau. For example, use a data source naming convention that helps other users figure out which data source to connect to.

    2. From the Schema drop-down list, select the search icon, or enter the schema name in the text box and select the search icon, and then select the schema.

    3. In the Table text box, select the search icon, or enter the table name and select the search icon, and then select the table.

    4. Drag the table to the canvas, and then select the sheet tab to start your analysis.

      Use custom SQL to connect to a specific query rather than the entire data source. For more information, see Connect to a Custom SQL Query.

      Note: This database type supports only equal (=) join operations.

Sign in on a Mac

If you are using Tableau Desktop on a Mac, enter a fully qualified domain name (for example, "mydb.test.ourdomain.lan") instead of a relative domain name (for example, "mydb" or "mydb.test").

Alternatively, you can add the domain to the list of search domains for the Mac computer so that you only need to provide the server name to connect. To update the list of search domains, go to System Preferences > Network > Advanced, and then open the DNS tab.
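If you are unsure whether a given host name resolves from your computer and whether the port accepts connections, a quick check outside Tableau can narrow the problem down. The following Python sketch is one way to do that; the function name can_reach is hypothetical, and a successful TCP connection only shows that the server is reachable, not that the driver or your credentials will work.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name first, so a name that
        # cannot be resolved fails here just like a closed port does.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures (socket.gaierror), refusals, and timeouts.
        return False
```

For example, can_reach("mydb.test.ourdomain.lan", 21050) tests the example host name from this section on the Impala port. If the fully qualified name succeeds but the short name "mydb" fails, the search domain list is the likely cause.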

Working with Hadoop Hive data

Working with date/time data

Tableau supports the TIMESTAMP and DATE types natively. However, if you store date/time data as a string in Hive, be sure to store it in ISO format (YYYY-MM-DD). You can create a calculated field that uses the DATEPARSE or DATE function to convert a string to a date/time format. Use DATEPARSE() when working with an extract; otherwise, use DATE(). For more information, see Date Functions.

For more information about Hive data types, see Dates on the Apache Hive website.
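Before relying on string-to-date conversion, it can help to confirm that the stored strings really follow the ISO pattern. The sketch below is an offline check in Python, separate from Tableau and Hive. The function names are hypothetical, and the check covers only date-only strings (YYYY-MM-DD), not timestamps.

```python
import re
from datetime import date
from typing import List, Optional

# Strict YYYY-MM-DD: four-digit year, two-digit month, two-digit day.
_ISO_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})", re.ASCII)


def parse_iso_date(value: str) -> Optional[date]:
    """Return a date for a strict YYYY-MM-DD string, or None otherwise."""
    match = _ISO_DATE.fullmatch(value)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # Well-formed but impossible dates, such as 2015-02-30.
        return None


def non_iso_values(values: List[str]) -> List[str]:
    """Return the strings that would need cleaning before conversion."""
    return [v for v in values if parse_iso_date(v) is None]
```

Running non_iso_values over a sample of the column gives a list of the entries to fix at the source.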

NULL value returned

In Tableau 9.0.1 and later, and in 8.3.5 and later 8.3.x releases, a NULL value is returned when you open a workbook that has date/time data stored as a string in a format that Hive does not support. To resolve this issue, change the field type back to String and create a calculated field that uses DATEPARSE() or DATE() to convert the date. Use DATEPARSE() when working with an extract; otherwise, use DATE().
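The choice between the two functions can be written down as a small rule. The Python sketch below only builds the text of a Tableau calculated field; it does not call Tableau. The function name and the field name "Order Date" are hypothetical, the format string assumes ISO dates, and the field name is assumed to contain no square brackets.

```python
def date_conversion_formula(field_name: str, uses_extract: bool) -> str:
    """Build a Tableau calculated-field expression for a string date field.

    DATEPARSE is chosen for extracts and DATE otherwise, following the
    guidance in this article. The format string assumes ISO dates.
    """
    field = f"[{field_name}]"
    if uses_extract:
        return f'DATEPARSE("yyyy-MM-dd", {field})'
    return f"DATE({field})"
```

For example, date_conversion_formula("Order Date", True) produces an expression you can paste into a calculated field for an extract, and passing False produces the DATE() form for a live connection.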

High latency limitation

Hive is a batch-oriented system and cannot yet answer simple queries with very fast turnaround. This limitation can make it difficult to explore a new data set or experiment with calculated fields. Some of the newer SQL-on-Hadoop technologies (for example, Cloudera's Impala and the Hortonworks Stinger project) are designed to address this limitation.
