Connecting to Apache Spark with ODBC

Benefits

Connecting to Apache Spark through ODBC makes querying the database much easier. It also lets you use several third-party reporting and querying tools that would otherwise not be available. In this article, we will go through the steps needed to connect to Apache Spark through ODBC. We will be using WinSQL, a SQL query tool, to achieve this.

Requirements

  • Java Runtime Environment 8 (JRE 8) or later must be installed on the machine where WinSQL is running.
  • Ensure that java is in the PATH system variable. On Windows, you can do this by editing the system environment variables. Please see here for the necessary steps.
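The two requirements above can be checked from a script before opening WinSQL. The following is a minimal sketch, not part of WinSQL itself; the helper names `parse_java_major` and `java_major_on_path` are made up for illustration. It looks for `java` on the PATH and reads the major version from the `java -version` banner.

```python
import re
import shutil
import subprocess


def parse_java_major(version_output):
    """Return the major Java version found in `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    # Java 8 and earlier report themselves as 1.x (for example "1.8.0_281").
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def java_major_on_path():
    """Return the major version of the `java` found on PATH, or None."""
    java = shutil.which("java")
    if java is None:
        return None
    # `java -version` writes its banner to stderr rather than stdout.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr + result.stdout)


if __name__ == "__main__":
    major = java_major_on_path()
    if major is None:
        print("java was not found on PATH")
    elif major < 8:
        print(f"Java {major} found, but version 8 or later is required")
    else:
        print(f"Java {major} found on PATH")
```

If the script reports that java was not found, the PATH variable most likely still needs to be updated as described above.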

Step 1: Register ODBC Drivers in WinSQL

Open WinSQL and click on Help. Next, click on Register ODBC Drivers. Make sure that "WinSQL Apache Spark SQL Wire Protocol" is registered.

Step 2: Configure the Connection in WinSQL

  • Open WinSQL, go to File, then click on Open ODBC Manager.
  • Click on the "User DSN" tab.
  • Click "Add".
  • Select the "WinSQL Apache Spark SQL Wire Protocol".

You will need to fill out the following fields:

Apache Spark WinSQL Setup

  • Data Source Name/Description: This is a friendly name you want to apply to this Data Source.
  • Host Name: Enter the IP Address/Hostname of the machine where Apache Spark is running.
  • Port Number: Specify the port number of Apache Spark. By default, the port number is 10000.
  • Database Name: Select the Database from your project you want to connect to. The default database name is "Default".
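Before testing the connection, it can help to confirm that the Host Name and Port Number entered above are reachable from your machine. The sketch below only checks that a TCP connection can be opened; it does not verify that Spark is the service listening. The function name `can_reach` is hypothetical, and 10000 is the default port mentioned above.

```python
import socket


def can_reach(host, port=10000, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # "localhost" is a placeholder; use the Host Name from the DSN setup.
    print(can_reach("localhost", 10000))
```

A result of False usually points to a wrong host name, a different port, or a firewall between the two machines.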

Once you have filled out these fields, click the "Test Connect" button at the bottom. A popup will appear where you enter your User Name and Password (if they are required).

Once the username and password are specified, click on OK. If WinSQL is able to connect, you will see a "Connection established" popup message.
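Once the DSN works in WinSQL, other ODBC clients on the same machine can use it too. As an illustration only, here is a sketch that uses the third-party pyodbc package, which is not part of WinSQL and is an assumption of this example. The DSN name "SparkDSN" is a placeholder for whatever Data Source Name you chose, and the helper names are hypothetical.

```python
def build_connection_string(dsn, user=None, password=None):
    """Build an ODBC connection string from a DSN and optional credentials."""
    parts = [f"DSN={dsn}"]
    if user:
        parts.append(f"UID={user}")
    if password:
        parts.append(f"PWD={password}")
    return ";".join(parts)


def connect(dsn, user=None, password=None):
    """Open a connection through the DSN (requires pyodbc to be installed)."""
    # Imported lazily so the string builder works without pyodbc.
    import pyodbc

    # autocommit=True is an assumption: Spark SQL endpoints generally do not
    # support transactions, so many setups disable them on the client side.
    return pyodbc.connect(build_connection_string(dsn, user, password),
                          autocommit=True)


if __name__ == "__main__":
    # "SparkDSN" is a placeholder for the Data Source Name configured above.
    print(build_connection_string("SparkDSN", "alice", "secret"))
```

Whether the credentials are needed depends on how the Spark server is configured; leave them out if the Test Connect step succeeded without them.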

Running Queries and Interacting with Apache Spark

Now that we are successfully connected, we can begin running queries and modifying the database.

For example, I am adding a new vendor who has 40 beef, 15 pork, and 25 lambs:

Apache Spark Add Data into Table

If I browse the data I can see the newly added data:

Apache Spark View Data in Table
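In SQL terms, the add-and-browse steps above come down to an INSERT followed by a SELECT. The sketch below shows that pattern with parameterized statements. The table name `vendors`, its columns, and the vendor name are hypothetical, since the article does not show the schema. An in-memory SQLite database stands in for the Spark connection so the example runs without a cluster; whether your ODBC driver accepts `?` parameters in the same way is also an assumption.

```python
import sqlite3

# Hypothetical table layout: vendor name plus a count for each product.
INSERT_SQL = "INSERT INTO vendors VALUES (?, ?, ?, ?)"
SELECT_SQL = "SELECT name, beef, pork, lamb FROM vendors"


def add_vendor(cursor, name, beef, pork, lamb):
    """Insert one vendor row using a parameterized statement."""
    cursor.execute(INSERT_SQL, (name, beef, pork, lamb))


def list_vendors(cursor):
    """Return all vendor rows as a list of tuples."""
    cursor.execute(SELECT_SQL)
    return cursor.fetchall()


# SQLite stands in for the Spark connection here.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE vendors (name TEXT, beef INTEGER, pork INTEGER, lamb INTEGER)"
)

add_vendor(cur, "New Vendor", 40, 15, 25)
rows = list_vendors(cur)
print(rows)
```

The two helper functions only need a cursor object, so the same calls could be pointed at a cursor from an ODBC connection instead.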

Importing Data from Excel

Once you are connected to Apache Spark in WinSQL, you will be able to see all of your tables and data. To import data directly from Microsoft Excel, simply drag and drop the .xlsx file into your Tables section, as seen here:

WinSQL Apache Spark Excel Drag and Drop

Conclusion

With an ODBC query tool, you can interact with Spark easily. You also get the full feature set of your chosen tool, along with a much more informative interface. Using WinSQL allows you to use several additional reporting and querying tools that are otherwise not available. Please see here for a list of features/benefits.
