Best practice when inserting data to existing table via impyla #213

Open
IamGianluca opened this issue Sep 12, 2016 · 5 comments
Comments

@IamGianluca

Hi,

I'm using impyla for my project. Specifically, I'm using it to read data through the Impala engine and to write data through Hive. The only difference (aside from minor syntax) is the port I have to provide when connecting to the cluster.

I need to write new rows into an existing partitioned table using Hive. My first thought was to use executemany, passing it a string containing the query template and a tuple of tuples with all the data I want to add. Since I was struggling to get that working, I did some research and came across #96. People there say that executemany is not the best practice for adding new rows to an existing table. They suggest using ibis instead, or writing the data to HDFS and then registering the table with a CREATE statement. I'm not sure the second suggestion applies when adding data to a table that already exists. In any case, I was wondering whether using impyla with the Hive engine for INSERT INTO is still discouraged.

What is the suggested way for inserting data into an existing Hive table?

Also what's wrong with my code below?

import datetime

from impala.dbapi import connect

# prepare query
yesterday = datetime.date.today() - datetime.timedelta(days=1)
query = """
    SET hive.exec.dynamic.partition.mode=nonstrict
    INSERT INTO db_name.table_name
    PARTITION (year, month, day)
    VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s);
    """
log_datetime = datetime.datetime.now()

# one tuple of values per row to insert
rows = tuple((str(yesterday), "NULL",
              str(i.campaign), str(i.adgroup),
              str(i.id), '1', '1', str(yesterday.year),
              str(yesterday.month), str(yesterday.day))
             for i in target.values())

conn = connect(host=hostname, port=port, auth_mechanism="PLAIN")
cursor = conn.cursor()
cursor.executemany(query, rows)

The program runs without returning any error message but I cannot see the new rows added to the table in Hive.
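One plausible cause, offered as a guess and not a confirmed diagnosis: the template sends SET and INSERT together as a single statement. Hive may then treat the whole string as a SET command, which succeeds without inserting anything. The minimal sketch below sends the two statements separately on the same cursor. The helper names build_rows and insert_rows are hypothetical, the trailing semicolon is dropped on the assumption that HiveServer2 expects one bare statement per call, and the column layout is copied from the snippet above.

```python
import datetime

# Statement text copied from the post; the table and columns are the poster's.
SET_DYNAMIC_PARTITION = "SET hive.exec.dynamic.partition.mode=nonstrict"
INSERT_ROW = (
    "INSERT INTO db_name.table_name "
    "PARTITION (year, month, day) "
    "VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s)"
)


def build_rows(targets, day=None):
    """Build one parameter tuple per target (hypothetical helper).

    `targets` is any iterable of objects with campaign, adgroup and id
    attributes; `day` defaults to yesterday, as in the post.
    """
    if day is None:
        day = datetime.date.today() - datetime.timedelta(days=1)
    return [
        (str(day), "NULL", str(t.campaign), str(t.adgroup), str(t.id),
         "1", "1", str(day.year), str(day.month), str(day.day))
        for t in targets
    ]


def insert_rows(cursor, rows):
    """Send SET and INSERT as two separate statements (hypothetical helper)."""
    # The session setting goes first, as its own statement.
    cursor.execute(SET_DYNAMIC_PARTITION)
    # Then the parameterized INSERT is executed for the supplied rows.
    cursor.executemany(INSERT_ROW, rows)
```

With a live connection this would be called as insert_rows(conn.cursor(), build_rows(target.values())). Note that each row may still run as its own Hive statement, which can be slow for large batches.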

@chandlervan

I'm hitting the same problem... it's killing me.

@timarmstrong
Contributor

@chandlervan can you provide the Hive and impyla versions you're using, and ideally a Python script that reproduces the problem?

I don't think we intended Impyla to work as a Hive client, but it seems like people have been using it that way! Would be good to understand what is happening - it's entirely possible that the issue also affects Impala connections in a different way.

@chandlervan

@timarmstrong Hi, it turned out to be my own oversight: I hadn't noticed that the cursor().execute() method takes a configuration. Once I set my configuration correctly, my SQL worked. A colleague checked the log and told me the failure was caused by the queue setting, so it had nothing to do with the wonderful impyla package. Thanks for your reply!

@timarmstrong
Contributor

@chandlervan no problem, glad to hear you got unblocked :)

@jeff303

jeff303 commented Dec 10, 2019

@timarmstrong Hi, it turned out to be my own oversight: I hadn't noticed that the cursor().execute() method takes a configuration. Once I set my configuration correctly, my SQL worked. A colleague checked the log and told me the failure was caused by the queue setting, so it had nothing to do with the wonderful impyla package. Thanks for your reply!

Would you mind sharing more details about this configuration that you needed to change?
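
The thread never states which setting was changed, so the following is only a sketch of how a per-statement configuration can be passed to impyla's cursor.execute() through its configuration argument. The property name mapreduce.job.queuename and the helper run_with_queue are assumptions for illustration; the actual key depends on the cluster's execution engine and scheduler.

```python
def run_with_queue(cursor, sql, queue_name):
    """Run one statement with a per-statement configuration dict.

    Hypothetical helper: `cursor` is expected to behave like an impyla
    cursor, whose execute() accepts a `configuration` keyword argument.
    """
    # Assumption: the cluster reads the job queue from this Hadoop property.
    # The thread does not name the setting that was actually changed.
    configuration = {"mapreduce.job.queuename": queue_name}
    cursor.execute(sql, configuration=configuration)
    return configuration
```

A call might look like run_with_queue(conn.cursor(), "SELECT 1", "my_queue"), where my_queue is a placeholder for a queue the connecting user is allowed to submit to.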
