Table output is empty intermittently when using fetchall() or cursor as iterator #505

Open
lucharo opened this issue Nov 9, 2022 · 1 comment

lucharo commented Nov 9, 2022

Hi, just started using impyla as an alternative to pyhive to connect to a HiveServer2 instance.

I've noticed that fetchall() and using the cursor as an iterator intermittently return empty data. I ran some experiments to test this:

My connection code:

from impala.dbapi import connect

# Per-query Hive settings, passed to cursor.execute() below
configuration = {
    "hive.execution.engine": "tez",
}
conn = connect(
    host=HOSTNAME,
    port=10000,
    user=USER,
    auth_mechanism="GSSAPI",
    kerberos_service_name="hive",
    database="mydb",
)

My benchmark code after setting up the connection, using fetchall():

%%time
import collections

empty = []
for _ in range(50):
    cur = conn.cursor()
    cur.execute('select * from db.my_table', configuration=configuration)
    output1 = cur.fetchall()
    empty.append(output1 == [])
collections.Counter(empty)
## this returns Counter({False: 43, True: 7}), meaning the output was empty 7 out of 50 times

Then I ran the same benchmark using the cursor as an iterator in a for loop:

empty = []
for _ in range(50):
    cur = conn.cursor()
    cur.execute('select * from db.my_table', configuration=configuration)
    output2 = []
    for row in cur:
        output2.append(row)
    empty.append(output2 == [])
collections.Counter(empty)
## this returns Counter({False: 31, True: 19}), meaning the output was empty 19 out of 50 times
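
The two benchmarks above differ only in how the rows are read, so they can be folded into one helper. This is a minimal sketch, not part of impyla: `count_empty_results` and its parameters are hypothetical names, and the `sqlite3` connection is only a stand-in so the example runs without a HiveServer2 instance. Extra keyword arguments are forwarded to `cursor.execute()`, which is how `configuration=configuration` would be passed with an impyla connection.

```python
import collections
import sqlite3


def count_empty_results(conn, query, runs=50, use_iterator=False, **execute_kwargs):
    """Run `query` `runs` times and tally how often the result set is empty.

    Works with any DB-API 2.0 connection. The returned Counter maps
    True -> number of empty results, False -> number of non-empty results.
    """
    tally = collections.Counter()
    for _ in range(runs):
        cur = conn.cursor()
        try:
            cur.execute(query, **execute_kwargs)
            # Read rows either by iterating the cursor or with fetchall()
            rows = list(cur) if use_iterator else cur.fetchall()
        finally:
            cur.close()
        tally[len(rows) == 0] += 1
    return tally


# Stand-in connection (sqlite3), used only so the sketch runs anywhere.
demo = sqlite3.connect(":memory:")
demo.execute("create table my_table (id integer)")
demo.executemany("insert into my_table values (?)", [(1,), (2,), (3,)])

fetchall_tally = count_empty_results(demo, "select * from my_table")
iterator_tally = count_empty_results(demo, "select * from my_table", use_iterator=True)
print(fetchall_tally, iterator_tally)
```

Against the real connection, the call would look like `count_empty_results(conn, 'select * from db.my_table', configuration=configuration)`, once with and once without `use_iterator=True`.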

csringhofer (Collaborator) commented

Hi! I tried to reproduce this issue in the Impala development environment, but did not see any empty result sets.
Can you provide more information about the issue?

  • What version of Impyla was used?
  • What Hive version was used?
  • What kind of table is db.my_table, and how many rows does it contain?
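
One way to answer the first question is to read the installed package version from the environment where the benchmark ran. This is a small sketch using only the standard library (Python 3.8+); `installed_version` is a hypothetical helper name, and it assumes impyla was installed under the distribution name `impyla`.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints None if the package is not installed in the current environment
print("impyla:", installed_version("impyla"))
```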
