@tomotake-koike
Last active March 28, 2020 21:41
Count the lines of a gzip-compressed text file on S3
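import boto3
import zlib

s3 = boto3.session.Session().client('s3')
resp = s3.get_object(Bucket='BUCKET', Key='PATH')
body = resp['Body']

chunks = 0   # number of chunks read so far
cnt = 0      # running count of newline characters
decb = b''   # decompressed bytes held back because they end mid-character
# MAX_WBITS | 32 lets zlib auto-detect a gzip or zlib header
dec = zlib.decompressobj(zlib.MAX_WBITS | 32)

for chunk in body.iter_chunks(chunk_size=1024 * 32):
    decb += dec.decompress(chunk)
    # A chunk can end in the middle of a multi-byte UTF-8 character, so retry
    # with up to 3 trailing bytes held back (a character is at most 4 bytes).
    for d in range(0, 4):
        try:
            s = decb[:-d].decode() if d else decb.decode()
            break
        except UnicodeDecodeError:
            print('UnicodeDecodeError Re-Decode cnt=' + str(d) + ' char=' + str(decb[-1:]))
    else:
        # Every retry failed, so the problem is not a split trailing character.
        raise ValueError('decompressed data is not valid UTF-8')
    cnt += s.count('\n')
    decb = decb[-d:] if d else b''   # carry the held-back bytes into the next chunk
    chunks += 1
    print(str(chunks) + " : " + str(cnt))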
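If only the line count is needed, the decode step in the script above can be skipped. In UTF-8 the byte 0x0A never appears inside a multi-byte character, so newlines can be counted directly on the decompressed bytes. The following is a minimal sketch of that variant, not the original author's method; `count_newlines_gzip` and the local sample data are hypothetical names introduced for illustration, and the in-memory chunk list stands in for the S3 body stream.

```python
import gzip
import zlib


def count_newlines_gzip(chunks):
    # Count newline bytes in a gzip stream given as an iterable of byte chunks.
    # (Hypothetical helper; only the first gzip member is decompressed.)
    dec = zlib.decompressobj(zlib.MAX_WBITS | 32)  # auto-detect gzip/zlib header
    total = 0
    for chunk in chunks:
        total += dec.decompress(chunk).count(b'\n')
    total += dec.flush().count(b'\n')  # any bytes still buffered
    return total


# Local stand-in for the S3 body: compress a small text with multi-byte
# characters and split it into 7-byte chunks.
text = 'alpha\nβήτα\nガンマ\n'
compressed = gzip.compress(text.encode('utf-8'))
fake_chunks = [compressed[i:i + 7] for i in range(0, len(compressed), 7)]

print(count_newlines_gzip(fake_chunks))  # → 3
```

Against S3, the same function could be called with `resp['Body'].iter_chunks(chunk_size=1024 * 32)` as its argument. The trade-off is that invalid UTF-8 is no longer detected, since the bytes are never decoded.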