hadoop - How to add a partition in Hive for a specific date?
I'm using Hive (with external tables) to process data stored on Amazon S3.
My data is partitioned as follows:
dir s3://test.com/2014-03-01/
dir s3://test.com/2014-03-02/
dir s3://test.com/2014-03-03/
dir s3://test.com/2014-03-04/
dir s3://test.com/2014-03-05/
    s3://test.com/2014-03-05/ip-foo-request-2014-03-05_04-20_00-49.log
    s3://test.com/2014-03-05/ip-foo-request-2014-03-05_06-26_19-56.log
    s3://test.com/2014-03-05/ip-foo-request-2014-03-05_15-20_12-53.log
    s3://test.com/2014-03-05/ip-foo-request-2014-03-05_22-54_27-19.log
How do I create a partitioned table for this layout using Hive? Something like:
create external table test ( foo string, time string, bar string ) partitioned by (? string) row format delimited fields terminated by '\t' location 's3://test.com/';
Could someone answer this question? Thanks!
First, start with the right table definition. In this case I'll use what you wrote, with the partition column named dt and the missing "by" keywords added:
create external table test ( foo string, time string, bar string ) partitioned by (dt string) row format delimited fields terminated by '\t' location 's3://test.com/';
By default, Hive expects partitions to live in subdirectories named with the convention s3://test.com/partitionkey=partitionvalue. For example:
s3://test.com/dt=2014-03-05
If you follow this convention, you can use MSCK REPAIR TABLE to discover and add the partitions automatically.
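Under that naming convention, a single statement registers every partition directory at once (a sketch, assuming the table definition with the dt partition column above and directories already renamed to the dt=... form):

```
-- Scans s3://test.com/ for dt=... subdirectories and adds them to the metastore
MSCK REPAIR TABLE test;

-- Verify what was registered
SHOW PARTITIONS test;
```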
If you can't or don't want to use that naming convention, you need to add each partition manually, pointing it at its location:
alter table test add partition (dt='2014-03-05') location 's3://test.com/2014-03-05';
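Since the example layout has one directory per day, the ALTER statements for all of them can be generated with a small shell loop (a sketch using the dates and bucket from the question; the resulting file would be run with something like `hive -f add_partitions.hql`):

```shell
# Emit one ADD PARTITION statement per daily directory
# (dates taken from the example layout above).
for d in 2014-03-01 2014-03-02 2014-03-03 2014-03-04 2014-03-05; do
  echo "ALTER TABLE test ADD PARTITION (dt='${d}') LOCATION 's3://test.com/${d}';"
done
```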