hadoop - How to add partition using Hive by a specific date?


I'm using Hive (with external tables) to process data stored on Amazon S3.

My data is partitioned as follows:

    dir  s3://test.com/2014-03-01/
    dir  s3://test.com/2014-03-02/
    dir  s3://test.com/2014-03-03/
    dir  s3://test.com/2014-03-04/
    dir  s3://test.com/2014-03-05/

    s3://test.com/2014-03-05/ip-foo-request-2014-03-05_04-20_00-49.log
    s3://test.com/2014-03-05/ip-foo-request-2014-03-05_06-26_19-56.log
    s3://test.com/2014-03-05/ip-foo-request-2014-03-05_15-20_12-53.log
    s3://test.com/2014-03-05/ip-foo-request-2014-03-05_22-54_27-19.log

How can I create a partitioned table using Hive?

    create external table test (
        foo string,
        time string,
        bar string
    )
    partitioned by (? string)
    row format delimited
    fields terminated by '\t'
    location 's3://test.com/';

Could anyone answer this question? Thanks!

First, start with the right table definition. In your case I'll just use what you wrote:

    create external table test (
        foo string,
        time string,
        bar string
    )
    partitioned by (dt string)
    row format delimited
    fields terminated by '\t'
    location 's3://test.com/';

By default, Hive expects partitions to live in subdirectories named following the convention s3://test.com/partitionkey=partitionvalue. For example:

s3://test.com/dt=2014-03-05 

If you follow this convention, you can use MSCK REPAIR TABLE to add all existing partitions automatically.
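As a minimal sketch, assuming the day directories above were renamed to the dt=... convention, a single statement scans the table's location and registers every partition directory Hive finds there:

    -- discovers and adds partitions such as dt=2014-03-05
    -- (only works if directories are named dt=<value>)
    msck repair table test;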

If you can't or don't want to use that naming convention, you need to add each partition manually:

    alter table test
        add partition (dt='2014-03-05')
        location 's3://test.com/2014-03-05';
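Hive (0.8 and later) also accepts several partition specs in one ALTER TABLE statement, so as a sketch for the listing above, the remaining days could be registered in one go:

    alter table test
        add partition (dt='2014-03-01') location 's3://test.com/2014-03-01'
            partition (dt='2014-03-02') location 's3://test.com/2014-03-02'
            partition (dt='2014-03-03') location 's3://test.com/2014-03-03'
            partition (dt='2014-03-04') location 's3://test.com/2014-03-04';

Either way, once the partitions are registered, a query that filters on dt reads only the matching directory:

    select foo, bar from test where dt = '2014-03-05';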
