hadoop - How to add a partition in Hive for a specific date?
I'm using Hive (with external tables) to process data stored on Amazon S3.
My data is partitioned as follows:
dir s3://test.com/2014-03-01/
dir s3://test.com/2014-03-02/
dir s3://test.com/2014-03-03/
dir s3://test.com/2014-03-04/
dir s3://test.com/2014-03-05/
    s3://test.com/2014-03-05/ip-foo-request-2014-03-05_04-20_00-49.log
    s3://test.com/2014-03-05/ip-foo-request-2014-03-05_06-26_19-56.log
    s3://test.com/2014-03-05/ip-foo-request-2014-03-05_15-20_12-53.log
    s3://test.com/2014-03-05/ip-foo-request-2014-03-05_22-54_27-19.log
How do I create a partitioned table for this layout using Hive? Something like:
create external table test ( foo string, time string, bar string ) partitioned by (? string) row format delimited fields terminated by '\t' location 's3://test.com/';
Could someone answer this question? Thanks!
First, start with the right table definition. In this case I'll use what you wrote, with the partition column named dt and the missing "by" keywords added:
create external table test ( foo string, time string, bar string ) partitioned by (dt string) row format delimited fields terminated by '\t' location 's3://test.com/';
By default, Hive expects partitions to live in subdirectories named with the convention s3://test.com/partitionkey=partitionvalue. For example:
s3://test.com/dt=2014-03-05
If you follow this convention, you can use MSCK REPAIR TABLE to discover and add the partitions automatically.
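Under that naming convention, a single statement registers every partition directory at once (a sketch, assuming the table definition with the dt partition column above and directories already renamed to the dt=... form):

```
-- Scans s3://test.com/ for dt=... subdirectories and adds them to the metastore
MSCK REPAIR TABLE test;

-- Verify what was registered
SHOW PARTITIONS test;
```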
If you can't or don't want to use that naming convention, you need to add each partition manually, pointing it at its location:
alter table test add partition (dt='2014-03-05') location 's3://test.com/2014-03-05';
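Since the example layout has one directory per day, the ALTER statements for all of them can be generated with a small shell loop (a sketch using the dates and bucket from the question; the resulting file would be run with something like `hive -f add_partitions.hql`):

```shell
# Emit one ADD PARTITION statement per daily directory
# (dates taken from the example layout above).
for d in 2014-03-01 2014-03-02 2014-03-03 2014-03-04 2014-03-05; do
  echo "ALTER TABLE test ADD PARTITION (dt='${d}') LOCATION 's3://test.com/${d}';"
done
```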