java - Optimal merging of triplets -

i'm trying come algorithm following problem :

i've got collection of triplets of integers - let's call these integers a, b, c. value stored inside can big, it's impossible create array of size a, b, or c. goal minimize size of collection. this, we're provided simple rule allows merge triplets :

for 2 triplets (a, b, c) , (a', b', c'), remove original triplets , place triplet (a | a', b, c) if b == b' , c = c', | bitwise or. similar rules hold b , c also.

in other words, if 2 values of 2 triplets equal, remove these 2 triplets, bitwise or third values , place result collection.

the greedy approach misleading in similar cases , problem, can't find simple counterexample that'd lead correct solution. list 250 items correct solution 14, average size computed greedy merging 30 (varies 20 70). sub-optimal overhead gets bigger list size increases.

i've tried playing around set bit counts, i've found no meaningful results. obvious fact if records unique (which safe assume), set bit count increases.

here's stupid greedy implementation (it's conceptual thing, please don't regard code style) :

public class record {     long a;     long b;     long c;      public static void main(string[] args) {         list<record> data = new arraylist<>();         // fill data          boolean found;          {             found = false;             outer:             (int = 0; < data.size(); ++i) {                 (int j = i+1; j < data.size(); ++j) {                     try {                         record r = merge(data.get(i), data.get(j));                         found = true;                         data.remove(j);                         data.remove(i);                         data.add(r);                         break outer;                     } catch (illegalargumentexception ignored) {                     }                 }             }         } while (found);     }      public static record merge(record r1, record r2) {         if (r1.a == r2.a && r1.b == r2.b) {             record r = new record();             r.a = r1.a;             r.b = r1.b;             r.c = r1.c | r2.c;             return r;         }         if (r1.a == r2.a && r1.c == r2.c) {             record r = new record();             r.a = r1.a;             r.b = r1.b | r2.b;             r.c = r1.c;             return r;         }         if (r1.b == r2.b && r1.c == r2.c) {             record r = new record();             r.a = r1.a | r2.a;             r.b = r1.b;             r.c = r1.c;             return r;         }         throw new illegalargumentexception("unable merge these 2 records!");     }

do have idea how solve problem?

this going long answer, sadly without optimal solution (sorry). serious attempt @ applying greedy problem solving problem, may useful in principle. didn't implement last approach discussed, perhaps approach can yield optimal solution -- can't guarantee though.

level 0: not greedy

by definition, greedy algorithm has heuristic choosing next step in way locally optimal, i.e. optimal right now, hoping reach global optimum may or may not possible always.

your algorithm chooses mergable pair , merges them , moves on. no evaluation of merge implies , whether there better local solution. because of wouldn't call approach greedy @ all. a solution, an approach. call the blind algorithm can succinctly refer in answer. use modified version of algorithm, which, instead of removing 2 triplets , appending merged triplet, removes second triplet , replaces first 1 merged one. order of resulting triplets different , final result possibly too. let me run modified algorithm on representative data set, marking to-be-merged triplets *:

0: 3 2 3   3 2 3   3 2 3 1: 0 1 0*  0 1 2   0 1 2 2: 1 2 0   1 2 0*  1 2 1 3: 0 1 2* 4: 1 2 1   1 2 1* 5: 0 2 0   0 2 0   0 2 0  result: 4

level 1: greedy

to have greedy algorithm, need formulate merging decision in way allows comparison of options, when multiple available. me, intuitive formulation of merging decision was:

if merge these 2 triplets, resulting set have maximum possible number of mergable triplets, when compared result of merging other 2 triplets current set?

i repeat, intuitive for me. have no proof leads globally optimal solution, not lead better-or-equal solution blind algorithm -- fits definition of greedy (and easy implement). let's try on above data set, showing between each step, possible merges (by indicating indices of triplet pairs) , resulting number of mergables each possible merge:

          mergables 0: 3 2 3  (1,3)->2 1: 0 1 0  (1,5)->1 2: 1 2 0  (2,4)->2 3: 0 1 2  (2,5)->2 4: 1 2 1 5: 0 2 0

any choice except merging triplets 1 , 5 fine, if take first pair, same interim set blind algorithm (i time collapse indices remove gaps):

          mergables 0: 3 2 3  (2,3)->0 1: 0 1 2  (2,4)->1 2: 1 2 0 3: 1 2 1 4: 0 2 0

this algorithm gets differently: chooses triplets 2 , 4 because there still 1 merge possible after merging them in contrast choice made blind algorithm:

          mergables 0: 3 2 3  (2,3)->0   3 2 3 1: 0 1 2             0 1 2 2: 1 2 0             1 2 1 3: 1 2 1  result: 3

level 2: greedy

now, second step intuitive heuristic look ahead 1 merge further , ask heuristic question then. generalized, ahead k merges further , apply above heuristic, backtrack , decide best option. gets verbose now, exemplify, perform 1 step of new heuristic lookahead 1:

          mergables 0: 3 2 3  (1,3)->(2,3)->0 1: 0 1 0         (2,4)->1* 2: 1 2 0  (1,5)->(2,4)->0 3: 0 1 2  (2,4)->(1,3)->0 4: 1 2 1         (1,4)->0 5: 0 2 0  (2,5)->(1,3)->1*                  (2,4)->1*

merge sequences marked asterisk best options when new heuristic applied.

in case verbal explanation necessary:

instead of checking how many merges possible after each possible merge starting set; time check how many merges possible after each possible merge each resulting set after each possible merge starting set. , lookahead 1. lookahead n, you'd seeing long sentence repeating part after each possible merge each resulting set n times.

level 3: let's cut greed

if closely, previous approach has disastrous perfomance moderate inputs , lookaheads(*). inputs beyond 20 triplets beyond 4-merge-lookahead takes unreasonably long. idea here cut out merge paths seem worse existing solution. if want perform lookahead 10, , specific merge path yields less mergables after 3 merges, path after 5 merges, may cut current merge path , try one. should save lot of time , allow large lookaheads closer globally optimal solution, hopefully. haven't implemented 1 testing though.

_{(*): assuming large reduction of input sets possible, number of merges proportional input size, , lookahead approximately indicates how permute merges. have choose lookahead |input|, binomial coefficient lookahead ≪ |input| can approximated o(|input|^lookahead) -- (rightfully) written you thoroughly screwed.}

putting together

i intrigued enough problem sat , coded down in python. sadly, able prove different lookaheads yield possibly different results, , blind algorithm gets better lookahead 1 or 2. direct proof solution not optimal (at least lookahead ≪ |input|). see source code , helper scripts, proof-triplets on github. warned that, apart memoization of merge results, made no attempt @ optimizing code cpu-cycle-wise.

Search This Blog

Cap