Independent Redis Query Aggregation

Preface

This is an article about aggregating independent queries into a single query, so I've elided definitions for how typed primitive data paths are defined in the first place and how to connect them to a Redis monad. For information on this, see my previous article on how to approach this problem.

Motivating example

Imagine a simple data model for Reddit's comment system in Redis:

threadIdToComments :: ThreadId ⟿ RoseTree CommentId
commentIdToComment :: CommentId ⟿ Comment
commentIdToScore :: CommentId ⟿ Integer

The definition of ⟿ isn't important for now; just read a ⟿ b as "a primitive mapping from as to bs in Redis." In other words, there is a runQuery :: a ⟿ b -> a -> Redis b and it should entail only one get command.

Using the above model, I can use a Redis monad to retrieve comment data:


newtype CommentView = CommentView {
    comment :: Comment
  , votes :: Integer
  }

getComments :: ThreadId -> Redis (RoseTree CommentView)
getComments tid = do
  cids <- runQuery threadIdToComments tid
  flip traverse cids $ \i ->
        CommentView
    <$> runQuery commentIdToComment i
    <*> runQuery commentIdToScore i

The above paradigm can model business architecture quite well. All the primitive mappings (and hence, queries) are well-typed.

The problem is the monadic implementation is inefficient. In the above, the first query retrieves a RoseTree CommentId, but then for every CommentId in the tree an additional two queries are constructed! (In practice, it's typically much worse, as other data like "has the user already voted on this post" is also included.) If there are 100 comments in a thread, then a total of 201 queries would be executed here! At best (<*> operates in parallel), there is unnecessary verbosity in our command pipeline, and at worst (<*> is given in terms of >>=) it's performing 199 consecutive operations with each one unnecessarily awaiting the result of the previous!

Fortunately, Redis itself has a solution to this problem: instead of executing many independent get commands, Redis provides mget key1 key2..., which will perform all the lookups at once in a single command. Which brings me to the topic of this post: is there a way to systematically construct mget queries in a type-safe way?

Fantasies of a perfect structure

Imagine a definition for ⟿ that will enable using mget instead of get so that we can aggregate independent queries together. The structure should provide as much functionality as possible without enabling the possibility to introduce computational dependency (i.e., "and then").

Ideally, there are a bunch of combinators I'd like to have on this structure:

aggregatePair :: a ⟿ b -> c ⟿ d -> (a , c) ⟿ (b , d)

Similarly, it would be nice to perform multiple lookups given a single identifier:

aggregate2 :: a ⟿ b -> a ⟿ c -> a ⟿ (b,c)
aggregate3 :: a ⟿ b -> a ⟿ c -> a ⟿ d -> a ⟿ (b,c,d)
...

But in fact, this is a red herring: with an Applicative, we could do the same thing by (,...,) <$> a1 <*> ... <*> aN, allowing us to scale to arbitrary tuple sizes rather than hardcoding each one.

Next, I want to execute a lookup path many independent times across different inputs:

aggregateList :: a ⟿ b -> [ a ] ⟿ [ b ]

But yet again, this feels unprincipled. Why hardcode a List here? Shouldn't any other Traversable do just as well? So ideally, it should be:

aggregateTraversable :: (Traversable t) => a ⟿ b -> t a ⟿ t b

Finally, if ⟿ is an Applicative in the second (covariant) parameter, it must also be a Functor. It seems intuitive that the first parameter should be contravariant, as this wouldn't affect our structure. A Contravariant first parameter and a Covariant second parameter gives a Profunctor.

But in the covariant parameter we have more than just a Functor: we have an Applicative! Perhaps something similar should exist in the contravariant parameter? Well, look at what would be required:

-- Contravariant applicative
divide :: (a -> (b, c)) -> b ⟿ z -> c ⟿ z -> a ⟿ z
conquer :: a ⟿ z

This doesn't really make sense here. For divide, it should take two queries which both yield a z and produce a query which yields a z. While possible, it doesn't seem like there's any principled way to choose which query should execute and which one should be thrown away. But even more problematic is conquer :: a ⟿ z: it means that there should always be a query which yields a z given an a! That doesn't sound right, so the definition of ⟿ need not entail a contravariant analog of Applicative. (Perhaps there is something similar if we allow more liberty in the second parameter, e.g., with aggregatePair above and conquer :: a ⟿ ())

But Profunctor is still on the table, and as it turns out, there is a typeclass of Profunctors which respects Traversable as stated above. Conveniently, there's even a FreeTraversing definition available!

Unfortunately, I can't figure out how to get the FreeTraversing structure to respect the aggregatePair above. If someone knows a way to do this, I'd like to see it!

A general structure with the desired properties

Fortunately, there is still a general construction which will yield all of the above properties. First, I'll define it and demonstrate how it conforms to the above criteria, and then I'll show how to instantiate it for use in the original problem domain - aggregating Redis queries. The definition is inspired by an alternate notion of Traversable: that Traversable entails (1) a toList :: t a -> [a] and (2) fromList :: t a -> [b] -> t b with the law that x = fromList x (toList x).

data T x y a b = T (a -> ([x],[y] -> b)) deriving (Functor)

There is an additional unspecified constraint here that [x] and [y] are the same size, which could be formally specified in a dependently-typed context, but I've opted to simply use List here for simplicity.

deriving (Functor) gives the Functor instance for free. The rest of the work must be done by hand. First, T is a Profunctor:

instance Profunctor (T x y) where
  dimap f g (T a) = T $ \v ->
    let (bs,g') = a (f v) in (bs, g . g')

It's a Traversing (and consequently, a Strong and Choice):


instance Traversing (T x y) where
  traverse' (T a) = T $ \ts ->
    let m = unzip (a <$> toList ts) in
    let (i,bs) = pullInner (fst m) in
    (bs, reifyTraversal ts . zipWith ($) (snd m) . uncross i)

    where
      pullInner :: [[a]] -> (Int,[a])
      pullInner [] = (0,[])
      pullInner xs@(x : _) = (length x , xs >>= id)

      uncross :: Int -> [a] -> [[a]]
      uncross n xs =
        let (as, bs) = splitAt n xs in
        as : uncross n bs

      reifyTraversal :: Traversable t => t a -> [b] -> t b
      reifyTraversal t bs = S.evalState (traverse g t) bs
        where
          g a = do
            (b:bs') <- S.get
            S.put bs'
            return b

instance Strong (T x y) where
  first' = firstTraversing

instance Choice (T x y) where
  left' = leftTraversing

And it's an Applicative:

instance Applicative (T x y a) where
  pure x = T (\_ -> ([], pure x))
  (<*>) (T f) (T x) = T $ \a ->
    let (bs1,ds1) = f a in
    let (bs2,ds2) = x a in
    (bs1 <> bs2, \mbs ->
      let (xs,ys) = splitAt (length bs1) mbs in
      ds1 xs (ds2 ys))

And independent objects with matching x and y can be merged:

aggregatePair :: T x y a b -> T x y c d -> T x y (a,c) (b,d)
aggregatePair (T f) (T g) = T $ \(a,c) ->
  let (m,n) = f a
      (o,p) = g c
  in
  (m <> o , \bs ->
    let (xs,ys) = splitAt (length m) bs in
    (n xs , p ys))

Finally, it reifies into a target Functor (à la Sieve):

runT :: (Functor f) => ([x] -> f [y]) -> T x y a b -> a -> f b
runT i (T f) a = let (xs,ds) = f a in ds <$> i xs

Reification in Redis

Time to move this T out of the abstract and into the original problem domain! mget takes [ ByteString ] and yields [ Maybe ByteString ] (modulo exceptions), so plug in ByteString and Maybe ByteString to define ⟿:

type (⟿) a b = T ByteString (Maybe ByteString) a b

Given this, it's trivial to reify into a Redis monad with mget!

runQuery :: a ⟿ b -> a -> Redis b
runQuery = runT mget'

Solving the original problem

Using the new definition of ⟿ and its available combinators, the original example can now be constructed efficiently!

data ThreadId

data CommentView = CommentView Comment Integer

threadIdToComments :: ThreadId ⟿ RoseTree CommentId
threadIdToComments = undefined

commentIdToScore :: CommentId ⟿ Integer
commentIdToScore = undefined

getComments :: ThreadId -> Redis (RoseTree CommentView)
getComments = runQuery threadIdToComments
  >=> (runQuery . traverse' $
         CommentView <$> commentIdToComment <*> commentIdToScore)

Now, no matter how many comments are in the thread, all the necessary data will be retrieved in exactly two queries!

Update!

Thanks to /u/benjaminhodgson, it turns out there is a much cleaner way to make this T structure and still maintain all the original functionality! The above T x y a b is isomorphic to Traversal a b x y, and if we define T in terms of Traversal from the start, everything comes out much cleaner with no need to slice-and-dice lists internally:


newtype T' x y a b = T' (Traversal a b x y) deriving (Functor)

instance Profunctor (T' x y) where
  dimap f g (T' t) = T' $ \m -> fmap g . (t m . f)

instance Traversing (T' x y) where
  traverse' (T' t) = T' (traverse . t)

instance Applicative (T' x y a) where
  pure x = T' (\_ _ -> pure x)
  (<*>) (T' f) (T' x) = T' $ \g a -> f g a <*> x g a

aggregatePair :: T' x y a b -> T' x y c d -> T' x y (a,c) (b,d)
aggregatePair (T' f) (T' g) = T' $ \h (a,c) ->
  (,) <$> f h a <*> g h c

instance Strong (T' x y) where
  first' = firstTraversing

instance Choice (T' x y) where
  left' = leftTraversing

runQuery' :: T' ByteString (Maybe ByteString) a b -> a -> Redis b
runQuery' (T' f) = unsafePartsOf f mget'