This is an article about aggregating independent queries into a single query, so I've elided definitions for how typed primitive data paths are defined in the first place and how to connect them to a Redis monad. For information on this, see my previous article on how to approach this problem.
Imagine a simple data model for Reddit's comment system in Redis:
threadIdToComments :: ThreadId ⟿ RoseTree CommentId
commentIdToComment :: CommentId ⟿ Comment
commentIdToScore :: CommentId ⟿ Integer
The definition of ⟿
isn't important for now; just read a ⟿ b
as "a primitive mapping from as to bs in Redis." In other words, there is a runQuery :: a ⟿ b -> a -> Redis b
and it should entail only one get
command.
Using the above model, I can use a Redis monad to retrieve comment data:
newtype CommentView = CommentView {
comment :: Comment
, votes :: Integer
}
getComments :: ThreadId -> Redis (RoseTree CommentView)
getComments tid = do
cids <- runQuery threadIdToComments tid
flip traverse cids $ \i ->
CommentView
<$> runQuery commentIdToComment i
<*> runQuery commentIdToScore i
The above paradigm can model business architecture quite well. All the primitive mappings (and hence, queries) are well-typed.
The problem is the monadic implementation is inefficient. In the above, the first query retrieves a RoseTree CommentId
, but then for every CommentId
in the tree an additional two queries are constructed! (In practice, it's typically much worse, as other data like "has the user already voted on this post" is also included.) If there are 100 comments in a thread, then a total of 201 queries would be executed here! At best (<*>
operates in parallel), there is unnecessary verbosity in our command pipeline, and at worst (<*>
is given in terms of >>=
) it's performing 199 consecutive operations with each one unnecessarily awaiting the result of the previous!
Fortunately, Redis itself has a solution to this problem: instead of executing many independent get
commands, Redis provides mget key1 key2...
, which will perform all the lookups at once in a single command. Which brings me to the topic of this post: is there a way to systematically construct mget
queries in a type-safe way?
Imagine a definition for ⟿
that will enable using mget
instead of get
so that we can aggregate independent queries together. The structure should provide as much functionality as possible without enabling the possibility to introduce computational dependency (i.e., "and then").
Ideally, there are a bunch of combinators I'd like to have on this structure:
aggregatePair :: a ⟿ b -> c ⟿ d -> (a , c) ⟿ (b , d)
Similarly, it would be nice to perform multiple lookups given a single identifier:
aggregate2 :: a ⟿ b -> a ⟿ c -> a ⟿ (b,c)
aggregate3 :: a ⟿ b -> a ⟿ c -> a ⟿ d -> a ⟿ (b,c,d)
...
But in fact, this is a red herring: with an Applicative
, we could do the same thing by (,...,) <$> a1 <*> ... <*> aN
, allowing us to scale to arbitrary tuple sizes rather than hardcoding each one.
Next, I want to execute a lookup path many independent times across different inputs:
aggregateList :: a ⟿ b -> [ a ] ⟿ [ b ]
But yet again, this feels unprincipled. Why hardcode a List
here? Shouldn't any other Traversable
do just as well? So ideally, it should be:
aggregateTraversable :: (Traversable t) => a ⟿ b -> t a ⟿ t b
Finally, if ⟿
is an Applicative
in the second (covariant) parameter, it must also be a Functor
. It seems intuitive that the first parameter should be contravariant, as this wouldn't affect our structure. A Contravariant
first parameter and a Covariant
second parameter gives a Profunctor
.
But in the covariant parameter we have more than just a Functor
: we have an Applicative
! Perhaps something similar should exist in the contravariant parameter? Well, look at what would be required:
-- Contravariant applicative
divide :: (a -> (b, c)) -> b ⟿ z -> c ⟿ z -> a ⟿ z
conquer :: a ⟿ z
This doesn't really make sense here. For divide
, it should take two queries which both yield a z
and produce a query which yields a z
. While possible, it doesn't seem like there's any principled way to choose which query should execute and which one should be thrown away. But even more problematic is conquer :: a ⟿ z
: it means that there should always be a query which yields a z
given an a
! That doesn't sound right, so the definition of ⟿
need not entail a contravariant analog of Applicative
. (Perhaps there is something similar if we allow more liberty in the second parameter, e.g., with aggregatePair
above and conquer :: a ⟿ ()
)
But Profunctor
is still on the table, and as it turns out, there is a typeclass of Profunctor
s which respects Traversable as stated above. Conveniently, there's even a FreeTraversing definition available!
Unfortunately, I can't figure out how to get the FreeTraversing
structure to respect the aggregatePair
above. If someone knows a way to do this, I'd like to see it!
Fortunately, there is still a general construction which will yield all of the above properties. First, I'll define it and demonstrate how it conforms to the above criteria, and then I'll show how to instantiate it for use in the original problem domain - aggregating Redis queries. The definition is inspired by an alternate notion of Traversable
: that Traversable
entails (1) a toList :: t a -> [a]
and (2) fromList :: t a -> [b] -> t b
with the law that x = fromList x (toList x)
.
data T x y a b = T (a -> ([x],[y] -> b)) deriving (Functor)
There is an additional unspecified constraint here that [x]
and [y]
are the same size, which could be formally specified in a dependently-typed context, but I've opted to simply use List
here for simplicity.
deriving (Functor)
gives the Functor
instance for free. The rest of the work must be done by hand. First, T
is a Profunctor
:
instance Profunctor (T x y) where
dimap f g (T a) = T $ \v ->
let (bs,g') = a (f v) in (bs, g . g')
It's a Traversing
(and consequently, a Strong
and Choice
):
instance Traversing (T x y) where
traverse' (T a) = T $ \ts ->
let m = unzip (a <$> toList ts) in
let (i,bs) = pullInner (fst m) in
(bs, reifyTraversal ts . zipWith ($) (snd m) . uncross i)
where
pullInner :: [[a]] -> (Int,[a])
pullInner [] = (0,[])
pullInner xs@(x : _) = (length x , xs >>= id)
uncross :: Int -> [a] -> [[a]]
uncross n xs =
let (as, bs) = splitAt n xs in
as : uncross n bs
reifyTraversal :: Traversable t => t a -> [b] -> t b
reifyTraversal t bs = S.evalState (traverse g t) bs
where
g a = do
(b:bs') <- S.get
S.put bs'
return b
instance Strong (T x y) where
first' = firstTraversing
instance Choice (T x y) where
left' = leftTraversing
And it's an Applicative
:
instance Applicative (T x y a) where
pure x = T (\_ -> ([], pure x))
(<*>) (T f) (T x) = T $ \a ->
let (bs1,ds1) = f a in
let (bs2,ds2) = x a in
(bs1 <> bs2, \mbs ->
let (xs,ys) = splitAt (length bs1) mbs in
ds1 xs (ds2 ys))
And independent objects with matching x
and y
can be merged:
aggregatePair :: T x y a b -> T x y c d -> T x y (a,c) (b,d)
aggregatePair (T f) (T g) = T $ \(a,c) ->
let (m,n) = f a
(o,p) = g c
in
(m <> o , \bs ->
let (xs,ys) = splitAt (length m) bs in
(n xs , p ys))
Finally, it reifies into a target Functor
(à la Sieve):
runT :: (Functor f) => ([x] -> f [y]) -> T x y a b -> a -> f b
runT i (T f) a = let (xs,ds) = f a in ds <$> i xs
Time to move this T
out of the abstract and into the original problem domain! mget
takes [ ByteString ]
and yields [ Maybe ByteString ]
(modulo exceptions), so plug in ByteString
and Maybe ByteString
to define ⟿
:
type (⟿) a b = T ByteString (Maybe ByteString) a b
Given this, it's trivial to reify into a Redis
monad with mget
!
runQuery :: a ⟿ b -> a -> Redis b
runQuery = runT mget'
Using the new definition of ⟿
and its available combinators, the original example can now be constructed efficiently!
data ThreadId
data CommentView = CommentView Comment Integer
threadIdToComments :: ThreadId ⟿ RoseTree CommentId
threadIdToComments = undefined
commentIdToScore :: CommentId ⟿ Integer
commentIdToScore = undefined
getComments :: ThreadId -> Redis (RoseTree CommentView)
getComments = runQuery threadIdToComments
>=> (runQuery . traverse' $
CommentView <$> commentIdToComment <*> commentIdToScore)
Now, no matter how many comments are in the thread, all the necessary data will be retrieved in exactly two queries!
Thanks to /u/benjaminhodgson, it turns out there is a much cleaner way to make this T
structure and still maintain all the original functionality! The above T x y a b
is isomorphic to Traversal a b x y
, and if we define T
in terms of Traversal
from the start, everything comes out much cleaner with no need to slice-and-dice lists internally:
newtype T' x y a b = T' (Traversal a b x y) deriving (Functor)
instance Profunctor (T' x y) where
dimap f g (T' t) = T' $ \m -> fmap g . (t m . f)
instance Traversing (T' x y) where
traverse' (T' t) = T' (traverse . t)
instance Applicative (T' x y a) where
pure x = T' (\_ _ -> pure x)
(<*>) (T' f) (T' x) = T' $ \g a -> f g a <*> x g a
aggregatePair :: T' x y a b -> T' x y c d -> T' x y (a,c) (b,d)
aggregatePair (T' f) (T' g) = T' $ \h (a,c) ->
(,) <$> f h a <*> g h c
instance Strong (T' x y) where
first' = firstTraversing
instance Choice (T' x y) where
left' = leftTraversing
runQuery' :: T' ByteString (Maybe ByteString) a b -> a -> Redis b
runQuery' (T' f) = unsafePartsOf f mget'