Advanced Combinatorial Derivatives
Advanced Combinatorial Derivatives
Michael Hardy
                                      School of Mathematics
                      University of Minnesota, Minneapolis, MN 55455, USA
                                      hardy@math.umn.edu
            Submitted: Oct 1, 2005; Accepted: Dec 24, 2005; Published: Jan 7, 2006
               Mathematics Subject Classifications: 05A15, 05A18, 11B73, 05-02
                                                Abstract
          The natural forms of the Leibniz rule for the kth derivative of a product and
      of Faà di Bruno’s formula for the kth derivative of a composition involve the dif-
      ferential operator ∂ k /∂x1 · · · ∂xk rather than dk /dxk , with no assumptions about
      whether the variables x1 , . . . , xk are all distinct, or all identical, or partitioned into
      several distinguishable classes of indistinguishable variables. Coefficients appearing
      in forms of these identities in which some variables are indistinguishable are just
      multiplicities of indistinguishable terms (in particular, if all variables are distinct
      then all coefficients are 1). The computation of the multiplicities in this gener-
      alization of Faà di Bruno’s formula is a combinatorial enumeration problem that,
      although completely elementary, seems to have been neglected. We apply the results
      to cumulants of probability distributions.
1     Introduction
Both the well-known Leibniz rule
                                              Xk   `
                                    dk            k d u dk−` v
                                       (uv) =          ·                                              (1)
                                   dxk        `=0
                                                  ` dx` dxk−`
(where the sum is over all k-tuples (m1 , . . . , mk ) of non-negative integers satisfying the
constraint m1 + 2m2 + 3m3 + · · · + kmk = k) are formulas for kth derivatives of functions
of functions of x. That is what the left sides of these identities share in common. The
                                                     ∂k
                                                 ∂x1 · · · ∂xk
instead of dk /dxk . In that form, all coefficients on the right sides are 1.
   2. There should be no assumptions about whether the variables x1 , . . . , xk are all dis-
      tinct, or all identical, or partitioned into several distinguishable classes of indis-
      tinguishable variables. When some variables become indistinguishable, so do some
      of the terms on the right sides of the identities. Indistinguishable terms then get
      collected, so that each constant coefficient of a term on the right side is that term’s
      multiplicity. When all of the variables are indistinguishable, then the multiplicities
      are the coefficients in (1) and (2) above.
                                   ∂y    ∂2y    ∂y   ∂2y
                                      ·       +    ·
                                   ∂x2 ∂x1 ∂x3 ∂x3 ∂x1 ∂x2
collapses to the term
                                               ∂y    ∂2y
                                          2·      ·
                                               ∂x2 ∂x1 ∂x2
with multiplicity 2.
    The chain rule and the product rule are enough to entail that the coefficients must be
positive integers. But without Point 1 and Point 2, it is not obvious what, if anything,
they enumerate.
    Two papers, Constantine and Savits [5] and Leipnik and Reid [9], give an identity ex-
pressing (∂ k1 +···+kn /∂xk11 · · · ∂xknn )f (y) as a linear combination of products of derivatives
of f (y) with respect to y and of y with respect to the independent variables. But neither
of those sources mentions that as more and more variables become indistinguishable the
    Example 1.
                          ∂3                         ∂3y
                                 f (y) = f 0 (y)
                     ∂x1 ∂x2 ∂x3                 ∂x1 ∂x2 ∂x3
                                         |         {z      }
                                                     1 block; 1st derivative of f.
                                                                   
                       00   ∂y    ∂2y     ∂y    ∂2y     ∂y    ∂2y
                 +    f (y)    ·        +     ·       +    ·
                            ∂x1 ∂x2 ∂x3   ∂x2 ∂x1 ∂x3   ∂x3 ∂x1 ∂x2
                      |                     {z                      }
                                      2 blocks in each partition; 2nd derivative of f.
                                ∂y ∂y ∂y
                 +    f 000 (y)     ·  ·    .
                                ∂x1 ∂x2 ∂x3
                      |            {z     }
                      3 blocks; 3rd derivative of f.
    1
     Perhaps as a result of studying set theory, I was surprised when I learned that some respectable
combinatorialists consider such things as this to be mere convention. One of them even said a case could
be made for setting the number of partitions to 0 when n = 0. By stark contrast, Gian-Carlo Rota wrote
in [13], p. 15, that “the kind of mathematical reasoning that physicists find unbearably pedantic” leads
not only to the conclusion that the elementary symmetric function in no variables is 1, but straight from
there to the theory of the Euler characteristic, so that “such reasoning does pay off.” The only other
really sexy example I know is from applied statistics: the non-central chi-square distribution with zero
degrees of freedom, unlike its “central” counterpart, is non-trivial.
                         { x1 , . . . , x1 , . . . , xn , . . . , xn } + { x1 , . . . , x1 , . . . , xn , . . . , xn }
                           | {z }                    | {z }                | {z }                    | {z }
                                  `1                     `n                       m1                     mn
                    = { x1 , . . . , x1 , . . . , xn , . . . , xn } .
                        | {z }                    | {z }
                               `1 +m1                 `n +mn
Only when sets are disjoint is their sum the same as their union.
between multisets and partial differential operators, Proposition 2 entails that the expan-
sion of the partial derivative
                                                              ∂8
                            ∂{ 1,1,1,1,5,5,7,8 } f (y) = 4 2            f (y)                 (7)
                                                        ∂x1 ∂x5 ∂x7 ∂x8
contains (among many others) this term:
                                                                          2
                    000
                                       2                000      ∂3y                 ∂2y
                 6f (y) ∂{ 1,1,5 } y        ∂{ 7,8 } y = 6f (y)                 ·             (8)
                                                                ∂x21 ∂x5            ∂x7 ∂x8
(the order of the derivative of f is 3 because that is how many blocks are in this partition).
    An extreme case of “collapsing” is exemplified by the question: How many partitions
of the set { 1, 2, 3, 4, 5, 6, 7, 8 } collapse to the partition 3 + 3 + 2 of the number 8 when
all 8 members of the set { 1, 2, 3, 4, 5, 6, 7, 8 } collapse into indistinguishability? Again, a
simple exercise shows that the answer is 280. In this extreme case where all members of
the set become indistinguishable and partitions of the multiset become partitions of an
integer, the answer to the problem of enumeration of collapsing partitions is well known
to be given by the coefficients in Faà di Bruno’s formula – in this case by the coefficient
of                                                3 2 2
                                            000    dy    d y
                                           f (y)     3
                                                   dx    dx2
                        ∂ k1 +···+kn               X
             ∂τ f (y) = k1              f (y) =             Mf (•) (y) · (∂τ1 y · ∂τ2 y · · · · ) ,
                       ∂x1 · · · ∂xnk n
                                                τ1 +τ2 +···
where the sum is over all partitions τ1 +τ2 +· · · of the multiset τ , the order of the derivative
f (•) (y) is the number of terms in the partition τ1 + τ2 + · · · , and the multiplicity M is the
number of partitions of the set { 1, 2, 3, . . . , k1 + · · · + kn } that collapse to the partition
τ1 + τ2 + · · · of the multiset τ when the set { 1, 2, 3, . . . , k1 + · · · + kn } collapses to the
multiset τ .
     In order to use it in the next result, we introduce a convention:
     Notational convention. For any multiset σ let σ!! denote the product of the facto-
rials of the multiplicities of the members of σ. For example, { 1, 1, 1, 1, 2, 2, 2 }!! = 4!3! =
144.
     The next result will be proved in Section 5:
     Proposition 4.
     Let τ = { 1, . . . , 1, . . . , n, . . . , n }. Consider a partition
                 | {z }              | {z }
                   k1               kn
                               τ = τ1 + · · · + τ1 + τ2 + · · · + τ2 + · · ·
                                   |     {z      } |       {z      }
                                            m1                   m2
in which τ1 , τ2 , . . . are all distinct (so that m1 , m2 , m3 , . . . are multiplicities of yet another
sort). Denote this by
                                        τ = m1 τ1 + m2 τ2 + · · · .
Then the number of partitions of the set { 1, 2, 3, . . . , k1 + · · · + kn } that collapse to the
partition m1 τ1 +m2 τ2 +m3 τ3 +· · · of the multiset τ when the set { 1, 2, 3, . . . , k1 + · · · + kn }
collapses to the multiset τ is
                                                  k1 ! · · · kn !
                                                                                  .                    (9)
                                 τ1 !! 1 τ2 !! 2 τ3 !!m3 · · · m1 !m2 !m3 ! · · ·
                                      m       m
                                dk         X             Y  d |B|
                                               (|π|)
                                   f (y) =   f       (y)            y,                                (10)
                               dxk         π             B∈π
                                                             dx
5       Proof of Proposition 4
Imagine k1 + · · · + kn Scrabble tiles. On the first k1 of these, the number “1” is written;
on the next k2 of them, “2” appears; and so on. These are partitioned into m1 copies of
the multiset τ1 , m2 copies of τ2 , and so on.
   Permuting the k1 “1”s or the k2 “2”s, etc., does not alter the set of m1 + · · · + mn
“words”. Thus there are k1 ! · · · kn ! permutations of the k1 + · · · + kn tiles representing the
partition
                                    τ = m1 τ1 + m2 τ2 + · · · .
Permuting the identical elements within any block of this partition does not alter which
partition of τ we have, and therefore the product k1 ! · · · kn ! gets divided by τ1 !!m1 τ2 !!m2 · · · .
Neither does permuting the mi blocks identical to τi , and therefore it also gets divided by
m1 ! · · · mn !. 
    2
    An anonymous referee suggests that that pontiff may have been influenced more by Faà di Bruno’s
charitable than mathematical work.
In the most extreme case, all of the independent variables become indistinguishable, and
we have the familiar Leibniz rule (1). The proofs of Propositions 5 and 6 are left as
exercises.
    As far as the present writer knows, the second part of Proposition 5 is new. It identifies
the easy combinatorial problem that Proposition 6 solves. Proposition 6 can be found in
both [5] and [4], p. 131, but without the combinatorial interpretation of the coefficients.
The first part of Proposition 5 is an important special case of Proposition 6, and we do
not know of any earlier explicit statement of it than that in the present paper (a referee
says “it could be just about anywhere” and I have not succeeded in proving otherwise).
log E (exp(t1 X1 + · · · + tn Xn ))
(at least in the case in which all moments exist). More tersely stated, the joint cumulant-
generating function is the logarithm of the P joint moment-generating                   function. Since the
                                                                   i1       in
coefficients of a power series of the form i1 ,...,in ci1 ,...,in t1 · · · tn /(i1 ! · · · in !) are its partial
derivatives at t1 = · · · = tn = 0, Speed’s topic is a special case of ours.
8      Acknowledgments
Ezra Miller and Jay Goldman offered helpful suggestions. Unlike some copyeditors, Ellen
Stuttle found the actual content interesting. Alex Craik pointed out some relevant refer-
ences and reminded me of Arbogast’s anticipation of Faà di Bruno’s formula. In addition
to some encouraging words, Doron Zeilberger pointed out his paper [19]; which is even
more “discretely” oriented than the present paper (which, the reader will have noticed,
disdains even to mention such “analytical” things as differentiability).
References
 [1] L.F.A. Arbogast, Du Calcul des Dérivations, Levrault, Strasbourg, 1800.
 [2] F. Faà di Bruno, Sullo Sviluppo delle Funzioni, Annali di Scienze Matematiche e
     Fisiche 6, 1855 pp. 479-480.
 [3] F. Faà di Bruno, Note sur un nouvelle formule de calcul différentiel, Quarterly Journal
     of Pure and Applied Mathematics, 1, 1857, pp. 359-360.
 [4] L. Comtet, Advanced Combinatorics: The Art of Finite and Infinite Expansions,
     Reidel, Dordrecht-Holland/Boston, 1974.
 [5] G.M. Constantine and T.H. Savits, A Multivariate Faa di Bruno Formula with Appli-
     cations, Transactions of the American Mathematical Society, 348, 1996 pp. 503-520.
 [7] R.A. Fisher and J. Wishart, The derivation of the pattern formulae of two-way parti-
     tions from those of simpler patterns, Proceedings of the London Mathematical Society,
     Series 2, 33, 1931, pp. 195-208.
 [8] W.P. Johnson, The Curious History of Faà di Bruno’s Formula, Am. Math. Monthly
     109, 2002, pp. 217-227.
 [9] R. Leipnik and T. Reid, Multivariable Faa di Bruno Formulas, Electronic Proceedings
     of the Ninth Annual International Conference on Technology in Collegiate Mathemat-
     ics, http://archives.math.utk.edu/ICTCM/EP-9.html#C23 .
[10] R. Mishkov, Generalization of the Formula of Faa di Bruno for a Composite Function
     with a Vector Argument, International Journal of Mathematical Sciences, 24, 2000,
     pp. 481-491.
[12] G-C. Rota, The Number of Partitions of a Set, Am. Math. Monthly 71, 1964, pp. 498-
     504.
[13] C-C. Rota, Geometric Probability, Mathematical Intelligencer, 20 (4), 1998, pp. 11-
     16.
[14] T.P. Speed, Cumulants and Partition Lattices, Australian Journal of Statistics, 25
     (2), 1983, pp. 378-388.
[16] T.N. Thiele, Forlæsinger over Almindelig Iagttagelselære, Reitzel, Copenhagen, 1889.
[17] T.N. Thiele, Theory of Observations, Layton, London, 1903. Reprinted in Annals of
     Mathematical Statistics, 2, 1931, pp. 165-308.
[18] W.C. Yang, Derivatives are Essentially Integer Partitions, Discrete Mathematics,
     222, 2000, pp. 235-245.