Commit c2d5af44 authored by Keno Goertz

Switch to Wallenius' instead of Fisher's distribution

parent 886ee8f7
......@@ -5,6 +5,9 @@ all: figures
pdflatex -jobname=thesis main.tex
rm -f *.bbl *.blg *.log *.aux *.toc *.out *.bcf *.run.xml
clean:
rm -f *.bbl *.blg *.log *.aux *.toc *.out *.bcf *.run.xml
figures:
cd figures; . venv/bin/activate; python3 generate_figures.py
......
@article{Fog2008Sampling,
@article{Fog2008Calculation,
author = {Agner Fog},
title = {Sampling Methods for Wallenius' and Fisher's Noncentral Hypergeometric Distributions},
title = {Calculation Methods for Wallenius' Noncentral Hypergeometric Distribution},
journal = {Communications in Statistics - Simulation and Computation},
volume = {37},
number = {2},
pages = {241--257},
pages = {258--273},
year = {2008},
publisher = {Taylor \& Francis},
doi = {10.1080/03610910701790236},
URL = {https://doi.org/10.1080/03610910701790236},
eprint = {https://doi.org/10.1080/03610910701790236}
doi = {10.1080/03610910701790269},
URL = {https://doi.org/10.1080/03610910701790269},
eprint = {https://doi.org/10.1080/03610910701790269}
}
@BOOK{Forbes2010Statistical,
......
......@@ -51,98 +51,124 @@ It should be emphasized that a witness is required to keep a record of time-stam
Going back to our example of time-stamps published in a newspaper, $N$ does \emph{not} correspond to the number of copies printed.
Instead, $N$ refers to the number of places that keep archives of the newspaper.
We assume that there exist a number $E$ of malicious witnesses that collude together with the TSA in an attempt to backdate time-stamps.
We assume that a number $K$ of these witnesses are malicious and collude with the TSA in an attempt to backdate time-stamps.
Finally, a client consults a number $n$ of witnesses to verify a time-stamp.
The client only accepts the time-stamp if all $n$ selected witnesses confirm its existence at the given time.
Let $e$ be the number of maliciously colluding witnesses selected by the client.
Evidently, a successful backdating attack occurs when the client selects only colluding witnesses, so when $e=n$.
Let $k$ be the number of maliciously colluding witnesses selected by the client.
Evidently, a backdating attack succeeds exactly when the client selects only colluding witnesses, i.e., when $k=n$.
Let us now further assume that the client selects its $n$ witnesses uniformly at random from all $N$ witnesses.
Our problem is then equivalent to the urn problem of drawing $n$ balls ``without replacement'' from an urn of $N$ balls, $K$ of which are marked.
$e$ thus follows the hypergeometric distribution. \footfullcite[pp.~117-119]{Forbes2010Statistical}
$k$ thus follows the hypergeometric distribution\footfullcite[pp.~117--119]{Forbes2010Statistical} with the probability mass function:
\begin{equation}
\left. P(e=k)=\binom{E}{k}\binom{N-E}{n-k} \middle/ \binom{N}{n}\right.
\left. \mathrm{hypg}(k; n, K, N)=\binom{K}{k}\binom{N-K}{n-k} \middle/ \binom{N}{n}\right.
\end{equation}
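As a quick sanity check of this probability mass function, the following minimal Python sketch evaluates $\mathrm{hypg}(k; n, K, N)$ directly with `math.comb` and sums it over all possible $k$. The function name `hypg` is hypothetical and only mirrors the notation above; the figure script itself relies on `scipy.stats.hypergeom` instead.

```python
from math import comb


def hypg(k: int, n: int, K: int, N: int) -> float:
    """Hypothetical helper: hypergeometric PMF for 0 <= k <= n.

    Probability that exactly k of the n witnesses drawn without replacement
    from all N witnesses belong to the K colluding ones.
    """
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so infeasible k (k > K or n - k > N - K) get probability 0.
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


if __name__ == "__main__":
    N, K, n = 30, 10, 8
    pmf = [hypg(k, n, K, N) for k in range(n + 1)]
    print(f"sum of PMF = {sum(pmf):.12f}")
    print(f"P(k=n)     = {pmf[n]:.3e}")
```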
The probability of a successful backdating attack is then given by the equation:
\begin{equation}
\left. P(e=n)=\binom{E}{n} \middle/ \binom{N}{n}\right.
\left. P(k=n)=\mathrm{hypg}(n; n, K, N)=\binom{K}{n} \middle/ \binom{N}{n}\right.
\end{equation}
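The closed form is easy to tabulate. This sketch uses a hypothetical helper `p_backdate_hypg` and exact rational arithmetic (`fractions.Fraction`), so the limiting value 1 at $K=N$ comes out exactly; it prints one row per value of $n$ for $N=30$, in the spirit of the figure.

```python
from fractions import Fraction
from math import comb


def p_backdate_hypg(K: int, n: int, N: int) -> Fraction:
    """Hypothetical helper: P(k = n) = C(K, n) / C(N, n), i.e. all n consulted witnesses collude."""
    return Fraction(comb(K, n), comb(N, n))


N = 30
for n in (1, 2, 4, 8):
    # One row per curve: probability for K = 0, 5, ..., 30 colluding witnesses.
    row = [float(p_backdate_hypg(K, n, N)) for K in range(0, N + 1, 5)]
    print(f"n={n}: " + " ".join(f"{p:.4f}" for p in row))
```

Since $\binom{K}{n}/\binom{N}{n}=\prod_{\nu=0}^{n-1}(K-\nu)/(N-\nu)$, each additional consulted witness multiplies the probability by a factor of at most 1, which is why the curves for larger $n$ lie lower.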
Figure~\ref{figure::backdating_probability_hypergeometric} graphs this probability as a function of $E$ for different values of $n$.
\begin{figure}
\includegraphics{figures/backdating_probability_hypergeometric.png}
\caption{\label{figure::backdating_probability_hypergeometric}
Probability of a successful backdating attack according to the hypergeometric distribution.
$N=30$ witnesses keep records of the time-stamps issued by the TSA.
Of these witnesses, a number $E$ (plotted on the x-axis) maliciously collude with the TSA in order to backdate time-stamps.
Of these witnesses, a number $K$ (plotted on the x-axis) maliciously collude with the TSA in order to backdate time-stamps.
To check a time-stamp's validity, a client consults $n$ randomly selected witnesses.
The backdating attack is successful if all $n$ selected witnesses are malicious.
As expected, the probability of a successful backdating attack increases with an increasing number of colluding witnesses $E$, reaching 1 when $N=E$.
As expected, the probability of a successful backdating attack grows with the number of colluding witnesses $K$ and reaches 1 when $K=N$.
The client can reduce the probability of a successful backdating attack by consulting more witnesses, as the different curves show.
}
\end{figure}
Figure~\ref{figure::backdating_probability_hypergeometric} graphs this probability as a function of $K$ for different values of $n$.
In practice, the selection of witnesses may not be truly random.
Sticking to our example of newspaper archives, a client will likely prefer libraries which are geographically close to them.
A network protocol for distributed trust may also favor witnesses with small round-trip times in order to increase performance.
An attacker may be able to leverage this by placing colluding witnesses at favorable locations.
We can model this by introducing a weight parameter $\omega$, where a malicious witness is $\omega$ times more likely to be selected than an honest witness.
$e$ then follows Fisher's noncentral hypergeomtric distribution. \footfullcite{Fog2008Sampling}
$k$ then follows a noncentral hypergeometric distribution.
Two distinct noncentral hypergeometric distributions exist in the literature.
They are frequently confused because their difference is subtle and both are often referred to as ``the'' noncentral hypergeometric distribution.\footfullcite{Fog2008Calculation}
Fisher's noncentral hypergeometric distribution models the case where each ball is taken independently of the others, as if all balls were drawn from the urn at once.
The sample size $n$ is then not known in advance, and the distribution describes the outcome conditional on the $n$ that is eventually observed.
Wallenius' noncentral hypergeometric distribution, on the other hand, models balls drawn from the urn one at a time, so that each draw depends on the balls that remain, for a number $n$ of draws fixed in advance.\footnote{For a detailed discussion of the distinction between Wallenius' and Fisher's noncentral hypergeometric distributions, see: \fullcite{Fog2008Calculation}}
As the client in our model determines the number $n$ of witnesses to consult in advance, $k$ follows Wallenius' noncentral hypergeometric distribution.
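To make the distinction concrete, the sketch below compares $P(k=n)$ under both noncentral distributions for one parameter set. Fisher's probability mass function is taken in its normalized form $\binom{K}{k}\binom{N-K}{n-k}\omega^k/\sum_{k'}\binom{K}{k'}\binom{N-K}{n-k'}\omega^{k'}$, while Wallenius' is computed exactly by following the $n$ sequential draws. Both helper names are hypothetical and chosen for illustration only.

```python
from math import comb


def fisher_pmf(k, n, K, N, omega):
    """Hypothetical helper: Fisher's noncentral hypergeometric PMF (0 <= k <= n)."""
    def weight(j):
        return comb(K, j) * comb(N - K, n - j) * omega**j

    support = range(max(0, n - (N - K)), min(n, K) + 1)
    return weight(k) / sum(weight(j) for j in support)


def wallenius_pmf(k, n, K, N, omega):
    """Hypothetical helper: Wallenius' PMF, computed by following the n sequential draws."""
    dist = {0: 1.0}  # maps "malicious witnesses selected so far" to its probability
    for nu in range(n):
        nxt = {}
        for j, p in dist.items():
            w_mal = (K - j) * omega     # weight of malicious witnesses not yet selected
            w_hon = (N - K) - (nu - j)  # weight of honest witnesses not yet selected
            total = w_mal + w_hon
            if w_mal > 0:
                nxt[j + 1] = nxt.get(j + 1, 0.0) + p * w_mal / total
            if w_hon > 0:
                nxt[j] = nxt.get(j, 0.0) + p * w_hon / total
        dist = nxt
    return dist.get(k, 0.0)


N, K, n, omega = 30, 10, 8, 10
print(f"Fisher:    P(k=n) = {fisher_pmf(n, n, K, N, omega):.4f}")
print(f"Wallenius: P(k=n) = {wallenius_pmf(n, n, K, N, omega):.4f}")
```

For $\omega=1$ both functions reduce to the ordinary hypergeometric distribution; for $\omega\neq 1$ they generally differ, so the choice of model affects the computed attack probability.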
The client selects witnesses in rounds, one witness per round.
Let $k_\nu$ be the number of malicious witnesses selected after round $\nu$ has been completed.
The probability of selecting a malicious witness in round $\nu+1$ is the malicious witnesses' share of the total weight of all witnesses not yet selected:
\begin{equation}
p_{\nu+1}=\frac{(K-k_\nu)\omega}{(K-k_\nu)\omega+(N-K)-(\nu-k_\nu)}
\end{equation}
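The round-by-round model can also be simulated directly. The following Monte Carlo sketch draws one witness per round and picks a malicious one with probability $(K-k_\nu)\omega/((K-k_\nu)\omega+(N-K)-(\nu-k_\nu))$, i.e., the malicious share of the weight still in the urn, then estimates the probability that all $n$ selected witnesses collude. The function name, the fixed seed and the number of trials are arbitrary choices for illustration.

```python
import random


def simulate_all_malicious(N, K, n, omega, trials=100_000, seed=1):
    """Hypothetical helper: estimate P(k = n) by simulating the weighted rounds."""
    rng = random.Random(seed)  # fixed seed for reproducible estimates
    hits = 0
    for _ in range(trials):
        k = 0  # malicious witnesses selected so far (k_nu)
        for nu in range(n):
            w_mal = (K - k) * omega     # weight of remaining malicious witnesses
            w_hon = (N - K) - (nu - k)  # weight of remaining honest witnesses
            if rng.random() < w_mal / (w_mal + w_hon):
                k += 1
        if k == n:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(simulate_all_malicious(N=30, K=15, n=4, omega=10))
```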
The probability mass function for $k$ after selecting all $n$ witnesses is:
\begin{align}
e_{\mathrm{min}}&=\max(0, n+E-N)\\
e_{\mathrm{max}}&=\min(n, E)\\
P(e=k)&=\left. \binom{E}{k}\binom{N-E}{n-k}\omega^k \middle/ \sum_{k'=e_{\mathrm{min}}}^{e_{\mathrm{max}}} \binom{E}{k'}\binom{N-E}{n-k'}\omega^{k'} \right.
\mathrm{wnchypg}(k;n,K,N,\omega)&=\binom{K}{k}\binom{N-K}{n-k}\cdot\int_0^1\left(1-t^{\omega/d}\right)^k\left(1-t^{1/d}\right)^{n-k}\mathop{dt}\\
d&=(K-k)\omega+(N-K)-(n-k)
\end{align}
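The integral in $\mathrm{wnchypg}$ has no closed form in general but is easy to evaluate numerically, with $d=(K-k)\omega+(N-K)-(n-k)$ being the total weight of the witnesses left unselected. The sketch below substitutes $t=e^{-x}$, which turns the integrand into the smooth function $(1-e^{-\omega x/d})^k(1-e^{-x/d})^{n-k}e^{-x}$ on $[0,\infty)$, truncates at $x=60$ (the integrand is bounded by $e^{-x}$) and applies Simpson's rule. It then cross-checks the result against an exact round-by-round computation. The truncation point, step count and helper names are assumptions for illustration, not values from Fog's paper.

```python
from math import comb, exp


def wnchypg_integral(k, n, K, N, omega, x_max=60.0, steps=6000):
    """Hypothetical helper: Wallenius' PMF via the integral, for 0 <= k <= n < N."""
    if k > K or n - k > N - K:
        return 0.0
    d = (K - k) * omega + (N - K) - (n - k)
    a, b = omega / d, 1.0 / d

    def f(x):
        # Integrand after substituting t = exp(-x), dt = -exp(-x) dx.
        return (1 - exp(-a * x)) ** k * (1 - exp(-b * x)) ** (n - k) * exp(-x)

    h = x_max / steps  # composite Simpson's rule; steps must be even
    s = f(0.0) + f(x_max)
    s += 4 * sum(f((2 * i - 1) * h) for i in range(1, steps // 2 + 1))
    s += 2 * sum(f(2 * i * h) for i in range(1, steps // 2))
    return comb(K, k) * comb(N - K, n - k) * s * h / 3


def wnchypg_rounds(k, n, K, N, omega):
    """Hypothetical helper: exact PMF by propagating the distribution of k_nu."""
    dist = [1.0] + [0.0] * n  # dist[j] = P(k_nu = j); initially k_0 = 0
    for nu in range(n):
        new = [0.0] * (n + 1)
        for j in range(nu + 1):
            if dist[j] == 0.0:
                continue
            w_mal = (K - j) * omega
            w_hon = (N - K) - (nu - j)
            p_mal = w_mal / (w_mal + w_hon)
            new[j + 1] += dist[j] * p_mal
            new[j] += dist[j] * (1.0 - p_mal)
        dist = new
    return dist[k]


if __name__ == "__main__":
    N, K, n, omega = 30, 10, 8, 10
    for k in range(n + 1):
        print(k, f"{wnchypg_integral(k, n, K, N, omega):.6f}",
              f"{wnchypg_rounds(k, n, K, N, omega):.6f}")
```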
With the probability of a successful backdating attack being:
The probability of a successful backdating attack is then:
\begin{equation}
P(e=n)=\left. \binom{E}{n}\omega^n \middle/ \sum_{k'=e_{\mathrm{min}}}^{e_{\mathrm{max}}} \binom{E}{k'}\binom{N-E}{n-k'}\omega^{k'} \right.
P(k=n)=\mathrm{wnchypg}(n;n,K,N,\omega)=\binom{K}{n}\cdot\int_0^1\left(1-t^{\omega/((K-n)\omega+N-K)}\right)^n\mathop{dt}
\end{equation}
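For the special case $k=n$, the round-by-round selection also yields a product form that needs no integral. If all previous rounds selected malicious witnesses, then $k_\nu=\nu$, so round $\nu+1$ sees $K-\nu$ malicious witnesses of weight $\omega$ and all $N-K$ honest witnesses of weight 1. Multiplying the per-round probabilities gives the following; this is a derivation sketch from the model's assumptions, not an equation taken from Fog's paper.

```latex
\begin{align}
P(k=n)&=\prod_{\nu=0}^{n-1}\frac{(K-\nu)\omega}{(K-\nu)\omega+(N-K)}\\
P(k=n)\big|_{\omega=1}&=\prod_{\nu=0}^{n-1}\frac{K-\nu}{N-\nu}=\left.\binom{K}{n}\middle/\binom{N}{n}\right.
\end{align}
```

For $K<n$ the factor with $\nu=K$ vanishes, and for $K\geq n$ every factor tends to 1 as $\omega\rightarrow\infty$, which is the step behavior of the limiting case.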
Figure~\ref{figure::backdating_probability_noncentral} graphs this probability as a function of $E$ for different values of $\omega$.
\begin{figure}
\begin{figure}[!h]
\includegraphics{figures/backdating_probability_noncentral.png}
\caption{\label{figure::backdating_probability_noncentral}
Probability of a successful backdating attack according to Fisher's noncentral hypergeometric distribution.
Probability of a successful backdating attack according to Wallenius' noncentral hypergeometric distribution.
$N=30$ witnesses keep records of the time-stamps issued by the TSA.
Of these witnesses, a number $E$ (plotted on the x-axis) maliciously collude with the TSA in order to backdate time-stamps.
Of these witnesses, a number $K$ (plotted on the x-axis) maliciously collude with the TSA in order to backdate time-stamps.
To check a time-stamp's validity, a client consults $n=8$ randomly selected witnesses.
When selecting a witness, a malicious witness is $\omega$ times more likely to be selected than an honest witness.
A malicious witness is $\omega$ times more likely to be selected than an honest witness.
The backdating attack is successful if all $n$ selected witnesses are malicious.
As expected, the probability of a successful backdating attack increases with an increasing number of colluding witnesses $E$, reaching 1 when $N=E$.
As expected, the probability of a successful backdating attack grows with the number of colluding witnesses $K$ and reaches 1 when $K=N$.
Larger values of $\omega$ increase the probability of a successful backdating attack, as the different curves show.
For $\omega=1$, the graph matches the hypergeometric distribution of Fig.~\ref{figure::backdating_probability_hypergeometric}.
For large values of $\omega$, the graph approaches a step function with the step at $n=8$.
For large values of $\omega$, the graph approaches a step function with the step at $K=n=8$.
}
\end{figure}
Note that these equations are equivalent to the hypergeomtric distribution when $\omega=1$.
This is the optimal case, limiting the probability of a successful backdating attack as much as possible.
Figure~\ref{figure::backdating_probability_noncentral} graphs this probability as a function of $K$ for different values of $\omega$.
$\omega$ approaches infinity if the attacker can ensure that the client will only select malicious witnesses.
In this case, the probability of a successful backdating attack approaches a step function with the step at $n=E$.
Note that Wallenius' noncentral hypergeometric distribution reduces to the regular hypergeometric distribution when $\omega=1$.
If an attacker can ensure that the client selects only malicious witnesses, this corresponds to the limit $\omega\rightarrow\infty$.
In this case, the probability of a successful backdating attack approaches a step function with the step at $n=K$.
\begin{equation}
\lim_{\omega\rightarrow \infty} P(e=n)=
\lim_{\omega\rightarrow \infty} \mathrm{wnchypg}(n;n,K,N,\omega)=
\begin{cases}
0 & n<E\\
1 & n\geq E
0 & K<n\\
1 & K\geq n
\end{cases}
\end{equation}
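The step behavior can be checked numerically. The following sketch evaluates $P(k=n)$ for $N=30$, $n=8$ and increasingly large $\omega$, using the product over rounds in which every witness selected so far was malicious, so that each factor is $(K-\nu)\omega/((K-\nu)\omega+N-K)$. The function name is hypothetical.

```python
def p_all_malicious(K, n, N, omega):
    """Hypothetical helper: P(k = n) under Wallenius' model.

    Every one of the n rounds must pick a malicious witness, so round nu+1
    sees K - nu malicious witnesses (weight omega) and all N - K honest ones.
    """
    p = 1.0
    for nu in range(n):
        w_mal = (K - nu) * omega
        if w_mal <= 0:
            return 0.0  # fewer than n colluding witnesses: k = n is impossible
        p *= w_mal / (w_mal + (N - K))
    return p


N, n = 30, 8
for omega in (1, 10, 100, 1000, 10**6):
    row = [p_all_malicious(K, n, N, omega) for K in (7, 8, 9, 15)]
    print(f"omega={omega:>7}: " + " ".join(f"{p:.4f}" for p in row))
```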
\subsubsection{Increasing availability}
\begin{figure}
In a real distributed service, we cannot assume that a client can always reach every witness it wants to consult.
Network partitions or denial of service attacks may render witnesses temporarily unavailable.
We add a new parameter $n'$ to our model to account for this possibility.
While the client still asks $n$ randomly selected witnesses to verify a time-stamp, it accepts the time-stamp as soon as it receives $n'\leq n$ valid responses; for $n'=n$ this is the original model.
A backdating attack is now successful when $k\geq n'$.
For the hypergeometric distribution, the probability of a successful backdating attack is then:
\begin{equation}
\left. P(k\geq n')=\sum_{k=n'}^n\binom{K}{k}\binom{N-K}{n-k} \middle/ \binom{N}{n}\right.
\end{equation}
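As a minimal sketch, this sum can be evaluated exactly with `math.comb`; the helper name is hypothetical. Setting $n'=n$ recovers $\binom{K}{n}/\binom{N}{n}$, and every decrease of $n'$ adds a non-negative term, so the probability can only grow.

```python
from math import comb


def p_backdate_available(K, n, n_prime, N):
    """Hypothetical helper: P(k >= n'), at least n' of the n consulted witnesses collude."""
    total = sum(comb(K, k) * comb(N - K, n - k) for k in range(n_prime, n + 1))
    return total / comb(N, n)


N, n, K = 30, 8, 10
for n_prime in (8, 6, 4, 2, 1):
    print(f"n'={n_prime}: P(k >= n') = {p_backdate_available(K, n, n_prime, N):.4f}")
```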
\begin{figure}[!h]
\includegraphics{figures/backdating_probability_hypergeometric_available.png}
\caption{\label{figure::backdating_probability_hypergeometric_available}
Probability of a successful backdating attack according to the hypergeometric distribution when allowing witness unavailability.
......@@ -155,38 +181,27 @@ In this case, the probability of a successful backdating attack approaches a ste
}
\end{figure}
In a real distributed service, we can not assume that a client can always reach any witness it desires.
Network partitions or denial of service attacks may render witnesses temporarily unavailable.
We include a new parameter $n'$ into our model to accomodate this possibility.
While the client still asks $n$ randomly selected witnesses to verify a time-stamp, it accepts the time-stamp as soon as it receives $n'$ valid responses from the witnesses, with $n'<n$.
A backdating attack is now successful when $e\geq n'$.
In the case of the hypergeometric distribution, this leaves us with the following equation.
\begin{equation}
\left. P(e\geq n')=\sum_{k=n'}^n\binom{E}{k}\binom{N-E}{n-k} \middle/ \binom{N}{n}\right.
\end{equation}
Figure~\ref{figure::backdating_probability_hypergeometric_available} graphs this probability as a function of $E$ for different values of $n'$.
The probability of a successful backdating attack according to Fisher's distribution is then:
Figure~\ref{figure::backdating_probability_hypergeometric_available} graphs this probability as a function of $K$ for different values of $n'$.
\begin{equation}
P(e\geq n')=\sum_{k=n'}^n\left. \binom{E}{k}\binom{N-E}{n-k}\omega^k \middle/ \sum_{k'=e_{\mathrm{min}}}^{e_{\mathrm{max}}} \binom{E}{k'}\binom{N-E}{n-k'}\omega^{k'} \right.
\end{equation}
The probability of a successful backdating attack according to Wallenius' distribution is then:
Figure~\ref{figure::backdating_probability_noncentral_available} graphs this probability as a function of $E$ for different values of $n'$.
\begin{align}
P(k\geq n')&=\sum_{k=n'}^n\binom{K}{k}\binom{N-K}{n-k}\cdot\int_0^1\left(1-t^{\omega/d(k)}\right)^k\left(1-t^{1/d(k)}\right)^{n-k}\mathop{dt}\\
d(k)&=(K-k)\omega+(N-K)-(n-k)
\end{align}
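Instead of evaluating one integral per term, the sum can also be computed from the exact distribution of $k$ obtained round by round, as in this sketch. The helper names are hypothetical, and the code assumes the sequential model with one witness selected per round. For $\omega=1$ the result must coincide with the hypergeometric sum, which the checks below confirm.

```python
from math import comb


def wallenius_distribution(n, K, N, omega):
    """Hypothetical helper: exact distribution of k after n weighted sequential selections."""
    dist = [1.0] + [0.0] * n  # dist[j] = P(j malicious witnesses selected so far)
    for nu in range(n):
        new = [0.0] * (n + 1)
        for j in range(nu + 1):
            if dist[j] == 0.0:
                continue
            w_mal = (K - j) * omega     # remaining malicious weight
            w_hon = (N - K) - (nu - j)  # remaining honest weight
            p_mal = w_mal / (w_mal + w_hon)
            new[j + 1] += dist[j] * p_mal
            new[j] += dist[j] * (1.0 - p_mal)
        dist = new
    return dist


def p_backdate_available_wallenius(K, n, n_prime, N, omega):
    """Hypothetical helper: P(k >= n') under Wallenius' distribution."""
    return sum(wallenius_distribution(n, K, N, omega)[n_prime:])


N, n, omega = 30, 8, 10
for n_prime in (8, 6, 4, 1):
    row = [p_backdate_available_wallenius(K, n, n_prime, N, omega) for K in (5, 10, 15)]
    print(f"n'={n_prime}: " + " ".join(f"{p:.4f}" for p in row))
```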
\begin{figure}
\begin{figure}[!h]
\includegraphics{figures/backdating_probability_noncentral_available.png}
\caption{\label{figure::backdating_probability_noncentral_available}
Probability of a successful backdating attack according to Fisher's noncentral hypergeometric distribution when allowing witness unavailability.
Probability of a successful backdating attack according to Wallenius' noncentral hypergeometric distribution when allowing witness unavailability.
$N=30$ witnesses keep records of the time-stamps issued by the TSA.
Of these witnesses, a number $E$ (plotted on the x-axis) maliciously collude with the TSA in order to backdate time-stamps.
Of these witnesses, a number $K$ (plotted on the x-axis) maliciously collude with the TSA in order to backdate time-stamps.
To check a time-stamp's validity, a client consults $n=8$ randomly selected witnesses.
It accepts the time-stamp if it receives valid responses from $n'$ witnesses.
When selecting a witness, a malicious witness is $\omega=10$ times more likely to be selected than an honest witness.
A malicious witness is $\omega=10$ times more likely to be selected than an honest witness.
The backdating attack is successful if at least $n'$ of the selected witnesses are malicious.
Smaller values of $n'$ increase the probability of a successful backdating attack, as the different curves show.
}
\end{figure}
Figure~\ref{figure::backdating_probability_noncentral_available} graphs this probability as a function of $K$ for different values of $n'$.
figures/backdating_probability_hypergeometric.png (binary; 117 KiB before, 117 KiB after)
figures/backdating_probability_hypergeometric_available.png (binary; 129 KiB before, 114 KiB after)
figures/backdating_probability_noncentral.png (binary; 107 KiB before, 106 KiB after)
figures/backdating_probability_noncentral_available.png (binary; 122 KiB before, 110 KiB after)
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import hypergeom, nchypergeom_fisher
from scipy.stats import hypergeom, nchypergeom_wallenius
WIDTH = 6.225 # \textwidth in inches
# hypergeom.pmf(k, N, E, n)
# nchypergeom_fisher.pmf(k, N, E, n, w)
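# scipy order hypergeom.pmf(k, M, n, N) becomes hypergeom.pmf(k, N, K, n) in our notation
# scipy order nchypergeom_wallenius.pmf(k, M, n, N, odds) becomes nchypergeom_wallenius.pmf(k, N, K, n, omega)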
plt.rcParams.update({'figure.dpi': 300})
plt.rcParams.update({'font.size': 10})
......@@ -14,53 +14,61 @@ plt.rcParams.update({'text.usetex': True})
plt.rcParams.update({'text.latex.preamble': '\\usepackage{mathpazo}'})
N = 30
E = np.arange(N+1)
K = np.arange(N+1)
plt.figure(figsize=(WIDTH, WIDTH/2))
plt.plot(E, hypergeom.pmf(1, N, E, 1), "o-", mec="w", label="$n=1$")
plt.plot(E, hypergeom.pmf(2, N, E, 2), "s-", mec="w", label="$n=2$")
plt.plot(E, hypergeom.pmf(4, N, E, 4), "D-", mec="w", label="$n=4$")
plt.plot(E, hypergeom.pmf(8, N, E, 8), "p-", mec="w", label="$n=8$")
plt.xlabel("\\# of colluding witnesses $E$")
plt.ylabel("Probability $P(e=n)$")
for fmt, n in zip(["o-", "s-", "D-", "p-"], [1, 2, 4, 8]):
    # P(k=n): all n consulted witnesses collude
    plt.plot(K, hypergeom.pmf(n, N, K, n), fmt, mec="w", label="$n=%d$" % n)
plt.xlabel("\\# of colluding witnesses $K$")
plt.ylabel("Probability $P(k=n)$")
plt.legend()
plt.title("Backdating probability (Hypergeometric distribution, $N=30$)")
plt.title("Backdating probability (Hypergeometric distribution, $N=%d$)" % N)
plt.tight_layout()
plt.savefig("backdating_probability_hypergeometric.png")
n = 8
plt.figure(figsize=(WIDTH, WIDTH/2))
plt.plot(E, nchypergeom_fisher.pmf(n, N, E, n, 1), "o-", mec="w", label="$\\omega=1$")
plt.plot(E, nchypergeom_fisher.pmf(n, N, E, n, 10), "s-", mec="w", label="$\\omega=10$")
plt.plot(E, nchypergeom_fisher.pmf(n, N, E, n, 100), "D-", mec="w", label="$\\omega=100$")
plt.plot(E, nchypergeom_fisher.pmf(n, N, E, n, 1000), "p-", mec="w", label="$\\omega=1000$")
plt.xlabel("\\# of colluding witnesses $E$")
plt.ylabel("Probability $P(e=n)$")
for fmt, omega in zip(["o-", "s-", "D-", "p-"], [1, 10, 100, 1000]):
    plt.plot(K, nchypergeom_wallenius.pmf(n, N, K, n, omega), fmt, mec="w", label="$\\omega=%d$" % omega)
plt.xlabel("\\# of colluding witnesses $K$")
plt.ylabel("Probability $P(k=n)$")
plt.legend()
plt.title("Backdating probability (Fisher's distribution $N=30, n=8$)")
plt.title("Backdating probability (Wallenius' distribution $N=%d, n=%d$)" % (N, n))
plt.tight_layout()
plt.savefig("backdating_probability_noncentral.png")
plt.figure(figsize=(WIDTH, WIDTH/2))
plt.plot(E, hypergeom.cdf(n, N, E, n) - hypergeom.cdf(7, N, E, n), "o-", mec="w", label="$n'=8$")
plt.plot(E, hypergeom.cdf(n, N, E, n) - hypergeom.cdf(3, N, E, n), "o-", mec="w", label="$n'=4$")
plt.plot(E, hypergeom.cdf(n, N, E, n) - hypergeom.cdf(1, N, E, n), "o-", mec="w", label="$n'=2$")
plt.plot(E, hypergeom.cdf(n, N, E, n) - hypergeom.cdf(0, N, E, n), "o-", mec="w", label="$n'=1$")
plt.xlabel("\\# of colluding witnesses $E$")
plt.ylabel("Probability $P(e\geq n')$")
for fmt, n_prime in zip(["o-", "s-", "D-", "p-"], [8, 4, 2, 1]):
    # P(k >= n') equals the survival function sf(n' - 1) = 1 - cdf(n' - 1)
    plt.plot(K, hypergeom.sf(n_prime - 1, N, K, n), fmt, mec="w", label="$n'=%d$" % n_prime)
plt.xlabel("\\# of colluding witnesses $K$")
plt.ylabel("Probability $P(k\\geq n')$")
plt.legend()
plt.title("Backdating vs. availability (Hypergeometric distribution, $N=30, n=8$)")
plt.title("Backdating vs. availability (Hypergeometric distribution, $N=%d, n=%d$)" % (N, n))
plt.tight_layout()
plt.savefig("backdating_probability_hypergeometric_available.png")
w = 10
omega = 10
plt.figure(figsize=(WIDTH, WIDTH/2))
plt.plot(E, nchypergeom_fisher.cdf(n, N, E, n, w) - nchypergeom_fisher.cdf(7, N, E, n, w), "o-", mec="w", label="$n'=8$")
plt.plot(E, nchypergeom_fisher.cdf(n, N, E, n, w) - nchypergeom_fisher.cdf(5, N, E, n, w), "o-", mec="w", label="$n'=6$")
plt.plot(E, nchypergeom_fisher.cdf(n, N, E, n, w) - nchypergeom_fisher.cdf(3, N, E, n, w), "o-", mec="w", label="$n'=4$")
plt.plot(E, nchypergeom_fisher.cdf(n, N, E, n, w) - nchypergeom_fisher.cdf(0, N, E, n, w), "o-", mec="w", label="$n'=1$")
plt.xlabel("\\# of colluding witnesses $E$")
plt.ylabel("Probability $P(e\geq n')$")
for fmt, n_prime in zip(["o-", "s-", "D-", "p-"], [8, 6, 4, 1]):
    # P(k >= n') equals the survival function sf(n' - 1) = 1 - cdf(n' - 1)
    plt.plot(K, nchypergeom_wallenius.sf(n_prime - 1, N, K, n, omega), fmt, mec="w", label="$n'=%d$" % n_prime)
plt.xlabel("\\# of colluding witnesses $K$")
plt.ylabel("Probability $P(k\\geq n')$")
plt.legend()
plt.title("Backdating vs. availability (Fisher's distribution, $N=30, n=8, \\omega=10$)")
plt.title("Backdating vs. availability (Wallenius' distribution, $N=%d, n=%d, \\omega=%d$)" % (N, n, omega))
plt.tight_layout()
plt.savefig("backdating_probability_noncentral_available.png")
......@@ -87,6 +87,7 @@
%\bibliographystyle{abbrv}
%\bibliography{bibliography/bibliography.bib}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%_APPENDIX_%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newpage
\section*{Appendix} \label{Appendix}
\addcontentsline{toc}{section}{Appendix} % adds entry to table of contents
\selbstaendigkeitserklaerung{\today}
......