Do the twin primes occur approximately exponentially often with respect to their position in the twin prime sequence?


I plotted the logarithm of the first $n$ twin primes and noticed that they form an approximately logarithmic curve.

Here is the plot up to 1000 (full scale):



and here is a plot up to 200,000 (full scale):



The red curves are logarithmic curves calculated using the least squares method, and it seems to fit extremely well. However, I don't have the time or computational resources to try and investigate what the coefficients of the approximating logarithmic curve are approaching. I will give the values for three of the curves however:

$n = 1000: f(x) = 0.6815857245894931 + 1.4145564491070595\ln(x)$

$n = 75,000: f(x) = 2.0738728912304074+ 1.2071826228826743\ln(x)$

$n = 200,000: f(x) = 2.304380281352694 + 1.1832161536652268\ln(x)$

It's hard to guess what this is converging to without any further calculations, but I am genuinely interested in what it is converging to. Anyway, I don't have any more time to spend on something like this, so I hope others might find it interesting.
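For anyone who wants to repeat the experiment, here is a minimal sketch of the fit in plain Python. It rests on an assumption not stated above: "the $n$-th twin prime" is taken to be the lesser member $p$ of the $n$-th pair $(p, p+2)$. The function names are only illustrative, and with a different convention (for instance, counting both members of each pair) the coefficients will differ from the ones quoted.

```python
import math

def twin_primes_below(limit):
    """Lesser members p of all twin prime pairs (p, p + 2) with p + 2 < limit.

    Assumption: each pair is represented by its smaller prime.
    """
    sieve = bytearray([1]) * limit          # sieve[k] == 1 means "k is prime"
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(limit - 1) + 1):
        if sieve[i]:
            # Cross out multiples of i, starting at i*i.
            sieve[i * i::i] = bytearray(len(range(i * i, limit, i)))
    return [p for p in range(2, limit - 2) if sieve[p] and sieve[p + 2]]

def fit_log_curve(values):
    """Least-squares fit of log(values[k-1]) ~ a + b*log(k) for k = 1..len(values).

    Returns the pair (a, b).
    """
    xs = [math.log(k) for k in range(1, len(values) + 1)]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

if __name__ == "__main__":
    twins = twin_primes_below(200_000)
    for count in (100, 1000, len(twins)):
        a, b = fit_log_curve(twins[:count])
        print(f"first {count} pairs: f(x) = {a:.4f} + {b:.4f} ln(x)")
```

Raising the sieve limit gives more pairs, at the cost of memory proportional to the limit.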

Let me just add a bit to what's already been said. As JoshuaZ points out, the standard conjecture is that the number of twin primes up to $x$ is approximately of the form $C x / (\log x)^2$ for a certain explicit constant $C$. As GH from MO points out, this conjecture fits the empirical data very well. One consequence of this is that any patterns observed in empirical data should be consistent with the conjecture.

If the $n$'th twin prime is $x$, then $n$ is the number of twin primes up to $x$, so we should (and empirically do) have $n \approx C x/ (\log x)^2$, or more precisely $n = C x /(\log x)^2 + O( x^{1-\delta})$ for some $\delta>0$. Solving for $x$, one gets $x \approx C^{-1} n (\log n)^2$, or more precisely $x = (1 + O(\log \log n/\log n ))C^{-1} n (\log n)^2$, which gives $$\log x = \log n + 2 \log \log n - \log C + O(\log \log n/\log n).$$
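To spell out the inversion step, here is a sketch of the computation. It only uses the weaker assumption that the relative error in $n \approx C x/(\log x)^2$ tends to $0$, written as $o(1)$ below. Taking logarithms and rearranging gives

$$\begin{aligned}
\log n &= \log x - 2\log\log x + \log C + o(1), \\
\log x &= \log n + 2\log\log x - \log C + o(1).
\end{aligned}$$

The second line shows $\log x = \log n + O(\log\log x)$, so $\log x = (1 + O(\log\log n/\log n))\log n$, and hence

$$\log\log x = \log\log n + O\!\left(\frac{\log\log n}{\log n}\right).$$

Substituting this back into the second line recovers $\log x = \log n + 2\log\log n - \log C$ up to the stated error, provided the $o(1)$ term is itself $O(\log\log n/\log n)$, which the power-saving error term assumed above would give.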

If you graph a function which is approximated by $\log n + 2 \log \log n - \log C $, it is clear why the graph should look like it has the form $a \log n + b$ for $a$ slightly larger than $1$: the function $\log t$ is very smooth and therefore can be roughly approximated by a linear function in $t$ with positive slope, and so $\log \log n$ can be roughly approximated by a linear function in $\log n$ with positive slope. The empirical slope will decay to $1$ as you get more data points.
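One way to see the slow decay of the slope numerically is to fit a line in $\log n$ to the model $\log n + 2\log\log n - \log C$ itself, with no primes involved. The sketch below does this in Python. It assumes $C = 2C_2 \approx 1.3203236316$, twice the twin prime constant; this value only shifts the intercept and does not affect the slope. The starting index `n_min = 3` and the function names are arbitrary choices for illustration. The derivative of the model with respect to $\log n$ is $1 + 2/\log n$, which the code reports alongside the fitted slope.

```python
import math

C = 1.3203236316  # assumed value: twice the twin prime constant

def model_log_twin(n):
    """Heuristic model for log of the n-th twin prime: log n + 2 log log n - log C."""
    t = math.log(n)
    return t + 2.0 * math.log(t) - math.log(C)

def local_slope(n):
    """Derivative of the model with respect to log n, i.e. 1 + 2 / log n."""
    return 1.0 + 2.0 / math.log(n)

def fitted_slope(n_max, n_min=3):
    """Least-squares slope of the model against log n over n = n_min..n_max."""
    xs = [math.log(k) for k in range(n_min, n_max + 1)]
    ys = [model_log_twin(k) for k in range(n_min, n_max + 1)]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

if __name__ == "__main__":
    for n_max in (1_000, 75_000, 200_000):
        print(n_max, round(fitted_slope(n_max), 4), round(local_slope(n_max), 4))
```

Both quantities stay above $1$ and shrink only like $1/\log n$, so the fitted coefficient of $\ln(x)$ should approach $1$ very slowly.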

For a graph like this, there are many different functional forms that can fit it. As you note, a power of $\log n$ less than one could also fit the graph. This is why for problems of this type, it is ideal to first develop theoretical heuristics for what the graph should look like and then see if the predicted curve fits the data, rather than guess purely based on data.

The number of twin primes up to $10^{18}$ is $808675888577436$, which is within $0.000000016$ percent of the approximation $\mathfrak{S}\,\mathrm{Li}_2(x)$ conjectured by Hardy and Littlewood. For more information on this topic, see here and here (published version here) and here.
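The same comparison can be tried at a much smaller scale. The following Python sketch assumes the usual reading of the notation, namely $\mathfrak{S} = 2C_2 \approx 1.3203236316$ and $\mathrm{Li}_2(x) = \int_2^x dt/(\log t)^2$; neither definition is spelled out above, so treat both as assumptions. The integral is evaluated with Simpson's rule after substituting $t = e^u$, and the function names are illustrative only.

```python
import math

SINGULAR_SERIES = 1.3203236316  # assumed: twice the twin prime constant

def li2(x, steps=20_000):
    """Simpson's rule for the integral of dt / (log t)^2 from 2 to x.

    Uses the substitution t = e^u, so the integrand becomes e^u / u^2.
    `steps` must be even.
    """
    def g(u):
        return math.exp(u) / (u * u)
    a, b = math.log(2.0), math.log(x)
    h = (b - a) / steps
    total = g(a) + g(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3.0

def count_twin_pairs(limit):
    """Number of pairs (p, p + 2) of primes with p + 2 <= limit, by a simple sieve."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(limit) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, limit + 1, i)))
    return sum(1 for p in range(2, limit - 1) if sieve[p] and sieve[p + 2])

if __name__ == "__main__":
    x = 10 ** 6
    actual = count_twin_pairs(x)
    estimate = SINGULAR_SERIES * li2(x)
    print(actual, round(estimate, 1), abs(estimate - actual) / actual)
```

At this size the agreement is far rougher than the figure quoted for $10^{18}$, as expected, since the relative error shrinks as $x$ grows.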
