Hans Dembinski’s blog

From unstructured to structured: Parsing webpages with a Large Language Model (LLM)

In a recent article, I showed how to set up a simple RAG system based on a locally run Large Language Model. I already praised the ollama library there, which makes it very…

Running Large Language Models (LLMs) locally for Retrieval-Augmented-Generation (RAG) Systems with full privacy

tl;dr: You can run small LLMs locally on your consumer PC and with ollama that’s very easy to set up. It is fun to chat with an LLM locally, but it gets really interesting…

Comparison of gof tests in counting experiments

The chi-squared test is the most common way to check the goodness-of-fit (gof) of models that are compared to data. Here, one splits the data into \(m\) categories and then…

Factorization test

The classic sWeights method is a powerful way to project out a component from a mixture of signal and background, but it is often not obvious whether it can be applied…

pp cross-section extrapolation uncertainty

Here we estimate the extrapolated uncertainty of the pp inelastic cross-section at 100 TeV before and after the LHC era.

Inelastic cross-sections of proton-proton and pion-proton collisions

According to a simple parton model, the cross-section for an inelastic proton-proton collision should be 3/2 of a pion-proton collision. In this picture, only the valence…

Combining results with asymmetric errors

When sample sizes are small and models very non-linear, the likelihood around the minimum is not well approximated by a parabola. In that case, the errors computed by the…

Template fits with distortion correction

I recently gave a review talk on template fits in CERN’s PHYSTAT seminar, where I presented results from our recent paper on this subject. In the summary, I discussed one of…

Fast deep sets with FLAX

In Zaheer et al., NIPS 2017, Deep Sets are introduced, a new network architecture with interesting applications to particle physics. One of the first applications in particle…

All decays into two Kaons

In a recent paper review, the question came up of how many resonances decay into two kaons. It is not easy to look this up somewhere. My solution was to extract…

Look-elsewhere effect

When searching for an excess (peak) in some phase-space which has an unknown location, one has to correct the local significance for the fact that one would have reported…

Non-prompt fraction of KS and Lambda particles from hyperon decays

A particle is prompt if it does not have particles in its decay history with lifetimes larger than 30 ps. According to this definition, some Lambda particles are not…

Regression example with a neural network

It turns out that the simple MLPRegressor in Scikit-Learn works very well on small datasets.

Statistical issues in naive calibration analyses

The statistical problem of calibrating a sample with observable pairs \(\hat a\) and \(\hat b\) is challenging if random fluctuations affect both numbers of the…

Transform LaTeX to Unicode

It is possible to transform a subset of LaTeX to Unicode, as demonstrated by the unicodeit website. Unfortunately, unicodeit only works on short LaTeX strings.

Render LaTeX to SVG with matplotlib backend

We use matplotlib’s mathtext system and SVG backend to generate an SVG rendering of LaTeX code.

Visual cross-section for particle production

When you look for a somewhat rare decay, you want to know how much luminosity is required to see N decays in your detector. For this one needs to compute the visual…

How to make a plot with legend entries that are hyperlinks

This is a demo of what the title says; also see this matplotlib issue. The trick works with SVG and PDF. Try to click on “BBC” in the legend.

Tracking efficiency for two-body decay

\[ \epsilon'_A = \int \text{d}x_B\! \int \text{d}x_C\, \epsilon'_B(x_B) \, \epsilon'_C(x_C) \; f(x_B, x_C) \]

Particle correlations in proton-proton collisions

Here, I run simulations of proton-proton collisions with Chromo to investigate two-particle correlations in pseudorapidity space. With Chromo, it is very easy to study the…

Fitting mixtures with COWs

This is a demo of how to use custom orthogonal weight functions (COWs) to generate weights which extract one component of a mixture. Find out more in our paper.

How to use the logaddexp special function in fits

The logaddexp special function was made to perform this calculation more accurately: \[ y = \ln(\exp(a) + \exp(b)). \] This calculation occurs when we fit a mixed PDF with…
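As a small illustration of the accuracy issue (not taken from the post itself), the sketch below compares the naive formula with `numpy.logaddexp` for large arguments. The values of `a` and `b` are arbitrary choices for the demo.

```python
import numpy as np

a, b = 1000.0, 1000.0  # arbitrary large values, chosen only for illustration

# Naive evaluation: exp(1000) overflows to inf in double precision.
with np.errstate(over="ignore"):
    naive = np.log(np.exp(a) + np.exp(b))

# logaddexp evaluates the same quantity without overflowing.
stable = np.logaddexp(a, b)

print(naive)   # infinite because of the overflow
print(stable)  # finite, close to 1000 + ln(2)
```

The same function accepts arrays, so it can be applied element-wise to log-densities inside a fit.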

Which data structure is faster: array of structs or struct of arrays?

In the high-performance computing community, it is well known that certain algorithms run faster if the data are structured in such a way that the CPU can read them…
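To make the two layouts concrete, here is a minimal NumPy sketch. The field names `x`, `y`, `z` and the array size are assumptions made for illustration; the sketch only shows how the memory layout differs and makes no claim about which layout is faster for a given algorithm.

```python
import numpy as np

n = 1000
rng = np.random.default_rng(1)

# Array of structs: one (x, y, z) record per element, fields interleaved in memory.
aos = np.zeros(n, dtype=[("x", "f8"), ("y", "f8"), ("z", "f8")])
aos["x"] = rng.normal(size=n)

# Struct of arrays: every field lives in its own contiguous array.
soa = {name: aos[name].copy() for name in ("x", "y", "z")}

# Consecutive x values are 24 bytes apart in the AoS view, 8 bytes apart in the SoA.
print(aos["x"].strides, soa["x"].strides)

sum_aos = aos["x"].sum()
sum_soa = soa["x"].sum()
```

Both sums agree; only the stride, and therefore the memory access pattern, differs.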

Unbiasedness of EML fit for a mixture model with fixed component pdfs

The log-likelihood for one observed Poisson-distributed count is (without constants) \[ \ell_i(\lambda) := \ln \mathcal{L}_i(\lambda) = -\lambda + k_i \ln \lambda. \]
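As a quick check of this expression (a standard step, added here for illustration), setting the derivative to zero gives \[ \frac{\partial \ell_i}{\partial \lambda} = -1 + \frac{k_i}{\lambda} = 0 \quad\Rightarrow\quad \hat\lambda = k_i, \] so for a single count the maximum-likelihood estimate of \(\lambda\) is the observed count itself.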

Numerically stable calculation of invariant mass

The art of writing numerically stable formulas is to find a mathematically equivalent computation for a formula that produces accurate results even though computation with…

Chance of seeing a one-sigma deviation in random splits

We randomly split a sample into two parts and check whether their respective arithmetic means deviate by more than one standard deviation from each other. The chance for…
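A toy simulation of this experiment could look as follows. It is a sketch under stated assumptions: the data are standard normal, the sample sizes are arbitrary, and "one standard deviation" is taken to mean the standard deviation of the difference of the two means, computed from their standard errors. For Gaussian data one would then expect a fraction of roughly 32 %, the two-sided tail probability of a normal distribution beyond one sigma.

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials, n = 20000, 100  # arbitrary sizes for the demo

x = rng.normal(size=(n_trials, n))     # each row is one sample
a, b = x[:, : n // 2], x[:, n // 2 :]  # rows are i.i.d., so this is a random split

diff = a.mean(axis=1) - b.mean(axis=1)

# Standard deviation of the difference, from the standard errors of both means
# (this reading of "one standard deviation" is an assumption).
sigma = np.sqrt(
    a.var(axis=1, ddof=1) / a.shape[1] + b.var(axis=1, ddof=1) / b.shape[1]
)

fraction = np.mean(np.abs(diff) > sigma)
print(f"fraction of splits deviating by more than one sigma: {fraction:.3f}")
```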

Simple parallelization in Jupyter Notebooks

The usual modules concurrent.futures and multiprocessing do not work correctly in notebooks on all platforms (notably on OSX there are issues). What does work is joblib…

Uncertainty of efficiency from fitted decay yields

We derive an approximate formula to compute the uncertainty of the efficiency computed from fitted yields of some decays. The formula is suitable for drawing error bars in plots.

Ratio bias

When you compute a ratio of two random numbers, the ratio is biased in general, unless the relative error on the denominator is very small. This is a general consequence of…
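The effect is easy to see in a toy simulation. The sketch below assumes normally distributed numerator and denominator, both with true value 1 and a 10 % relative error; these numbers are arbitrary choices for illustration. To second order, the mean of the ratio is then expected to be about \(1 + 0.1^2 = 1.01\) instead of 1.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# True ratio is 1; both numbers fluctuate with a 10 % relative error (assumed).
num = rng.normal(1.0, 0.1, size=n)
den = rng.normal(1.0, 0.1, size=n)

ratio = num / den
mean_ratio = ratio.mean()

# The average of the ratios lies above the ratio of the true values.
print(f"mean of ratios: {mean_ratio:.4f}")
```

Reducing the relative error on `den` makes the bias shrink quadratically, in line with the statement above.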

The Wilson score interval and weighted histograms

The Wilson score interval (WSI) has won the LHCb internal challenge for the most suitable interval for efficiencies. The question of what to do when events are weighted was…
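For reference, the standard Wilson score interval for unweighted events can be sketched in a few lines. The function name `wilson_interval` and the default `z = 1` (a 68 % interval) are choices made here for illustration; the extension to weighted events discussed in the post is not reproduced.

```python
import math


def wilson_interval(k: int, n: int, z: float = 1.0) -> tuple[float, float]:
    """Wilson score interval for k successes in n unweighted trials."""
    p = k / n
    denom = 1.0 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z / denom * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half


lo, hi = wilson_interval(8, 10)
print(f"efficiency 0.8, interval [{lo:.3f}, {hi:.3f}]")
```

Unlike the normal approximation, the interval stays inside [0, 1], and it has non-zero width even when `k` is 0 or `n`.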

p-value computation and conversion: one-sided, two-sided?

This is an attempt to clear up the confusion around p-value computation, based on the authoritative source on that matter:

Benchmark of building an array with numba

Here I try out different ways of making arrays of tuples with Numba where the number of tuples that need to be produced is not known in advance. The fastest way is to make a…

MCMC demo

This is a tiny demo of the MCMC algorithm from the emcee library, which computes the posterior for some parameters based on the likelihood function and priors on the…

Coverage of HESSE and MINOS intervals

The MINUIT package provides two ways to compute uncertainty intervals for a fitted parameter, the HESSE method and the MINOS method. Ideally, these intervals should have 68…

From a fixed-target to center-of-mass frame and back

I compute with Sympy how to transform a 4-vector from a fixed-target frame (aka laboratory frame) to the center-of-mass frame, where the colliding particles have equal but…
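The post does this symbolically with Sympy; as a purely numerical sketch of the same transformation, the code below boosts a beam proton and a target proton at rest along the beam axis. It assumes natural units (c = 1), and the beam momentum of 100 GeV and the helper name `boost_z` are illustrative choices.

```python
import math


def boost_z(e, pz, beta):
    """Lorentz boost of (E, p_z) into a frame moving with velocity beta along z."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return gamma * (e - beta * pz), gamma * (pz - beta * e)


m_beam = m_target = 0.938  # proton mass in GeV
p_beam = 100.0             # beam momentum in GeV, arbitrary

e_beam = math.hypot(p_beam, m_beam)

# Velocity of the center-of-mass frame as seen from the fixed-target frame.
beta_cm = p_beam / (e_beam + m_target)

e1, p1 = boost_z(e_beam, p_beam, beta_cm)
e2, p2 = boost_z(m_target, 0.0, beta_cm)

sqrt_s = e1 + e2
print(f"p1 = {p1:.4f} GeV, p2 = {p2:.4f} GeV, sqrt(s) = {sqrt_s:.4f} GeV")
```

In the center-of-mass frame the two momenta cancel, and the total energy equals \(\sqrt{s}\). Boosting with `-beta_cm` transforms back to the fixed-target frame.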

RooFit with Chebyshev and Bernstein polynomials

This is a little demonstration of how to fit a peak plus smooth background with either an empirical Chebyshev polynomial for the background or a Bernstein polynomial. The…

Comparison of asymptotically chisquare-distributed test statistics

For GoF tests, we often use a test statistic that is asymptotically \(\chi^2\) distributed.

Approximate PDF for the invariant mass distribution of combinatorial background

We derive an approximate pdf for the invariant mass distribution from combinatorial background. Combinatorial background refers to random pairs of unrelated particles that…

Solving the Duffing oscillator

Here I solve the differential equations for the Duffing oscillator with scipy and animate the solution with matplotlib. The oscillator exhibits chaotic behavior.

Error propagation for a ratio

Error propagation also works for systematic uncertainties. I discuss this using a simple example.

Error propagation with Sympy

Sympy is a Python module for symbolic computation (like Mathematica and Matlab) with an elegant Python design.

New Fit Result Display

The next release of iminuit will feature a new Fit Result display.

Numba-accelerated Jackknife

My resample library implements the jackknife and bootstrap resampling techniques. The implementations use pure Python. Since resampling techniques are computationally…
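For readers unfamiliar with the technique, a plain NumPy sketch of the jackknife variance estimate is shown below. It does not use the resample library's API; the function name `jackknife_variance` is hypothetical, and neither this nor the Numba-accelerated version from the post is reproduced.

```python
import numpy as np


def jackknife_variance(fn, sample):
    """Jackknife estimate of the variance of the estimator fn(sample)."""
    n = len(sample)
    # Evaluate the estimator on each leave-one-out subsample.
    theta = np.array([fn(np.delete(sample, i)) for i in range(n)])
    return (n - 1) / n * np.sum((theta - theta.mean()) ** 2)


rng = np.random.default_rng(1)
x = rng.normal(size=50)  # toy data

var_jk = jackknife_variance(np.mean, x)
var_exact = np.var(x, ddof=1) / len(x)  # textbook variance of the mean
print(var_jk, var_exact)
```

For the arithmetic mean, the jackknife reproduces the textbook formula exactly, which makes it a convenient sanity check. The loop over leave-one-out subsamples is also what makes the method expensive for large samples.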

Applying the SPD approximation to negative weights

In Bohm and Zech, NIMA 748 (2014) 1-6, the scaled Poisson distribution is discussed, which is an approximate distribution for sums of weights. This distribution is used in…

Interactive plotting in Jupyter with matplotlib

This is a little demo on how to make interactive plots with matplotlib and ipympl.

Power consumption in sleep mode

Some people make a big fuss about machines consuming power in standby. While it is very important to conserve energy, it makes no sense to turn off all devices but then run…

Fitting weighted histograms with SPD method

We test different fit methods of weighted binned data. The tested weight distributions are normal, exponential, and uniform. The toy case is a common HEP fit of a Gaussian…

Leave-one-out cross-validation

Leave-one-out cross-validation is a simple generic tool for selecting the best empirical model. When we model data empirically, for example, with a polynomial, we want to…
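As a minimal sketch of the idea for the polynomial case, the code below scores polynomial degrees by their mean squared leave-one-out prediction error. The toy data (a quadratic with Gaussian noise), the range of degrees, and the helper name `loo_score` are assumptions made for illustration.

```python
import numpy as np


def loo_score(x, y, degree):
    """Mean squared leave-one-out prediction error of a polynomial fit."""
    errors = []
    for i in range(len(x)):
        mask = np.arange(len(x)) != i  # drop point i from the fit
        coef = np.polyfit(x[mask], y[mask], degree)
        errors.append((np.polyval(coef, x[i]) - y[i]) ** 2)
    return np.mean(errors)


rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 30)
y = 1 + 2 * x**2 + rng.normal(0, 0.1, size=x.size)  # toy data, quadratic truth

scores = {d: loo_score(x, y, d) for d in range(6)}
best = min(scores, key=scores.get)
print(scores, "-> selected degree:", best)
```

Degrees that are too low score badly because they cannot follow the data, while degrees that are too high are penalized because they fit the noise of the remaining points and predict the left-out point poorly.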

C++ exceptions in high-performance code

Here is a collection of advice on using exceptions in high-performance libraries. There has been a lot of discussion in the Boost community about exceptions lately, since som…