Hans Dembinski’s blog
  • Lectures
  • Talks
  • About
  • LinkedIn
  • Github
Categories
bootstrap
data analysis
environment
high performance computing
llm
machine learning
neural networks
parsing
physics
programming
prompt engineering
science
simulation
statistics
sWeights
symbolic computation
uncertainty analysis
visualization
web scraping

Tagging posts with an LLM

high performance computing
llm
programming
In recent posts, I explored how LLMs can be used to generate structured output from unstructured input. I decided to use this ability to generate tags for posts on this blog…
Oct 19, 2025

Parsing webpages with a Large Language Model (LLM) revisited

llm
parsing
programming
prompt engineering
web scraping
I previously wrote about parsing websites and extracting structured data, but that was in January 2025 and a lot has happened in the LLM sphere since then. The pace at which…
Oct 18, 2025

From unstructured to structured: Parsing webpages with a Large Language Model (LLM)

llm
parsing
prompt engineering
web scraping
Update: This article was published in January 2025, and a lot has happened in the LLM sphere since then. I changed my mind on a lot of things, too. You can read my update here…
Jan 7, 2025

Running Large Language Models (LLMs) locally for Retrieval-Augmented-Generation (RAG) Systems with full privacy

llm
programming
web scraping
tl;dr: You can run small LLMs locally on your consumer PC and with ollama that’s very easy to set up. It is fun to chat with an LLM locally, but it gets really interesting…
Dec 6, 2024

Comparison of gof tests in counting experiments

data analysis
science
simulation
statistics
The chi-squared test is the most common way to check the goodness-of-fit (gof) of models that are compared to data. Here, one splits the data into \(m\) categories and then…
Aug 12, 2024
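As a minimal sketch of the Pearson chi-square statistic described above, \(X^2 = \sum_k (n_k - \mu_k)^2/\mu_k\) over the \(m\) categories: the code assumes that the expected counts come from a model with no fitted parameters, so the statistic is compared to a \(\chi^2\) distribution with \(m\) degrees of freedom. The counts are made up for illustration and are not taken from the post.

```python
import numpy as np
from scipy.stats import chi2

def pearson_chi2(n, mu):
    """Return Pearson's X^2 = sum_k (n_k - mu_k)^2 / mu_k and its asymptotic p-value."""
    n = np.asarray(n, dtype=float)
    mu = np.asarray(mu, dtype=float)
    x2 = np.sum((n - mu) ** 2 / mu)
    # assumption: no model parameters were fitted, so ndof = number of categories m
    ndof = len(n)
    return x2, chi2.sf(x2, ndof)

# illustrative observed counts in m = 4 categories and the model expectation
observed = [12, 8, 15, 5]
expected = [10, 10, 10, 10]
x2, pvalue = pearson_chi2(observed, expected)
print(f"X^2 = {x2:.2f}, p = {pvalue:.3f}")
```

If parameters are fitted to the same data, the degrees of freedom are reduced accordingly; the asymptotic \(\chi^2\) approximation also degrades when expected counts per category are small.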

Factorization test

data analysis
programming
sWeights
statistics
visualization
The classic sWeights method is a powerful way to project out a component from a mixture of signal and background, but it is often not obvious whether it can be applied…
Jul 8, 2024

Inelastic cross-sections of proton-proton and pion-proton collisions

data analysis
physics
visualization
According to a simple parton model, the cross-section for an inelastic proton-proton collision should be 3/2 times that of a pion-proton collision. In this picture, only the valence…
Jun 4, 2024

pp cross-section extrapolation uncertainty

data analysis
physics
statistics
uncertainty analysis
visualization
Here we estimate the uncertainty of the inelastic pp cross-section extrapolated to 100 TeV, before and after the LHC era.
Jun 4, 2024

Combining results with asymmetric errors

physics
programming
statistics
uncertainty analysis
When sample sizes are small and models very non-linear, the likelihood around the minimum is not well approximated by a parabola. In that case, the errors computed by the…
May 18, 2024

Template fits with distortion correction

bootstrap
programming
simulation
statistics
uncertainty analysis
visualization
I recently gave a review talk on template fits in CERN’s PHYSTAT seminar, where I presented results from our recent paper on this subject. In the summary, I discussed one of…
Apr 19, 2024

Fast deep sets with FLAX

high performance computing
machine learning
neural networks
physics
programming
Zaheer et al. (NIPS 2017) introduced Deep Sets, a new network architecture with interesting applications in particle physics. One of the first applications in particle…
Jan 11, 2024

All decays into two Kaons

physics
programming
visualization
In a recent paper review, the question came up how many resonances decay into two kaons. This is not easy to look up. My solution was to extract…
Oct 15, 2023

Look-elsewhere effect

bootstrap
data analysis
physics
programming
simulation
statistics
uncertainty analysis
visualization
When searching for an excess (peak) at an unknown location in some phase-space, one has to correct the local significance for the fact that one would have reported…
Oct 15, 2023

Non-prompt fraction of KS and Lambda particles from hyperon decays

data analysis
physics
programming
visualization
A particle is prompt if its decay history contains no particles with lifetimes longer than 30 ps. According to this definition, some Lambda particles are not…
Aug 31, 2023

Regression example with a neural network

machine learning
neural networks
programming
We fit a high-dimensional function (it's a form of efficiency) with neural networks. The model is a simple feed-forward network with ReLU activation. We use different…
Jul 19, 2023

Statistical issues in naive calibration analyses

data analysis
physics
programming
simulation
statistics
The statistical problem of calibrating a sample with observable pairs \(\hat a\) and \(\hat b\) is challenging if random fluctuations affect both numbers of the…
Jun 16, 2023
 

Transform LaTeX to Unicode

parsing
programming
It is possible to transform a subset of LaTeX to Unicode, as demonstrated by the unicodeit website. Unfortunately, unicodeit only works on short LaTeX strings.
Apr 10, 2023
 

Render LaTeX to SVG with matplotlib backend

programming
visualization
We use matplotlib’s mathtext system and SVG backend to generate an SVG rendering of LaTeX code.
Apr 4, 2023
 

Visual cross-section for particle production

physics
programming
simulation
When you look for a somewhat rare decay, you want to know how much luminosity is required to see N decays in your detector. For this, one needs to compute the visual…
Mar 31, 2023

How to make a plot with legend entries that are hyperlinks

programming
visualization
This is a demo of what the title says; also see this matplotlib issue. The trick works with SVG and PDF. Try clicking on “BBC” in the legend.
Mar 29, 2023
 

Tracking efficiency for two-body decay

physics
programming
statistics
uncertainty analysis
\[ \epsilon'_A = \int \text{d}x_B\! \int \text{d}x_C\, \epsilon'_B(x_B) \, \epsilon'_C(x_C) \; f(x_B, x_C) \]
Nov 15, 2022

Particle correlations in proton-proton collisions

data analysis
high performance computing
physics
programming
simulation
visualization
Here, I run simulations of proton-proton collisions with Chromo to investigate two-particle correlations in pseudorapidity space. With Chromo, it is very easy to study the…
Oct 7, 2022

How to use the logaddexp special function in fits

high performance computing
programming
statistics
The logaddexp special function was made to perform this calculation more accurately: \[ y = \ln(\exp(a) + \exp(b)). \] This calculation occurs when we fit a mixed PDF with…
Sep 10, 2022
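The accuracy gain of logaddexp comes from factoring out the larger argument, \(\ln(e^a + e^b) = \max(a, b) + \ln(1 + e^{-|a-b|})\), so that no exponential can overflow. The following minimal sketch uses NumPy's `np.logaddexp` and a hand-written equivalent (the helper name `logaddexp_manual` is hypothetical) to compare against the naive formula for large arguments.

```python
import numpy as np

a, b = 1000.0, 1000.0

# naive formula: exp(1000) overflows to inf, so the result is inf
with np.errstate(over="ignore"):
    naive = np.log(np.exp(a) + np.exp(b))

# NumPy's special function avoids the overflow
stable = np.logaddexp(a, b)

def logaddexp_manual(a, b):
    # factor out the larger argument: max(a, b) + ln(1 + exp(-|a - b|))
    m = max(a, b)
    return m + np.log1p(np.exp(-abs(a - b)))

print(naive)                     # inf
print(stable)                    # approximately 1000 + ln(2)
print(logaddexp_manual(a, b))
```

In a fit of a mixed PDF, \(a\) and \(b\) would be the logarithms of the weighted component densities, which is where such large-magnitude arguments typically arise.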

Fitting mixtures with COWs

programming
sWeights
statistics
visualization
This is a demo of how to use custom orthogonal weight functions (COWs) to generate weights which extract one component of a mixture. Find out more in our paper.
Sep 10, 2022

Which data structure is faster: array of structs or struct of arrays?

high performance computing
programming
In the high-performance computing community, it is well known that certain algorithms run faster if the data are structured in such a way that the CPU can read them…
Jul 9, 2022
 

Unbiasedness of EML fit for a mixture model with fixed component pdfs

science
simulation
statistics
symbolic computation
The log-likelihood for one observed Poisson-distributed count is (without constants) \[ \ell_i(\lambda) := \ln \mathcal{L}_i(\lambda) = -\lambda + k_i \ln \lambda. \]
Apr 25, 2022
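Setting the derivative of this log-likelihood to zero, \(\mathrm{d}\ell_i/\mathrm{d}\lambda = -1 + k_i/\lambda = 0\), gives \(\hat\lambda = k_i\) for a single count. A minimal numerical check of this, assuming an arbitrary illustrative count \(k = 7\) and minimizing \(-\ell\) with SciPy's bounded scalar minimizer:

```python
import numpy as np
from scipy.optimize import minimize_scalar

k = 7  # illustrative observed count, not from the post

def neg_log_likelihood(lam):
    # -l(lambda) = lambda - k ln(lambda), constants dropped as in the text
    return lam - k * np.log(lam)

result = minimize_scalar(neg_log_likelihood, bounds=(1e-3, 50.0), method="bounded")
print(result.x)  # close to k = 7
```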

Numerically stable calculation of invariant mass

high performance computing
physics
programming
The art of writing numerically stable formulas is to find, for a given formula, a mathematically equivalent computation that produces accurate results even though computation with…
Jan 20, 2022
 

Chance of seeing a one-sigma deviation in random splits

data analysis
programming
simulation
statistics
We randomly split a sample into two parts and check whether their respective arithmetic means deviate by more than one standard deviation from each other. The chance for…
Dec 5, 2021

Simple parallelization in Jupyter Notebooks

high performance computing
programming
The usual modules concurrent.futures and multiprocessing do not work correctly in notebooks on all platforms (notably on OSX there are issues). What does work is joblib…
Aug 13, 2021
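For readers who have not used joblib, a minimal sketch of its parallel map pattern: `Parallel` and `delayed` are joblib's public API, while `slow_square` is a hypothetical stand-in for an expensive computation and the choice of two workers is arbitrary.

```python
from joblib import Parallel, delayed

def slow_square(x):
    # hypothetical stand-in for an expensive computation
    return x * x

# n_jobs=2 distributes the calls over two workers; results come back in input order
results = Parallel(n_jobs=2)(delayed(slow_square)(x) for x in range(10))
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```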

Uncertainty of efficiency from fitted decay yields

data analysis
simulation
statistics
symbolic computation
uncertainty analysis
visualization
We derive an approximate formula to compute the uncertainty of the efficiency computed from fitted yields of some decays. The formula is suitable to draw error bars in plots.
Aug 8, 2021

Ratio bias

data analysis
statistics
visualization
When you compute a ratio of two random numbers, the ratio is biased in general, unless the relative error on the denominator is very small. This is a general consequence of…
May 11, 2021
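A minimal Monte Carlo sketch of the ratio bias, assuming independent, normally distributed numerator and denominator with illustrative means 5 and 10 and a 10 % relative error on the denominator (these values are not from the post). The second-order Taylor expansion predicts \(E[x/y] \approx (\mu_x/\mu_y)(1 + \sigma_y^2/\mu_y^2)\) for independent \(x\) and \(y\).

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 1_000_000

# independent numerator and denominator with true means 5 and 10
x = rng.normal(5.0, 1.0, size=n)
y = rng.normal(10.0, 1.0, size=n)  # 10 % relative error on the denominator

mean_ratio = np.mean(x / y)
true_ratio = 5.0 / 10.0
# second-order Taylor prediction for the expected ratio
predicted = true_ratio * (1 + (1.0 / 10.0) ** 2)

print(mean_ratio, predicted, true_ratio)
```

The simulated mean of \(x/y\) lands near the predicted 0.505 rather than the true ratio 0.5; shrinking the denominator's relative error makes the bias vanish quadratically.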

The Wilson score interval and weighted histograms

bootstrap
data analysis
simulation
statistics
The Wilson score interval (WSI) has won the LHCb internal challenge for the most suitable interval for efficiencies. The question of what to do when events are weighted was…
May 7, 2021

p-value computation and conversion: one-sided, two-sided?

data analysis
science
statistics
visualization
This is an attempt to clear up the confusion around p-value computation, based on the authoritative source on that matter:
May 4, 2021

Benchmark of building an array with numba

high performance computing
programming
Here I try out different ways of making arrays of tuples with Numba where the number of tuples that need to be produced is not known in advance. The fastest way is to make a…
Apr 2, 2021

Coverage of HESSE and MINOS intervals

data analysis
simulation
statistics
uncertainty analysis
visualization
The MINUIT package provides two ways to compute uncertainty intervals for a fitted parameter, the HESSE method and the MINOS method. Ideally, these intervals should have 68…
Mar 29, 2021

MCMC demo

programming
simulation
statistics
visualization
This is a tiny demo of the MCMC algorithm from the emcee library, which samples the posterior of some parameters based on the likelihood function and priors on the…
Mar 29, 2021
 

From a fixed-target to center-of-mass frame and back

physics
symbolic computation
I compute with Sympy how to transform a 4-vector from a fixed-target frame (aka laboratory frame) to the center-of-mass frame, where the colliding particles have equal but…
Feb 27, 2021
 

RooFit with Chebyshev and Bernstein polynomials

physics
programming
simulation
statistics
This is a little demonstration of how to fit a peak plus a smooth background, using either an empirical Chebyshev polynomial or a Bernstein polynomial for the background. The…
Jan 25, 2021

Comparison of asymptotically chisquare-distributed test statistics

simulation
statistics
symbolic computation
visualization
For GoF tests, we often use a test statistic that is asymptotically \(\chi^2\) distributed.
Jan 9, 2021

Approximate PDF for the invariant mass distribution of combinatorial background

physics
simulation
statistics
symbolic computation
We derive an approximate pdf for the invariant mass distribution from combinatorial background. Combinatorial background refers to random pairs of unrelated particles that…
Jan 7, 2021

Solving the Duffing oscillator

physics
programming
simulation
visualization
Here I solve the differential equations for the Duffing oscillator with scipy and animate the solution with matplotlib. The oscillator exhibits chaotic behavior.
Jan 5, 2021
 

Error propagation for a ratio

simulation
statistics
uncertainty analysis
Error propagation also works for systematic uncertainties. I discuss this using a simple example.
Dec 12, 2020
 

Error propagation with Sympy

programming
statistics
symbolic computation
uncertainty analysis
Sympy is a Python module for symbolic computation (like Mathematica and Matlab) with an elegant Python design.
Sep 17, 2020
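As a small illustration of error propagation with Sympy, here is a minimal sketch for the ratio \(f = a/b\), assuming uncorrelated \(a\) and \(b\) and first-order (linear) propagation, \(\sigma_f^2 = (\partial f/\partial a)^2\sigma_a^2 + (\partial f/\partial b)^2\sigma_b^2\). The symbol names are chosen for illustration.

```python
import sympy as sp

a, b = sp.symbols("a b", positive=True)
sigma_a, sigma_b = sp.symbols("sigma_a sigma_b", positive=True)

f = a / b

# first-order error propagation for uncorrelated a and b
var_f = sp.diff(f, a) ** 2 * sigma_a**2 + sp.diff(f, b) ** 2 * sigma_b**2

# relative variance of the ratio; simplifies to the sum of squared relative errors
rel_var = sp.simplify(var_f / f**2)
print(rel_var)
```

The result is the familiar rule that relative errors add in quadrature for a ratio, \((\sigma_f/f)^2 = (\sigma_a/a)^2 + (\sigma_b/b)^2\).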
 

New Fit Result Display

programming
The next release of iminuit will feature a new Fit Result display.
Jul 29, 2020

Numba-accelerated Jackknife

high performance computing
programming
statistics
visualization
My resample library implements the jackknife and bootstrap resampling techniques. The implementations are pure Python. Since resampling techniques are computationally…
May 21, 2020
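To show what the jackknife computes, here is a minimal plain NumPy sketch of the jackknife variance estimate, \(\frac{n-1}{n}\sum_i (\hat\theta_{(i)} - \bar\theta)^2\), where \(\hat\theta_{(i)}\) is the estimator evaluated with the \(i\)-th point left out. This is not the resample library's API; the function name `jackknife_variance` is hypothetical, and its Python loop over leave-one-out samples is the kind of code that benefits from acceleration.

```python
import numpy as np

def jackknife_variance(fn, x):
    """Jackknife estimate of the variance of estimator fn applied to sample x."""
    x = np.asarray(x)
    n = len(x)
    # evaluate the estimator on each leave-one-out sample
    thetas = np.array([fn(np.delete(x, i)) for i in range(n)])
    return (n - 1) / n * np.sum((thetas - thetas.mean()) ** 2)

rng = np.random.default_rng(seed=2)
data = rng.exponential(size=50)

v_jack = jackknife_variance(np.mean, data)
# for the sample mean, the jackknife variance equals s^2 / n exactly
v_exact = np.var(data, ddof=1) / len(data)
print(v_jack, v_exact)
```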
 

Applying the SPD approximation to negative weights

bootstrap
programming
simulation
statistics
uncertainty analysis
Bohm and Zech, NIMA 748 (2014) 1-6, discuss the scaled Poisson distribution (SPD), an approximate distribution for sums of weights. This distribution is used in…
Apr 22, 2020
 

Interactive plotting in Jupyter with matplotlib

programming
visualization
This is a little demo on how to make interactive plots with matplotlib and ipympl.
Apr 16, 2020
 

Power consumption in sleep mode

data analysis
environment
Some people make a big fuss about machines consuming power in standby. While it is very important to conserve energy, it makes no sense to turn off all devices but then run…
Apr 16, 2020

Fitting weighted histograms with SPD method

data analysis
physics
programming
science
statistics
visualization
We test different methods for fitting weighted binned data. The tested weight distributions are normal, exponential, and uniform. The toy case is a common HEP fit of a Gaussian…
Apr 15, 2020

Leave-one-out cross-validation

data analysis
statistics
Leave-one-out cross-validation is a simple generic tool for selecting the best empirical model. When we model data empirically, for example, with a polynomial, we want to…
Apr 7, 2020
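A minimal sketch of leave-one-out cross-validation for choosing the degree of an empirical polynomial model: each point is predicted from a fit to all other points, and the degree with the smallest mean squared prediction error is selected. The toy data (a quadratic with small Gaussian noise) and the helper name `loo_mse` are assumptions for illustration, not taken from the post.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
x = np.linspace(-1, 1, 30)
y = 1 + 2 * x - 3 * x**2 + rng.normal(0, 0.1, size=x.size)  # true model has degree 2

def loo_mse(x, y, degree):
    """Mean squared leave-one-out prediction error of a polynomial fit."""
    errors = []
    for i in range(len(x)):
        mask = np.arange(len(x)) != i  # drop point i from the fit
        coef = np.polyfit(x[mask], y[mask], degree)
        errors.append((np.polyval(coef, x[i]) - y[i]) ** 2)
    return np.mean(errors)

degrees = range(6)
scores = [loo_mse(x, y, d) for d in degrees]
best = int(np.argmin(scores))
print(scores, best)
```

Degrees that are too low leave a large systematic prediction error, while degrees that are too high start fitting noise, which also raises the leave-one-out error.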
 

C++ exceptions in high-performance code

high performance computing
programming
Here is a collection of advice on using exceptions in high-performance libraries. There has been a lot of discussion in the Boost community about exceptions lately, since som…
Apr 5, 2020