Stefan Lang

Expert in machine learning, artificial intelligence, systems theory and bioinformatics (M.Sc.)

Looking for an experienced AI expert or bioinformatician? Then you've come to the right place!

I have already supported customers from various industries in the implementation of their projects. This has allowed me to gain experience in many areas of research and development.

Whether software development, training of artificial intelligence, data analysis, mathematical modeling or laboratory work, there is hardly an area of machine learning and bioinformatics in which I have not yet been active.

I have in-depth knowledge of the entire software life cycle, from the development of a prototype to validation of the methods and transition to production.

I would be happy to contribute my ideas and many years of experience in the development of intelligent software to your next project.

Project list


Results: Tool for identifying and coloring individual objects in images, for object classes specified by the customer

Methods:

Data-pipeline: API connection to the customer's annotation tool for the creation of training data
Model: neural network for multi-class, multi-instance segmentation of images
Deployment: microservices for training & inference

Results: Automated recognition of music tracks in live recordings. The developed method can identify instrumental / vocal versions, variations in vocals or instrumentals (up to changed instruments or vocals in a different language), and excerpts of music pieces in a database of audio recordings

Methods:

Music detection: classification of music or extraction of tracks from mixed recordings (e.g., television broadcasts, live concerts, albums, ...)
Music decomposition: decomposition of musical pieces into vocal and instrumental channels
Matching: creation of an electronic fingerprint of the decomposed audio channels and local similarity analysis of the fingerprints to identify the music pieces in a database

Results: Development of a tool for recognizing and linking custom term classes in continuous text

Methods:

Models: Named-Entity-Recognition (NER) model using transformer embeddings to annotate the terms, Relation-Tagging model to link the terms (libraries: PyTorch)
Annotation pipeline: import / export functions to manually tag examples of the term classes and relations to be learned using a graphical annotation tool (INCEpTION)
Trainer: module to adapt the AI models to the manually annotated data, i.e. to learn the customized term classes and relations

Results: Construction of a library of input and output adapters for the generic Perceiver IO architecture. Implemented modalities (data types): text, audio, images, videos, time series

Methods:

Input adapters: modality-specific restructuring of input data as a 2-dimensional array and concatenation of modalities as input to Perceiver IO
Output adapter: development of queries (query arrays) for reconstruction (autoencoding), classification, and prediction of the input data
Model: methods for data preparation, configuration of models (depending on input data and task), training of models and use of models
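
The input-adapter idea above (one 2-dimensional array per modality, then concatenation) can be illustrated with a minimal NumPy sketch. This is not the project code: the function names, the zero-padding to a common channel width and the one-hot modality tag are assumptions chosen for illustration.

```python
import numpy as np

def to_2d(array):
    """Flatten all index dimensions (e.g. height, width) into one axis: (..., C) -> (M, C)."""
    a = np.asarray(array, dtype=float)
    return a.reshape(-1, a.shape[-1])

def concat_modalities(arrays):
    """Stack several (M_i, C_i) arrays into one (sum(M_i), C_max + n_modalities) array.

    Channels are zero-padded to the widest modality, and a one-hot tag
    marks which modality each row came from (both are assumptions).
    """
    width = max(a.shape[1] for a in arrays)
    rows = []
    for i, a in enumerate(arrays):
        padded = np.pad(a, ((0, 0), (0, width - a.shape[1])))
        tag = np.zeros((a.shape[0], len(arrays)))
        tag[:, i] = 1.0
        rows.append(np.hstack([padded, tag]))
    return np.vstack(rows)

# Usage: a 4x4 RGB image and a mono audio clip of 100 samples
image = to_2d(np.zeros((4, 4, 3)))   # 16 rows, 3 channels
audio = to_2d(np.zeros((100, 1)))    # 100 rows, 1 channel
model_input = concat_modalities([image, audio])
```

In practice, positional encodings would typically be appended to each row as well; they are left out here to keep the sketch short.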

Results: Temporal as well as spatial prediction of epidemiological parameters (new infections, R-value) by linking and interpreting different data sources (infection numbers, socio-demographic data, mobility, ...)

Methods:

Building the data infrastructure: merging & processing the different data sources in a graph database (ArangoDB). Pipeline for updating the data
Data analysis: frequency analysis & filtering (smoothing). Determination of temporal dependencies (cross-correlation) between time series (within locations and between locations). Determination of the effect of measures taken on the time series
Modeling: Neural network for multivariate time series analysis (Temporal Fusion Transformer) taking into account static covariates (place, number of inhabitants, ...). Determination of the partial dependence of the target variable on the static and dynamic covariates
Deployment: Docker container with databases, models and API
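
The cross-correlation step in the data analysis above can be illustrated with a small sketch that searches for the lag at which one time series best follows another. It is a simplified example, not the project implementation; the function name and the `max_lag` window are assumptions.

```python
import numpy as np

def best_lag(x, y, max_lag):
    """Return (lag, correlation) for the lag at which y best matches x.

    A positive lag means y follows x by that many time steps.
    Both series are standardized first, so the score is an
    approximate correlation coefficient.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    n = len(x)
    best = (0, -np.inf)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:n - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:n + lag]
        r = float(np.mean(a * b))
        if r > best[1]:
            best = (lag, r)
    return best

# Usage: y is x delayed by 3 time steps (synthetic data)
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = np.roll(x, 3)
lag, score = best_lag(x, y, max_lag=10)
```

Applied to pairs of locations, a lag of this kind gives a rough estimate of how long a change in one place takes to show up in another.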

Results: Automated transcription of German-language audio files. Monitoring of transcription quality and training of new / unrecognized words. Classification / interpretation of the transcribed texts

Methods:

Speech-to-text (STT): Kaldi ASR model trained on a German-language dataset. Determination of word recognition probabilities for quality estimation
Trainer: module to teach new words to the STT model. Testing the recognition rate for given keywords. Phonemization of poorly recognized words using a separate grapheme-to-phoneme model (g2p). Scraping sample texts to calculate word transition probabilities. Incorporation of the new words and phonemes into the grammar and phoneme classes of the model and retraining of the model. Synthesizing some example texts with a separate text-to-speech model (CoquiTTS) and transcribing them back into text as validation
Natural language processing: document indexing of the transcribed texts, semantic search of keywords and classification of the texts based on given classes

Results: Identification of product groups with similar sales patterns. Analysis of trends and seasonality of sales. Estimation of future material requirements for several months

Methods:

Data preparation: connection to sales database. Data set with product sales as time series and metadata about the products (single parts, colors, size, ...)
Data analysis: finding correlations in sales behavior, grouping of products. Frequency analysis and seasonal decomposition of time series
Demand forecasting: predicting sales of products or product groups, taking into account product characteristics, current trends and seasonal sales patterns (Prophet and NBeats ensemble). Estimation of future material demand
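
The seasonal decomposition mentioned in the data analysis can be sketched with a classical additive decomposition (trend from a centered moving average, seasonal component from per-phase averages). This is a deliberately simple stand-in for illustration, not the Prophet / NBeats ensemble used for forecasting; the function name and the synthetic data are assumptions.

```python
import numpy as np

def decompose_additive(series, period):
    """Split a series into trend, seasonal and residual parts (y = trend + seasonal + residual)."""
    y = np.asarray(series, dtype=float)
    # Centered moving average; even periods use half weights at both ends
    if period % 2 == 0:
        kernel = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        kernel = np.ones(period) / period
    half = len(kernel) // 2
    trend = np.full(len(y), np.nan)  # edges stay NaN, where the window does not fit
    trend[half:len(y) - half] = np.convolve(y, kernel, mode="valid")
    detrended = y - trend
    # Average the detrended values for each position within the season
    phase_means = np.array([np.nanmean(detrended[p::period]) for p in range(period)])
    phase_means -= phase_means.mean()
    seasonal = np.tile(phase_means, len(y) // period + 1)[:len(y)]
    residual = y - trend - seasonal
    return trend, seasonal, residual

# Usage: synthetic quarterly sales with a linear trend and a fixed seasonal pattern
pattern = np.array([3.0, -1.0, -2.0, 0.0])
t = np.arange(24)
sales = 10 + 0.5 * t + np.tile(pattern, 6)
trend, seasonal, residual = decompose_additive(sales, period=4)
```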

Results: Estimation of the impact of errors/delays in processes at specific locations on the remaining transportation network

Methods:

Data preparation: structuring data into locations, movements between locations, and processes at locations. Calculation of static and time-dependent properties of locations (capacities, load factors, ...)
Data analysis: analysis of movements and disturbances in the network, estimation of the effect of disturbances on subsequent stations (identification of error-chains)
Simulation: simulation of the effect of changed transport routes / times or changed processes / parameters at the locations on the overall network
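
How a disturbance at one location can affect subsequent stations (an error chain) can be illustrated with a toy propagation model. The model is an assumption made for this sketch, not the method used in the project: each connection has a hypothetical time buffer that absorbs part of the delay, and whatever remains is passed on downstream.

```python
from collections import deque

def propagate_delay(edges, source, delay):
    """Propagate a delay through a transport network.

    edges: {location: [(next_location, buffer), ...]}, where `buffer`
           is the slack (same time unit as `delay`) on that connection.
    Returns {location: remaining delay} for all affected locations.
    """
    remaining = {source: delay}
    queue = deque([source])
    while queue:
        location = queue.popleft()
        for nxt, buffer in edges.get(location, []):
            passed_on = max(0.0, remaining[location] - buffer)
            # Only record (and follow up on) a delay larger than any seen so far
            if passed_on > remaining.get(nxt, 0.0):
                remaining[nxt] = passed_on
                queue.append(nxt)
    return remaining

# Usage: a 12-minute delay at A; the connection A -> C has enough slack to absorb it
network = {"A": [("B", 5), ("C", 20)], "B": [("D", 5)]}
affected = propagate_delay(network, "A", 12)
```

Changing the buffers or the routes in `network` and re-running the function corresponds, in a very reduced form, to the simulation of changed transport times described above.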

Results: Pathogens must camouflage themselves in the body to avoid being recognized as foreign and being removed. The camouflage cannot be perfect. The immune system must weigh at what "threshold" of self-similarity it might attack camouflaged pathogens (a low threshold means little autoimmunity but poorer defense against camouflaged pathogens; a high threshold means good defense but possible autoimmunity). Identification of target proteins for drug intervention in autoimmunity

Methods:

Literature review: (innate) immune system, complement system, social systems theory, mimicry/crypsis, mathematical / game theoretical models of mimicry, mathematical / metabolic models of complement system
Modeling: transfer of behavioral models describing mimicry and crypsis in animals to the microbiological level (molecular crypsis). Linking crypsis models to models of the innate immune response (specifically complement system). Modeling of the trade-off between autoimmunity and defense against camouflaged pathogens
Publication: Publication of relevant results in scientific journals
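
The threshold trade-off described in the results can be illustrated with a simple signal-detection sketch. It is not the published model: the normally distributed self-similarity scores and all parameter values are assumptions chosen only to make the trade-off visible.

```python
from math import erf, sqrt

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))

def tradeoff(threshold, self_mean=0.9, pathogen_mean=0.7, sd=0.1):
    """Targets with a self-similarity below `threshold` are attacked.

    Returns (autoimmunity_rate, escape_rate):
      autoimmunity_rate: fraction of the body's own cells that are attacked
      escape_rate:       fraction of camouflaged pathogens that are not attacked
    """
    autoimmunity_rate = normal_cdf(threshold, self_mean, sd)
    escape_rate = 1.0 - normal_cdf(threshold, pathogen_mean, sd)
    return autoimmunity_rate, escape_rate

# Usage: compare a low and a high threshold
low = tradeoff(0.60)    # little autoimmunity, many pathogens escape
high = tradeoff(0.85)   # few pathogens escape, more autoimmunity
```

The better the camouflage (that is, the closer `pathogen_mean` is to `self_mean`), the less any single threshold can keep both rates low at the same time.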

Results: Implementation of a protein microarray and fluorescence filter in a smartphone attachment for on-site detection of specific (e.g., unwanted) proteins in samples (e.g., growth hormones in milk). Efficient analysis directly on the smartphone using computer vision methods

Methods:

Data preparation: standardize, interpolate, and rectify (orthogonalize) input images (colored spots of the microarray taken with smartphones, i.e., highly varying qualities)
Image recognition: localize spots and mark edges. Identify spots of positive / negative controls. Determine the color intensities of the other spots and calibrate them against the controls to calculate the concentration of the target protein in the sample
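
The calibration of a spot against the controls can be sketched as a two-point normalization. The sketch assumes a linear relationship between color intensity and concentration and a known concentration for the positive control; both are simplifying assumptions, and the function names are hypothetical.

```python
from statistics import mean

def relative_signal(spot_intensity, negative_controls, positive_controls):
    """Scale a spot intensity so that negative controls map to 0 and positive controls to 1."""
    low = mean(negative_controls)
    high = mean(positive_controls)
    if high <= low:
        raise ValueError("positive controls must be brighter than negative controls")
    return (spot_intensity - low) / (high - low)

def estimate_concentration(spot_intensity, negative_controls, positive_controls,
                           positive_concentration):
    """Estimate the target concentration, assuming a linear intensity response."""
    signal = relative_signal(spot_intensity, negative_controls, positive_controls)
    return signal * positive_concentration

# Usage: intensities on an arbitrary 0-255 scale, positive control at 50 concentration units
concentration = estimate_concentration(60, [10, 10], [110, 110], 50.0)
```

Because the controls are photographed in the same image as the sample spots, this kind of normalization compensates for part of the variation in lighting and camera quality between smartphones.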

Results: Characterization of forms of cooperation in biofilms. In particular, modeling of intra-species and inter-species crossfeeding interactions. Investigation of the evolutionary stability of cooperation with respect to parasitism

Methods:

Literature review: social systems theory, evolutionary game theory, forms of cooperation and communication in microorganisms, crossfeeding
Modeling: agent-based model to simulate crossfeeding interactions between unicellular fungi. Modeling the effect of communication via molecules released into the environment or direct connection of individuals by nanotubes
Publication: Publication of relevant results in scientific journals

Results: Design and implementation of algorithms for separation of cell aggregates (segmentation), tracking of single cells and extraction of cell-type-specific parameters. Later: further development to analyze data from confocal laser scanning microscopy (5-dimensional)

Methods:

Data preparation: deconvolution of images with microscope-specific kernel (remove specific light scattering patterns), interpolation, standardization
Segmentation: separate foreground (focused cells) from background (noise, macromolecules, non-focused cells, ...)
Image recognition: recognize single cells and cell clusters. Separate cell clusters. Reconstruct shape of single cells
Extract features: Recognize specific features of cell types and characterize given properties (size, movement pattern, speed, ...)

Skills

Natural Language Processing (NLP)
- Large Language Models (LLM)
- Generative Language Models
- Text Classification
- Named Entity Recognition
- Relation Tagging
Computer Vision
- Image Segmentation
- Image Classification
- Object Detection
- Object Tracking
Audio Processing
- Audio-Information-Retrieval
- Automatic Speech Recognition (ASR)
- Text-To-Speech synthesis (TTS)
Time series analysis and forecasting
Regression / classification

Visualization
Pattern recognition
Anomaly detection
Analysis of dependencies
Time series analysis
Audio / speech analysis

Analysis of biological data (-omics, mass spectrometry, ...)
Modeling of biological systems (chemical reaction networks, multi-scale ecological models, individual-based modeling, evolutionary game theory)
Processing of microscopic images (segmentation, object detection and tracking, classification, anomaly detection)

Python
Java
R
C / C++
Matlab
LaTeX

Development of microservices
Continuous Integration / Continuous Deployment (CI/CD)
Docker
Kafka

PyTorch / PyTorch Lightning / PyTorch Forecasting
TensorFlow / Keras
Kaldi-ASR
CoquiTTS
Deeplearning4j
ImageJ
OpenCV
SciPy
NumPy
Pandas

PyCharm
Jupyter Notebook / Lab
IntelliJ
RStudio
Matlab
Eclipse

Git
SVN
GitHub
GitLab
Bitbucket

OpenProject
Jira
Confluence

ArangoDB
SQL
MariaDB
MongoDB
