ANDREY SYDELOV

Cyprus | Ukraine | EMEA

Senior Data Engineer | AI Solutions Engineer

As a Senior Data Engineer and AI Engineer with over 8 years of experience, I design and build scalable, cloud-native data platforms on AWS, GCP, and Azure, leveraging tools like S3, Redshift, BigQuery, Apache Airflow, Spark, and Kafka. I specialize in creating robust ETL/ELT pipelines, real-time analytics, and data warehouses to support AI-driven solutions.

I manage large-scale datasets across databases such as PostgreSQL, ClickHouse, and Vertica, tuning them for high-throughput querying and efficient data retention. My work powers AI applications in fraud detection, churn prediction, and product analytics by delivering clean, model-ready data.

As an AI Engineer, I architect Retrieval-Augmented Generation (RAG) pipelines and integrate vector databases like Pinecone, Weaviate, and Milvus to enable efficient semantic search and contextual AI solutions. I design scalable data workflows that support embeddings generation, vector indexing, and real-time query processing for intelligent applications.
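The retrieval step of a RAG pipeline can be illustrated with a minimal, self-contained sketch. It assumes embeddings have already been generated; the 3-dimensional vectors and document names below are made up. It ranks documents by brute-force cosine similarity over an in-memory dict, standing in for what a vector database such as Pinecone, Weaviate, or Milvus does at scale with approximate indexes. The helper names `cosine`, `top_k`, and `build_prompt` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, index, k=2):
    """Brute-force nearest neighbours: return the k (doc_id, score) pairs, best first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_prompt(question, hits, passages):
    """Put the retrieved passages into the prompt as grounding context."""
    context = "\n".join(f"- {passages[doc_id]}" for doc_id, _ in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Toy data: hand-written 3-d vectors standing in for real embedding-model output.
index = {
    "refunds": [0.9, 0.1, 0.0],
    "shipping": [0.1, 0.9, 0.1],
    "chargebacks": [0.8, 0.0, 0.3],
}
passages = {
    "refunds": "Refund policy text.",
    "shipping": "Shipping policy text.",
    "chargebacks": "Chargeback handling text.",
}

query_vec = [1.0, 0.0, 0.1]  # pretend embedding of the user's question
hits = top_k(query_vec, index, k=2)
prompt = build_prompt("How are disputed payments handled?", hits, passages)
print(prompt)
```

A production pipeline would replace the brute-force loop with the vector database's own query API and approximate nearest-neighbour index. The ranking idea stays the same.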

Domains of Practice

Infrastructure

Designed cloud infrastructure on AWS and Google Cloud, optimizing for cost and performance.
Configured access policies for secure infrastructure operations.
Tuned PostgreSQL, Vertica, Redshift, Kafka, and Redis for high-performance workloads.
Led AWS-to-Google Cloud migration with minimal downtime.
Built monitoring systems with Prometheus, Grafana, and CloudWatch.

Data Engineering

Built data warehouses on Vertica and lakehouse systems on S3 and Redshift.
Integrated 50+ data sources via APIs for unified data flows.
Orchestrated ETL/ELT pipelines with Apache Airflow for low-latency processing.
Implemented Spark for streaming and batch data workflows.
Designed event-driven systems with AWS Lambda, ensuring reliable data delivery.

AI Engineering

Integrated Weaviate and Pinecone vector databases for efficient semantic search.
Optimized GPU workloads in Kubernetes with Kubeflow for ML training.
Managed 50+ model iterations with MLflow for streamlined experimentation.
Built RAG evaluation frameworks with Haystack to enhance response accuracy.
Monitored 5+ ML models with Arize for performance and reliability.

Data Governance

Built Trusted Data Source platforms with validation and audit trails.
Managed cross-functional teams for secure data exchange across departments.
Formalized GDPR-compliant access policies with security and legal teams.
Provided consultations to enhance data literacy for business teams.
Implemented OpenMetadata for metadata catalogs, enabling observability and lineage tracking.

The Blueprint of a Data Team: Roles, Responsibilities, and Specializations

Data Modeling: From Basics to Advanced Techniques for Business Impact

Apache Spark Deep Dive: Architecture, Internals, and Performance Optimization

Kubernetes Foundations — Architecture and Core Components

Anomaly Detection in Financial Transactions: Algorithms and Applications

Applied Expertise Examples

Multi-Channel Traffic Attribution: The challenge stemmed from fragmented analytics across web (GA360) and mobile (AppsFlyer Data Locker). Diverse marketing channels operated in silos: pop-ups, emails, push notifications, social media, Google Ads, YouTube, promos, targeted campaigns, and retargeting campaigns. As a result, channels competed for the same users, which led to inefficient budget use, redundant user interactions, and lower conversion rates.

A unified attribution system was engineered to close these gaps. It integrated data from all touchpoints to provide end-to-end visibility. A mathematical attribution model was designed to quantify each channel's contribution and cost across millions of users. The model combined Bayesian statistics, time-series analysis, and individualized per-user attribution windows. This enabled balanced budget allocation, with low-cost channels scaling reach and high-value targeted campaigns driving precision. The resulting multi-channel traffic analytics system gave a transparent view of acquisition efficiency through metrics such as CPA, ARPU, and ARPPU. It also allowed more flexible management of user interactions to optimize ROI.
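The model described above relies on Bayesian statistics and time-series analysis, which a short example cannot reproduce. As a much simpler stand-in, the sketch below shows the general shape of multi-touch attribution with individualized windows:

- Each conversion's credit is split across the channels a user touched inside that user's own lookback window.
- Touches are weighted by a time-decay curve.
- Channel spend is divided by the attributed credit to get a CPA.

The half-life, the field layout, and the sample data are hypothetical.

```python
from collections import defaultdict

HALF_LIFE_HOURS = 24.0  # hypothetical: a touch loses half its weight every 24 hours


def attribute(touches, conversion_ts, window_hours):
    """Split one conversion's credit across channels touched inside the user's window.

    touches: list of (channel, timestamp_in_hours). Touches older than window_hours
    (or after the conversion) are ignored. Returned weights sum to 1.
    """
    weights = defaultdict(float)
    for channel, ts in touches:
        age = conversion_ts - ts
        if 0 <= age <= window_hours:
            weights[channel] += 0.5 ** (age / HALF_LIFE_HOURS)
    total = sum(weights.values())
    return {ch: w / total for ch, w in weights.items()} if total else {}


def channel_cpa(conversions, spend):
    """Aggregate attributed credit per channel and divide spend by it."""
    credit = defaultdict(float)
    for conv in conversions:
        for ch, share in attribute(conv["touches"], conv["ts"], conv["window"]).items():
            credit[ch] += share
    return {ch: spend[ch] / credit[ch] for ch in spend if credit[ch] > 0}


# Toy data: each user carries an individualized lookback window (in hours).
conversions = [
    {"touches": [("email", 0), ("google_ads", 24)], "ts": 48, "window": 72},
    {"touches": [("social", -100), ("push", 10)], "ts": 12, "window": 24},
]
spend = {"email": 10.0, "google_ads": 40.0, "push": 5.0, "social": 8.0}
cpa = channel_cpa(conversions, spend)
print(cpa)
```

Zero-credit channels are left out rather than assigned an infinite CPA. Here that is `social`, whose only touch fell outside the user's window.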

Traffic Quality Monitoring: Traffic from otherwise stable sources showed inconsistent quality. Conversion performance sometimes declined unexpectedly even though marketing strategies and platform features were unchanged. Such drops could signal several different issues, for example:

- a supplier mixing in lower-quality traffic;
- a technical fault on payment pages that blocked transactions;
- other user-experience barriers.

Without early detection, budget was wasted and responses were delayed.

A monitoring system was developed collaboratively. It used a suite of predictive models that assess user activity within the first hour after registration and estimate each user's conversion potential. These predictions were aggregated into campaign-level statistics to evaluate traffic quality, including conversion funnel metrics (CR1, CR2) and CPA. Because performance trends became visible early, the system could notify the traffic acquisition team proactively. The team could then intervene in time instead of adjusting after the fact.
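A minimal sketch of the campaign-level aggregation, assuming the first-hour model has already produced a conversion probability for every newly registered user. The metric definitions are assumptions for illustration:

- CR1: registrations per click.
- CR2: expected paying users per registration.
- CPA: spend per expected paying user.

The 30% tolerance and the per-campaign baselines are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CampaignWindow:
    campaign: str
    clicks: int
    spend: float
    # One first-hour model score (probability of converting) per registered user.
    conversion_probs: list = field(default_factory=list)


def campaign_metrics(c):
    """Funnel metrics from clicks, registrations, and predicted payers (assumed definitions)."""
    registrations = len(c.conversion_probs)
    expected_payers = sum(c.conversion_probs)
    return {
        "cr1": registrations / c.clicks if c.clicks else 0.0,
        "cr2": expected_payers / registrations if registrations else 0.0,
        "cpa": c.spend / expected_payers if expected_payers else float("inf"),
    }


def quality_alerts(campaigns, baseline_cr2, tolerance=0.3):
    """Flag campaigns whose predicted CR2 fell more than `tolerance` below baseline."""
    alerts = []
    for c in campaigns:
        m = campaign_metrics(c)
        base = baseline_cr2.get(c.campaign)
        if base is not None and m["cr2"] < base * (1 - tolerance):
            alerts.append({"campaign": c.campaign, "cr2": round(m["cr2"], 3), "baseline": base})
    return alerts


campaigns = [
    CampaignWindow("A", clicks=1000, spend=500.0, conversion_probs=[0.2] * 50),
    CampaignWindow("B", clicks=800, spend=400.0, conversion_probs=[0.05] * 40),
]
alerts = quality_alerts(campaigns, baseline_cr2={"A": 0.21, "B": 0.2})
print(alerts)  # campaign "B" is flagged; alerts like this would go to the acquisition team
```

Working on predicted rather than observed conversions is what makes the alert early: it can fire about an hour after registrations arrive instead of waiting for actual payments.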

User Segmentation & Precision Engagement: Engaging millions of users with uniform strategies proved ineffective because their behaviors and needs vary widely. Without precise targeting, mass campaigns, retargeting efforts, A/B tests, and bonus programs performed poorly.

A segmentation system was developed to sort users into distinct groups based on a wide range of behavioral, demographic, and technical attributes. Each user could belong to one or more groups and was assigned synthetic attributes derived from metadata, which made engagement strategies more flexible. The system improved the targeting of mailings, bonus programs, and retargeting campaigns. It also supported RFM (Recency, Frequency, Monetary) analysis for optimizing user interactions. For instance, it detected declining interest early, enabling proactive engagement that raised LTV before churn occurred. Group-specific retargeting also improved retention, increasing overall campaign effectiveness and user satisfaction.
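RFM analysis can be sketched as rank-based scoring followed by rule-based group assignment, where one user may land in several groups. The three-level scoring is a simple stand-in for quantile binning. The group names (`champions`, `high_value`, `at_risk`) and their thresholds are hypothetical choices for illustration, not the rules of the system described.

```python
from datetime import date


def rank_scores(values, higher_is_better=True, bins=3):
    """Rank-based score in 1..bins (1 = worst); a simple stand-in for quantile binning."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=not higher_is_better)
    scores = [0] * len(values)
    for rank, i in enumerate(order):
        scores[i] = rank * bins // len(values) + 1
    return scores


def rfm_groups(users, today):
    """Score each user on Recency, Frequency, Monetary and assign zero or more groups."""
    ids = list(users)
    recency = [(today - users[u]["last_purchase"]).days for u in ids]
    r = rank_scores(recency, higher_is_better=False)  # fewer days since purchase is better
    f = rank_scores([users[u]["orders"] for u in ids])
    m = rank_scores([users[u]["revenue"] for u in ids])
    groups = {}
    for i, u in enumerate(ids):
        g = []
        if r[i] == 3 and f[i] == 3:
            g.append("champions")
        if m[i] == 3:
            g.append("high_value")
        if r[i] == 1 and f[i] >= 2:  # used to buy often, but not recently
            g.append("at_risk")
        groups[u] = g
    return groups


users = {
    "u1": {"last_purchase": date(2024, 5, 30), "orders": 20, "revenue": 900.0},
    "u2": {"last_purchase": date(2024, 4, 2), "orders": 15, "revenue": 300.0},
    "u3": {"last_purchase": date(2024, 5, 22), "orders": 2, "revenue": 50.0},
}
groups = rfm_groups(users, today=date(2024, 6, 1))
print(groups)
```

The `at_risk` rule illustrates the "declining interest" case: a previously frequent buyer whose recency score has dropped can be targeted before churning.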

Duplicate Account Analysis: Multiple accounts created by the same individual are a common problem, driven by various motivations:

- bypassing platform rules;
- evading account bans;
- conducting suspicious billing activity;
- orchestrating coordinated actions, such as networks of accounts that target the platform or its users with fraud or manipulation.

A user analysis system was built on a graph database to evaluate a broad range of attributes. These included device fingerprints, personal data points such as email and IP addresses, and behavioral patterns such as login frequency and session activity. Detected duplicate accounts were placed under observation, and cases showing suspicious activity were escalated for additional KYC verification. This multi-layered approach provided continuous monitoring across the platform, reducing vulnerabilities, mitigating financial risk, and protecting the overall user experience.
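The core idea of linking accounts through shared attributes can be shown with an in-memory union-find instead of a graph database. Every account that shares a device fingerprint, email, or IP address with another is merged into the same connected component. This is a simplified sketch, and the attribute keys are assumptions. A real system would weight links rather than treat every match as equal; a shared IP, for instance, can come from NAT or a public network.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def duplicate_clusters(accounts, link_keys=("device_fp", "email", "ip")):
    """Return groups of 2+ accounts connected by any shared attribute value."""
    uf = UnionFind()
    first_seen = {}  # (key, value) -> first account that had this value
    for acc_id, attrs in accounts.items():
        uf.find(acc_id)  # register the account even if it links to nothing
        for key in link_keys:
            value = attrs.get(key)
            if value is None:
                continue
            owner = first_seen.setdefault((key, value), acc_id)
            if owner != acc_id:
                uf.union(owner, acc_id)
    clusters = defaultdict(set)
    for acc_id in accounts:
        clusters[uf.find(acc_id)].add(acc_id)
    return [c for c in clusters.values() if len(c) > 1]


accounts = {
    "a1": {"device_fp": "fp1", "email": "x@example.com"},
    "a2": {"device_fp": "fp1", "email": "y@example.com"},
    "a3": {"email": "y@example.com", "ip": "10.0.0.5"},
    "a4": {"email": "z@example.com", "ip": "10.0.0.9"},
}
clusters = duplicate_clusters(accounts)
print(clusters)  # a1-a2 share a device, a2-a3 share an email, so all three form one cluster
```

Transitive links are the point of a graph approach: `a1` and `a3` share no attribute directly but still end up in the same cluster through `a2`.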

Personalized Analytics Delivery: Teams often faced information overload from hundreds of daily dashboards, diluting focus and hindering decision-making across the organization.

To address this, an analytics delivery system was developed. Each user could subscribe to specific dashboards and apply custom filters, such as time period, region, or key performance indicators. Users could also configure the update frequency and choose a delivery channel to fit their workflow: a Slack channel, a direct message, or a group chat. The Qlik Sense integration let recipients review summarized insights directly in Slack. If needed, they could click through to the analytics platform to explore other dimensions and data cuts.
The system’s flexibility and personalization proved highly effective, gaining widespread adoption across the company as teams recognized its ability to streamline access to actionable insights.
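A subscription model like the one described can be sketched as a small data structure plus a scheduling check. The field names, the channel notation (`#channel` or `@user`), and the message format are hypothetical. Actual delivery to Slack and the Qlik Sense summary are left out, since they depend on those services' own APIs.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Subscription:
    user: str
    dashboard: str
    channel: str        # hypothetical notation: "#channel" or "@user" for a direct message
    every: timedelta    # delivery frequency chosen by the subscriber
    filters: dict = field(default_factory=dict)  # e.g. {"region": "EU"}


def due(subs, last_sent, now):
    """Subscriptions whose interval has elapsed since their last delivery (never sent = due)."""
    return [s for s in subs
            if now - last_sent.get((s.user, s.dashboard), datetime.min) >= s.every]


def render(sub, summary):
    """Short message: which dashboard, which filters, and the pre-computed summary text."""
    filt = ", ".join(f"{k}={v}" for k, v in sorted(sub.filters.items())) or "no filters"
    return f"[{sub.dashboard}] ({filt}) {summary}"


subs = [
    Subscription("alice", "revenue", "@alice", timedelta(days=1), {"region": "EU"}),
    Subscription("bob", "traffic", "#marketing", timedelta(weeks=1)),
]
last_sent = {
    ("alice", "revenue"): datetime(2024, 6, 1, 8, 0),
    ("bob", "traffic"): datetime(2024, 5, 30, 8, 0),
}
now = datetime(2024, 6, 2, 9, 0)
ready = due(subs, last_sent, now)
for sub in ready:
    # A real system would post this to sub.channel via the chat platform's API.
    print(sub.channel, render(sub, "Summary for 2024-06-01"))
```

Keeping the schedule (`every`) and the filters on each subscription is what lets every recipient receive a different cut of the same dashboard.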

Billing Fraud Monitoring: Platforms were vulnerable to significant financial and operational damage due to undetected suspicious user activities, particularly those involving fraudulent financial transactions.

To safeguard the company from such risks, a comprehensive monitoring system was developed, focusing on user actions within the billing interface. The system analyzed a wide range of indicators:

- the frequency and volume of failed payment attempts;
- patterns of repeated rejections;
- detailed feedback from dozens of integrated billing systems, such as declined transactions or flagged security alerts.
Through this analysis, the system identified behavior that deviated from typical user patterns, such as rapid successive attempts with multiple card numbers or unusual transaction timing. When such red flags appeared, the system automatically compiled detailed reports on the affected users. It then escalated them to both the billing department and the KYC team for investigation. This proactive approach reduced the company’s exposure to financial losses and also lightened the operational load on the billing team.
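One of these red flags, rapid successive attempts with multiple card numbers, can be expressed as a per-user sliding-window rule. In this sketch the following are hypothetical:

- the event layout and the `"declined"` status value;
- the thresholds: 5 declines using at least 3 distinct card tokens within 10 minutes.

Card tokens stand in for raw card numbers.

```python
from collections import defaultdict, deque


def flag_card_cycling(events, window_s=600, min_failures=5, min_cards=3):
    """Flag users with many declined attempts across several cards in a short window.

    events: (user_id, ts_seconds, card_token, status) tuples sorted by timestamp.
    """
    recent = defaultdict(deque)  # user -> deque of (ts, card_token) for recent declines
    flagged = {}
    for user, ts, card, status in events:
        if status != "declined" or user in flagged:
            continue
        win = recent[user]
        win.append((ts, card))
        # Drop declines that fell out of the sliding window (the newest one always stays).
        while ts - win[0][0] > window_s:
            win.popleft()
        cards = {c for _, c in win}
        if len(win) >= min_failures and len(cards) >= min_cards:
            flagged[user] = {"flag_ts": ts, "declines": len(win), "distinct_cards": len(cards)}
    return flagged


events = sorted(
    [("u1", t, c, "declined")
     for t, c in zip([0, 60, 120, 180, 240], ["c1", "c2", "c3", "c4", "c1"])]
    + [("u2", t, "k1", "declined") for t in [0, 3600, 7200, 10800, 14400]]
    + [("u2", 14500, "k1", "approved")],
    key=lambda e: e[1],
)
flagged = flag_card_cycling(events)
print(flagged)  # only u1 is flagged: 5 declines on 4 cards within 4 minutes
```

Flagged entries would then feed the reporting and escalation step. A real deployment would combine many such rules with the billing systems' own feedback rather than rely on a single threshold.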

Python

Prometheus

Apache Airflow

MySQL

AWS

Data Vault 2.0

TensorFlow

Grafana

Tableau

DBT

Apache Kafka

PostgreSQL