Artefact Research Center

Bridging the gap between academia and industry applications

Research on more transparent and ethical models to nurture AI business adoption.

Examples of AI biases

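AppleCard grants mortgages based on racist criteria
Lensa AI sexualizes selfies of women
Facebook's racist image classification labels African Americans as apes
Microsoft's Twitter chatbot became Nazi, sexist and aggressive
ChatGPT writes code stating that good scientists are white males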

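Current challenges

AI models are accurate and easy to deploy in many use cases, but they remain hard to fully control because of black-box and ethical issues.

The Artefact Research Center's mission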

A complete ecosystem that bridges the gap between
fundamental research and tangible industrial applications.

Emmanuel MALHERBE

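Research Director

Research fields: deep learning, machine learning

Starting with a PhD on natural language processing (NLP) models applied to e-recruitment, Emmanuel has always sought an efficient balance between pure research and impactful applications. His research experience includes 5G time series forecasting for Huawei and computer vision models for L'Oréal's hairdressing and makeup customers. Before joining Artefact, he worked in Shanghai as head of AI research for L'Oréal Asia. Today, his position at Artefact offers an ideal opportunity and environment to bridge the gap between academia and industry, and to foster his real-world research while impacting industrial applications.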

Transversal research fields

With our unique positioning, we aim to address general challenges of AI, whether in statistical modelling or in management research.
These questions are transversal to all our subjects and nurture our research.

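Control & accountability

Controllable models with guarantees on their predictions
Interfacing with demand planners and category managers
Decision making from optimal model inputs: reliable predictions even outside the training set
For example: ensuring monotonicity with respect to input variables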
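As an illustration of the monotonicity example above, here is a minimal sketch of one simple way to enforce it after the fact: projecting raw forecasts onto the closest non-increasing sequence with the pool-adjacent-violators algorithm. The scenario (demand forecasts that should not rise as price rises), the function names and the numbers are assumptions made for this example; it is not the research center's method, which may instead constrain the model during training.

```python
def pool_adjacent_violators(values):
    """Least-squares projection of `values` onto non-decreasing sequences."""
    blocks = []  # each block stores [sum of values, number of values]
    for value in values:
        blocks.append([value, 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def enforce_non_increasing(values):
    """Same projection for non-increasing sequences, via a sign flip."""
    return [-v for v in pool_adjacent_violators([-v for v in values])]


# Hypothetical raw demand forecasts for six increasing price points.
raw_forecasts = [100, 90, 95, 70, 72, 60]
controlled = enforce_non_increasing(raw_forecasts)
print(controlled)  # [100.0, 92.5, 92.5, 71.0, 71.0, 60.0]
```

The two violations (95 after 90, 72 after 70) are averaged with their neighbours, so the corrected forecasts never increase with price.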

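Explainability & transparency

Explanation of predictions
Interfaces and visualizations for non-technical users
Adapting model modules and components to the industry
Visualization over understandable inputs, before feature engineering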

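Bias & uncertainty

Enriching predictions for better decision making
Asymmetric uncertainty as needed by customers (as opposed to Gaussian uncertainty)
Applied to time series and assortment optimization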
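One common way to obtain asymmetric uncertainty is to estimate quantiles with the pinball (quantile) loss instead of assuming a symmetric Gaussian interval. The sketch below is an assumption-laden illustration: the sales figures are made up, and the "model" is just the best constant prediction for each quantile.

```python
def pinball_loss(y_true, y_pred, q):
    """Average pinball loss for quantile level q (0 < q < 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-predictions are weighted by q, over-predictions by (1 - q).
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)


def best_constant(y, q):
    """Observed value that minimises the pinball loss as a constant forecast."""
    candidates = sorted(set(y))
    return min(candidates, key=lambda c: pinball_loss(y, [c] * len(y), q))


# Hypothetical daily sales with two promotion spikes (right-skewed).
sales = [10, 12, 11, 13, 40, 12, 11, 14, 35, 12, 13]
lower = best_constant(sales, 0.1)
median = best_constant(sales, 0.5)
upper = best_constant(sales, 0.9)
print(lower, median, upper)  # 11 12 35
```

The resulting interval reaches 1 unit below the median but 23 units above it, which a Gaussian interval centred on the prediction cannot express.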

Obstacles & accelerators of AI in business

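Organizational research
Interviews of stakeholders and decision makers at top CAC 40 companies
Impact of AI ethics, fairness and explainability
Governance, standards and regulations for AI adoption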

Subjects

We work on several PhD topics at the intersection of industrial use cases and state-of-the-art limitations.
For each subject, we work in collaboration with university professors and have access to industrial data that allows us to address the major research areas in a given real-world scenario.

1 — Forecasting & pricing

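We model time series as a whole with controllable multivariate forecasting models. Such modelling lets us address pricing and promotion planning by finding the optimal parameters that increase the sales forecast. With this holistic approach, we aim to capture the cannibalization and complementarity between products. This will allow us to control the forecast and guarantee that predictions remain consistent.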

Mohamed CHTIBA

Research Scientist
Forecasting & pricing

Research fields: deep learning, optimization, statistics

Jean-Marc BARDET

Professor

SAMM laboratory

Scholar page

Research fields: stochastic processes, statistics, probability theory

Joseph RYNKIEWICZ

Associate Professor

SAMM laboratory

Scholar page

Research fields: time series, neural networks, statistics

2 — Explainable and controllable scoring

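A widely used family of machine learning models is based on decision trees: random forests and boosting methods. While their accuracy is often state of the art, these models come across as black boxes that give users limited control. We aim to improve their explainability and transparency, in particular by improving the estimation of SHAP values on imbalanced datasets. We also aim to provide guarantees for such models, for example on out-of-training samples or through better enforcement of monotonicity constraints.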
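For readers unfamiliar with SHAP values, they approximate Shapley values of a prediction. The sketch below computes exact Shapley values by brute force for a toy model, using a baseline input to represent "absent" features. The toy model, the baseline and the function names are assumptions for illustration; this is neither the SHAP library nor the estimator studied in this project, and enumerating all orderings is only feasible for a handful of features.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)
    contributions = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous_output = model(current)
        for i in order:
            # Switch feature i from its baseline value to its actual value.
            current[i] = x[i]
            output = model(current)
            contributions[i] += output - previous_output
            previous_output = output
    return [c / len(orderings) for c in contributions]


def toy_model(features):
    """Hypothetical model: one additive term and one interaction term."""
    return 2 * features[0] + features[1] * features[2]


phi = shapley_values(toy_model, x=[1, 2, 3], baseline=[0, 0, 0])
print(phi)  # [2.0, 3.0, 3.0]
```

The additive feature receives its full effect (2), while the interaction `x1 * x2 = 6` is split equally between the two features involved. The values sum to the gap between the prediction and the baseline output (8).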

Abdoulaye SAKHO

Research Scientist
Decision-tree-based models

Research fields: statistics, explainable AI

Erwan SCORNET

Professor

LPSM laboratory

Scholar Page

Research fields: random forests, interpretability, missing values

3 — Assortment optimization

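Assortment is a major business problem for retailers when selecting the items to sell in their stores. Using large industrial datasets and neural networks, we aim to build more robust and interpretable models that better capture customers' choices when facing an assortment of products. Dealing with cannibalization and complementarity between products, as well as gaining a better understanding of customer clusters, is key to finding a more optimal set of products in a store.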
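To make the problem concrete, here is a minimal sketch of assortment optimization under a multinomial logit (MNL) choice model, one of the simplest discrete choice models. The products, attractiveness weights and prices are made up, and the brute-force search only works for tiny catalogues. MNL captures cannibalization but not complementarity, so it is a baseline rather than the kind of model this project aims at.

```python
from itertools import combinations


def choice_probabilities(assortment, weights, no_purchase_weight=1.0):
    """MNL purchase probabilities; weights play the role of exp(utility)."""
    denominator = no_purchase_weight + sum(weights[p] for p in assortment)
    return {p: weights[p] / denominator for p in assortment}


def expected_revenue(assortment, weights, prices):
    """Expected revenue per customer for a given assortment."""
    probabilities = choice_probabilities(assortment, weights)
    return sum(prices[p] * probabilities[p] for p in assortment)


def best_assortment(products, weights, prices):
    """Brute-force search over all non-empty subsets of products."""
    candidates = [
        subset
        for size in range(1, len(products) + 1)
        for subset in combinations(products, size)
    ]
    return max(candidates, key=lambda s: expected_revenue(s, weights, prices))


# Hypothetical catalogue: C is popular but cheap.
products = ["A", "B", "C"]
weights = {"A": 1.0, "B": 1.0, "C": 2.0}
prices = {"A": 10.0, "B": 8.0, "C": 4.0}

best = best_assortment(products, weights, prices)
best_revenue = expected_revenue(best, weights, prices)
full_revenue = expected_revenue(tuple(products), weights, prices)
```

In this example the best assortment is `("A", "B")` with an expected revenue of about 6.0, versus about 5.2 when all three products are offered: adding the cheap, popular product C cannibalizes sales of the more profitable ones.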

Vincent AURIAU

Research Scientist
Assortment optimization

Research fields: deep learning, operations research

Vincent MOUSSEAU

Professor

MICS laboratory

Scholar Page

Research fields: preference learning, multi-criteria decision analysis, operations research

Antoine DESIR

Associate Professor

TOM laboratory

Scholar Page

Research fields: choice modelling, assortment optimization, operations research

Ali AOUAD

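Assistant Professor

Management Science and Operations

DBLP Page

Research fields: dynamic matching, choice modelling, assortment and inventory optimization, approximation algorithms, operations research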

4 — AI Adoption in businesses

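The challenge of better AI adoption in businesses is, on the one hand, to improve AI models and, on the other hand, to understand the human and organizational aspects. At the crossroads of qualitative management research and social research, we seek to explore the difficulties companies face when adopting AI tools. Existing frameworks on innovation adoption are not fully suited to machine learning innovations, because AI comes with typical differences in terms of regulation, people training or biases, especially when it comes to Gen AI.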

Lara ABDEL HALIM

Research Scientist
AI adoption in businesses

Research fields: management research, innovation

Cécile CHAMARET

Professor

CRG laboratory

Scholar Page

Research fields: innovation, marketing, qualitative social research

5 — Data-driven sustainability

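Using both qualitative and quantitative research methods, this project will address two key questions: how can companies effectively measure social and environmental sustainability performance? And why do sustainability measures often fail to bring significant changes to organizational practices?

On the one hand, the project aims to explore data-driven metrics and identify indicators that align organizational processes with social and environmental sustainability goals. On the other hand, it will focus on turning these sustainability measures into concrete actions within companies.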

Oualid Mokhantar

Research Scientist
Sustainability

Research fields: management research, economics

Gorgi KRLEV

Associate Professor

Sustainability Department

Scholar Page

Research fields: sustainability, social innovation, organization theory

6 — Bias in computer vision

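When a model makes a prediction from an image, for example one showing a face, it has access to sensitive information such as ethnicity, gender or age, which can bias its reasoning. We aim to develop a framework to measure such bias mathematically and to propose methods that reduce it during model training. Furthermore, our approach would statistically detect zones of strong bias, in order to explain, understand and control where such models reinforce the bias present in the data.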
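As a point of reference for what "measuring bias mathematically" can look like, the sketch below computes the demographic parity gap, a standard fairness metric: the difference in positive-prediction rates between groups. It is a generic illustration with made-up predictions and group labels, not the framework developed in this project.

```python
from collections import defaultdict


def positive_rates(predictions, groups):
    """Share of positive (1) predictions within each sensitive group."""
    counts = defaultdict(int)
    positives = defaultdict(int)
    for prediction, group in zip(predictions, groups):
        counts[group] += 1
        positives[group] += prediction
    return {group: positives[group] / counts[group] for group in counts}


def demographic_parity_gap(predictions, groups):
    """Largest difference in positive rates between any two groups."""
    rates = positive_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical binary predictions of an image classifier for two groups.
predictions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = positive_rates(predictions, groups)
gap = demographic_parity_gap(predictions, groups)
print(rates, gap)  # {'a': 0.75, 'b': 0.25} 0.5
```

A gap of 0 would mean both groups receive positive predictions at the same rate; here group "a" is favoured by 50 percentage points.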

Veronika SHILOVA

Research Scientist
Bias in computer vision

Research fields: deep learning, computer vision, bias

Laurent RISSER

CNRS Research Engineer

Toulouse Mathematics Institute

Scholar Page

Research fields: interpretable machine learning, image analysis, explainable and robust AI

Jean-Michel LOUBES

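Professor

Toulouse Mathematics Institute

Scholar Page

Research fields: unbiased learning, explainable AI, optimal transport and applications in statistics, machine learning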

7 — LLM for information retrieval

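A major application of large language models (LLMs) is to pair them with a corpus of documents that represents some industrial knowledge or information. In such a setting, there is an information retrieval step, for which LLMs show some limitations, such as an input text size that is too small for indexing documents. Similarly, hallucinations can occur in the final answer, which we aim to detect using the retrieved documents and the model's uncertainty at inference time.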
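A common workaround for the limited input size is to split each document into overlapping chunks before indexing, so that every chunk fits the model's input window. The sketch below illustrates this under simplifying assumptions: it counts whitespace-separated words rather than model tokens, and the function name and parameters are hypothetical.

```python
def chunk_words(text, chunk_size, overlap):
    """Split text into chunks of `chunk_size` words sharing `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk already reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


chunks = chunk_words("a b c d e f g h i j", chunk_size=4, overlap=1)
print(chunks)  # ['a b c d', 'd e f g', 'g h i j']
```

The overlap keeps some shared context between neighbouring chunks, at the cost of indexing slightly more text.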

Hippolyte GISSEROT-BOUKHLEF

Research Scientist
LLM for information retrieval

Research fields: deep learning, natural language processing

Pierre COLOMBO

Associate Professor

MICS laboratory

Scholar Page

Research fields: large language models, bias in AI, model evaluation

Céline HUDELOT

Professor

MICS laboratory

Scholar Page

Research fields: knowledge representation, semantic interpretation, neural networks

Artefact’s part-time researchers

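In addition to our team fully dedicated to research, several collaborators spend part of their time on scientific research and on publishing papers. Working as consultants also inspires them with the real-world problems encountered by our clients.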

Michael Voelske

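Research fields

Applications of large language models in information retrieval and natural language processing (NLP)

Explainable models in machine learning, retrieval and ranking

Information retrieval (IR) for complex, task-based information needs

Since May 2022, I have led the Data Science and Engineering team at Artefact Germany, where I apply my academic background in computer science (with a Ph.D. in machine learning and information retrieval) to solving business problems for Artefact's clients. My work involves not only leading the team, but also inspiring it to combine cutting-edge AI research with practical applications. I am passionate about making complex AI concepts accessible, and I strive to use technology for innovative business solutions and meaningful societal impact.

Evan Hurwitz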

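Research fields

Reinforcement learning

Machine learning (ML)

Finance and game theory

Evan holds a Ph.D. in AI engineering, during which he applied AI techniques to optimize an actively managed portfolio using multiple trading strategies. He carried out research in academia and co-authored the book "Artificial Intelligence and Economic Theory: Skynet in the Market". He then researched green energy solutions using reinforcement learning at S&P Global Platts, before working at Preqin on understanding and integrating alternative investment data. He joined Artefact in 2020 and has gained extensive experience across many industries, including retail, cybersecurity, software as a service (SaaS), engineering, education and real estate, with clients ranging from SMEs to FTSE 100 companies.

George Cevora

Research fields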

Neuroscience

Deep learning

Machine learning

George received his Ph.D. in Theoretical Neuroscience from the University of Cambridge for his work on the mathematical modeling of animal learning. George has 10 years of research experience in deep learning, which he is now applying in industrial settings. Since leaving academia, George has worked across a wide range of industries and problem domains, from jet engines to antibiotic resistance. George has also spent a few years in the area of national security, building a product to combat discrimination resulting from the inappropriate use of AI. Learn more at www.cevora.xyz

Savio Rozario

Research fields

Machine learning

Non-linear optimization

Physics

Savio holds a Ph.D. in experimental laser plasma physics from Imperial College London, where he used machine learning methods to optimize the experimental configuration of highly nonlinear plasma accelerator systems. He worked at EY in their tax R&D department, developing machine learning solutions for compliance monitoring across multiple geographies using large language models. He joined Artefact in 2022 and has delivered end-to-end data science solutions across a variety of sectors including retail, transport and real estate for FTSE 250 organizations.

Nelson Peace

Nelson spent the first decade of his career in a combination of equity and commodity markets, where he deployed quantitative trading strategies in OTC markets. After completing his MSc in Data Science in 2021, he joined Artefact’s UK office as a data scientist, where he works on data science problems across a range of domains, with expertise in AI applications in financial markets and trading.

Publications

- Khalid Al Khatib, Michael Voelske, Anh Le, Shahbaz Syed, Martin Potthast and Benno Stein. “A New Dataset for Causality Identification in Argumentative Texts.” In Proceedings of the 24th Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), ACL (2023).
- Glen Hopkins and Kristjan Kalm. “Classifying Complex Documents: Comparing Bespoke Solutions to Large Language Models.” arXiv preprint arXiv:2312.07182 (2023).
- Olivier Turnbull and George Cevora. “Instability of computer vision models is a necessary result of the task itself.” arXiv preprint arXiv:2310.17559 (2023).
- Marcel Marais, Máté Hartstein and George Čevora. “Using linear initialisation to improve speed of convergence and fully-trained error in Autoencoders.” arXiv preprint arXiv:2311.10699 (2023).
- Evan Hurwitz, Nelson Peace, and George Cevora. “Achieving Stable Training of Reinforcement Learning Agents in Bimodal Environments through Batch Learning.” arXiv preprint arXiv:2307.00923 (2023).
- Savio Rozario and George Čevora. “Explainable AI does not provide the explanations end-users are asking for.” arXiv preprint arXiv:2302.11577 (2023).
- Vincent Auriau, Emmanuel Malherbe and Matthieu Perrot. “Weak Segmentation-Guided GAN for Realistic Color Edition.” In International Conference on Image Analysis and Processing, Springer Nature Switzerland (2023).
- Maté Hartstein and George Čevora. “Data-driven method for navigating the Atlantic in a rowing race.”
- Evan Hurwitz and George Čevora. “Forecasting performance of workforce reskilling programmes.” arXiv preprint arXiv:2107.10001 (2021).

Medium blog articles by our tech experts.

Assortment Optimization with discrete choice models in Python

Assortment optimization is a critical process in retail that involves curating the ideal mix of products to meet consumer demand while taking into account the many logistics...

Is Preference Alignment Always the Best Option to Enhance LLM-Based Translation? An Empirical Analysis

Neural metrics for machine translation (MT) evaluation have become increasingly prominent due to their superior correlation with human judgments compared to traditional lexical metrics

Choice-Learn: Large-scale choice modeling for operational contexts through the lens of machine learning

Discrete choice models aim at predicting choice decisions made by individuals from a menu of alternatives, called an assortment. Well-known use cases include predicting a...

The era of generative AI: what's changing

The abundance and diversity of responses to ChatGPT and other generative AIs, whether skeptical or enthusiastic, demonstrate the changes they're bringing about and the impact...

How Artefact managed to develop a fair yet simple career system for software engineers

In today’s dynamic and ever-evolving tech industry, a career track can often feel like a winding path through a dense forest of opportunities. With rapid...

Why you need Large Language Model Operations (LLMOps)

This article introduces LLMOps, a specialised branch merging DevOps and MLOps for managing the challenges posed by Large Language Models (LLMs)...

Unleashing the Power of LangChain Expression Language (LCEL): from proof of concept to production

LangChain has become one of the most used Python libraries to interact with LLMs in less than a year, but LangChain was mostly a library...

How we handled profile ID reconciliation using Treasure Data Unification and SQL

In this article we explain the challenges of ID reconciliation and demonstrate our approach to create a unified profile ID in Customer Data Platform, specifically...

Snowflake's Snowday '23: snowballing into data science success

As we reflect on the insights shared during the ‘Snowday’ event on November 1st and 2nd, a cascade of exciting revelations about the future of...

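How we interview and hire software engineers at Artefact

We go through the skills we are looking for, the different steps of the process and the commitments we make to all candidates.

Encoding categorical features in forecasting: are we all doing it wrong?

We propose a novel method for encoding categorical features specifically tailored for forecasting applications.

How we deployed a simple wildlife monitoring system on Google Cloud

We collaborated with Smart Park, a Dutch company that provides advanced sensor solutions to conserve endangered wildlife...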
