With the continuous development of artificial intelligence technology, its fields of application have gradually expanded. To apply deep reinforcement learning to dynamic pricing, we build an intelligent dynamic pricing system, introduce the reinforcement learning techniques relevant to dynamic pricing, and review existing research in terms of the number of suppliers (single supplier and multiple suppliers), environmental models, and algorithm selection. A two-period dynamic pricing game model is designed to assess the optimal pricing strategy for e-commerce platforms under two market conditions and two consumer participation conditions. First, the pricing strategies of e-commerce platforms in mature markets are analyzed: the optimal prices and profits of the enterprises under different strategy combinations are derived, the resulting market equilibria are compared, and the Nash equilibrium is solved. Then, assuming that all consumers in the market are naive, the pricing strategy of duopoly e-commerce platforms in emerging markets is analyzed; by comparing the optimal price and total profit of each enterprise under different strategy combinations, the subgame perfect Nash equilibrium is solved. Finally, assuming that all consumers in the market are experienced, the pricing strategy of duopoly e-commerce platforms in emerging markets is analyzed.
With the development of the Internet and the popularization of e-commerce, it has become easier for people to obtain comprehensive information on goods and services. Changes in the price of goods or services affect consumers’ shopping behavior almost immediately, which directly affects corporate profits. To maximize revenue, companies often adjust the prices of goods or services, regularly or irregularly, based on certain factors. This is consistent with the goal of deep reinforcement learning in the field of artificial intelligence, which is to maximize long-term benefits. Therefore, deep reinforcement learning can be used to price goods or services intelligently. E-commerce purchase behavior prediction estimates an online customer’s purchase tendency in real time from the behavioral patterns contained in the customer’s historical clicks, server logs, browsing records and product feedback. Based on these predictions, the platform can recommend products, formulate marketing strategies, and plan the purchase and shipment of its products.
Dynamic pricing is a strategy for enterprises to dynamically adjust commodity prices based on customer demand, their own supply capacity and other information to maximize revenues [
Therefore, an in-depth study of deep reinforcement learning methods in the field of dynamic pricing is of great significance to the development of artificial intelligence, of deep reinforcement learning methods, and of their applications in dynamic pricing and other fields. We review deep reinforcement learning technology and its specific application in dynamic pricing. First, based on existing work on dynamic pricing, the key technologies of deep reinforcement learning are introduced. Then, the application of deep reinforcement learning in dynamic pricing is reviewed from different perspectives, and its advantages and disadvantages are analyzed. Next, we systematically review platform pricing theory and differential pricing theory, use game theory as the main research method to establish a pricing game model for competing platform enterprises, and analyze the impact of network externalities, consumer switching costs and enterprise pricing strategies on market equilibrium in mature and emerging markets, in order to systematically analyze the dynamic pricing behavior of platform companies. The first section of this paper is the introduction. The second section introduces the construction of the e-commerce dynamic pricing model based on data mining, the third section studies the deep reinforcement learning transaction recognition model, and the fourth section studies the e-commerce dynamic pricing model. The results and discussion are given in the fifth section, and the sixth section is a summary.
At present, application research on data mining in e-commerce focuses mainly on customer relationship management. Although some scholars have proposed applying data mining technology to e-commerce dynamic pricing, many of these proposals are scattered and general. They remain at the level of theoretical analysis, without comprehensive and systematic application analysis, and they lack an overall grasp of how data mining applies to the dynamic pricing of e-commerce, so the effectiveness of data mining cannot be fully utilized. To this end, this article establishes a dynamic pricing model for e-commerce based on data mining and proposes applying data mining technology to dynamic pricing decisions, which will be of great help to e-commerce companies in pricing decisions. The model is composed of three layers, namely, the data layer, the analysis layer and the decision layer, from top to bottom [
The task of the data layer is to collect data related to pricing decisions and preprocess these data to form a data warehouse to prepare for the next stage of data mining.
After the data source is selected, the data must be collected in a timely and high-quality manner and imported into a series of data files, usually stored in a database. Part of the data can be generated and captured automatically from online activity, but enterprises also need to build a basic database themselves and update it in time according to inventory, market and sales reports. The data collected through various channels may contain considerable redundancy, and some of the data may be inaccurate, incomplete, or inconsistent. The data must therefore be preprocessed through extraction, verification, cleaning, conversion, integration and other processes to improve data quality, form a data collection suitable for data mining, and load it into the data warehouse.
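The preprocessing steps described above can be illustrated with a minimal sketch. The record fields (`order_id`, `price`, `quantity`) and the cleaning rules are assumptions made for illustration only; the paper does not specify a schema or concrete rules.

```python
from statistics import mean

def preprocess(records):
    """Verify, deduplicate, fill and convert raw pricing records.
    The field names are hypothetical, chosen only for this sketch."""
    # Verification: drop rows with a missing key or a non-positive price.
    valid = [r for r in records
             if r.get("order_id") is not None
             and r.get("price") is not None and r["price"] > 0]

    # Cleaning: remove redundant rows that repeat an order_id (keep the first).
    seen, unique = set(), []
    for r in valid:
        if r["order_id"] not in seen:
            seen.add(r["order_id"])
            unique.append(dict(r))

    # Incomplete data: fill missing quantities with the mean of observed ones.
    known = [r["quantity"] for r in unique if r.get("quantity") is not None]
    fill = mean(known) if known else 0
    for r in unique:
        if r.get("quantity") is None:
            r["quantity"] = fill

    # Conversion: derive a revenue field for later mining.
    for r in unique:
        r["revenue"] = r["price"] * r["quantity"]
    return unique

raw = [
    {"order_id": 1, "price": 10.0, "quantity": 2},
    {"order_id": 1, "price": 10.0, "quantity": 2},     # redundant duplicate
    {"order_id": 2, "price": -5.0, "quantity": 1},     # inaccurate price
    {"order_id": 3, "price": 8.0, "quantity": None},   # incomplete row
    {"order_id": None, "price": 9.0, "quantity": 1},   # missing key
]
clean = preprocess(raw)
```

In a real deployment the cleaned rows would be loaded into the data warehouse rather than kept in memory.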
The main tasks of the analysis layer are to use data mining models and related algorithms to analyze and process the data obtained, to mine knowledge useful for dynamic pricing decisions, and to form the initial knowledge base. The realization of this stage is the core of the whole model construction. In dynamic pricing decision support, methods such as association rules, classification, clustering, and sequential pattern analysis can be used.
Association analysis aims to mine the relationships or rules hidden in a database or data warehouse, that is, to discover the laws of dependence or association between one event and other events. In e-commerce dynamic pricing, association analysis can be used to find patterns in customers’ visits to and purchases of various products on a website, to determine the associations within customer buying behavior, and to relate customer buying behavior to product prices and other product information. The relationships between these types of information can be used to further discover the relationship between demand and price, which is an important input for dynamic pricing decisions. The Apriori algorithm can be applied to the collected basic customer data and transaction data to discover the details of the customers’ purchase associations [
The decision layer is a key part of the realization of the entire model. The main task of this layer is to make dynamic pricing decisions based on the knowledge base established by the analysis layer, combined with the business strategy of the enterprise.
Through the data mining performed in the analysis layer, one can obtain the access patterns, purchase patterns, habits and preferences of different customer groups; the correlations between price, demand and sales of goods, as well as the number of customers interested in the goods and the sales volume; the predicted values of time series such as inventory data; and so on. Using this basic knowledge, the seller can make preliminary dynamic pricing decisions. In the time-based strategy, the appropriate initial price is determined first, taking into account factors such as historical sales data and cost information; then, given the initial maximum or minimum price, a double price change basis can be used to adjust the price by setting a time threshold on the quantity of goods or demand, and then controlling the time and range of the price changes [
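As an illustration of how the Apriori algorithm mentioned above finds purchase associations, the following is a minimal pure-Python sketch. The baskets, item names and support threshold are hypothetical and are not taken from the paper’s data.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {i for t in transactions for i in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}
    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in frequent
                             for sub in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    return frequent[a | c] / frequent[a]

# Hypothetical shopping baskets.
baskets = [
    {"phone", "case"},
    {"phone", "case", "charger"},
    {"phone", "charger"},
    {"case"},
    {"phone", "case"},
]
freq = apriori(baskets, min_support=0.4)
rule_conf = confidence(freq, {"phone"}, {"case"})
```

Rules with high confidence, such as phone -> case in this toy data, indicate products whose demand moves together and may therefore be priced jointly.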
The ultimate goal of dynamic pricing for e-commerce companies is to maximize customer satisfaction or to maximize corporate profits; moreover, companies have different goals in different periods of their operations and different requirements for pricing strategies. Therefore, the enterprise pricing decision is a multiobjective decision-making process. To this end, a multiobjective function must first be established. Using the mined information and forecast data, an appropriate demand function can also be established, and the price can be adjusted according to customer demand or corporate sales and inventory. When applying this traditional enterprise dynamic pricing strategy, there are many mature pricing models that can be referenced. For example, pricing models based on inventory control use dynamic programming to achieve dynamic pricing, and other mathematical models can be applied as well.
Multi-Agent System (MAS) behavior consists of the intelligent behavior of a group of autonomous, intelligent agents and the way they coordinate their actions to achieve a certain goal. In a MAS, the mutual coordination among agents includes the coordination of knowledge, goals, skills and planning directions. The goal they achieve may be a single solution goal or a set of several solution goals. According to this definition, the multiagent collaborative solution model is shown in
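The inventory-based dynamic programming approach mentioned above can be sketched as a finite-horizon backward induction. This is only one textbook formulation, not the paper’s model: it assumes that at most one customer arrives per period, that the price set and the purchase probability `linear_demand` are known, and that unsold stock has no salvage value. All names and numbers are hypothetical.

```python
def solve_pricing_dp(prices, buy_prob, periods, inventory):
    """Backward induction over (period, stock).
    value[t][x]  : best expected revenue from period t on with x units left.
    policy[t][x] : price to post in period t with x units left."""
    value = [[0.0] * (inventory + 1) for _ in range(periods + 1)]
    policy = [[None] * (inventory + 1) for _ in range(periods)]
    for t in range(periods - 1, -1, -1):
        for x in range(1, inventory + 1):
            best_price, best_value = None, float("-inf")
            for p in prices:
                q = buy_prob(p)  # probability that the customer buys at price p
                v = q * (p + value[t + 1][x - 1]) + (1 - q) * value[t + 1][x]
                if v > best_value:
                    best_price, best_value = p, v
            value[t][x] = best_value
            policy[t][x] = best_price
    return value, policy

def linear_demand(p):
    """Hypothetical purchase probability that falls linearly with price."""
    return max(0.0, 1.0 - p / 12.0)

prices = [2, 4, 6, 8]
value, policy = solve_pricing_dp(prices, linear_demand, periods=3, inventory=2)
```

In this toy instance the optimal price is higher when stock is scarce relative to the remaining time, which is the qualitative behavior expected of inventory-based pricing.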
The input layer of the network has no calculation nodes, and is only used to obtain external input signals. The neurons of the hidden layer and the output layer are the calculation nodes. The basis function is a linear function and the activation function is a hard limit function. Suppose the MLP has only one hidden layer, and its input is
When the multilayer perceptron is used to solve practical problems, the connection weights between the input layer and the hidden layer must first be trained; however, because it is difficult to determine the expected output of the hidden layer, these weights cannot be trained directly. Therefore, other neural networks have been sought to solve linearly inseparable problems, and the BP network is one such network.
An e-commerce platform often needs to analyze and predict customers’ online shopping behavior. Based on the customer information database, the e-commerce platform makes real-time, targeted predictions of customers’ online shopping behaviors, thus achieving intelligent prediction of customer behavior. Therefore, to build a complete predictive model system, we first need to use methods such as data mining, machine learning, and statistics to discover knowledge and extract features from the data. Based on this, we build a knowledge base of customer online shopping behavior for knowledge guidance, storage and representation, and then establish a system that runs from data input to behavior prediction. The main research contents are as follows:
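To make the role of the hidden layer concrete, the sketch below builds a two-layer perceptron with a linear basis function and hard-limit activation that computes XOR, a standard linearly inseparable problem. The weights are chosen by hand for illustration, which reflects the difficulty noted above: with a hard-limit activation there is no direct way to train the hidden-layer weights.

```python
def hardlim(x):
    """Hard-limit activation: 1 if the input is non-negative, else 0."""
    return 1 if x >= 0 else 0

def neuron(inputs, weights, bias):
    """Linear basis function followed by the hard-limit activation."""
    return hardlim(sum(w * x for w, x in zip(weights, inputs)) + bias)

def xor_mlp(x1, x2):
    # Hidden layer (hand-set weights): h1 fires for OR, h2 fires for AND.
    h1 = neuron((x1, x2), (1, 1), -0.5)
    h2 = neuron((x1, x2), (1, 1), -1.5)
    # Output layer: fires when OR is true and AND is false.
    return neuron((h1, h2), (1, -1), -0.5)

truth_table = {(a, b): xor_mlp(a, b) for a in (0, 1) for b in (0, 1)}
```

A single hard-limit neuron cannot reproduce this truth table, because no single line separates the inputs (0, 1) and (1, 0) from (0, 0) and (1, 1).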
(1) Consumer behavior data processing and feature construction. First, the interaction logs are extracted from the e-commerce interactive system to prepare the data needed for consumer behavior analysis and prediction. Then, data preprocessing, including data cleaning, filling missing values and removing outliers, is performed to ensure the consistency of the data and to provide a sound basis for consumer behavior prediction.
(2) Construction of consumer behavior characteristics. Based on the original data, the user purchase behavior features are extracted. According to different classification methods, the features can be divided into original and extended features or into static and dynamic features, and two or more categories of features can be combined into a new feature. The data and features largely determine the upper limit of the model’s prediction performance. Therefore, constructing suitable features is the key to a sound analysis of user behavior.
(3) Consumer behavior prediction model. The accuracy of the prediction model is the key to the prediction and analysis of consumer behavior. Although there are many prediction models at present, they are far from meeting the accuracy requirements under real conditions. How to use static or dynamic consumer data to accurately predict consumer behavior is an extremely critical technology.
(4) Consumer shopping behavior analysis. In the representation learning of data, the goal is to seek better representations and to create better models that learn these representations from large-scale unlabeled data. The workflow of consumer shopping behavior analysis based on deep learning is mainly divided into the following four steps.
Step 1: Prepare and process the data set. This step includes collecting user interaction information, data cleaning, etc.
Step 2: Feature construction is divided into three stages: feature selection, forming the sample training and test sets, and feature processing. Feature selection is the key to building a prediction model. It selects, from a large number of candidate features, the feature set that is most important for classification, thereby improving the model’s prediction accuracy and shortening the running time. Features measured in different dimensions and units have inconsistent scales, which distorts the weights assigned to them and in turn affects the model’s predictions. Therefore, feature processing must normalize the features.
Step 3: Design and train the prediction model. Select the basic model framework, such as a convolutional neural network (CNN) combined with a recurrent neural network (RNN). Then, using the framework, randomly sample negative samples from the data, adjust the number of network layers, determine the loss function, and set the learning rate and other hyperparameters. The backpropagation (BP) algorithm computes the gradients, and stochastic gradient descent (SGD) or the Adam algorithm uses them to optimize the model parameters.
Step 4: Model verification. Data not used in training are used to verify the generalization ability of the model. If the prediction result is not ideal, the model must be redesigned and a new round of training conducted. There are several mature deep learning models to date, including deep neural networks (DNNs), convolutional neural networks (CNNs), deep belief networks (DBNs), and recurrent neural networks (RNNs). These methods have been used in machine vision, natural language processing, bioinformatics, speech recognition and other fields and have achieved remarkable results.
The working principle of deep reinforcement learning is similar to that of human learning. If an action of the agent obtains a positive reward from the environment, then the agent’s tendency to take that action in the future is strengthened; conversely, if a negative reward is received, that tendency is weakened. The goal of deep reinforcement learning is to learn an action strategy (policy) so that the system obtains the largest cumulative reward. In deep reinforcement learning, the agent selects and executes an action a in the environment; the environment transitions to a new state s after receiving the action and feeds back a reward signal r to the agent, and the agent selects the subsequent action according to the reward signal. In research related to dynamic pricing, the goal of deep reinforcement learning systems is to enable manufacturers to maximize their overall returns rather than the short-term benefit of a single transaction. A deep reinforcement learning architecture generally includes four elements: the strategy, the reward and punishment feedback, the value function, and the environmental model. The environment-related factors of dynamic pricing are numerous and complex. Previous studies of dynamic pricing in deep reinforcement learning were mainly based on the following environmental frameworks.
Deep reinforcement learning can be divided into value-based deep reinforcement learning and policy-based deep reinforcement learning. In deep reinforcement learning based on value functions, commonly used learning algorithms include the Q-learning algorithm, the SARSA algorithm and the Monte Carlo algorithm. These three algorithms are also frequently used in dynamic pricing research based on deep reinforcement learning. (1) Q-learning algorithm. The Q-learning algorithm is a model-free algorithm, and its iteration equation is expressed as:
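The normalization in Step 2 and the hold-out data used in Step 4 can be sketched as follows. The feature values, labels and split ratio are hypothetical. For brevity the statistics are computed on all samples; in practice they would usually be fit on the training set only, so that no information from the test set leaks into training.

```python
import random
from statistics import mean, pstdev

def zscore_columns(rows):
    """Standardize each feature column to zero mean and unit variance."""
    columns = list(zip(*rows))
    # A constant column has zero deviation; fall back to 1.0 to avoid dividing by zero.
    stats = [(mean(c), pstdev(c) or 1.0) for c in columns]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]

def train_test_split(samples, test_ratio=0.25, seed=0):
    """Shuffle with a fixed seed and hold out the last test_ratio of samples."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical features: [number of clicks, seconds spent on the product page].
features = [[3, 120.0], [10, 15.5], [1, 300.0], [7, 42.0]]
labels = [1, 0, 1, 0]  # 1 = purchased, 0 = did not purchase
scaled = zscore_columns(features)
train, test = train_test_split(list(zip(scaled, labels)))
```

After scaling, clicks and dwell time contribute on comparable scales even though their raw units differ.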
where
(2) SARSA algorithm. SARSA is an on-policy algorithm that can find the optimal strategy through iteration of the state-action value function when the reward function and state transition probability are unknown. When every state-action pair is visited infinitely often, the algorithm converges to the optimal strategy and state-action value function with probability 1. The SARSA algorithm adopts relatively safe actions in learning, so the convergence speed of the algorithm is slow. The iteration equation is expressed as:
(3) Monte Carlo algorithm. The Monte Carlo algorithm does not require complete knowledge of the environment and only requires experience to solve for the optimal strategy. This experience can be obtained online or from a simulation mechanism. The Monte Carlo method keeps a count of how often each state-action pair occurs and of the future rewards that follow, and establishes the values from these estimates. It estimates the expected return by averaging sampled returns: for each state, all the returns obtained from that state are kept, and the value of the state is their average. Monte Carlo methods are especially useful for episodic tasks. Since sampling depends on the current strategy, only the rewards of the actions proposed by that strategy are evaluated. The value function update rule is expressed as:
where
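The three value-based updates discussed in this section can be sketched in their standard textbook tabular forms. The symbols used here (learning rate `alpha`, discount factor `gamma`, table `Q`, state values `V`) follow common convention and are assumptions; they may differ from the notation of the paper’s own equations.

```python
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the best action in the next state."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually taken next."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def monte_carlo_update(V, counts, episode, gamma=0.9):
    """Every-visit Monte Carlo: running average of sampled returns per state.
    `episode` is a list of (state, reward) pairs in time order."""
    G = 0.0
    for state, reward in reversed(episode):
        G = reward + gamma * G
        counts[state] += 1
        V[state] += (G - V[state]) / counts[state]

# Hypothetical pricing example: two price levels as actions.
actions = ["low", "high"]
Q1 = defaultdict(float)
Q1[("s1", "high")] = 10.0
Q1[("s1", "low")] = 2.0
Q2 = defaultdict(float, Q1)  # independent copy for the SARSA update

q_learning_update(Q1, "s0", "low", 1.0, "s1", actions)
sarsa_update(Q2, "s0", "low", 1.0, "s1", "low")

V, counts = defaultdict(float), defaultdict(int)
monte_carlo_update(V, counts, [("s0", 1.0), ("s1", 2.0)])
```

With identical experience, the Q-learning target uses the larger value of the high price in the next state, while SARSA uses the value of the low price that was actually chosen, which illustrates why SARSA behaves more conservatively.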
Dynamic pricing in e-commerce is one of the fastest growing areas of Internet applications. By applying an online auction-style dynamic pricing model, companies can price products based on the true market value of commodities. In most real markets, only the buyer knows exactly how many items he or she is willing to buy at a specific price level. The seller does not have perfect knowledge of the market demand and cannot accurately know the buyer’s valuation; the seller only has statistical information about the market demand. This section starts from the “individual valuation” model and discusses the “online auction”, an auction-type dynamic pricing model in which a single seller provides auction items and multiple buyers bid on them.
Suppose that the system is a market environment in which a certain auctioneer on the Internet auctions many items, there are many demanders, and the quantity of demand is uncertain. Let N be the set of n demand-side agents, and let F be the set of all possible allocation combinations among them. Each distribution combination
If the auction is closed (sealed-bid), the auction process is as follows: Agents submit their monetary amount functions, and we temporarily assume that they submit these functions truthfully. Later, it will be explained that false reporting cannot improve the income of any agent. The auctioneer chooses the best distribution plan for all calculations of V(N) and V (N
The net income is:
Suppose that the seller agent S has 5 indivisible commodities and that 5 bidders
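The mechanism described in this section, in which the auctioneer compares the best total value with all agents against the best total value without a given agent and truthful reporting is optimal, matches the general shape of a Vickrey-Clarke-Groves (VCG) auction. The following is a minimal sketch under that assumption. The bidder names, valuations and supply are hypothetical and do not reproduce the paper’s example.

```python
from itertools import product

def best_allocation(valuations, supply):
    """Exhaustively find the allocation that maximizes total reported value.
    valuations[i][q] is bidder i's reported value for receiving q units."""
    bidders = list(valuations)
    best_value, best_alloc = 0.0, {i: 0 for i in bidders}
    for quantities in product(range(supply + 1), repeat=len(bidders)):
        if sum(quantities) > supply:
            continue
        total = sum(valuations[i][q] for i, q in zip(bidders, quantities))
        if total > best_value:
            best_value, best_alloc = total, dict(zip(bidders, quantities))
    return best_value, best_alloc

def vcg(valuations, supply):
    """Each winner pays the loss its presence imposes on the other bidders."""
    v_all, alloc = best_allocation(valuations, supply)
    result = {}
    for i in valuations:
        others = {j: v for j, v in valuations.items() if j != i}
        v_without_i, _ = best_allocation(others, supply)
        own_value = valuations[i][alloc[i]]
        payment = v_without_i - (v_all - own_value)
        result[i] = {"units": alloc[i], "payment": payment,
                     "net": own_value - payment}
    return v_all, result

# Hypothetical reported values for 0, 1 or 2 units.
bids = {"A": [0, 10, 15], "B": [0, 8, 12], "C": [0, 6, 7]}
total_value, outcome = vcg(bids, supply=2)
```

Under this rule each agent’s net income equals the total value with the agent minus the total value without it, so it does not depend on the agent’s own report except through the chosen allocation.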
In the online auction MDA market environment, it is assumed that there are m buyers and n sellers. The number of buyers and the number of sellers are arbitrary, and it is not assumed that there are more buyers than sellers or more sellers than buyers. Each buyer
When the auction is over (that is, market liquidation), assume that buyer i purchases
The utility obtained by seller j can be defined as:
If all information is public, the maximized total market value, that is, the aggregate utility of all agents participating in the auction, can be obtained through the following linear programming problem:
Since third-party click-farming (“brushing”) platforms profit by exchanging information between click farmers and merchants, the author obtained false transaction information by registering on such a platform in the role of a click farmer and releasing the billing information through the platform. The author then collected the comments and transaction records of the products with fake transactions. In addition, to collect data on normally traded products, the author chose official flagship stores (such as Hailan House, ONLY, VERO MODA, Uniqlo and other official Tmall flagship stores with a high reputation) and used their product reviews and transaction records as the training set for regular trading products. On this basis, the author collected nearly 130,000 reviews and the transaction records of the most recent month for these products as the input data set of the recognition model. After normalizing the data, an independent sample t-test was performed, and the results are shown in
Feature | Is it false | N | Mean | Standard deviation | Standard error of the mean |
---|---|---|---|---|---|
Store registration time | 0 | 125 | 0.479 | 0.587 | 0.0534 |
Store registration time | 1 | 85 | −0.865 | 0.927 | 0.1023 |
Refund dispute rate | 0 | 116 | −0.4789 | 0.0167 | 0.00145 |
Refund dispute rate | 1 | 78 | 0.8675 | 1.234 | 0.1356 |
Product review rate | 0 | 118 | −0.2689 | 0.0428 | 0.00256 |
Product review rate | 1 | 78 | 0.4456 | 1.267 | 0.1543 |
Single product review ratio | 0 | 119 | −0.578 | 0.0278 | 0.00267 |
Single product review ratio | 1 | 77 | 1.036 | 1.0387 | 0.1156 |
Collection rate | 0 | 116 | −0.234 | 0.3768 | 0.03345 |
Collection rate | 1 | 75 | 0.367 | 1.467 | 0.1678 |
Repeat review rate | 0 | 115 | −0.3567 | 0.265 | 0.0243 |
Repeat review rate | 1 | 82 | 0.6754 | 1.675 | 0.1864 |
Average comment length | 0 | 122 | −0.4876 | 0.421 | 0.0375 |
Average comment length | 1 | 82 | 0.7894 | 1.234 | 0.1365 |
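As an illustration of the independent sample t-test, the sketch below computes a t statistic from group summary statistics for store registration time, reading the two groups as N = 125 and N = 85 with means 0.479 and −0.865 and standard deviations 0.587 and 0.927. The paper does not state which variant of the test was used; Welch’s unequal-variance form is assumed here because the two groups have visibly different standard deviations.

```python
from math import sqrt

def welch_t(mean0, sd0, n0, mean1, sd1, n1):
    """Welch's t statistic and approximate degrees of freedom for two
    independent samples with unequal variances (Welch-Satterthwaite)."""
    var0, var1 = sd0 ** 2 / n0, sd1 ** 2 / n1  # squared standard errors
    t = (mean0 - mean1) / sqrt(var0 + var1)
    df = (var0 + var1) ** 2 / (var0 ** 2 / (n0 - 1) + var1 ** 2 / (n1 - 1))
    return t, df

# Store registration time: group 0 (normal) versus group 1 (false).
t_stat, dof = welch_t(0.479, 0.587, 125, -0.865, 0.927, 85)
```

A large absolute t value relative to the degrees of freedom indicates that the feature separates normal and false transactions, which is what makes it useful as an input to the recognition model.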
It is not difficult to see the convergence of the algorithm in
Taking the dynamic bidding market as an example, in K transaction cycles, there are N transaction agents bidding on M brand cars, and the matching agent calculates and matches the bids based on the matching transaction model and algorithm. Trading agents are risk-neutral, and all participate in bidding in a random optimal way. According to the microstructure and dynamic trading mechanism, the market equilibrium easily forms for the same type of commodity bidding; however, when multiple types of goods are matched at the same time, the market status will become very complicated. Therefore, we designed market price dynamic fluctuations and equilibrium experiments for single commodities and multiple types of commodities.
In experiment 1, set
Let EquTe represent the degree of equilibrium of market prices. Then, according to the trading entropy and Walrasian equilibrium, EquTe can be defined as the probability of the occurrence of an equilibrium trading price.
Here,
The development of Internet technology and the popularization of networks have expanded the range of applications of data mining, and data mining is increasingly widely applied in e-commerce. This article uses data mining theory and methods together with dynamic pricing strategies to establish an e-commerce dynamic pricing model based on data mining. Based on the mechanism of the model, the auction mechanism is analyzed and discussed, and suggestions for improving pricing strategies are proposed. The model applies data mining comprehensively to e-commerce dynamic pricing and is generally applicable to e-commerce enterprises, helping them improve customer satisfaction and economic efficiency. The e-commerce platform integrates the production and sales of the enterprise, and production and sales constrain each other. In studying the substitution effect in multiproduct dynamic pricing, we considered production constraints only in a simple form and did not closely integrate production planning with sales. How to adjust commodity prices according to changes in production plans is a question that requires further study.