Jump to content
  • Sign in to follow this  

    A Linear Regression Attempt on the LEGO Ultimate Collector Series (+ Value Predictor)


    There is something I have been wanting to apply to LEGO for a long time but that for some reason I could not find enough time to do: a regression analysis.

    For those of you that have never heard of have already forgotten what a linear regression is, I will try to quickly summarize the most important aspects. Basically, by running a regression analysis we can determine the relationship between several different independent variables and how they affect one particular dependent variable, therefore allowing us to determine exactly what causes that independent variable to change. Even more, once we have performed a regression analyses we end up with a linear equation that can help us predict the future value of the independent variable by changing one or several of the dependent, with a margin of error depending on the quality of the regression. I know that some will understand this better once we get into the actual UCS numbers.

    What I did in this particular case is select a popular theme that had a manageable amount of retired sets so that I would not need to spend several days collecting data, and the UCS seemed to be the most useful choice. For my independent variable I selected Change Over Retail, and the dependent variables that would help me determine it ended up being the following:

    • Price Per Piece
    • Price Per Gram
    • Piece Count
    • MSRP
    • Total Number of Sets Sold in the Past 12 Months
    • Approximate Numbers of Years Retired

    There are several other variables that more than likely have an impact in how much a set increases in value over its retail price, like amount of exclusive pieces and actual sales numbers from its entire production run, but they are very hard or even impossible to quantify. The ones I selected are probably the most important of the ones we easily have access to.

    After selecting my variables, I went ahead and made a table in Excel for every single one of the UCS already retired sets and collected data for the variables above for each. Here is what it ended up looking like:

    Posted Image

    You see that there is a lot of information in the table above that will help us with our current analysis. After collecting the data, I proceeded to run the regression analysis using Excel. The following is the regression output.

    Posted Image

    There are a lot of numbers in the table above, but this is not supposed to be an in depth stats class, so I highlighted one of the most important numbers. The highlighted R Square figure is the number that lets us know how much of the variation in a set's % Change over retail is explained by the variables listed below (PPG,PPP, MSRP, etc). In this particular case, we got an R Square of 0.72, or 72%, meaning that we can explain 72% of the movements in Change over retail by using this model. Why can't we explain more than that? well, it all comes down to not having more sets to include (only 16 have been retired) and the fact that we do not have all the variables that have an effect on a set's change over retail. Still, 72% is a pretty respectable number all things considered.

    Now the boring part is out of the way, let's put to use what we found out and see how well our model matches up to the actual Change over retail of each set. This is done by using the following equation gathered from the above output:

    Predicted % Change Over Retail =-443 + 2045*(PPP) + 2524*(PPW) + 0.41*(#Pieces) - 3.47*(MSRP) - 0.01*(SetsSold12Mo) + 40.46*(#YearsRetired)

    You basically substitute each set's information for each variable in the formula above. For example, the 10179 predicted change over retail would be:

    Predicted % Change Over Retail =-443 + 2045*(0.1) + 2524*(0.049) + 0.41*(5195) - 3.47*(500) - 0.01*(342) + 40.46*(3)

    Predicted % Change Over Retail = 374.01% Actual Change Over Retail = 424.41%

    As said before, there is a margin of error tied to the fact that not all of the variables are accounted for, but for basic estimations this model will give you some decently accurate results most of the time. Keep in mind that some special cases will be farther apart from the actual results, while some will be extremely close.

    I then went ahead and applied the formula for every single set and compared the predicted values to the actual values just like we did with 10179 above, here is what it looks like both in table and graph form:

    Posted Image

    Posted Image

    You can see that some of the results are pretty accurate, a couple are very far away (especially Grievous and the Speeder), and most are somewhat close. With more observations (sets) and with at least some of the missing variables our results would be a lot more accurate, but I think this at least provides you with a rough projection of where a set might be once retired for X amount of years.

    So, we have been using this model to see how well it predicted performance of sets that have already been retired, but the real usefulness of it is going to be predicting the % change over retail of sets that are currently being produced and sold at retail OR those sets that have already been retired but you want to know where they will be in a couple more years. I will do one example so you can see this better:

    10225 R2-D2 Predicted Change Over Retail in 3 Years (Assume retirement this year)

    % Change = -443 + 2045*(0.08) + 2524*(0.069) + 0.41*(2127) - 3.47*(180) - 0.01*(676) + 40.46*(3)

    Predicted % Change = 244.61%

    So, according to the formula, this model should be worth around $ 620 once it has been retired for a period of three years, equivalent to a 244.61% change over retail.

    I included this predictor in the Excel file you will find below, but there are some things I want you to keep in mind while using it:

    • Projections are not perfect, just estimates
    • The model assumes past performance trends will maintain, since that is all the data we have, but as we all know the LEGO investment environment has changed
    • Do not try to project values beyond 12 years, since that is the farthest data point we have and will make your results more inaccurate.
    • Have Fun!!

    Thanks for Reading!

    Sign in to follow this  


    User Feedback

    Recommended Comments



    Nice post. One thing I think is really interesting about this is to set %Change to the amount you want to make, and then use that to solve for the time period following retirement that you'd need to hold a set to make your desired rate of return. Another example of why knowing an EOL date is useful....

    Share this comment


    Link to comment
    Share on other sites

    Great analysis. Would it be possible to somehow include the Star Wars episode the set first appeared in? I think that's a pretty important variable to consider for these sets.

    Share this comment


    Link to comment
    Share on other sites

    Great analysis. Would it be possible to somehow include the Star Wars episode the set first appeared in? I think that's a pretty important variable to consider for these sets.

    I have actually been playing with this over the past few days (It was submitted around 3 days ago) and included that as a variable in one my latest tries. The results are actually even more accurate for the original trilogy and the prequel themed sets that have been released, but only having those a couple of prequel based sets makes the model to be less accurate in predicting future performance of prequel themed sets that have yet to be released.

    Share this comment


    Link to comment
    Share on other sites

    No B-Wing analysis?  JK, FCB.  Great post.  I, too, have played around a little with regression, but nothing more than plugging in some numbers.  As you note, the variables selected are really important to the accuracy of the model.  Interestingly, these variables may change by theme, so UCS may run more accurately with a certain subset of independent variables than Harry Potter.  I suppose that's due to having a different subset of buyers for each theme, and the differences in their purchasing criteria.

    Share this comment


    Link to comment
    Share on other sites

    Thanks Quacs. I will keep including more data for the UCS sets that retire in the coming months and so on. But you are right, this only works for the UCS and I have started collecting data to see if I can do the same with the overall SW theme. It is very time consuming to go over all the sets that have been released since 1999, so it may take a while...

    Share this comment


    Link to comment
    Share on other sites

    Something else I just noticed: shouldn't you get rid of the PPP variable? It's indirectly included twice because of the Retail and Pieces variables.

    Share this comment


    Link to comment
    Share on other sites

    Something else I just noticed: shouldn't you get rid of the PPP variable? It's indirectly included twice because of the Retail and Pieces variables.

     

    Completely right. Fortunately the numbers don't change too much either way, so for now this model would work fine. Look for a more polished and updated version of the the Excel file in the coming weeks. (Probably on my personal blog section, as it will only be the file)

    Share this comment


    Link to comment
    Share on other sites

    You already know my thoughts on this one FcB.  But your write up was even better than I was expecting.  I think you need that accountant award pronto!

     

    I was thinking about this yesterday and though a good variable would be time on retail shelves in months.  It's something we don't have ready access to, but it would be great if we could work it out somehow.  The supply side of the equation is something we always struggle to measure.

    Share this comment


    Link to comment
    Share on other sites

    Thanks Grolim, I'll leave that up to Ed lol

     

    If only we could get actual sales numbers it would be great too to determine popularity, but as you said time in months in the shelves would be great as well. So many variables could make this even more accurate.

    Share this comment


    Link to comment
    Share on other sites

    Great analysis. :)

     

    It's always nice to see people applying statistical methods to a particular subject to build and derive models to enable forecasting and predictive analysis. However the factor that always worries me the most of definitely knowing how future supply will be now relative to now. E.g. if the secondary pool doubles within the next few years we may see a completely different scenario than what we are facing now. If only we had a Lego time machine... (the latest master builder academy set doesn't count :p)

    Share this comment


    Link to comment
    Share on other sites
    Coolness = Actual Increase - Predicted Increase. Death Star II = -104.64 Coolness :( Snowspeeder = 262.58 Coolness B)

    Share this comment


    Link to comment
    Share on other sites

    The standard deviation on your error column is 95 ... which is pretty high. To me that makes the regression analysis pretty dangerous (not that it isn't any good :)). When you consider that 3 deviations is 285, that means these increases fluctuate pretty wildly .. more than likely due to the factors you can't measure like total number of sales, time on the shelf, and general popularity.

     

    What about the Live Star? How would this predict it's value if it *cough* ever did retire? :)

    Share this comment


    Link to comment
    Share on other sites
    [quote name="gregpj" timestamp="1372777039"]The standard deviation on your error column is 95 ... which is pretty high. To me that makes the regression analysis pretty dangerous (not that it isn't any good :)). When you consider that 3 deviations is 285, that means these increases fluctuate pretty wildly .. more than likely due to the factors you can't measure like total number of sales, time on the shelf, and general popularity.   What about the Live Star? How would this predict it's value if it *cough* ever did retire? :)[/quote] Definitely. As I said this was just an attempt that hopefully with more sets retired in the future will keep gettin better, but without actual sales numbers and more precise numbers things will always be incomplete. I would love to see the predicted value of the DS, but since that is not really UCS and includes a bunch of minifigs the results would be extremely innacurate. I'm working on a regression for the whole SW theme except UCS, let's see how that works.

    Share this comment


    Link to comment
    Share on other sites
    [quote name="sadowsk1" timestamp="1372777376"]This looks like a lot of information.  How much time did it take to assemble all this interesting material?[/quote] What took the longest was gathering all the data, but it really was not too bad. I think I spent a couple hours doing that and then close to the same getting the thing together. For the whole SW theme, it will take so much more time lol.

    Share this comment


    Link to comment
    Share on other sites

    Definitely. As I said this was just an attempt that hopefully with more sets retired in the future will keep gettin better, but without actual sales numbers and more precise numbers things will always be incomplete. I would love to see the predicted value of the DS, but since that is not really UCS and includes a bunch of minifigs the results would be extremely innacurate. I'm working on a regression for the whole SW theme except UCS, let's see how that works.

     

    The only thing about the DS is that it's in the UCS price range which puts it out of reach for some buyers and makes it a perfect "collectable" for the adult buyer.

     

    Anyways, I'm patiently waiting to see the whole SW theme as it makes up a good chunk of my portfolio.

     

    Next up.. Trains? :)

    Share this comment


    Link to comment
    Share on other sites

    The only thing about the DS is that it's in the UCS price range which puts it out of reach for some buyers and makes it a perfect "collectable" for the adult buyer.

     

    Anyways, I'm patiently waiting to see the whole SW theme as it makes up a good chunk of my portfolio.

     

    Next up.. Trains? :)

     

    I might try to go with trains or something similar if SW's does not work out. The theme is so huge a varied that I'm getting very disappointing results so far (Granted, I have only used sets up to 2003 at this point, so we will see). There are even more variables at play than with the UCS.

    Share this comment


    Link to comment
    Share on other sites

    Sorry guys, I really tried to make it easy to understand, but I am not the best teacher out there lol

    It isn't that. Your article is quite clear; it's just that my head goes numb whenever a lot of numbers and variables and terms and....stuff....are tossed around. I feel like Lewis Black when he talks about stepping into a classroom for civics. :lol:

    I may have to work with a lot of formulas and numbers myself, but this stuff goes right over my head. Not your fault.
    Just my brain hurts....
    [media]http://www.youtube.com/watch?v=MrCPIrs90eg[/media]

    Share this comment


    Link to comment
    Share on other sites
    Guest skfdlty

    Posted

    Never really thought of this, but its a great way to try to pre-determine the direction a UCS set might take, and for many of the former sets, this projection is right on! Nice job on this article, once again, Fcbarcelona!

    Share this comment


    Link to comment
    Share on other sites
    Can you control for who sold the item in the past 12 months? This could be a separate variable. 0 is ebay, 1 is amazon and so on, bucketing "other" after four or five sites. What is the correlation between number sold amd your dependent variable. Great stuff. Thanks.

    Share this comment


    Link to comment
    Share on other sites
    The thing is we only have data from sets sold on ebay. Quite honestly, I am thinking about removing the sold sets variable since it kind of skews some of the numbers (older sets sell a lot sets just because of how rare they are). Also, I included in which trilogy the set originally appeared by using 1 for the original and 2 for the prequels. The numbers are more accurate for the original trilogy sets but less for the prequel ones, mostly because there have only been around 3 sets and 2 of them have not done that good.

    Share this comment


    Link to comment
    Share on other sites



    Join the conversation

    You can post now and register later. If you have an account, sign in now to post with your account.
    Note: Your post will require moderator approval before it will be visible.

    Guest
    Add a comment...

    ×   Pasted as rich text.   Paste as plain text instead

      Only 75 emoji are allowed.

    ×   Your link has been automatically embedded.   Display as a link instead

    ×   Your previous content has been restored.   Clear editor

    ×   You cannot paste images directly. Upload or insert images from URL.

    Loading...

×
×
  • Create New...